Degree Distribution (in-degree)

How many articles link directly to a particular article through their first link?

In [1]:
from collections import defaultdict

import pandas as pd
from scipy import stats 
import numpy as np
import json

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

path = "/Users/mark/Dropbox/Math/Complex_Systems/research/wikipedia-network/paper/writeup/graphics/"
results_path = "/Users/mark/Desktop/wiki_v4/"
In [3]:
# load in-degree data (article -> list of articles whose first link points to it)

with open(results_path + 'direct_links.json', 'r') as fp:
    direct_links = json.load(fp)

dldf = pd.DataFrame(direct_links.items())
dldf.columns = ['article', 'direct links']

# add in-degree column: number of articles whose first link is this article
dldf['indegree'] = dldf['direct links'].map(len)
In [3]:
dldf
Out[3]:
article direct links indegree
0 Software release life cycle#Release [Going gold, Gone Gold, Gold (software), Gold ... 5
1 bolt action rifle [Remington Model 673, Alejandro Sniper Rifle, ... 3
2 Pig (1998 film) [Nico B.] 1
3 Saturn I#S-I_stage [S-I] 1
4 Anniston Museum of Natural History [Anniston museum of natural history] 1
5 Bitiče [Bitice] 1
6 AEF Monotrace [AEF Air Lift System Monotrace, Monotrace] 2
7 Xel-Ha Park [Xel Ha Park, Xel-Há Park, Xel-Há Eco Park, Xe... 8
8 Canton of Chalamont [Canton of chalamont] 1
9 Diocese of Nidaros [Diocese of nidaros, Bishop of Nidaros] 2
10 Milhan District [Milhan, Yemen] 1
11 Venus and Adonis#Paintings [Venus and Adonis (Titian)] 1
12 Hathua (Vidhan Sabha constituency) [Sainik School Gopalganj] 1
13 Dreyfus affair [L'affaire Dreyfus, Dreyfus, Antidreyfusard, D... 25
14 circus skills [Risley (circus act)] 1
15 Conceição da Barra [Conceicao da Barra, Espirito Santo, Conceicao... 5
16 Amblyomma cajennense [Cayenne Tick, Cayenne tick] 2
17 Ceriporia xylostromatoides [Poria velata, Poria interrupta, Poria xylostr... 16
18 The Magic Land of Allakazam [Magic Land of Allakazam] 1
19 List of music festivals#Germany [Festivals in Germany] 1
20 Eura [Honkilahti, Eura Airfield, Euran Pallo] 3
21 L'Estrange [Lestrange] 1
22 Per Anger#Per Anger Prize [Per Anger Prize] 1
23 Eure [Roumois, Eure departement, Agglomeration comm... 9
24 Nick Kamen [Each Time You Break My Heart, I Promised Myself] 2
25 Euro [Euros, Danish euro referendum, 2000, Euro cur... 58
26 Surin, Iran [Surin, Iran (disambiguation)] 1
27 Podić [Podic] 1
28 Qaleh Now-ye Alireza Bek [Qal'eh Now-ye Alireza Bek, Qal'eh Now-e Alire... 3
29 Bích Động [Bich Dong] 1
... ... ... ...
3104763 Gagnières [Gagnieres] 1
3104764 List of Tokyo Mew Mew characters#Chimera_Anima [Kirema Anima, Chimera Animal, Predacytes, Kim... 4
3104765 Fitch House [Fitch House (disambiguation)] 1
3104766 Caryl Phillips [The European Tribe, Colour Me English, A Dist... 6
3104767 La Thuile [La Thuile (disambiguation)] 1
3104768 Gallués – Galoze [Gallues - Galoze, Gallues – Galoze, Gallués -... 3
3104769 Poisson–Boltzmann equation [Poisson-Boltzmann, Poisson-Boltzmann equation] 2
3104770 The Best Damn Thing#The Best Damn Tour [The Best Damn Tour, The Best Damn Tour 2008] 2
3104771 Człopy [Czlopy] 1
3104772 EuroSprinter#ES 64 U [Taurus (locomotive), Taurus train] 2
3104773 Germany's Next Topmodel (cycle 3) [Germany's Next Topmodel, Cycle 3] 1
3104774 Guido de Bres [Guy de Bres, Guy de Bray, Guido de Brés, Guid... 7
3104775 Sebastianópolis do Sul [Sebastianopolis do Sul] 1
3104776 Lake Harriet (Oregon) [Lake Harriet (Clackamas County, Oregon), Lake... 2
3104777 Beware the Gonzo [Beware The Gonzo] 1
3104778 Camblain-Châtelain [Camblain-Chatelain] 1
3104779 Rankine–Hugoniot conditions [Hugoniot elastic limit, Rankine–Hugoniot rela... 12
3104780 Unfair labor practice (Japan) [Unfair Labor Practice (Japan)] 1
3104781 2010–11 Campionato Sammarinese di Calcio [2010-11 Campionato Sammarinese di Calcio] 1
3104782 José María Merchán [Jose Maria Merchan] 1
3104783 Flag of Jersey [Flag of jersey] 1
3104784 Canal Street (New York City Subway)#IRT Lexing... [Canal Street (6 Line), Canal Street (IRT Lexi... 3
3104785 Vineland Avenue [East Valley High School (Los Angeles)] 1
3104786 Gabriel Elorde [Gabriel "Flash" Elorde, Flash Elorde] 2
3104787 A-sharp [A Sharp (programming language), A Sharp, A sh... 5
3104788 Charles Cousin-Montauban, Comte de Palikao [Charles Montauban, Count of Palikao, Charles ... 9
3104789 Janzour Museum [Janzur Museum, Zanzur Museum] 2
3104790 Paraguayan Communist Party [Partido Comunista Paraguayo, Movimiento por l... 4
3104791 Document Exploitation (DOCEX) [National Media Exploitation Center] 1
3104792 Dungannon Middle [Barony of Dungannon Middle] 1

3104793 rows × 3 columns

What are the highest ranking articles by indegree?

indegree: number of direct first links to an article
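
As a small illustration of this definition, the sketch below inverts a toy first-link mapping and counts how many articles point at each target. The article names and the `first_link` dict are made up for the example; only the inversion pattern mirrors how the notebook builds `direct_links`. It is written to run on Python 3 as well as the Python 2.7 kernel used here.

```python
from collections import defaultdict

# Hypothetical toy data: each article maps to the first link in its body.
first_link = {
    "A": "C",
    "B": "C",
    "C": "D",
    "D": "E",
}

# Invert the mapping: target article -> articles whose first link is that target.
direct_links = defaultdict(list)
for article, target in first_link.items():
    direct_links[target].append(article)

# The in-degree of a target is the number of articles collected for it.
indegree = dict((target, len(sources)) for target, sources in direct_links.items())
```

Here "C" ends up with in-degree 2, while "A" and "B" are never a first link and so do not appear in `indegree` at all.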

In [4]:
dldf_sorted = dldf.sort_values(by='indegree', ascending=False)
In [6]:
dldf_sorted
Out[6]:
article direct links indegree
2383074 United States [Death of a Soldier, Newton D. Baker, Six Degr... 80249
1404177 village [Crnoklište, Tiruvali, Gola, Krosno Odrzańskie... 68526
1113168 moth [Nemapogon alticolella, Iconostigma morosa, Eo... 52007
2334701 Communes of France [Champanges, Régat, Mauzac, Haute-Garonne, Mau... 35940
1617818 species [Orania purpurea, Pedicularia vanderlandi, Mit... 29251
750292 genus [Hyposada, Semidelitschia, Amoebites, Malva, F... 26603
203610 Canada [Irving Meretsky, Montgomery Steele, Marilyn L... 22064
2166394 American football [Chuck Bernard, Milo Sukup, Roynell Young, Cha... 21549
3066124 association football [Jimmy O'Neill (footballer born 1941), Alf Tin... 18948
2349366 United Kingdom [Landing ship, infantry, Belief (album), Bette... 18539
177352 Germany [The Voice of Germany, Afghanistan Analysts Ne... 18271
2447852 England [Joseph Tilley Brown, Jim Hall (footballer bor... 17010
405899 Association football [Albert Riera (footballer, born 1983), Benjami... 16511
863129 France [Jean-Baptiste Prosper Jollois, Sylvaine Duban... 16489
2735097 Italy [Giuseppe Veronese, Andrea Tripicchio, Ernesto... 15234
2382306 beetle [Eriocharis, Mordella lapidicola, Ochrus gramm... 15158
2522252 Russia [Igor Gindis, Yuri Noskov, Alexander Ushakov (... 13166
756569 Australia [Turia Pitt, Patrick Carew, Grim Reaper (adver... 11972
1741585 Japan [Yasushi Matsumoto, Pinky (magazine), Ayane Mi... 11472
2051116 Unincorporated area [Bakerton, West Virginia, Malone, Oregon, Wood... 11296
1069494 Sweden [Lazze Ohlyz, Sven Israelsson, Opus Atlantica ... 10521
1523982 India [Unity Centre of Communist Revolutionaries of ... 10460
2148553 Netherlands [Robert Jan Stips, Piet Jongeling, Joris Carol... 9614
2145614 radio station [Freeform (radio format), WMMY, WMMX, WMMH, WM... 9449
1366567 tributary [Breazova River (Bârzava), Ormindea River, Tit... 9269
1699821 English people [Ray Whittaker, Benjamin Blackburn (cricketer)... 8728
2850666 basketball [Terry Stotts, Oriol Junyent, Daniel Hackett, ... 8585
2078272 unincorporated area [Buel, Kentucky, Brinktown, Missouri, Muldon, ... 8575
888757 studio album [Tooth & Nail (Billy Bragg album), By Nicole (... 8562
2577309 Brazil [Dedé (footballer, born 1988), Davi José Silva... 8499
... ... ... ...
1704791 Jaguar XK140 [Jaguar xk140] 1
1704797 TNT (TV station)#News_.26_Current_Affairs [Southern Cross Nightly News] 1
1704832 List of Coronation Street characters (2003) [List of minor Coronation Street characters (2... 1
357423 DPW [DPW (disambiguation)] 1
1704831 Kent East [Kent East (Canada)] 1
1704830 Hojjatol-Islam [Mousa Qorbani] 1
202074 Avenger class [Avenger-class] 1
202077 Coquitlam–Maillardville [Coquitlam-Maillardville] 1
1704827 Stephen Andrew Lynch [S.A. Lynch] 1
653139 Ladislav Kříž [Ladislav Kriz] 1
1704825 Linów, Masovian Voivodeship [Linow, Masovian Voivodeship] 1
1704824 Ceratandra [Ceratandropsis] 1
1704821 Capacity planning [Capacity Requirements Planning] 1
1704820 Francesc Piera Martínez [Francesc Piera Martinez] 1
653142 Joseph Kertes [Joe Kertes] 1
1704815 FVC [FVC (disambiguation)] 1
1704814 Plestiodon lagunensis [P. lagunensis] 1
1704812 Michael Wertheimer [Mike wertheimer] 1
1704810 Design by Humans [Design By Humans] 1
1704809 Crofton (surname) [Crofton] 1
1704808 Proper Bay [Grantham Island] 1
1704807 National Spiritualist Association of Churches [National Spiritualist Association] 1
1704806 Grindon [Grindon (disambiguation)] 1
653144 George Washington Carroll [George W. Carroll] 1
1704803 Roccella [Roccella (disambiguation)] 1
202078 Ronald Reagan Federal Building and Courthouse [Ronald Reagan Federal Building and Courthouse... 1
1704801 Ringing generator [Magenta box] 1
653145 Loughgall Parish [Derrycrew] 1
1704799 Savoyard crusade [Savoyard Crusade] 1
3104792 Dungannon Middle [Barony of Dungannon Middle] 1

3104793 rows × 3 columns

In [5]:
dldf_sorted['indegree (k)'] = dldf_sorted['indegree'] / 1000
dldf_sorted.head(20).iloc[::-1].plot(x="article", y="indegree (k)", kind="barh", fontsize=14,
                            legend=False, figsize=(4,6), color="#268bd2")
#no background
ax = plt.gca()
ax.patch.set_visible(False) 


plt.xlabel("Highest Ranking by in-degree \n (in thousands)", fontsize=12)
plt.ylabel("")
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')

plt.tick_params(axis='x', which='major', labelsize=12)

#save figure
plt.savefig(path+'articles_ndegree.png', format='png', dpi=300, bbox_inches='tight')
In [11]:
dldf[dldf['article'] == "philosophy"]
Out[11]:
article direct links indegree
1992476 philosophy [1536 in philosophy, Theophysics, Lorenzo Peña... 581
In [13]:
phil_links = list(dldf[dldf['article'] == "philosophy"]["direct links"])
phil_links
Out[13]:
[[u'1536 in philosophy',
  u'Theophysics',
  u'Lorenzo Pe\xf1a',
  u'Draft:Data as code',
  u'Cogito (magazine)',
  u'Indeterminacy (philosophy)',
  u'List of philosophy categories',
  u'Philosophy of geography',
  u'Randal Marlin',
  u'David C. Lane',
  u'On Certainty',
  u'Essence',
  u'Hassan Hanafi',
  u'Nade\u017eda \u010ca\u010dinovi\u010d',
  u'Absurdism',
  u'Immutable truth',
  u'John Alexander Gunn',
  u'Existentialism and Humanism',
  u'Jesuism',
  u'Kierkegaard Society of the UK',
  u'Underworld of philosophy',
  u'African philosophy',
  u'Antinatalism',
  u'Active citizenship',
  u"Philosophers' Imprint",
  u'1926 in philosophy',
  u'Transmodernity',
  u'Environmental hermeneutics',
  u'Evolutionary Enlightenment',
  u'Pancasila (politics)',
  u'Herbert Marcuse',
  u'Event (philosophy)',
  u'Paul Taylor (philosopher)',
  u'International Kierkegaard Society',
  u'Good faith',
  u'Deflationary theory of truth',
  u'Aporia',
  u'Franz Brentano',
  u'Totality and Infinity',
  u'Robert P. Crease',
  u'Virtue epistemology',
  u'Solipsism',
  u'Correlative-based fallacies',
  u'Technoromanticism (book)',
  u'Unity of the proposition',
  u'Society for the Philosophy of Sex and Love',
  u'Logical quality',
  u'Sandra Bartky',
  u'Belief',
  u'Supertask',
  u'Philosophy of chemistry',
  u'Language, Truth, and Logic',
  u'Predeterminism',
  u'Philistinism',
  u'Mental image',
  u'Anthony Beavers',
  u'Philosophy Now',
  u'Derech Hashem',
  u'Ingo Zechner',
  u'Synoptic philosophy',
  u'Antiperistasis',
  u'Transcendental homelessness',
  u'Trialism',
  u'Joe Rogan Experience',
  u'Panpsychism',
  u'Palingenesis',
  u'Jewish philosophy',
  u'Symposium (Plato)',
  u'Peter Boghossian',
  u'Sidney Hook',
  u'Qualia',
  u'Matheolus Perusinus',
  u'Wisconsin Philosophical Association',
  u'North Texas Philosophical Association',
  u'Roberto Torretti',
  u'Self-interest',
  u'Metaphysical nihilism',
  u'Population ethics',
  u'Graeme Nicholson',
  u'Philosophy of medicine',
  u'Robert Arp',
  u'Johannes Jacobus Poortman',
  u'International Association for the Semiotics of Law',
  u'Transcendental apperception',
  u'Existential phenomenology',
  u'Thick concept',
  u'Harbinger (zine)',
  u'List of philosophical organizations',
  u'Rosi Braidotti',
  u'Tad Schmaltz',
  u'Platonism',
  u'List of books about philosophy',
  u'Nicole C. Karafyllis',
  u'Paul Cobben',
  u'Philosophy encyclopedia',
  u'Truth-apt',
  u'Philosophy of mind',
  u'William W. Tait',
  u'Universal class',
  u'Didier Anzieu',
  u'Leibniz Society of North America',
  u'Cambridge University Moral Sciences Club',
  u'Epistemic possibility',
  u'Supervenience',
  u'Leonid Grinin',
  u'International Association for Greek Philosophy',
  u'Society for Exact Philosophy',
  u'Marty Ball',
  u'Louis Rougier',
  u'Sven Ove Hansson',
  u'Particular',
  u'Irreducibility',
  u'1962 in philosophy',
  u'Wolfgang Preiss',
  u'Classical element',
  u'Action theory (philosophy)',
  u'Emergence',
  u'Notion (philosophy)',
  u'Stephen Davies (philosopher)',
  u'Fictionalism',
  u'Common good',
  u'Michael Neumann',
  u'Transcendence (philosophy)',
  u'Australian realism',
  u'Aztec philosophy',
  u"Ryle's regress",
  u'Theodore Kisiel',
  u'List of rationalists',
  u'Eternal oblivion',
  u'Philosophy of science',
  u'Adolph St\xf6hr',
  u'Metaphilosophy',
  u'Infinite divisibility',
  u'Downward causation',
  u'Process of embodiment (physical theatre)',
  u'Transcendental idealism',
  u'Moral psychology',
  u'1969 in philosophy',
  u'Fontana Modern Masters',
  u'Digital philosophy',
  u'Philosophical Papers',
  u'Activity-based communication analysis',
  u'North Carolina Philosophical Association',
  u'Potentiality and actuality',
  u"Bloch's principle",
  u'Ilkka Niiniluoto',
  u'Knowledge space (philosophy)',
  u'Ramsey sentences',
  u'Frank Meyer (political philosopher)',
  u'Naturphilosophie',
  u'Phenomenological life',
  u'Lon L. Fuller',
  u'Dialogues Concerning Natural Religion',
  u'20th-century French philosophy',
  u'Alief (belief)',
  u'Moritz Brasch',
  u'Genealogy (philosophy)',
  u'Public philosophy',
  u'Equiprobability',
  u'Frithjof Bergmann',
  u'1974 in philosophy',
  u'Aesthetics',
  u'Epistemological pluralism',
  u'Embodied cognition',
  u'Martha Klein',
  u'Paul Russell (philosopher)',
  u'Quality (philosophy)',
  u"Newcomb's paradox",
  u'Dudeism',
  u'Claim rights and liberty rights',
  u'Postmodern philosophy',
  u'Fallibilism',
  u'Rainer Forst',
  u'Dorothy Emmet',
  u'Barbara Forrest',
  u'Philosophy of physics',
  u'Action (philosophy)',
  u'Forum for European Philosophy',
  u"'Pataphysics",
  u'Philosophical consultancy',
  u'Liar paradox',
  u'Rachida Triki',
  u'Zeitschrift f\xfcr Kulturphilosophie',
  u'Philosophical theology',
  u'Nihilism',
  u'Transcendent theosophy',
  u'Kurt Flasch',
  u'Alexander George (philosopher)',
  u'Gila Sher',
  u'Cyberethics',
  u'Environmental philosophy',
  u'Afterlife',
  u'Panlogism',
  u'The Bed of Procrustes',
  u'Nomothetic',
  u'Embodied embedded cognition',
  u'Identity and change',
  u'Gnosology',
  u'Spiritual philosophy',
  u'Modern philosophy',
  u'Bedeutung',
  u'Meontology',
  u'Ontological maximalism',
  u'French philosophy',
  u'Western philosophy',
  u'Quietism (philosophy)',
  u'The Ego and Its Own',
  u'Fred Evans (philosopher)',
  u'1972 in philosophy',
  u'Philosophical logic',
  u'L\xe9ontine Zanta',
  u'Plotinus',
  u'Philosophical methodology',
  u'Ethnophilosophy',
  u'Roderick T. Long',
  u'Ken McMullen (film director)',
  u'Kant and the Problem of Metaphysics',
  u'Days of War, Nights of Love',
  u'Mark Steiner',
  u'Women in philosophy',
  u'Core ontology',
  u'Fran\xe7ois Ch\xe2telet',
  u'Policraticus',
  u'J\xf3zef Emanuel Jankowski',
  u'R. James Long',
  u'Object (philosophy)',
  u'Hylozoism',
  u'Philosophical movement',
  u'Philosophical Explorations',
  u'Occasionalism',
  u'Max More',
  u'Louis Pojman',
  u'Epistemic theories of truth',
  u'Summum',
  u'Adam Ignacy Zabellewicz',
  u'The Philosophical Library',
  u'Psychophysical parallelism',
  u'Henry Augustus Pearson Torrey',
  u'Idea',
  u'International Society for Universal Dialogue',
  u'John Haugeland',
  u'Neoplatonism',
  u'The Consolation of Philosophy',
  u'Proceedings of the American Philosophical Society',
  u'EnlightenNext',
  u'Temporality',
  u'Causal chain',
  u'Jacob Golomb',
  u'Philosophy and Phenomenological Research',
  u'Philosophy of healthcare',
  u'International Association for Environmental Philosophy',
  u'1971 in philosophy',
  u'International Society for the History of Philosophy of Science',
  u'Phenomenology (architecture)',
  u'Foundations of Natural Right',
  u'Wang Keping (academic)',
  u'Henry Horace Williams',
  u'Ethics',
  u'Philisophy',
  u'Aufheben',
  u'Four-dimensionalism',
  u'Pluralism (philosophy)',
  u'Philosophy of dialogue',
  u'James Fieser',
  u'1976 in philosophy',
  u'The System of Nature',
  u'Joseph Dietzgen',
  u'Eastern philosophy',
  u'Ontic',
  u'Sigr\xed\xf0ur \xdeorgeirsd\xf3ttir',
  u'Kh\xf4ra',
  u'Modal fictionalism',
  u'Renate Holub',
  u'Corruption',
  u'Wagnerism',
  u'Ruth Chang',
  u"Zeno's paradoxes",
  u'Pietro Verri',
  u'Denys Turner',
  u'Umberto Pagano',
  u'Thomas J. McKay',
  u'Michael Ferejohn',
  u'1977 in philosophy',
  u'Russian cosmism',
  u'Meta-rights',
  u'Categorical imperative',
  u'Postpositivism',
  u'Evan Thompson',
  u'Six Myths about the Good Life',
  u'Human spirit',
  u'Mechanism (philosophy)',
  u'Problem of evil in Hinduism',
  u'Johann Georg Schwarz',
  u'Gettier problem',
  u'Virginia Philosophical Association',
  u'1970 in philosophy',
  u'Natural philosophy',
  u'Uncertainty',
  u'John Shelton Lawrence',
  u'History of Early Analytic Philosophy Society',
  u'Meditations on First Philosophy',
  u'German Romanticism',
  u'Society for Applied Philosophy',
  u'Hyle',
  u'Value theory',
  u'Florida Philosophical Association',
  u'Vilhj\xe1lmur \xc1rnason',
  u'Johann Georg Ritter von Zimmermann',
  u'Song Du-yul',
  u'British Society of Aesthetics',
  u'Norwood Russell Hanson',
  u'Society for European Philosophy',
  u'1975 in philosophy',
  u'Rational emotive behavior therapy',
  u'Write once, compile anywhere',
  u'Deterministic system (philosophy)',
  u'Metamodernism',
  u'Radomir \u0110or\u0111evi\u0107',
  u'Kantian Review',
  u'Asian Philosophical Association',
  u'Anamnesis (philosophy)',
  u'Proving too much',
  u'Esa Saarinen',
  u'Functionalism (philosophy of mind)',
  u'R. R. Rockingham Gill',
  u'The Discovery of the Future',
  u'American Society for Aesthetics',
  u'Accident (philosophy)',
  u'S\xf8ren Kierkegaard Society',
  u'Studies in Logic, Grammar and Rhetoric',
  u'Philosophy of history',
  u'1980 in philosophy',
  u'1979 in philosophy',
  u'Herman Philipse',
  u'Truth by consensus',
  u'Pim Haselager',
  u'History of philosophy in Poland',
  u'Modern art',
  u'Desert (philosophy)',
  u'Erkenntnis',
  u'Torah Umadda',
  u'Brazilian Society for Analytic Philosophy',
  u'Jerome Leocata',
  u'Kyle Stanford',
  u'Martin Bunzl',
  u'Normative',
  u'HowTheLightGetsIn',
  u'Immediacy (philosophy)',
  u'Freethought',
  u'New realism (philosophy)',
  u'Minnesota Philosophical Society',
  u'Will (philosophy)',
  u'British Society for the Philosophy of Science',
  u'Hans Blumenberg',
  u'Platonic realism',
  u'C. D. C. Reeve',
  u'Dewitt H. Parker',
  u'Anguish',
  u'Extrinsic finality',
  u'Possible world',
  u'Hylomorphism',
  u'Reliabilism',
  u'James Burnham',
  u'Non-wellfounded mereology',
  u'Of Grammatology',
  u'David John Farmer',
  u'Tennessee Philosophical Association',
  u'Arindam Chakrabarti',
  u'Rational reconstruction',
  u'Scott Soames',
  u'Philosophy of space and time',
  u'Society for Phenomenology and Existential Philosophy',
  u'Wolfgang Fritz Haug',
  u'Analytical Thomism',
  u'Ontology',
  u'Qualification problem',
  u'List of years in philosophy',
  u'Historicity (philosophy)',
  u'Frankfurt cases',
  u'Philosophy of technology',
  u'Luca Incurvati',
  u'Emergentism',
  u'Truth and Method',
  u'Rational fideism',
  u'European Society for Analytic Philosophy',
  u'Problem of induction',
  u'German philosophy',
  u'Analytic philosophy',
  u'Incorrigibility',
  u'North American Nietzsche Society',
  u'Eternalism (philosophy of time)',
  u'Philosophy and economics',
  u'Theoreticism',
  u'Accidentalism (philosophy)',
  u'Constructive realism',
  u'Ernst Theodor Echtermeyer',
  u'David Sobel',
  u'Intercultural philosophy',
  u'Constructive empiricism',
  u'Jessica Wilson',
  u'Caribbean Philosophical Association',
  u'Hamid Reza Namazi',
  u'Philosophy of law',
  u'Intelligible form',
  u'Philosophy of life',
  u'Kentucky Philosophical Association',
  u'Glossary of philosophy',
  u'Kant-Studien',
  u'\xc9lisabeth Badinter',
  u'Henology',
  u'Idealism',
  u'Nanzan Institute for Religion and Culture',
  u'Ohio Philosophical Association',
  u'Clandestine cell system',
  u'Lorraine Smith Pangle',
  u'Libertarianism (metaphysics)',
  u'Philosophy of culture',
  u'Moral responsibility',
  u'Judith Andre',
  u"Pascal's mugging",
  u'Musica universalis',
  u'Robert Stern (philosopher)',
  u'Christopher W. Morris',
  u'Christian philosophy',
  u'Rietdijk\u2013Putnam argument',
  u'Lightness (philosophy)',
  u'Means to an end',
  u'Philosophy of color',
  u'Becoming (philosophy)',
  u'Robert Blanch\xe9',
  u'Existential nihilism',
  u'Robert Zimmer (philosopher)',
  u'Emil Lask',
  u'Formative epistemology',
  u'Metaphysics',
  u'South Carolina Society for Philosophy',
  u'Psychical nomadism',
  u'Golden mean (philosophy)',
  u'Metadiscourse',
  u'Logos',
  u'British Society for Ethical Theory',
  u'1649 in philosophy',
  u'Choiceless awareness',
  u'Anomalous monism',
  u'Philosophy of computer science',
  u'1922 in philosophy',
  u'Christopher New',
  u'Speculative realism',
  u'Innate language',
  u'Robert L. Holmes',
  u'James Childress',
  u'Paul Boghossian',
  u'Hylopathism',
  u'Metaphysical Society of America',
  u'Jennifer Lackey',
  u'Munich phenomenology',
  u'Annemarie Gethmann-Siefert',
  u'Knowledge relativity',
  u'Term logic',
  u'Ronald H. Nash',
  u'Object theory',
  u'Philosophy and the Mirror of Nature',
  u'The Oxford Companion to Philosophy',
  u'Harold F. Cherniss',
  u'Hassan Hasanzadeh Amoli',
  u'Richard Baron (philosopher)',
  u'Personal identity',
  u'Anat Biletzki',
  u'Philosophia Mathematica',
  u'Discourse on the Method',
  u'List of metaphysicians',
  u'Composition of Causes',
  u'Philosophy: The Quest for Truth',
  u'Feminist philosophy',
  u'Semantic unification',
  u'Mich\xe8le Le D\u0153uff',
  u'Penelope Deutscher',
  u'Paul Crowther',
  u'1978 in philosophy',
  u'Antiphilosophy',
  u'Tara Smith (philosopher)',
  u'John Foster (philosopher)',
  u'Thierry of Chartres',
  u'J\xe9r\xf4me Demers',
  u'Karl Jaspers Society of North America',
  u'Accidental necessity',
  u'Nelson Thomas Potter, Jr.',
  u'Epicureanism',
  u'George Pappas',
  u'Abstracta',
  u'Dysteleology',
  u'Psychologism',
  u'Cogito ergo sum',
  u'Biofact (philosophy)',
  u'Philosophy of religion',
  u'Internalism and externalism',
  u'Progressivism',
  u'Australasian Association for Logic',
  u'Mereology',
  u'Nomological',
  u'Indigenous American philosophy',
  u'Humanism',
  u'Kai Wehmeier',
  u'Michael Detlefsen',
  u'Swampman',
  u'Ferdinand Christoph Oetinger',
  u'Sam Gillespie',
  u"Philosopher's Information Center",
  u'Katharina Hacker',
  u'Marxist philosophy',
  u'Experientialism',
  u'Equipossibility',
  u'Organic architecture',
  u'Natural order (philosophy)',
  u'Infinity (philosophy)',
  u'Conventionalism',
  u'Peter D. Klein',
  u'Kurt Riezler',
  u'Passions (philosophy)',
  u'Private language argument',
  u'Communitarianism',
  u'L.A. Paul',
  u'Identity (philosophy)',
  u'Reasons and Persons',
  u'Antoine de Vinck',
  u'State of affairs (philosophy)',
  u'Self-Constitution',
  u'Acatalepsy',
  u'Leonard M. Fleck',
  u'Spiritualism (philosophy)',
  u'Objective precision',
  u'Intervention philosophy',
  u'Well-founded phenomenon',
  u'Stanis\u0142aw Musia\u0142',
  u'Facticity',
  u'Jesse Prinz',
  u'Humanistic Buddhism',
  u'Practical reason',
  u'Philosopher',
  u'Mississippi Philosophical Association',
  u'David Farrell Krell',
  u'Karel Lambert',
  u'Werner Leinfellner',
  u'Ressentiment',
  u'The Fragility of Goodness',
  u'Vladimir Jank\xe9l\xe9vitch',
  u'1973 in philosophy',
  u'Theses on Feuerbach',
  u'Ramification problem',
  u'Love and Pain',
  u'Active intellect',
  u'Neo-Kantianism',
  u'Analytic\u2013synthetic distinction',
  u'Mind Association',
  u'Euphraeus',
  u'Technics and Time, 1',
  u'Intelligibility (philosophy)',
  u'Fraternity (philosophy)',
  u'Razor (philosophy)',
  u'Nomological determinism',
  u'Krystyn Lach Szyrma',
  u'Silloi',
  u'Preston Covey',
  u'Gary Gutting',
  u'Philosophy of sex',
  u'Passive intellect',
  u'Medieval philosophy',
  u'Andy Clark',
  u'Aesthetic Theory',
  u'Philosophy of sport',
  u'Multitudes',
  u'Sophism',
  u'Bad faith (existentialism)',
  u'Human Affairs',
  u'Dialectic of Enlightenment',
  u'Brain in a vat',
  u'Involution (philosophy)',
  u'Introducing... and ...For Beginners book series',
  u'Odo Marquard',
  u'Eudemian Ethics',
  u'Index of ethics articles']]

indegree distribution

In [37]:
dldf.describe()
Out[37]:
indegree
count 3104793.000000
mean 3.632298
std 89.540298
min 1.000000
25% 1.000000
50% 1.000000
75% 3.000000
max 80249.000000

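Roughly 7.9 million articles (~80% of all articles) have an in-degree of 0. They are never the first link of any article, so they are absent from direct_links and from the table above, which is why the minimum in-degree is 1.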
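
One way such a zero in-degree count could be obtained is to compare the set of all articles with the set of first-link targets. The sketch below does this on a made-up mapping; the `first_link` dict and its article names are assumptions for illustration, not the notebook's data.

```python
# Hypothetical toy first-link mapping (article -> its first link).
first_link = {"A": "C", "B": "C", "C": "D", "D": "C", "E": "A"}

# Articles that are the first link of at least one article (in-degree >= 1).
targets = set(first_link.values())

# Every article seen on either side of the mapping.
all_articles = set(first_link) | targets

# Articles that no other article points to through its first link.
zero_indegree = all_articles - targets

# Share of articles with in-degree 0 (float() keeps this correct on Python 2).
share_zero = len(zero_indegree) / float(len(all_articles))
```

In this toy network "B" and "E" have in-degree 0, so `share_zero` is 0.4.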

How many articles with > 100 indegree?

In [13]:
dldf[dldf['indegree'] > 100]['indegree'].count()
Out[13]:
4826

What's the in-degree distribution of the top quartile (indegree > 3, the 75th percentile)?

In [14]:
dldf[dldf['indegree'] > 3].describe()
Out[14]:
indegree
count 568638.000000
mean 13.289071
std 208.948619
min 4.000000
25% 4.000000
50% 6.000000
75% 9.000000
max 80249.000000
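
Because in-degree is an integer with many ties, filtering strictly above the 75th percentile keeps fewer than 25% of the rows: in the output above, 568,638 of 3,104,793 articles (about 18%) have an in-degree greater than 3. The sketch below reproduces the effect on a made-up sample. It uses a simple nearest-rank percentile, which is an assumption made for brevity; pandas interpolates between values by default.

```python
import math

# Hypothetical toy in-degree sample, heavy on ties like the real data.
indegree = [1] * 8 + [2, 2] + [3] * 4 + [4, 9]

values = sorted(indegree)
n = len(values)

# Nearest-rank 75th percentile: the value at position ceil(0.75 * n), 1-based.
q75 = values[int(math.ceil(0.75 * n)) - 1]

# Keep only values strictly above the percentile, as the notebook's filter does.
top = [k for k in values if k > q75]
share_top = len(top) / float(n)
```

With this sample the 75th percentile is 3, but only 2 of 16 values (12.5%) lie strictly above it.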
In [16]:
sns.boxplot(x='indegree', data=dldf[dldf['indegree'] > 3])
plt.title("in-degree distribution of top quartile")
Out[16]:
<matplotlib.text.Text at 0x121511d90>

in-degree on log-log scale

In [5]:
dldf_sorted['rank'] = np.arange(1, dldf_sorted.shape[0]+1)
dldf_sorted['log(rank)'] = np.log10(dldf_sorted['rank'])
dldf_sorted['log(indegree)']=  np.log10(dldf_sorted['indegree'])

highest ranking sample

In [5]:
plt.scatter(dldf_sorted['log(rank)'][:100], dldf_sorted['log(indegree)'][:100], color="#F08080", 
            label=r'$\alpha$ = -0.788'+"\n$\gamma$ = -0.266"+"\nPearson\'s r = -0.98 ")
plt.xlabel("$\log_{10}$(rank)", fontsize=14)
plt.ylabel("$\log_{10}$(indegree)", fontsize=14)

#make axis font size larger
plt.tick_params(axis='both', which='major', labelsize=14)
plt.legend(fontsize=14)

#overlay the regression line fitted on the full dataset (slope and intercept from linregress)
plt.plot(range(0, 3), [x*-0.7882627 + 5.034976 for x in range(0, 3)])
plt.plot()
Out[5]:
[]

Note: the line above is the least-squares fit to the entire dataset (slope -0.788, intercept 5.035), not a fit to the 100 highest-ranking articles shown.

  • For power-law fitting, see "power_law_indegree.ipynb"

full dataset

In [ ]:
plt.scatter(dldf_sorted['log(rank)'], dldf_sorted['log(indegree)'], color="#F08080",
                        label=r'$\alpha$ = -0.788'+"\n$\gamma$ = -0.266"+"\nPearson\'s r = -0.98 ")
plt.xlabel("$\log_{10}$(rank)", fontsize=14)
plt.ylabel("$\log_{10}$(indegree)", fontsize=14)
plt.tick_params(axis='both', which='major', labelsize=14) #axis font size
plt.legend(fontsize=14)

plt.plot(range(0, 8), [x*-0.7882627 + 5.034976 for x in range(0, 8)])

#define plot axis limits
axes = plt.gca()
xticks = axes.xaxis.get_major_ticks()
xticks[0].label1.set_visible(False)
yticks = axes.yaxis.get_major_ticks()
yticks[0].label1.set_visible(False)


#save figure
plt.savefig(path+'ndegree_loglog.png', format='png', dpi=300, bbox_inches='tight')
In [7]:
slope, intercept, r_value, p_value, std_err = stats.linregress(dldf_sorted["log(rank)"], dldf_sorted["log(indegree)"])
print slope, intercept, r_value, p_value, std_err 
-0.788262734511 5.03497683312 -0.978552362215 0.0 9.41747005457e-05
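
The printed slope and intercept describe a straight line in log-log space, log10(indegree) = slope * log10(rank) + intercept, which on the original scale reads indegree ≈ 10^intercept * rank^slope. With the values above that is roughly 1.08 × 10^5 * rank^(-0.788). The sketch below shows the same kind of fit in plain Python on a synthetic sample with a known exponent. The helper `fit_loglog` and the sample values are assumptions for illustration; the slope and intercept it computes are the ordinary least-squares estimates, the same quantities `stats.linregress` reports.

```python
import math

def fit_loglog(indegrees):
    """Least-squares line through (log10 rank, log10 indegree); rank 1 is the largest value."""
    ordered = sorted(indegrees, reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log10(k) for k in ordered]
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic sample following indegree = 1000 * rank**-0.8 exactly (made-up values).
sample = [1000.0 * rank ** -0.8 for rank in range(1, 201)]
slope, intercept = fit_loglog(sample)

# Back on the original scale: value the fitted line predicts at rank 1.
predicted_top = 10 ** intercept
```

On this noise-free sample the fit recovers a slope of about -0.8 and an intercept of about 3, so `predicted_top` is about 1000. Real in-degree data is integer-valued with long runs of ties at the low end, so the fit there is only approximate.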

Data Loading and Generation

In [ ]:
# Run once to generate the in-degree data

#load first link network (runtime < 30 min)
results_path = "/Users/mark/Desktop/wiki_v4/"
with open(results_path + "fln.json") as f:
    fln = json.load(f)
fldf = pd.DataFrame(fln.items())
fldf.columns = ['article', 'first link']

#invert the first-link mapping: target article -> articles whose first link is that target
direct_links = defaultdict(list)

for article, first_link in fln.iteritems():
    direct_links[first_link].append(article)
    
with open(results_path + 'direct_links.json', 'w') as fp:
    json.dump(direct_links, fp)

dldf = pd.DataFrame(direct_links.items())
dldf.columns = ['article', 'direct links']

#add in-degree column (named 'indegree' to match the analysis cells above)
dldf['indegree'] = dldf['direct links'].map(len)