Feeling peckish? Download all 22 Flipbooks at once:
$\mbox{Flipbook}~\flipbooktwitter$—Word use on Twitter:
US Presidential Election (2016-11-09)
versus the Charlottesville Unite the Right Rally (2017-08-13);
Variation of $\alpha$.
$\mbox{Flipbook}~\flipbooktwitterRT$—Word use on Twitter:
US Presidential Election (2016-11-09)
versus the Charlottesville Unite the Right Rally (2017-08-13);
Variation of inclusion of retweets from 1% to 100%;
$\alpha = 1/3$.
$\mbox{Flipbook}~\flipbooktwittertimediff$—Word use on Twitter:
Comparison of 2019/01/04 with later dates,
with the gap growing roughly logarithmically in number of days up to a year ahead,
ending on 2020/01/03, the day of the assassination of Qasem Soleimani;
$\alpha = 1/3$.
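
The roughly logarithmic spacing of comparison dates described above can be sketched as follows. This is a minimal illustration, not the procedure used to build the Flipbook: the helper `log_spaced_offsets` and the choice of `n_points=12` are hypothetical, and only the two endpoint dates come from the text.

```python
from datetime import date, timedelta

def log_spaced_offsets(max_days, n_points):
    """Return sorted, unique integer day offsets from 1 to max_days,
    spaced roughly evenly on a logarithmic scale."""
    if n_points < 2:
        return [max_days]
    # Geometric progression from 1 to max_days, rounded to whole days.
    ratio = max_days ** (1.0 / (n_points - 1))
    offsets = {min(max_days, max(1, round(ratio ** k))) for k in range(n_points)}
    offsets.add(max_days)  # guard against rounding at the endpoint
    return sorted(offsets)

start = date(2019, 1, 4)          # reference date from the text
end = date(2020, 1, 3)            # final comparison date from the text
max_days = (end - start).days     # gap between the two endpoint dates

# n_points is a hypothetical choice; the Flipbook's page count is not assumed.
offsets = log_spaced_offsets(max_days, n_points=12)
comparison_dates = [start + timedelta(days=d) for d in offsets]
```

Rounding to whole days can merge neighboring offsets at the short end, which is why the offsets are collected in a set before sorting.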
$\mbox{Flipbook}~\flipbooktrees$—Tree species abundance on Barro Colorado Island:
Fig. 3
with variation of $\alpha$.
The Flipbook shows how increasing $\alpha$ from 0 leads to an increasingly poor fit on
the rank-rank histogram.
$\mbox{Flipbook}~\flipbookgirlsyears$—Baby girl names over time:
Described in Sec. III D,
comparisons of baby girl name distributions 50 years apart
starting in 1880 and going forward in 5 year increments,
with $\alpha = 1/3$.
Ends with Fig. 4.
$\mbox{Flipbook}~\flipbookboysyears$—Baby girl names, 1968 vs. 2018:
Described in Sec. III D,
shows effect of varying $\alpha$,
with Fig. 4
as the fifth page.
$\mbox{Flipbook}~\flipbookgirlsalphas$—Baby boy names over time:
Described in Sec. III D,
comparisons of baby boy name distributions 50 years apart
starting in 1880 and going forward in 5 year increments,
with $\alpha = 1/3$.
Ends with Fig. 5.
$\mbox{Flipbook}~\flipbookboysalphas$—Baby boy names, 1968 vs. 2018:
Described in Sec. III D,
shows effect of varying $\alpha$,
with Fig. 5
as the fifth page.
$\mbox{Flipbook}~\flipbookmarketcapsyears$—Market caps:
Comparison of market caps for publicly traded companies
in the fourth quarter six years apart,
starting with 1995 versus 2001 and ending with 2012 versus 2018,
and with $\alpha$ fixed at 1/3.
$\mbox{Flipbook}~\flipbooktwittertrunc$—Word use on Twitter, truncated:
Full series of allotaxonographs
corresponding to histograms of
row 1 in Fig. 7 with $\alpha=1/3$.
$\mbox{Flipbook}~\flipbooktreestrunc$—Tree species abundance, truncated:
Full series of allotaxonographs
corresponding to histograms of
row 2 in Fig. 7 with $\alpha=0$.
$\mbox{Flipbook}~\flipbookgirlnamestrunc$—Baby girl names, truncated:
Full series of allotaxonographs
corresponding to histograms of
row 3 in Fig. 7 with $\alpha=\infty$.
$\mbox{Flipbook}~\flipbookboynamestrunc$—Baby boy names, truncated:
Full series of allotaxonographs
corresponding to histograms of
row 4 in Fig. 7 with $\alpha=\infty$.
$\mbox{Flipbook}~\flipbookcompaniestrunc$—Market caps, truncated:
Full series of allotaxonographs
corresponding to histograms of
row 5 in Fig. 7 with $\alpha=1/3$.
$\mbox{Flipbook}~\flipbooknba$—Season total points scored by players in the National Basketball Association:
Season-to-season comparison of total player points per season, $\alpha = 1/3$.
The Flipbook starts with 1996–1997 versus 1997–1998
and ends in
2017–2018 versus 2018–2019.
Rookies, retirements, and injuries are all in evidence.
For $\alpha=1/3$, Carmelo Anthony in 2003–2004 has the strongest debut, just ahead
of LeBron James in the same year.
Overall, Dwyane Wade's 2008–2009 season produced
the highest $\rtdelement{1/3}$, moving from
$\zipfrank = 51$ to 1 relative to the previous year, when
his playing time was limited by injuries.
In 2008–2009, Wade's points per game of 30.2 would be the highest of his career,
whereas in the injury-hit previous season his team, the Miami Heat, had foundered with the worst record in the NBA.
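
To make the size of a rank change such as 51 to 1 concrete, the sketch below evaluates a per-element contribution under the assumption that it takes the usual rank-turbulence form, $\frac{\alpha+1}{\alpha}\left|r_1^{-\alpha} - r_2^{-\alpha}\right|^{1/(\alpha+1)}$, with the overall normalization omitted. That form is not stated in this section, and the function name `rtd_element` and the comparison move from rank 51 to 40 are hypothetical.

```python
def rtd_element(rank_1, rank_2, alpha):
    """Unnormalized per-element contribution for finite alpha > 0.

    Assumed form: (alpha + 1) / alpha * |r1^-alpha - r2^-alpha|^(1 / (alpha + 1)).
    The system-wide normalization constant is deliberately left out.
    """
    if alpha <= 0:
        raise ValueError("this sketch only handles finite alpha > 0")
    inverse_rank_gap = abs(rank_1 ** -alpha - rank_2 ** -alpha)
    return (alpha + 1) / alpha * inverse_rank_gap ** (1 / (alpha + 1))

alpha = 1 / 3

# Rank move quoted in the text: 51 in the earlier season to 1 in the later one.
big_jump = rtd_element(51, 1, alpha)

# Hypothetical smaller move, for scale only (not taken from the data).
small_move = rtd_element(51, 40, alpha)

print(f"51 -> 1:  {big_jump:.3f}")
print(f"51 -> 40: {small_move:.3f}")
```

Because the expression depends on inverse ranks, a move that ends at rank 1 contributes far more than a move of similar size further down the list.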
$\mbox{Flipbook}~\flipbookgoogleonegrams$—Google Books, Fiction in 1948 versus 1987, 1-grams:
The first of three Flipbooks exploring $n$-gram usage in books
by varying $\alpha$.
We have
elsewhere
documented the deeply problematic influence of
scientific literature and individual books,
rendering the Google Books $n$-grams project unreliable, as is.
Nevertheless, the Version 2 $n$-grams dataset for English fiction
is
worth exploring
with different instruments,
and we are endeavoring separately to provide corrective measures.
For 1948, we see characters and place names dominate, and these
come from a few books (e.g., Upton Sinclair's 'Lanny Budd', 'Raintree County').
The 1987 side shows words that are not tied to specific books
but rather cultural and temporal phenomena, as well as cruder language:
'KGB', 'CIA', 'Vietnam',
'lesbian', 'television', 'computer', and 'fucking'.
Tuning $\alpha$ towards $\infty$, we can see pronouns changing slightly in rank
with 'her' and 'she' rising and 'he' and 'his' dropping.
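
Why large $\alpha$ brings out small rank changes among very common words can be seen from the two limits of the per-element contribution. The sketch below assumes the usual rank-turbulence form (unnormalized), whose limits are $|\ln(r_1/r_2)|$ as $\alpha \rightarrow 0$ and $\max(1/r_1, 1/r_2)$ for $r_1 \neq r_2$ as $\alpha \rightarrow \infty$. The function name and all ranks used here are hypothetical and chosen only for illustration.

```python
import math

def rtd_element(rank_1, rank_2, alpha):
    """Unnormalized per-element contribution, including both limits of alpha."""
    if rank_1 == rank_2:
        return 0.0
    if alpha == 0:
        # alpha -> 0 limit: only the ratio of ranks matters.
        return abs(math.log(rank_1 / rank_2))
    if math.isinf(alpha):
        # alpha -> infinity limit: only the better (smaller) rank matters.
        return max(1 / rank_1, 1 / rank_2)
    gap = abs(rank_1 ** -alpha - rank_2 ** -alpha)
    return (alpha + 1) / alpha * gap ** (1 / (alpha + 1))

# Hypothetical ranks: a very common word shifting slightly,
# and a rare word shifting by a factor of ten.
common_word = (12, 9)
rare_word = (20000, 2000)

for alpha in (0, 1 / 3, math.inf):
    common = rtd_element(*common_word, alpha)
    rare = rtd_element(*rare_word, alpha)
    print(f"alpha={alpha}: common={common:.4f}, rare={rare:.4f}")
```

At $\alpha = 0$ the rare word's tenfold change dominates, while at $\alpha = \infty$ the common word's small shift dominates, consistent with pronouns surfacing as $\alpha$ is tuned upward.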
$\mbox{Flipbook}~\flipbookgooglebigrams$—Google Books, Fiction in 1948 versus 1987, 2-grams:
For 2-grams, we again see character names dominate 1948 for low $\alpha$
('Sung Chiang', 'the Perfessor'), while 'the CIA' and 'the KGB' stand out for 1987.
Increasing $\alpha$ brings in the same words as for 1-grams preceded by 'the'
('the phone', 'the computer').
As $\alpha \rightarrow \infty$, bigrams with 'not' as part appear more strongly for 1987.
$\mbox{Flipbook}~\flipbookgoogletrigrams$—Google Books, Fiction in 1948 versus 1987, 3-grams:
For 3-grams, while we still see characters and place names for 1948, we now
have what we call 'pathological hapax legomena', words (or trigrams in this case) that occur
once in many books. The 3-grams are all from standardized, legal-speak front matter
coming from outside of the story: 'change without notice', 'your local bookstore', and 'Cover art by'.
A second kind of dominant trigram appears to come from
a book's title printed in the header or footer of every page.
As we increase $\alpha$, we again see 'not' appearing in contributing 1987 trigrams.
Because of the combinatorial explosion around words like 'computer' and 'phone',
we no longer see them in the trigram lists.
One upshot of this brief inspection of Google Books is to highlight the value of
separately examining $n$-grams.
We also note that the 3-gram example is our largest system-system comparison with
system sizes on the order of $10^9$.
$\mbox{Flipbook}~\flipbookharrypotter$—Harry Potter books, all 1-grams:
Comparison of each Harry Potter book relative to all other books in the series
combined, using $\alpha = 1/2$
(the single book is the right-hand system, the merged set of 6 books the left system).
Character names and major objects and places dominate,
and the first book is most different from the others combined.
$\mbox{Flipbook}~\flipbookharrypotternocaps$—Harry Potter books, uncapitalized 1-grams:
The same comparison as the previous Flipbook but now with
all capitalized words excluded, as an example attempt to
use a different lens on our allotaxonometer.
Hagrid's speech in part separates Book 1 ('yer', 'ter');
Book 3 has 'rat', 'dementor', and a relative abundance of em dashes ('—');
and Book 7 has 'sword', 'wand', and 'goblin'.
The dominant elements are things, places, and repeated
actions (e.g., spells) and descriptors.
To examine changes in functional word usage, which may reveal
changes in Rowling's writing, we would increase $\alpha$
as we did for Google Books.
Again, we see the relative ease of taking subsets with ranks
for allotaxonometry.
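
The ease of taking subsets with ranks can be illustrated with a short sketch: drop capitalized tokens, recount, and re-rank what remains. The helper names, the toy sentence, and the convention that tied counts share their mean rank are assumptions made for illustration, not details taken from this section.

```python
from collections import Counter

def drop_capitalized(tokens):
    """Keep only tokens whose first character is not uppercase."""
    return [t for t in tokens if not t[:1].isupper()]

def fractional_ranks(counts):
    """Map each type to its rank (1 = most frequent); ties share the mean rank."""
    ordered = sorted(counts.items(), key=lambda kv: -kv[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j past every type with the same count as position i.
        while j < len(ordered) and ordered[j][1] == ordered[i][1]:
            j += 1
        mean_rank = (i + 1 + j) / 2  # mean of ranks i+1, ..., j
        for word, _ in ordered[i:j]:
            ranks[word] = mean_rank
        i = j
    return ranks

# Toy text invented for this sketch; it is not a quotation from the books.
tokens = "Harry raised his wand and Hagrid said yer wand is the wand".split()

counts = Counter(drop_capitalized(tokens))
ranks = fractional_ranks(counts)
print(ranks)
```

Since ranks are recomputed on the filtered counts, the subset forms a self-contained system that can be compared in the same way as the full one.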
$\mbox{Flipbook}~\flipbookdeathcauses$—Causes of Death in Hong Kong:
Five year gap comparison of causes of death reported per year in Hong Kong,
starting with 2001 versus 2006 and moving through
to 2012 versus 2017.
Overall, pneumonia is the leading cause of death.
In the second half of the time frame,
'kidney disease' and 'dementia'
stand out as becoming more prevalent.
Deaths listed as due to heroin drop off markedly in 2012 and 2013
relative to 5 years before.
We note that changes in diagnoses, practices, and
categorization are all confounding issues.
$\mbox{Flipbook}~\flipbookjobnames$—Job titles:
US job titles based on text analysis of online postings,
2007 compared with 2018;
variation across three kinds of job categorization,
from coarse- to fine-grained groupings,
with suitable variation of $\alpha$
($\alpha=0$, $\alpha=1/12$, and $\alpha=1/3$).