I had to use fic posted to AO3, since it's difficult, if not impossible, to automatically scrape information about fic posted to LJ or scattered across many other websites. I did scrape LiveJournal once, to look at fic written for the acd_holmesfest fic exchange, but it took ages, whereas scraping AO3 is easy.
So I wrote a script to automatically extract the metadata for any subset of fic on AO3, and started by looking at the number of words per fic.
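A minimal sketch of the word-count extraction step (not the actual script). It assumes each work listing carries its word count in a `<dd class="words">` element, which is an assumption about AO3's markup and may change. It parses an inline, made-up HTML snippet rather than fetching anything, and the class name `WordCountParser` is hypothetical.

```python
from html.parser import HTMLParser


class WordCountParser(HTMLParser):
    """Collect integers found inside <dd class="words"> elements (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._in_words = False
        self.word_counts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "dd" and "words" in classes:
            self._in_words = True

    def handle_endtag(self, tag):
        if tag == "dd":
            self._in_words = False

    def handle_data(self, data):
        # Word counts are shown with thousands separators, e.g. "1,234"
        text = data.strip().replace(",", "")
        if self._in_words and text.isdigit():
            self.word_counts.append(int(text))


# Made-up snippet standing in for a downloaded listing page
sample_html = """
<dl class="stats"><dt class="words">Words:</dt><dd class="words">1,234</dd></dl>
<dl class="stats"><dt class="words">Words:</dt><dd class="words">221</dd></dl>
"""

parser = WordCountParser()
parser.feed(sample_html)
print(parser.word_counts)
```

Using only the standard library keeps the sketch self-contained. A real scraper would also need to page through listings and pause between requests.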
Data and images under the cut:
For "Entire Holmes Fandom", I used around 10,000 fics tagged: "Sherlock Holmes - Arthur Conan Doyle" (2,550 fics), "Sherlock Holmes (Downey films)" (1,350 fics), "Elementary (TV)" (1,700 fics) or "Sherlock (TV)" (4,300 fics).
For "Holmestice" and "ACD Holmesfest", it's fic tagged for those collections (318 and 104 fics respectively).
- For the fic exchanges, the "fics" under 1000 words all seem to be art/podfic. I didn't attempt to automatically exclude them.
- Since there are 87,000 Sherlock fics on AO3, and it would have taken 36 hours to scrape them all, I only sampled 4,300 randomly chosen ones.
- I excluded everything tagged for more than one verse, because I wanted to see if there were some interesting differences in fic length between the four verses.
- Obviously, this only includes fic posted to AO3, e.g. for ACD Holmesfest only 104 out of around 170 works that were created for that exchange.
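The sampling and exclusion steps in the notes above could look something like this. It is a sketch only: the record layout (`fandoms`, `words`), the helper names and the example works are all made up for illustration. The timing arithmetic uses only the figures quoted above (87,000 fics in about 36 hours), which works out to roughly 1.5 seconds per fic, or under two hours for a 4,300-fic sample.

```python
import random

# The four verse tags named in the post
VERSES = {
    "Sherlock Holmes - Arthur Conan Doyle",
    "Sherlock Holmes (Downey films)",
    "Elementary (TV)",
    "Sherlock (TV)",
}


def single_verse(work):
    """Return the one verse a work is tagged for, or None if it has zero or several."""
    hits = VERSES.intersection(work["fandoms"])
    return next(iter(hits)) if len(hits) == 1 else None


def sample_works(works, k, seed=0):
    """Drop works that are not tagged for exactly one verse, then sample up to k at random."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = [w for w in works if single_verse(w) is not None]
    return rng.sample(kept, min(k, len(kept)))


# Made-up records, for illustration only
works = [
    {"id": 1, "fandoms": ["Sherlock (TV)"], "words": 221},
    {"id": 2, "fandoms": ["Sherlock (TV)", "Elementary (TV)"], "words": 5400},
    {"id": 3, "fandoms": ["Elementary (TV)"], "words": 1800},
    {"id": 4, "fandoms": ["Doctor Who"], "words": 900},
]
picked = sample_works(works, k=2)
print(sorted(w["id"] for w in picked))

# Scraping time implied by the figures in the post
seconds_per_fic = 36 * 3600 / 87_000
sample_hours = 4_300 * seconds_per_fic / 3600
print(f"{seconds_per_fic:.2f} s per fic, {sample_hours:.1f} h for 4,300 fics")
```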
How to read these figures:
The horizontal axis is the number of words per fic (e.g. 0-1000 words, 1000-2000 words etc.) and the vertical axis is how often fic with that number of words occurs.
The log-scale plots let you see some details better by "zooming in" on the lower part of the vertical axis.
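To make the binning and the log scale concrete, here is a small sketch. It assumes 1000-word bins that include their lower edge (so a 1000-word fic lands in the 1000-2000 bin), and the word counts are made up for illustration.

```python
import math
from collections import Counter


def histogram(word_counts, bin_width=1000):
    """Count fics per bin; key 0 covers 0-999 words, key 1000 covers 1000-1999, and so on."""
    bins = Counter((wc // bin_width) * bin_width for wc in word_counts)
    return dict(sorted(bins.items()))


# Made-up word counts, for illustration only
word_counts = [100, 221, 950, 1200, 1800, 3500, 4200, 4800, 12000, 150000]
hist = histogram(word_counts)

for start, n in hist.items():
    # log10 compresses large counts, so rare long fics stay visible next to common short ones
    print(f"{start:>6}+ words: count={n}, log10(count)={math.log10(n):.2f}")
```

Empty bins are simply absent from the result, which suits a log axis, since the logarithm of zero is undefined.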
This is Holmes fandom in general vs fic written for the two exchanges:
This figure separates out the two exchanges:
So for exchanges, people mostly write fic around 1000-5000 words, and rarely go above 15,000 words or so. No surprises there, given that these fic are all written to a deadline.
I was surprised by the data for fandom in general, though. The fics I write myself fit the distribution for fic exchanges pretty well. I hadn't realised the distribution for fandom in general was so different. So many short fics, and some incredibly long fics!
Finally, here's something cool in the raw data. Each point on these plots is one individual fic, and the vertical axis is truncated so we only see fics with fewer than 700 words. As well as the 100-word drabbles and the 60-word Sherlock60 fics, you can clearly see all the 221Bs people are writing in ACD and Sherlock verses :)
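Those horizontal stripes can also be picked out without a plot, by counting how many short fics share an exact word count. This is a sketch with made-up data; the function name `spikes` and the 10% threshold are arbitrary choices for illustration.

```python
from collections import Counter


def spikes(word_counts, max_words=700, min_share=0.1):
    """Return exact word counts below max_words that at least min_share of short fics share."""
    short = [wc for wc in word_counts if wc < max_words]
    counts = Counter(short)
    threshold = min_share * len(short)
    return sorted(wc for wc, n in counts.items() if n >= threshold)


# Made-up data: clusters at 60, 100 and 221 words plus some scattered lengths
word_counts = [60] * 5 + [100] * 8 + [221] * 6 + [57, 143, 305, 412, 530, 688, 950, 4000]
print(spikes(word_counts))
```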