- 11K Latin Books. 11,261 OCR’d Latin texts from the Internet Archive (1.38B words), along with associated metadata detailing the dates of composition.
- CMU Book Summary Dataset. 16,559 book plot summaries + metadata.
- CMU Movie Summary Dataset. 42,306 movie plot summaries + metadata.
- Twitter14K Dataset. Aggregated word counts from 14,464 Twitter users (9.2M tweets).