Last Updated: 10/17/2024
I plan to keep adding to this list over time.
Language Modeling Data Sets:
Link – Salesforce – The WikiText Long Term Dependency Language Modeling Dataset
Link – Paperswithcode – Language Modeling
Link – Paperswithcode – Penn Treebank Dataset
Link – Kili – Open-Sourced Training Datasets for Large Language Models (LLMs) [list of datasets]