Data Sets | Jason Hao's Blog

Data Sets

Here we collect useful datasets for different tasks.

General Datasets

  1. Datasets across various domains: awesome data

  2. Google Datasets indexes 2.5 million datasets from a wide range of domains, searchable by keyword.

  3. Hugging Face Datasets (alternate GitHub link) includes many datasets for NLP tasks.

  4. Kaggle Datasets is a well-known machine learning dataset collection.

  5. Papers with Code Datasets contains 4075 machine learning datasets. It connects papers with their code and datasets.

  6. Reddit Datasets is also a well-known dataset collection, and it supports discussion of each dataset.

  7. CLUE Datasets is a large collection of Chinese NLP datasets.

  8. Some other dataset collections:
    • https://www.datasetlist.com/
    • https://github.com/awesomedata/awesome-public-datasets
    • https://tinyletter.com/data-is-plural
    • https://jupyter-tutorial.readthedocs.io/en/latest/data/index.html
    • https://www.openml.org/search?type=data
    • https://github.com/InsaneLife/ChineseNLPCorpus
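
A list like the one above can also be kept as a small local index and filtered by keyword. The following is only a minimal sketch under stated assumptions: the `Catalog` class, the `CATALOGS` list, and the `search` function are hypothetical names introduced for illustration, the descriptions are shortened from the entries above, and nothing here calls any of the sites' real APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Catalog:
    """One dataset collection from the list above (hypothetical helper type)."""
    name: str
    description: str


# Entries taken from the list above; descriptions are shortened paraphrases.
CATALOGS = [
    Catalog("Google Datasets", "2.5 million datasets from many domains, searchable by keyword"),
    Catalog("Hugging Face Datasets", "many datasets for NLP tasks"),
    Catalog("Kaggle Datasets", "well-known machine learning dataset collection"),
    Catalog("Papers with Code Datasets", "4075 machine learning datasets linked to papers and code"),
    Catalog("Reddit Datasets", "dataset collection with discussion of each dataset"),
    Catalog("CLUE Datasets", "large collection of Chinese NLP datasets"),
]


def search(keyword, catalogs=CATALOGS):
    """Return the names of catalogs whose name or description contains the keyword.

    Matching is a plain case-insensitive substring test, not a ranked search.
    """
    needle = keyword.lower()
    return [
        c.name
        for c in catalogs
        if needle in c.name.lower() or needle in c.description.lower()
    ]


if __name__ == "__main__":
    for name in search("nlp"):
        print(name)
```

Substring matching keeps the sketch short; a larger index would likely want tokenization or tags per entry instead.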

NLP

    • nlp-datasets: a very good collection of natural language datasets
    • The Big Bad NLP Database
    • CLUEDatasetSearch

Ontology Learning - Concept Formation

References