Here, we collect useful datasets for different tasks.

Category: Dataset

Post author: Jason Hao
Post link: https://jason-huanghao.github.io/2021/07/27/Research%20Method/Data-Sets/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/) unless stated otherwise.

Datasets across domains (awesome data)
Google Datasets indexes 2.5 million datasets from a wide range of domains and can be searched by keyword.

Papers with Code Datasets contains 4075 machine learning datasets. It connects papers with their code and datasets.

Reddit Datasets is also a well-known dataset community, which supports discussion of each dataset.

NLP

SimLex-999 provides a gold standard for measuring the similarity of word pairs. It emphasizes similarity rather than relatedness between words; relatedness is the focus of WordSim-353. (paper)
VC-SLAM (Versatile Corpus for Semantic Labeling And Modeling) contains 101 data sets from different open data portals, together with a target ontology and an additional ontology for mappings to the PLASMA platform. (vc-slam)
Wikidata database download: https://www.wikidata.org/wiki/Wikidata:Database_download
Chinese Wikipedia dumps: https://dumps.wikimedia.org/zhwiki/
YAGO: extraction and construction code
Microsoft Concept Graph
https://lod-cloud.net/ | https://lod-cloud.net/clouds/geography-lod.svg
Ordnance Survey Linked Data Platform
CN-DBpedia (Fudan University)
XLORE (Tsinghua University)
KBpedia
GCMD (Global Change Master Directory), downloadable in RDF, OWL, CSV, and JSON formats: https://gcmdservices.gsfc.nasa.gov/static/kms_save/
GeoNames Ontology
LinkedGeoData
OSM Semantic Network
http://schemas.opengis.net/ GML and other schemas; some of the content is in RDF format
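
A word-similarity gold standard such as SimLex-999 is commonly used by ranking word pairs with a model and comparing that ranking against the human ratings using Spearman's rank correlation. The sketch below illustrates the idea in plain Python. The word pairs, the gold scores, and the 3-dimensional vectors are all made-up placeholders (they are not actual SimLex-999 values or real embeddings), and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Placeholder gold ratings on a 0-10 scale (NOT real SimLex-999 scores).
GOLD = [
    ("old", "new", 1.5),
    ("smart", "intelligent", 9.0),
    ("coffee", "cup", 3.0),
    ("car", "automobile", 9.5),
]

# Placeholder toy embeddings (NOT taken from any trained model).
VECTORS = {
    "old": [1.0, 0.0, 0.2],
    "new": [-0.8, 0.1, 0.3],
    "smart": [0.2, 0.9, 0.1],
    "intelligent": [0.3, 0.8, 0.2],
    "coffee": [0.1, 0.2, 0.9],
    "cup": [0.4, 0.1, 0.7],
    "car": [0.9, 0.4, 0.0],
    "automobile": [0.8, 0.5, 0.1],
}


def evaluate(gold, vectors):
    """Correlate model cosine similarities with the human gold ratings."""
    model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in gold]
    human_scores = [score for _, _, score in gold]
    return spearman(model_scores, human_scores)


print(f"Spearman rho on the toy data: {evaluate(GOLD, VECTORS):.3f}")
```

With a real benchmark file, `GOLD` would be loaded from the dataset and `VECTORS` from a trained embedding model. A model that captures relatedness but not similarity would be expected to score lower on a similarity-focused gold standard than on a relatedness-focused one.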