Curlie - Computers: Artificial Intelligence: Machine Learning: Datasets

Datasets

Repositories of data used to test/validate machine learning algorithms.

Sites 14

The 20 Newsgroups Data Set

A widely used collection of newsgroup documents, spread across 20 newsgroups, for text categorization.

aiHitdata

A random sample of 10,000 companies worldwide, drawn from aiHit. All data in this database is extracted and updated automatically from the web using AI and machine learning.

ArrayExpress - functional genomics data

ArrayExpress is a database of functional genomics experiments that can be queried and downloaded. It includes gene expression data from microarray and high-throughput sequencing studies.

Dataset Generator

Datgen is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms.

DELVE - Data for Evaluating Learning in Valid Experiments

A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. Delve makes it possible for users to compare their learning methods with other methods on many datasets.

Face recognition dataset

A dataset of face images for face recognition algorithms.

Learning Relational Concepts from Sensor Data of a Mobile Robot

A collection of data sets, each represented in first-order logic. Maintained at the University of Dortmund, Germany.

Machine Learning and Data Mining - Datasets

USPS digits, faces, and links to various datasets prepared for Matlab.

National Space Science Data Center

Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data and some models and software.

NIST Special Database 4

This NIST database of fingerprint images contains 2,000 8-bit grayscale fingerprint image pairs. NIST charges $90 plus $30 shipping for the data.

The RCSB Protein Data Bank (PDB)

Archive of experimentally determined 3-D structures of biological macromolecules, from the Brookhaven National Laboratory.

Reuters-21578 Text Categorization Corpus

A classic benchmark for text categorization algorithms.

TREC Data

Text datasets used in information retrieval and learning in text domains.

Web->KB dataset

Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web.

Last update: October 30, 2023 at 5:15:15 UTC