Repositories of data used to test/validate machine learning algorithms.
The 20 Newsgroups dataset, widely used for text categorization.
A random sample of 10,000 companies worldwide from aiHit. All data in this database is extracted and updated automatically from the web using AI and machine learning.
ArrayExpress is a database of functional genomics experiments that can be queried and downloaded. It includes gene expression data from microarray and high-throughput sequencing studies.
Datgen is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms.
A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. Delve makes it possible for users to compare their learning methods with other methods on many datasets.
A dataset of face images for face recognition algorithms.
A collection of datasets, each represented in first-order logic. Maintained at the University of Dortmund, Germany.
Machine Learning and Data Mining - Datasets (USPS digits, faces, and links to various datasets prepared for MATLAB)
Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data and some models and software.
This NIST database of fingerprint images contains 2000 8-bit grayscale fingerprint image pairs. NIST charges $90 plus $30 shipping for the data.
Archive of experimentally determined 3-D structures of biological macromolecules from the Brookhaven National Laboratory.
A classic benchmark for text categorization algorithms.
Text datasets used in information retrieval and learning in text domains.
Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web.
Last update:
October 30, 2023 at 5:15:15 UTC