Skip to main content

80 Million Tiny Images

80 Million Tiny Images is a dataset intended for training machine-learning systems constructed by Antonio Torralba, Rob Fergus, and William T. Freeman in a collaboration between MIT and New York University. It was published in 2008.

The dataset has size 760 GB. It contains 79,302,017 32×32-pixel color images, scaled down from images scraped from the World Wide Web over 8 months. The images are classified into 75,062 classes. Each class is a non-abstract noun in WordNet. Images may appear in more than one class. The dataset was motivated by non-parametric models of neural activations in the visual cortex upon seeing images.

The CIFAR-10 dataset uses a subset of the images in this dataset, but with independently generated labels, as the original labels were not reliable. The CIFAR-10 set has 6000 examples of each of 10 classes, and the CIFAR-100 set has 600 examples of each of 100 non-overlapping classes.

Construction

It was first reported in a technical report in April 2007, during the middle of the construction process, when there were only 73 million images. The full dataset was published in 2008.

They began

Source: Wikipedia

No Comments yet!

Your Email address will not be published.