The Internet is a wonderful place filled with text, videos, and images. Lots and lots of images. In fact, Yahoo's popular Flickr photo sharing service is the lucky recipient of millions of historical images plucked from 600 million library book pages scanned in by the Internet Archive. The project is spearheaded by Kalev Leetaru, who began work on the massive undertaking while researching communications technology at Georgetown University as part of a fellowship sponsored by Yahoo.
One thing that always bothered Leetaru was that digitization projects tend to focus on words while leaving out the pictures. What he's doing is the exact opposite. Leetaru went so far as to write his own software to sidestep the way books had originally been digitized.
According to the BBC, the Internet Archive used an optical character recognition (OCR) program to analyze all 600 million scanned pages and turn the image of each word into searchable text. The software could detect which parts of a page were pictures, and it would discard them.
Leetaru's software taps into the process by taking that information and focusing on the parts of each page that the original OCR ignored. Each of those regions is then saved as a separate JPEG picture, and the software also copies the caption for each image.
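To make the idea concrete, here is a minimal Python sketch of that kind of extraction step. It is not Leetaru's actual code: the `Region` type, the `"picture"` and `"text"` labels, and the rule that the caption is the nearest text block below a picture are all assumptions made for illustration. The page is modeled as a plain grid of pixel values, so the sketch stops short of writing real JPEG files.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Page = List[List[int]]  # hypothetical stand-in for a scanned page: rows of pixel values


@dataclass
class Region:
    """One block reported by the OCR layout pass (hypothetical structure)."""
    kind: str      # assumed labels: "picture" or "text"
    left: int
    top: int
    right: int     # exclusive
    bottom: int    # exclusive
    text: str = ""


def crop(page: Page, region: Region) -> Page:
    """Cut the region's bounding box out of the page."""
    return [row[region.left:region.right] for row in page[region.top:region.bottom]]


def find_caption(picture: Region, regions: List[Region]) -> Optional[str]:
    """Assumed heuristic: the caption is the closest text block below the picture."""
    below = [r for r in regions if r.kind == "text" and r.top >= picture.bottom]
    if not below:
        return None
    return min(below, key=lambda r: r.top - picture.bottom).text


def extract_pictures(page: Page, regions: List[Region]) -> List[Tuple[Page, Optional[str]]]:
    """Keep the regions the OCR pass would have discarded, paired with a caption."""
    results = []
    for region in regions:
        if region.kind == "picture":
            results.append((crop(page, region), find_caption(region, regions)))
    return results
```

Calling `extract_pictures(page, regions)` returns one `(cropped_pixels, caption)` pair per picture region. A real pipeline would then encode each crop as a JPEG and store the caption alongside it.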
The end result will be a searchable database of more than 12 million historical copyright-free images available on Flickr. At present, Leetaru has uploaded more than 2.6 million pictures, all of which are available to browse.