I learn so many cool things in my preservation class. Lately we've been talking about digitization and whether it is a viable form of preservation or just a way to increase access. I believe the two go hand in hand, but that's not the point of this post.
Have you ever wondered where the images of distorted words used for online security checks come from? The text comes from digitized books and newspapers, and every time you solve one, you help to digitize the text of a book.
A digitized book isn't much good if it's just a series of images of the pages. Users expect to search instantly within the text, so Optical Character Recognition (OCR) was developed to let computers "read" it. But not every text is so easily read, which leaves a lot of words unrecognized and unsearchable. So a program called reCAPTCHA lets the general public help decipher those words. Each reCAPTCHA displays two words for the user to transcribe, as a security measure to prove that a person, not an automated program, is filling out the form. One of the words is one the computer could not read; the other is a word whose answer the program already knows. If the user types the known word correctly, the program assumes the unknown word was typed correctly as well, and that transcription is added to the digitized text.
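Here is a minimal sketch of that two-word check in Python, assuming the simple rule described above: grade only the known word and, if it matches, trust the user's answer for the unknown one. The names `Challenge`, `check_answer`, and `transcriptions` are hypothetical, made up for illustration. A real system would likely wait for several users to agree on an unknown word before accepting it; this sketch just records every guess.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Challenge:
    known_word: str  # word whose correct transcription is already known
    unknown_id: str  # hypothetical ID for the word OCR could not read


# Hypothetical store of user guesses, keyed by unknown-word ID.
transcriptions: defaultdict[str, list[str]] = defaultdict(list)


def check_answer(challenge: Challenge, known_answer: str, unknown_answer: str) -> bool:
    """Return True if the user passes, recording their guess for the unknown word."""
    # Only the known word can actually be graded (case-insensitive here).
    if known_answer.strip().lower() != challenge.known_word.lower():
        return False  # failed the check; the unknown-word guess is discarded
    # The known word matched, so assume the unknown word was typed correctly too.
    transcriptions[challenge.unknown_id].append(unknown_answer.strip())
    return True
```

For example, with `Challenge("morning", "word-17")`, answering `"Morning"` and `"harbor"` passes and stores `"harbor"` as a guess for `word-17`, while answering `"mourning"` fails and stores nothing.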
Pretty cool, huh? I bet you didn't know you were helping to digitize the New York Times when you logged in to leave a comment on a blog or post a link on Facebook. Knowing that makes deciphering the twisted CAPTCHA words a little less annoying.
You can find out more about CAPTCHA and reCAPTCHA here.
-Kim
Librarian, You're a grand old
11 years ago