
Tuesday, October 14, 2003

Google Is the Most Popular Search Engine Yet Only Indexes One-Third of the Web

Google might be the most popular search engine, but it misses six million web pages when it scans for matches to your query. According to a September 15th article by Andrew Orlowski in The Register, the problem is access: "Information costs money, and this has taken the sheen off the 'Internet' as it was once sold to us. The most valuable collections limit their access, for very good economic reasons: they can't afford not to." To recoup costs and perhaps fund acquisitions, online art and information collections are often gated and require memberships or pay-per-view fees for photos and documents. Other proprietary sites are competitors and do not want their results to be available to Google users, so their entire databank is skipped.

Libraries are an example he uses. Google does not scan library collections if materials are held behind a gate that demands affiliation and tribute. In the example he uses for the San Francisco Library, however, it appears any public library card would grant you access, and thus this gate is an illusion. Other libraries are stricter, like university libraries for students, alumni and faculty only, or medical libraries for doctors only, but in the quest for universal access to information they might all be wise to reconsider their position and make an effort to allow public viewing of unique material.

Newspapers online are more often than not behind a subscriber gate, so clicking on a link to The NY Times or LA Times does not result in the full story, but in a demand for payment. Huh? Newspapers are basically now an advertising device that makes lots of money from charging willing merchants and politicians for advertising, way beyond the costs of production and management. It is common knowledge that newspapers give out subsidized or free copies in order to boost circulation figures and raise ad rates. So, the cost of producing a story has already been paid numerous times over by the advertisers, yet the paper is so petty as to charge the public for each entry to view the story? How will I ever see the advertisers' message if I have to pay-to-play? How can they keep one guy from going in to get the story and recirculating it freely?

Google and the other top search engines should want to be more integrated with one another to become complementary instead of competitive. One way to do this would be to sponsor an international collective quest to index the entire net.
The main beneficiaries would be Google users as the data rolls in and Google adds connections to the hidden web, but the sites added would also benefit from new traffic as they join the wired wide world.

Why do we want to redo all the expensive, valuable work that was already done by the early pioneers, scanning and transcribing away to get their pet projects electronically available? Who has the time to go backward? Why should every library have to scan the same book (Huck Finn, Little Women)? Can we trust each little individual library to do it and not alter or omit things? Why do sites with some books not link to other sites with different titles by the same author, or with other versions of the same title?

My solution resolves all the questions at once. Remember the Book People in Fahrenheit 451? They went to the free place beyond the city and each picked one book to memorize, to preserve and transmit the ideas and stories the Firemen tried to expunge. In this case, our free land is one impregnable and interminable repository (LOC, IPL) (Google, are you listening? Why not build the Google Worldwide Public Library on your spacious canvas?) for as many versions/editions of each title as necessary. Each individual library would link to this one server to get the author's version(s) of a book and also see entries for edited, translated, or parody editions of this title. Then they could stop wasting their precious money and time individually copying Henry Huggins or the daily newspaper and just tap into the one universal directory of all books and media. They could better use their time copying unique local contributions that no one else has access to and fighting the privatization of information.
