Saturday, October 18, 2003

Pinging Trackbacks Broke Google But The Register Readers Show Quick Fix 

Trackback, a utility from Movable Type that web bloggers are using to let readers note they have read the associated post, is causing tremendous changes in your search results at Google. The reason is that this ping caused by trackback creates a whole new ostensibly blank page linked to the contents of the associated entry. This might be a useful enhancement for the individual blog author, but it creates chaos at Google's results.

The Register did a test with a search term on both AltaVista and Google, and though the AltaVista top-ten results were all credible and active links, seven out of the top-ten Google results linked to blank trackback pages.

This is only one of the problems with Google these days. They have another big work-around scheduled as they figure out what to do to shut out the PageRank manipulators who figured out how to raise their rank in the results by cross-linking similar pages and mining keywords.

The Register readers offer two simple fixes to help amend your results. The first is to add '-trackback' to the search query. The other proposed solution is to add this variable to the search at Google: '-mt-tb.cgi' and we can only hope Google is listening.
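Both reader fixes amount to appending exclusion terms to whatever you were going to search for. Here is a minimal sketch in Python, assuming Google's public search address and its `q` parameter; the function names and the sample search term are made up for illustration:

```python
from urllib.parse import urlencode

# Exclusion terms suggested by readers of The Register.
TRACKBACK_EXCLUSIONS = ["-trackback", "-mt-tb.cgi"]


def build_query(terms, exclusions=TRACKBACK_EXCLUSIONS):
    """Return the search terms followed by the exclusion terms."""
    return " ".join([terms.strip(), *exclusions])


def search_url(terms, base="https://www.google.com/search"):
    """Build a search URL; the base address is an assumption."""
    return base + "?" + urlencode({"q": build_query(terms)})


print(build_query("movable type plugins"))
print(search_url("movable type plugins"))
```

Either exclusion can be used on its own by passing a shorter `exclusions` list.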

Microdoc News Maps What Google Misses in Searching Web 

Microdoc News did a study in May of this year to try and map what Google leaves out when it searches the Internet. They found from Google that it indexes 3,083,324,652 web pages out of an estimated 10 billion or so pages that are actually on the Internet.
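For a sense of scale, the two figures above work out to a little under a third of the web. A quick check in Python, treating the "10 billion or so" estimate as exactly 10 billion, which is only an assumption:

```python
indexed_pages = 3_083_324_652      # figure Google reported, per Microdoc News
estimated_pages = 10_000_000_000   # "10 billion or so", a rough estimate

coverage = indexed_pages / estimated_pages
print(f"Indexed: {coverage:.1%}, missed: {1 - coverage:.1%}")
# prints "Indexed: 30.8%, missed: 69.2%"
```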

After using a desktop crawler to seek out a specific term, they found Google missed 71% of the web pages with that term, 50% of the blogs with the term, 63% of the educational sites, 55% of the news and information providers, and 92% of personal pages using the search term.

Friday, October 17, 2003

According to Orrin Hatch, General Accounting Office Estimates Fifty Percent of Search Request Results Are Pornographic Entries 

Senator Orrin Hatch (R-Utah) has launched a new career as porn czar. In a speech as chairman of the Senate Judiciary Committee Wednesday afternoon, he boasted, "I am currently considering legislative solutions to the many risks inherent in the use of peer-to-peer networks. Almost half of the people who use these networks are minors. Recent studies have shown that millions and millions of pornographic files are available for downloading on these networks at any given time. This is simply unacceptable. Many parents -- possibly the majority of them -- are unaware of this problem. Even more disturbing is that searches on these networks using search terms that a child would be expected to use, such as Harry Potter or Pokemon, turn up an enormous percentage -- over 50 percent in one study according to the General Accounting Office -- of pornographic materials including child pornography."

Here is how he arrived at that conclusion, according to the article at dc.internet.com:
"The March GAO report cited by Hatch conducted a Kazaa search for image files using 12 keywords known to be associated with child pornography on the Internet. Of 1,286 items identified in the search, approximately 42 percent were associated with child pornographic images. The remaining items included 34 percent that were classified as adult pornography and 24 percent as non-pornographic. In another Kazaa search, the U.S. Customs CyberSmuggling Center used three keywords to search for and download child pornography image files. The search identified 341 image files, of which approximately 44 percent were classified as child pornography and 29 percent as adult pornography."

A Wired news story about the hearings on September 9th of this year has Hatch issuing his warnings to providers of such internet gateways: "Hatch, well-known as an outspoken critic of peer-to-peer trading of copyright music, warned that if file-swapping networks do not rein in illicit porn trafficking, lawmakers 'might have to do something detrimental.' At one point, Hatch asked law enforcement witnesses on the panel: 'Do you suggest we put out of business the networks that allow this to occur?'"

Does anyone else notice how Hatch fudged the figures up to suit his needs? Also, looking for pornographic terms deliberately is not the same as conducting normal searches, so this data is skewed from the start. A more valuable study could be made by using the same number of searches with random keywords, not just pornographic keywords. Another thing the censors might want to consider about all this is that by sheltering children from the real world, they are crippled when they have to live in it.

One-Quarter of All Search Requests Are For Porn According to Internet Filter Review 

Internet Filter Review is a sales and evaluation site for the internet filters that users install to block pornographic words or images. The obscenity each of these commercial filters targets is defined differently and can be user-modified, so it is difficult to see how they arrived at their figures objectively. Still, the Internet Filter Review estimates that there are 4.2 million porn Web sites out there now allowing access to 72 million worldwide visitors annually (40 million Americans). "These statistics have been derived from a number of different reputable sources including Google, WordTracker, PBS, MSNBC, NRC, and Alexa research," they advise on their site.

The Internet Filter Review report claims 68 million daily search engine requests, or one-quarter of all requests, are for pornographic material. As evidence for this extrapolated number, they cite 116 thousand Gnutella child pornography requests per day. I could not find any footnotes, links, or other proof of this number at Internet Filter Review.

Gnutella? Haven't they been picked on enough? Info-Anarchy from 2001 tells: "Ferrero has started to censor various Gnutella sites because Gnutella violates their rights on the Nutella trademark. (Nutella is a chocolote-hazelnut-cream produced by Ferrero.) First, gnutella.de, a German Gnutella-site, has been sued and forced by a Cologne court not to use the domain any longer (under threat of paying $250,000 for non-compliance). Ferrero argued that because Nutella & Gnutella sounded so similarly, Nutella was harmed by the website. It would be an obvious danger to the trademark if "millions of Internet-users no longer associate Nutella with our family-friendly nut-nougat cream, but with a virtual conglomerate of pirates and child pornographers"."

Orrin Hatch made a speech Wednesday that reconfirmed his intent to stop child porn, and he is looking at these P2P networks for a way to shut them down. In an older story reposted today at Gnutella.com, William Brown wonders about the motives behind using the threat of P2P downloads as the ruination of youth. Many Gnutella users do not want to even discuss the issue, since defending porn is akin to promoting it in the eyes of the filters.

Thursday, October 16, 2003

Mail- Feedback 

"I share you sentiments 100 per cent, but the cold truth is that
information costs money. Can you work through the finances for me?"
(It will appear here over the next few days.)

"The copyright holders only allow the collections to use their
material because of limited distribution. What's in it for them?"
(I understand the rights of property-owners and the urge to profit from your work. The thought is not to trespass on their property rudely, but show them how providing free-access public-use materials will create goodwill, which is a tangible commodity, and enhance their ongoing project by making more people aware of what they have to sell or share. I am not advocating tearing down all walls, just asking proprietors to provide a lobby--some place where users can see things without paying first.)

>Newspapers are basically now an advertising device that make lots of
>money from charging willing merchants and politicians for
>advertising, way beyond the costs of production and management. It
>is common knowledge that newspapers give out subsidized or free
>copies in order to boost circulation figures and raise ad costs.

Yes, they're advertising vehicles. So are we, in that sense.
(Wait, I don't get paid for this and have no sponsors other than the passive one at the top that hosts this page. Advertisers are paying the major daily newspapers to air their latest campaign and project the image they want to the public. This means the newspaper will not print negative things about the advertiser or face losing the funds. That means the paper has been compromised and is no longer objective, but subjective to the tone of the advertiser.)

But "way beyond the costs of production and management" isn't true if you look
at the margins. If you look at the UK national market, three titles
operate at a healthy profit, three more get by, and the rest are
subsidized or otherwise run at a loss.
(True, I should have qualified that with more adjectives. Okay, I will take back the hyperbole and say that "newspapers generally run at a profit, or soon cease to exist.")

>How much would this cost?
It's a question you have to try to answer, rather than ask, if the
proposal is to be taken seriously. I'll help if I can...(xxxxxx xxxxxxxx)
(more mail later)

Tuesday, October 14, 2003

Public Library of Science Debuted on Web Sunday with Half a Million Hits 

The Public Library of Science (PLoS) is a nonprofit effort based in San Francisco by like-minded scientists who want to make research available to the public for free, according to a CNET article today. They instead will pay a $1500 listing fee for PLoS to review, edit, host and serve their papers for general public access. Many science articles archived online are currently available only in abstract form to non-subscribers of the journals that publish them. Even current issues of medical and science journals are sometimes unobtainable without an industry affiliation.

The amazing amount of web traffic that ensued on Sunday, over a half-million hits in eight hours, crashed their server. "We always expected a lot of interest, but we're surprised by this response," said Nick Twyman, director of information technology and computer operations. They are working to expand their capacity and expect to be able to handle all comers.

It's speculated the journal has generated so much attention because a report in it is about brain implants in monkeys that enable them to control a robotic arm with their thoughts.

The article drawing all this attention is entitled Learning to Control a Brain-Machine Interface for Reaching and Grasping by Primates, available only in PDF format because the full report is 3.3 MB. A synopsis is included: "With visual feedback, macaque monkeys learn to control a robot arm through a neural interface which records activity from multiple cortical areas."

Search Engine Results Increasingly Burdened by Hidden Commercial Content 

CNET has a series of articles about the proliferation of stealth ads and the reasons behind them. Back in June, Evan Hansen told how the Federal Trade Commission sent a letter to the seven companies that owned the twelve leading search engines, telling them they would have to clarify their language to make it easier to tell whether a search result was commercially sponsored. Acting on a complaint from Commercial Alert, a consumer advocacy organization, the FTC established a set of guidelines that they intend to enforce. AltaVista, AOL Time Warner, Direct Hit Technologies, iWon, LookSmart, Microsoft and Terra Lycos were the targets of this action. A copy of the recent FTC letter was also sent to Overture, Yahoo, InfoSpace, About.com, Google and Disney.

The agency singled out a long list of terms that it considers inadequate, including "Recommended Sites," "Featured Listings," "Premier Listings," "Search Partners," or "Start Here."
"Other sites use much more ambiguous terms such as 'Products and Services,' 'News,' 'Resources,' 'Featured Listings,' 'Partner Search Results,' or 'Spotlight,' or no labels at all," Commercial Alert Executive Director Gary Ruskin said.

Stefanie Olsen writes in a July CNET article, "The trend has raised concerns that the public might be misled about the editorial independence of search listings, which have frequently been promoted as unbiased research tools."

She has isolated two prongs of the problem, paid placement and paid inclusion. Paid placement lets you sponsor keywords so you will rank higher in search results. These obvious ads are usually confined to a header or footer around the true results. As an example, tire companies buy the top spot in the search results for wheels, rims, snow chains, whitewalls, check tread, repair tube, or any other such tire-themed request. Those unlucky bloggers writing about stealing hubcaps or how to check your tread with a penny, and the collectors with their wire rim galleries and old valve stem box sets, won't be found beneath all the commercial appeals.

The other form of advertising is more hidden and possibly dangerous. "Paid inclusion largely pertains to "organic" search engines such as Inktomi, AltaVista and Fast Search and Transfer's AlltheWeb, which provide technology that scours the Web and uses mathematical algorithms to compile relevant results. Under financial pressure, many such sites developed programs to guarantee companies that they would "crawl" or search a Web address more often, for a price," said Dean Forbes, an attorney with the FTC's division of advertising practices. The price could be an overall listing fee or a pay-per-click arrangement based on the number of follow-through consumers.

In addition to charging sites to crawl their data, search engines have another version of this money-maker that charges a fee for expedited listing in their directory. They all boil down to the same thing, which is the guy who pays gets more attention, more hits, more authority-ranking than all the rest of us.

Ms. Olsen asks the proper questions: "One area of concern for Web site owners is that the search providers could artificially keep their indices stale to promote the for-fee program. The question some ask is what's the incentive to buy into a search index if the technology is already visiting all of its pages every week or two? But if search providers let themselves grow outdated, they face rivals at every turn. "

Some of the engines mentioned are still not complying, according to yesterday's update at CNET by Stefanie Olsen. She cites Inktomi as an example of a web search engine that persists in offering pay-per-click results undesignated as such. Others like AltaVista consider a link to a disclosure page enough.

"Search engines like to say it doesn't affect the rankings. But there have been cases where rankings on AltaVista and Inktomi were boosted (for marketers that pay)," said Danny Sullivan, an editor of Search Engine Watch, an online industry newsletter.

"It's much more noticeable then it was in the past," he added, even though out of the 1.5 billion Web pages being indexed, only about 3 million pay to be crawled more often. "The way that it's mixed in with ordinary content can be favorable to (marketers)," he said.

"Yahoo spokeswoman Diana Lee said that as long as the search results are relevant, the company is doing its job.
"Results are based on relevancy, irregardless of whether a site participates in paid inclusion," Lee said, though she did not define how that relevancy is determined."

"Some companies that offer paid inclusion, including AlltheWeb and AltaVista, have disclosed it by adding a tiny link labeled "about" near results pages. The link leads to a disclaimer that describes how companies can pay to have their sites visited more frequently. Yet Sullivan and others say that search providers need to separate these results or label them conspicuously."

Gary Price posted this review of yesterday's article at ResourceShelf.com:
"Stefanie Olson writes about the labeling of search results, paid inclusion, and paid placement.
Like a Business Week article from 10 days ago, Olson's article makes no mention about how the work of the search engine optimization industry influences the results you see, even with Google.
What does this mean for the researcher? 1) Knowledge of the problem 2) The ability to use several web engines in an advanced manner.
This can help you get to the most precise results possible in the shortest amount of time."

Google is the Most Popular Search Engine Yet only Indexes One-Third of Web 

The Google search might be the most popular search, but it misses six million web pages in the scan for matches to your query. According to a September 15th article by Andrew Orlowski in The Register, the problem is access: "Information costs money, and this has taken the sheen off the 'Internet' as it was once sold to us. The most valuable collections limit their access, for very good economic reasons: they can't afford not to." To recoup costs and perhaps fund acquisitions, website art and information collections are often gated and require memberships or pay-per-view fees on photos/documents. Other proprietary interests are competitors and thus do not want their results to be available to Google users, so their entire databank is skipped.

Libraries are an example he uses. Google does not scan library collections if materials are held behind a gate that demands affiliation and tribute. In the example he uses for the San Francisco Library, however, it appears any public library card would grant you access, and thus this gate is an illusion. Other libraries are more strict, like university libraries for students, alumni and faculty only, or medical libraries for doctors only, but in the quest for universal access to information they might all be wise to reconsider their position and make an effort to allow public viewing of unique material.

Newspapers online are more often than not behind a subscriber gate, so clicking on a link to The NY Times or LA Times does not result in the full story, but in a demand for payment. Huh? Newspapers are basically now an advertising device that make lots of money from charging willing merchants and politicians for advertising, way beyond the costs of production and management. It is common knowledge that newspapers give out subsidized or free copies in order to boost circulation figures and raise ad costs. So, the cost of producing a story has already been paid for numerous times over by the advertisers, yet the paper is so petty as to charge the public for each entry to view the story? How will I ever see the advertisers' message if I have to pay-to-play? How can they keep one guy from going in to get the story and recirculating it freely?

Google and the other top search engines should want to be more integrated with one another to become complementary instead of competitive. One way to do this would be to sponsor an international collective quest to index the entire net.
The main beneficiaries would be the Google users as the data rolls in and Google adds connections to the hidden web, but the sites added would benefit also from new traffic as they join the wired wide world.

Why do we want to redo all the expensive, valuable work that was already done by the early pioneers scanning and transcribing away to get their pet projects electronically available? Who has the time to go backward? Why should every library have to scan the same book (Huck Finn, Little Women)? Can we trust each little individual library to do it and not alter or omit things? Why do sites with some books not link to other sites with different titles by their author or other versions?

My solution resolves all the questions at once. Remember the Book People in Fahrenheit 451? They went to the free place beyond the city and each picked one book to memorize, to preserve and transmit the ideas and stories the Firemen tried to expunge. In this case, our free land is one impregnable and interminable repository (LOC, IPL) (Google, are you listening? Why not build the Google Worldwide Public Library on your spacious canvas?) for as many versions/editions of each title as necessary. Each individual library would link to this one server to get the author's version(s) of a book and also see entries for edited, translated, or parody editions of this title. Then they could stop wasting their precious money and time individually copying Henry Huggins or the daily newspaper and just tap into the one universal directory of all books and media. They could better use their time copying unique local contributions that no one else has access to and fighting the privatization of information.

Internet EKG Sorts Out Who's Searching for What 

Internet EKG is a demographic consultant and statistical management expert. They provide a useful list of resources in their Search Word Watch page. It is not kept up-to-date though, as several of the links are dead. Of interest here for word watchers is the scrolling marquee of 50 unattributed Top Search Words. Here are the words I captured today:

Internet EKG Top Search Words

1) mp3
2) travel
3) sex
4) music
5) free
6) movies
7) games
8) jobs
9) real estate
10) lyrics
11) food
12) maps
13) pictures
14) hotels
15) cars
16) health
17) ebay
18) education
19) software
20) wallpaper
21) books
22) clip art
23) cracks
24) britney spears
25) pokemon
26) free sex
27) recipe
28) web sites
29) web hosting
30) playstation 2
31) sex stories
32) boats
33) playboy
34) porno
35) legal
36) icq
37) business
38) adult
39) wireless
40) sony
41) dating
42) quotes
43) used cars
44) auction
45) internet
46) greeting cards
47) clipart
48) computers
49) news
50) crack

SearchUK.com Invites You to Spy on Them 

If you wonder what is England's desire, view the Top 50 search requests from the past day over at SearchUK. These results seem to be in no particular order. The sidebar also holds another list of popular searches on SearchUK and they provide links to other international search spies.

A series of popular words on the SearchUK site would include:

1. Florist
2. Strip+Poker
3. Breeders
4. adult party games
5. Adult
6. AVS
7. Clinics+and+Practitioners
8. C-sharp
9. Education
10. Escorts
11. Female
12. Restaurant+Chains
13. sex shop
14. Shopping
15. Government

StockCharts.com Voyeur Sneaks a Glance at What the Other Guy is Doing 

Glance over stocks being currently researched by other computer users at StockCharts.com's handy SharpCharts Voyeur.

Free and simple, they flash one chart per screen that was just requested by somebody else about twenty minutes before. A series of charts from today were:

1. United Online, Inc.
2. Motorola, Inc.
3. Euro Index
4. Superior Consultant Holdings Corp.
5. Environmental Tectonics Corp.
6. Ivanhoe Energy, Inc.
7. DCGN Daily
8. Bank of America, Corp.
9. Nasdaq 100 Index
10. Provident Energy Trust
11. Copper Futures-COMEX
12. Roxio, Inc.
13. Coeur d'Alene Mines Corp.
14. Axonyx, Inc.
15. General Motors Corp.
16. Advanced Micro Dvcs (devices), Inc.
17. Mitsubishi Tokyo Financia
18. Medimmune, Inc.
19. US Unwired, Inc.
20. Gen-Probe, Inc.

The best feature of the Voyeur at StockCharts.com is the pause button, which temporarily stops the screen from refreshing; otherwise it refreshes every 30-45 seconds.

InfoTiger Voyeur Witnesses Real Search Action 

InfoTiger (a.k.a. metatiger.com) is a sleek search engine based in Germany with a clear and attractive layout and interesting results. Their directory features a lot of gaming and technology.

They offer a voyeur with randomly selected keyword results with two options, unfiltered or content-sensitive. A sample of the latest twelve unfiltered InfoTiger search words includes:
1. nesticle
2. amtrak
3. online+dictionary
4. pictures
5. chyna
6. pontius+copilot
7. greece
8. chess
9. free+screen+savers
10. canada
11. xxx
12. free+games

Nesticle turns out to be a freeware NES emulator so you can download broken-code Nintendo games and movies to play on your pc or other system. I rather thought it might be a nest of testicles.

Monday, October 13, 2003

MoPilot Live Searches Spy on the Wireless World 

Mopilot is a leader in bridging the gap between mobile computing and HTML databases. " Developed in early 1999 by wap4.com, a small team led by its CTO Dieter Kneffel and CIO Sandra Leyh tailored what soon should become the world's first real search engine for wml pages. Since then, our automated html/wml-crawler constantly digs the internet for appropriate contents," they explain on the mopilot site.

The mopilot Live Search page is the voyeur for their search engine, and it provides lists of the fifteen latest search phrases designated by agent, either mobile WML or HTML. Here is one set of the latest searches on mopilot today:

1. new mobile phone's
2. pinkworld
3. gay
4. pics
5. hooligan
6. tones
7. girls
8. Gay downloads
9. pictures
10. girls
11. pictures
12. girls
13. skin head
15. orange

Lycos Search Engine Examines Their Most-Popular Requests in a Daily Weblog 

Lycos Network has about 12 million different queries every day, averaging 2.3 words each. Every week they make available a list of the most-popular terms under the title Lycos 50, which Lycos began in 1999. During the week you can read their accompanying weblog, which breaks down and analyzes trends and methodology.

The Lycos 50 most-popular searches of the week is a filtered final list compiled with the following adjustments:

-Misspellings and plurals are counted toward the single term sought. "Variations on a theme (e.g. campers, camping, campouts) may be combined into a single representative term, based on our editors' judgment."

-spam as detected by automated or mechanical entries or commercially-motivated activity. They will eliminate a flood of inquiries from one source if it is out of character.

-general categorical terms, like news, weather, music, and tattoos.

-prurient content consisting of pornography, four-letter words (!) and otherwise lewd inquiries. They will stonewall searches on the names of adult film stars unless the term is driven by news events.

-queries on general computer utilities are ignored "to prevent the medium itself from skewing the list." Still, they include the specific names of file-swapping utilities because of the "political controversy" about them.

-company names are excluded because hits on them are largely based on the level of web presence. Exceptions are made as company names emerge in mergers, catastrophes, breaking news, or if the company name becomes an element of popular culture, like Xerox.

-countries are excluded because they are often added as an additional search term to speed location. They make exceptions for countries with current international news interest.
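The adjustments above boil down to three steps: fold variants into one term, drop the excluded classes unless an exception applies, and count what is left. Here is a rough sketch of that idea in Python. It is not Lycos's actual method; every table in it is a hypothetical stand-in for their editors' judgment, and "winzip" is just a made-up example of a general computer utility:

```python
from collections import Counter

# All of the tables below are hypothetical stand-ins for editorial judgment.
VARIANTS = {"campers": "camping", "campouts": "camping"}
EXCLUDED = {"news", "weather", "music", "tattoos"}   # general categorical terms
UTILITIES = {"kazaa", "winzip"}                      # computer utilities
EXCEPTIONS = {"kazaa"}                               # kept because of controversy


def normalize(query):
    """Lowercase the query and fold known variants into one representative term."""
    q = " ".join(query.lower().split())
    return VARIANTS.get(q, q)


def is_excluded(term):
    """True when the term falls in a filtered class and is not a named exception."""
    if term in EXCEPTIONS:
        return False
    return term in EXCLUDED or term in UTILITIES


def top_terms(queries, n=50):
    """Count normalized queries, skip excluded ones, return the n most common."""
    counts = Counter()
    for raw in queries:
        term = normalize(raw)
        if not is_excluded(term):
            counts[term] += 1
    return counts.most_common(n)


sample = ["Camping", "campers", "news", "KaZaA", "winzip", "kazaa", "campouts", "Halloween"]
print(top_terms(sample, n=3))
# prints [('camping', 3), ('kazaa', 2), ('halloween', 1)]
```

The exception check runs first so that a file-swapping name survives even though it also sits in the utilities table.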

The top 20 search terms used on Lycos Networks for the past week ending October 4th:

1. Halloween
2. KaZaA
3. Costumes
4. Britney Spears
5. NFL
6. Brooke Burke
7. Soccer rape scandal
8. Clay Aiken
9. Apollo 11
10. Dragonball
11. Las Vegas
12. Christmas
13. Pamela Anderson
14. Lord of the Rings
15. Baseball
16. Mary Carey
18. Hilary Duff
19. The Bible
20. Final Fantasy

Sponsor Popular Search Phrases at ixquick Search Spotlight 

The search engine ixquick reveals the most commonly sought terms in their ixquick Spotlight. They use this data to offer sponsorship of select words and phrases to allow your website to rise to the top of the heap. There are three levels of sponsorship, beginning with any of the top 100 most sought words available for $500 each, and then offering one of the top 1,000 search phrases for $150, and finally one of the top 10,000 most entered search terms for $50 each.
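The three sponsorship levels can be read as a lookup from a term's popularity rank to a price. A small sketch in Python, assuming the tiers do not overlap (a top-100 word is sold only at the $500 level), which the ixquick page does not spell out; the function name is made up:

```python
# Price tiers as described above: (largest rank in tier, price in US dollars).
TIERS = [(100, 500), (1_000, 150), (10_000, 50)]


def sponsorship_price(rank):
    """Return the price for a term at the given popularity rank, or None if unlisted."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    for max_rank, price in TIERS:
        if rank <= max_rank:
            return price
    return None


for rank in (1, 100, 101, 1_000, 5_000, 10_001):
    print(rank, sponsorship_price(rank))
```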

The ixquick Spotlight of the top hundred most-requested words is alphabetically presented and dominated by generic nouns like car, baby, bank, girl, model, sex, and slots.

Their list of the current one thousand most-sought terms is again made up mostly of single word nouns, verbs, and adjectives but they must be arranged in some odd order here. It starts alphabetically and then randomness ensues.

Here is a passage of twenty of the thousand most-popular search terms from the top of the document:

1. employment
2. erotic
3. erotica
4. flower
5. free porn
6. free porno
7. free sex
8. gamble
9. gameboy
10. gamez
11. gays
12. gaysex
13. gift basket
14. gift baskets
15. giftbasket
16. giftbaskets
17. health
18. holiday
19. home security
20. indian

Here is a passage of nine more of the top thousand searches on ixquick from lower in the same document:

1. covers
2. carat
3. can
4. preteen
5. carnaval
6. stock
7. hepatitis
8. jokes
9. grand

Ask Jeeves Search Spy IQ Tells Wrong from Right 

Jeeves IQ is the peek behind the screen for the most common searches on their engine. Ask Jeeves is one of the most user-friendly engines, simulating a consultation with a researcher by facilitating full sentence requests. They probably get more questions about how to do things and what is the formula.

They have several small lists of results for the past week, but the best one is the Top Misspelled Searches. Here are the five most-popular misspelled Ask Jeeves searches for the week of October 3rd, as they are commonly misspelled:

1. Jonny Depp
2. Blazin squad
3. Josh Harnett
4. Fabolous
5. Britany Spears

The Top Ask Jeeves searches for the week of October 3rd are:

1. Lyrics
2. Dictionary
3. Games
4. Jokes
5. Free ringtones
6. Maps
7. Horoscopes
8. Free clipart
9. Halloween costumes
10. Food

It can be ascertained from this list that Jeeves has selective hearing and does not account for all the searches for free sex and free porn that must inevitably come his way.

Webmasters Tell of their Most Disturbing Search Requests  

Disturbing Search Requests is a blog that specializes in the strange requests blog webmasters screen. Disturbing Search Requests is a collaborative effort aimed at spotlighting misleading search engine results and bizarre connections. Webmasters can post their odd results and solicit comments from others. They do have some guidelines: "You cannot post racist or homophobic comments. You cannot solicit child porn. You cannot post a searcher's IP address, DSR is for amusement, not punishment."

This is a fun search spy site that gets a lot of traffic, judging by the number of daily posts. Some of the topics posted here over the past day include:

pictures of people who have been scalped, "father pimping" child UK, intense and shameless sex, arse quotes, greasy levi boots, condoleezza rice lesbian bitch, Wrigley Field gallery nudity, definition of a freeloader re: alcohol, "mom on a porn site", and the bitch had my car towed.

AltaVista Real Searches Displays the Latest Wants in News, Audio and Video 

AltaVista has a spy feature called Real Searches to enable users to see five of the latest requests sought in six different categories: Audio, News, Video, Images, Directory, and Web. This is not a total spy, as the results seem to be filtered for some offensive content. The AltaVista voyeur filter seems to skew the results toward common themes in each category, as you can see by refreshing the lists.

Looking just at the web requests on AltaVista shows these consecutive responses:

1. thai recipes
2. quilting patterns
3. life insurance quotes
4. divx
5. post-traumatic stress
6. hinduism
7. human anatomy
8. rural Russia
9. hydroquinone
10. pictures of london

Another set of ten recent AltaVista requests for audio disclose:

1. south park
2. john philip sousa
3. howard stern fartman
4. james bond 007
5. the doors "the end"
6. surfin bird
7. johann sebastian bach
8. battle hymn of the republic
9. "ray charles"
10. nuns

See alltheweb Behind the Search Requests 

Overture Services has one of the leading search engines with their tool alltheweb. The alltheweb spy posts the last ten terms sought in two formats, unfiltered or filtered for offensive content. A set of the latest searches filtered for offensive content yields:

1. goodrich power systems
2. "cuneiform"writing
3. lpga lesbians
4. insurance underwriter exams
5. garmin unlock code
6. mokelumne hill
7. Denver realestate
8. nilufer
9. fiber gel pills
10. Texas Law

An immediate check of the unfiltered list of terms from alltheweb brought up:

1. pearl tea
2. link.all:dalewarlandsingers.org
3. petition camera courtroom
4. Mission
5. Hewlett-Packard HP NX700 (DG706A)
6. nassa
7. san diego union-tribune classified ads
8. chester county internet
9. nylon nurse uniform
10. sexy woman in lingerie

Metaspy Lets You Search the Search Engines 

The giant meta-searcher Metacrawler has a place where you can see what other users are looking for at Metaspy. Every 15 seconds Metaspy presents a new sample of current terms, and the sister site Metaspy Exposed allows you to see an unfiltered list of in-progress searches for mature audiences only.

This is a sample from the filtered Metaspy:

1. "diet pregnancy"
2. helena mt airport
3. andrew mcdermott artist author
4. wyandotte michigan community center
5. kay yerger
6. free babyshower games
7. african american christmas cards
8. personal naturist photos
9. yahoo photos

Metaspy Exposed was a letdown: instead of raw torrents of adult-oriented language, it was simply a more charmingly intellectual version of Metaspy. There were no cuss words or gross misspellings, or even bizarre queries. Here is one set of words being sought:

1. illustrator tutorials
2. jessica's cafe delafield wisconsin
3. "snoopy"+party+supplies
4. ross medical education
5. freud "invented dreams"
6. bermuda dunnes, ca
7. machining data handbook
8. google
9. threesome stories
10. art emily adams

Kanoodle Can Show Odd Search Requests 

The search engine Kanoodle uses a database of over 350 million keywords for finding information about anything on the web. Kanoodle Search Spy is the real-time tool to view what others seek. Because they offer raw queries, they have posted a warning: "The following page contains unfiltered content which may be considered obscene or offensive to some people. If you believe that you will not be offended by such content and are over the age of 21 years you must agree to the terms below by clicking the 'I Agree' link."

The Kanoodle Search Spy tool posts the last 10-12 terms sent to their engine, refreshing the list every ten seconds. Here are some terms that popped up on Kanoodle today:

vitamin D, Yahoo.com, sell, horse weight gain, woman health, pulled groin muscle treatment, levitra, people search, caffeine studies, colorlinez, splenda and pregnancy, Tim Dede, door, Hedstrom gym set, bilder, trip Brazil, shopping, Winsor pilates discount, disabled development, moonalisa, Big Mac calories.

Search.com's Top Search Terms at CNET Shows Users Want Kazaa 

CNET publishes a list of the top 100 search terms for the past day at their engine site Search.com. The latest data is for the week ending October 5th:

Top 20 Searches at Search.com
1. kazaa
2. kazza lite
3. games
4. soulseek
5. google
6. winmx
7. music
8. dead aim
9. mp3
10. lyrics
11. winzip
12. screensavers
13. msn
14. winmix
15. drivers
16. msn messenger
17. search engines
18. kaaza lite
19. antivirus
20. aim

As expected for a technology website, the predominant subjects sought are computer utilities, software, or activities. Kazaa is so popular it has three different entries in the top twenty, counting the misspelled variants. The general search terms "music" and "lyrics" are also popular. With no entries for sex or porn, you can be sure this is not really what people type in, but a child-friendly edited list of what folks might want to find on the internet.

Daypop Crawls Blogs and News Media for Common Themes 

Every day Daypop measures nearly sixty thousand news sites and blogs to come up with the most popular subjects of the day. In explaining the relevancy ratings for link analysis, they state, "trend analysis constructs a ranked list of hyperlinks based on freshness and popularity culled from weblogs within its index."

"The second method of trend analysis uses the concept of "Word Bursts". These are words that have experienced a heightened usage within the past couple days. The Word Burst page ranks the top twenty of these words in the blogging world. They act as indicators of memes that don't necessarily have an authoritative link and therefore wouldn't make it on to the Top 40 page," according to Daypop, who also produces a daily Top News Burst page.

Daypop Top Blog Bursts
1. Shirin, lech
2. rehab
3. olympian
4. lolita
5. pondered
6. walesa
7. screener
8. Lackawanna
9. rafah
10. OIC (Organization of the Islamic Conference)
11. portent
12. slush
13. inhaler
14. rendezvous
15. bulldozing
16. decorate
17. losung
18. nomenclature

Daypop Top News Bursts
1. conjoined
2. dodgers
3. subs
4. Ebadi
5. Shoemaker
6. rehab
7. separated
8. ACC
9. resignations
10. rafah
11. jockey
12. footsteps
13. disputes
14. Robertson
15. shiites
16. frets
17. Ferrari
18. insecurities
19. tunnels
20. Mindy
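
The Word Burst idea Daypop describes above can be sketched in a few lines. This is only a rough illustration, assuming a burst means a word whose share of recent text is much higher than its share of older text; the function name `word_bursts`, the smoothing, and the `min_count` threshold are my own assumptions, not Daypop's actual method.

```python
from collections import Counter

def word_bursts(recent_docs, baseline_docs, top_n=20, min_count=2):
    """Rank words by how much their relative frequency rose in recent_docs.

    Hypothetical scoring: recent rate divided by smoothed baseline rate.
    """
    recent = Counter(w for doc in recent_docs for w in doc.lower().split())
    baseline = Counter(w for doc in baseline_docs for w in doc.lower().split())
    recent_total = sum(recent.values()) or 1
    baseline_total = sum(baseline.values()) or 1

    scores = {}
    for word, count in recent.items():
        if count < min_count:
            continue  # ignore words seen only once in the recent window
        recent_rate = count / recent_total
        # Add-one smoothing so words absent from the baseline don't divide by zero.
        baseline_rate = (baseline[word] + 1) / (baseline_total + 1)
        scores[word] = recent_rate / baseline_rate

    # Highest ratio first; keep only the top_n words.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data, invented for illustration only.
baseline_docs = ["the cat sat on the mat", "the dog ate the food"]
recent_docs = ["rafah news the rafah report", "the rafah tunnels"]
bursts = word_bursts(recent_docs, baseline_docs)
print(bursts)
```

A word like "the" appears often in both windows and so scores low, while a word that suddenly shows up in the recent posts rises to the top, which is the behavior the Daypop quote describes.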

Wordtracker Reveals What Web Searchers Seek 

A truer reflection of what people search for scrolls by at Wordtracker. This service publishes a daily list of the top 50 search terms from various search engine voyeurs and metacrawlers for the past 24 hours.

Wordtracker Top 20 for October 13
1. sex
2. google
3. porn
4. halloween costumes
5. ebay
6. jokes
7. yahoo
8. health
9. free porn
10. games
11. msn emotions
12. map
13. mapquest
14. dictionary
15. pussy
16. search engines
17. online degrees
18. kazaa lite
19. hotmail
20. yahoo.com

Wordtracker can prepare a larger report of keywords in three formats: all keywords, adult content keywords only, or keywords without adult content. They offer to send you the top 500 searched words for free every week. Beyond that, they sell the 20,000 most popular searches for about $100, 100,000 search terms for about $500, and 500,000 search strings for about $2,000.

A September 4th interview by Scott Buresh with Wordtracker founder Andy Mindel can be found at Search Engine Guide. In it, Mindel reveals that the sources for his data are the meta-searchers Dogpile and Metacrawler. "We examined keywords from other engines and noticed a distortion from position checkers and hard coded queries. One thing we notice is that the top keywords always fall into a certain pattern and these usually consist of google, hotmail, sex, mp3, etc."

Mindel continues, "When this pattern changes then we know something is up and often it's because the engine database is being used at another site (for example gambling or shopping sites)."

Sunday, October 12, 2003

Google Zeitgeist  

Trends at Google, Inc. can be seen in lists, maps and charts at their colorful Google Zeitgeist page. The word zeitgeist is a compound noun formed from the German words for time (Zeit) and spirit (Geist), and it reflects the general intellectual, cultural, and moral climate of the period.

Mystery must be the general spirit of this time, for (according to Wired, May 2003) Google users collectively send the Google search engine 3,000 inquiries every second, totaling roughly 260 million searches per day.
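
As a quick sanity check on those two Wired figures, the per-second rate can be multiplied out to a full day. The variable names here are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a day

queries_per_second = 3_000       # figure quoted from Wired, May 2003
queries_per_day = queries_per_second * SECONDS_PER_DAY

print(f"{queries_per_day:,}")    # 259,200,000
```

That comes to 259.2 million, which rounds to the 260 million searches per day cited in the article.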

The most recent data at Google Zeitgeist is from the week ending October 6th, and is broken into the top ten declining inquiries and top ten gaining queries.

Top 10 Gaining Queries

1. siegfried and roy

2. rush limbaugh

3. premiership players

4. yom kippur

5. coetzee

6. elections ontario

7. paradise hotel

8. rugby world cup

9. halle berry

10. anastasia volochkova

Top 10 Declining Queries

1. robert palmer

2. berlin marathon

3. edward said

4. the bachelor

5. evanescence

6. oktoberfest

7. rosh hashanah

8. george plimpton

9. india

10. trillian

Returning to the Wired magazine article, it is Michael S. Malone's account of 24 hours spent witnessing how it all works. He explains, "To honor good manners, the program filters out obscene requests. Whether out of ignorance, faith, or belief in the safety of numbers, an estimated 52 million people around the world, 42 percent of all search engine users, entrust the site with some of their deepest, most vulnerable thoughts and desires."

Wired interspersed real search phrases throughout the article. Reading the raw, unfiltered requests to Google's search engine is more illuminating than any published list:

Krispy Kreme Donuts, Rhumba,Naturist Boy, How to Pray to the Rosary, Fishnet stocking, Poem procrastination, Pulpotomy, Marijuana for sell, Hottest young boys for free, Attracted to my professor, Horse + penetration, Timid dog, Tarot shops in New Delhi, Mario chick wit da braids, Couple voyeur, Battlefield 1942, Cuckold wife.

Yahoo Buzz Index Weighted by Omissions 

Search engine trackers are the utilities that show what subjects people are looking for on the internet. They might be useful to judge the general public mood and the effectiveness of advertising. They could also show trends in language, reflect changing ideas, or help predict interest in an upcoming sports event.

Yahoo Buzz Index tracks the strange words people type into their search engine and makes available a weekly list of the top twenty most queried subjects. Here is their list for this past week ending October 12th:

  1. Halloween

  2. Britney Spears

  3. Siegfried and Roy

  4. Fifty Cent

  5. NFL

  6. Jennifer Lopez

  7. Chicago Cubs

  8. Christina Aguilera

  9. Kazaa

  10. Linkin Park

  11. Boston Red Sox

  12. Beyonce Knowles

  13. Eminem

  14. R. Kelly

  15. Kazaa Lite

  16. NASCAR

  17. Chingy

  18. Roy Horn

  19. Ludacris

  20. Kill Bill

Is anything filtered out? "Company names (such as Yahoo!), utilities and formats (email, MP3), and general terms (movies, downloads, football) are filtered out by the editors of the Yahoo! Buzz Index", according to their frequently asked questions.

They continue: "The editors' goal is to list subjects that are interesting to the broadest possible audience. To this end, terms related to adults-only content are also excluded. In some cases, the editors may also exclude terms that they believe have been elevated by similarity to unrelated popular terms. For example, the movie The Rock might be excluded if the buzz was determined to be solely generated by interest in the WWF star, The Rock."

Reprising the list they give, Yahoo Buzz omits these:
A. Company Names
B. Computer Utilities
C. Computer Formats
D. General Terms
E. Adult Content
F. Similar but Unrelated Terms
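
To see how much that kind of editing can change a list, here is a small sketch of category-based filtering. It is only an illustration under loud assumptions: the exclusion lists below are invented examples (the FAQ names the categories, not the actual words), the function name `buzz_filter` is hypothetical, and the "similar but unrelated terms" category is left out because it depends on editorial judgment rather than a word list.

```python
# Hypothetical exclusion lists; Yahoo does not publish its real ones.
EXCLUDED = {
    "company name": {"yahoo", "google", "msn"},
    "utility or format": {"email", "mp3"},
    "general term": {"movies", "downloads", "football"},
    "adult content": {"sex", "porn"},
}

def buzz_filter(ranked_terms, excluded=EXCLUDED, top_n=20):
    """Split a ranked list of search terms into kept and dropped terms."""
    kept, dropped = [], []
    for term in ranked_terms:
        key = term.lower()
        # Find the first category whose word list contains this term, if any.
        reason = next(
            (cat for cat, words in excluded.items() if key in words), None
        )
        if reason is None:
            kept.append(term)
        else:
            dropped.append((term, reason))
    return kept[:top_n], dropped


# Toy raw ranking, invented for illustration.
raw = ["sex", "google", "Halloween", "mp3", "Britney Spears", "football"]
kept, dropped = buzz_filter(raw)
print(kept)     # ['Halloween', 'Britney Spears']
print(dropped)
```

In this toy ranking, four of the six most requested terms vanish before the list is published, and the two survivors move up to first and second place.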

This means the Yahoo Buzz Index is not really what people want, but what Yahoo thinks we want to see. Use this service only if you want a revisionist list of popularity skewed toward what Yahoo wants you to like.
