Wednesday, November 12, 2008

Week 11 readings/muddiest point

Deep Web: Michael Bergman

This is an interesting article on the "deep web."  I'm still not sure how it gets such a fancy name, or exactly what it is.  For example, they listed eBay as a deep-web site, but I can get to that via a search engine very easily.  To someone browsing, it seems that the deep web will not look any different from the surface web.  In fact, all of the web was deep web until maybe the advent of search engines.
Maybe it is differentiated because search engines can reach only about 16% of the web.  It is a shame, because there is 400 to 550 times more public information on the deep web.  To conclude, I surmise that the deep web is not inaccessible, just not reachable by an ordinary search.  People who use the deep web know where they're going and so don't google it.  Though I could be wrong.

Web Search Engines: part 1 / David Hawking

The premise here is that web search engines provide high quality information quickly. They cannot and should not attempt to index the web in its entirety. Crawling for the index begins with a "seed" URL.  The crawler can then follow the links it finds inside the seed (ex: topics within Wikipedia).
Different crawling computers cover different areas of the web, and each forwards the URLs it finds to the machine assigned to them.  They also make sure that web servers are not overwhelmed with requests by adding a politeness delay, so that requests to the same server go in one at a time.
Robots do not re-crawl the entire web each time.
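
A rough sketch of the crawling ideas above in Python, not how any real search engine is built. It assumes URLs are handed to machines by hashing the host name, and that each host gets a fixed waiting period between requests. The names `NUM_CRAWLERS`, `POLITENESS_DELAY`, `assign_crawler`, and `PoliteQueue` are made up for illustration, and nothing is actually fetched from the web.

```python
import zlib
from collections import deque
from urllib.parse import urlparse

NUM_CRAWLERS = 4          # hypothetical number of crawling machines
POLITENESS_DELAY = 2.0    # hypothetical seconds between requests to one host


def assign_crawler(url, num_crawlers=NUM_CRAWLERS):
    """Pick the machine responsible for a URL by hashing its host name."""
    host = urlparse(url).netloc.lower()
    return zlib.crc32(host.encode("utf-8")) % num_crawlers


class PoliteQueue:
    """URL queue that spaces out requests to the same host."""

    def __init__(self, delay=POLITENESS_DELAY):
        self.delay = delay
        self.pending = deque()
        self.seen = set()        # URLs already queued, so none is queued twice
        self.next_allowed = {}   # host -> earliest time of the next request

    def add(self, url):
        if url not in self.seen:
            self.seen.add(url)
            self.pending.append(url)

    def next_url(self, now):
        """Return a URL whose host may be contacted at time `now`, or None."""
        for _ in range(len(self.pending)):
            url = self.pending.popleft()
            host = urlparse(url).netloc.lower()
            if self.next_allowed.get(host, 0.0) <= now:
                self.next_allowed[host] = now + self.delay
                return url
            self.pending.append(url)  # host is still cooling down; try later
        return None
```

Starting from a seed, a crawler would call `add` for every link it finds and `next_url` whenever it is ready to fetch another page; URLs for the same host always land on the same machine because the hash depends only on the host name.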


Web Search Engines: part 2 / David Hawking

The vocabulary of the web includes many languages, including new words specific to internet culture, and also includes misspellings and grammatical errors.
Most queries are about 2 words long. By default, results must include all of the query words.
Search engines have strategies to speed up searches: they can skip over parts of the index lists that cannot match, order the lists by decreasing value so that processing can stop early, and assign document numbers according to decreasing score.  They can also cache: pre-store the answers to anticipated searches.
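
Two of these ideas, requiring every query word and caching answers, can be sketched in a few lines of Python. This is only a toy under stated assumptions: the three documents in `DOCS` are invented, the index is a plain dictionary of sets, and the cache is Python's built-in `functools.lru_cache` rather than anything a real engine uses.

```python
from functools import lru_cache

# Invented toy collection: document number -> text
DOCS = {
    1: "deep web databases",
    2: "web search engines",
    3: "search engines crawl the surface web",
}


def build_index(docs):
    """Map each word to the set of document numbers that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


@lru_cache(maxsize=1024)  # repeated queries are answered from the cache
def search(query):
    """Return the documents containing ALL query words, as a sorted tuple."""
    words = query.lower().split()
    if not words:
        return ()
    # Start with the rarest word so the candidate set stays small.
    postings = sorted((INDEX.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for docs_with_word in postings[1:]:
        result &= docs_with_word
        if not result:
            break  # no document can match; stop early
    return tuple(sorted(result))
```

Calling `search("web search")` returns `(2, 3)`, since only those two documents contain both words, and asking the same question again is served from the cache without touching the index.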

OAI Protocol for Metadata Harvesting:
I think this is about metadata and the steps taken to be able to search it comprehensively?  Seriously, I'm lost.

Muddiest Point: Is there something I'm missing about the deep web?  Why is it not linked to search engines?
Could someone explain OAI to me in simple language?

2 comments:

jean said...

I liked the deep web article as well. It seems crazy that search engines reach only a small percentage of the web; I guess I always get so many results from my searches that it sort of boggles my mind to think about how much more is out there that didn't show up in my results.

bkd10 said...

I was also lost on the article about metadata; hopefully some more reading will help make it clearer.