Until fairly recently, we weren't getting all the available Web content from our favorite search engine. A lot of dynamic databases were hidden from search engines due to the limitations of Web crawling technology.

Now the Web is changing. It's going from a limited database of text-only documents to a broader collection of data in many formats -- thanks to technology advances and user demand. So we have access to files of many formats: audio, video, PDF, Excel, PowerPoint, and so forth.

A while back, search engines indexed only "surface Web" documents (files found by traditional crawlers). Now, most major search engines have the technology to index the "deep Web" -- a vast repository of hidden content in dynamic databases previously untapped.

The difference between the surface and deep Web is both qualitative and quantitative. Qualitatively, deep Web content includes images, sounds, presentations, and many types of media invisible to search engine crawlers. Quantitatively, it's estimated to be about 500 times larger than the surface Web, although this can be misleading.

How Deep?

A BrightPlanet study conducted in March 2000 estimated public information in the deep Web to be about 500 times larger than the existing World Wide Web. The study stated the deep Web "contains billions of documents in hundreds of thousands of specialty databases hidden from public view." However, it may not be useful to index all of these documents.

Some estimates put the useful deep Web at merely two to three times the size of the surface Web. That's because some dynamic sites (typically built on content management systems) can generate millions of variations of the same page (personalization, pricing, experiments, and so on) -- but should each of these count as a unique page? Perhaps not.
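To make the page-counting problem concrete, here is a minimal sketch of how a crawler might canonicalize dynamic URLs so that millions of personalized variations collapse into a single logical page. The parameter names in `IGNORED_PARAMS` are hypothetical examples, not part of any engine described above.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Hypothetical parameters that personalize a page without changing its
# core content (session IDs, tracking codes, etc.).
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_campaign", "ref"}

def canonicalize(url: str) -> str:
    """Drop personalization parameters and sort the rest, so that
    many variations of a dynamic page map to one canonical URL."""
    parts = urlparse(url)
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query)
        if k.lower() not in IGNORED_PARAMS
    )
    return urlunparse(parts._replace(query=urlencode(query)))

# Two superficially different dynamic URLs collapse to the same page:
a = canonicalize("http://example.com/item?id=42&sessionid=abc123")
b = canonicalize("http://example.com/item?sessionid=xyz789&id=42")
print(a == b)  # True
```

Under this view, counting canonical URLs rather than raw URLs is what shrinks the "500 times larger" figure toward the two-to-three-times estimate.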

Dynamic Site Indexing

A number of search engines now index dynamic content, including AltaVista, FAST, Google, Inktomi, and Lycos to name a few. Many of the paid-inclusion programs will index additional dynamic content for a fee. We have AltaVista Trusted Feed, FAST PartnerSite (through Lycos InSite), and Inktomi Search/Submit.

These premium services do not influence positioning the way pay-per-click programs do. Rather, they index more of the Web and refresh listings more frequently. While pricey, these services are ideal for submitting Web pages that are traditionally difficult to crawl (large database-driven dynamic sites and framed sites).

Deep Web Search Tools

There are a number of products and services that enhance deep Web searching, including BrightPlanet, Intelliseek's Invisible Web, ProFusion, Quigo, and C|Net's Search.com. Quigo is capable of retrieving, normalizing and indexing documents in an offline crawling process, while the others focus on expanding the meta-search engine concept, which enables users to submit queries to thousands of sites (rather than the dozen or so through regular meta-search engines).

Quigo technology operates behind the scenes, allowing portals and search engines to access and manage dynamic Web content that is not indexed by traditional search engines. It maps these pages and uses Information Extraction (IE) algorithms to restructure the information within a page, keeping each piece of data within its relevant context. Restructuring works as follows:

1. Categorized results - Upon each query, users are presented with a list of all relevant categories in which results were found. This enables quick refinement, pinpointing the most relevant information. (A search for "ford" presents categories such as person, car, actor, company, etc.).

2. Associative search - Certain attributes are hyperlinked in each search result. By clicking the linked words, users can traverse Quigo's deep Web database to locate other similar documents. (A search for "Jurassic Park" will bring up several book sites; click the author's name, and all books by Michael Crichton are displayed.)

This ability to restructure data can be extended in many ways: results can be sorted by date, used for comparison shopping, to locate the best-performing stocks, or to find the closest coffee shop by ZIP code. Since deep Web sites can be among the most authoritative sites on the Web, this approach can help overcome the relevancy issue, providing a highly useful service for users as well.
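The two features above can be sketched with a toy in-memory index. This is not Quigo's actual technology -- the documents, field names, and functions here are illustrative assumptions -- but it shows how structured attributes enable both categorized results and associative linking.

```python
from collections import defaultdict

# Tiny illustrative index: each document carries structured attributes
# extracted from a deep Web database (all data here is made up).
DOCS = [
    {"title": "Jurassic Park",  "category": "book", "author": "Michael Crichton"},
    {"title": "The Lost World", "category": "book", "author": "Michael Crichton"},
    {"title": "Jurassic Park",  "category": "film", "author": None},
    {"title": "Ford Mustang",   "category": "car",  "author": None},
]

def categorized_search(term: str) -> dict:
    """Group matching documents by category, as in a faceted result list."""
    groups = defaultdict(list)
    for doc in DOCS:
        if term.lower() in doc["title"].lower():
            groups[doc["category"]].append(doc["title"])
    return dict(groups)

def associative_search(attribute: str, value: str) -> list:
    """Follow a hyperlinked attribute (e.g. an author name) to related docs."""
    return [doc["title"] for doc in DOCS if doc.get(attribute) == value]

print(categorized_search("jurassic park"))
# {'book': ['Jurassic Park'], 'film': ['Jurassic Park']}
print(associative_search("author", "Michael Crichton"))
# ['Jurassic Park', 'The Lost World']
```

Because each piece of data keeps its context (a title is a book title, a name is an author), the same index answers both "which categories contain this term?" and "what else shares this attribute?".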

It's apparent that Web search continues to change as the Web matures. The portals, engines, and directories that give users the best results are those that will prosper and help define the nature of Web search in the 21st century.



ABOUT THE AUTHOR

Paul J. Bruemmer is founder of trademarkSEO (www.trademarkseo.com). Reach him at paul@trademarkseo.com.