When Will Programmable and Semantic Search Replace Spiders and Web Crawling?

Changes are coming to search that will fundamentally alter not only the way Web sites get ranked but also the way we search and see search ads.

The most immediate changes are at Google. Experts say the company could release requirements this year that will force Web sites to revamp their optimization techniques to stay atop results. Today, the most popular search engines - Google, Yahoo!, Ask - all use "spiders" to scan sites and "index" them. The spiders crawl a site, looking at keywords, page tags, words in documents, links from other sites and so on to assign it a page ranking. The engines are also locked in a constant game of cat and mouse, refining their algorithms to penalize sites that try to fool them into illegitimately high rankings.

Google's new system, say the experts, will add layers of context, delve into rich media and make search results increasingly relevant to individual users. Carefully reading five patents Google applied for earlier this year, Bear Stearns analysts identify Google's next step as the Programmable Search Engine (PSE).

PSE will require sites to submit XML feeds, telling the search engine what's on a site in a fuller way than the spiders can grasp. That way, a site can feed data for all its content, including video, moving graphics and other non-text items that today's search engines do not handle well. It will also allow sites to tell the search engine about material on the "dark Web" that today is blocked by passwords and other walls. By cross-referencing that feed with normally indexed results, and then again with data collected on an individual's past search history, PSE will make a search more meaningful for that individual.
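To make the feed idea concrete, here is a minimal sketch in Python of what reading such a feed might look like. The article does not describe the actual feed format, so every element and attribute name below (site, item, type, access, title, topic) is an assumption invented for illustration, as is the example.com address.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed. The element and attribute names are invented for
# illustration; the real format is not described in the article.
FEED = """<site url="https://example.com">
  <item type="video" access="public">
    <title>Spring training highlights</title>
    <topic>baseball</topic>
  </item>
  <item type="article" access="password">
    <title>Bat colony survey</title>
    <topic>zoology</topic>
  </item>
</site>"""


def parse_feed(xml_text):
    """Return one dict per <item> element found in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("item"):
        items.append({
            "type": item.get("type"),        # e.g. video, article
            "access": item.get("access"),    # public, or behind a password
            "title": item.findtext("title"),
            "topic": item.findtext("topic"),
        })
    return items


items = parse_feed(FEED)
for entry in items:
    print(entry["type"], entry["access"], entry["title"])
```

The point of the sketch is only that a site-supplied feed can describe a video or a password-protected page that a spider reading text would miss.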

If, for example, someone interested in baseball looks for "bats," their results will be different from a zoologist's, whose results in turn will be different from a child's. It also means ads served using the PSE technology will be more relevant, targeted and valuable than ever. And it means that sites using current search engine optimization (SEO) practices will find their rankings sink over time, says search consultant Steve Arnold, who contributed to the Bear Stearns report. "The [new] architecture is a quantum leap," Arnold said at a recent iBreakfast gathering of technology and media executives in Manhattan. "The entire SEO structure will change."
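The "bats" example can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Google's method: the result list, the topic labels and the idea of representing a searcher's history as a set of interests are all hypothetical.

```python
def rerank(results, interests):
    """Put results whose topic matches the user's interests first.

    sorted() is stable, so results that tie keep their original order.
    """
    return sorted(results, key=lambda r: r["topic"] not in interests)


# Hypothetical results for the query "bats", in their default order.
RESULTS = [
    {"title": "Bat species of North America", "topic": "zoology"},
    {"title": "Choosing a maple baseball bat", "topic": "baseball"},
    {"title": "Bats for kids: fun facts", "topic": "children"},
]

fan = rerank(RESULTS, {"baseball"})
zoologist = rerank(RESULTS, {"zoology"})

print(fan[0]["title"])
print(zoologist[0]["title"])
```

The same query yields a different top result for each searcher, which is the behavior the analysts describe.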

Semantic Search Uses New Methods

Meanwhile, as Google's PhDs work on ways to further corner a market the company already controls, other companies such as Hakia, Lexxe and Cognition Search are working on natural language or "semantic" search, which is an even further conceptual leap from what Google is planning.

Google's indexing algorithm, they say, ultimately relies on popularity to find results. It doesn't really understand a true question that a human being would ask. The natural search programmers are using linguistic and machine theory techniques to teach computers to understand human speech. Phrase a question normally and their search engines will answer. You'll get results targeted to you and to what you wanted, without having to adjust your query to match what the search engine can understand or pore over reams of unneeded results.

Manhattan-based Hakia hopes to come out of beta by the end of this year. Late in July it released a method of highlighting search results that shows contiguous phrases from a document, rather than the fragments joined by ellipses, drawn from unconnected terms scattered throughout a document, that appear in many a Google search. Hakia CEO Riza Berkan says Hakia is unearthing relevant results from the long tail, something a popularity-based system like Google can't do well. "There are literally an infinite number of queries that people can ask that aren't popular," Berkan told Jack Myers Media Business Report in an exclusive interview. "There is a gold mine sitting there."
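The difference between a contiguous phrase and scattered keywords can be illustrated with a short Python sketch. It is not Hakia's technique, which the article does not detail; the function name, the sample text and the plain string matching are assumptions made for illustration.

```python
import re


def contiguous_snippet(text, phrase, context=30):
    """Return a snippet around the first place the whole phrase appears
    contiguously (case-insensitive), or None if it never does."""
    match = re.search(re.escape(phrase), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - context)
    end = min(len(text), match.end() + context)
    return text[start:end]


# Hypothetical document text.
DOC = ("Search engines index pages. A semantic search engine "
       "tries to read the question itself.")

snippet = contiguous_snippet(DOC, "semantic search engine")
scattered = contiguous_snippet(DOC, "engine semantic")

print(snippet)
print(scattered)
```

A keyword matcher would count the second query as a hit because both words occur somewhere in the text; a phrase-based highlighter returns nothing because the words never appear together in that order.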

Perhaps more importantly, next year Hakia plans to release a semantic ad serving technology that it says will understand how relevant an ad is to the content beside it and place ads more appropriately than an indexed system ever can. "Google's algorithm, based on popularity, doesn't work for ads," Berkan says. "When they push ads, it's much more rudimentary, just keyword matches."

Berkan may be right. But he can also be sure that Google has enough brainpower, server power and cash to match anything his 50-person firm is doing - or just to buy Hakia if it needs to. Berkan acknowledges it could be ten years before natural language search takes hold.

For anyone who relies on search for their success, though, now is a good time to start rethinking ways of doing business, and to think of ways to use the new techniques for advertising and content opportunities.

Hakia can be reached through chief communications officer Rob Wyse at 212.920.1470.

Dorian Benkoil, a regular contributor to Jack Myers Media Business Report, is a senior consultant for Teeming Media, a digital media business consultancy. He can be reached at Dorian@JackMyers.com.
