Apr 25, 2008

Is Keyword Search About To Hit Its Breaking Point?

As the Web swells with more and more data, the predominant way of sifting through all of that data, keyword search, will one day break down in its ability to deliver the exact information we want at our fingertips. In fact, some argue that keyword search is already delivering diminishing returns, as the slide above by Nova Spivack implies. Spivack is the CEO and founder of semantic Web startup Radar Networks and is pushing his view that semantic search will help solve these problems. But anyone frustrated by the sense that it takes longer to find something on Google today than it did even a year ago knows there is some truth to his argument.

“Keyword search is okay,” he says, “but if the information explosion continues we need something better.” Today, there are about 1.3 billion people on the Web, and more than 100 million active Websites. As more people pile on, the amount of information on the Web keeps growing exponentially to accommodate all those seekers, and they themselves feel compelled to put their own personal and social information onto the Web as well.

At a certain point, with billions and billions of Web pages to sift through, keyword search just won’t cut it anymore. It’s a needle-in-the-haystack problem, with the haystacks just getting bigger and bigger every second.

Spivack explains:

Keyword search engines return haystacks, but what we really are looking for are the needles. The problem with keyword search such as Google’s approach is that only highly cited pages make it into the top results. You get a huge pile of results, but the page you want, the “needle” you are looking for, may not be highly cited by other pages and so it does not appear on the first page. This is because keyword search engines don’t understand your question, they just find pages that match the words in your question.
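
To make the needle-in-the-haystack point concrete, here is a minimal toy sketch of what Spivack describes: match pages on the query words, then order them by how often they are cited. Everything in it is an assumption for illustration: the page names, texts, and citation counts are invented, and the raw citation count is only a stand-in for link-based ranking, not Google's actual PageRank.

```python
# Toy corpus: page id -> (page text, number of inbound citations).
# All names and numbers here are invented for illustration.
PAGES = {
    "popular-overview": ("semantic web search overview and news", 950),
    "popular-blog": ("web search trends and semantic startups", 700),
    "niche-answer": ("how semantic search resolves ambiguous web queries", 3),
}


def keyword_search(query, pages):
    """Return ids of pages containing every query word, most cited first."""
    terms = query.lower().split()
    matches = [
        page_id
        for page_id, (text, _citations) in pages.items()
        if all(term in text.lower().split() for term in terms)
    ]
    # Rank purely by citation count; the query's meaning plays no role.
    return sorted(matches, key=lambda page_id: pages[page_id][1], reverse=True)


results = keyword_search("semantic web search", PAGES)
print(results)
```

All three pages contain the query words, so all three match. The page that actually addresses the question has almost no citations and lands at the bottom of the list, which is the behavior the quote complains about.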

So how do we get beyond keyword search and Google’s PageRank? There are many approaches being tried: social search, tagging, guided search, natural-language search, statistical methods, open search, semantic search, and (way out there) artificial intelligence. They all have their problems. Tags are too messy and inconsistent. Natural-language search requires too much computing power, is difficult to scale, and doesn’t deal with structured data well. Semantic search is perhaps the most promising, but it essentially requires every single Webpage to be rewritten.
