Keyword is broken?

April 28th, 2008 by Andrew Jenks

This week the world of Web 2.0 converged on the Palace Hotel in San Francisco. Most of the hype surrounding the new web is about social technologies and the way we interact with each other using technology. This got me thinking about how it applies to E-Discovery. The outward appearance may look different, but the underlying element of the new web frontier is still search. In E-Discovery we are well aware of the problems with general keyword search: it is like finding a needle in a haystack. With that in mind, I’d like to repost some thoughtful comments from Nova Spivack, the Founder and CEO of Radar Networks, from his presentation at the Next Web Conference in Amsterdam (Nova and Discovery Mining’s CEO Matthew Work used to work together in a previous life).

Here’s what Nova had to say about next generation searching on large datasets:

Keyword search engines return haystacks, but what we really are looking for are the needles. The problem with keyword search such as Google’s approach is that only highly cited pages make it into the top results. You get a huge pile of results, but the page you want, the “needle” you are looking for, may not be highly cited by other pages and so it does not appear on the first page. This is because keyword search engines don’t understand your question, they just find pages that match the words in your question.

Sound familiar? Nova’s assertion is that keyword search on big data sets is broken. How do you find what you’re looking for in all those haystacks? I think this is where the E-Discovery market is already out in front of the commercial web. At Discovery Mining we’re constantly working on ways to promote the ‘needles in the haystacks’ of search results. Using LSA (Latent Semantic Analysis) and other identifiers, we’ve been successful at moving relevant information to the forefront for our clients. As data sets get larger, even these technologies can’t keep up on their own. However, you can put the pieces together to review smarter and make the relevant information “pop out” at you.
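To make the needle-versus-haystack point concrete, here is a toy sketch of how an LSA-style ranking can surface a document that a literal keyword match misses. It is purely illustrative and is not Discovery Mining’s actual implementation: the five-document corpus, the function names, and the choice of two latent dimensions are all assumptions made up for this example, and the eigenvectors are found with a simple power iteration rather than a production-grade SVD.

```python
import math

# Hypothetical toy corpus, invented for illustration only.
DOCS = [
    "contract breach",
    "breach agreement",
    "agreement contract",
    "lunch menu",
    "menu pizza",
]

def keyword_hits(query, docs):
    """Literal keyword search: indexes of docs containing every query word."""
    words = query.split()
    return [i for i, d in enumerate(docs) if all(w in d.split() for w in words)]

def term_doc_matrix(docs):
    """Return the sorted vocabulary and a terms x docs count matrix."""
    vocab = sorted({w for d in docs for w in d.split()})
    return vocab, [[d.split().count(t) for d in docs] for t in vocab]

def top_eigenvectors(sym, k, iters=500):
    """Top-k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    mat = [row[:] for row in sym]
    vectors = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # fixed, deterministic start
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * mat[i][j] * v[j] for i in range(n) for j in range(n))
        vectors.append(v)
        # Deflate: remove the component just found before looking for the next.
        for i in range(n):
            for j in range(n):
                mat[i][j] -= lam * v[i] * v[j]
    return vectors

def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)

def lsa_scores(query, docs, k=2):
    """Cosine similarity of the query to each doc in a k-dimensional latent space."""
    vocab, a = term_doc_matrix(docs)
    n_terms, n_docs = len(vocab), len(docs)
    # Term-term co-occurrence matrix; its top eigenvectors are the latent "topics".
    gram = [[sum(a[i][d] * a[j][d] for d in range(n_docs)) for j in range(n_terms)]
            for i in range(n_terms)]
    basis = top_eigenvectors(gram, k)

    def project(counts):
        return [sum(u[t] * counts[t] for t in range(n_terms)) for u in basis]

    q = project([query.split().count(t) for t in vocab])
    return [cosine(q, project([a[t][d] for t in range(n_terms)]))
            for d in range(n_docs)]

if __name__ == "__main__":
    print("keyword hits:", keyword_hits("contract", DOCS))
    for doc, score in zip(DOCS, lsa_scores("contract", DOCS)):
        print(f"{score:6.3f}  {doc}")
```

In this sketch, a keyword search for "contract" never returns "breach agreement", because the word itself is absent, while the latent-space score places that document alongside the ones that do contain it and leaves the lunch-related documents at the bottom. Real review sets are far messier than this, which is why a technique like this is only one of the pieces you put together.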

What does this mean? Firstly, congratulations to us (the E-Discovery market) for understanding that this type of technology is what is necessary to sort through the mountain of data typically involved in litigation. Secondly, we’re still nowhere near the “holy grail” of search. There needs to be an underlying platform to advance beyond what is out there now. As an industry we’ve only just started to see the monster cases that will be commonplace in a year or two, and the platforms are just now achieving acceptance. The tools are fine for now, but we should be out there innovating beyond keyword search.

Over at TechCrunch, Nova was interviewed about the Semantic Web and its implications. Radar Networks’ product, Twine, is crazy but cool. I’m in the beta pool checking it out. The full article about the Semantic Web is here at TechCrunch. Take a look at the embedded PPT from Nova and you’ll see that what we’re using today won’t scale, so we need to stay on our toes and keep working.

Posted in General

