<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Electronic Discovery Perspective</title>
	<link>http://www.discoverymining.com/dmblog</link>
	<description>e-discovery from our perspective</description>
	<pubDate>Fri, 02 May 2008 23:27:16 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2</generator>
	<language>en</language>
			<item>
		<title>Happy 6th Birthday Discovery Mining!</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2008/05/happy-6th-birthday-discovery-mining/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2008/05/happy-6th-birthday-discovery-mining/#comments</comments>
		<pubDate>Fri, 02 May 2008 23:27:16 +0000</pubDate>
		<dc:creator>Andrew Jenks</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2008/05/happy-6th-birthday-discovery-mining/</guid>
		<description><![CDATA[
Yesterday was Discovery Mining&#8217;s 6 year anniversary. This is a big milestone for any company. Over the last 6 years we&#8217;ve seen the E-Discovery industry grow quickly, become a &#8220;hot&#8221; market, and show signs of maturing. When we started Discovery Mining, we had no idea the ride we were in for. Even today, the evolution [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://farm4.static.flickr.com/3271/2460550752_f3401822b3.jpg" height="316" width="500" /></p>
<p>Yesterday was Discovery Mining&#8217;s 6-year anniversary. This is a big milestone for any company. Over the last 6 years we&#8217;ve seen the E-Discovery industry grow quickly, become a &#8220;hot&#8221; market, and show signs of maturing. When we started Discovery Mining, we had no idea the ride we were in for. Even today, our clients are evolving and growing more sophisticated at such a rate that it feels like we&#8217;re sprinting a marathon.</p>
<p>We could not have done it without the great people who make Discovery Mining what it is. This is a tough market, but the faces above are just a few of the people who try to make a difference for our clients every day.</p>
<p>Pictured above is the staff from the Discovery Mining San Francisco Headquarters. Not yet added are our teams from the Discovery Mining New York, London, Chicago and Washington DC offices.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2008/05/happy-6th-birthday-discovery-mining/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Keyword is broken?</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2008/04/keyword-is-broken/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2008/04/keyword-is-broken/#comments</comments>
		<pubDate>Mon, 28 Apr 2008 19:14:35 +0000</pubDate>
		<dc:creator>Andrew Jenks</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2008/04/keyword-is-broken/</guid>
		<description><![CDATA[This week the world of Web 2.0 converged on the Palace Hotel in San Francisco. Most of the hype surrounding the new web is about social technologies and the way we interact with each other using technology. This got me thinking about how it applies to E-Discovery. The outward appearance may look different, but the [...]]]></description>
			<content:encoded><![CDATA[<p>This week the world of <a href="http://www.web2summit.com/" target="_blank">Web 2.0</a> converged on the Palace Hotel in San Francisco. Most of the hype surrounding the new web is about social technologies and the way we interact with each other using technology. This got me thinking about how it applies to E-Discovery. The outward appearance may look different, but the underlying element of the new web frontier is still search. In E-Discovery we are well aware of the issues of general keyword search, like finding a needle in a haystack. However, I&#8217;d like to repost some thoughtful comments from <a href="http://novaspivack.typepad.com/" target="_blank">Nova Spivack</a>, the Founder and CEO of <a href="http://www.radarnetworks.com/" target="_blank">Radar Networks</a>, from his presentation at the Next Web Conference in Amsterdam (Nova and Discovery Mining&#8217;s CEO Matthew Work used to work together in a previous life).</p>
<p>Here&#8217;s what Nova had to say about next-generation searching on large datasets:</p>
<p><em>Keyword search engines return haystacks, but what we really are looking for are the needles. The problem with keyword search such as Google’s approach is that only highly cited pages make it into the top results. You get a huge pile of results, but the page you want—the “needle” you are looking for—may not be highly cited by other pages and so it does not appear on the first page. This is because keyword search engines don’t understand your question, they just find pages that match the words in your question.</em></p>
<p> <a href="http://www.discoverymining.com/dmblog/index.php/2008/04/keyword-is-broken/#more-23" class="more-link">(more&#8230;)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2008/04/keyword-is-broken/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Do you have an Erdős number?</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2008/04/do-you-have-an-erdos-number/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2008/04/do-you-have-an-erdos-number/#comments</comments>
		<pubDate>Mon, 21 Apr 2008 18:47:55 +0000</pubDate>
		<dc:creator>Andrew Jenks</dc:creator>
		
		<category><![CDATA[Geeky]]></category>

		<category><![CDATA[data]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2008/04/do-you-have-an-erdos-number/</guid>
		<description><![CDATA[Analytics and social links are becoming very popular in e-discovery. Certain tools claim to be able to provide you with the number of links between two people in a document collection. Does this help? Can you find what you&#8217;re looking for faster? Maybe, but I don’t believe it is the magic bullet that will make [...]]]></description>
			<content:encoded><![CDATA[<p>Analytics and social links are becoming very popular in e-discovery. Certain tools claim to be able to provide you with the number of links between two people in a document collection. Does this help? Can you find what you&#8217;re looking for faster? Maybe, but I don’t believe it is the magic bullet that will make other methods obsolete.</p>
<p>You may have heard of the 6 degrees of Kevin Bacon, but the real game started with a Hungarian mathematician named <a href="http://en.wikipedia.org/wiki/Paul_Erd%C5%91s" target="_blank">Paul Erdős</a>. Colleagues of Erdős referred to an <a href="http://en.wikipedia.org/wiki/Erd%C5%91s_number" target="_blank">Erdős Number</a> to describe the &#8220;collaborative distance&#8221; between an author of a mathematical paper and Erdős himself. This study has led to some very interesting models in graph theory, namely &#8220;social connectedness&#8221;. Within enormous communities, even the entire web, you will find that individuals are closely connected. The average Erdős number for any self-respecting mathematician is 4.65, which means that a majority of published math authors are within 5 degrees of Erdős. I once had a professor with an Erdős number of 2, and even Bill Gates has an Erdős number of 4. What does this mean from an e-discovery perspective? Simply that clustering around people is an interesting concept; however, in homogeneous document collections you&#8217;ll most likely find that everyone is closely connected to everyone else regardless of significance.</p>
<p>Think about your company&#8217;s, or firm&#8217;s, email. I bet you have a very close degree of distance between you and somebody who may be in a different office altogether. While I believe that using social networking features in e-discovery is a step in the right direction, based on the connectedness of any organization, we may just be adding a neat &#8220;whiz-bang&#8221; graphical feature that does not really tell you something you don&#8217;t already know. I believe that there are applications of social networks in e-discovery, but telling me that Person X is connected to Person Y through Person Z doesn&#8217;t give me anything. I could have looked at the org chart and determined the same thing without spending a ton of time and money. I think this is going to be an important feature set in the future, but for now I think I&#8217;ll pass.</p>
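<p>For the curious, the &#8220;Person X is connected to Person Y through Person Z&#8221; number is just a shortest-path length in a graph, the same idea as an Erdős number. Below is a minimal sketch in Python using a breadth-first search. The <code>emails</code> graph and the function name <code>degrees_of_separation</code> are made up for illustration and are not taken from any vendor&#8217;s tool.</p>
<pre>
```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Return the fewest hops between two people, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == target:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


# Hypothetical email graph: an edge means two people exchanged a message.
emails = {
    "X": ["Z"],
    "Z": ["X", "Y"],
    "Y": ["Z", "W"],
    "W": ["Y"],
}

print(degrees_of_separation(emails, "X", "Y"))
```
</pre>
<p>In a real corporate mailbox almost every pair of people would come back with a small number, which is exactly why the raw distance on its own says so little.</p>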
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2008/04/do-you-have-an-erdos-number/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Data Portability Continued&#8230;</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2008/04/data-portability-continued/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2008/04/data-portability-continued/#comments</comments>
		<pubDate>Mon, 14 Apr 2008 18:30:05 +0000</pubDate>
		<dc:creator>Andrew Jenks</dc:creator>
		
		<category><![CDATA[standards]]></category>

		<category><![CDATA[Technology]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2008/04/data-portability-continued/</guid>
		<description><![CDATA[In Sedona last week we were discussing QC and sampling mostly. However, a short sidebar happened surrounding the ability to get data from one system to another.  As I mentioned in my previous post this is becoming more and more important to our clients, which means we need to be moving toward an accepted [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.thesedonaconference.org/" target="_blank">Sedona</a> last week we were mostly discussing QC and sampling. However, a short sidebar came up about the ability to get data from one system to another. As I mentioned in my previous post, this is becoming more and more important to our clients, which means we need to be moving toward an accepted standard, whatever that may be. We not only need to transfer data between vendors, but also across upgrades.</p>
<p>This week at Discovery Mining we&#8217;ve heard from more than one client about the problems with data portability within internal software applications, specifically when upgrading from Concordance 7 to 8 (a.k.a. 2007). I&#8217;m not a Concordance expert, nor do I pretend to be, but this example screams for a universal standard. The specific issue our clients were experiencing was the inability to view, in the old version 7, new databases that were created in the new version. Because databases converted to version 8 cannot be viewed in version 7.3 or earlier, the client is left with a big headache, to say the least. The upgrade process is time consuming and expensive. I&#8217;m sure there&#8217;s a workaround, but in most cases workarounds are not adequate.</p>
<p>Having a data portability plan and standard will help with a situation like the one described above. I&#8217;m looking forward to continuing the efforts on this front, and trying to stay technology agnostic, but as Google just <a href="http://code.google.com/appengine/" target="_blank">opened up their platform</a>, maybe litigation support can follow the leader. Google has opened up the entire platform and lets you mash up almost every service using a series of open web APIs. Yes, you need to know Python, but hey, it&#8217;s a start. In 30 minutes I was able to build a silly little test app <a href="http://ajtest123.appspot.com/" target="_blank">here</a>. My app doesn&#8217;t do much, but as you can see it&#8217;s integrated into the login/users data of Google. Amazon is the leader in this area with their <a href="http://aws.amazon.com" target="_blank">AWS</a>, but this is not about web services.</p>
<p>As the larger data intensive companies know, being open to standards ultimately helps everyone. I think we should all consider the efforts of the EDRM XML group and petition The Sedona Conference to weigh in and influence the bench, thereby helping and influencing the industry.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2008/04/data-portability-continued/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Are we promoting &#8220;walled gardens&#8221;?</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2008/03/are-we-promoting-walled-gardens/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2008/03/are-we-promoting-walled-gardens/#comments</comments>
		<pubDate>Mon, 31 Mar 2008 18:06:24 +0000</pubDate>
		<dc:creator>Andrew Jenks</dc:creator>
		
		<category><![CDATA[data]]></category>

		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2008/03/are-we-promoting-walled-gardens/</guid>
		<description><![CDATA[On a flight back from L.A. last week I was reading the Economist, which I try to do regularly. In the March 22nd issue there was mention of social networks.  The substance of two articles, &#8220;Everywhere and Nowhere&#8221; and &#8220;Break Down These Walls&#8220;, covered the rise of the &#8216;walled gardens&#8217; we once had with [...]]]></description>
			<content:encoded><![CDATA[<p>On a flight back from L.A. last week I was reading the <a href="http://www.economist.com">Economist</a>, which I try to do regularly. In the March 22nd issue there was mention of social networks. The substance of two articles, &#8220;<a href="http://www.economist.com/business/displaystory.cfm?story_id=10880936">Everywhere and Nowhere</a>&#8221; and &#8220;<a href="http://www.economist.com/opinion/displaystory.cfm?story_id=10880516">Break Down These Walls</a>&#8221;, covered the rise of the &#8216;walled gardens&#8217; we once had with AOL, Prodigy, and CompuServe. As the Internet matured and opened up, the question became why would anyone live in just AOL, a proprietary platform, when there was a whole WWW out there to explore? Over the years the world realized, mostly through the launch of Netscape, that open communication standards and data portability are long-term winners.</p>
<p>So when I read <a href="http://www.law.com/jsp/legaltechnology/pubArticleLTN.jsp?id=1206357952124&amp;rss=ltn">Craig Ball&#8217;s article</a> from Tuesday it got me thinking&#8211;are we propagating the mentality of the web circa 1994? I would say Craig&#8217;s skepticism is expected. When I first learned about the EDRM XML project, I too had reservations. However, we do need something to help the data portability problem. But is this the right way to handle the portability issue? I have no idea and I don&#8217;t think this is actually the central issue. What is at stake is the &#8216;openness or lack of openness&#8217; of our clients&#8217; data to move from one system to another, and this is not a trivial issue. It can involve tremendous cost&#8211;not only price, but also time.</p>
<p>When a market matures, standards tend to emerge. This pushes competitors toward increasing the value of their product offerings, which is a good thing. This is exactly where the eDiscovery market is&#8211;ready for standards. We need to introduce a standard to the market; the first standard out may not be perfect, but it will push vendors to compete on value-adding initiatives. We hear it all the time&#8230; &#8220;I&#8217;m having such a hard time with vendor x, but there&#8217;s nothing I can do. My data is being held hostage.&#8221; We do a big disservice to our clients if we don&#8217;t offer a standard in terms of data portability.</p>
<p>As the social networks are proving, there is value in data portability. eDiscovery vendors need to conform to some basic standards so that we can then compete on our merits, and keep turning up the heat on innovation. Is XML the way? I&#8217;m not certain, but I&#8217;m willing to give it a try. After we investigate one area maybe we&#8217;ll uncover an even better approach&#8211;why not open up the web platforms to an API which allows anyone to write some code or move the data? Most of the work on the XML schema is working with text files. Any self-respecting Perl programmer could quickly design and test to see how it works.</p>
<p>Implementing standards takes time and before one &#8216;sticks&#8217;, there will be a few that fall down. I say let&#8217;s try. What do we have to lose? What we have to gain is increased innovation, and that&#8217;s good for everyone.</p>
<p>I&#8217;m sure we&#8217;ll be discussing this at Sedona; check me out on <a href="http://twitter.com/ajenks" title="tweet" target="_blank">twitter</a> later this week from the meeting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2008/03/are-we-promoting-walled-gardens/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Pre-Discovery</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2008/03/pre-discovery/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2008/03/pre-discovery/#comments</comments>
		<pubDate>Tue, 25 Mar 2008 18:20:06 +0000</pubDate>
		<dc:creator>Matthew Work</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2008/03/pre-discovery/</guid>
		<description><![CDATA[Looking back at LegalTech New York, two things impressed me:
Every company seemed to be touting E-Discovery capabilities. The market noise, instead of reducing as the market matures, seems to be increasing. This is harmful to the customer who is trying to make [...]]]></description>
			<content:encoded><![CDATA[<p><br clear="all" /><br />
<a href="http://www.discoverymining.com/about/management.html"><img src="http://farm4.static.flickr.com/3255/2367642108_d148417740_t.jpg" alt="Matt Work, CEO" align="left" /></a><br />
<br clear="all" /><br />
Looking back at LegalTech New York, two things impressed me:</p>
<ol>
<li>Every company seemed to be touting E-Discovery capabilities. The market noise, instead of reducing as the market matures, seems to be increasing. This is harmful to the customer who is trying to make an informed decision for an important matter. We vendors need to facilitate apples-to-apples comparisons, not obfuscate them.
</li>
<li>The EDRM model (which is a little like saying ATM machine) graphic has been widely picked up by the vendor and analyst community and is being used to start sorting out the noise and claims of the vendors.</li>
</ol>
<p class="MsoNormal">I support the use of the EDRM graphic.</p>
<p class="MsoNormal"><a href="http://www.edrm.net"><img src="http://farm3.static.flickr.com/2154/2345542855_2616777ace.jpg" height="250" width="500" /></a></p>
<p> <a href="http://www.discoverymining.com/dmblog/index.php/2008/03/pre-discovery/#more-19" class="more-link">(more&#8230;)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2008/03/pre-discovery/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Filter overload&#8230; stop already</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2008/03/filter-overload-stop-already/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2008/03/filter-overload-stop-already/#comments</comments>
		<pubDate>Mon, 17 Mar 2008 21:31:11 +0000</pubDate>
		<dc:creator>Andrew Jenks</dc:creator>
		
		<category><![CDATA[Technology]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2008/03/filter-overload-stop-already/</guid>
		<description><![CDATA[I&#8217;ve got it. Do you?
Filter overload, in many instances, is the problem we&#8217;re trying to solve. I don&#8217;t think this is a new problem, but are the methods that vendors are using today the best methods?
I have a huge number of documents scattered across my network and in my email, and I can&#8217;t forget my [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve got it. Do you?</p>
<p>Filter overload, in many instances, is the problem we&#8217;re trying to solve. I don&#8217;t think this is a new problem, but are the methods that vendors are using today the best methods?</p>
<p>I have a huge number of documents scattered across my network and in my email, and I can&#8217;t forget my phone and laptop. Today&#8217;s E-Discovery companies are out there trying to get large pools of necessary documents whittled down to the smallest document set for review. If you&#8217;re using a technology that does &#8220;smart&#8221; collection prior to putting that data online, you&#8217;re getting rid of &#8220;stuff&#8221; but probably are still sitting with a very large document set that you have to review.</p>
<p>I think &#8220;smart&#8221; collection is really only putting a bandage on the wound and not truly fixing the problem. Filters or no filters, somebody still has to read an exponentially growing mound of electronic data. So say you&#8217;ve gone from 10TB to 500GB. 500GB is still a mountain of information for any team of reviewers. This is where technology can save the day.</p>
<p>I say process everything. Who cares if it&#8217;s junk. If you don&#8217;t want to see it, hide it. Put it out of the way and don&#8217;t worry about it. Why? Because the larger your data set the better the algorithms can be at identifying the junk and separating it from the &#8220;real data&#8221;. Think about it. Don&#8217;t filter, then process, then search. Instead, use the technology to do your filtering once you&#8217;ve got the entirety of the document set in one location. This will give you the ability to use technology to bring the most interesting, important documents to the front, and not miss anything. People aren&#8217;t reading any faster, but they should have a way to read the most important things first and build a review strategy from there.</p>
<p>Discovery Mining is working from the premise that getting the most relevant documents in front of the people who need to evaluate them first is the best way to conduct a review. Take Gmail as an example: it gets the most important information to surface first by using &#8220;search&#8221;, not &#8220;delete&#8221;, to get there.</p>
<p>Most people say they filter first because it&#8217;s too expensive not to. But if you don’t have all the data, you don’t have the entire context. What if we took cost out of the equation? We could process everything, host everything and then have “smart” technology identify the most relevant clusters or topics in a collection.</p>
<p>The next time you are thinking about your review strategy, think about Gmail&#8217;s motto: &#8220;don&#8217;t folder, search&#8221;. FINDING what you’re looking for does not translate to DISCARDING everything that you think may not be what you&#8217;re looking for. If you do that, who knows, you might miss the “smoking gun” email.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2008/03/filter-overload-stop-already/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Why we build it here</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2008/02/why-we-build-it-here/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2008/02/why-we-build-it-here/#comments</comments>
		<pubDate>Sat, 23 Feb 2008 01:04:57 +0000</pubDate>
		<dc:creator>Andrew Jenks</dc:creator>
		
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2008/02/why-we-build-it-here/</guid>
		<description><![CDATA[I often get &#8216;ownership&#8217; questions about our data processing software and specifically questions such as, &#8220;Who do you use for the back-end of your review tool?&#8221; To which I reply, &#8220;It&#8217;s all homegrown&#8221;. This is something that we take for granted at Discovery Mining because we are a technology company that believes in developing our [...]]]></description>
			<content:encoded><![CDATA[<p>I often get &#8216;ownership&#8217; questions about our data processing software and specifically questions such as, &#8220;Who do you use for the back-end of your review tool?&#8221; To which I reply, &#8220;It&#8217;s all homegrown&#8221;. This is something that we take for granted at Discovery Mining because we are a technology company that believes in developing our system fully in-house. Our thought is, if you want to add value, build it yourself and make a better mousetrap.</p>
<p>&#8216;Homegrown&#8217; software is actually an important distinction in the E-Discovery market. There are vendors buying licenses to pieces of code and &#8220;setting up shop&#8221;. But I ask, &#8220;Where&#8217;s the &#8216;value-add&#8217;, people?&#8221; Don&#8217;t get me wrong, there&#8217;s great software out there that can get you most of the way there, but it can only take you so far. The question becomes, &#8220;What happens when you need to scale or there&#8217;s a bug?&#8221; I for one wouldn&#8217;t want to be left waiting for feature updates that I really need, only to watch three other vendors release those features before my vendor can get to my request.</p>
<p>Because Discovery Mining develops everything in-house, we are nimble and can apply our technical expertise to customize more frequently, and in ways that licensed software cannot. Getting the product right for the client quickly and frequently is a notable distinction that only a few top-tier vendors in the market can claim. When looking at how vendors differ in this noisy E-Discovery space, be sure to investigate how much of the vendor&#8217;s application and process is based on software developed in-house.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2008/02/why-we-build-it-here/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Near Dupes goes Open Kimono</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2007/10/near-dupes-goes-open-kimono/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2007/10/near-dupes-goes-open-kimono/#comments</comments>
		<pubDate>Tue, 09 Oct 2007 00:32:25 +0000</pubDate>
		<dc:creator>Andrew Jenks</dc:creator>
		
		<category><![CDATA[De-dupe]]></category>

		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2007/10/near-dupes-goes-open-kimono/</guid>
		<description><![CDATA[

I think everyone can agree that when it comes to conceptual search and the &#8220;near-dupe&#8221; functionality, it&#8217;s been pretty much a black box. Everyone says &#8220;trust our algorithm&#8221;, everyone else does. At times this can be a bit like a leap of faith and like the Jedi mind trick &#8220;these are not the droids you&#8217;re [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.discoverymining.com/dmblog/wp-content/uploads/2007/10/view_diffs_ss.jpg" title="View Diff Screen Shot"><img src="http://www.discoverymining.com/dmblog/wp-content/uploads/2007/10/view_diffs_ss.thumbnail.jpg" alt="View Diff Screen Shot" /></a><br />
<br clear="all" /><br />
I think everyone can agree that when it comes to conceptual search and the &#8220;near-dupe&#8221; functionality, it&#8217;s been pretty much a black box. Everyone says &#8220;trust our algorithm&#8221;, everyone else does. At times this can be a bit like a leap of faith and like the Jedi mind trick &#8220;these are not the droids you&#8217;re looking for&#8221;.</p>
<p>In our weekly release, we&#8217;re adding something that we like to call View Diff. What is View Diff, you may ask. Well, it&#8217;s basically a comparison of two documents that you&#8217;re looking at side by side. The exciting feature here is that not only are the documents side by side, but each is color-highlighted to show you where the differences are. Not only will this let you see at a glance whether a document is worth exploring, but it will also allow you to see the near-dupe technology in action.</p>
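<p>To give a feel for what a comparison like this involves, here is a rough sketch in Python. It is not how View Diff is actually built; it leans on Python&#8217;s standard <code>difflib</code> module as a stand-in, and the function name <code>view_diff</code> and the two sample sentences are invented for illustration. It reports how similar two texts are and which words differ, which is the information a highlighter would need.</p>
<pre>
```python
import difflib


def view_diff(left, right):
    """Return (similarity ratio, list of word-level changes) for two texts."""
    a, b = left.split(), right.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Skip the runs that are identical; keep replace/insert/delete.
        if tag != "equal":
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return matcher.ratio(), changes


# Hypothetical near-duplicate documents.
doc_a = "Please review the attached contract before Friday"
doc_b = "Please review the attached agreement before Monday"

ratio, changes = view_diff(doc_a, doc_b)
print(round(ratio, 2))
for change in changes:
    print(change)
```
</pre>
<p>For the two sample sentences this reports two word replacements, which a side-by-side view could color on each side. Comparing word by word rather than character by character is a design choice made here for readability.</p>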
<p>Giving transparency like this will not only allow you to see the algorithm at work, it will give you the power to decide whether something really is a near-dupe. As features that uncover &#8216;black box&#8217; functionality roll out, clients will in turn have more trust in the technology and greater acceptance of Automated Discovery.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2007/10/near-dupes-goes-open-kimono/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Review Velocity</title>
		<link>http://www.discoverymining.com/dmblog/index.php/2007/10/review-velocity/</link>
		<comments>http://www.discoverymining.com/dmblog/index.php/2007/10/review-velocity/#comments</comments>
		<pubDate>Thu, 04 Oct 2007 22:25:54 +0000</pubDate>
		<dc:creator>Andrew Jenks</dc:creator>
		
		<category><![CDATA[Law Firm]]></category>

		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.discoverymining.com/dmblog/index.php/2007/10/review-velocity/</guid>
		<description><![CDATA[


This week we had quite a fire drill over at DM&#8212;important client, tight turnaround, and a small to medium data volume.   Jobs like this happen all day every day in our market, because the technology allows it to happen.  When all was said and done DM processed  about 30 GB, before [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://farm3.static.flickr.com/2329/2294581732_b86d43a64c.jpg"><img src="http://farm3.static.flickr.com/2188/2294542510_650d3e831c_t.jpg" height="80" width="100" /></a><br />
<br clear="all" /><br />
This week we had quite a fire drill over at DM&#8212;important client, tight turnaround, and a small to medium data volume. Jobs like this happen all day every day in our market, because the technology allows it to happen. When all was said and done, DM processed about 30 GB before filtering and posted roughly 150,000 documents to the site. The page volume on this collection was relatively average: just over 10 pages per doc, or 2 million pages to review. The fire drill part was the condensed time frame: data arriving Friday, Saturday, and Sunday, with a production deadline of MONDAY!</p>
<p>Just a few years ago this type of insanity wasn&#8217;t even possible. Today a vendor has the ability to make it happen. How did we get here in such a short time? Mostly it&#8217;s the technology and tools that are available in the market, which allow crazy deadlines to be met. I&#8217;ve also noticed a growing comfort level among our clients in trusting the technology. People are not able to read faster today; instead they are able to pinpoint with precision what needs to be read. So a 2 million page collection can be tackled by a few attorneys in just a weekend.</p>
<p>I would say that things like concept search and its derivative, near-dupes, played a huge role in getting through this collection. Giving end users tools that let them make 50 document decisions with one click builds trust and transparency around some of these tools. Letting someone click on a &#8220;diff&#8221; to see where the near-dupes diverge gives that user confidence in the algorithm making the decision for them. This is where the black box solution loses the client.</p>
<p>This is just one example of how far the industry has come in such a short time. The amount of AI and research going into our market is incredible. I&#8217;m hoping we can continue to defy Moore&#8217;s Law on our ability to innovate.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.discoverymining.com/dmblog/index.php/2007/10/review-velocity/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
