Yahoo! follows Google’s path to challenge it

Yahoo! is following Google’s path in order to challenge it. Yahoo! recently came out as a big sponsor of Hadoop, an open source data mining project.



Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.


Here is a deep analysis of Yahoo!’s attempt to replicate Google by going the open source route. It also notes that last year the founder of the Hadoop project, Doug Cutting, became a Yahoo! employee.


The basic technique Hadoop uses is part of what has allowed Google to manage the massive data processing challenges associated with indexing the Web, and do it economically. Google has not released source code for its Google File System or the associated distributed computing environment, known as MapReduce. But what Google has done is publish academic papers on the computer science behind both, presumably knowing full well that competitors and open source programmers would be likely to create their own implementations.


Hadoop includes a version of the distributed file system originally created for Nutch along with a version of MapReduce, both written in Java. As in Google’s MapReduce, the Hadoop version automates the division of computer-intensive tasks into smaller sub-tasks that are assigned to individual computers in a cluster. Each computation is broken into two stages: the “Map,” which produces an intermediate set of results, and the “Reduce” function, usually devoted to sorting and aggregating data to produce a final result. In the context of compiling a search index, the Map phase would involve thousands of computers each assigned the task of indexing a subset of the Web crawl data, and the Reduce phase would be sorting and merging those results into the final index.
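To make the two stages concrete, here is a minimal single-machine sketch in Python of the search index example described above. It is not Hadoop's API and not Google's code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the tiny `crawl` data are made up for illustration, and a real cluster would run the Map calls on many machines in parallel.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: index one document, emitting intermediate (word, doc_id) pairs."""
    return [(word, doc_id) for word in set(text.lower().split())]

def shuffle(pairs):
    """Group intermediate pairs by word (the framework does this between stages)."""
    grouped = defaultdict(list)
    for word, doc_id in pairs:
        grouped[word].append(doc_id)
    return grouped

def reduce_phase(word, doc_ids):
    """Reduce: sort and merge the document ids collected for one word."""
    return word, sorted(set(doc_ids))

def build_index(crawl):
    """Run Map over every document, then Reduce over every word."""
    intermediate = []
    for doc_id, text in crawl.items():  # each call could run on a separate machine
        intermediate.extend(map_phase(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, ids) for word, ids in grouped.items())

# Hypothetical crawl data: two tiny "pages".
crawl = {
    "page1": "hadoop runs mapreduce",
    "page2": "google published mapreduce",
}
index = build_index(crawl)
```

Here `index["mapreduce"]` lists both pages while `index["hadoop"]` lists only `page1`. The point of the split is that each Map call needs only its own document, so the work can be spread across a cluster, and the Reduce step only has to merge results that share a key.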


[Figure: Hadoop architecture]


You may also like to read How Google Works at the same place.
