MapReduce: A major step backwards

submitted by
Style Pass
2022-06-24 01:00:07


As an MR advocate, I agree with several of the above points. Certainly, MR development shouldn't ignore previous research, but neither should it be constrained by it. MR targets a different problem than the modern DBMS does.

For example, using MR to rapidly identify small subsets of data is a bad idea. However, MR is a good tool for manipulating large data sets, something at which DBs are notoriously bad. A grid DB's indexing offers no advantage when computing the PageRank of the internet, for example, because every page has to be read anyway. Indices are pure overhead in that situation.
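To make the full-scan point concrete, here is a minimal in-memory sketch of one PageRank iteration written as a map phase and a reduce phase. It is only an illustration: the toy graph, the function names, and the damping factor of 0.85 are my assumptions, and a real MR job would shuffle the emitted pairs across machines rather than loop in one process. It also assumes every page has at least one outbound link.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional value; an assumption, not taken from the discussion


def map_phase(graph, ranks):
    """Emit (page, contribution) pairs. Every page is read exactly once."""
    for page, links in graph.items():
        yield page, 0.0  # keeps pages with no inbound links in the output
        for target in links:
            yield target, ranks[page] / len(links)


def reduce_phase(pairs, n_pages):
    """Sum the contributions per page and apply the damping formula."""
    totals = defaultdict(float)
    for page, contribution in pairs:
        totals[page] += contribution
    return {page: (1 - DAMPING) / n_pages + DAMPING * total
            for page, total in totals.items()}


def pagerank_step(graph, ranks):
    return reduce_phase(map_phase(graph, ranks), len(graph))


# Hypothetical three-page web: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1 / len(graph) for page in graph}
for _ in range(20):
    ranks = pagerank_step(graph, ranks)
```

Note that nothing in the map phase looks a page up by key: each iteration streams over the whole graph, which is why an index would buy nothing here.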

I am concerned the authors are suggesting that introducing MR into academia is a bad idea, since that is where most of the previous literature is well understood. Some of the best improvements to MR lately have been based on distributing reductions à la Monet's continuous near-neighbor load distribution. To say MR lacks high-level languages, tools, and optimizations is short-sighted. Pig, Sawzall, and other functional languages are in development. Additional tools, research, and optimization will follow. Presenting MR as a research topic will enable that growth.

For engineers, the underlying issue is picking the right tool for the job. RDB versus flat files versus MQL versus MR smacks of the same "religious" debates as Java versus C++ versus Ruby versus assembly, and is generally a waste of effort. A good engineer understands the specific problem space, examines the potential solutions, and chooses accordingly.
