Alvin's Big Data Notebook : Limitations of MapReduce

Sunday, 28 December 2014

Limitations of MapReduce

1. MapReduce is great at one-pass computation, but inefficient for multi-pass algorithms.

2. No efficient primitives for data sharing.

State between steps goes to distributed file systems.
Slow due to replication & disk storage.

3. Most algorithms are much harder toimplement directly in restrictive MapReduce model

4. Performance bottlenecks, or batch not fitting the use cases

For example in PageRank:

To repeatedly multiply sparse matrix and vector, it requires repeatedly hashing together page adjacency lists and rank vector.
Using cache(), keep neighbor lists in RAM
Using partitioning, avoid repeated hashing

Therefore, MapReduce requires asymptotically more communication and I/O.

No comments:

Post a Comment

Subscribe to: Post Comments (Atom)