Sunday 28 December 2014

Limitations of MapReduce

1. MapReduce is great at one-pass computation, but inefficient for multi-pass algorithms.

2. No efficient primitives for data sharing.

  • State between steps goes to a distributed file system.
  • Slow due to replication and disk I/O (see the sketch below).
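
To make the cost concrete, here is a minimal sketch in Scala against Hadoop's standard MapReduce Job API. The HDFS paths, the iteration count, and the elided mapper/reducer wiring are all hypothetical; the point is only the shape of the loop: every iteration writes its full state to HDFS, where it is replicated, and the next iteration must read it back from disk.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

object IterativeDriver {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    var input = new Path("hdfs:///ranks/iter0")    // hypothetical path
    for (i <- 1 to 10) {
      val output = new Path(s"hdfs:///ranks/iter$i")
      val job = Job.getInstance(conf, s"pagerank-iter-$i")
      // ... setMapperClass / setReducerClass / output types go here ...
      FileInputFormat.addInputPath(job, input)
      FileOutputFormat.setOutputPath(job, output)
      job.waitForCompletion(true) // state lands on HDFS, replicated 3x by default
      input = output              // next pass must re-read everything from disk
    }
  }
}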

3. Most algorithms are much harder to implement directly in the restrictive MapReduce model.

4. Performance bottlenecks, or batch processing that does not fit the use case.

For example, in PageRank:

  • Each iteration multiplies a sparse adjacency matrix by the rank vector, which in MapReduce means repeatedly hashing together the page adjacency lists and the rank vector.
  • Using cache(), the neighbor lists can be kept in RAM across iterations.
  • Using partitioning, the repeated hashing can be avoided (see the sketch below).
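
The same computation in Spark illustrates both remedies. This is a minimal sketch of the well-known Spark PageRank example; the input path, iteration count, and partition count are assumptions. partitionBy() plus cache() pins the hashed adjacency lists in RAM once, and every subsequent join reuses that partitioner, so only the small rank contributions are shuffled each iteration.

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object SparkPageRank {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PageRank"))

    // Hash-partition the adjacency lists once and keep them in memory.
    val links = sc.textFile("hdfs:///links")          // "page neighbor" pairs
      .map { line => val Array(p, n) = line.split("\\s+"); (p, n) }
      .groupByKey()
      .partitionBy(new HashPartitioner(8))
      .cache()

    var ranks = links.mapValues(_ => 1.0)

    for (_ <- 1 to 10) {
      // The join inherits links' partitioner, so the cached neighbor
      // lists are never re-hashed; only `contribs` moves over the network.
      val contribs = links.join(ranks).flatMap {
        case (_, (neighbors, rank)) =>
          neighbors.map(dest => (dest, rank / neighbors.size))
      }
      ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
    }

    ranks.saveAsTextFile("hdfs:///ranks")
    sc.stop()
  }
}

Without cache() and partitionBy(), each iteration would rebuild and re-shuffle the full adjacency-list RDD, which is exactly the repeated hashing the MapReduce version pays for.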


Therefore, MapReduce requires asymptotically more communication and I/O than an engine that can cache and co-partition data in memory across steps.
