2. No efficient primitives for data sharing.
- Intermediate state between steps must be written to a distributed file system.
- This is slow due to replication and disk storage.
3. Most algorithms are much harder to implement directly in the restrictive MapReduce model.
4. Performance bottlenecks arise, or the batch model does not fit the use case.
For example, in PageRank:
- Each iteration repeatedly multiplies a sparse matrix by a vector, which in MapReduce requires repeatedly hashing (shuffling) the page adjacency lists together with the rank vector.
- In Spark, cache() keeps the neighbor lists in RAM across iterations.
- Co-partitioning the links and ranks avoids the repeated hashing.
Therefore, MapReduce requires asymptotically more communication and I/O for this workload.
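To make the iteration concrete, here is a minimal in-memory sketch of the PageRank loop on a small hypothetical four-page graph (the graph, variable names, and iteration count are illustrative, not from the source). Each pass is the sparse matrix-vector product described above: MapReduce would re-shuffle the adjacency lists against the rank vector on every iteration, whereas Spark can cache the unchanging links in RAM and co-partition them with the ranks.

```python
# Hypothetical 4-page link graph: page -> list of outgoing neighbors.
# In Spark this would be an RDD kept in memory via cache(); here it is
# just a dict to show the per-iteration computation.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

# Start with a uniform rank vector.
ranks = {page: 1.0 / len(links) for page in links}

for _ in range(20):
    # Each page sends an equal share of its rank to its neighbors.
    contribs = {page: 0.0 for page in links}
    for page, neighbors in links.items():
        share = ranks[page] / len(neighbors)
        for n in neighbors:
            contribs[n] += share
    # Recombine with the standard 0.85 damping factor.
    ranks = {page: 0.15 / len(links) + 0.85 * c
             for page, c in contribs.items()}
```

Note that `links` never changes across iterations; only `ranks` does. That is exactly the asymmetry Spark exploits: the large, static dataset stays cached and partitioned once, so only the small rank updates move over the network.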