MapReduce is the key programming model for data processing in the Hadoop ecosystem. This repository collects problems that can be solved with MapReduce.
-
Summarization Patterns
- Word Count
- Inverted Index (demonstrates Tool and ToolRunner)
- Matrix-vector Multiplication (demonstrates MultipleInputs)
- Matrix-matrix Multiplication
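
For readers new to the model, Word Count from the list above can be sketched in plain Java with no Hadoop dependency. This is only an illustration of the map, shuffle, and reduce phases, not the code in this repository: the class name `WordCountSketch` and its methods are hypothetical, and the in-memory grouping stands in for what Hadoop's shuffle does across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory sketch of the MapReduce word count pattern.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all counts that were grouped under one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map on every line, group values by key (the "shuffle"),
    // then run reduce once per key. TreeMap keeps the keys sorted.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog", "The fox");
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job the `map` and `reduce` methods would live in Mapper and Reducer subclasses, and the framework would perform the grouping step between them.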
-
Filtering Patterns
- Anagram
- Top K
- Sentiment Analysis
-
Organization Patterns
- Partial Sort
- Secondary Sort
-
Join Patterns
-
Metapatterns
- N-Gram Autocomplete
- PageRank
- Recommender System