drangons/Spark_MR_design_patterns

Spark_MR_design_patterns

Implementations of MapReduce design patterns in Spark (PySpark)

Summarization pattern

  • Min, max and count
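
As a rough illustration of the min, max and count pattern, the sketch below is plain Python rather than the repository's code. The records and the names `to_summary`, `merge_summaries` and `summarize` are hypothetical. The map step emits one partial summary per record and the reduce step merges two partial summaries for the same key.

```python
# Hypothetical records: (user_id, value), e.g. a user and a comment timestamp.
records = [("u1", 5), ("u2", 3), ("u1", 9), ("u2", 1), ("u1", 7)]


def to_summary(record):
    """Map step: turn one record into (key, (min, max, count))."""
    user_id, value = record
    return user_id, (value, value, 1)


def merge_summaries(a, b):
    """Reduce step: merge two partial (min, max, count) tuples."""
    return min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2]


def summarize(records):
    """Local stand-in for grouping by key and reducing each group."""
    summaries = {}
    for key, summary in map(to_summary, records):
        if key in summaries:
            summaries[key] = merge_summaries(summaries[key], summary)
        else:
            summaries[key] = summary
    return summaries


result = summarize(records)
```

Because `merge_summaries` is associative and commutative, the same two functions could be passed to PySpark as `rdd.map(to_summary).reduceByKey(merge_summaries)`, assuming `rdd` holds records of the same shape.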

Filter pattern

  • Bloom filter
  • Top 10
  • Distinct
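
The three filter patterns above can be sketched locally in plain Python. This is a minimal illustration, not the repository's code: the `BloomFilter` class, its default sizes, and the sample data are all assumptions. A Bloom filter can report false positives but never false negatives, so it suits a cheap pre-filter.

```python
import hashlib
import heapq


class BloomFilter:
    """Tiny Bloom filter; sizes are illustrative, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def top_n(records, n=10, key=lambda r: r[1]):
    """Top-N filter: keep the n records with the largest key."""
    return heapq.nlargest(n, records, key=key)


def distinct(values):
    """Distinct filter; sorted only to make the output deterministic."""
    return sorted(set(values))


# Hypothetical data: (user, reputation) pairs and a set of users of interest.
users = [("alice", 120), ("bob", 45), ("carol", 300), ("dave", 45)]
bloom = BloomFilter()
for name in ("alice", "carol"):
    bloom.add(name)

kept = [u for u in users if bloom.might_contain(u[0])]
top_two = top_n(users, n=2)
unique_scores = distinct(score for _, score in users)
```

In PySpark the corresponding calls would be `rdd.filter(...)` with the Bloom filter shared through `SparkContext.broadcast`, `rdd.top(10, key=...)`, and `rdd.distinct()`.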

Data organization pattern

  • Structured to hierarchical
  • Partitioning
  • Binning
  • Shuffling
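
Two of the data organization patterns are sketched below in plain Python, under stated assumptions: the post, comment and tag data are invented, and `to_hierarchy` and `bin_by_tag` are hypothetical helper names, not functions from this repository. Structured to hierarchical nests child records under their parent. Binning places each record into every bin it matches, so one record may appear in several bins.

```python
from collections import defaultdict

# Hypothetical flat inputs: (post_id, title) and (post_id, comment_text).
posts = [(1, "What is a monad?"), (2, "P vs NP intuition")]
comments = [(1, "Try the Haskell wiki."), (2, "See Sipser."), (1, "Related: functors.")]


def to_hierarchy(posts, comments):
    """Structured to hierarchical: nest each post's comments under the post."""
    children = defaultdict(list)
    for post_id, text in comments:
        children[post_id].append(text)
    return [
        {"id": post_id, "title": title, "comments": children.get(post_id, [])}
        for post_id, title in posts
    ]


def bin_by_tag(posts_with_tags, wanted_tags):
    """Binning: a post goes into the bin of every wanted tag it carries."""
    bins = {tag: [] for tag in wanted_tags}
    for post_id, tags in posts_with_tags:
        for tag in tags:
            if tag in bins:
                bins[tag].append(post_id)
    return bins


nested = to_hierarchy(posts, comments)
bins = bin_by_tag(
    [(1, ["haskell", "types"]), (2, ["complexity"]), (3, ["types"])],
    wanted_tags=["types", "complexity"],
)
```

In PySpark, the nesting step could be expressed with `cogroup` on the two keyed RDDs, and partitioning by key with `partitionBy`.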

Join pattern

  • Map-side join
  • Reduce-side join
  • Replicated join
  • Composite join
  • Cartesian join
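
To show how the join patterns differ, here is a small plain-Python sketch. It is not the repository's code: the sample data and the function names are hypothetical. A replicated join keeps the small dataset in memory as a lookup table and streams the large one past it. A reduce-side join groups both datasets by key first and pairs the values inside each group. A Cartesian join pairs every record with every other.

```python
from collections import defaultdict
from itertools import product

# Hypothetical inputs keyed by user id.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
comments = [(1, "nice"), (1, "thanks"), (3, "+1"), (4, "orphan")]


def replicated_join(small, large):
    """Inner join where the small side fits in memory on every worker."""
    lookup = dict(small)
    return [(k, (lookup[k], v)) for k, v in large if k in lookup]


def reduce_side_join(left, right):
    """Inner join that groups both sides by key, then pairs values per key."""
    grouped = defaultdict(lambda: ([], []))
    for k, v in left:
        grouped[k][0].append(v)
    for k, v in right:
        grouped[k][1].append(v)
    return [
        (k, (lv, rv))
        for k in sorted(grouped)
        for lv in grouped[k][0]
        for rv in grouped[k][1]
    ]


def cartesian_join(left, right):
    """Pair every left record with every right record."""
    return list(product(left, right))


replicated = replicated_join(users, comments)
reduced = reduce_side_join(users, comments)
pairs = cartesian_join(users, comments)
```

In PySpark, `rdd.join(other)` performs a shuffle-based join comparable to the reduce-side version, a broadcast variable can play the role of `lookup`, and `rdd.cartesian(other)` produces all pairs.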

Dataset: CS Stack Exchange dataset

Reference: MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems, by Donald Miner and Adam Shook
