In some cases we may wish to rechunk our data prior to execution. This can help balance high scheduling overhead (too many tasks) against poor load balancing (too few tasks).
It appears that different algorithms have different optimal chunk sizes. For example, algorithms with low task counts, like ADMM, benefit from smaller chunk sizes, while algorithms with many small tasks, like gradient/proximal descent, benefit from larger chunk sizes.
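As a rough illustration with dask.array (the shapes and chunk sizes below are placeholders, not tuned recommendations):

```python
import dask.array as da

# Hypothetical design matrix: 10M rows, 100 columns, chunked by rows
X = da.random.random((10_000_000, 100), chunks=(1_000_000, 100))

# For an algorithm with few tasks (e.g. ADMM) we might rechunk into
# smaller blocks to improve load balancing across workers.
X_small = X.rechunk((100_000, 100))

# For an algorithm with many small tasks (e.g. gradient/proximal descent)
# we might rechunk into larger blocks to cut scheduler overhead.
X_large = X.rechunk((2_500_000, 100))
```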
Section 4.2, Cache-aware Access, of Tianqi Chen and Carlos Guestrin's XGBoost paper discusses the tradeoff between smaller and larger block sizes. Smaller blocks give each thread too little work and lead to inefficient parallelization, while larger blocks result in processor cache misses. They do this analysis for two data sets, Allstate-10M and Higgs-10M. Figure 9 plots time per tree versus number of threads for block sizes of 2^12, 2^16, 2^20, and 2^24. There is a significant difference between the performance of the 2^24 block size and the other block sizes. https://arxiv.org/pdf/1603.02754.pdf
Maybe we could duplicate this experiment for ADMM on those two data sets. They used S3 instead of HDFS.
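If we did, a minimal sketch of the timing loop might look like the following; the `fit` callable stands in for whichever solver we benchmark (e.g. a hypothetical `fit_admm`), and the block sizes just mirror the paper's 2^12 to 2^24 sweep:

```python
import time

def benchmark_block_sizes(X, fit, block_sizes=(2**12, 2**16, 2**20, 2**24)):
    """Time `fit` on a dask array rechunked to several row-block sizes."""
    timings = {}
    for rows_per_block in block_sizes:
        Xb = X.rechunk((rows_per_block, X.shape[1]))
        start = time.perf_counter()
        fit(Xb)  # assumed to block until the computation finishes
        timings[rows_per_block] = time.perf_counter() - start
    return timings
```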
Also, it is worth looking at Section 4.3, Blocks for Out-of-core Computation, which discusses block compression and sharding. You might also look at the Blosc/c-blosc package that HDF5 uses. https://github.com/Blosc/c-blosc
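For reference, a minimal sketch of round-tripping a block through the python-blosc bindings (the array is just a stand-in for a data block, and the LZ4 codec choice is an arbitrary example):

```python
import blosc
import numpy as np

block = np.arange(2**20, dtype="float64")  # stand-in for one data block
packed = blosc.compress(block.tobytes(), typesize=8, cname="lz4")
restored = np.frombuffer(blosc.decompress(packed), dtype=block.dtype)

assert np.array_equal(block, restored)
print(f"{block.nbytes} bytes -> {len(packed)} bytes compressed")
```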