Batching into shared memory is deprecated, but essential for performance #492

Open
kvablack opened this issue Jun 28, 2024 · 3 comments

@kvablack

I was doing some profiling of my data pipeline and found that the Batch transformation was a severe bottleneck. Here are the critical lines in operations.py:

def stacking_function(*args):
  first_arg = np.asanyarray(args[0])
  shape, dtype = (len(args),) + first_arg.shape, first_arg.dtype
  if not self._use_shared_memory or dtype.hasobject:
    return np.stack(args)
  return np.stack(args, out=SharedMemoryArray(shape, dtype=dtype)).metadata

I found that self._use_shared_memory == True iff you used the deprecated grain.BatchOperation, rather than the "recommended" grain.Batch. And what do you know, switching to grain.BatchOperation gave me a 3x increase in throughput! This matches up with my intuition, because in the self._use_shared_memory == True branch, there is only one copy that goes directly into shared memory. But in the self._use_shared_memory == False branch, the np.stack will induce one copy into private memory, and then the later CopyNumPyArrayToSharedMemory transform performs an explicit second copy into shared memory. It's not too surprising that adding another copy of all of the pipeline's data could slow things down significantly.
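To make the extra copy concrete, here is a minimal standalone sketch of the two paths. It uses Python's multiprocessing.shared_memory as a stand-in for grain's internal SharedMemoryArray (an assumption about what that class wraps), and toy data in place of a real pipeline:

import numpy as np
from multiprocessing import shared_memory

# A toy batch of 32 elements, each a 128x128 float32 array.
elements = [np.random.rand(128, 128).astype(np.float32) for _ in range(32)]
shape = (len(elements),) + elements[0].shape
dtype = elements[0].dtype
nbytes = int(np.prod(shape)) * dtype.itemsize

# Two-copy path (grain.Batch followed by CopyNumPyArrayToSharedMemory):
# stack into ordinary private memory, then copy the result into shared memory.
private_batch = np.stack(elements)  # copy 1
shm_a = shared_memory.SharedMemory(create=True, size=nbytes)
shared_view = np.ndarray(shape, dtype=dtype, buffer=shm_a.buf)
np.copyto(shared_view, private_batch)  # copy 2

# One-copy path (the deprecated grain.BatchOperation with shared memory):
# stack directly into a shared-memory-backed output array.
shm_b = shared_memory.SharedMemory(create=True, size=nbytes)
shared_out = np.ndarray(shape, dtype=dtype, buffer=shm_b.buf)
np.stack(elements, out=shared_out)  # copy 1, and done

for shm in (shm_a, shm_b):
  shm.close()
  shm.unlink()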

Here comes the real problem -- I want to use grain through airio, which doesn't go through the standard DataLoader but through the much more complex lazy_dataset API. In lazy_dataset, batching is done through a different code path that has no option to enable this optimization: it always batches into private memory, and then the MultiprocessPrefetchLazyIterDataset does a second copy into shared memory.

I manually added a (slightly hacky) solution that enables batching directly into shared memory iff the batch operation is a parent of a MultiprocessPrefetchLazyIterDataset. Indeed, I saw a significant performance increase when using grain through airio. Is this something that could possibly be upstreamed into grain?
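For context, the shape of that hack is roughly the sketch below. Every name in it (the _Node class, the parents list, the use_shared_memory flag) is a hypothetical placeholder for whatever grain's lazy_dataset classes actually expose, not the real API; it only illustrates "walk upward from the prefetch node and flip the flag on any batch node it finds":

# Hypothetical sketch only: class and attribute names are placeholders
# for grain's real lazy_dataset API, which the issue proposes changing.
class _Node:
  def __init__(self, parents=(), is_batch=False):
    self.parents = list(parents)
    self.is_batch = is_batch
    self.use_shared_memory = False  # placeholder for the proposed flag

def enable_shared_memory_batching(prefetch_node):
  """Walks the dataset tree under the multiprocess-prefetch node and flips
  the (hypothetical) flag on every batch node, so batches are stacked
  straight into shared memory and the later per-worker copy is skipped."""
  stack = list(prefetch_node.parents)
  while stack:
    node = stack.pop()
    if node.is_batch:
      node.use_shared_memory = True
    stack.extend(node.parents)

# Toy usage: batch -> prefetch.
batch = _Node(is_batch=True)
prefetch = _Node(parents=[batch])
enable_shared_memory_batching(prefetch)
assert batch.use_shared_memory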

@quanvuong

+1

@Mddct

Mddct commented Jul 19, 2024

+1

@mhyatt000

+1
