TLDR: I'm working with robotics datasets. They are expressed as a nested dataset. Grain calls repr on the inner dataset, which slows down the loops significantly.
I think grain has serious benefits over tfds for nested datasets like this. Currently, rlds is the best way of handling these types of datasets. It somewhat abuses tfds dataset manipulation techniques since it doesn't have a better way of handling this. I think grain would be able to simplify a lot of these workflows and allow for much more complex things to be created, e.g. stitching multiple robot episodes together in unique ways that aren't easily expressible in tfds or rlds.
Fixing the above problem would be hugely beneficial, and the repr call doesn't feel like it's super necessary.
Simple example to showcase the problem:
import grain.python as grain
import pyinstrument
import rlds
import tensorflow_datasets as tfds

# dataset_path points to a local dataset directory (not defined in this snippet).
builder = tfds.builder_from_directory(builder_dir=dataset_path)
episode_data_source = builder.as_data_source(
    "train",
    deserialize_method=tfds.core.decode.DeserializeMethod.DESERIALIZE_AND_DECODE,
)

# Sampler for the outer loop over episodes.
episode_index_sampler = grain.IndexSampler(
    num_records=2,
    num_epochs=1,
    shard_options=grain.ShardOptions(shard_index=0, shard_count=1, drop_remainder=True),
    shuffle=True,
    seed=0,
)

# Sampler for the inner loop over the steps of one episode.
steps_index_sampler = grain.IndexSampler(
    num_records=2,
    num_epochs=1,
    shard_options=grain.ShardOptions(shard_index=0, shard_count=1, drop_remainder=True),
    shuffle=True,
    seed=0,
)

profiler = pyinstrument.Profiler()
profiler.start()

episode_data_loader = grain.DataLoader(data_source=episode_data_source, sampler=episode_index_sampler)
for episode_data in episode_data_loader:
    # Each episode holds a nested data source of steps.
    steps_data_source = episode_data[rlds.STEPS]
    steps_data_loader = grain.DataLoader(data_source=steps_data_source, sampler=steps_index_sampler)
    for steps_data in steps_data_loader:
        pass

profiler.stop()

# Save the flamegraph to an HTML file
with open('flamegraph.html', 'w') as f:
    f.write(profiler.output_html())
The slowdowns happen specifically when the data loader state is created and when it is validated.
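A possible workaround, sketched under assumptions: if the cost really comes from repr being called on the inner steps data source, wrapping that source in a thin object with a cheap `__repr__` should sidestep it. `CheapReprSource` and `SlowReprSource` below are hypothetical names made up for illustration, not part of grain, tfds, or rlds, and this has not been tested against Grain itself. The sketch only assumes that a data source needs `__len__` and `__getitem__`.

```python
class CheapReprSource:
    """Hypothetical wrapper: forwards random access, but keeps repr cheap."""

    def __init__(self, source, name="steps"):
        self._source = source
        self._name = name

    def __len__(self):
        return len(self._source)

    def __getitem__(self, index):
        return self._source[index]

    def __repr__(self):
        # Never calls repr on the wrapped source, so no records get formatted.
        return f"CheapReprSource(name={self._name!r}, len={len(self._source)})"


class SlowReprSource:
    """Stand-in for a nested steps source whose repr formats every record."""

    repr_calls = 0  # counts how often the expensive repr is triggered

    def __init__(self, records):
        self._records = records

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        return self._records[index]

    def __repr__(self):
        SlowReprSource.repr_calls += 1
        return f"SlowReprSource({self._records!r})"


steps = SlowReprSource([{"action": 0}, {"action": 1}, {"action": 2}])
wrapped = CheapReprSource(steps)
print(repr(wrapped))              # short description of the wrapper only
print(SlowReprSource.repr_calls)  # the inner repr was never invoked
```

In the example above this would mean passing `CheapReprSource(episode_data[rlds.STEPS])` as `data_source` to the inner `grain.DataLoader`. One trade-off to keep in mind: a less descriptive repr also makes any check that relies on it less able to tell two different data sources apart.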