Multithreading issues preventing views to go in parallel #586

Open
FrancescAlted opened this issue Jun 10, 2022 · 3 comments
Labels
enhancement New feature or request high-priority

@FrancescAlted
Collaborator

FrancescAlted commented Jun 10, 2022

A recent optimization that let type views run in parallel (6d2964a) had to be disabled (81a8400) because, even though the tests pass, helgrind reports some pretty scary possible data races, such as:

==213406== Possible data race during read of size 1 at 0x914621F by thread #265
==213406== Locks held: none
==213406==    at 0x74079D: blosclz_decompress (contribs/caterva/contribs/c-blosc2/blosc/blosclz.c:706)
==213406==    by 0x73BC7C: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1717)
==213406==    by 0x73AC2B: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2897)
==213406==    by 0x73CBE8: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2975)
==213406==    by 0x73CB07: blosc2_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2934)
==213406==    by 0x827AF5: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1880)
==213406==    by 0x8270DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==213406==    by 0x72E30A: type_view_postfilter (src/iarray_views.c:195)
==213406==    by 0x73C093: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==213406==    by 0x73AC2B: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2897)
==213406==    by 0x73CBE8: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2975)
==213406==    by 0x715D50: prefilter_func (src/iarray_expression.c:436)
==213406==  Address 0x914621f is 175 bytes inside a block of size 181 alloc'd
==213406==    at 0x667F893: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==213406==    by 0x825466: get_coffsets (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1106)
==213406==    by 0x827ACF: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1873)
==213406==    by 0x8270DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==213406==    by 0x72E30A: type_view_postfilter (src/iarray_views.c:195)
==213406==    by 0x73C093: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==213406==    by 0x73AC2B: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2897)
==213406==    by 0x73CBE8: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2975)
==213406==    by 0x715D50: prefilter_func (src/iarray_expression.c:436)
==213406==    by 0x736DE6: pipeline_forward (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:856)
==213406==    by 0x73EC86: blosc_c (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1008)
==213406==    by 0x73F994: t_blosc_do_job (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:0)
==213406==  Block was alloc'd by thread #263
==213406==
==213406== ----------------------------------------------------------------

(and tons of others)

These should be addressed before we can get the full performance out of views. For now, we will use them only in single-threaded mode.
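For illustration only: a report like the one above is what helgrind typically prints when a lazily filled buffer is allocated and populated by one worker thread while another thread reads it with no lock held. The sketch below guards such a cache with a mutex. `offsets_cache_t`, `cache_get` and the placeholder offsets are hypothetical names invented for this example; they are not part of c-blosc2 or iron-array, and this is not necessarily how the race should be fixed there.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a frame that caches its chunk offsets. */
typedef struct {
    pthread_mutex_t lock;  /* guards `offsets` and `loads` */
    int64_t *offsets;      /* NULL until the first lookup */
    int32_t nchunks;
    int loads;             /* how many times the cache was filled */
} offsets_cache_t;

void cache_init(offsets_cache_t *c, int32_t nchunks) {
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    c->offsets = NULL;
    c->nchunks = nchunks;
    c->loads = 0;
}

/* Return the offset of chunk `i`, filling the cache on first use.
   Returns -1 on a bad index or allocation failure. */
int64_t cache_get(offsets_cache_t *c, int32_t i) {
    assert(c != NULL);
    if (i < 0 || i >= c->nchunks) {
        return -1;
    }
    pthread_mutex_lock(&c->lock);
    if (c->offsets == NULL) {
        int64_t *buf = malloc((size_t)c->nchunks * sizeof *buf);
        if (buf == NULL) {
            pthread_mutex_unlock(&c->lock);
            return -1;
        }
        for (int32_t k = 0; k < c->nchunks; k++) {
            buf[k] = (int64_t)k * 4096;  /* placeholder for real decoding */
        }
        c->offsets = buf;
        c->loads++;
    }
    int64_t value = c->offsets[i];
    pthread_mutex_unlock(&c->lock);
    return value;
}

void cache_destroy(offsets_cache_t *c) {
    assert(c != NULL);
    free(c->offsets);
    c->offsets = NULL;
    pthread_mutex_destroy(&c->lock);
}
```

Because both the fill and the read happen under the same lock, a tool like helgrind sees a happens-before edge between them. The cost is one lock acquisition per lookup, which may or may not be acceptable in a hot decompression path.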

@FrancescAlted
Collaborator Author

Besides the problem with views in expressions, enabling multithreading (e.g. by commenting out this line: https://github.com/inaos/iron-array/blob/develop/src/iarray_views.c#L573) can also lead to race conditions in other situations, such as simple slicing, as helgrind shows:

$ valgrind --tool=helgrind ./tests slice_type:3_f_ll_v
<skip>
==1261230== ----------------------------------------------------------------
==1261230==
==1261230== Possible data race during read of size 8 at 0x8745C08 by thread #53
==1261230== Locks held: none
==1261230==    at 0x805834: get_coffsets (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1035)
==1261230==    by 0x807FCF: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1873)
==1261230==    by 0x8075DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==1261230==    by 0x704466: slice_view_postfilter (src/iarray_views.c:237)
==1261230==    by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230==    by 0x7109DB: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2899)
==1261230==    by 0x712978: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2977)
==1261230==    by 0x7041DE: type_view_postfilter (src/iarray_views.c:211)
==1261230==    by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230==    by 0x7157A5: t_blosc_do_job (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3107)
==1261230==    by 0x712DF8: t_blosc (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3192)
==1261230==    by 0x5E2DB1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==1261230==
==1261230== This conflicts with a previous write of size 8 by thread #54
==1261230== Locks held: none
==1261230==    at 0x805AD1: get_coffsets (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1123)
==1261230==    by 0x807FCF: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1873)
==1261230==    by 0x8075DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==1261230==    by 0x704466: slice_view_postfilter (src/iarray_views.c:237)
==1261230==    by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230==    by 0x7109DB: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2899)
==1261230==    by 0x712978: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2977)
==1261230==    by 0x7041DE: type_view_postfilter (src/iarray_views.c:211)
==1261230==  Address 0x8745c08 is 24 bytes inside a block of size 64 alloc'd
==1261230==    at 0x5E29E39: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==1261230==    by 0x803656: frame_new (contribs/caterva/contribs/c-blosc2/blosc/frame.c:44)
==1261230==    by 0x74781E: blosc2_schunk_new (contribs/caterva/contribs/c-blosc2/blosc/schunk.c:175)
==1261230==    by 0x7D474B: caterva_blosc_array_new (contribs/caterva/caterva/caterva.c:195)
==1261230==    by 0x7D4C90: caterva_empty (contribs/caterva/caterva/caterva.c:267)
==1261230==    by 0x7D5D94: caterva_from_buffer (contribs/caterva/caterva/caterva.c:432)
==1261230==    by 0x6D6894: iarray_from_buffer (src/iarray_constructor.c:255)
==1261230==    by 0x6C17C3: execute_iarray_slice_type (tests/test_slice_type.c:62)
==1261230==    by 0x6BFFE7: __ina_test_slice_type_3_f_ll_v_run (tests/test_slice_type.c:225)
==1261230==    by 0x866537: ina_test_run (test.c:689)
==1261230==    by 0x84B30B2: (below main) (libc-start.c:308)
==1261230==  Block was alloc'd by thread #1
<skip>
==1261230== Possible data race during read of size 8 at 0x876E380 by thread #53
==1261230== Locks held: none
==1261230==    at 0x70CA26: read_chunk_header (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:705)
==1261230==    by 0x712924: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2957)
==1261230==    by 0x712897: blosc2_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2936)
==1261230==    by 0x807FF5: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1880)
==1261230==    by 0x8075DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==1261230==    by 0x704466: slice_view_postfilter (src/iarray_views.c:237)
==1261230==    by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230==    by 0x7109DB: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2899)
==1261230==    by 0x712978: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2977)
==1261230==    by 0x7041DE: type_view_postfilter (src/iarray_views.c:211)
==1261230==    by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230==    by 0x7157A5: t_blosc_do_job (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3107)
==1261230==  Address 0x876e380 is 16 bytes inside a block of size 128 alloc'd
==1261230==    at 0x5E27893: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==1261230==    by 0x805966: get_coffsets (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1106)
==1261230==    by 0x807FCF: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1873)
==1261230==    by 0x8075DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==1261230==    by 0x704466: slice_view_postfilter (src/iarray_views.c:237)
==1261230==    by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230==    by 0x7109DB: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2899)
==1261230==    by 0x712978: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2977)
==1261230==    by 0x7041DE: type_view_postfilter (src/iarray_views.c:211)
==1261230==    by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230==    by 0x7157A5: t_blosc_do_job (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3107)
==1261230==    by 0x712DF8: t_blosc (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3192)
==1261230==  Block was alloc'd by thread #54
=
<skip>

Ideally, we should provide a way to call postfilters in parallel without these issues. This may be a major task, but fixing it would be of great benefit to us.
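As a rough sketch of one possible direction: the read/write conflict reported inside `get_coffsets` above looks like two threads touching the same field of the shared frame without synchronization. If taking a lock inside every postfilter call is undesirable, a lazily built table can instead be published through a C11 atomic pointer, so that whichever thread finishes first wins and the others discard their copy. Everything below (`shared_frame_t`, `frame_offsets`, `build_offsets`, the placeholder values) is hypothetical and written for this example; it is not the c-blosc2 API and has not been tried against the real code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical frame whose offsets table is built on first use. */
typedef struct {
    _Atomic(int64_t *) offsets;  /* NULL until a thread publishes a table */
    int32_t nchunks;
} shared_frame_t;

void frame_init(shared_frame_t *f, int32_t nchunks) {
    assert(f != NULL);
    atomic_init(&f->offsets, NULL);
    f->nchunks = nchunks;
}

/* Placeholder for the real (expensive) decoding of the offsets. */
static int64_t *build_offsets(int32_t nchunks) {
    int64_t *buf = malloc((size_t)nchunks * sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    for (int32_t k = 0; k < nchunks; k++) {
        buf[k] = (int64_t)k * 4096;
    }
    return buf;
}

/* Return the shared, read-only offsets table, building it if needed.
   Safe to call from several threads; returns NULL on allocation failure. */
const int64_t *frame_offsets(shared_frame_t *f) {
    assert(f != NULL);
    int64_t *cur = atomic_load_explicit(&f->offsets, memory_order_acquire);
    if (cur != NULL) {
        return cur;  /* fast path: already published */
    }
    int64_t *mine = build_offsets(f->nchunks);
    if (mine == NULL) {
        return NULL;
    }
    int64_t *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(
            &f->offsets, &expected, mine,
            memory_order_acq_rel, memory_order_acquire)) {
        return mine;  /* this thread published its table */
    }
    free(mine);       /* another thread won the race; use its table */
    return expected;
}

void frame_free(shared_frame_t *f) {
    assert(f != NULL);
    free(atomic_load_explicit(&f->offsets, memory_order_acquire));
    atomic_store_explicit(&f->offsets, NULL, memory_order_release);
}
```

The trade-off is that several threads may build the table at the same time and all but one throw their work away. In exchange, the fast path is a single acquire load and the table is never modified after it becomes visible.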

@FrancescAlted
Collaborator Author

Even with PR #590, I can still reproduce the freeze on my M1 MacBook Air (but only on that box!):

$ python -m pytest -v
<snip>
iarray/tests/test_reduce.py::test_red_type_view[test_reduce.iarr-False-sum-shape0-chunks0-blocks0-0-float64-uint64] PASSED  [ 73%]
iarray/tests/test_reduce.py::test_red_type_view[test_reduce.iarr-False-sum-shape1-chunks1-blocks1-axis1-int64-float64] ^C⏎
/Users/faltet/miniconda3/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Although it takes a while to freeze (about 5 minutes), the freeze is reproducible and always happens in the same place.

@martaiborra
Contributor

Since f55390e, helgrind no longer complains in the main view tests.
