The centralized nightlies job failed on Thursday (2024-10-24) #25

Closed
github-actions bot opened this issue Oct 25, 2024 · 27 comments
Assignees
Labels
nightly-failure The scheduled nightly builds failed

Comments

@github-actions

The centralized nightlies job failed on Thursday (2024-10-24) in run 11510728502

@github-actions github-actions bot added the nightly-failure The scheduled nightly builds failed label Oct 25, 2024
@jdblischak
Collaborator

Both TileDB-SOMA-Py and TileDB-SOMA-R failed many tests.

@johnkerl are these known issues?

TileDB-SOMA-Py failed 235 tests (have to view the raw logs to see the test output):

ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_OK[DenseNDArray--3.1415] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_OK[DenseNDArray-] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_OK[DenseNDArray-a string] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_OK[DenseNDArray-nan] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_OK[DenseNDArray-inf] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_OK[DenseNDArray--inf] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_FAIL[DenseNDArray-bad_value0] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_FAIL[DenseNDArray-bad_value1] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_FAIL[DenseNDArray-bad_value2] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_metadata.py::test_metadata_marshalling_FAIL[DenseNDArray-bad_value3] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
ERROR apis/python/tests/test_multiscale_image.py::TestSimpleMultiscale2D::test_read_spatial_region[Level 2, full region, no transform] - ValueError: SOMADenseNDArray shape must be a non-zero-length tuple of positive ints
= 235 failed, 1656 passed, 21 skipped, 2 xfailed, 2220 warnings, 26 errors in 202.64s (0:03:22) =

TileDB-SOMA-R failed 18 tests:

── Error ('test-write-soma-resume.R:317:3'): Resume-mode dense arrays ──────────
<<not available>/C++Error/error/condition>
Error: WriterBase: Buffer sizes check failed; Invalid number of cells given for attribute 'soma_data' (400 != 18446744073709551615)
Backtrace:
 1. ├─testthat::expect_s3_class(...) at test-write-soma-resume.R:317:3
 2. │ └─testthat::quasi_label(enquo(object), arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. ├─tiledbsoma::write_soma(...)
 5. └─tiledbsoma:::write_soma.matrix(...)
 6.   └─array$write(x)
 7.     └─tiledbsoma:::writeArrayFromArrow(...)

[ FAIL 18 | WARN 0 | SKIP 0 | PASS 3681 ]

@johnkerl
Contributor

johnkerl commented Oct 25, 2024

@jdblischak weird

Maybe related to #3230 but the particular error messages I have not seen before

As noted in #3230 we've been unit-testing with the new-shape feature flag off and on (2x CI runs) for weeks now and all that changes is which one is the default. I have been VERY careful and cautious with this feature phase-in, and honestly, I'm quite surprised (as well as disappointed) to see this failure here.

So these tests appear to be failing in 'new' ways with the feature-flag enabled.

I'll investigate.

@johnkerl johnkerl self-assigned this Oct 25, 2024
Author

The centralized nightlies job failed on Friday (2024-10-25) in run 11510728502

@johnkerl
Contributor

johnkerl commented Oct 25, 2024

(That second CI fail was a manual re-run on my part, trying to get tiledbsoma-py logs to appear outside of raw logs -- which was successful.)

@johnkerl
Contributor

I think I know what it is ... the nightly has core dev aka "2.27" (unreleased) whereas our tiledbsoma CI runs with release core 2.26.

I'll try a repro that way.

@jdblischak
Collaborator

> I think I know what it is ... the nightly has core dev aka "2.27" (unreleased) whereas our tiledbsoma CI runs with release core 2.26.

Right. This repo runs "nightly" everything in order to identify potential problems as early as possible.

> I have been VERY careful and cautious with this feature phase-in, and honestly, I'm quite surprised (as well as disappointed) to see this failure here.

These test failures here don't indicate any lack of cautiousness on your part. The goal is to identify potential problems with the next core version before it is released.

@johnkerl
Contributor

I found the issue and it is here:
https://github.com/single-cell-data/TileDB-SOMA/blob/a1f255c7591d7e9e82ed56e712d5a8899f6d46bf/libtiledbsoma/src/soma/managed_query.cc#L123

  • Without the new-shape feature, dense arrays have small core domain and no core current domain
  • With the new-shape feature, dense arrays have huge core domain and small core current domain

The code in question needs to be modified to use the current domain, if there is one.

That's simple enough, with one catch:

cc @nguyenv just as FYI
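
A minimal sketch of the change described above, in plain Python rather than the C++ of `managed_query.cc`. The names `DenseArrayInfo` and `effective_domain` are hypothetical and used only for illustration, and the 40 x 40 shape and the huge upper bound are made-up values. The idea is simply to size a full-array read from the current domain when the array has one, and to fall back to the core domain otherwise.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (low, high) bounds for one dimension


@dataclass
class DenseArrayInfo:
    """Hypothetical stand-in for the domain information of a dense array."""

    core_domain: Sequence[Range]
    current_domain: Optional[Sequence[Range]] = None  # None: no current domain


def effective_domain(info: DenseArrayInfo) -> Sequence[Range]:
    """Prefer the current domain if there is one, else use the core domain."""
    if info.current_domain:
        return info.current_domain
    return info.core_domain


# Without the new-shape feature: small core domain, no current domain.
old_style = DenseArrayInfo(core_domain=[(0, 39), (0, 39)])

# With the new-shape feature: huge core domain, small current domain.
# The upper bound here is illustrative, not the value the library uses.
huge = 2**63 - 2
new_style = DenseArrayInfo(
    core_domain=[(0, huge), (0, huge)],
    current_domain=[(0, 39), (0, 39)],
)

# Both layouts resolve to the same small region for a full read.
print(effective_domain(old_style))
print(effective_domain(new_style))
```

With this selection in place, a read with no coordinates would cover the 40 x 40 region in both cases instead of the huge core domain in the new-shape case.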

@johnkerl
Contributor

johnkerl commented Oct 25, 2024

Here's another problem spot:
https://github.com/TileDB-Inc/TileDB/blob/4afbe56fed83f59ea7a602186cd3e17bda94578a/tiledb/sm/query/writers/writer_base.cc#L353

I have some simple script repros (happy to share if anyone's interested):

  • Doing a dnda.read() (with no coords) OOMs, apparently due to the above (as far as I can tell from gdb so far)
  • Doing a write fails explicitly with
tiledbsoma._exception.SOMAError: WriterBase: Buffer sizes check failed; Invalid number of cells given for attribute 'soma_data' (1600 != 18446744073709551615)

Both are still occurring even with a fix for the above-noted
https://github.com/single-cell-data/TileDB-SOMA/blob/a1f255c7591d7e9e82ed56e712d5a8899f6d46bf/libtiledbsoma/src/soma/managed_query.cc#L123
in place ...
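
One observation on the error message: 18446744073709551615 is 2^64 - 1, the largest unsigned 64-bit integer. That is what a cell count would look like if it were computed over the huge core domain and the multiplication saturated on overflow. The sketch below illustrates that reading. It is an assumption about where the number comes from, not a transcription of `writer_base.cc`; `cell_count` and `check_buffer_size` are hypothetical names, and the 40 x 40 region is just one shape that yields 1600 cells.

```python
from typing import Sequence, Tuple

UINT64_MAX = 2**64 - 1  # 18446744073709551615

Range = Tuple[int, int]  # inclusive (low, high) bounds for one dimension


def cell_count(ranges: Sequence[Range]) -> int:
    """Cells in a dense region, saturating at UINT64_MAX on overflow."""
    total = 1
    for low, high in ranges:
        total *= high - low + 1
        if total > UINT64_MAX:
            return UINT64_MAX
    return total


def check_buffer_size(buffer_cells: int, ranges: Sequence[Range]) -> None:
    """Raise if the buffer does not hold exactly one value per cell."""
    expected = cell_count(ranges)
    if buffer_cells != expected:
        raise ValueError(
            f"Invalid number of cells given ({buffer_cells} != {expected})"
        )


small = [(0, 39), (0, 39)]                   # 40 x 40 = 1600 cells
huge = [(0, 2**63 - 2), (0, 2**63 - 2)]      # illustrative huge domain

check_buffer_size(1600, small)  # passes: the buffer matches the region

try:
    check_buffer_size(1600, huge)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Under this reading, the write fails because the expected cell count is taken from the core domain rather than from the region actually being written.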

@johnkerl
Contributor

johnkerl commented Oct 25, 2024

> These test failures here don't indicate any lack of cautiousness on your part. The goal is to identify potential problems with the next core version before it is released.

@jdblischak I committed code with if-2.27 logic in it, with inadequate testing on my part. I should have caught this sooner. It's good we're catching it now, vs even later, but it should have been caught when I submitted single-cell-data/TileDB-SOMA#3180. I did do testing on a machine with dev core on it, but, didn't see the problem at the time. Either that was a miss on my part, or there is a new core dev defect since then. Regardless, my other failing was not having these nightlies run without/with the new-shape feature-flag enabled. Solid miss on my part.
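
The gap described here is a coverage matrix with two axes: which core the build uses (release or dev) and whether the new-shape feature flag is enabled. A sketch of enumerating all four combinations follows. The core labels and the `NEW_SHAPE_FLAG` environment-variable name are placeholders for illustration, not the project's actual settings.

```python
from itertools import product

# Placeholder labels for the two axes of the test matrix.
CORE_BUILDS = ("release-2.26", "dev-2.27")
FLAG_STATES = (False, True)


def build_matrix():
    """Return one job description per (core build, flag state) pair."""
    return [
        {"core": core, "env": {"NEW_SHAPE_FLAG": "1" if flag else "0"}}
        for core, flag in product(CORE_BUILDS, FLAG_STATES)
    ]


jobs = build_matrix()
for job in jobs:
    print(job["core"], job["env"])
```

The combination that exposed the failures in this thread corresponds to the dev core with the flag on.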

@johnkerl
Contributor

single-cell-data/TileDB-SOMA#3244 is WIP

Author

The centralized nightlies job failed on Friday (2024-10-25) in run 11528090329

Author

The centralized nightlies job failed on Saturday (2024-10-26) in run 11536844080

Author

The centralized nightlies job failed on Sunday (2024-10-27) in run 11546195698

@johnkerl
Contributor

Author

The centralized nightlies job failed on Monday (2024-10-28) in run 11566125228

Author

The centralized nightlies job failed on Tuesday (2024-10-29) in run 11585822740

@johnkerl
Contributor

More progress on single-cell-data/TileDB-SOMA#3265 and single-cell-data/TileDB-SOMA#3263. (The soma-level issues with dense arrays and 2.27 were more complex than I realized.)

@jdblischak
Collaborator

@johnkerl
Contributor

@jdblischak there is more to do on subsequent PRs -- the above alone will not be enough to get all green with core 2.27 -- sorry, I should have made that more clear -- all I intended to do was to say "I am here, working on this, making progress".

@johnkerl
Contributor

@jdblischak I won't ask you to do any manual runs until I have 'all green' on my laptop where I have core dev (i.e. 2.27-to-be) checked out -- until I have that working there, it will fail here too.

Author

The centralized nightlies job failed on Wednesday (2024-10-30) in run 11605000484

Author

github-actions bot commented Nov 1, 2024

The centralized nightlies job failed on Thursday (2024-10-31) in run 11623176373

@jdblischak
Collaborator

jdblischak commented Nov 1, 2024

Making good progress! Now there is only 1 failing TileDB-SOMA-Py test and only 6 failing TileDB-SOMA-R tests

=========================== short test summary info ============================
FAILED apis/python/tests/test_dense_nd_array.py::test_dense_nd_array_ned_write - ValueError: cannot reshape array of size 1000000 into shape (4,)
= 1 failed, 1918 passed, 21 skipped, 2 xfailed, 2325 warnings in 258.87s (0:04:18) =
══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test-shape.R:569:5'): SOMADenseNDArray shape ──────────────────────
all(readback_shape == readback_maxshape) is not TRUE

`actual`:   FALSE
`expected`: TRUE 
── Failure ('test-shape.R:617:9'): SOMADenseNDArray shape ──────────────────────
Expected `ndarray$write(sm)` to run without any errors.
Actually got a <simpleError> with text:
  object 'sm' not found
── Failure ('test-shape.R:626:9'): SOMADenseNDArray shape ──────────────────────
Expected `x <- ndarray$read(coords = coords)$tables()$concat()` to run without any conditions.
Actually got a <simpleError> with text:
  attempt to apply non-function
── Failure ('test-shape.R:569:5'): SOMADenseNDArray shape ──────────────────────
all(readback_shape == readback_maxshape) is not TRUE

`actual`:   FALSE
`expected`: TRUE 
── Failure ('test-shape.R:617:9'): SOMADenseNDArray shape ──────────────────────
Expected `ndarray$write(sm)` to run without any errors.
Actually got a <simpleError> with text:
  object 'sm' not found
── Failure ('test-shape.R:626:9'): SOMADenseNDArray shape ──────────────────────
Expected `x <- ndarray$read(coords = coords)$tables()$concat()` to run without any conditions.
Actually got a <simpleError> with text:
  attempt to apply non-function

[ FAIL 6 | WARN 0 | SKIP 2 | PASS 4128 ]

Author

github-actions bot commented Nov 1, 2024

The centralized nightlies job failed on Friday (2024-11-01) in run 11623176373

@johnkerl
Contributor

johnkerl commented Nov 1, 2024

@jdblischak indeed! The following are pending review:

@johnkerl
Contributor

johnkerl commented Nov 3, 2024

> @jdblischak I won't ask you to do any manual runs until I have 'all green' on my laptop where I have core dev (i.e. 2.27-to-be) checked out -- until I have that working there, it will fail here too.

@jdblischak using current main on a system with core dev installed I now have C++, Python, R, and cross-language tests all passing

@jdblischak
Collaborator

Phenomenal work @johnkerl! 🎉🎉🎉

I confirmed that the nightly builds have been passing since Friday night.
