Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IT fails in anvilprod due to multiple values for is_supplementary #5229

Closed
14 tasks
hannes-ucsc opened this issue May 19, 2023 · 11 comments
Closed
14 tasks

IT fails in anvilprod due to multiple values for is_supplementary #5229

hannes-ucsc opened this issue May 19, 2023 · 11 comments
Assignees
Labels
+ [priority] High bug [type] A defect preventing use of the system as specified data [subject] Data or metadata [use of this label is uncommon] no demo [process] Not to be demonstrated at the end of the sprint orange [process] Done by the Azul team spike:3 [process] Spike estimate of three points workaround [type] An enhancement that works around a defect of an external dependency

Comments

@hannes-ucsc
Copy link
Member

https://gitlab.prod.anvil.gi.ucsc.edu/ucsc/azul/-/jobs/4281

for branch https://github.com/DataBiosphere/azul/tree/issues/hannes-ucsc/5015-anvilprod

and on commit 4fe493e

======================================================================
ERROR: test_indexing (integration_test.IndexingIntegrationTest) [catalog_complete] (catalog='anvil-it')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builds/ucsc/azul/test/integration_test.py", line 383, in subTest
    yield
  File "/builds/ucsc/azul/test/integration_test.py", line 991, in _assert_catalog_complete
    indexed_fqids = self._get_indexed_bundles(catalog)
  File "/builds/ucsc/azul/test/integration_test.py", line 966, in _get_indexed_bundles
    is_supplementary = one(is_supplementary)
  File "/build/.venv/lib/python3.9/site-packages/more_itertools/more.py", line 554, in one
    raise too_long or ValueError(msg)
ValueError: Expected exactly one item in iterable, but got False, True, and perhaps more.
----------------------------------------------------------------------
Ran 14 tests in 298.301s
FAILED (errors=1, skipped=2)
make: *** [Makefile:237: integration_test] Error 1
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: exit code 1
  • Security design review completed; the Resolution of this issue does not
    • … affect authentication; for example:
      • OAuth 2.0 with the application (API or Swagger UI)
      • Authentication of developers with Google Cloud APIs
      • Authentication of developers with AWS APIs
      • Authentication with a GitLab instance in the system
      • Password and 2FA authentication with GitHub
      • API access token authentication with GitHub
      • Authentication with
    • … affect the permissions of internal users like access to
      • Cloud resources on AWS and GCP
      • GitLab repositories, projects and groups, administration
      • an EC2 instance via SSH
      • GitHub issues, pull requests, commits, commit statuses, wikis, repositories, organizations
    • … affect the permissions of external users like access to
      • TDR snapshots
    • … affect permissions of service or bot accounts
      • Cloud resources on AWS and GCP
    • … affect audit logging in the system, like
      • adding, removing or changing a log message that represents an auditable event
      • changing the routing of log messages through the system
    • … affect monitoring of the system
    • … introduce a new software dependency like
      • Python packages on PYPI
      • Command-line utilities
      • Docker images
      • Terraform providers
    • … add an interface that exposes sensitive or confidential data at the security boundary
    • … affect the encryption of data at rest
    • … require persistence of sensitive or confidential data that might require encryption at rest
    • … require unencrypted transmission of data within the security boundary
    • … affect the network security layer; for example by
      • modifying, adding or removing firewall rules
      • modifying, adding or removing security groups
      • changing or adding a port a service, proxy or load balancer listens on
  • Documentation on any unchecked boxes is provided in comments below
@hannes-ucsc hannes-ucsc added the orange [process] Done by the Azul team label May 19, 2023
@hannes-ucsc
Copy link
Member Author

hannes-ucsc commented May 19, 2023

One of the affected biosamples from the anvil-it catalog.

{
  "entryId": "1b5294b0-ca95-402c-8b75-03e3aea7b66c",
  "sources": [
    {
      "sourceSpec": "tdr:datarepo-dev-43738c90:snapshot/ANVIL_1000G_2019_Dev_20230302_ANV5_202303032342:/2",
      "sourceId": "cc1c98a4-bfc4-45f2-b8dc-e920e5ca634d"
    }
  ],
  "bundles": [
    {
      "bundleUuid": "1b5294b0-ca95-a02c-8b75-03e3aea7b66c",
      "bundleVersion": "2022-06-01T00:00:00.000000Z"
    }
  ],
  "activities": [
    {
      "activity_type": [
        "Checksum",
        "Indexing",
        "Unknown"
      ],
      "assay_type": [
        null
      ],
      "data_modality": [
        null
      ]
    }
  ],
  "biosamples": [
    {
      "document_id": "1b5294b0-ca95-402c-8b75-03e3aea7b66c",
      "source_datarepo_row_ids": [
        "sample:e343379d-7eff-4df6-a4e1-4b3418f82008"
      ],
      "biosample_id": "f3c8c3d5-ebab-71fe-fd58-9f69923d123b",
      "anatomical_site": null,
      "apriori_cell_type": [
        null
      ],
      "biosample_type": null,
      "disease": null,
      "donor_age_at_collection_unit": null,
      "donor_age_at_collection": {
        "gte": null,
        "lte": null
      },
      "accessible": true
    }
  ],
  "datasets": [
    {
      "dataset_id": [
        "385290c3-dff5-fb6d-2501-fa0ba3ad1c35"
      ],
      "title": [
        "ANVIL_1000G_2019_Dev"
      ]
    }
  ],
  "diagnoses": [],
  "donors": [
    {
      "organism_type": [
        null
      ],
      "phenotypic_sex": [
        null
      ],
      "reported_ethnicity": [
        null
      ],
      "genetic_ancestry": [
        null
      ]
    }
  ],
  "files": [
    {
      "data_modality": [
        null
      ],
      "file_format": [
        ".md5"
      ],
      "reference_assembly": [
        null
      ],
      "is_supplementary": [
        false,
        true
      ],
      "count": 2
    },
    {
      "data_modality": [
        null
      ],
      "file_format": [
        ".cram"
      ],
      "reference_assembly": [
        null
      ],
      "is_supplementary": [
        false
      ],
      "count": 1
    },
    {
      "data_modality": [
        null
      ],
      "file_format": [
        ".crai"
      ],
      "reference_assembly": [
        null
      ],
      "is_supplementary": [
        false
      ],
      "count": 1
    }
  ]
}

@hannes-ucsc hannes-ucsc added bug [type] A defect preventing use of the system as specified test [subject] Unit and integration test code + [priority] High labels May 19, 2023
@hannes-ucsc
Copy link
Member Author

Spike to diagnose.

@hannes-ucsc hannes-ucsc added the spike:3 [process] Spike estimate of three points label May 19, 2023
@hannes-ucsc hannes-ucsc changed the title IT fails in anvilprod due to multiple values for is_supplementary IT fails in anvilprod due to multiple values for is_supplementary May 19, 2023
@nadove-ucsc
Copy link
Contributor

Here is the complete structure of the bundle shown above. Non-supplementary files are in blue and the supplementary file is in red.

bundle

I suspect this is a bug in the snapshot because the two leaf files are the same format and derived from the same activity type, but one is marked as supplementary while the other is not.

@nadove-ucsc
Copy link
Contributor

Suggested workaround:

Subject: [PATCH] fix
---
Index: test/integration_test.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/integration_test.py b/test/integration_test.py
--- a/test/integration_test.py	(revision c6d873bfe5ba641359a0598dca1e5c0dea66a2cc)
+++ b/test/integration_test.py	(date 1684707202458)
@@ -966,7 +966,7 @@
                     for file in hit['files']:
                         is_supplementary = file['is_supplementary']
                         if isinstance(is_supplementary, list):
-                            is_supplementary = one(is_supplementary)
+                            is_supplementary = all(is_supplementary)
                         if is_supplementary:
                             bundle_fqid['entity_type'] = BundleEntityType.supplementary.value
                             break

@nadove-ucsc nadove-ucsc added data [subject] Data or metadata [use of this label is uncommon] workaround [type] An enhancement that works around a defect of an external dependency and removed test [subject] Unit and integration test code labels May 22, 2023
hannes-ucsc pushed a commit that referenced this issue May 23, 2023
hannes-ucsc pushed a commit that referenced this issue May 23, 2023
@nadove-ucsc
Copy link
Contributor

Broad confirmed that this is a bug in the snapshot.

https://ucsc-gi.slack.com/archives/C03TPJS54DC/p1684805634413329

@hannes-ucsc
Copy link
Member Author

No demo, passing IT (on PR #5184 for #5015) suffices.

@hannes-ucsc hannes-ucsc added the do not close [process] do not close when PR is merged label Jun 9, 2023
@hannes-ucsc
Copy link
Member Author

The PR is just a workaround, we're still waiting for a fixed snapshot.

https://ucsc-gi.slack.com/archives/C03TPJS54DC/p1684805634413329

@hannes-ucsc
Copy link
Member Author

Furthermore, the workaround in PR #5231 is incomplete so working around this would require even more work.

@hannes-ucsc
Copy link
Member Author

ETA for the fixed snapshot is "early next week".

@hannes-ucsc
Copy link
Member Author

Snapshot has arrived and been verified to address the issue. Assignee to file a PR reverting the workaround and incorporating the snapshot.

@hannes-ucsc hannes-ucsc added this to the AnVIL Public Release milestone Jun 13, 2023
@bvizzier-ucsc bvizzier-ucsc modified the milestones: AnVIL Public Release, AnVIL Beta Release Oct 4, 2023
@dsotirho-ucsc dsotirho-ucsc removed this from the AnVIL Beta Release milestone Oct 10, 2023
@hannes-ucsc hannes-ucsc removed the do not close [process] do not close when PR is merged label Nov 14, 2023
@hannes-ucsc
Copy link
Member Author

Workaround has been removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
+ [priority] High bug [type] A defect preventing use of the system as specified data [subject] Data or metadata [use of this label is uncommon] no demo [process] Not to be demonstrated at the end of the sprint orange [process] Done by the Azul team spike:3 [process] Spike estimate of three points workaround [type] An enhancement that works around a defect of an external dependency
Projects
None yet
Development

No branches or pull requests

4 participants