cleanup from import automation testing on prod #333

Closed
2 tasks
aclum opened this issue Dec 10, 2024 · 3 comments

@aclum
Contributor

aclum commented Dec 10, 2024

WorkflowExecution records that descend from the following data_generation_set records have duplicates that need to be fixed:

  • delete the second value of has_output from the data_generation_set record, along with the corresponding data_object_set record
  • delete the duplicate workflow_execution_set records and the data_object_set records that are has_output of those workflow_execution_set records

nmdc:omprc-12-hgksne68
nmdc:omprc-12-c06jgr44
nmdc:omprc-12-scg48547
nmdc:omprc-12-qhne3f15
nmdc:omprc-12-bcd5ve19
nmdc:omprc-12-bk872674
nmdc:omprc-12-h5aayv46

@aclum
Contributor Author

aclum commented Dec 12, 2024

We ended up with several different issues, including:

  • multiple DataObject ids in has_output of a data_generation_set record where only one was expected
  • DataObject ids referenced in has_output that did not exist
  • multiple workflow records

Used queries:run delete and queries:run update to clean up these records.
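For readers unfamiliar with this kind of cleanup, here is a minimal sketch of what the delete and update request bodies might look like. It assumes the queries:run endpoint accepts bodies shaped like MongoDB's `delete` and `update` database commands; the helper names and the ids in the usage lines are hypothetical placeholders, not the commands that were actually run.

```javascript
// Hypothetical helpers: build command bodies shaped like MongoDB's
// `delete` and `update` database commands. Whether queries:run accepts
// exactly these shapes is an assumption, not something stated in this issue.

// Delete every document in `collection` whose `id` is in `ids`.
function buildDeleteCommand(collection, ids) {
  return {
    delete: collection,
    // limit: 0 means "delete all matching documents" in the delete command.
    deletes: [{ q: { id: { $in: ids } }, limit: 0 }],
  };
}

// Remove one unwanted DataObject id from a data_generation_set record's
// has_output array, leaving the other values in place.
function buildPullHasOutputCommand(dataGenerationId, extraDataObjectId) {
  return {
    update: "data_generation_set",
    updates: [
      {
        q: { id: dataGenerationId },
        u: { $pull: { has_output: extraDataObjectId } },
      },
    ],
  };
}

// Usage with placeholder ids (not real records):
console.log(
  JSON.stringify(
    buildPullHasOutputCommand("nmdc:omprc-00-example", "nmdc:dobj-00-example"),
    null,
    2
  )
);
console.log(
  JSON.stringify(
    buildDeleteCommand("data_object_set", ["nmdc:dobj-00-example"]),
    null,
    2
  )
);
```

Building the bodies with small functions rather than by hand makes it easier to review the exact ids before anything is sent to a production database.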

aclum self-assigned this Dec 13, 2024
@aclum
Contributor Author

aclum commented Dec 13, 2024

This is complete. @eecavanna ran refscan yesterday to confirm that there were no more referential integrity errors, and the cleanup was also confirmed with the following mongo aggregation query:

db.getCollection('data_generation_set').aggregate(
  [
    {
      $match: {
        associated_studies:
          'nmdc:sty-11-28tm5d36',
        analyte_category: 'metagenome',
        processing_institution: 'JGI'
      }
    },
    {
      $lookup: {
        from: 'data_object_set',
        localField: 'has_output',
        foreignField: 'id',
        as: 'data_object_set'
      }
    },
    {
      $lookup: {
        from: 'workflow_execution_set',
        localField: 'id',
        foreignField: 'was_informed_by',
        as: 'workflow_execution_set'
      }
    },
    {
      $lookup: {
        from: 'data_object_set',
        localField:
          'workflow_execution_set.has_output',
        foreignField: 'id',
        as: 'workflow_output'
      }
    }
  ],
  { maxTimeMS: 60000, allowDiskUse: true }
);
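The query returns the joined documents but does not itself flag problems. As a rough sketch, the results could be scanned for the three issues described earlier in this thread. The function name `findIssues` is hypothetical, and treating "more than one workflow execution of the same `type` for one data generation record" as a duplicate is an assumption; reruns or multiple workflow versions may be legitimate in practice.

```javascript
// Hypothetical checker over documents shaped like the aggregation output:
// id, has_output, data_object_set, workflow_execution_set, workflow_output.
function findIssues(docs) {
  const issues = [];
  for (const doc of docs) {
    const hasOutput = doc.has_output ?? [];
    const resolvedIds = new Set((doc.data_object_set ?? []).map((d) => d.id));

    // Issue 1: more than one DataObject id where only one was expected.
    if (hasOutput.length > 1) {
      issues.push({ id: doc.id, problem: "multiple has_output values" });
    }

    // Issue 2: has_output ids that did not resolve in data_object_set.
    const missing = hasOutput.filter((id) => !resolvedIds.has(id));
    if (missing.length > 0) {
      issues.push({ id: doc.id, problem: "missing data objects", detail: missing });
    }

    // Issue 3 (assumed definition): several workflow executions of one type.
    const countsByType = new Map();
    for (const wf of doc.workflow_execution_set ?? []) {
      countsByType.set(wf.type, (countsByType.get(wf.type) ?? 0) + 1);
    }
    for (const [type, count] of countsByType) {
      if (count > 1) {
        issues.push({ id: doc.id, problem: "duplicate workflow records", detail: { type, count } });
      }
    }

    // Workflow outputs that did not resolve in data_object_set.
    const workflowOutputIds = new Set((doc.workflow_output ?? []).map((d) => d.id));
    const missingWorkflowOutputs = (doc.workflow_execution_set ?? [])
      .flatMap((wf) => wf.has_output ?? [])
      .filter((id) => !workflowOutputIds.has(id));
    if (missingWorkflowOutputs.length > 0) {
      issues.push({ id: doc.id, problem: "missing workflow outputs", detail: missingWorkflowOutputs });
    }
  }
  return issues;
}
```

In mongosh, the result of the aggregation could be passed in with `.toArray()`; an empty return value would mean none of these checks fired for the matched records.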

@eecavanna
Contributor

@eecavanna ran refscan yesterday to confirm that there were no more referential integrity errors

That is true, other than the 33 that we expect to be there (unrelated to this cleanup).
