feat(IPVC-2264) add gene_id to UTA models #24
…, and a backfill script
@bsgiles73 Please make changes
src/uta/loading.py
@@ -336,40 +336,17 @@ def load_geneinfo(session, opts, cf):
    for i_gi, gi in enumerate(gir):
        session.merge(
            usam.Gene(
                hgnc=gi.hgnc,
                gene_id=gi.gene_id,
                hgnc=gi.gene_symbol,
Fix this, it should pull HGNC from the gene parser.
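The review point can be illustrated with a minimal sketch. It assumes a hypothetical parsed record type (`GeneInfoRecord`) and a plain-dict stand-in for the `usam.Gene` table, since the real parser and model are not shown here. It takes `hgnc` from the parser's own field rather than from `gene_symbol`, and keys rows on `gene_id`, mimicking how `session.merge()` upserts by primary key once `gene_id` is the key.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class GeneInfoRecord:
    """Hypothetical parsed gene-info row; field names are assumptions."""
    gene_id: str
    gene_symbol: str
    hgnc: Optional[str]  # whatever HGNC value the gene parser reports


@dataclass
class GeneRow:
    """Stand-in for the usam.Gene model, with gene_id as primary key."""
    gene_id: str
    gene_symbol: str
    hgnc: Optional[str]


def load_geneinfo(records: Iterable[GeneInfoRecord]) -> Dict[str, GeneRow]:
    """Upsert rows keyed by gene_id, mimicking session.merge() on a primary key."""
    table: Dict[str, GeneRow] = {}
    for gi in records:
        # hgnc is pulled from the parser's hgnc field, not from gene_symbol.
        table[gi.gene_id] = GeneRow(
            gene_id=gi.gene_id,
            gene_symbol=gi.gene_symbol,
            hgnc=gi.hgnc,
        )
    return table
```

Because rows are keyed on `gene_id`, a second record for the same gene replaces the first rather than creating a duplicate, which is the behavior a primary-key merge gives.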
… value in intermediate file, and transcript to gene id changes should raise exception
LGTM - we discussed during design review.
This PR:
-- Add gene_id, gene_symbol, type, and xrefs to the UTA gene model
-- Add gene_id to the transcript table
-- Backfill gene_id and gene_symbol values from the output of IPVC-2266
-- Set the primary key of gene to gene_id
-- Make transcript.gene_id a foreign key referencing gene
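The resulting relationship between the two tables can be sketched with an in-memory SQLite database. This is a simplified, hypothetical DDL: real UTA runs on PostgreSQL, both tables have more columns, and the storage types used here (for example `xrefs` as TEXT) are assumptions. The sketch shows only that gene_id is the primary key of gene and that transcript.gene_id must reference an existing gene.

```python
import sqlite3

# Hypothetical, simplified schema; not the actual UTA DDL.
SCHEMA = """
CREATE TABLE gene (
    gene_id     TEXT PRIMARY KEY,
    gene_symbol TEXT NOT NULL,
    hgnc        TEXT,
    type        TEXT,
    xrefs       TEXT
);
CREATE TABLE transcript (
    ac      TEXT PRIMARY KEY,
    hgnc    TEXT,
    gene_id TEXT NOT NULL REFERENCES gene (gene_id)
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With this schema, inserting a transcript whose gene_id has no matching gene row raises `sqlite3.IntegrityError`, as does inserting a second gene with the same gene_id.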
The gene_id update happens in stages. An Alembic migration first adds the new columns as nullable, a script then backfills the gene_id values, and a second migration updates nullability, primary keys, foreign keys, and the affected views. These steps are implemented in
misc/gene-update/upgrade-uta-schema.sh.

These updates cannot use the existing docker-compose setup without modification, because we need to start from an updated UTA DB schema. The current yaml requires a UTA database already running with the base version uta_20210129b. Commenting out a few lines in the yaml and using the updated schema name makes it work.
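The staged pattern (add a nullable column, backfill, then enforce constraints) can be sketched against SQLite. This is a hedged stand-in, not the PR's Alembic migrations or backfill script: the function names, the `GeneIdChangedError` exception, and the in-memory mapping from transcript accession to gene_id (standing in for the intermediate file) are all hypothetical. Following the PR's commit message, a transcript whose gene_id would change raises an exception instead of being silently overwritten.

```python
import sqlite3
from typing import Mapping


class GeneIdChangedError(Exception):
    """Hypothetical error: a transcript already has a different gene_id."""


def stage1_add_nullable_column(conn: sqlite3.Connection) -> None:
    # Stage 1: the new column is nullable, so existing rows stay valid.
    conn.execute("ALTER TABLE transcript ADD COLUMN gene_id TEXT")


def stage2_backfill(conn: sqlite3.Connection, ac_to_gene_id: Mapping[str, str]) -> int:
    """Fill transcript.gene_id from a mapping; return the number of rows updated."""
    updated = 0
    # fetchall() first so that updates do not run against an open cursor.
    for ac, current in conn.execute("SELECT ac, gene_id FROM transcript").fetchall():
        new = ac_to_gene_id.get(ac)
        if new is None:
            continue  # left NULL; the stage-3 check will report it
        if current is not None and current != new:
            raise GeneIdChangedError(f"{ac}: gene_id {current} -> {new}")
        if current is None:
            conn.execute("UPDATE transcript SET gene_id = ? WHERE ac = ?", (new, ac))
            updated += 1
    conn.commit()
    return updated


def stage3_ready_for_constraints(conn: sqlite3.Connection) -> bool:
    # Stage 3 precondition: NOT NULL and foreign-key constraints can only be
    # applied once every transcript row has a gene_id.
    (nulls,) = conn.execute(
        "SELECT COUNT(*) FROM transcript WHERE gene_id IS NULL"
    ).fetchone()
    return nulls == 0
```

Keeping the columns nullable until the backfill has finished is what allows the first migration to run against a populated database; rerunning the backfill with the same mapping updates nothing, so it can be run more than once.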
The following steps can be used to start from the current UTA version and run the schema upgrade script...
Once the database is ready, you can run the following in a separate shell.
Lines 46-48 in docker-compose.yml need to be commented out. After that, the uta-load steps outlined in the README will work.
The diff of the output below shows that the same numbers of sequences, genes, transcripts, exons, and alignments are added as in a run without the gene_id update.
The new columns can be seen in the gene and transcript tables...