feat: use isoform lengths median to determine gene feature_length + write feature_length for 'spike-in' features in var #1005

Merged: nayib-jose-gloria merged 5 commits into main from nayib/update-feature-length on Sep 12, 2024


@nayib-jose-gloria (Contributor) commented on Sep 4, 2024

Reason for Change

Changes

  • Changed the gene length calculation from "merged" (length of the union of all exons across a gene's isoforms) to "median" (median of the gene's isoform lengths), using the GTFtools implementation.
  • Previously, feature_length was set to 0 for rows in var with feature_type "spike-in", even though their feature lengths had already been pre-calculated. write-labels now writes feature_length for all feature types; this command runs after an H5AD is uploaded and validated in the CxG data portal.
  • Ran gene_processing.py to regenerate the gene metadata files (genes_*.csv.gz) with the new feature-length calculation. These files are the source of truth for the feature_length values annotated onto the H5AD var dataframe after upload.
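To make the difference between the two definitions concrete, here is a minimal Python sketch. It is not the GTFtools code or the repository's gene_processing.py; the input shape (a hypothetical mapping from transcript ID to exon coordinates, assumed 1-based and inclusive as in GTF files) and the function names are assumptions for illustration.

```python
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical input shape: transcript ID -> exon (start, end) pairs,
# assumed 1-based and inclusive as in GTF files.
Transcripts = Dict[str, List[Tuple[int, int]]]


def isoform_length(exons: List[Tuple[int, int]]) -> int:
    """Length of one isoform: the sum of its exon lengths."""
    return sum(end - start + 1 for start, end in exons)


def median_length(transcripts: Transcripts) -> float:
    """'median': median of the per-isoform lengths."""
    return median(isoform_length(exons) for exons in transcripts.values())


def merged_length(transcripts: Transcripts) -> int:
    """'merged': length of the union of all exons across all isoforms."""
    intervals = sorted(iv for exons in transcripts.values() for iv in exons)
    total = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:
            # Overlapping or adjacent exon: extend the current merged block
            cur_end = max(cur_end, end)
        else:
            # Gap: close the current block and start a new one
            total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    return total + (cur_end - cur_start + 1)


# Hypothetical gene with three isoforms (lengths 200, 250 and 150)
gene = {
    "tx1": [(1, 100), (201, 300)],
    "tx2": [(1, 150), (201, 300)],
    "tx3": [(51, 100), (401, 500)],
}
print(median_length(gene))  # 200
print(merged_length(gene))  # 350
```

In this example "merged" counts every base covered by any isoform (350), while "median" takes the middle isoform length (200). As long as exons within one transcript do not overlap, no single isoform can be longer than the union, so the median length never exceeds the merged length.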

Notes for Reviewer

  • No changes to genes_ercc.csv.gz: ERCC feature lengths have always been calculated with a different method, unrelated to the change above.
  • No changes to genes_sars_cov_2.csv.gz: each gene in that file has only one isoform in the GTF, so the merged and median calculations give the same result.
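The single-isoform reasoning can be checked with a small sketch. The helper names and the example gene are hypothetical, and it assumes that exons within one transcript do not overlap, which is the usual case in GTF annotations: with only one isoform, the union of its exons is that isoform, so both definitions agree.

```python
from statistics import median
from typing import Dict, List, Tuple

Transcripts = Dict[str, List[Tuple[int, int]]]


def isoform_length(exons: List[Tuple[int, int]]) -> int:
    # Coordinates assumed 1-based and inclusive, as in GTF files
    return sum(end - start + 1 for start, end in exons)


def merged_length(transcripts: Transcripts) -> int:
    # Count distinct bases covered by any exon of any isoform
    covered = set()
    for exons in transcripts.values():
        for start, end in exons:
            covered.update(range(start, end + 1))
    return len(covered)


def median_length(transcripts: Transcripts):
    return median(isoform_length(exons) for exons in transcripts.values())


# Hypothetical single-isoform gene: exons of 250 and 80 bases
single_isoform_gene = {"tx1": [(1, 250), (401, 480)]}
assert merged_length(single_isoform_gene) == median_length(single_isoform_gene) == 330
```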

@nayib-jose-gloria nayib-jose-gloria requested review from sidneymbell and removed request for sidneymbell September 4, 2024 21:23
@nayib-jose-gloria nayib-jose-gloria removed the request for review from pablo-gar September 5, 2024 19:55
@nayib-jose-gloria nayib-jose-gloria marked this pull request as ready for review September 5, 2024 20:03

@sidneymbell sidneymbell left a comment

This implementation LGTM! Very tidy. Thank you!

Note that I haven't had time to review the output files in detail, but the code looks good.

@nayib-jose-gloria nayib-jose-gloria merged commit e817586 into main Sep 12, 2024
7 of 8 checks passed
@nayib-jose-gloria nayib-jose-gloria deleted the nayib/update-feature-length branch September 12, 2024 13:47
3 participants