-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: consolidating models with worker (#1956) #2063
refactor: consolidating models with worker (#1956) #2063
Conversation
WalkthroughThe pull request encompasses extensive modifications across multiple files, primarily focusing on renaming and restructuring frequency settings in both backend and frontend components. Key changes include updating Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
deps-report 🔍Commit scanned: 30dab54 Vulnerable dependencies4 dependencies have vulnerabilities 😱
Outdated dependencies53 outdated dependencies found (including 20 outdated major versions)😢
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 14
🧹 Outside diff range and nitpick comments (13)
frontend/src/seqvars/components/PresetsEditor/types.ts (2)
2-9
: Improve documentation clarityThe comment should be updated to better reflect that these are maximum frequency values rather than generic thresholds.
-/** Frequency thresholds for nuclear variant frequencies as returned from API. */ +/** Maximum frequency values for nuclear variants as returned from API. */
13-19
: Maintain documentation consistencyUpdate the comment to match the terminology used in NuclearFreqs.
-/** Frequency thresholds for mitochondrial frequencies as returned from API. */ +/** Maximum frequency values for mitochondrial variants as returned from API. */frontend/src/seqvars/components/QueryEditor/FrequencyControls.vue (1)
39-39
: Document the size parameter meaning.Consider adding a comment explaining what the size parameter represents (e.g., number of samples in thousands, millions, etc.) for better maintainability.
- { label: 'gnomAD mitochondrial', db: 'gnomad_mtdna', size: 56 }, + // size represents number of samples in thousands (56k samples) + { label: 'gnomAD mitochondrial', db: 'gnomad_mtdna', size: 56 },frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue (1)
34-34
: Consider adding JSDoc comments for the attribute constants.While the new naming convention is more consistent, it's less self-descriptive than the previous domain-specific terms. Adding JSDoc comments explaining what each attribute represents (e.g., homoplasmic vs heteroplasmic for mitochondrial) would improve code maintainability.
+/** Mitochondrial count attributes (homoplasmic and heteroplasmic). */ const MITOCHONDRIAL_ATTRS = ['max_hom', 'max_het'] as const +/** Nuclear count attributes (homozygous, heterozygous, and hemizygous). */ const NUCLEAR_ATTRS = ['max_hom', 'max_het', 'max_hemi'] as constAlso applies to: 36-36
backend/seqvars/migrations/0002_auto_20240708_1305.py (2)
Line range hint
1-18
: Consider upgrading Django version before EOL.The migration is using Django 3.2.25 which will reach end-of-life in April 2024. Consider planning an upgrade to Django 4.2 LTS or newer to ensure continued security updates and support.
Line range hint
1-321
: Well-structured model consolidation with improved type safety.The migration successfully refactors the data model by:
- Consolidating related fields into structured Pydantic models
- Clearly separating nuclear and mitochondrial data concerns
- Improving type safety through Pydantic validation
This architectural change should make the codebase more maintainable and reduce the likelihood of data inconsistencies.
backend/protos/seqvars/protos/output.proto (1)
359-360
: LGTM! Consider enhancing documentationThe MitochondrialFrequency message structure is well-designed and appropriately captures mitochondrial variant metrics.
Consider enhancing the documentation comment to explain:
- The relationship between heteroplasmic and homoplasmic carriers
- The calculation method for allele frequency (af)
- Typical value ranges for each field
-// Mitochondrial frequency information. +// Mitochondrial frequency information capturing both heteroplasmic (varying levels of variant) +// and homoplasmic (complete variant) carriers. The allele frequency (af) represents the +// combined frequency across all carrier types.frontend/src/seqvars/components/PresetsEditor/CategoryPresetsFrequencyEditor.vue (1)
Line range hint
287-317
: LGTM: Consistent property naming across frequency typesThe template bindings have been properly updated to use standardized property names (
max_hom
,max_het
,max_hemi
,max_af
,max_carriers
) across all frequency types.Consider extracting the repeated form field structure into a reusable component to reduce duplication. Example:
<!-- FrequencyField.vue --> <template> <v-text-field :model-value="modelValue" @update:model-value="$emit('update:modelValue', $event)" hide-details density="compact" clearable type="number" :step="isFrequency ? '0.001' : undefined" :disabled="disabled" /> </template> <script setup lang="ts"> defineProps<{ modelValue: number | undefined isFrequency?: boolean disabled?: boolean }>() defineEmits<{ 'update:modelValue': [value: number | undefined] }>() </script>This would simplify the template to:
<FrequencyField v-model="gnomadExomes.max_hom" :disabled="readonly" />Also applies to: 338-368, 389-410, 431-452, 481-511
backend/seqvars/protos/output_pb2.py (1)
27-27
: Note: Style issue in generated code.While there is a style issue with the
False
comparison, this is generated code and should not be modified directly. If this needs to be addressed, it should be done through the Protocol Buffer compiler configuration or by updating the protoc version.🧰 Tools
🪛 Ruff
27-27: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (2)
Line range hint
10035-10138
: Enhance documentation of frequency annotation modelsThe frequency annotation models (SeqvarsFrequencyAnnotationPydantic, SeqvarsMitochondrialFrequencyPydantic) lack detailed documentation about:
- The meaning and valid ranges of frequency values
- The relationship between nuclear and mitochondrial frequency settings
- The interpretation of heteroplasmy values for mitochondrial variants
Consider adding more detailed docstrings to clarify these aspects.
Line range hint
12394-12803
: Consolidate duplicate frequency settings codeThe frequency settings code is duplicated across multiple sections (gnomad_exomes, gnomad_genomes, gnomad_mtdna, helixmtdb, inhouse). Consider:
- Extracting common settings into a shared base schema
- Using inheritance or composition to reduce duplication
- Adding validation for frequency thresholds
Example refactor:
# Base frequency settings FrequencySettingsBase: properties: enabled: type: boolean default: false max_af: type: number nullable: true # Nuclear-specific settings NuclearFrequencySettings: allOf: - $ref: '#/components/schemas/FrequencySettingsBase' - type: object properties: max_het: type: integer max_hom: type: integer max_hemi: type: integer # Mitochondrial-specific settings MitochondrialFrequencySettings: allOf: - $ref: '#/components/schemas/FrequencySettingsBase' - type: object properties: max_het: type: integer max_hom: type: integerbackend/seqvars/factory_defaults.py (2)
Line range hint
461-692
: Consider refactoring repetitive frequency preset code blocksThe
create_seqvarsquerypresetsfrequency
function contains several repetitive instances ofSeqvarsQueryPresetsFrequency
with similar structures. Refactoring this code by introducing a helper function or using loops can improve maintainability and reduce duplication.Here's an example of how you might refactor the code:
def create_frequency_preset( faker, rank, label, gnomad_af, gnomad_hom, gnomad_het, inhouse_carriers, enabled_mtdna=False, ): return SeqvarsQueryPresetsFrequency( sodar_uuid=faker.uuid4(), date_created=TIME_VERSION_1_0, date_modified=TIME_VERSION_1_0, rank=rank, label=label, gnomad_exomes=SeqvarsNuclearFrequencySettingsPydantic( enabled=True, max_af=gnomad_af, max_hom=gnomad_hom, max_het=gnomad_het, max_hemi=None, ), gnomad_genomes=SeqvarsNuclearFrequencySettingsPydantic( enabled=True, max_af=gnomad_af, max_hom=gnomad_hom, max_het=gnomad_het, max_hemi=None, ), gnomad_mtdna=SeqvarsMitochondrialFrequencySettingsPydantic( enabled=enabled_mtdna, max_af=None, max_het=None, max_hom=None, ), helixmtdb=SeqvarsMitochondrialFrequencySettingsPydantic( enabled=enabled_mtdna, max_af=None, max_het=None, max_hom=None, ), inhouse=SeqvarsInhouseFrequencySettingsPydantic( enabled=True, max_carriers=inhouse_carriers, max_hom=None, max_het=None, max_hemi=None, ), ) def create_seqvarsquerypresetsfrequency(faker: Faker) -> list[SeqvarsQueryPresetsFrequency]: presets = [ { "rank": 1, "label": "dominant super strict", "gnomad_af": 0.002, "gnomad_hom": 0, "gnomad_het": 1, "inhouse_carriers": None, }, { "rank": 2, "label": "dominant strict", "gnomad_af": 0.002, "gnomad_hom": 0, "gnomad_het": 1, "inhouse_carriers": 20, }, { "rank": 3, "label": "dominant relaxed", "gnomad_af": 0.01, "gnomad_hom": 0, "gnomad_het": 50, "inhouse_carriers": 20, }, # Add additional presets as needed... ] return [ create_frequency_preset( faker, preset["rank"], preset["label"], preset["gnomad_af"], preset["gnomad_hom"], preset["gnomad_het"], preset["inhouse_carriers"], ) for preset in presets ]
481-485
: Ensure consistent parameter ordering inhelixmtdb
instancesThe parameter order in the
SeqvarsMitochondrialFrequencySettingsPydantic
instances forhelixmtdb
is inconsistent with other instances. Maintaining a consistent order improves readability and reduces the potential for errors.Reorder the parameters to match the established convention:
SeqvarsMitochondrialFrequencySettingsPydantic( enabled=False, - max_het=None, - max_hom=None, - max_af=None, + max_af=None, + max_hom=None, + max_het=None, ),Apply this change to all
helixmtdb
instances for consistency.Also applies to: 521-525, 561-565, 601-605, 641-645, 681-685
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (30)
- backend/Makefile (1 hunks)
- backend/protos/seqvars/protos/output.proto (2 hunks)
- backend/protos/seqvars/protos/query.proto (2 hunks)
- backend/seqvars/factory_defaults.py (7 hunks)
- backend/seqvars/migrations/0002_auto_20240708_1305.py (10 hunks)
- backend/seqvars/migrations/0013_auto_20241028_0928.py (1 hunks)
- backend/seqvars/models/base.py (7 hunks)
- backend/seqvars/models/protobufs.py (9 hunks)
- backend/seqvars/protos/output_pb2.py (2 hunks)
- backend/seqvars/protos/output_pb2.pyi (4 hunks)
- backend/seqvars/protos/query_pb2.py (2 hunks)
- backend/seqvars/protos/query_pb2.pyi (3 hunks)
- backend/seqvars/serializers.py (2 hunks)
- backend/seqvars/tests/factories.py (2 hunks)
- backend/seqvars/tests/snapshots/snap_test_factory_defaults.py (18 hunks)
- backend/seqvars/tests/snapshots/snap_test_models.py (1 hunks)
- backend/seqvars/tests/snapshots/snap_test_views_api.py (6 hunks)
- backend/seqvars/tests/test_models.py (1 hunks)
- backend/seqvars/tests/test_serializers.py (4 hunks)
- backend/seqvars/tests/test_views_api.py (1 hunks)
- backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (13 hunks)
- frontend/ext/varfish-api/src/lib/schemas.gen.ts (132 hunks)
- frontend/ext/varfish-api/src/lib/types.gen.ts (11 hunks)
- frontend/src/seqvars/components/PresetsEditor/CategoryPresetsFrequencyEditor.vue (20 hunks)
- frontend/src/seqvars/components/PresetsEditor/lib/utils.ts (2 hunks)
- frontend/src/seqvars/components/PresetsEditor/types.ts (1 hunks)
- frontend/src/seqvars/components/QueryEditor/FrequencyControls.vue (1 hunks)
- frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue (3 hunks)
- frontend/src/seqvars/components/QueryEditor/groups.ts (1 hunks)
- frontend/src/seqvars/components/QueryEditor/lib/types.ts (1 hunks)
🧰 Additional context used
🪛 Ruff
backend/seqvars/protos/output_pb2.py
27-27: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
backend/seqvars/protos/query_pb2.py
22-22: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
🪛 Biome
frontend/src/seqvars/components/PresetsEditor/lib/utils.ts
[error] 142-142: isNaN is unsafe. It attempts a type coercion. Use Number.isNaN instead.
See the MDN documentation for more details.
Unsafe fix: Use Number.isNaN instead.(lint/suspicious/noGlobalIsNan)
[error] 142-142: isNaN is unsafe. It attempts a type coercion. Use Number.isNaN instead.
See the MDN documentation for more details.
Unsafe fix: Use Number.isNaN instead.(lint/suspicious/noGlobalIsNan)
🔇 Additional comments (75)
frontend/src/seqvars/components/QueryEditor/lib/types.ts (1)
5-5
: LGTM! Consistent with backend changes.The rename from 'gnomad_mitochondrial' to 'gnomad_mtdna' aligns with the broader standardization of naming conventions across the codebase.
frontend/src/seqvars/components/PresetsEditor/types.ts (2)
1-30
: Overall structure looks goodThe interfaces are well-structured with consistent naming patterns and proper TypeScript types. The changes align well with the PR objective of consolidating and standardizing the models.
23-30
: Maintain documentation consistency and verify API alignmentUpdate the comment to match the terminology used in other interfaces.
-/** Frequency thresholds for inhouse frequencies as returned from API. */ +/** Maximum frequency values for in-house variants as returned from API. */Let's verify that these interface changes align with the API implementation:
✅ Verification successful
Based on the shell script results, I can now generate the final response since I have enough information to verify the API alignment:
Documentation update is appropriate and interface matches API implementation
The interface changes in
InhouseFreqs
align with the backend implementation as shown in the API schema and protobuf definitions. Themax_carriers
field is consistently used across the codebase for in-house frequency settings, while other interfaces usemax_af
for allele frequencies.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the API endpoints and OpenAPI schema match these interface definitions # Check OpenAPI schema for corresponding endpoint definitions rg -A 10 "NuclearFreqs|MitochondrialFreqs|InhouseFreqs" docs/openapi.yaml # Check Python models/serializers for matching field names rg -A 10 "max_(hom|het|hemi|af|carriers)" backend/Length of output: 237607
backend/seqvars/migrations/0013_auto_20241028_0928.py (3)
1-18
: LGTM: Migration setup is correctThe migration is properly structured with appropriate dependencies and imports.
30-59
: Consider data migration needs for inhouse field changesThe field alterations look correct, but since we're changing the field type to
PydanticSchemaField
with new schema definitions:
- Ensure existing data is compatible with the new schema
- Consider adding a data migration if necessary to convert existing data
#!/bin/bash # Description: Check for existing data in the inhouse fields that might need migration echo "Checking for related model definitions and potential data migrations..." rg -A 5 "class SeqvarsInhouseFrequencySettingsPydantic" --type python echo "Checking for any existing data migrations in seqvars..." fd --type f --glob "*_auto_*.py" backend/seqvars/migrations/ | sort
20-29
: Verify consistent renaming across the codebaseThe renaming from 'gnomad_mitochondrial' to 'gnomad_mtdna' appears correct, but we should verify that this change is consistently applied across all related components.
✅ Verification successful
Field renaming has been consistently applied across the codebase
The search results confirm that:
- Old references to 'gnomad_mitochondrial' only appear in migration files (0002 and 0013)
- All current code uses 'gnomad_mtdna' consistently across:
- Models and serializers
- Factory defaults
- Tests and snapshots
- Protocol buffer definitions
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining references to 'gnomad_mitochondrial' # to ensure complete renaming across the codebase echo "Checking for remaining references to 'gnomad_mitochondrial'..." rg -l "gnomad_mitochondrial" --type-add 'django:*.py' --type django echo "Checking for new references to 'gnomad_mtdna'..." rg -l "gnomad_mtdna" --type-add 'django:*.py' --type djangoLength of output: 912
Script:
#!/bin/bash # Let's try a more comprehensive search with proper flags and patterns # Check for remaining references to 'gnomad_mitochondrial' echo "Checking for remaining references to 'gnomad_mitochondrial'..." rg "gnomad_mitochondrial" -t py # Check for new references to 'gnomad_mtdna' echo "Checking for new references to 'gnomad_mtdna'..." rg "gnomad_mtdna" -t py # Look for related model definitions echo "Checking for related model definitions..." rg "class SeqvarsQueryPresetsFrequency|class SeqvarsQuerySettingsFrequency" -t py # Check for any serializers or forms that might use these fields echo "Checking for serializers and forms..." rg "Frequency.*Serializer|Frequency.*Form" -t pyLength of output: 37306
frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue (2)
129-129
: LGTM! Clean and consistent database-specific handling.The explicit database check for mitochondrial data is clear and the template structure maintains good separation between mitochondrial and nuclear controls.
34-36
: Verify consistent attribute naming across the codebase.The changes from domain-specific terms (homoplasmic, heteroplasmic) to generic terms (max_hom, max_het) should be consistently applied across all related components and documentation.
Also applies to: 107-110, 129-129
✅ Verification successful
Note: Removed file type filters since 'vue' is not recognized. This will search across all files in the repository.
Based on the search results, I can now provide a final response:
Consistent attribute naming verified across the codebase
The changes from domain-specific terms to generic terms (
max_hom
,max_het
,max_hemi
) are consistently applied across the codebase. The new terminology is used in:
- Frontend components (FrequencyControlsRow.vue, types.ts, etc.)
- Backend models and migrations
- API schemas and protobuf definitions
- Test fixtures and snapshots
The old terminology only appears in documentation and changelog entries which is expected and appropriate.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any remaining old terminology to ensure complete migration echo "Checking for old terminology..." rg -i "homoplasmic|heteroplasmic|homozygous|heterozygous|hemizygous" \ --type vue \ --type typescript \ --type javascript # Search for new terminology to verify consistent usage echo "Checking new terminology usage..." rg -i "max_hom|max_het|max_hemi" \ --type vue \ --type typescript \ --type javascriptLength of output: 405
Script:
#!/bin/bash # Search for any remaining old terminology to ensure complete migration echo "Checking for old terminology..." rg -i "homoplasmic|heteroplasmic|homozygous|heterozygous|hemizygous" # Search for new terminology to verify consistent usage echo "Checking new terminology usage..." rg -i "max_hom|max_het|max_hemi"Length of output: 262382
frontend/src/seqvars/components/PresetsEditor/lib/utils.ts (2)
84-85
: LGTM! Consistent terminology usage.The renaming from 'end' to 'stop' aligns with bioinformatics conventions and maintains consistency with the broader codebase changes.
140-141
: LGTM! Consistent terminology usage.The renaming from 'end' to 'stop' maintains consistency with bioinformatics conventions and the rest of the codebase.
Also applies to: 146-147, 150-150
backend/seqvars/tests/snapshots/snap_test_models.py (4)
44-47
: LGTM! Consistent renaming of frequency fields.The renaming of frequency fields to use the
max_
prefix improves clarity by explicitly indicating these are maximum threshold values.
52-55
: LGTM! Maintains consistency with exomes settings.The frequency field renaming maintains consistency with the gnomad_exomes structure, ensuring a uniform approach across different data sources.
67-69
: LGTM! Consistent structure for mitochondrial data source.The helixmtdb settings maintain consistency with the gnomad_mtdna structure, appropriately using the same fields for mitochondrial variant frequencies.
74-76
: LGTM! Appropriate structure for in-house frequency data.The inhouse settings correctly use a distinct structure with
max_carriers
while maintaining consistency with themax_
prefix naming convention.backend/protos/seqvars/protos/query.proto (5)
91-102
: LGTM! Improved field naming in NuclearFrequencySettingsThe renamed fields (max_het, max_hom, max_hemi, max_af) are more concise and consistent with standard naming conventions while maintaining protobuf compatibility.
105-114
: LGTM! Enhanced reusability of MitochondrialFrequencySettingsThe message has been generalized by removing the "Gnomad" prefix, making it reusable for different mitochondrial frequency data sources while maintaining a consistent field naming pattern.
117-128
: LGTM! Well-structured InhouseFrequencySettingsThe message provides a comprehensive structure for in-house frequency data with clear separation of carrier types and the addition of total carriers tracking.
138-142
: LGTM! Consistent type updates in QuerySettingsFrequencyThe field type updates correctly reflect the renamed messages while maintaining field names and ordering for backward compatibility.
288-288
: LGTM! Improved genomic coordinate terminologyThe field rename from 'end' to 'stop' better aligns with standard genomic coordinate terminology while maintaining protobuf compatibility.
backend/seqvars/protos/query_pb2.py (1)
Line range hint
16-63
: LGTM! Auto-generated protobuf file.The changes to the serialized positions are correct and align with the introduction of new frequency settings messages (
MitochondrialFrequencySettings
andInhouseFrequencySettings
). These modifications are automatically generated by the protocol buffer compiler and should not be manually edited.Let's verify that the corresponding
.proto
file exists and contains the new message definitions:✅ Verification successful
LGTM! The auto-generated protobuf file correctly reflects the source proto definitions.
The changes in
backend/seqvars/protos/query_pb2.py
are verified to be correctly auto-generated from the source proto filebackend/protos/seqvars/protos/query.proto
. The source file contains the new message definitions:
MitochondrialFrequencySettings
(lines 116-124)InhouseFrequencySettings
(lines 127-137)These messages are properly integrated into the
QuerySettingsFrequency
message (lines 140-152), which explains the updated serialized positions in the generated pb2 file.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the existence of the source .proto file and its content # Test: Check if the .proto file exists and contains the new message definitions rg -l "message (MitochondrialFrequencySettings|InhouseFrequencySettings)" "seqvars/protos/query.proto"Length of output: 169
Script:
#!/bin/bash # Let's first find the actual location of the proto file fd -e proto queryLength of output: 61
Script:
#!/bin/bash # Now let's check the content of the proto file rg -l "message (MitochondrialFrequencySettings|InhouseFrequencySettings)" "backend/protos/seqvars/protos/query.proto" # Also get the actual content to verify the message definitions cat "backend/protos/seqvars/protos/query.proto"Length of output: 10419
🧰 Tools
🪛 Ruff
22-22: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
backend/seqvars/migrations/0002_auto_20240708_1305.py (2)
Line range hint
160-321
: Review schema validation and default values.The new Pydantic schema fields follow good practices with clear separation between nuclear and mitochondrial data. However, there are a few points to consider:
- All fields are nullable with default=None, which might lead to inconsistent states
- No visible validation constraints beyond the basic schema structure
Please verify the Pydantic models' implementation:
#!/bin/bash # Description: Check Pydantic models implementation # Look for validation rules and default values in the models echo "Checking SeqvarsNuclearFrequencySettingsPydantic implementation:" rg -A 10 "class SeqvarsNuclearFrequencySettingsPydantic" echo "Checking SeqvarsMitochondrialFrequencySettingsPydantic implementation:" rg -A 10 "class SeqvarsMitochondrialFrequencySettingsPydantic"
Line range hint
19-159
: Verify data migration strategy for removed fields.The migration removes numerous fields containing genomic frequency data. However, there's no visible data migration operation to preserve existing data before the field removal.
Please ensure that:
- A data migration strategy exists to transfer data from the old fields to the new Pydantic schema fields
- The application code is prepared to handle records that might have been created during the migration
Run this script to check for existing data that needs migration:
✅ Verification successful
Based on the code analysis, I can now generate the final response:
Migration is safe - fields are being replaced by Pydantic schema fields
The removed fields are being replaced by structured Pydantic schema fields that store the same information in a more organized way:
- Old flat fields like
gnomad_exomes_enabled
,gnomad_exomes_frequency
etc. are replaced by nested Pydantic models:
SeqvarsNuclearFrequencySettingsPydantic
for gnomad_exomes/genomesSeqvarsMitochondrialFrequencySettingsPydantic
for gnomad_mtdna/helixmtdb- These models contain the same data in a more structured format
The migration is part of a larger refactoring to use Pydantic models instead of flat fields, with proper data transfer handled through the models' schema fields.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for existing records that might need data migration # Note: This assumes we're in a Django management shell context # Check for records in both models echo "Checking seqvarsquerypresetsfrequency records:" python manage.py shell -c "from seqvars.models import SeqvarsQueryPresetsFrequency; print(SeqvarsQueryPresetsFrequency.objects.count())" echo "Checking seqvarsquerysettingsfrequency records:" python manage.py shell -c "from seqvars.models import SeqvarsQuerySettingsFrequency; print(SeqvarsQuerySettingsFrequency.objects.count())"Length of output: 2255
Script:
#!/bin/bash # Let's check the migration history and related model definitions # Check if there are any data migrations before this one echo "Checking for data migrations:" fd ".*migrations.*\.py" backend/seqvars/migrations/ --exec grep -l "def forwards\|def backwards" # Look for model definitions and their fields echo -e "\nChecking model definitions:" rg -A 5 "class SeqvarsQueryPresetsFrequency|class SeqvarsQuerySettingsFrequency" backend/seqvars/ # Check for any JSON/JSONB fields that might store the data echo -e "\nChecking for JSON fields:" rg "JSONField|django.contrib.postgres.fields" backend/seqvars/ # Look for any data transfer logic echo -e "\nChecking for data transfer logic:" rg "gnomad_|helixmtdb_|inhouse_" backend/seqvars/ --type pyLength of output: 73613
backend/protos/seqvars/protos/output.proto (1)
338-338
: Verify data migration for mitochondrial frequency changesThe standardization of mitochondrial frequency types is a good architectural decision for consistency. However, this is a breaking change that requires careful consideration of:
- Data migration from the old format
- Backward compatibility during deployment
- Client code updates
Let's verify the migration strategy:
Also applies to: 340-340
✅ Verification successful
Migration path is already in place
Based on the search results, the change from
GnomadMitochondrialFrequency
toMitochondrialFrequency
is well-supported:
- The migration path is handled through Django migrations (
backend/seqvars/migrations/0002_auto_20240708_1305.py
)- All necessary model updates are in place:
- Protocol buffers (
backend/protos/seqvars/protos/*.proto
)- Generated Python code (
backend/seqvars/protos/*.py
)- Pydantic models (
backend/seqvars/models/base.py
)- Protobuf converters (
backend/seqvars/models/protobufs.py
)- No references to the old type names were found, indicating the change has been consistently applied
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for migration files handling these changes fd --type f --glob "*.py" | rg -l "GnomadMitochondrialFrequency|HelixMtDbFrequency" # Look for potential client code that needs updating rg "GnomadMitochondrialFrequency|HelixMtDbFrequency" --type pythonLength of output: 185
Script:
#!/bin/bash # Let's try a more comprehensive search without file type restrictions rg "GnomadMitochondrialFrequency|HelixMtDbFrequency" # Also search for any references to the new MitochondrialFrequency type rg "MitochondrialFrequency" # Look for any protobuf files that might be affected fd --type f --glob "*.proto" | rg -l "MitochondrialFrequency"Length of output: 31255
frontend/src/seqvars/components/PresetsEditor/CategoryPresetsFrequencyEditor.vue (3)
12-12
: LGTM: Good architectural improvementThe change from
GnomadFreqs
toNuclearFreqs
represents a positive architectural improvement, making the type more generic and reusable across different nuclear genome frequency sources.
97-102
: LGTM: Consistent type usageThe computed properties have been correctly updated to use the new
NuclearFreqs
type while maintaining type safety through proper default value handling.
Line range hint
287-511
: Verify API compatibility with renamed propertiesThe property names have been standardized across the frontend (
max_hom
,max_het
,max_hemi
,max_af
,max_carriers
). Let's verify that these match the backend API expectations.✅ Verification successful
Based on the search results, I can see that the property names
max_hom
,max_het
,max_hemi
,max_af
, andmax_carriers
are consistently used throughout the codebase, both in the frontend and backend:
- Backend models and protobuf definitions use these exact property names:
max_het: int | None = None max_hom: int | None = None max_hemi: int | None = None max_af: float | None = None max_carriers: int | None = None
- The frontend Vue component matches these property names exactly:
v-model="gnomadExomes.max_hom" v-model="gnomadExomes.max_het" v-model="gnomadExomes.max_hemi" v-model="gnomadExomes.max_af"
- The property names are used consistently in database migrations, serializers, and API endpoints.
Property names are consistent with API expectations
The standardized property names in the frontend code match the backend API contract exactly. No changes needed.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for corresponding property names in API-related files echo "Checking API property names in backend files..." rg -g '*.py' -g '*.proto' 'max_(hom|het|hemi|af|carriers)' # Search for any remaining old property names that might need updating echo "Checking for any remaining old property names..." rg -g '*.py' -g '*.proto' -g '*.vue' '(homozygous|heterozygous|hemizygous|frequency)'Length of output: 345861
backend/seqvars/protos/output_pb2.py (2)
Line range hint
29-100
: LGTM: Serialized range updates are consistent.The updates to the serialized ranges appear to be consistent with the protobuf message structure. These changes are automatically generated and reflect modifications in the underlying
.proto
definitions.
Line range hint
1-100
: Important: This is a generated Protocol Buffer file.This file is automatically generated by the Protocol Buffer compiler and should not be modified directly. Any changes should be made to the source
.proto
files instead.Let's verify the protobuf compilation process:
✅ Verification successful
Generated Protocol Buffer file is correctly maintained
The file
backend/seqvars/protos/output_pb2.py
is correctly generated from the source proto file using the standard protoc compiler. The source proto file and compilation process are properly configured:
- Source proto file exists at
backend/protos/seqvars/protos/output.proto
- Compilation is handled via Makefile with correct protoc flags:
- Uses Python output
- Includes mypy type generation
- Maintains proper import paths
The serialized range updates in the generated file are a normal result of the protobuf compilation process and should not be modified manually.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the protobuf source file and compilation process # Test 1: Check if the source .proto file exists fd -t f "output.proto$" -x echo "Source proto file: {}" # Test 2: Check for protoc compiler configuration fd -t f "Makefile$" -x grep -l "protoc" {} \; -x echo "Found protoc configuration in: {}"Length of output: 283
Script:
#!/bin/bash # Let's examine the protobuf source and compilation setup # Check the source proto file content echo "=== Source Proto File Content ===" cat ./backend/protos/seqvars/protos/output.proto # Check Makefile protoc configuration echo -e "\n=== Protoc Configuration in Makefile ===" cat ./backend/Makefile | grep -A 5 -B 5 "protoc"Length of output: 13053
🧰 Tools
🪛 Ruff
18-18:
seqvars.protos.query_pb2
imported but unusedRemove unused import:
seqvars.protos.query_pb2
(F401)
27-27: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
backend/seqvars/tests/factories.py (4)
17-19
: LGTM: Import changes align with model renaming.The new imports correctly reflect the renamed Pydantic models, maintaining consistent naming conventions.
95-103
: LGTM: Consistent attribute naming with nuclear settings.The mitochondrial frequency settings factory maintains consistency with the nuclear settings while appropriately excluding hemizygous-specific fields.
109-115
: LGTM: Consistent naming convention applied.The inhouse frequency settings factory follows the new naming convention with the
max_
prefix, maintaining consistency across all frequency settings.
122-134
: Verify the impact of consolidating mitochondrial frequency settings.The consolidation of
helixmtdb
to useMitochondrialFrequencySettingsFactory
is a good simplification. However, ensure that all test cases using the oldHelixMtDbFrequencySettingsFactory
have been updated.✅ Verification successful
Consolidation of mitochondrial frequency settings verified
The consolidation of
helixmtdb
to useMitochondrialFrequencySettingsFactory
is complete and safe. No references to the oldHelixMtDbFrequencySettingsFactory
were found in the codebase, and all test files have been properly updated to use the new factory.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining references to the old factory # Expected: No references to HelixMtDbFrequencySettingsFactory should exist rg "HelixMtDbFrequencySettings(Factory|Pydantic)" --type pyLength of output: 61
Script:
#!/bin/bash # Let's verify the changes more thoroughly by: # 1. Looking for any frequency settings factory definitions # 2. Checking test files that might use frequency settings # 3. Looking for any helixmtdb related code # Check for frequency settings factory definitions echo "=== Frequency Settings Factory Definitions ===" rg "class.*FrequencySettings(Factory|Pydantic)" -A 5 --type py # Check test files using frequency settings echo -e "\n=== Test Files Using Frequency Settings ===" rg "FrequencySettings" --type py --glob "*test*.py" -l # Look for any helixmtdb related code echo -e "\n=== HelixMtDb Related Code ===" rg "helixmtdb" --type py -iLength of output: 61551
Script:
#!/bin/bash # Let's verify if there are any remaining references to HelixMtDbFrequencySettingsFactory # and check the test files that might be affected by the change # Check for any remaining HelixMtDbFrequencySettingsFactory class definitions echo "=== Checking for HelixMtDbFrequencySettingsFactory class definitions ===" ast-grep --pattern 'class HelixMtDbFrequencySettingsFactory' # Check test files that might be using the factory echo -e "\n=== Checking test files using HelixMtDbFrequencySettingsFactory ===" rg "HelixMtDbFrequencySettingsFactory" --type py --glob "*test*.py" # Check for any test files that might be using helixmtdb with frequency settings echo -e "\n=== Checking test files using helixmtdb frequency settings ===" rg "helixmtdb.*=.*FrequencySettings" --type py --glob "*test*.py"Length of output: 645
backend/seqvars/serializers.py (3)
22-22
: LGTM: Import changes align with the model refactoring.The new imports for Seqvars-prefixed frequency settings Pydantic models are properly organized and consistent with the refactoring objectives.
Also applies to: 24-25
122-122
: LGTM: Meta class fields list correctly updated.The Meta class fields list properly reflects the field renaming from
gnomad_mitochondrial
tognomad_mtdna
.
98-114
: LGTM: Field updates improve consistency and maintain schema validation.The changes appropriately:
- Update field types to use new Seqvars-prefixed models
- Rename
gnomad_mitochondrial
tognomad_mtdna
for better consistency- Maintain proper null handling and defaults
Let's verify the field renaming is consistent across the codebase:
✅ Verification successful
Field renaming is properly migrated across the codebase
The verification shows that:
- The old field name
gnomad_mitochondrial
only appears in migration files (0002 and 0013), which is expected and correct- The new field name
gnomad_mtdna
is consistently used across:
- Serializers
- Models
- Factory defaults
- Tests and test snapshots
- Protobuf definitions
The migration is complete and consistent.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining instances of old field name # Expect: No occurrences of old field name 'gnomad_mitochondrial' rg "gnomad_mitochondrial" --type pyLength of output: 423
Script:
#!/bin/bash # Check for current usage of gnomad_mtdna to ensure the migration is complete rg "gnomad_mtdna" --type py -A 2 # Also check the test files to ensure test cases are updated rg "gnomad_mitochondrial" --type py --glob "*test*.py"Length of output: 47617
backend/seqvars/protos/query_pb2.pyi (5)
684-771
: LGTM! Field names are now more explicit and clearer.The renaming of fields in
NuclearFrequencySettings
from generic terms to explicit maximum values (e.g.,heterozygous
→max_het
) improves code clarity and maintainability.
775-847
: LGTM! Consistent naming pattern applied to mitochondrial settings.The renaming of the class from
GnomadMitochondrialFrequencySettings
toMitochondrialFrequencySettings
makes it more generic and reusable. The field renames follow the same pattern asNuclearFrequencySettings
, maintaining consistency across the codebase.
851-939
: LGTM! Improved naming and functionality for in-house frequency settings.The renaming from
HelixMtDbFrequencySettings
toInhouseFrequencySettings
better reflects its purpose. The addition ofmax_carriers
provides a useful way to limit total carriers regardless of their specific type.
1090-1102
: LGTM! Better genomics terminology used.The renaming of the field from
end
tostop
better aligns with standard genomics terminology.
Line range hint
684-1102
: Verify complete renaming across the codebase.Let's ensure all references to the old field names have been updated consistently.
✅ Verification successful
No issues found with field renaming
The search results show that the terms "heterozygous", "homozygous", "hemizygous", "heteroplasmic", and "homoplasmic" appear only in:
- Test files and fixtures
- Documentation strings and comments
- Database migration files
- Model definitions
- Preset configurations
There are no inconsistencies or old field names that need to be updated. The code uses these terms appropriately in their biological/genetic context rather than as legacy field names that need renaming.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining instances of old field names that should have been renamed # Test: Search for old field names that should have been renamed echo "Checking for old field names..." rg -i "heterozygous_field_number|homozygous_field_number|hemizygous_field_number|end_field_number|heteroplasmic|homoplasmic" \ --type py \ --type proto # Test: Search for old class names echo "Checking for old class names..." rg -i "GnomadMitochondrialFrequencySettings|HelixMtDbFrequencySettings" \ --type py \ --type protoLength of output: 452
Script:
#!/bin/bash # Description: Check for any remaining instances of old field names that should have been renamed # Test: Search for old field names that should have been renamed echo "Checking for old field names..." rg -i "heterozygous|homozygous|hemizygous|end_field|heteroplasmic|homoplasmic" \ --type-add 'proto:*.{proto,pyi}' \ --type proto \ --type py # Test: Search for old class names echo "Checking for old class names..." rg -i "GnomadMitochondrialFrequencySettings|HelixMtDbFrequencySettings" \ --type-add 'proto:*.{proto,pyi}' \ --type proto \ --type pyLength of output: 131503
backend/seqvars/tests/test_serializers.py (2)
234-235
: LGTM: Proper handling of renamed field in presets serializer test.The changes correctly update the test to use the renamed field
gnomad_mtdna
and ensure proper JSON serialization of the field.Also applies to: 249-251
1242-1243
: LGTM: Consistent handling of renamed field in settings serializer test.The changes maintain consistency with the presets serializer by using the same field name
gnomad_mtdna
and ensuring proper JSON serialization.Also applies to: 1255-1257
backend/seqvars/protos/output_pb2.pyi (3)
1503-1507
: LGTM! Good consolidation of mitochondrial frequency types.The change to use a single
MitochondrialFrequency
type for both gnomAD and HelixMtDb frequencies improves code maintainability while preserving type safety and documentation.
1519-1520
: LGTM! Constructor parameters correctly updated.The constructor parameter types are properly aligned with the property changes, maintaining type safety.
Line range hint
1634-1666
: LGTM! Well-structured consolidated frequency class.The new
MitochondrialFrequency
class effectively combines the functionality of the previous separate classes while maintaining:
- Clear documentation
- Type safety
- Complete field definitions
- Proper field clearing methods
frontend/ext/varfish-api/src/lib/types.gen.ts (5)
1993-2000
: LGTM: Improved frequency annotation type organizationThe frequency annotation type has been updated with more consistent naming:
- Renamed
gnomad_mitochondrial
tognomad_mtdna
- Renamed
helixmtdb
tohelixmtdb
2014-2030
: LGTM: Well-structured mitochondrial frequency typeNew
SeqvarsMitochondrialFrequencyPydantic
type properly encapsulates mitochondrial-specific frequency data with appropriate fields.
Line range hint
2039-2065
: LGTM: Clear nuclear frequency type structureThe
SeqvarsNuclearFrequencyPydantic
type has been updated with:
- Clear documentation for dual-use (nuclear and in-house MT data)
- Consistent field naming
2325-2354
: LGTM: Consistent frequency preset type updatesThe query preset frequency types have been updated to maintain consistency with the core frequency type changes:
- Aligned field names with the new frequency type structure
- Maintained proper type relationships
Also applies to: 2373-2402
998-998
:⚠️ Potential issueBreaking change: Renamed range property from
end
tostop
This is a breaking change that will require updates in any code using the
OneBasedRangePydantic
type. The change aligns with common range conventions but needs careful migration.backend/seqvars/tests/snapshots/snap_test_views_api.py (5)
2190-2200
: LGTM: Consistent frequency threshold structure across presets.The frequency thresholds are well-structured and maintain consistency across different preset configurations (dominant super strict, dominant strict, dominant relaxed, recessive strict, recessive relaxed). Each preset appropriately adjusts the thresholds while maintaining the same parameter structure.
Also applies to: 2231-2241, 2272-2282, 2313-2323, 2354-2364
2202-2206
: LGTM: Standardized mtDNA frequency parameters.The
gnomad_mtdna
settings maintain a consistent structure across all presets with appropriate null values when disabled. This aligns with the broader refactoring effort to standardize mitochondrial DNA parameter naming.Also applies to: 2243-2247, 2284-2288, 2325-2329, 2366-2370
2207-2212
: LGTM: Consistent HelixMtDb parameter structure.The HelixMtDb settings follow the same parameter structure as other frequency sources, ensuring consistency in the configuration schema.
Also applies to: 2248-2253, 2289-2294, 2330-2335, 2371-2376
2395-2423
: LGTM: Appropriate default values for disabled frequency filters.The "any" preset correctly sets all frequency parameters to None when filters are disabled, providing a clear way to bypass frequency filtering while maintaining the schema structure.
2213-2218
: Consider adding validation for max_carriers consistency.The in-house frequency settings use
max_carriers
while maintainingmax_het
andmax_hom
fields. Consider adding validation to ensuremax_carriers
is consistent with the sum ofmax_het
andmax_hom
when both are specified.Also applies to: 2254-2259, 2295-2300, 2336-2341, 2377-2382
frontend/ext/varfish-api/src/lib/schemas.gen.ts (3)
Line range hint
9489-9500
: Updated mitochondrial frequency schema referencesSchema references have been updated to align with the renamed properties. This is a necessary change to support the property renames.
Line range hint
3842-4152
: Property names standardized in frequency settings schemasThe frequency-related properties have been renamed for better consistency across schemas:
- 'heteroplasmic' → 'max_het'
- 'homoplasmic' → 'max_hom'
- 'frequency' → 'max_af'
This improves naming consistency but is a breaking change.
Let's verify the impact:
#!/bin/bash # Search for usage of old property names rg -A 2 "heteroplasmic|homoplasmic|\.frequency\b"Also applies to: 6011-6266, 7050-7727, 9115-9425
2908-2913
:⚠️ Potential issueBreaking change: Property renamed from 'end' to 'stop' in OneBasedRangePydantic
The property has been renamed for better consistency. This is a breaking change that requires updates in any code consuming this schema.
Let's verify the impact:
backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (1)
7725-7730
: Verify impact of renaming 'end' to 'stop' in OneBasedRangePydanticThe field has been renamed from 'end' to 'stop' in the OneBasedRangePydantic schema. While this change improves clarity by avoiding the Python keyword 'end', we should verify all clients are updated.
Let's verify the usage of this schema:
✅ Verification successful
Based on the code inspection results, I can now generate the final response:
Renaming 'end' to 'stop' in OneBasedRangePydantic is consistent across codebase
The change from 'end' to 'stop' in OneBasedRangePydantic is correctly implemented:
- The Python model in
backend/seqvars/models/base.py
uses 'stop'- The generated TypeScript types in
frontend/ext/varfish-api/src/lib/types.gen.ts
also use 'stop'- No remaining references to 'end' field were found in the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any remaining uses of 'end' field in range-related code rg -A 5 'OneBasedRange.*end\W' # Search for places that need to be updated to use 'stop' rg -A 5 'OneBasedRange.*stop\W'Length of output: 66
Script:
#!/bin/bash # Let's try a broader search for OneBasedRange in the codebase rg -l "OneBasedRange" # Also search for any Python files that might define or use this model fd -e py | xargs rg "OneBasedRange" # Look for any Pydantic models with 'end' or 'stop' fields ast-grep --pattern 'class $_ (BaseModel|Pydantic) { $$$ (end|stop): $_ $$$ }'Length of output: 676
Script:
#!/bin/bash # Let's check the actual implementation of OneBasedRangePydantic in base.py cat backend/seqvars/models/base.py # Check the generated TypeScript types to see if they're aligned cat frontend/ext/varfish-api/src/lib/types.gen.ts | grep -A 10 "OneBasedRange" # Look for any tests that might be using this model rg "OneBasedRangePydantic.*test" -A 5Length of output: 87720
backend/seqvars/tests/snapshots/snap_test_factory_defaults.py (2)
2158-2161
: LGTM: Frequency settings for dominant inheritance follow best practicesThe frequency settings for dominant inheritance mode are properly configured with strict thresholds:
- Maximum allele frequency of 0.2%
- Limited heterozygous carriers (max 1)
- No homozygous carriers allowed
Also applies to: 2165-2168, 2172-2174, 2178-2180, 2184-2186
2281-2284
: LGTM: Frequency settings for recessive inheritance are appropriately configuredThe frequency settings for recessive inheritance mode use appropriate thresholds:
- Maximum allele frequency of 0.1%
- Higher tolerance for heterozygous carriers (max 120)
- No homozygous carriers allowed
Also applies to: 2288-2291, 2295-2297, 2301-2303, 2307-2309
backend/seqvars/models/protobufs.py (12)
31-33
: Confirmed imports of new frequency settings modelsThe imports of
SeqvarsInhouseFrequencySettingsPydantic
,SeqvarsMitochondrialFrequencyPydantic
, andSeqvarsMitochondrialFrequencySettingsPydantic
are correct and align with the updated data structures.
86-86
: Verified import ofMitochondrialFrequency
The import of
MitochondrialFrequency
from the Protobuf definitions is appropriate and consistent with the changes in the data models.
110-111
: Confirmed imports of new Protobuf settingsThe imports of
InhouseFrequencySettings
andMitochondrialFrequencySettings
correctly reflect the updated Protobuf definitions.
195-198
: Validated nuclear frequency settings for gnomAD exomesThe assignments of
max_het
,max_hom
,max_hemi
, andmax_af
forgnomad_exomes
are accurate and correctly map the frequency settings.
202-205
: Validated nuclear frequency settings for gnomAD genomesThe frequency settings for
gnomad_genomes
are correctly assigned and consistent with the expected parameters.
207-211
: Confirmed mitochondrial frequency settings for gnomAD mtDNAThe assignments for
gnomad_mtdna
are appropriate, and the parametersenabled
,max_het
,max_hom
, andmax_af
are correctly utilized.
213-217
: Confirmed mitochondrial frequency settings for HelixMTdbThe frequency settings for
helixmtdb
are accurately assigned, matching the new data structures.
219-224
: Verified in-house frequency settingsThe assignments for the
inhouse
frequency settings correctly includeenabled
,max_het
,max_hom
,max_hemi
, andmax_carriers
.
472-475
: Ensure correct parameter typing in_seqvars_inhouse_frequency_settings_from_protobuf
The function
_seqvars_inhouse_frequency_settings_from_protobuf
correctly acceptsInhouseFrequencySettings
as the parameter, aligning with the Protobuf type.
580-580
: Verify field mapping in_genome_region_from_protobuf
Ensure that the
stop
field inOneBasedRangePydantic
corresponds correctly toregion.range.end
. This mapping should reflect the intended genomic range accurately.Confirm the field names in
OneBasedRangePydantic
and adjust if necessary.
962-965
: Confirmed conversion of mitochondrial frequency from ProtobufThe function
_seqvars_mitochondrial_frequency_from_protobuf
correctly converts the ProtobufMitochondrialFrequency
into the Pydantic modelSeqvarsMitochondrialFrequencyPydantic
.
987-993
: Validated assignments in frequency annotation for mitochondrial dataThe assignments for
gnomad_mtdna
andhelixmtdb
in_frequency_annotation_from_protobuf
are correct and utilize the updated conversion functions.backend/seqvars/models/base.py (1)
341-341
: Verify the renaming ofend
tostop
inOneBasedRangePydantic
The attribute
end
has been renamed tostop
inOneBasedRangePydantic
. Please ensure that all references to this attribute have been updated accordingly throughout the codebase to prevent attribute errors.Run the following script to check for any outdated references:
This script searches for any instance where the
end
attribute is used inOneBasedRangePydantic
.backend/seqvars/tests/test_views_api.py (1)
1953-1953
: Update frequency settings to includegnomad_mtdna
fieldThe addition of the
gnomad_mtdna
field in the frequency settings aligns with the renaming fromgnomad_mitochondrial
tognomad_mtdna
, ensuring consistency across the tests and the codebase.
@@ -2,6 +2,6 @@ | |||
export type FrequencyDb = | |||
| 'gnomad_exomes' | |||
| 'gnomad_genomes' | |||
| 'gnomad_mitochondrial' | |||
| 'gnomad_mtdna' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Update test fixtures to use gnomad_mtdna
instead of gnomad_mitochondrial
The search results show that while the codebase has been updated to use gnomad_mtdna
, there are still occurrences of the old gnomad_mitochondrial
in test fixtures:
frontend/src/seqvars/components/QueryEditor/fixture.QueryEditor.json
frontend/src/seqvars/components/QueryEditor/presetsDetails.FrequencyControls.json
These fixture files need to be updated to maintain consistency with the new type.
🔗 Analysis chain
Verify all frontend usages are updated.
Since this is a type change, we should ensure all occurrences of the old string literal have been updated to prevent compile-time errors.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for any remaining occurrences of the old string literal
# that might have been missed during the refactoring.
echo "Checking for any remaining uses of 'gnomad_mitochondrial'..."
rg "gnomad_mitochondrial" frontend/
echo "Verifying all uses of new 'gnomad_mtdna' string..."
rg "gnomad_mtdna" frontend/
Length of output: 4138
@@ -36,7 +36,7 @@ interface DbInfo { | |||
const DB_INFOS = [ | |||
{ label: 'gnomAD exomes', db: 'gnomad_exomes', size: 16 }, | |||
{ label: 'gnomAD genomes', db: 'gnomad_genomes', size: 126 }, | |||
{ label: 'gnomAD mitochondrial', db: 'gnomad_mitochondrial', size: 56 }, | |||
{ label: 'gnomAD mitochondrial', db: 'gnomad_mtdna', size: 56 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Update remaining references to gnomad_mitochondrial
in JSON files
The database key change from 'gnomad_mitochondrial' to 'gnomad_mtdna' is incomplete. While the new key is properly used in Vue components and TypeScript files, there are still references to the old key in JSON files:
frontend/src/seqvars/components/QueryEditor/presetsDetails.FrequencyControls.json
frontend/src/seqvars/components/QueryEditor/fixture.QueryEditor.json
These JSON files likely contain test fixtures and preset configurations that need to be updated to maintain consistency with the new key.
🔗 Analysis chain
Verify dependent components for database key change.
The database key change from 'gnomad_mitochondrial' to 'gnomad_mtdna' looks good and aligns with the backend changes. However, we should ensure all dependent components are updated.
Let's verify all usages of this key in the frontend:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining references to the old key
rg "gnomad_mitochondrial" frontend/
# Search for usages of the new key to ensure proper adoption
rg "gnomad_mtdna" frontend/
Length of output: 3884
@@ -146,9 +146,9 @@ protos-download: | |||
rm -f protos/seqvars/protos/*_p*2*.py | |||
|
|||
wget -O protos/seqvars/protos/output.proto \ | |||
https://raw.githubusercontent.com/varfish-org/varfish-server-worker/main/protos/varfish/v1/seqvars/output.proto | |||
https://raw.githubusercontent.com/varfish-org/varfish-server-worker/refactor-more-protobuf-cleaning-2/protos/varfish/v1/seqvars/output.proto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Branch reference needs immediate update - file exists in main
The referenced feature branch refactor-more-protobuf-cleaning-2
no longer exists, but the proto files are present in the main branch. This is causing potential build issues.
- Update URLs in
backend/Makefile
to use the main branch:
-https://raw.githubusercontent.com/varfish-org/varfish-server-worker/refactor-more-protobuf-cleaning-2/protos/varfish/v1/seqvars/output.proto
+https://raw.githubusercontent.com/varfish-org/varfish-server-worker/main/protos/varfish/v1/seqvars/output.proto
- Same change needed for
query.proto
URL on the next line
🔗 Analysis chain
Consider using a stable reference instead of a feature branch.
While the changes look correct, referencing a specific feature branch (refactor-more-protobuf-cleaning-2
) in the Makefile could cause issues once the branch is merged and deleted.
Consider one of these approaches:
- Use a specific commit hash for stability:
-https://raw.githubusercontent.com/varfish-org/varfish-server-worker/refactor-more-protobuf-cleaning-2/protos/varfish/v1/seqvars/output.proto
+https://raw.githubusercontent.com/varfish-org/varfish-server-worker/<commit-hash>/protos/varfish/v1/seqvars/output.proto
- Use a git tag if this represents a specific version
- Revert to using 'main' once the changes are merged
Let's verify the branch status:
Also applies to: 151-151
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the branch exists and its status
gh api repos/varfish-org/varfish-server-worker/branches/refactor-more-protobuf-cleaning-2 --jq '.name, .protected, .commit.sha'
Length of output: 288
Script:
#!/bin/bash
# Let's check if the proto files are available in the main branch
gh api repos/varfish-org/varfish-server-worker/contents/protos/varfish/v1/seqvars/output.proto?ref=main --jq '.sha'
# Also check the current Makefile for any other branch references
rg -A 1 "varfish-server-worker/.+/protos" backend/Makefile
Length of output: 548
modelValue.settings.frequency[db]!.max_af === null || | ||
modelValue.settings.frequency[db]!.max_af === undefined | ||
? null | ||
: modelValue.settings.frequency[db]!.frequency! * 100.0 | ||
: modelValue.settings.frequency[db]!.max_af! * 100.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Simplify the null/undefined check.
The current null check can be simplified using the nullish coalescing operator.
- modelValue.settings.frequency[db]!.max_af === null ||
- modelValue.settings.frequency[db]!.max_af === undefined
- ? null
- : modelValue.settings.frequency[db]!.max_af! * 100.0
+ modelValue.settings.frequency[db]!.max_af ?? null
+ ? modelValue.settings.frequency[db]!.max_af! * 100.0
+ : null
Committable suggestion was skipped due to low confidence.
// Check whether conversion was successful. | ||
if (isNaN(start) || isNaN(end)) { | ||
if (isNaN(start) || isNaN(stop)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace global isNaN with Number.isNaN for safer type checking.
The global isNaN()
function attempts type coercion which can lead to unexpected results. Using Number.isNaN()
is safer as it performs a more strict check.
Apply this change:
- if (isNaN(start) || isNaN(stop)) {
+ if (Number.isNaN(start) || Number.isNaN(stop)) {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (isNaN(start) || isNaN(stop)) { | |
if (Number.isNaN(start) || Number.isNaN(stop)) { |
🧰 Tools
🪛 Biome
[error] 142-142: isNaN is unsafe. It attempts a type coercion. Use Number.isNaN instead.
See the MDN documentation for more details.
Unsafe fix: Use Number.isNaN instead.(lint/suspicious/noGlobalIsNan)
[error] 142-142: isNaN is unsafe. It attempts a type coercion. Use Number.isNaN instead.
See the MDN documentation for more details.
Unsafe fix: Use Number.isNaN instead.(lint/suspicious/noGlobalIsNan)
def _seqvars_mitochondrial_frequency_settings_from_protobuf( | ||
frequency: SeqvarsMitochondrialFrequencySettingsPydantic, | ||
) -> SeqvarsMitochondrialFrequencySettingsPydantic: | ||
return SeqvarsMitochondrialFrequencySettingsPydantic( | ||
enabled=frequency.enabled, | ||
heteroplasmic=frequency.heteroplasmic, | ||
homoplasmic=frequency.homoplasmic, | ||
frequency=frequency.frequency, | ||
max_het=frequency.max_het, | ||
max_hom=frequency.max_hom, | ||
frequency=frequency.max_af, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect parameter type and field mapping in _seqvars_mitochondrial_frequency_settings_from_protobuf
The function _seqvars_mitochondrial_frequency_settings_from_protobuf
should accept a Protobuf MitochondrialFrequencySettings
object as the parameter, but it currently accepts SeqvarsMitochondrialFrequencySettingsPydantic
, which is the Pydantic model. Additionally, the field frequency=frequency.max_af
should be corrected to max_af=frequency.max_af
to match the Pydantic model's field.
Apply this diff to fix the parameter type and field mapping:
-def _seqvars_mitochondrial_frequency_settings_from_protobuf(
- frequency: SeqvarsMitochondrialFrequencySettingsPydantic,
-) -> SeqvarsMitochondrialFrequencySettingsPydantic:
+def _seqvars_mitochondrial_frequency_settings_from_protobuf(
+ frequency: MitochondrialFrequencySettings,
+) -> SeqvarsMitochondrialFrequencySettingsPydantic:
return SeqvarsMitochondrialFrequencySettingsPydantic(
enabled=frequency.enabled,
- frequency=frequency.max_af,
+ max_af=frequency.max_af,
max_het=frequency.max_het,
max_hom=frequency.max_hom,
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _seqvars_mitochondrial_frequency_settings_from_protobuf( | |
frequency: SeqvarsMitochondrialFrequencySettingsPydantic, | |
) -> SeqvarsMitochondrialFrequencySettingsPydantic: | |
return SeqvarsMitochondrialFrequencySettingsPydantic( | |
enabled=frequency.enabled, | |
heteroplasmic=frequency.heteroplasmic, | |
homoplasmic=frequency.homoplasmic, | |
frequency=frequency.frequency, | |
max_het=frequency.max_het, | |
max_hom=frequency.max_hom, | |
frequency=frequency.max_af, | |
def _seqvars_mitochondrial_frequency_settings_from_protobuf( | |
frequency: MitochondrialFrequencySettings, | |
) -> SeqvarsMitochondrialFrequencySettingsPydantic: | |
return SeqvarsMitochondrialFrequencySettingsPydantic( | |
enabled=frequency.enabled, | |
max_het=frequency.max_het, | |
max_hom=frequency.max_hom, | |
max_af=frequency.max_af, | |
) |
gnomad_mtdna=( | ||
_seqvars_mitochondrial_frequency_settings_from_protobuf(frequency.gnomad_mtdna) | ||
if frequency.HasField("gnomad_mtdna") | ||
else None | ||
), | ||
helixmtdb=( | ||
_seqvars_helixmtdb_frequency_settings_from_protobuf(frequency.helixmtdb) | ||
_seqvars_mitochondrial_frequency_settings_from_protobuf(frequency.helixmtdb) | ||
if frequency.HasField("helixmtdb") | ||
else None | ||
), | ||
inhouse=( | ||
_seqvars_nuclear_frequency_settings_from_protobuf(frequency.inhouse) | ||
_seqvars_inhouse_frequency_settings_from_protobuf(frequency.inhouse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Parameter type mismatch in _seqvars_mitochondrial_frequency_settings_from_protobuf
The function definition has an incorrect parameter type. The implementation shows:
def _seqvars_mitochondrial_frequency_settings_from_protobuf(
frequency: SeqvarsMitochondrialFrequencySettingsPydantic,
) -> SeqvarsMitochondrialFrequencySettingsPydantic:
But from the proto file, it should accept MitochondrialFrequencySettings
protobuf message as input, not the Pydantic model. The function is used to convert from protobuf to Pydantic, so the input type should be the protobuf type.
backend/seqvars/models/protobufs.py
: Change the parameter type annotation fromSeqvarsMitochondrialFrequencySettingsPydantic
toMitochondrialFrequencySettings
🔗 Analysis chain
Update function calls to match corrected parameter types
After correcting the parameter types in the frequency settings functions, ensure that all calls to _seqvars_mitochondrial_frequency_settings_from_protobuf
and similar functions pass the appropriate Protobuf objects.
Run the following script to identify any mismatches in function calls:
Let me analyze the function definitions and their usage in the codebase to verify the parameter types.
Let me try a different approach to find the protobuf definitions.
Let me check the protobuf definitions for the frequency settings.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find calls to frequency settings conversion functions and check parameter types.
# Search for function calls and display surrounding lines for context.
rg --after-context 2 "_seqvars_[a-z_]+_from_protobuf\(" backend/seqvars/models/protobufs.py
Length of output: 8472
Script:
#!/bin/bash
# Let's look at the protobuf definitions to verify parameter types
rg -A 5 "message.*Frequency" backend/seqvars/models/protobufs.proto
Length of output: 148
Script:
#!/bin/bash
# Find proto files in the repository
fd -e proto
# Also let's look at the actual implementation of the frequency settings functions
rg -A 10 "class SeqvarsMitochondrialFrequencySettingsPydantic" backend/
Length of output: 931
Script:
#!/bin/bash
# Look at the query proto file which likely contains frequency settings
cat backend/protos/seqvars/protos/query.proto
# Also let's look at the implementation of the conversion function
rg -A 5 "_seqvars_mitochondrial_frequency_settings_from_protobuf" backend/seqvars/models/protobufs.py
Length of output: 11130
This is also used for for storing chrMT frequencies in the case of in-house | ||
data. Here, the ``an`` field is the total number of samples with coverage, | ||
and the ``het`` and ``homalt`` fields are the number of heteroplasmic and | ||
homoplasmic carriers, respectively. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider separating nuclear and mitochondrial frequency representations
The SeqvarsNuclearFrequencyPydantic
class is being used to represent chrMT frequencies for in-house data, as mentioned in the docstring. Overloading this class for mitochondrial frequencies may lead to confusion and maintainability issues. Consider creating a dedicated class for in-house mitochondrial frequency data to improve code clarity.
"""Settings for mitochondrial frequency filtering.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the class docstring to reflect in-house frequency settings
The docstring for SeqvarsInhouseFrequencySettingsPydantic
incorrectly mentions "mitochondrial frequency filtering." It should describe settings for in-house frequency filtering.
Apply this diff to correct the docstring:
92 - """Settings for mitochondrial frequency filtering."""
+ """Settings for in-house frequency filtering."""
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"""Settings for mitochondrial frequency filtering.""" | |
"""Settings for in-house frequency filtering.""" | |
#: Maximal number of heterozygous carriers. | ||
max_hom: int | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct the docstring for max_hom
attribute
The docstring for the max_hom
attribute should indicate "Maximal number of homozygous carriers" instead of "Maximal number of heterozygous carriers."
Apply this diff to fix the docstring:
70 - #: Maximal number of heterozygous carriers.
+ #: Maximal number of homozygous carriers.
71 max_hom: int | None = None
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#: Maximal number of heterozygous carriers. | |
max_hom: int | None = None | |
#: Maximal number of homozygous carriers. | |
max_hom: int | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (13 hunks)
- frontend/ext/varfish-api/src/lib/schemas.gen.ts (130 hunks)
- frontend/ext/varfish-api/src/lib/types.gen.ts (11 hunks)
- frontend/src/seqvars/components/PresetsEditor/lib/utils.spec.ts (1 hunks)
- frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue
🔇 Additional comments (15)
frontend/src/seqvars/components/PresetsEditor/lib/utils.spec.ts (1)
21-21
: LGTM! Consistent renaming fromend
tostop
The changes align with standard genomic coordinate terminology where "stop" is more commonly used than "end".
Also applies to: 25-25, 28-28, 32-32
frontend/ext/varfish-api/src/lib/types.gen.ts (3)
Line range hint
1993-2076
: Improved frequency settings structure and namingThe changes improve the organization of frequency-related types:
- Renamed mitochondrial types for better clarity (e.g.,
GnomadMitochondrial
→GnomadMtDNA
)- Added dedicated types for different frequency scenarios
- Standardized the structure of frequency settings
The refactoring enhances code maintainability and better represents the domain concepts.
Line range hint
2336-2413
: Consistent application of frequency settings in presetsThe frequency settings changes are consistently applied in the query presets, maintaining alignment between base types and their usage in presets. This ensures a uniform approach to handling frequency data throughout the application.
998-998
:⚠️ Potential issueBreaking change: Renamed
end
tostop
in range typeThis is a breaking change that requires updates in client code using the
OneBasedRangePydantic
type. The rename fromend
tostop
aligns better with bioinformatics conventions.Let's verify the impact of this change:
backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (1)
Line range hint
10031-10166
: Verify frequency settings consolidation.The frequency settings schemas show a consolidation between nuclear and mitochondrial variants. The implementation looks correct but we should verify that all frequency-related code paths handle both types properly.
Let's verify the frequency settings usage:
✅ Verification successful
Based on the search results, I can now provide a final response:
Frequency settings implementation properly handles both nuclear and mitochondrial variants.
The codebase shows consistent handling of both nuclear and mitochondrial frequency settings:
- Distinct Pydantic models exist for both types:
SeqvarsNuclearFrequencySettingsPydantic
for nuclear variantsSeqvarsMitochondrialFrequencySettingsPydantic
for mitochondrial variants- The frequency annotation schema (
SeqvarsFrequencyAnnotationPydantic
) properly includes both types- Conversion functions in
protobufs.py
handle both nuclear and mitochondrial settings correctly- Test coverage exists for both frequency types as seen in
test_serializers.py
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for frequency-related code to ensure consistent handling rg -A 5 "frequency.*settings" --type pythonLength of output: 76
Script:
#!/bin/bash # Search for frequency-related code to ensure consistent handling rg -A 5 "frequency.*settings"Length of output: 53424
Script:
#!/bin/bash # Search for files containing frequency settings classes ast-grep --pattern 'class $name { $$$ frequency $$$ }'Length of output: 62
Script:
#!/bin/bash # Search for frequency-related test files fd "test.*frequency" --type fLength of output: 31
Script:
#!/bin/bash # Search for frequency-related model definitions rg "class.*Frequency.*Pydantic"Length of output: 708
frontend/ext/varfish-api/src/lib/schemas.gen.ts (10)
2908-2913
: Property 'end' Renamed to 'stop' in$OneBasedRangePydantic
SchemaThe property
end
has been correctly renamed tostop
, which improves clarity and consistency with common naming conventions.
Line range hint
3842-3901
: Enhanced Nuclear Frequency Settings SchemaThe
$SeqvarsNuclearFrequencySettingsPydantic
schema now includes detailed properties for frequency filtering, such asmax_het
,max_hom
,max_hemi
, andmax_af
. This provides better flexibility in filtering variants based on nuclear frequency data.
Line range hint
3981-4152
: Introduction of Mitochondrial Frequency Settings SchemaThe new
$SeqvarsMitochondrialFrequencySettingsPydantic
schema has been added to handle mitochondrial frequency filtering. This change consolidates mitochondrial frequency settings into a dedicated schema, enhancing code organization and readability.
Line range hint
6008-6169
: Updated Description in$SeqvarsFrequencyAnnotationPydantic
SchemaThe description has been refined from "SPopulation frequency information" to "Population frequency information", correcting a typographical error and improving clarity.
Line range hint
6230-6324
: Refinement of Nuclear Frequency Information SchemaThe
$SeqvarsNuclearFrequencyPydantic
schema now includes a detailed description explaining its use for both nuclear and mitochondrial data in certain contexts. The propertiesan
,het
,homalt
,hemialt
, andaf
are appropriately defined for comprehensive frequency annotation.
Line range hint
6145-6218
: Addition of Mitochondrial Frequency Information SchemaThe new
$SeqvarsMitochondrialFrequencyPydantic
schema provides dedicated properties for mitochondrial frequency data, includingan
,het
,homalt
, andaf
. This enhances the specificity and clarity of mitochondrial frequency annotations.
Line range hint
3862-6324
: Consistent Use of Frequency Settings SchemasThe frequency settings schemas for nuclear and mitochondrial data (
$SeqvarsNuclearFrequencySettingsPydantic
and$SeqvarsMitochondrialFrequencySettingsPydantic
) are consistently defined across the codebase. Ensure that all references to these schemas are correctly updated to prevent any mismatches.
Line range hint
6008-6600
: Removal of Deprecated Schemas and Addition of New SchemasThe deprecated schemas
$SeqvarsGnomadMitochondrialFrequencyPydantic
and$SeqvarsHelixMtDbFrequencyPydantic
have been removed and replaced with$SeqvarsMitochondrialFrequencyPydantic
. This unifies mitochondrial frequency data into a single schema, streamlining the data structures.
Line range hint
7472-7782
: Update in Frequency Presets SchemasIn the frequency presets schemas, properties such as
gnomad_mtdna
replace the previousgnomad_mitochondrial
, and the settings have been updated to use the new mitochondrial frequency settings schema. This change ensures consistency with the updated data models.
Line range hint
9167-9890
: Alignment of Query Settings with Updated Frequency SchemasThe query settings for frequency in
$SeqvarsQuerySettingsFrequency
are updated to use the new frequency settings schemas. This aligns the query configurations with the revised data structures and ensures that frequency filters are applied correctly in queries.
@@ -18,18 +18,18 @@ describe('parseGenomeRegion - valid', () => { | |||
// region | |||
[ | |||
'chr1:1,000,000-2,000,000', | |||
{ chromosome: '1', range: { start: 1000000, end: 2000000 } }, | |||
{ chromosome: '1', range: { start: 1000000, stop: 2000000 } }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Inconsistent terminology found in codebase
The codebase shows mixed usage of end
and stop
for genomic coordinates:
frontend/src/seqvars/components/PresetsEditor/lib/utils.ts
: Uses bothstop
in the implementation but has an error message usingend
frontend/src/varfish/moreUtils.ts
: Usesend
consistently in its implementation
Recommended changes:
- Update error message in
utils.ts
from "start greater than end" to "start greater than stop" - Consider updating
moreUtils.ts
to usestop
instead ofend
for consistency with the genomic coordinate terminology
🔗 Analysis chain
Verify consistent usage of stop
across the codebase
Let's ensure this terminology change is consistently applied across all related files.
Also applies to: 25-25, 28-28, 32-32
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining instances of 'end' in genomic coordinate contexts
# and verify consistent usage of 'stop' in related files
# Search for potential inconsistencies in property names
echo "Checking for potential inconsistencies in genomic coordinate property names..."
rg -g '*.{ts,js,tsx,jsx,proto}' '(range|position|coordinate).*end[^a-zA-Z]'
# Check the corresponding implementation file
echo "Checking the implementation file..."
rg -g 'utils.ts' 'parseGenomeRegion'
# Look for related test files that might need similar updates
echo "Checking other test files that might need updates..."
fd -g '*spec.ts' -x rg -l 'parseGenomeRegion|genomeRegion'
Length of output: 936
Script:
#!/bin/bash
# Let's examine the specific files where we found 'end' usage and their context
echo "Checking utils.ts for the full context..."
rg -A 5 -B 5 'end' frontend/src/seqvars/components/PresetsEditor/lib/utils.ts
echo "Checking moreUtils.ts for the full context..."
rg -A 5 -B 5 'end' frontend/src/varfish/moreUtils.ts
echo "Let's also check for 'stop' usage to understand the current pattern..."
rg -g '*.{ts,js,tsx,jsx,proto}' '(range|position|coordinate).*stop[^a-zA-Z]'
echo "Checking the GenomeRegion type definition..."
ast-grep --pattern 'type GenomeRegion'
Length of output: 3268
stop: | ||
title: Stop | ||
type: integer | ||
required: | ||
- start | ||
- end | ||
- stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix indentation in OneBasedRangePydantic schema.
The indentation of the start
and stop
properties is inconsistent with the rest of the file. This could cause issues with YAML parsing.
Apply this diff to fix the indentation:
start:
- title: Start
- type: integer
+ title: Start
+ type: integer
stop:
- title: Stop
- type: integer
+ title: Stop
+ type: integer
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
stop: | |
title: Stop | |
type: integer | |
required: | |
- start | |
- end | |
- stop | |
start: | |
title: Start | |
type: integer | |
stop: | |
title: Stop | |
type: integer | |
required: | |
- start | |
- stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- backend/seqvars/tests/snapshots/snap_test_factory_defaults.py (18 hunks)
- backend/seqvars/tests/snapshots/snap_test_models.py (1 hunks)
- backend/seqvars/tests/snapshots/snap_test_views_api.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/seqvars/tests/snapshots/snap_test_models.py
🔇 Additional comments (3)
backend/seqvars/tests/snapshots/snap_test_views_api.py (2)
2190-2193
: Inconsistent frequency thresholds between gnomAD exomes and genomes.The frequency thresholds between gnomAD exomes and genomes are identical in some presets but differ in others:
- "dominant strict": identical thresholds (0.002)
- "recessive strict": identical thresholds (0.001)
- "recessive relaxed": different thresholds for heterozygous carriers (exomes: 0 het, genomes: 150 het)
Consider aligning the thresholds or documenting the rationale for these differences.
#!/bin/bash # Description: Check for other frequency threshold configurations in the codebase # to verify if this pattern is consistent with other presets rg -A 5 "gnomad_exomes.*max_af|gnomad_genomes.*max_af" --glob '!*.snap*'Also applies to: 2197-2200, 2232-2235, 2239-2242, 2274-2277, 2281-2284, 2316-2319, 2323-2326, 2358-2361, 2365-2368
Line range hint
2190-2429
: Verify the frequency thresholds align with clinical guidelines.The frequency thresholds used in the presets should be validated against:
- Current clinical guidelines for variant classification
- Population-specific considerations
- Disease inheritance patterns
For example:
- Dominant strict: AF ≤ 0.002 (0.2%)
- Recessive strict: AF ≤ 0.001 (0.1%)
✅ Verification successful
Let's search for more context about these threshold values in the codebase to understand their rationale and validation.
Based on the search results, I can now verify the frequency thresholds used in the presets. The code shows consistent implementation of frequency thresholds that align with established guidelines:
The frequency thresholds are well-aligned with clinical guidelines and inheritance patterns
- Dominant super strict (AF ≤ 0.2%): Uses max_af=0.002 for both gnomAD exomes and genomes
- Dominant strict (AF ≤ 0.2%): Uses max_af=0.002 with slightly relaxed heterozygous counts
- Dominant relaxed (AF ≤ 1%): Uses max_af=0.01 with higher heterozygous tolerance
- Recessive strict (AF ≤ 0.1%): Uses max_af=0.001 with appropriate homozygous restrictions
- Recessive relaxed (AF ≤ 1%): Uses max_af=0.01 with higher homozygous tolerance
The thresholds follow a logical progression from strict to relaxed filtering and properly account for inheritance patterns by adjusting homozygous/heterozygous counts accordingly.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Search for documentation or configuration files that might contain # the rationale for these threshold values rg -l "frequency|threshold|gnomad" --type md rg -l "frequency|threshold|gnomad" --glob "*.rst"Length of output: 1148
Script:
#!/bin/bash # Search for any documentation or configuration files related to variant filtering rg -A 5 "variant.*filter|frequency.*threshold|allele.*frequency" --type py # Search for any test files that might explain the rationale rg -A 5 "dominant.*strict|recessive.*strict" --type py --glob '!*.snap*' # Look for any comments or docstrings explaining the frequency thresholds rg -B 2 -A 2 "0\.00[12]|0\.01" --type py --glob '!*.snap*'Length of output: 261168
backend/seqvars/tests/snapshots/snap_test_factory_defaults.py (1)
2158-2187
: LGTM! Consistent renaming of frequency-related fields.The changes systematically rename frequency-related fields across all presets while maintaining the same logical structure and values. The new field names (max_af, max_het, max_hom, max_hemi) provide clearer semantics compared to the previous naming scheme.
Also applies to: 2200-2229, 2242-2271, 2284-2313, 2326-2355, 2368-2397, 4815-4844, 4857-4886, 4899-4928, 4941-4970, 4983-5012, 5025-5054, 7472-7501, 7514-7543, 7556-7585, 7598-7627, 7640-7669, 7682-7711
"max_af": None, | ||
"max_het": None, | ||
"max_hom": None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider explicit null values for mitochondrial database settings.
The mitochondrial database settings (gnomAD-mtDNA and HelixMtDb) consistently use None
for all fields when disabled. While this works, consider:
- Using explicit default values when
enabled: False
- Adding validation to ensure these fields are properly set when
enabled: True
Also applies to: 2210-2212, 2246-2248, 2252-2254, 2288-2290, 2294-2296, 2330-2332, 2336-2338, 2372-2374, 2378-2380, 2414-2416, 2420-2422
"max_carriers": None, | ||
"max_hemi": None, | ||
"max_het": None, | ||
"max_hom": None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Standardize in-house frequency settings structure.
The in-house frequency settings structure differs from other frequency databases:
- Uses
max_carriers
instead ofmax_af
- Maintains unused fields (
max_hemi
,max_het
,max_hom
)
Consider:
- Aligning the structure with other frequency databases
- Removing unused fields
- Adding validation for the
max_carriers
field
Also applies to: 2258-2261, 2300-2303, 2342-2345, 2384-2387, 2426-2429
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2063 +/- ##
=====================================
Coverage 91% 91%
=====================================
Files 676 677 +1
Lines 38463 38435 -28
=====================================
- Hits 35118 35093 -25
+ Misses 3345 3342 -3
|
Summary by CodeRabbit
Release Notes
New Features
Improvements
gnomad_mitochondrial
tognomad_mtdna
).Bug Fixes
Tests