Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: consolidating models with worker (#1956) #2063

Merged
merged 4 commits into from
Oct 28, 2024

Conversation

holtgrewe
Copy link
Collaborator

@holtgrewe holtgrewe commented Oct 28, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new frequency settings structures with updated naming conventions.
    • Added new API endpoints for managing case analysis and query presets.
  • Improvements

    • Enhanced clarity in frequency data representation by renaming attributes (e.g., gnomad_mitochondrial to gnomad_mtdna).
    • Streamlined schema definitions and type structures for better consistency.
    • Updated UI components to reflect new property names and data structures.
    • Refactored frequency handling logic in various components to align with new data structures.
  • Bug Fixes

    • Corrected references and bindings in the UI components to align with updated data models.
  • Tests

    • Expanded test coverage for API functionalities and updated snapshot tests to reflect new data structures.
    • Updated test cases to ensure consistency with renamed properties and structures.

@holtgrewe holtgrewe linked an issue Oct 28, 2024 that may be closed by this pull request
Copy link
Contributor

coderabbitai bot commented Oct 28, 2024

Walkthrough

The pull request encompasses extensive modifications across multiple files, primarily focusing on renaming and restructuring frequency settings in both backend and frontend components. Key changes include updating .proto files, modifying Pydantic models, and adjusting serializers to reflect new naming conventions. The updates also introduce new API endpoints and enhance OpenAPI schema documentation. In addition, the frontend components have been updated to align with the new data structures, ensuring consistency in how frequency data is represented and accessed.

Changes

File Change Summary
backend/Makefile Updated .proto file download URLs to a specific branch.
backend/protos/seqvars/protos/output.proto Renamed GnomadMitochondrialFrequency to MitochondrialFrequency and removed HelixMtDbFrequency. Updated references in FrequencyAnnotation.
backend/protos/seqvars/protos/query.proto Renamed messages and fields for clarity; updated NuclearFrequencySettings and GnomadMitochondrialFrequencySettings.
backend/seqvars/factory_defaults.py Removed old frequency settings classes; added new ones with updated parameters.
backend/seqvars/migrations/0002_auto_20240708_1305.py Removed outdated fields and added new Pydantic schema fields in models.
backend/seqvars/migrations/0013_auto_20241028_0928.py Renamed gnomad_mitochondrial to gnomad_mtdna and updated inhouse field definitions.
backend/seqvars/models/base.py Renamed frequency settings Pydantic classes and updated attributes.
backend/seqvars/models/protobufs.py Updated frequency handling functions to use new types and removed old references.
backend/seqvars/protos/output_pb2.py Adjusted serialized ranges for various message types.
backend/seqvars/protos/output_pb2.pyi Renamed classes and updated method signatures for frequency settings.
backend/seqvars/protos/query_pb2.py Updated serialized positions for message types and added new frequency settings messages.
backend/seqvars/protos/query_pb2.pyi Renamed classes and updated field numbers for clarity.
backend/seqvars/serializers.py Updated imports and field types to reflect new Pydantic models.
backend/seqvars/tests/factories.py Renamed and updated frequency settings factories to align with new models.
backend/seqvars/tests/snapshots/snap_test_factory_defaults.py Updated snapshot tests for frequency settings and UUIDs.
backend/seqvars/tests/snapshots/snap_test_models.py Renamed snapshot for mitochondrial data to gnomad_mtdna.
backend/seqvars/tests/snapshots/snap_test_views_api.py Enhanced snapshot tests for query presets with updated frequency structures.
backend/seqvars/tests/test_models.py Renamed constant in test for frequency settings.
backend/seqvars/tests/test_serializers.py Updated expected output field names in serializer tests.
backend/seqvars/tests/test_views_api.py Expanded test coverage for seqvars API views.
backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml Updated OpenAPI schema with new endpoints and refined parameter definitions.
frontend/ext/varfish-api/src/lib/schemas.gen.ts Renamed properties and updated schema definitions for frequency settings.
frontend/ext/varfish-api/src/lib/types.gen.ts Renamed properties and updated type definitions for frequency annotations.
frontend/src/seqvars/components/PresetsEditor/CategoryPresetsFrequencyEditor.vue Updated imports and computed properties for frequency presets.
frontend/src/seqvars/components/PresetsEditor/lib/utils.ts Renamed end to stop in utility functions for genome regions.
frontend/src/seqvars/components/PresetsEditor/types.ts Renamed interfaces and updated properties for frequency settings.
frontend/src/seqvars/components/QueryEditor/FrequencyControls.vue Updated database key for mitochondrial entry.
frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue Updated attributes for mitochondrial and nuclear counts.
frontend/src/seqvars/components/QueryEditor/groups.ts Renamed property in getCompareFields method for frequency filter group.
frontend/src/seqvars/components/QueryEditor/lib/types.ts Updated FrequencyDb type definition to reflect new naming.
frontend/src/seqvars/components/QueryEditor/lib/utils.spec.ts Updated expected output for genome region parsing tests.

Poem

🐇 In the land of code where changes bloom,
Frequencies dance, dispelling gloom.
From gnomad to mtdna, a name so bright,
With each new update, the future feels right!
So hop along, dear friends, let's cheer,
For a cleaner codebase, we hold so dear! 🎉


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

github-actions bot commented Oct 28, 2024

deps-report 🔍

Commit scanned: 30dab54
ℹ️ Python version 3.10 is used by your project but the latest version is 3.13.

Vulnerable dependencies

4 dependencies have vulnerabilities 😱
Dependency Advisory Versions impacted
djangorestframework (transitive) Affected versions of the package djangorestframework are vulnerable to Cross-site Scripting (XSS) via the break_long_headers template filter due to improper input sanitization before splitting and joining with
tags.
<3.15.2
jinja2 (transitive) In Jinja2, the from_string function is prone to Server Side Template Injection (SSTI) where it takes the source parameter as a template object, renders it, and then returns it. The attacker can exploit it with INJECTION COMMANDS in a URI. NOTE: The maintainer and multiple third parties believe that this vulnerability isn't valid because users shouldn't use untrusted templates without sandboxing. >=0
setuptools (transitive) Affected versions of Setuptools allow for remote code execution via its download functions. These functions, which are used to download packages from URLs provided by users or retrieved from package index servers, are susceptible to code injection. If these functions are exposed to user-controlled inputs, such as package URLs, they can execute arbitrary commands on the system. <70.0.0
sqlalchemy Sqlalchemy 2.0.0b1 avoids leaking cleartext passwords to the open for careless uses of str(engine.URL()) in logs and prints. sqlalchemy/sqlalchemy#8563 <2.0.0b1

Outdated dependencies

53 outdated dependencies found (including 20 outdated major versions)😢
Dependency Installed version Latest version
alabaster (transitive) 0.7.16 1.0.0
argon2-cffi (transitive) 21.3.0 23.1.0
billiard (transitive) 3.6.4.0 4.2.1
crispy-bootstrap4 (transitive) 2022.1 2024.10
django 3.2.25 5.1.2
django-model-utils (transitive) 4.3.1 5.0.0
django-rest-knox (transitive) 4.2.0 5.0.2
django-sodar-core 0.13.4 1.0.2
et-xmlfile (dev,transitive) 1.1.0 2.0.0
markupsafe (transitive) 2.1.5 3.0.2
mistune (transitive) 2.0.5 3.0.2
packaging (transitive) 23.2 24.1
pillow (transitive) 10.4.0 11.0.0
protobuf 3.20.3 5.28.3
setuptools (transitive) 67.6.1 75.2.0
sphinx (transitive) 6.2.1 8.1.3
sphinx-rtd-theme (transitive) 1.2.2 3.0.1
sqlalchemy 1.4.54 2.0.36
unidecode (transitive) 0.4.21 1.3.8
xmlschema (transitive) 2.5.1 3.4.2
Dependency Installed version Latest version
botocore (transitive) 1.35.36 1.35.49
celery (transitive) 5.2.7 5.4.0
charset-normalizer (transitive) 3.3.2 3.4.0
coverage (dev,transitive) 7.6.1 7.6.4
cryptography (transitive) 43.0.1 43.0.3
django-autocomplete-light (transitive) 3.9.4 3.11.0
django-crispy-forms (transitive) 2.0 2.3
django-db-file-storage (transitive) 0.5.5 0.5.6.1
django-debug-toolbar 4.3.0 4.4.6
django-environ (transitive) 0.10.0 0.11.2
django-iconify (transitive) 0.1.1 0.4
django-plugins-bihealth 0.4.0 0.5.2
django-postgres-copy 2.3.7 2.7.6
djangorestframework (transitive) 3.14.0 3.15.2
docutils (transitive) 0.18.1 0.21.2
drf-keyed-list-bihealth 0.1.1 0.2.1
elementpath (transitive) 4.5.0 4.6.0
faker 30.6.0 30.8.1
frozenlist (transitive) 1.4.1 1.5.0
ipython (dev,transitive) 8.28.0 8.29.0
markdown (transitive) 3.4.1 3.7
mypy-protobuf (dev) 3.3.0 3.6.0
psycopg2 (transitive) 2.9.9 2.9.10
psycopg2-binary (transitive) 2.9.9 2.9.10
pydantic-core (transitive) 2.23.4 2.25.0
redis 5.1.1 5.2.0
requests-http-signature 0.2.0 0.7.1
rules (transitive) 3.3 3.5
trio (dev,transitive) 0.26.2 0.27.0
types-psycopg2 (dev,transitive) 2.9.21.20240819 2.9.21.20241019
versioneer (transitive) 0.28 0.29
wheel (transitive) 0.40.0 0.44.0
yarl (transitive) 1.15.5 1.16.0

Logs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Outside diff range and nitpick comments (13)
frontend/src/seqvars/components/PresetsEditor/types.ts (2)

2-9: Improve documentation clarity

The comment should be updated to better reflect that these are maximum frequency values rather than generic thresholds.

-/** Frequency thresholds for nuclear variant frequencies as returned from API. */
+/** Maximum frequency values for nuclear variants as returned from API. */

13-19: Maintain documentation consistency

Update the comment to match the terminology used in NuclearFreqs.

-/** Frequency thresholds for mitochondrial frequencies as returned from API. */
+/** Maximum frequency values for mitochondrial variants as returned from API. */
frontend/src/seqvars/components/QueryEditor/FrequencyControls.vue (1)

39-39: Document the size parameter meaning.

Consider adding a comment explaining what the size parameter represents (e.g., number of samples in thousands, millions, etc.) for better maintainability.

-  { label: 'gnomAD mitochondrial', db: 'gnomad_mtdna', size: 56 },
+  // size represents number of samples in thousands (56k samples)
+  { label: 'gnomAD mitochondrial', db: 'gnomad_mtdna', size: 56 },
frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue (1)

34-34: Consider adding JSDoc comments for the attribute constants.

While the new naming convention is more consistent, it's less self-descriptive than the previous domain-specific terms. Adding JSDoc comments explaining what each attribute represents (e.g., homoplasmic vs heteroplasmic for mitochondrial) would improve code maintainability.

+/** Mitochondrial count attributes (homoplasmic and heteroplasmic). */
const MITOCHONDRIAL_ATTRS = ['max_hom', 'max_het'] as const
+/** Nuclear count attributes (homozygous, heterozygous, and hemizygous). */
const NUCLEAR_ATTRS = ['max_hom', 'max_het', 'max_hemi'] as const

Also applies to: 36-36

backend/seqvars/migrations/0002_auto_20240708_1305.py (2)

Line range hint 1-18: Consider upgrading Django version before EOL.

The migration is using Django 3.2.25 which will reach end-of-life in April 2024. Consider planning an upgrade to Django 4.2 LTS or newer to ensure continued security updates and support.


Line range hint 1-321: Well-structured model consolidation with improved type safety.

The migration successfully refactors the data model by:

  1. Consolidating related fields into structured Pydantic models
  2. Clearly separating nuclear and mitochondrial data concerns
  3. Improving type safety through Pydantic validation

This architectural change should make the codebase more maintainable and reduce the likelihood of data inconsistencies.

backend/protos/seqvars/protos/output.proto (1)

359-360: LGTM! Consider enhancing documentation

The MitochondrialFrequency message structure is well-designed and appropriately captures mitochondrial variant metrics.

Consider enhancing the documentation comment to explain:

  • The relationship between heteroplasmic and homoplasmic carriers
  • The calculation method for allele frequency (af)
  • Typical value ranges for each field
-// Mitochondrial frequency information.
+// Mitochondrial frequency information capturing both heteroplasmic (varying levels of variant)
+// and homoplasmic (complete variant) carriers. The allele frequency (af) represents the
+// combined frequency across all carrier types.
frontend/src/seqvars/components/PresetsEditor/CategoryPresetsFrequencyEditor.vue (1)

Line range hint 287-317: LGTM: Consistent property naming across frequency types

The template bindings have been properly updated to use standardized property names (max_hom, max_het, max_hemi, max_af, max_carriers) across all frequency types.

Consider extracting the repeated form field structure into a reusable component to reduce duplication. Example:

<!-- FrequencyField.vue -->
<template>
  <v-text-field
    :model-value="modelValue"
    @update:model-value="$emit('update:modelValue', $event)"
    hide-details
    density="compact"
    clearable
    type="number"
    :step="isFrequency ? '0.001' : undefined"
    :disabled="disabled"
  />
</template>

<script setup lang="ts">
defineProps<{
  modelValue: number | undefined
  isFrequency?: boolean
  disabled?: boolean
}>()
defineEmits<{
  'update:modelValue': [value: number | undefined]
}>()
</script>

This would simplify the template to:

<FrequencyField
  v-model="gnomadExomes.max_hom"
  :disabled="readonly"
/>

Also applies to: 338-368, 389-410, 431-452, 481-511

backend/seqvars/protos/output_pb2.py (1)

27-27: Note: Style issue in generated code.

While there is a style issue with the False comparison, this is generated code and should not be modified directly. If this needs to be addressed, it should be done through the Protocol Buffer compiler configuration or by updating the protoc version.

🧰 Tools
🪛 Ruff

27-27: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (2)

Line range hint 10035-10138: Enhance documentation of frequency annotation models

The frequency annotation models (SeqvarsFrequencyAnnotationPydantic, SeqvarsMitochondrialFrequencyPydantic) lack detailed documentation about:

  • The meaning and valid ranges of frequency values
  • The relationship between nuclear and mitochondrial frequency settings
  • The interpretation of heteroplasmy values for mitochondrial variants

Consider adding more detailed docstrings to clarify these aspects.


Line range hint 12394-12803: Consolidate duplicate frequency settings code

The frequency settings code is duplicated across multiple sections (gnomad_exomes, gnomad_genomes, gnomad_mtdna, helixmtdb, inhouse). Consider:

  1. Extracting common settings into a shared base schema
  2. Using inheritance or composition to reduce duplication
  3. Adding validation for frequency thresholds

Example refactor:

# Base frequency settings
FrequencySettingsBase:
  properties:
    enabled:
      type: boolean
      default: false
    max_af:
      type: number
      nullable: true

# Nuclear-specific settings
NuclearFrequencySettings:
  allOf:
    - $ref: '#/components/schemas/FrequencySettingsBase'
    - type: object
      properties:
        max_het:
          type: integer
        max_hom:
          type: integer
        max_hemi:
          type: integer

# Mitochondrial-specific settings  
MitochondrialFrequencySettings:
  allOf:
    - $ref: '#/components/schemas/FrequencySettingsBase'
    - type: object
      properties:
        max_het:
          type: integer
        max_hom:
          type: integer
backend/seqvars/factory_defaults.py (2)

Line range hint 461-692: Consider refactoring repetitive frequency preset code blocks

The create_seqvarsquerypresetsfrequency function contains several repetitive instances of SeqvarsQueryPresetsFrequency with similar structures. Refactoring this code by introducing a helper function or using loops can improve maintainability and reduce duplication.

Here's an example of how you might refactor the code:

def create_frequency_preset(
    faker,
    rank,
    label,
    gnomad_af,
    gnomad_hom,
    gnomad_het,
    inhouse_carriers,
    enabled_mtdna=False,
):
    return SeqvarsQueryPresetsFrequency(
        sodar_uuid=faker.uuid4(),
        date_created=TIME_VERSION_1_0,
        date_modified=TIME_VERSION_1_0,
        rank=rank,
        label=label,
        gnomad_exomes=SeqvarsNuclearFrequencySettingsPydantic(
            enabled=True,
            max_af=gnomad_af,
            max_hom=gnomad_hom,
            max_het=gnomad_het,
            max_hemi=None,
        ),
        gnomad_genomes=SeqvarsNuclearFrequencySettingsPydantic(
            enabled=True,
            max_af=gnomad_af,
            max_hom=gnomad_hom,
            max_het=gnomad_het,
            max_hemi=None,
        ),
        gnomad_mtdna=SeqvarsMitochondrialFrequencySettingsPydantic(
            enabled=enabled_mtdna,
            max_af=None,
            max_het=None,
            max_hom=None,
        ),
        helixmtdb=SeqvarsMitochondrialFrequencySettingsPydantic(
            enabled=enabled_mtdna,
            max_af=None,
            max_het=None,
            max_hom=None,
        ),
        inhouse=SeqvarsInhouseFrequencySettingsPydantic(
            enabled=True,
            max_carriers=inhouse_carriers,
            max_hom=None,
            max_het=None,
            max_hemi=None,
        ),
    )

def create_seqvarsquerypresetsfrequency(faker: Faker) -> list[SeqvarsQueryPresetsFrequency]:
    presets = [
        {
            "rank": 1,
            "label": "dominant super strict",
            "gnomad_af": 0.002,
            "gnomad_hom": 0,
            "gnomad_het": 1,
            "inhouse_carriers": None,
        },
        {
            "rank": 2,
            "label": "dominant strict",
            "gnomad_af": 0.002,
            "gnomad_hom": 0,
            "gnomad_het": 1,
            "inhouse_carriers": 20,
        },
        {
            "rank": 3,
            "label": "dominant relaxed",
            "gnomad_af": 0.01,
            "gnomad_hom": 0,
            "gnomad_het": 50,
            "inhouse_carriers": 20,
        },
        # Add additional presets as needed...
    ]
    return [
        create_frequency_preset(
            faker,
            preset["rank"],
            preset["label"],
            preset["gnomad_af"],
            preset["gnomad_hom"],
            preset["gnomad_het"],
            preset["inhouse_carriers"],
        )
        for preset in presets
    ]

481-485: Ensure consistent parameter ordering in helixmtdb instances

The parameter order in the SeqvarsMitochondrialFrequencySettingsPydantic instances for helixmtdb is inconsistent with other instances. Maintaining a consistent order improves readability and reduces the potential for errors.

Reorder the parameters to match the established convention:

SeqvarsMitochondrialFrequencySettingsPydantic(
    enabled=False,
-   max_het=None,
-   max_hom=None,
-   max_af=None,
+   max_af=None,
+   max_hom=None,
+   max_het=None,
),

Apply this change to all helixmtdb instances for consistency.

Also applies to: 521-525, 561-565, 601-605, 641-645, 681-685

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between ee993e9 and 3f43423.

📒 Files selected for processing (30)
  • backend/Makefile (1 hunks)
  • backend/protos/seqvars/protos/output.proto (2 hunks)
  • backend/protos/seqvars/protos/query.proto (2 hunks)
  • backend/seqvars/factory_defaults.py (7 hunks)
  • backend/seqvars/migrations/0002_auto_20240708_1305.py (10 hunks)
  • backend/seqvars/migrations/0013_auto_20241028_0928.py (1 hunks)
  • backend/seqvars/models/base.py (7 hunks)
  • backend/seqvars/models/protobufs.py (9 hunks)
  • backend/seqvars/protos/output_pb2.py (2 hunks)
  • backend/seqvars/protos/output_pb2.pyi (4 hunks)
  • backend/seqvars/protos/query_pb2.py (2 hunks)
  • backend/seqvars/protos/query_pb2.pyi (3 hunks)
  • backend/seqvars/serializers.py (2 hunks)
  • backend/seqvars/tests/factories.py (2 hunks)
  • backend/seqvars/tests/snapshots/snap_test_factory_defaults.py (18 hunks)
  • backend/seqvars/tests/snapshots/snap_test_models.py (1 hunks)
  • backend/seqvars/tests/snapshots/snap_test_views_api.py (6 hunks)
  • backend/seqvars/tests/test_models.py (1 hunks)
  • backend/seqvars/tests/test_serializers.py (4 hunks)
  • backend/seqvars/tests/test_views_api.py (1 hunks)
  • backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (13 hunks)
  • frontend/ext/varfish-api/src/lib/schemas.gen.ts (132 hunks)
  • frontend/ext/varfish-api/src/lib/types.gen.ts (11 hunks)
  • frontend/src/seqvars/components/PresetsEditor/CategoryPresetsFrequencyEditor.vue (20 hunks)
  • frontend/src/seqvars/components/PresetsEditor/lib/utils.ts (2 hunks)
  • frontend/src/seqvars/components/PresetsEditor/types.ts (1 hunks)
  • frontend/src/seqvars/components/QueryEditor/FrequencyControls.vue (1 hunks)
  • frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue (3 hunks)
  • frontend/src/seqvars/components/QueryEditor/groups.ts (1 hunks)
  • frontend/src/seqvars/components/QueryEditor/lib/types.ts (1 hunks)
🧰 Additional context used
🪛 Ruff
backend/seqvars/protos/output_pb2.py

27-27: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

backend/seqvars/protos/query_pb2.py

22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

🪛 Biome
frontend/src/seqvars/components/PresetsEditor/lib/utils.ts

[error] 142-142: isNaN is unsafe. It attempts a type coercion. Use Number.isNaN instead.

See the MDN documentation for more details.
Unsafe fix: Use Number.isNaN instead.

(lint/suspicious/noGlobalIsNan)


[error] 142-142: isNaN is unsafe. It attempts a type coercion. Use Number.isNaN instead.

See the MDN documentation for more details.
Unsafe fix: Use Number.isNaN instead.

(lint/suspicious/noGlobalIsNan)

🔇 Additional comments (75)
frontend/src/seqvars/components/QueryEditor/lib/types.ts (1)

5-5: LGTM! Consistent with backend changes.

The rename from 'gnomad_mitochondrial' to 'gnomad_mtdna' aligns with the broader standardization of naming conventions across the codebase.

frontend/src/seqvars/components/PresetsEditor/types.ts (2)

1-30: Overall structure looks good

The interfaces are well-structured with consistent naming patterns and proper TypeScript types. The changes align well with the PR objective of consolidating and standardizing the models.


23-30: Maintain documentation consistency and verify API alignment

Update the comment to match the terminology used in other interfaces.

-/** Frequency thresholds for inhouse frequencies as returned from API. */
+/** Maximum frequency values for in-house variants as returned from API. */

Let's verify that these interface changes align with the API implementation:

✅ Verification successful

Based on the shell script results, I can now generate the final response since I have enough information to verify the API alignment:

Documentation update is appropriate and interface matches API implementation

The interface changes in InhouseFreqs align with the backend implementation as shown in the API schema and protobuf definitions. The max_carriers field is consistently used across the codebase for in-house frequency settings, while other interfaces use max_af for allele frequencies.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the API endpoints and OpenAPI schema match these interface definitions

# Check OpenAPI schema for corresponding endpoint definitions
rg -A 10 "NuclearFreqs|MitochondrialFreqs|InhouseFreqs" docs/openapi.yaml

# Check Python models/serializers for matching field names
rg -A 10 "max_(hom|het|hemi|af|carriers)" backend/

Length of output: 237607

backend/seqvars/migrations/0013_auto_20241028_0928.py (3)

1-18: LGTM: Migration setup is correct

The migration is properly structured with appropriate dependencies and imports.


30-59: Consider data migration needs for inhouse field changes

The field alterations look correct, but since we're changing the field type to PydanticSchemaField with new schema definitions:

  1. Ensure existing data is compatible with the new schema
  2. Consider adding a data migration if necessary to convert existing data
#!/bin/bash
# Description: Check for existing data in the inhouse fields that might need migration

echo "Checking for related model definitions and potential data migrations..."
rg -A 5 "class SeqvarsInhouseFrequencySettingsPydantic" --type python

echo "Checking for any existing data migrations in seqvars..."
fd --type f --glob "*_auto_*.py" backend/seqvars/migrations/ | sort

20-29: Verify consistent renaming across the codebase

The renaming from 'gnomad_mitochondrial' to 'gnomad_mtdna' appears correct, but we should verify that this change is consistently applied across all related components.

✅ Verification successful

Field renaming has been consistently applied across the codebase

The search results confirm that:

  1. Old references to 'gnomad_mitochondrial' only appear in migration files (0002 and 0013)
  2. All current code uses 'gnomad_mtdna' consistently across:
    • Models and serializers
    • Factory defaults
    • Tests and snapshots
    • Protocol buffer definitions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to 'gnomad_mitochondrial'
# to ensure complete renaming across the codebase

echo "Checking for remaining references to 'gnomad_mitochondrial'..."
rg -l "gnomad_mitochondrial" --type-add 'django:*.py' --type django

echo "Checking for new references to 'gnomad_mtdna'..."
rg -l "gnomad_mtdna" --type-add 'django:*.py' --type django

Length of output: 912


Script:

#!/bin/bash
# Let's try a more comprehensive search with proper flags and patterns

# Check for remaining references to 'gnomad_mitochondrial'
echo "Checking for remaining references to 'gnomad_mitochondrial'..."
rg "gnomad_mitochondrial" -t py

# Check for new references to 'gnomad_mtdna'
echo "Checking for new references to 'gnomad_mtdna'..."
rg "gnomad_mtdna" -t py

# Look for related model definitions
echo "Checking for related model definitions..."
rg "class SeqvarsQueryPresetsFrequency|class SeqvarsQuerySettingsFrequency" -t py

# Check for any serializers or forms that might use these fields
echo "Checking for serializers and forms..."
rg "Frequency.*Serializer|Frequency.*Form" -t py

Length of output: 37306

frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue (2)

129-129: LGTM! Clean and consistent database-specific handling.

The explicit database check for mitochondrial data is clear and the template structure maintains good separation between mitochondrial and nuclear controls.


34-36: Verify consistent attribute naming across the codebase.

The changes from domain-specific terms (homoplasmic, heteroplasmic) to generic terms (max_hom, max_het) should be consistently applied across all related components and documentation.

Also applies to: 107-110, 129-129

✅ Verification successful

Note: Removed file type filters since 'vue' is not recognized. This will search across all files in the repository.


Based on the search results, I can now provide a final response:

Consistent attribute naming verified across the codebase

The changes from domain-specific terms to generic terms (max_hom, max_het, max_hemi) are consistently applied across the codebase. The new terminology is used in:

  • Frontend components (FrequencyControlsRow.vue, types.ts, etc.)
  • Backend models and migrations
  • API schemas and protobuf definitions
  • Test fixtures and snapshots

The old terminology only appears in documentation and changelog entries which is expected and appropriate.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining old terminology to ensure complete migration
echo "Checking for old terminology..."
rg -i "homoplasmic|heteroplasmic|homozygous|heterozygous|hemizygous" \
  --type vue \
  --type typescript \
  --type javascript

# Search for new terminology to verify consistent usage
echo "Checking new terminology usage..."
rg -i "max_hom|max_het|max_hemi" \
  --type vue \
  --type typescript \
  --type javascript

Length of output: 405


Script:

#!/bin/bash
# Search for any remaining old terminology to ensure complete migration
echo "Checking for old terminology..."
rg -i "homoplasmic|heteroplasmic|homozygous|heterozygous|hemizygous"

# Search for new terminology to verify consistent usage
echo "Checking new terminology usage..."
rg -i "max_hom|max_het|max_hemi"

Length of output: 262382

frontend/src/seqvars/components/PresetsEditor/lib/utils.ts (2)

84-85: LGTM! Consistent terminology usage.

The renaming from 'end' to 'stop' aligns with bioinformatics conventions and maintains consistency with the broader codebase changes.


140-141: LGTM! Consistent terminology usage.

The renaming from 'end' to 'stop' maintains consistency with bioinformatics conventions and the rest of the codebase.

Also applies to: 146-147, 150-150

backend/seqvars/tests/snapshots/snap_test_models.py (4)

44-47: LGTM! Consistent renaming of frequency fields.

The renaming of frequency fields to use the max_ prefix improves clarity by explicitly indicating these are maximum threshold values.


52-55: LGTM! Maintains consistency with exomes settings.

The frequency field renaming maintains consistency with the gnomad_exomes structure, ensuring a uniform approach across different data sources.


67-69: LGTM! Consistent structure for mitochondrial data source.

The helixmtdb settings maintain consistency with the gnomad_mtdna structure, appropriately using the same fields for mitochondrial variant frequencies.


74-76: LGTM! Appropriate structure for in-house frequency data.

The inhouse settings correctly use a distinct structure with max_carriers while maintaining consistency with the max_ prefix naming convention.

backend/protos/seqvars/protos/query.proto (5)

91-102: LGTM! Improved field naming in NuclearFrequencySettings

The renamed fields (max_het, max_hom, max_hemi, max_af) are more concise and consistent with standard naming conventions while maintaining protobuf compatibility.


105-114: LGTM! Enhanced reusability of MitochondrialFrequencySettings

The message has been generalized by removing the "Gnomad" prefix, making it reusable for different mitochondrial frequency data sources while maintaining a consistent field naming pattern.


117-128: LGTM! Well-structured InhouseFrequencySettings

The message provides a comprehensive structure for in-house frequency data with clear separation of carrier types and the addition of total carriers tracking.


138-142: LGTM! Consistent type updates in QuerySettingsFrequency

The field type updates correctly reflect the renamed messages while maintaining field names and ordering for backward compatibility.


288-288: LGTM! Improved genomic coordinate terminology

The field rename from 'end' to 'stop' better aligns with standard genomic coordinate terminology while maintaining protobuf compatibility.

backend/seqvars/protos/query_pb2.py (1)

Line range hint 16-63: LGTM! Auto-generated protobuf file.

The changes to the serialized positions are correct and align with the introduction of new frequency settings messages (MitochondrialFrequencySettings and InhouseFrequencySettings). These modifications are automatically generated by the protocol buffer compiler and should not be manually edited.

Let's verify that the corresponding .proto file exists and contains the new message definitions:

✅ Verification successful

LGTM! The auto-generated protobuf file correctly reflects the source proto definitions.

The changes in backend/seqvars/protos/query_pb2.py are verified to be correctly auto-generated from the source proto file backend/protos/seqvars/protos/query.proto. The source file contains the new message definitions:

  • MitochondrialFrequencySettings (lines 116-124)
  • InhouseFrequencySettings (lines 127-137)

These messages are properly integrated into the QuerySettingsFrequency message (lines 140-152), which explains the updated serialized positions in the generated pb2 file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence of the source .proto file and its content
# Test: Check if the .proto file exists and contains the new message definitions
rg -l "message (MitochondrialFrequencySettings|InhouseFrequencySettings)" "seqvars/protos/query.proto"

Length of output: 169


Script:

#!/bin/bash
# Let's first find the actual location of the proto file
fd -e proto query

Length of output: 61


Script:

#!/bin/bash
# Now let's check the content of the proto file
rg -l "message (MitochondrialFrequencySettings|InhouseFrequencySettings)" "backend/protos/seqvars/protos/query.proto"

# Also get the actual content to verify the message definitions
cat "backend/protos/seqvars/protos/query.proto"

Length of output: 10419

🧰 Tools
🪛 Ruff

22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

backend/seqvars/migrations/0002_auto_20240708_1305.py (2)

Line range hint 160-321: Review schema validation and default values.

The new Pydantic schema fields follow good practices with clear separation between nuclear and mitochondrial data. However, there are a few points to consider:

  1. All fields are nullable with default=None, which might lead to inconsistent states
  2. No visible validation constraints beyond the basic schema structure

Please verify the Pydantic models' implementation:

#!/bin/bash
# Description: Check Pydantic models implementation
# Look for validation rules and default values in the models

echo "Checking SeqvarsNuclearFrequencySettingsPydantic implementation:"
rg -A 10 "class SeqvarsNuclearFrequencySettingsPydantic" 

echo "Checking SeqvarsMitochondrialFrequencySettingsPydantic implementation:"
rg -A 10 "class SeqvarsMitochondrialFrequencySettingsPydantic"

Line range hint 19-159: Verify data migration strategy for removed fields.

The migration removes numerous fields containing genomic frequency data. However, there's no visible data migration operation to preserve existing data before the field removal.

Please ensure that:

  1. A data migration strategy exists to transfer data from the old fields to the new Pydantic schema fields
  2. The application code is prepared to handle records that might have been created during the migration

Run this script to check for existing data that needs migration:

✅ Verification successful

Based on the code analysis, I can now generate the final response:

Migration is safe - fields are being replaced by Pydantic schema fields

The removed fields are being replaced by structured Pydantic schema fields that store the same information in a more organized way:

  • Old flat fields like gnomad_exomes_enabled, gnomad_exomes_frequency etc. are replaced by nested Pydantic models:
    • SeqvarsNuclearFrequencySettingsPydantic for gnomad_exomes/genomes
    • SeqvarsMitochondrialFrequencySettingsPydantic for gnomad_mtdna/helixmtdb
    • These models contain the same data in a more structured format

The migration is part of a larger refactoring to use Pydantic models instead of flat fields, with proper data transfer handled through the models' schema fields.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for existing records that might need data migration
# Note: This assumes we're in a Django management shell context

# Check for records in both models
echo "Checking seqvarsquerypresetsfrequency records:"
python manage.py shell -c "from seqvars.models import SeqvarsQueryPresetsFrequency; print(SeqvarsQueryPresetsFrequency.objects.count())"

echo "Checking seqvarsquerysettingsfrequency records:"
python manage.py shell -c "from seqvars.models import SeqvarsQuerySettingsFrequency; print(SeqvarsQuerySettingsFrequency.objects.count())"

Length of output: 2255


Script:

#!/bin/bash
# Let's check the migration history and related model definitions

# Check if there are any data migrations before this one
echo "Checking for data migrations:"
fd ".*migrations.*\.py" backend/seqvars/migrations/ --exec grep -l "def forwards\|def backwards"

# Look for model definitions and their fields
echo -e "\nChecking model definitions:"
rg -A 5 "class SeqvarsQueryPresetsFrequency|class SeqvarsQuerySettingsFrequency" backend/seqvars/

# Check for any JSON/JSONB fields that might store the data
echo -e "\nChecking for JSON fields:"
rg "JSONField|django.contrib.postgres.fields" backend/seqvars/

# Look for any data transfer logic
echo -e "\nChecking for data transfer logic:"
rg "gnomad_|helixmtdb_|inhouse_" backend/seqvars/ --type py

Length of output: 73613

backend/protos/seqvars/protos/output.proto (1)

338-338: Verify data migration for mitochondrial frequency changes

The standardization of mitochondrial frequency types is a good architectural decision for consistency. However, this is a breaking change that requires careful consideration of:

  1. Data migration from the old format
  2. Backward compatibility during deployment
  3. Client code updates

Let's verify the migration strategy:

Also applies to: 340-340

✅ Verification successful

Migration path is already in place

Based on the search results, the change from GnomadMitochondrialFrequency to MitochondrialFrequency is well-supported:

  1. The migration path is handled through Django migrations (backend/seqvars/migrations/0002_auto_20240708_1305.py)
  2. All necessary model updates are in place:
    • Protocol buffers (backend/protos/seqvars/protos/*.proto)
    • Generated Python code (backend/seqvars/protos/*.py)
    • Pydantic models (backend/seqvars/models/base.py)
    • Protobuf converters (backend/seqvars/models/protobufs.py)
  3. No references to the old type names were found, indicating the change has been consistently applied
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for migration files handling these changes
fd --type f --glob "*.py" | rg -l "GnomadMitochondrialFrequency|HelixMtDbFrequency"

# Look for potential client code that needs updating
rg "GnomadMitochondrialFrequency|HelixMtDbFrequency" --type python

Length of output: 185


Script:

#!/bin/bash
# Let's try a more comprehensive search without file type restrictions
rg "GnomadMitochondrialFrequency|HelixMtDbFrequency"

# Also search for any references to the new MitochondrialFrequency type
rg "MitochondrialFrequency"

# Look for any protobuf files that might be affected
fd --type f --glob "*.proto" | rg -l "MitochondrialFrequency"

Length of output: 31255

frontend/src/seqvars/components/PresetsEditor/CategoryPresetsFrequencyEditor.vue (3)

12-12: LGTM: Good architectural improvement

The change from GnomadFreqs to NuclearFreqs represents a positive architectural improvement, making the type more generic and reusable across different nuclear genome frequency sources.


97-102: LGTM: Consistent type usage

The computed properties have been correctly updated to use the new NuclearFreqs type while maintaining type safety through proper default value handling.


Line range hint 287-511: Verify API compatibility with renamed properties

The property names have been standardized across the frontend (max_hom, max_het, max_hemi, max_af, max_carriers). Let's verify that these match the backend API expectations.

✅ Verification successful

Based on the search results, I can see that the property names max_hom, max_het, max_hemi, max_af, and max_carriers are consistently used throughout the codebase, both in the frontend and backend:

  1. Backend models and protobuf definitions use these exact property names:
max_het: int | None = None
max_hom: int | None = None 
max_hemi: int | None = None
max_af: float | None = None
max_carriers: int | None = None
  1. The frontend Vue component matches these property names exactly:
v-model="gnomadExomes.max_hom"
v-model="gnomadExomes.max_het"
v-model="gnomadExomes.max_hemi"
v-model="gnomadExomes.max_af"
  1. The property names are used consistently in database migrations, serializers, and API endpoints.

Property names are consistent with API expectations

The standardized property names in the frontend code match the backend API contract exactly. No changes needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for corresponding property names in API-related files
echo "Checking API property names in backend files..."
rg -g '*.py' -g '*.proto' 'max_(hom|het|hemi|af|carriers)' 

# Search for any remaining old property names that might need updating
echo "Checking for any remaining old property names..."
rg -g '*.py' -g '*.proto' -g '*.vue' '(homozygous|heterozygous|hemizygous|frequency)'

Length of output: 345861

backend/seqvars/protos/output_pb2.py (2)

Line range hint 29-100: LGTM: Serialized range updates are consistent.

The updates to the serialized ranges appear to be consistent with the protobuf message structure. These changes are automatically generated and reflect modifications in the underlying .proto definitions.


Line range hint 1-100: Important: This is a generated Protocol Buffer file.

This file is automatically generated by the Protocol Buffer compiler and should not be modified directly. Any changes should be made to the source .proto files instead.

Let's verify the protobuf compilation process:

✅ Verification successful

Generated Protocol Buffer file is correctly maintained

The file backend/seqvars/protos/output_pb2.py is correctly generated from the source proto file using the standard protoc compiler. The source proto file and compilation process are properly configured:

  1. Source proto file exists at backend/protos/seqvars/protos/output.proto
  2. Compilation is handled via Makefile with correct protoc flags:
    • Uses Python output
    • Includes mypy type generation
    • Maintains proper import paths

The serialized range updates in the generated file are a normal result of the protobuf compilation process and should not be modified manually.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the protobuf source file and compilation process

# Test 1: Check if the source .proto file exists
fd -t f "output.proto$" -x echo "Source proto file: {}"

# Test 2: Check for protoc compiler configuration
fd -t f "Makefile$" -x grep -l "protoc" {} \; -x echo "Found protoc configuration in: {}"

Length of output: 283


Script:

#!/bin/bash
# Let's examine the protobuf source and compilation setup

# Check the source proto file content
echo "=== Source Proto File Content ==="
cat ./backend/protos/seqvars/protos/output.proto

# Check Makefile protoc configuration
echo -e "\n=== Protoc Configuration in Makefile ==="
cat ./backend/Makefile | grep -A 5 -B 5 "protoc"

Length of output: 13053

🧰 Tools
🪛 Ruff

18-18: seqvars.protos.query_pb2 imported but unused

Remove unused import: seqvars.protos.query_pb2

(F401)


27-27: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

backend/seqvars/tests/factories.py (4)

17-19: LGTM: Import changes align with model renaming.

The new imports correctly reflect the renamed Pydantic models, maintaining consistent naming conventions.


95-103: LGTM: Consistent attribute naming with nuclear settings.

The mitochondrial frequency settings factory maintains consistency with the nuclear settings while appropriately excluding hemizygous-specific fields.


109-115: LGTM: Consistent naming convention applied.

The inhouse frequency settings factory follows the new naming convention with the max_ prefix, maintaining consistency across all frequency settings.


122-134: Verify the impact of consolidating mitochondrial frequency settings.

The consolidation of helixmtdb to use MitochondrialFrequencySettingsFactory is a good simplification. However, ensure that all test cases using the old HelixMtDbFrequencySettingsFactory have been updated.

✅ Verification successful

Consolidation of mitochondrial frequency settings verified

The consolidation of helixmtdb to use MitochondrialFrequencySettingsFactory is complete and safe. No references to the old HelixMtDbFrequencySettingsFactory were found in the codebase, and all test files have been properly updated to use the new factory.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to the old factory
# Expected: No references to HelixMtDbFrequencySettingsFactory should exist

rg "HelixMtDbFrequencySettings(Factory|Pydantic)" --type py

Length of output: 61


Script:

#!/bin/bash
# Let's verify the changes more thoroughly by:
# 1. Looking for any frequency settings factory definitions
# 2. Checking test files that might use frequency settings
# 3. Looking for any helixmtdb related code

# Check for frequency settings factory definitions
echo "=== Frequency Settings Factory Definitions ==="
rg "class.*FrequencySettings(Factory|Pydantic)" -A 5 --type py

# Check test files using frequency settings
echo -e "\n=== Test Files Using Frequency Settings ==="
rg "FrequencySettings" --type py --glob "*test*.py" -l

# Look for any helixmtdb related code
echo -e "\n=== HelixMtDb Related Code ==="
rg "helixmtdb" --type py -i

Length of output: 61551


Script:

#!/bin/bash
# Let's verify if there are any remaining references to HelixMtDbFrequencySettingsFactory
# and check the test files that might be affected by the change

# Check for any remaining HelixMtDbFrequencySettingsFactory class definitions
echo "=== Checking for HelixMtDbFrequencySettingsFactory class definitions ==="
ast-grep --pattern 'class HelixMtDbFrequencySettingsFactory'

# Check test files that might be using the factory
echo -e "\n=== Checking test files using HelixMtDbFrequencySettingsFactory ==="
rg "HelixMtDbFrequencySettingsFactory" --type py --glob "*test*.py"

# Check for any test files that might be using helixmtdb with frequency settings
echo -e "\n=== Checking test files using helixmtdb frequency settings ==="
rg "helixmtdb.*=.*FrequencySettings" --type py --glob "*test*.py"

Length of output: 645

backend/seqvars/serializers.py (3)

22-22: LGTM: Import changes align with the model refactoring.

The new imports for Seqvars-prefixed frequency settings Pydantic models are properly organized and consistent with the refactoring objectives.

Also applies to: 24-25


122-122: LGTM: Meta class fields list correctly updated.

The Meta class fields list properly reflects the field renaming from gnomad_mitochondrial to gnomad_mtdna.


98-114: LGTM: Field updates improve consistency and maintain schema validation.

The changes appropriately:

  1. Update field types to use new Seqvars-prefixed models
  2. Rename gnomad_mitochondrial to gnomad_mtdna for better consistency
  3. Maintain proper null handling and defaults

Let's verify the field renaming is consistent across the codebase:

✅ Verification successful

Field renaming is properly migrated across the codebase

The verification shows that:

  1. The old field name gnomad_mitochondrial only appears in migration files (0002 and 0013), which is expected and correct
  2. The new field name gnomad_mtdna is consistently used across:
    • Serializers
    • Models
    • Factory defaults
    • Tests and test snapshots
    • Protobuf definitions

The migration is complete and consistent.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining instances of old field name
# Expect: No occurrences of old field name 'gnomad_mitochondrial'

rg "gnomad_mitochondrial" --type py

Length of output: 423


Script:

#!/bin/bash
# Check for current usage of gnomad_mtdna to ensure the migration is complete
rg "gnomad_mtdna" --type py -A 2

# Also check the test files to ensure test cases are updated
rg "gnomad_mitochondrial" --type py --glob "*test*.py"

Length of output: 47617

backend/seqvars/protos/query_pb2.pyi (5)

684-771: LGTM! Field names are now more explicit and clearer.

The renaming of fields in NuclearFrequencySettings from generic terms to explicit maximum values (e.g., heterozygousmax_het) improves code clarity and maintainability.


775-847: LGTM! Consistent naming pattern applied to mitochondrial settings.

The renaming of the class from GnomadMitochondrialFrequencySettings to MitochondrialFrequencySettings makes it more generic and reusable. The field renames follow the same pattern as NuclearFrequencySettings, maintaining consistency across the codebase.


851-939: LGTM! Improved naming and functionality for in-house frequency settings.

The renaming from HelixMtDbFrequencySettings to InhouseFrequencySettings better reflects its purpose. The addition of max_carriers provides a useful way to limit total carriers regardless of their specific type.


1090-1102: LGTM! Better genomics terminology used.

The renaming of the field from end to stop better aligns with standard genomics terminology.


Line range hint 684-1102: Verify complete renaming across the codebase.

Let's ensure all references to the old field names have been updated consistently.

✅ Verification successful

No issues found with field renaming

The search results show that the terms "heterozygous", "homozygous", "hemizygous", "heteroplasmic", and "homoplasmic" appear only in:

  1. Test files and fixtures
  2. Documentation strings and comments
  3. Database migration files
  4. Model definitions
  5. Preset configurations

There are no inconsistencies or old field names that need to be updated. The code uses these terms appropriately in their biological/genetic context rather than as legacy field names that need renaming.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining instances of old field names that should have been renamed

# Test: Search for old field names that should have been renamed
echo "Checking for old field names..."
rg -i "heterozygous_field_number|homozygous_field_number|hemizygous_field_number|end_field_number|heteroplasmic|homoplasmic" \
   --type py \
   --type proto

# Test: Search for old class names
echo "Checking for old class names..."
rg -i "GnomadMitochondrialFrequencySettings|HelixMtDbFrequencySettings" \
   --type py \
   --type proto

Length of output: 452


Script:

#!/bin/bash
# Description: Check for any remaining instances of old field names that should have been renamed

# Test: Search for old field names that should have been renamed
echo "Checking for old field names..."
rg -i "heterozygous|homozygous|hemizygous|end_field|heteroplasmic|homoplasmic" \
   --type-add 'proto:*.{proto,pyi}' \
   --type proto \
   --type py

# Test: Search for old class names
echo "Checking for old class names..."
rg -i "GnomadMitochondrialFrequencySettings|HelixMtDbFrequencySettings" \
   --type-add 'proto:*.{proto,pyi}' \
   --type proto \
   --type py

Length of output: 131503

backend/seqvars/tests/test_serializers.py (2)

234-235: LGTM: Proper handling of renamed field in presets serializer test.

The changes correctly update the test to use the renamed field gnomad_mtdna and ensure proper JSON serialization of the field.

Also applies to: 249-251


1242-1243: LGTM: Consistent handling of renamed field in settings serializer test.

The changes maintain consistency with the presets serializer by using the same field name gnomad_mtdna and ensuring proper JSON serialization.

Also applies to: 1255-1257

backend/seqvars/protos/output_pb2.pyi (3)

1503-1507: LGTM! Good consolidation of mitochondrial frequency types.

The change to use a single MitochondrialFrequency type for both gnomAD and HelixMtDb frequencies improves code maintainability while preserving type safety and documentation.


1519-1520: LGTM! Constructor parameters correctly updated.

The constructor parameter types are properly aligned with the property changes, maintaining type safety.


Line range hint 1634-1666: LGTM! Well-structured consolidated frequency class.

The new MitochondrialFrequency class effectively combines the functionality of the previous separate classes while maintaining:

  • Clear documentation
  • Type safety
  • Complete field definitions
  • Proper field clearing methods
frontend/ext/varfish-api/src/lib/types.gen.ts (5)

1993-2000: LGTM: Improved frequency annotation type organization

The frequency annotation type has been updated with more consistent naming:

  • Renamed gnomad_mitochondrial to gnomad_mtdna
  • Renamed helixmtdb to helixmtdb

2014-2030: LGTM: Well-structured mitochondrial frequency type

New SeqvarsMitochondrialFrequencyPydantic type properly encapsulates mitochondrial-specific frequency data with appropriate fields.


Line range hint 2039-2065: LGTM: Clear nuclear frequency type structure

The SeqvarsNuclearFrequencyPydantic type has been updated with:

  • Clear documentation for dual-use (nuclear and in-house MT data)
  • Consistent field naming

2325-2354: LGTM: Consistent frequency preset type updates

The query preset frequency types have been updated to maintain consistency with the core frequency type changes:

  • Aligned field names with the new frequency type structure
  • Maintained proper type relationships

Also applies to: 2373-2402


998-998: ⚠️ Potential issue

Breaking change: Renamed range property from end to stop

This is a breaking change that will require updates in any code using the OneBasedRangePydantic type. The change aligns with common range conventions but needs careful migration.

backend/seqvars/tests/snapshots/snap_test_views_api.py (5)

2190-2200: LGTM: Consistent frequency threshold structure across presets.

The frequency thresholds are well-structured and maintain consistency across different preset configurations (dominant super strict, dominant strict, dominant relaxed, recessive strict, recessive relaxed). Each preset appropriately adjusts the thresholds while maintaining the same parameter structure.

Also applies to: 2231-2241, 2272-2282, 2313-2323, 2354-2364


2202-2206: LGTM: Standardized mtDNA frequency parameters.

The gnomad_mtdna settings maintain a consistent structure across all presets with appropriate null values when disabled. This aligns with the broader refactoring effort to standardize mitochondrial DNA parameter naming.

Also applies to: 2243-2247, 2284-2288, 2325-2329, 2366-2370


2207-2212: LGTM: Consistent HelixMtDb parameter structure.

The HelixMtDb settings follow the same parameter structure as other frequency sources, ensuring consistency in the configuration schema.

Also applies to: 2248-2253, 2289-2294, 2330-2335, 2371-2376


2395-2423: LGTM: Appropriate default values for disabled frequency filters.

The "any" preset correctly sets all frequency parameters to None when filters are disabled, providing a clear way to bypass frequency filtering while maintaining the schema structure.


2213-2218: Consider adding validation for max_carriers consistency.

The in-house frequency settings use max_carriers while maintaining max_het and max_hom fields. Consider adding validation to ensure max_carriers is consistent with the sum of max_het and max_hom when both are specified.

Also applies to: 2254-2259, 2295-2300, 2336-2341, 2377-2382

frontend/ext/varfish-api/src/lib/schemas.gen.ts (3)

Line range hint 9489-9500: Updated mitochondrial frequency schema references

Schema references have been updated to align with the renamed properties. This is a necessary change to support the property renames.


Line range hint 3842-4152: Property names standardized in frequency settings schemas

The frequency-related properties have been renamed for better consistency across schemas:

  • 'heteroplasmic' → 'max_het'
  • 'homoplasmic' → 'max_hom'
  • 'frequency' → 'max_af'

This improves naming consistency but is a breaking change.

Let's verify the impact:

#!/bin/bash
# Search for usage of old property names
rg -A 2 "heteroplasmic|homoplasmic|\.frequency\b"

Also applies to: 6011-6266, 7050-7727, 9115-9425


2908-2913: ⚠️ Potential issue

Breaking change: Property renamed from 'end' to 'stop' in OneBasedRangePydantic

The property has been renamed for better consistency. This is a breaking change that requires updates in any code consuming this schema.

Let's verify the impact:

backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (1)

7725-7730: Verify impact of renaming 'end' to 'stop' in OneBasedRangePydantic

The field has been renamed from 'end' to 'stop' in the OneBasedRangePydantic schema. While this change improves clarity by avoiding the Python keyword 'end', we should verify all clients are updated.

Let's verify the usage of this schema:

✅ Verification successful

Based on the code inspection results, I can now generate the final response:

Renaming 'end' to 'stop' in OneBasedRangePydantic is consistent across codebase

The change from 'end' to 'stop' in OneBasedRangePydantic is correctly implemented:

  • The Python model in backend/seqvars/models/base.py uses 'stop'
  • The generated TypeScript types in frontend/ext/varfish-api/src/lib/types.gen.ts also use 'stop'
  • No remaining references to 'end' field were found in the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining uses of 'end' field in range-related code
rg -A 5 'OneBasedRange.*end\W' 

# Search for places that need to be updated to use 'stop'
rg -A 5 'OneBasedRange.*stop\W'

Length of output: 66


Script:

#!/bin/bash
# Let's try a broader search for OneBasedRange in the codebase
rg -l "OneBasedRange"

# Also search for any Python files that might define or use this model
fd -e py | xargs rg "OneBasedRange"

# Look for any Pydantic models with 'end' or 'stop' fields
ast-grep --pattern 'class $_ (BaseModel|Pydantic) {
  $$$
  (end|stop): $_
  $$$
}'

Length of output: 676


Script:

#!/bin/bash
# Let's check the actual implementation of OneBasedRangePydantic in base.py
cat backend/seqvars/models/base.py

# Check the generated TypeScript types to see if they're aligned
cat frontend/ext/varfish-api/src/lib/types.gen.ts | grep -A 10 "OneBasedRange"

# Look for any tests that might be using this model
rg "OneBasedRangePydantic.*test" -A 5

Length of output: 87720

backend/seqvars/tests/snapshots/snap_test_factory_defaults.py (2)

2158-2161: LGTM: Frequency settings for dominant inheritance follow best practices

The frequency settings for dominant inheritance mode are properly configured with strict thresholds:

  • Maximum allele frequency of 0.2%
  • Limited heterozygous carriers (max 1)
  • No homozygous carriers allowed

Also applies to: 2165-2168, 2172-2174, 2178-2180, 2184-2186


2281-2284: LGTM: Frequency settings for recessive inheritance are appropriately configured

The frequency settings for recessive inheritance mode use appropriate thresholds:

  • Maximum allele frequency of 0.1%
  • Higher tolerance for heterozygous carriers (max 120)
  • No homozygous carriers allowed

Also applies to: 2288-2291, 2295-2297, 2301-2303, 2307-2309

backend/seqvars/models/protobufs.py (12)

31-33: Confirmed imports of new frequency settings models

The imports of SeqvarsInhouseFrequencySettingsPydantic, SeqvarsMitochondrialFrequencyPydantic, and SeqvarsMitochondrialFrequencySettingsPydantic are correct and align with the updated data structures.


86-86: Verified import of MitochondrialFrequency

The import of MitochondrialFrequency from the Protobuf definitions is appropriate and consistent with the changes in the data models.


110-111: Confirmed imports of new Protobuf settings

The imports of InhouseFrequencySettings and MitochondrialFrequencySettings correctly reflect the updated Protobuf definitions.


195-198: Validated nuclear frequency settings for gnomAD exomes

The assignments of max_het, max_hom, max_hemi, and max_af for gnomad_exomes are accurate and correctly map the frequency settings.


202-205: Validated nuclear frequency settings for gnomAD genomes

The frequency settings for gnomad_genomes are correctly assigned and consistent with the expected parameters.


207-211: Confirmed mitochondrial frequency settings for gnomAD mtDNA

The assignments for gnomad_mtdna are appropriate, and the parameters enabled, max_het, max_hom, and max_af are correctly utilized.


213-217: Confirmed mitochondrial frequency settings for HelixMTdb

The frequency settings for helixmtdb are accurately assigned, matching the new data structures.


219-224: Verified in-house frequency settings

The assignments for the inhouse frequency settings correctly include enabled, max_het, max_hom, max_hemi, and max_carriers.


472-475: Ensure correct parameter typing in _seqvars_inhouse_frequency_settings_from_protobuf

The function _seqvars_inhouse_frequency_settings_from_protobuf correctly accepts InhouseFrequencySettings as the parameter, aligning with the Protobuf type.


580-580: Verify field mapping in _genome_region_from_protobuf

Ensure that the stop field in OneBasedRangePydantic corresponds correctly to region.range.end. This mapping should reflect the intended genomic range accurately.

Confirm the field names in OneBasedRangePydantic and adjust if necessary.


962-965: Confirmed conversion of mitochondrial frequency from Protobuf

The function _seqvars_mitochondrial_frequency_from_protobuf correctly converts the Protobuf MitochondrialFrequency into the Pydantic model SeqvarsMitochondrialFrequencyPydantic.


987-993: Validated assignments in frequency annotation for mitochondrial data

The assignments for gnomad_mtdna and helixmtdb in _frequency_annotation_from_protobuf are correct and utilize the updated conversion functions.

backend/seqvars/models/base.py (1)

341-341: Verify the renaming of end to stop in OneBasedRangePydantic

The attribute end has been renamed to stop in OneBasedRangePydantic. Please ensure that all references to this attribute have been updated accordingly throughout the codebase to prevent attribute errors.

Run the following script to check for any outdated references:

This script searches for any instance where the end attribute is used in OneBasedRangePydantic.

backend/seqvars/tests/test_views_api.py (1)

1953-1953: Update frequency settings to include gnomad_mtdna field

The addition of the gnomad_mtdna field in the frequency settings aligns with the renaming from gnomad_mitochondrial to gnomad_mtdna, ensuring consistency across the tests and the codebase.

@@ -2,6 +2,6 @@
export type FrequencyDb =
| 'gnomad_exomes'
| 'gnomad_genomes'
| 'gnomad_mitochondrial'
| 'gnomad_mtdna'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Update test fixtures to use gnomad_mtdna instead of gnomad_mitochondrial

The search results show that while the codebase has been updated to use gnomad_mtdna, there are still occurrences of the old gnomad_mitochondrial in test fixtures:

  • frontend/src/seqvars/components/QueryEditor/fixture.QueryEditor.json
  • frontend/src/seqvars/components/QueryEditor/presetsDetails.FrequencyControls.json

These fixture files need to be updated to maintain consistency with the new type.

🔗 Analysis chain

Verify all frontend usages are updated.

Since this is a type change, we should ensure all occurrences of the old string literal have been updated to prevent compile-time errors.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for any remaining occurrences of the old string literal
# that might have been missed during the refactoring.

echo "Checking for any remaining uses of 'gnomad_mitochondrial'..."
rg "gnomad_mitochondrial" frontend/

echo "Verifying all uses of new 'gnomad_mtdna' string..."
rg "gnomad_mtdna" frontend/

Length of output: 4138

@@ -36,7 +36,7 @@ interface DbInfo {
const DB_INFOS = [
{ label: 'gnomAD exomes', db: 'gnomad_exomes', size: 16 },
{ label: 'gnomAD genomes', db: 'gnomad_genomes', size: 126 },
{ label: 'gnomAD mitochondrial', db: 'gnomad_mitochondrial', size: 56 },
{ label: 'gnomAD mitochondrial', db: 'gnomad_mtdna', size: 56 },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Update remaining references to gnomad_mitochondrial in JSON files

The database key change from 'gnomad_mitochondrial' to 'gnomad_mtdna' is incomplete. While the new key is properly used in Vue components and TypeScript files, there are still references to the old key in JSON files:

  • frontend/src/seqvars/components/QueryEditor/presetsDetails.FrequencyControls.json
  • frontend/src/seqvars/components/QueryEditor/fixture.QueryEditor.json

These JSON files likely contain test fixtures and preset configurations that need to be updated to maintain consistency with the new key.

🔗 Analysis chain

Verify dependent components for database key change.

The database key change from 'gnomad_mitochondrial' to 'gnomad_mtdna' looks good and aligns with the backend changes. However, we should ensure all dependent components are updated.

Let's verify all usages of this key in the frontend:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining references to the old key
rg "gnomad_mitochondrial" frontend/
# Search for usages of the new key to ensure proper adoption
rg "gnomad_mtdna" frontend/

Length of output: 3884

@@ -146,9 +146,9 @@ protos-download:
rm -f protos/seqvars/protos/*_p*2*.py

wget -O protos/seqvars/protos/output.proto \
https://raw.githubusercontent.com/varfish-org/varfish-server-worker/main/protos/varfish/v1/seqvars/output.proto
https://raw.githubusercontent.com/varfish-org/varfish-server-worker/refactor-more-protobuf-cleaning-2/protos/varfish/v1/seqvars/output.proto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Branch reference needs immediate update - file exists in main

The referenced feature branch refactor-more-protobuf-cleaning-2 no longer exists, but the proto files are present in the main branch. This is causing potential build issues.

  • Update URLs in backend/Makefile to use the main branch:
-https://raw.githubusercontent.com/varfish-org/varfish-server-worker/refactor-more-protobuf-cleaning-2/protos/varfish/v1/seqvars/output.proto
+https://raw.githubusercontent.com/varfish-org/varfish-server-worker/main/protos/varfish/v1/seqvars/output.proto
  • Same change needed for query.proto URL on the next line
🔗 Analysis chain

Consider using a stable reference instead of a feature branch.

While the changes look correct, referencing a specific feature branch (refactor-more-protobuf-cleaning-2) in the Makefile could cause issues once the branch is merged and deleted.

Consider one of these approaches:

  1. Use a specific commit hash for stability:
-https://raw.githubusercontent.com/varfish-org/varfish-server-worker/refactor-more-protobuf-cleaning-2/protos/varfish/v1/seqvars/output.proto
+https://raw.githubusercontent.com/varfish-org/varfish-server-worker/<commit-hash>/protos/varfish/v1/seqvars/output.proto
  1. Use a git tag if this represents a specific version
  2. Revert to using 'main' once the changes are merged

Let's verify the branch status:

Also applies to: 151-151

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the branch exists and its status
gh api repos/varfish-org/varfish-server-worker/branches/refactor-more-protobuf-cleaning-2 --jq '.name, .protected, .commit.sha'

Length of output: 288


Script:

#!/bin/bash
# Let's check if the proto files are available in the main branch
gh api repos/varfish-org/varfish-server-worker/contents/protos/varfish/v1/seqvars/output.proto?ref=main --jq '.sha'

# Also check the current Makefile for any other branch references
rg -A 1 "varfish-server-worker/.+/protos" backend/Makefile

Length of output: 548

Comment on lines +107 to +110
modelValue.settings.frequency[db]!.max_af === null ||
modelValue.settings.frequency[db]!.max_af === undefined
? null
: modelValue.settings.frequency[db]!.frequency! * 100.0
: modelValue.settings.frequency[db]!.max_af! * 100.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Simplify the null/undefined check.

The current null check can be simplified using the nullish coalescing operator.

-      modelValue.settings.frequency[db]!.max_af === null ||
-      modelValue.settings.frequency[db]!.max_af === undefined
-        ? null
-        : modelValue.settings.frequency[db]!.max_af! * 100.0
+      modelValue.settings.frequency[db]!.max_af ?? null
+        ? modelValue.settings.frequency[db]!.max_af! * 100.0
+        : null

Committable suggestion was skipped due to low confidence.

// Check whether conversion was successful.
if (isNaN(start) || isNaN(end)) {
if (isNaN(start) || isNaN(stop)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Replace global isNaN with Number.isNaN for safer type checking.

The global isNaN() function attempts type coercion which can lead to unexpected results. Using Number.isNaN() is safer as it performs a more strict check.

Apply this change:

-  if (isNaN(start) || isNaN(stop)) {
+  if (Number.isNaN(start) || Number.isNaN(stop)) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (isNaN(start) || isNaN(stop)) {
if (Number.isNaN(start) || Number.isNaN(stop)) {
🧰 Tools
🪛 Biome

[error] 142-142: isNaN is unsafe. It attempts a type coercion. Use Number.isNaN instead.

See the MDN documentation for more details.
Unsafe fix: Use Number.isNaN instead.

(lint/suspicious/noGlobalIsNan)


[error] 142-142: isNaN is unsafe. It attempts a type coercion. Use Number.isNaN instead.

See the MDN documentation for more details.
Unsafe fix: Use Number.isNaN instead.

(lint/suspicious/noGlobalIsNan)

Comment on lines +461 to +468
def _seqvars_mitochondrial_frequency_settings_from_protobuf(
frequency: SeqvarsMitochondrialFrequencySettingsPydantic,
) -> SeqvarsMitochondrialFrequencySettingsPydantic:
return SeqvarsMitochondrialFrequencySettingsPydantic(
enabled=frequency.enabled,
heteroplasmic=frequency.heteroplasmic,
homoplasmic=frequency.homoplasmic,
frequency=frequency.frequency,
max_het=frequency.max_het,
max_hom=frequency.max_hom,
frequency=frequency.max_af,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Incorrect parameter type and field mapping in _seqvars_mitochondrial_frequency_settings_from_protobuf

The function _seqvars_mitochondrial_frequency_settings_from_protobuf should accept a Protobuf MitochondrialFrequencySettings object as the parameter, but it currently accepts SeqvarsMitochondrialFrequencySettingsPydantic, which is the Pydantic model. Additionally, the field frequency=frequency.max_af should be corrected to max_af=frequency.max_af to match the Pydantic model's field.

Apply this diff to fix the parameter type and field mapping:

-def _seqvars_mitochondrial_frequency_settings_from_protobuf(
-    frequency: SeqvarsMitochondrialFrequencySettingsPydantic,
-) -> SeqvarsMitochondrialFrequencySettingsPydantic:
+def _seqvars_mitochondrial_frequency_settings_from_protobuf(
+    frequency: MitochondrialFrequencySettings,
+) -> SeqvarsMitochondrialFrequencySettingsPydantic:
     return SeqvarsMitochondrialFrequencySettingsPydantic(
         enabled=frequency.enabled,
-        frequency=frequency.max_af,
+        max_af=frequency.max_af,
         max_het=frequency.max_het,
         max_hom=frequency.max_hom,
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _seqvars_mitochondrial_frequency_settings_from_protobuf(
frequency: SeqvarsMitochondrialFrequencySettingsPydantic,
) -> SeqvarsMitochondrialFrequencySettingsPydantic:
return SeqvarsMitochondrialFrequencySettingsPydantic(
enabled=frequency.enabled,
heteroplasmic=frequency.heteroplasmic,
homoplasmic=frequency.homoplasmic,
frequency=frequency.frequency,
max_het=frequency.max_het,
max_hom=frequency.max_hom,
frequency=frequency.max_af,
def _seqvars_mitochondrial_frequency_settings_from_protobuf(
frequency: MitochondrialFrequencySettings,
) -> SeqvarsMitochondrialFrequencySettingsPydantic:
return SeqvarsMitochondrialFrequencySettingsPydantic(
enabled=frequency.enabled,
max_het=frequency.max_het,
max_hom=frequency.max_hom,
max_af=frequency.max_af,
)

Comment on lines +498 to +509
gnomad_mtdna=(
_seqvars_mitochondrial_frequency_settings_from_protobuf(frequency.gnomad_mtdna)
if frequency.HasField("gnomad_mtdna")
else None
),
helixmtdb=(
_seqvars_helixmtdb_frequency_settings_from_protobuf(frequency.helixmtdb)
_seqvars_mitochondrial_frequency_settings_from_protobuf(frequency.helixmtdb)
if frequency.HasField("helixmtdb")
else None
),
inhouse=(
_seqvars_nuclear_frequency_settings_from_protobuf(frequency.inhouse)
_seqvars_inhouse_frequency_settings_from_protobuf(frequency.inhouse)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Parameter type mismatch in _seqvars_mitochondrial_frequency_settings_from_protobuf

The function definition has an incorrect parameter type. The implementation shows:

def _seqvars_mitochondrial_frequency_settings_from_protobuf(
    frequency: SeqvarsMitochondrialFrequencySettingsPydantic,
) -> SeqvarsMitochondrialFrequencySettingsPydantic:

But from the proto file, it should accept MitochondrialFrequencySettings protobuf message as input, not the Pydantic model. The function is used to convert from protobuf to Pydantic, so the input type should be the protobuf type.

  • backend/seqvars/models/protobufs.py: Change the parameter type annotation from SeqvarsMitochondrialFrequencySettingsPydantic to MitochondrialFrequencySettings
🔗 Analysis chain

Update function calls to match corrected parameter types

After correcting the parameter types in the frequency settings functions, ensure that all calls to _seqvars_mitochondrial_frequency_settings_from_protobuf and similar functions pass the appropriate Protobuf objects.

Run the following script to identify any mismatches in function calls:


Let me analyze the function definitions and their usage in the codebase to verify the parameter types.


Let me try a different approach to find the protobuf definitions.


Let me check the protobuf definitions for the frequency settings.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find calls to frequency settings conversion functions and check parameter types.

# Search for function calls and display surrounding lines for context.
rg --after-context 2 "_seqvars_[a-z_]+_from_protobuf\(" backend/seqvars/models/protobufs.py

Length of output: 8472


Script:

#!/bin/bash
# Let's look at the protobuf definitions to verify parameter types
rg -A 5 "message.*Frequency" backend/seqvars/models/protobufs.proto

Length of output: 148


Script:

#!/bin/bash
# Find proto files in the repository
fd -e proto

# Also let's look at the actual implementation of the frequency settings functions
rg -A 10 "class SeqvarsMitochondrialFrequencySettingsPydantic" backend/

Length of output: 931


Script:

#!/bin/bash
# Look at the query proto file which likely contains frequency settings
cat backend/protos/seqvars/protos/query.proto

# Also let's look at the implementation of the conversion function
rg -A 5 "_seqvars_mitochondrial_frequency_settings_from_protobuf" backend/seqvars/models/protobufs.py

Length of output: 11130

Comment on lines +2190 to +2194
This is also used for for storing chrMT frequencies in the case of in-house
data. Here, the ``an`` field is the total number of samples with coverage,
and the ``het`` and ``homalt`` fields are the number of heteroplasmic and
homoplasmic carriers, respectively.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider separating nuclear and mitochondrial frequency representations

The SeqvarsNuclearFrequencyPydantic class is being used to represent chrMT frequencies for in-house data, as mentioned in the docstring. Overloading this class for mitochondrial frequencies may lead to confusion and maintainability issues. Consider creating a dedicated class for in-house mitochondrial frequency data to improve code clarity.

Comment on lines +92 to 93
"""Settings for mitochondrial frequency filtering."""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Update the class docstring to reflect in-house frequency settings

The docstring for SeqvarsInhouseFrequencySettingsPydantic incorrectly mentions "mitochondrial frequency filtering." It should describe settings for in-house frequency filtering.

Apply this diff to correct the docstring:

92 - """Settings for mitochondrial frequency filtering."""
   + """Settings for in-house frequency filtering."""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"""Settings for mitochondrial frequency filtering."""
"""Settings for in-house frequency filtering."""

Comment on lines +70 to +71
#: Maximal number of heterozygous carriers.
max_hom: int | None = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Correct the docstring for max_hom attribute

The docstring for the max_hom attribute should indicate "Maximal number of homozygous carriers" instead of "Maximal number of heterozygous carriers."

Apply this diff to fix the docstring:

70 - #: Maximal number of heterozygous carriers.
   + #: Maximal number of homozygous carriers.
71   max_hom: int | None = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#: Maximal number of heterozygous carriers.
max_hom: int | None = None
#: Maximal number of homozygous carriers.
max_hom: int | None = None

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3f43423 and c7ccd7b.

📒 Files selected for processing (5)
  • backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (13 hunks)
  • frontend/ext/varfish-api/src/lib/schemas.gen.ts (130 hunks)
  • frontend/ext/varfish-api/src/lib/types.gen.ts (11 hunks)
  • frontend/src/seqvars/components/PresetsEditor/lib/utils.spec.ts (1 hunks)
  • frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/seqvars/components/QueryEditor/FrequencyControlsRow.vue
🔇 Additional comments (15)
frontend/src/seqvars/components/PresetsEditor/lib/utils.spec.ts (1)

21-21: LGTM! Consistent renaming from end to stop

The changes align with standard genomic coordinate terminology where "stop" is more commonly used than "end".

Also applies to: 25-25, 28-28, 32-32

frontend/ext/varfish-api/src/lib/types.gen.ts (3)

Line range hint 1993-2076: Improved frequency settings structure and naming

The changes improve the organization of frequency-related types:

  1. Renamed mitochondrial types for better clarity (e.g., GnomadMitochondrialGnomadMtDNA)
  2. Added dedicated types for different frequency scenarios
  3. Standardized the structure of frequency settings

The refactoring enhances code maintainability and better represents the domain concepts.


Line range hint 2336-2413: Consistent application of frequency settings in presets

The frequency settings changes are consistently applied in the query presets, maintaining alignment between base types and their usage in presets. This ensures a uniform approach to handling frequency data throughout the application.


998-998: ⚠️ Potential issue

Breaking change: Renamed end to stop in range type

This is a breaking change that requires updates in client code using the OneBasedRangePydantic type. The rename from end to stop aligns better with bioinformatics conventions.

Let's verify the impact of this change:

backend/varfish/tests/drf_openapi_schema/varfish_api_schema.yaml (1)

Line range hint 10031-10166: Verify frequency settings consolidation.

The frequency settings schemas show a consolidation between nuclear and mitochondrial variants. The implementation looks correct but we should verify that all frequency-related code paths handle both types properly.

Let's verify the frequency settings usage:

✅ Verification successful

Based on the search results, I can now provide a final response:

Frequency settings implementation properly handles both nuclear and mitochondrial variants.

The codebase shows consistent handling of both nuclear and mitochondrial frequency settings:

  • Distinct Pydantic models exist for both types:
    • SeqvarsNuclearFrequencySettingsPydantic for nuclear variants
    • SeqvarsMitochondrialFrequencySettingsPydantic for mitochondrial variants
  • The frequency annotation schema (SeqvarsFrequencyAnnotationPydantic) properly includes both types
  • Conversion functions in protobufs.py handle both nuclear and mitochondrial settings correctly
  • Test coverage exists for both frequency types as seen in test_serializers.py
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for frequency-related code to ensure consistent handling
rg -A 5 "frequency.*settings" --type python

Length of output: 76


Script:

#!/bin/bash
# Search for frequency-related code to ensure consistent handling
rg -A 5 "frequency.*settings"

Length of output: 53424


Script:

#!/bin/bash
# Search for files containing frequency settings classes
ast-grep --pattern 'class $name {
  $$$
  frequency
  $$$
}'

Length of output: 62


Script:

#!/bin/bash
# Search for frequency-related test files
fd "test.*frequency" --type f

Length of output: 31


Script:

#!/bin/bash
# Search for frequency-related model definitions
rg "class.*Frequency.*Pydantic"

Length of output: 708

frontend/ext/varfish-api/src/lib/schemas.gen.ts (10)

2908-2913: Property 'end' Renamed to 'stop' in $OneBasedRangePydantic Schema

The property end has been correctly renamed to stop, which improves clarity and consistency with common naming conventions.


Line range hint 3842-3901: Enhanced Nuclear Frequency Settings Schema

The $SeqvarsNuclearFrequencySettingsPydantic schema now includes detailed properties for frequency filtering, such as max_het, max_hom, max_hemi, and max_af. This provides better flexibility in filtering variants based on nuclear frequency data.


Line range hint 3981-4152: Introduction of Mitochondrial Frequency Settings Schema

The new $SeqvarsMitochondrialFrequencySettingsPydantic schema has been added to handle mitochondrial frequency filtering. This change consolidates mitochondrial frequency settings into a dedicated schema, enhancing code organization and readability.


Line range hint 6008-6169: Updated Description in $SeqvarsFrequencyAnnotationPydantic Schema

The description has been refined from "SPopulation frequency information" to "Population frequency information", correcting a typographical error and improving clarity.


Line range hint 6230-6324: Refinement of Nuclear Frequency Information Schema

The $SeqvarsNuclearFrequencyPydantic schema now includes a detailed description explaining its use for both nuclear and mitochondrial data in certain contexts. The properties an, het, homalt, hemialt, and af are appropriately defined for comprehensive frequency annotation.


Line range hint 6145-6218: Addition of Mitochondrial Frequency Information Schema

The new $SeqvarsMitochondrialFrequencyPydantic schema provides dedicated properties for mitochondrial frequency data, including an, het, homalt, and af. This enhances the specificity and clarity of mitochondrial frequency annotations.


Line range hint 3862-6324: Consistent Use of Frequency Settings Schemas

The frequency settings schemas for nuclear and mitochondrial data ($SeqvarsNuclearFrequencySettingsPydantic and $SeqvarsMitochondrialFrequencySettingsPydantic) are consistently defined across the codebase. Ensure that all references to these schemas are correctly updated to prevent any mismatches.


Line range hint 6008-6600: Removal of Deprecated Schemas and Addition of New Schemas

The deprecated schemas $SeqvarsGnomadMitochondrialFrequencyPydantic and $SeqvarsHelixMtDbFrequencyPydantic have been removed and replaced with $SeqvarsMitochondrialFrequencyPydantic. This unifies mitochondrial frequency data into a single schema, streamlining the data structures.


Line range hint 7472-7782: Update in Frequency Presets Schemas

In the frequency presets schemas, properties such as gnomad_mtdna replace the previous gnomad_mitochondrial, and the settings have been updated to use the new mitochondrial frequency settings schema. This change ensures consistency with the updated data models.


Line range hint 9167-9890: Alignment of Query Settings with Updated Frequency Schemas

The query settings for frequency in $SeqvarsQuerySettingsFrequency are updated to use the new frequency settings schemas. This aligns the query configurations with the revised data structures and ensures that frequency filters are applied correctly in queries.

@@ -18,18 +18,18 @@ describe('parseGenomeRegion - valid', () => {
// region
[
'chr1:1,000,000-2,000,000',
{ chromosome: '1', range: { start: 1000000, end: 2000000 } },
{ chromosome: '1', range: { start: 1000000, stop: 2000000 } },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Inconsistent terminology found in codebase

The codebase shows mixed usage of end and stop for genomic coordinates:

  • frontend/src/seqvars/components/PresetsEditor/lib/utils.ts: Uses both stop in the implementation but has an error message using end
  • frontend/src/varfish/moreUtils.ts: Uses end consistently in its implementation

Recommended changes:

  • Update error message in utils.ts from "start greater than end" to "start greater than stop"
  • Consider updating moreUtils.ts to use stop instead of end for consistency with the genomic coordinate terminology
🔗 Analysis chain

Verify consistent usage of stop across the codebase

Let's ensure this terminology change is consistently applied across all related files.

Also applies to: 25-25, 28-28, 32-32

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining instances of 'end' in genomic coordinate contexts
# and verify consistent usage of 'stop' in related files

# Search for potential inconsistencies in property names
echo "Checking for potential inconsistencies in genomic coordinate property names..."
rg -g '*.{ts,js,tsx,jsx,proto}' '(range|position|coordinate).*end[^a-zA-Z]'

# Check the corresponding implementation file
echo "Checking the implementation file..."
rg -g 'utils.ts' 'parseGenomeRegion'

# Look for related test files that might need similar updates
echo "Checking other test files that might need updates..."
fd -g '*spec.ts' -x rg -l 'parseGenomeRegion|genomeRegion'

Length of output: 936


Script:

#!/bin/bash
# Let's examine the specific files where we found 'end' usage and their context

echo "Checking utils.ts for the full context..."
rg -A 5 -B 5 'end' frontend/src/seqvars/components/PresetsEditor/lib/utils.ts

echo "Checking moreUtils.ts for the full context..."
rg -A 5 -B 5 'end' frontend/src/varfish/moreUtils.ts

echo "Let's also check for 'stop' usage to understand the current pattern..."
rg -g '*.{ts,js,tsx,jsx,proto}' '(range|position|coordinate).*stop[^a-zA-Z]'

echo "Checking the GenomeRegion type definition..."
ast-grep --pattern 'type GenomeRegion'

Length of output: 3268

Comment on lines +7725 to +7730
stop:
title: Stop
type: integer
required:
- start
- end
- stop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix indentation in OneBasedRangePydantic schema.

The indentation of the start and stop properties is inconsistent with the rest of the file. This could cause issues with YAML parsing.

Apply this diff to fix the indentation:

        start:
-          title: Start
-          type: integer
+        title: Start
+        type: integer
        stop:
-          title: Stop
-          type: integer
+        title: Stop
+        type: integer
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
stop:
title: Stop
type: integer
required:
- start
- end
- stop
start:
title: Start
type: integer
stop:
title: Stop
type: integer
required:
- start
- stop

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c7ccd7b and 30dab54.

📒 Files selected for processing (3)
  • backend/seqvars/tests/snapshots/snap_test_factory_defaults.py (18 hunks)
  • backend/seqvars/tests/snapshots/snap_test_models.py (1 hunks)
  • backend/seqvars/tests/snapshots/snap_test_views_api.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/seqvars/tests/snapshots/snap_test_models.py
🔇 Additional comments (3)
backend/seqvars/tests/snapshots/snap_test_views_api.py (2)

2190-2193: Inconsistent frequency thresholds between gnomAD exomes and genomes.

The frequency thresholds between gnomAD exomes and genomes are identical in some presets but differ in others:

  • "dominant strict": identical thresholds (0.002)
  • "recessive strict": identical thresholds (0.001)
  • "recessive relaxed": different thresholds for heterozygous carriers (exomes: 0 het, genomes: 150 het)

Consider aligning the thresholds or documenting the rationale for these differences.

#!/bin/bash
# Description: Check for other frequency threshold configurations in the codebase
# to verify if this pattern is consistent with other presets

rg -A 5 "gnomad_exomes.*max_af|gnomad_genomes.*max_af" --glob '!*.snap*'

Also applies to: 2197-2200, 2232-2235, 2239-2242, 2274-2277, 2281-2284, 2316-2319, 2323-2326, 2358-2361, 2365-2368


Line range hint 2190-2429: Verify the frequency thresholds align with clinical guidelines.

The frequency thresholds used in the presets should be validated against:

  1. Current clinical guidelines for variant classification
  2. Population-specific considerations
  3. Disease inheritance patterns

For example:

  • Dominant strict: AF ≤ 0.002 (0.2%)
  • Recessive strict: AF ≤ 0.001 (0.1%)
✅ Verification successful

Let's search for more context about these threshold values in the codebase to understand their rationale and validation.


Based on the search results, I can now verify the frequency thresholds used in the presets. The code shows consistent implementation of frequency thresholds that align with established guidelines:

The frequency thresholds are well-aligned with clinical guidelines and inheritance patterns

  • Dominant super strict (AF ≤ 0.2%): Uses max_af=0.002 for both gnomAD exomes and genomes
  • Dominant strict (AF ≤ 0.2%): Uses max_af=0.002 with slightly relaxed heterozygous counts
  • Dominant relaxed (AF ≤ 1%): Uses max_af=0.01 with higher heterozygous tolerance
  • Recessive strict (AF ≤ 0.1%): Uses max_af=0.001 with appropriate homozygous restrictions
  • Recessive relaxed (AF ≤ 1%): Uses max_af=0.01 with higher homozygous tolerance

The thresholds follow a logical progression from strict to relaxed filtering and properly account for inheritance patterns by adjusting homozygous/heterozygous counts accordingly.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for documentation or configuration files that might contain
# the rationale for these threshold values

rg -l "frequency|threshold|gnomad" --type md
rg -l "frequency|threshold|gnomad" --glob "*.rst"

Length of output: 1148


Script:

#!/bin/bash
# Search for any documentation or configuration files related to variant filtering
rg -A 5 "variant.*filter|frequency.*threshold|allele.*frequency" --type py

# Search for any test files that might explain the rationale
rg -A 5 "dominant.*strict|recessive.*strict" --type py --glob '!*.snap*'

# Look for any comments or docstrings explaining the frequency thresholds
rg -B 2 -A 2 "0\.00[12]|0\.01" --type py --glob '!*.snap*'

Length of output: 261168

backend/seqvars/tests/snapshots/snap_test_factory_defaults.py (1)

2158-2187: LGTM! Consistent renaming of frequency-related fields.

The changes systematically rename frequency-related fields across all presets while maintaining the same logical structure and values. The new field names (max_af, max_het, max_hom, max_hemi) provide clearer semantics compared to the previous naming scheme.

Also applies to: 2200-2229, 2242-2271, 2284-2313, 2326-2355, 2368-2397, 4815-4844, 4857-4886, 4899-4928, 4941-4970, 4983-5012, 5025-5054, 7472-7501, 7514-7543, 7556-7585, 7598-7627, 7640-7669, 7682-7711

Comment on lines +2204 to +2206
"max_af": None,
"max_het": None,
"max_hom": None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider explicit null values for mitochondrial database settings.

The mitochondrial database settings (gnomAD-mtDNA and HelixMtDb) consistently use None for all fields when disabled. While this works, consider:

  1. Using explicit default values when enabled: False
  2. Adding validation to ensure these fields are properly set when enabled: True

Also applies to: 2210-2212, 2246-2248, 2252-2254, 2288-2290, 2294-2296, 2330-2332, 2336-2338, 2372-2374, 2378-2380, 2414-2416, 2420-2422

Comment on lines +2216 to +2219
"max_carriers": None,
"max_hemi": None,
"max_het": None,
"max_hom": None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Standardize in-house frequency settings structure.

The in-house frequency settings structure differs from other frequency databases:

  • Uses max_carriers instead of max_af
  • Maintains unused fields (max_hemi, max_het, max_hom)

Consider:

  1. Aligning the structure with other frequency databases
  2. Removing unused fields
  3. Adding validation for the max_carriers field

Also applies to: 2258-2261, 2300-2303, 2342-2345, 2384-2387, 2426-2429

Copy link

codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 46.72131% with 65 lines in your changes missing coverage. Please review.

Project coverage is 91%. Comparing base (a02cf55) to head (30dab54).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
backend/seqvars/protos/output_pb2.py 0% 31 Missing ⚠️
backend/seqvars/protos/query_pb2.py 0% 31 Missing ⚠️
backend/seqvars/models/protobufs.py 50% 3 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2063   +/-   ##
=====================================
  Coverage     91%     91%           
=====================================
  Files        676     677    +1     
  Lines      38463   38435   -28     
=====================================
- Hits       35118   35093   -25     
+ Misses      3345    3342    -3     
Files with missing lines Coverage Δ
backend/seqvars/factory_defaults.py 86% <ø> (ø)
...kend/seqvars/migrations/0002_auto_20240708_1305.py 100% <ø> (ø)
...kend/seqvars/migrations/0013_auto_20241028_0928.py 100% <100%> (ø)
backend/seqvars/models/base.py 96% <100%> (-1%) ⬇️
backend/seqvars/serializers.py 97% <100%> (ø)
backend/seqvars/tests/factories.py 99% <100%> (-1%) ⬇️
...vars/tests/snapshots/snap_test_factory_defaults.py 100% <ø> (ø)
...ackend/seqvars/tests/snapshots/snap_test_models.py 100% <100%> (ø)
...end/seqvars/tests/snapshots/snap_test_views_api.py 100% <ø> (ø)
backend/seqvars/tests/test_models.py 100% <ø> (ø)
... and 5 more

@holtgrewe holtgrewe merged commit 5fe9665 into main Oct 28, 2024
18 checks passed
@holtgrewe holtgrewe deleted the 1956-consolidate-models-pydantic-protobuf-types branch October 28, 2024 11:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidate models, pydantic, protobuf types
1 participant