Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate fieldnames and types when using pydantic codegen #1189

Merged
merged 2 commits into from
Nov 22, 2024

Conversation

imalsogreg
Copy link
Contributor

@imalsogreg imalsogreg commented Nov 22, 2024

Python and pydantic do not allow arbitrary identifiers to be used as fields in classes. This PR adds checks to the BAML grammar, which run conditionally when the user includes a python/pydantic code generator block:

  • field names must not be Python keywords.
  • field names must not be lexographically equal to the field type, or the base of an optional type.

E.g. rule 1:

# Not ok
class Foo(BaseModel):
  if string

E.g. rule 2:

class ETA(BaseModel):
  time: string

# Not ok
class Foo(BaseModel):
  ETA: ETA

These rules are now checked during validation of the syntax tree prior to construction of the IR, and if they are violated we push an error to Diagnostics.

Bonus: There are a few changes in the PR not related to the issue - they are little cleanups to reduce the number of unnecessary rustc warnings.


Important

Add validation for field names in BAML classes to prevent Python keyword and type name conflicts when using Pydantic code generation.

  • Validation:
    • Add assert_no_field_name_collisions() in classes.rs to check field names against Python keywords and type names when using Pydantic.
    • Use reserved_names() to map keywords to target languages.
  • Diagnostics:
    • Update new_field_validation_error() in error.rs to accept String for error messages.
  • Miscellaneous:
    • Remove unused code and features in lib.rs and build.rs to reduce rustc warnings.
    • Add tests generator_keywords1.baml and generator_keywords2.baml to validate new rules.

This description was created by Ellipsis for 49d31fb. It will automatically update as commits are pushed.

Copy link

vercel bot commented Nov 22, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
baml ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 22, 2024 11:41pm

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to f4ee739 in 1 minute and 32 seconds

More details
  • Looked at 378 lines of code in 11 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. engine/baml-lib/baml-core/src/validate/validation_pipeline/validations.rs:31
  • Draft comment:
    The function assert_no_reserved_names_in_fields is called here but is not defined. Did you mean to call assert_no_field_name_collisions instead?
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is speculative because it assumes the function is undefined without providing evidence. The diff does not show any changes related to the definition of assert_no_reserved_names_in_fields, so the comment might not be relevant to the changes made in this diff. Without more context, it's unclear if the function is indeed undefined or if the suggested alternative is correct.
    I might be missing the broader context of the codebase where the function could be defined elsewhere. The comment could be valid if the function is indeed undefined, but this cannot be confirmed with the current information.
    Given the lack of evidence in the diff and the speculative nature of the comment, it is safer to assume the comment is not useful unless proven otherwise.
    Delete the comment as it is speculative and not directly related to the changes in the diff.
2. engine/baml-lib/baml-core/src/validate/validation_pipeline/validations/classes.rs:56
  • Draft comment:
    The function assert_no_field_name_collisions is defined but not used. It seems like it should be called instead of assert_no_reserved_names_in_fields in validations.rs.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_iIwecOoidhbiaDnO


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 2bf8f73 in 15 seconds

More details
  • Looked at 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. engine/baml-lib/baml-core/src/validate/validation_pipeline/validations.rs:31
  • Draft comment:
    Ensure that all references to assert_no_reserved_names_in_fields are updated to assert_no_field_name_collisions for consistency.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The function name change from assert_no_reserved_names_in_fields to assert_no_field_name_collisions should be reflected in all relevant documentation and comments to maintain consistency.

Workflow ID: wflow_yHwEbIzvM7QTpfe2


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 49d31fb in 22 seconds

More details
  • Looked at 271 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. engine/baml-lib/baml-core/src/validate/validation_pipeline/validations/classes.rs:114
  • Draft comment:
    Consider directly inserting language keywords into the HashMap instead of using an intermediate Vec. This will improve performance by reducing unnecessary allocations and iterations.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The reserved_names function is currently using a Vec to store language keywords, which is then converted to a HashMap. This can be optimized by directly inserting into the HashMap without the intermediate Vec. This will improve performance by reducing unnecessary allocations and iterations.
2. engine/baml-lib/baml-core/src/validate/validation_pipeline/validations/classes.rs:90
  • Draft comment:
    Consider using unwrap_or_default() instead of map_or("".to_string(), ...) for better readability and to avoid unnecessary string allocations.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The map_or function is used with an empty string as the default value. This could be replaced with unwrap_or_default for better readability and to avoid unnecessary string allocations.

Workflow ID: wflow_BPj7ZLovayyqiTYW


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@imalsogreg imalsogreg added this pull request to the merge queue Nov 22, 2024
Merged via the queue into canary with commit 93b393d Nov 22, 2024
10 checks passed
@imalsogreg imalsogreg deleted the greg/python-keywords branch November 22, 2024 23:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant