[CT-828] [Feature] dbt-core raising compile errors for invalid metric names (and potentially invalid model names) #5456
Third option: disallow things we know are bad (columns starting with digits, periods in column names) and play a bit of whack-a-mole on anything else that comes up, as and if it happens.
I was going to propose something similar to Joel's idea above, from the opposite starting point: what if we were just very strict about metric naming? Alphanumeric characters and underscores, and the name can't start with a number. That's it. It's not too tricky to regex for that. I'm pretty sure that's the rule for column naming in BigQuery + Apache Spark, and it ought to be the rule for Snowflake, since anything trickier than that requires quotes (bad!!). For the ultimate implementation, we just need to update this logic with a slightly fancier regex: dbt-core/core/dbt/contracts/graph/unparsed.py, lines 468 to 469 (at commit f988f76).
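A minimal sketch of that allowlist rule in Python, assuming ASCII letters only. The function names are hypothetical and this is not the code at unparsed.py lines 468 to 469; dbt-core would raise its own parsing error where this sketch raises `ValueError`.

```python
import re

# Hypothetical allowlist from the comment above: ASCII letters, digits and
# underscores only, and the first character may not be a digit.
# [A-Za-z0-9_] is spelled out because Python's \w also matches non-ASCII letters.
METRIC_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if `name` could be used as an unquoted column alias."""
    # fullmatch requires the whole string to match; a `$` anchor with
    # re.match would also accept a trailing newline.
    return METRIC_NAME_PATTERN.fullmatch(name) is not None


def validate_metric_name(name: str) -> None:
    """Raise ValueError (a stand-in for dbt's own error) on a bad name."""
    if not is_valid_metric_name(name):
        raise ValueError(
            f"Invalid metric name {name!r}: use only letters, digits and "
            "underscores, and do not start with a digit."
        )
```

With this rule, `revenue_7d` passes, while `7d_revenue`, `revenue.total`, and `total revenue` are all rejected. An allowlist like this also catches cases nobody has thought of yet, which is the trade-off against the whack-a-mole approach.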
Pulling this into the v1.3 milestone, since it feels like a thing we should try to get right before metrics' "experimentality" comes to an end!
That works for me; it seems like the simplest solution that satisfies the requirements. I figured it would be good to confirm with an example and the documentation.
The Big 4:
Proposed Regex:
Thanks for the refinement @callum-mcdata! These two make total sense to me:
I could see some folks eventually running into issues with the third one, though I don't mind it personally. Concision is a virtue (not one I always live up to), and we support …
This is the only one that concerns me:
To perform that validation, we'd need to either:
We'd also need to update the list of keywords each time the database adds a new one. That feels more reasonable to do if we go with option 2, and trust each adapter maintainer to keep on top of it. Maybe, in the future, this could be a feature of our SQL grammar + dialects...? For now, how willing would you be to punt on item 4? We would document: "The name of your metric must not match a reserved SQL keyword in your database."
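If item 4 is picked up later under option 2, one way to picture it: each adapter exposes its own keyword set and the shared validator only does a case-insensitive lookup. The class and attribute names below are hypothetical, and the keyword set is a tiny illustrative sample, not a complete list for any database.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class AdapterKeywords:
    """Hypothetical per-adapter hook: each maintainer supplies and updates the list."""

    adapter_type: str
    # Illustrative sample only, stored lowercase. A real list would be far
    # longer and would need updating whenever the database reserves a new word.
    reserved: FrozenSet[str] = field(default_factory=frozenset)

    def is_reserved(self, name: str) -> bool:
        # SQL keywords are case-insensitive, so compare the lowercased name.
        return name.lower() in self.reserved


postgres = AdapterKeywords(
    adapter_type="postgres",
    reserved=frozenset({"select", "from", "where", "group", "order"}),
)
```

Keeping the list on the adapter side puts the maintenance burden described above on the people closest to each database, rather than on dbt-core.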
@jtcohen6 I'd be fine with punting on option 4. I got a bit carried away last night trying to think of all possible complications that would raise issues for naming metrics 😅
I think this is a great point and, at least to me, helps justify punting on this. Once that work stream is in flight, we can revisit this discussion and decide whether to go with option 2 (adapters) or the SQL grammar. As for option 3 (character limits), I definitely agree that people will run into this constraint. So with that in mind, the proposed changes are:
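The two constraints that survive (allowed characters and a length limit) can be checked together so a user sees every problem at once. This is a sketch assuming a 63-character default, chosen because PostgreSQL truncates identifiers longer than 63 bytes. The actual limit, the function name, and the messages are hypothetical. Since the pattern only admits ASCII, character count equals byte count for any name that passes it.

```python
import re
from typing import List

NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Assumed default: PostgreSQL's identifier limit (63 bytes). A real
# implementation might pick a different limit, or one per adapter.
DEFAULT_MAX_LENGTH = 63


def name_problems(name: str, max_length: int = DEFAULT_MAX_LENGTH) -> List[str]:
    """Return every rule `name` breaks; an empty list means the name is fine."""
    problems: List[str] = []
    if NAME_PATTERN.fullmatch(name) is None:
        problems.append(
            "must contain only letters, digits and underscores "
            "and must not start with a digit"
        )
    if len(name) > max_length:
        problems.append(
            f"must be at most {max_length} characters (got {len(name)})"
        )
    return problems
```

Returning a list instead of raising on the first failure lets a parser report both problems for a name like `"1" + "a" * 70` in a single run.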
💯 I feel even better about the character constraint. Let's make it happen!
These are the acceptance criteria to close this ticket out:
This should include validating names for exposures as well (#5606).
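Extending the check to exposures could be as simple as running the same rule over every named resource type. This sketch assumes a hypothetical, already-parsed dictionary shaped like the `metrics:` and `exposures:` keys of a YAML file; it is not dbt's internal representation.

```python
import re
from typing import Dict, Iterable, List, Tuple

NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_names(
    resources: Dict[str, List[Dict[str, str]]],
    resource_types: Iterable[str] = ("metrics", "exposures"),
) -> List[Tuple[str, str]]:
    """Return (resource_type, name) pairs whose names break the rule."""
    bad: List[Tuple[str, str]] = []
    for resource_type in resource_types:
        # A missing section is treated as empty rather than as an error.
        for entry in resources.get(resource_type, []):
            name = entry["name"]
            if NAME_PATTERN.fullmatch(name) is None:
                bad.append((resource_type, name))
    return bad
```

For example, given one valid metric, one metric named `7d_revenue`, and an exposure named `weekly.dashboard`, the function reports the last two along with their resource types.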
Is this your first time opening an issue?
Describe the Feature
Continuing the conversation from CT-65 ("[Bug] Metrics' names shouldn’t be allowed to contain spaces") after some more recent work on the metrics package. @joellabes raised some really great points about additional constraints on metric names that weren't addressed when that issue was closed: namely, leading numbers and periods in metric names, which would cause column aliasing to fail.
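To see why those two cases matter, here is a quick check using Python's built-in `sqlite3` as a stand-in for a warehouse (an assumption; exact rules and error messages differ by database). An unquoted alias with a leading digit or a period is rejected, while quoting the same alias works. That quoting is exactly what strict naming would let us avoid.

```python
import sqlite3


def alias_works(alias_sql: str) -> bool:
    """Run `SELECT 1 AS <alias_sql>` in an in-memory database; True if it parses."""
    conn = sqlite3.connect(":memory:")
    try:
        # Identifiers cannot be passed as bound parameters, so the alias is
        # interpolated directly; only use this with trusted test strings.
        conn.execute(f"SELECT 1 AS {alias_sql}")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()
```

Here `alias_works("revenue")` succeeds, `alias_works("revenue.total")` and `alias_works("7d_revenue")` fail, and `alias_works('"revenue.total"')` succeeds once the alias is quoted.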
If I understand the issue correctly, there are two paths forward that I can see:
1. Introduce additional naming constraints into dbt parsing logic based around db column naming
2. Accept run errors around column names
Based on the amount of work laid out in the Con section of option 1, I am personally of the opinion that we should move forward with option 2 until the core team has the bandwidth or interest to tackle something like this. Granted, I could be totally off base about how much work it is, and that could change the math here.
Describe alternatives you've considered
^See option 2.
Who will this benefit?
This will benefit anyone using the metrics package, and potentially more people if the scope is expanded to model names.
Are you interested in contributing this feature?
Potentially! Depends on the complexity.
Anything else?
Nope! For those who read the entire issue, thank you - please enjoy this very cute video of a bunny!