chore: remove Identifier (once known as Identities) from codebase. #1845
The docs have (correctly) lost all references to |
This was a big PR! It will be nice to drop the identities mechanism given that we don't properly leverage it; less code to read and type in future!
I went line-by-line. I'll run a few greps offline to see if we've missed anything, but once that's done I'll approve!
Also, I took the liberty of removing the v1 content from the Doxygen index, as you already have to touch that file in this PR.
I checked Do you know what the |
That checks to see if a field name of a record fits the regex that most languages use as identifiers—in particular, the Datashape language (which inherits it from whatever parser they're using). That's This PR was not a sed substitution. I checked each usage of strings matching I'll also simplify the two f-string substitutions. It used to be all |
Fab, this was my read of things, but I never closely looked at identifiers before you removed them, so I wanted to clarify.
You can tell; lots of tricky things to find beyond a guided exploration. Even just reading all the files took a while, nice effort!
I'm 100% on-board with f-strings. In these two rare cases, I'd either assign the long statements to local variables, or just use a multi-line |
In this case, I'm going to leave these strings as f-strings, but predefine the variables to substitute. That way, all of the Numba type strings will be generated in the same way.
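As a rough illustration of that "predefine, then substitute" pattern (the function and names below are hypothetical, not the actual code touched in this PR), the long expressions get their own local variables so the f-string itself stays short:

```python
def type_string(name, dtype, parameters):
    """Build a type string from predefined pieces.

    Hypothetical helper for illustration only; it is not part of Awkward Array.
    """
    # Predefine the substitutions so the f-string below has no long expressions.
    dtype_str = repr(dtype)
    parameters_str = repr(dict(sorted(parameters.items())))
    return f"{name}({dtype_str}, parameters={parameters_str})"


example = type_string("ExampleType", "int64", {"b": 2, "a": 1})
print(example)
```

Every type string built this way goes through the same template, which is the consistency being argued for here.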
Auto-merging. Thanks!
Identifiers were motivated by PartiQL, an experimental combinatorics language that uses set-like referential identity to track particles through a calculation. It would require our data to track a surrogate index through all of its operations, like a Pandas Index.

The Identities (Awkward v1) and Identifier (Awkward v2) were a "foot in the door" implementation to add such a feature. Originally, it was supposed to be implemented before the Awkward 1.0.0 release, but other features were more pressing. I left them in on the theory that it's easier to take them out than to add them later, but it's been 3 years now and 2.0.0 is almost ready to ship: it's not going to happen. Moreover, removing them means changing the public API (albeit for the low-level layer, intended for downstream dependencies, rather than data analysts), so with 2.0.0 almost ready to ship, it must be done now.

Hopefully, this won't be too hard to merge with the other PRs. The most likely complication is that all Content subclasses and Form subclasses have 1 fewer argument: instead of "identifier, parameters", it's just "parameters". "None" is a likely value to be passed as identifier, and it's a likely value to be passed to parameters, so the old argument pair usually shows up as "None, parameters" or "None, None".

📚 The documentation for this PR will be available at https://awkward-array.readthedocs.io/en/jpivarski-remove-identifiers-aka-identities/ once Read the Docs has finished building 🔨