chore: remove Identifier (once known as Identities) from codebase. #1845

jpivarski · 2022-10-28T20:34:47Z

Identifiers were motivated by PartiQL, an experimental combinatorics language that uses set-like referential identity to track particles through a calculation. It would require our data to track a surrogate index through all of its operations, like a Pandas Index.

The Identities (Awkward v1) and Identifier (Awkward v2) classes were a "foot in the door" implementation of such a feature. Originally, it was supposed to be implemented before the Awkward 1.0.0 release, but other features were more pressing. I left them in on the theory that it's easier to take them out than to add them later, but it's been 3 years now: it's not going to happen. Moreover, 2.0.0 is almost ready to ship, and removing them after that release would mean changing the public API (albeit for the low-level layer, intended for downstream dependencies, rather than data analysts). So it must be done now.

Hopefully, this won't be too hard to merge with the other PRs. The most likely complication is that all Content subclasses and Form subclasses take one fewer argument: instead of "identifier, parameters", it's just "parameters". "None" is a likely value to be passed as identifier, and it's also a likely value to be passed as parameters, so in old call sites it shows up as "None, parameters" or "None, None".
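To illustrate the merge hazard described above, here is a minimal sketch with hypothetical stand-in classes. `OldNode`, `NewNode`, and the trailing `backend` argument are assumptions made for illustration, not Awkward's actual signatures; they only mimic the "identifier, parameters" versus "parameters" argument order. A positional call written for the old signature does not fail under the new one; its arguments silently shift by one slot.

```python
class OldNode:
    """Hypothetical stand-in for a constructor before this PR."""

    def __init__(self, data, identifier=None, parameters=None, backend=None):
        self.data = data
        self.identifier = identifier
        self.parameters = parameters
        self.backend = backend


class NewNode:
    """Hypothetical stand-in for a constructor after this PR (no identifier)."""

    def __init__(self, data, parameters=None, backend=None):
        self.data = data
        self.parameters = parameters
        self.backend = backend


params = {"__array__": "string"}

# Old-style positional call: "None, parameters".
old = OldNode([1, 2, 3], None, params)

# The same positional call against the new signature: None becomes the
# parameters and the dict lands in the next slot (backend, in this sketch).
new_wrong = NewNode([1, 2, 3], None, params)

# Passing parameters by keyword is immune to the shift.
new_right = NewNode([1, 2, 3], parameters=params)
```

In a merge, any call site that still reads "None, parameters" or "None, None" is therefore worth checking by hand, since it may not raise an error.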

📚 The documentation for this PR will be available at https://awkward-array.readthedocs.io/en/jpivarski-remove-identifiers-aka-identities/ once Read the Docs has finished building 🔨

codecov · 2022-10-28T20:41:41Z

Codecov Report

Merging #1845 (e2008d8) into main (6b5d46e) will increase coverage by 0.10%.
The diff coverage is 93.95%.

Additional details and impacted files

Impacted Files	Coverage Δ
src/awkward/__init__.py	`97.05% <ø> (-0.09%)`	⬇️
src/awkward/_broadcasting.py	`93.41% <ø> (ø)`
src/awkward/_connect/pyarrow.py	`88.46% <ø> (ø)`
src/awkward/_util.py	`82.15% <ø> (-0.18%)`	⬇️
src/awkward/_v2.py	`100.00% <ø> (ø)`
src/awkward/operations/ak_full_like.py	`100.00% <ø> (ø)`
src/awkward/operations/ak_singletons.py	`96.00% <ø> (ø)`
src/awkward/operations/ak_with_name.py	`100.00% <ø> (ø)`
src/awkward/forms/unionform.py	`79.67% <50.00%> (ø)`
src/awkward/contents/emptyarray.py	`72.28% <57.14%> (+0.23%)`	⬆️
... and 38 more

jpivarski · 2022-10-28T20:53:41Z

Remove 1345 lines that were never tested (Identifiers were almost always None) and the coverage goes up by... 0.1%.

Merging #1845 (10bede1) into main (178b4e9) will increase coverage by 0.10%.
The diff coverage is 93.75%.

jpivarski · 2022-10-28T21:18:18Z

The docs have (correctly) lost all references to identifier.

agoose77

This was a big PR! It will be nice to drop the identities mechanism given that we don't properly leverage it; less code to read and type in future!

I went line-by-line. I'll run a few greps offline to see if we've missed anything, but once that's done I'll approve!

src/awkward/_connect/numba/layout.py

agoose77 · 2022-10-28T21:52:01Z

Also, I took the liberty of removing the v1 content from the Doxygen index, as you already have to touch that file in this PR.

agoose77 · 2022-10-28T21:56:47Z

I checked NumpyArray, IndexedOptionArray, and ListOffsetArray, which I expected to be the most common layout types, and found no misuse.

Do you know what the is_identifier.match excerpt refers to in _prettyprint.py (used in a few places)?

jpivarski · 2022-10-28T22:18:29Z

> Do you know what the is_identifier.match excerpt refers to in _prettyprint.py (used in a few places)?

That checks to see if a field name of a record fits the regex that most languages use as identifiers—in particular, the Datashape language (which inherits it from whatever parser they're using). That's /[A-Za-z_][A-Za-z_0-9]*/.
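As a rough sketch of the check described above, the snippet below tests field names against the quoted regex. The helper name `format_field_name` and the choice to quote non-matching names with `repr` are assumptions for illustration; the code in _prettyprint.py may differ.

```python
import re

# The pattern quoted in the comment above.
is_identifier = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def format_field_name(name):
    """Hypothetical helper: bare name if identifier-like, quoted otherwise."""
    # fullmatch requires the whole name to fit the pattern; a plain match()
    # on this unanchored pattern would only check the start of the string.
    if is_identifier.fullmatch(name) is not None:
        return name
    return repr(name)


names = ["pt", "_private", "x1", "1x", "has space", ""]
formatted = [format_field_name(n) for n in names]
```

Here `is_identifier` is about the syntax of field names and has nothing to do with the Identifier class this PR removes, which is why it survives the cleanup.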

This PR was not a sed substitution. I checked each usage of strings matching /identi/i (because I wanted to be sure there wasn't anything left-over from the "Identity"/"Identities" days).
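A case-insensitive search like the /identi/i one mentioned above could be sketched as follows. The function names and the restriction to `.py` files are assumptions for illustration; this is not the procedure actually used in the PR.

```python
import re
from pathlib import Path

# Case-insensitive match for "identi", covering Identifier, Identities, identity, ...
PATTERN = re.compile(r"identi", re.IGNORECASE)


def scan_text(text):
    """Return (line_number, stripped_line) for every line that matches."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


def scan_tree(root, suffixes=(".py",)):
    """Walk a directory and collect (path, line_number, line) for each hit."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend((str(path), n, line) for n, line in scan_text(text))
    return hits
```

Such a search also reports unrelated names like `is_identifier`, so every hit still needs to be read in context rather than substituted mechanically.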

I'll also simplify the two f-string substitutions. It used to be all format method calls because we were originally allowing for Python 2. So, little by little, I've been converting them to f-strings, and it seemed to be a good time/place to do that if there were identifiers to remove.

agoose77 · 2022-10-28T22:21:14Z

> Do you know what the is_identifier.match excerpt refers to in _prettyprint.py (used in a few places)?

> That checks to see if a field name of a record fits the regex that most languages use as identifiers, in particular the Datashape language (which inherits it from whatever parser they're using). That's /[A-Za-z_][A-Za-z_0-9]*/.

Fab, this was my read of things, but I never closely looked at identifiers before you removed them, so I wanted to clarify.

> This PR was not a sed substitution. I checked each usage of strings matching /identi/i (because I wanted to be sure there wasn't anything left-over from the "Identity"/"Identities" days).

You can tell; there were lots of tricky things that only a guided exploration would find. Even just reading all the files took a while, nice effort!

> I'll also simplify the two f-string substitutions. It used to be all format method calls because we were originally allowing for Python 2. So, little by little, I've been converting them to f-strings, and it seemed to be a good time/place to do that if there were identifiers to remove.

I'm 100% on-board with f-strings. In these two rare cases, I'd either assign the long statements to local variables, or just use a multi-line string.format, hence the suggestion.

jpivarski · 2022-10-28T22:22:19Z

In this case, I'm going to leave these strings as f-strings, but predefine the variables to substitute. That way, all of the Numba type strings will be generated in the same way.
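The difference between the two styles can be sketched like this. The functions and the shape of the generated string are hypothetical and do not reproduce the actual Numba type strings in layout.py; the point is only that the long expression moves out of the f-string into a named local variable.

```python
def record_type_inline(contents, fields, parameters):
    """Hypothetical: long expression embedded directly in the f-string."""
    return f"RecordArrayType([{', '.join(repr(c) for c in contents)}], {fields!r}, {parameters!r})"


def record_type_predefined(contents, fields, parameters):
    """Hypothetical: same output, with the long expression predefined."""
    contents_str = ", ".join(repr(c) for c in contents)
    return f"RecordArrayType([{contents_str}], {fields!r}, {parameters!r})"


example_inline = record_type_inline(["int64", "float64"], ["x", "y"], None)
example_predefined = record_type_predefined(["int64", "float64"], ["x", "y"], None)
```

Both functions build the same string; the second keeps the f-string short enough to read at a glance.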

jpivarski · 2022-10-28T22:31:37Z

Auto-merging.

Thanks!

chore: remove Identifier (once known as Identities) from codebase.

10bede1

jpivarski linked an issue Oct 28, 2022 that may be closed by this pull request

Remove Identifiers/identifier/has_identifiers from the codebase #1843

Closed

jpivarski requested a review from agoose77 October 28, 2022 20:44

docs: remove old v1 C++ references in Doxygen

9e9907e

jpivarski mentioned this pull request Oct 28, 2022

chore: remove Identifier and "uproot" parameter. scikit-hep/uproot5#770

Merged

agoose77 reviewed Oct 28, 2022

src/awkward/_connect/numba/layout.py (two review threads, outdated and resolved)

agoose77 approved these changes Oct 28, 2022

jpivarski and others added 2 commits October 28, 2022 17:30

Simplify the f-strings in RecordArrayType and UnionArrayType.

90fac56

Merge branch 'main' into jpivarski/remove-identifiers-aka-identities

e2008d8

jpivarski enabled auto-merge (squash) October 28, 2022 22:31

jpivarski merged commit ccadeec into main Oct 28, 2022

jpivarski deleted the jpivarski/remove-identifiers-aka-identities branch October 28, 2022 22:47

jpivarski mentioned this pull request Nov 2, 2022

refactor: removed placeholder (-1) intended for Identifier from Lookup. #1859

Merged
