Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IntersectMapKeys first-wins bug #339

Merged
merged 2 commits into from
Jul 8, 2024
Merged

Fix IntersectMapKeys first-wins bug #339

merged 2 commits into from
Jul 8, 2024

Conversation

drevell
Copy link
Contributor

@drevell drevell commented Jul 2, 2024

There was a bug in the old version. The intended behavior was for the the map value to come from the first map in the list. However, the actual behavior was that the map value came from either the first map or from the smallest map, depending on the size order of the input maps.

There was a bug in the old version. The intended behavior was for the
the map value to come from the *first* map in the list. However, the
actual behavior was that the map value came from either the first map or
from the *smallest* map, depending on the order of the input maps.

Alternative considered: do a more efficient algorithm. This alternative
wasn't chosen because it would be a lot of complexity, and the
improvement would only matter for large datasets, and would probably be
slower for small datasets. The most efficient algorithm I can think of
would be a hash join:

 - Find the intersection of the keys:
   - Sort the input maps in increasing order of size
   - Iterate through the input maps and keep only the keys that appear
     in every map
 - Iterate again through each map
   - For each key that's already known to be in every map:
     - If this key doesn't already appear in the output map:
       - Add this key's value to the output map

I didn't choose this option due to its complexity probably not being
worth it.
@drevell drevell requested a review from sethvargo July 2, 2024 21:55
@drevell drevell requested a review from a team as a code owner July 2, 2024 21:55
@drevell drevell marked this pull request as draft July 2, 2024 21:56
@drevell
Copy link
Contributor Author

drevell commented Jul 2, 2024

Converted to draft while I rethink this algorithm

@pdewilde
Copy link
Contributor

pdewilde commented Jul 2, 2024

Converted to draft while I rethink this algorithm

You could still have the smallest map optimization.

candidates = smallestMap
output = map[][]
for key in candidates:
  value = maps[0][key]
  for map in maps[1:] && value != null :
     if key not in map:
       value = null
   if value != null 
     output[key] = value
     

@sethvargo
Copy link
Contributor

What's the bug here?

@pdewilde
Copy link
Contributor

pdewilde commented Jul 2, 2024

What's the bug here?

From first revision of dave's description:

There was a bug in the old version. The intended behavior was for the the map value to come from the first map in the list. However, the actual behavior was that the map value came from either the first map or from the smallest map, depending on the size order of the input maps.

@drevell drevell requested a review from pdewilde July 2, 2024 22:39
@drevell drevell marked this pull request as ready for review July 2, 2024 22:39
@drevell
Copy link
Contributor Author

drevell commented Jul 2, 2024

What's the bug here?

From first revision of dave's description:

There was a bug in the old version. The intended behavior was for the the map value to come from the first map in the list. However, the actual behavior was that the map value came from either the first map or from the smallest map, depending on the size order of the input maps.

Right. You can cherry-pick just the test changes to reproduce the bug.

sets/maps.go Show resolved Hide resolved
@sethvargo
Copy link
Contributor

sethvargo commented Jul 2, 2024

#340

I'm pretty sure the fix can be much simpler at the cost of one more iteration. Given the number of maps we're intersecting is usually the lower bound, this seems negligible.

@drevell
Copy link
Contributor Author

drevell commented Jul 2, 2024

#340

I'm pretty sure the fix can be much simpler at the cost of one more iteration. Given the number of maps we're intersecting is usually the lower bound, this seems negligible.

From my point of view, the fix may be simpler in #340 (add a for loop), but the resulting algorithm is simpler in this PR.

@drevell drevell merged commit 45258f9 into main Jul 8, 2024
2 checks passed
@drevell drevell deleted the drevell/intersect-bug branch July 8, 2024 17:49
@token-minter-prod token-minter-prod bot mentioned this pull request Aug 6, 2024
token-minter-prod bot added a commit that referenced this pull request Aug 6, 2024
## What's Changed
* checkout config from default branch in load-workflow-variables action
by @gjonathanhong in #337
* Fix IntersectMapKeys first-wins bug by @drevell in
#339
* Fix docstring for logging method by @pjh in
#341

## New Contributors
* @pjh made their first contribution in
#341

**Full Changelog**:
v1.1.1...v1.1.2

Co-authored-by: token-minter-prod[bot] <125072751+token-minter-prod[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants