Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early Exit Finding First Item in Array That Meets Some Condition #5170

Open
davidmarne-wf opened this issue Sep 23, 2022 · 4 comments
Open

Comments

@davidmarne-wf
Copy link
Contributor

davidmarne-wf commented Sep 23, 2022

What is the underlying problem you're trying to solve?

There is no way to exit early when trying to find the first item in an array that meets some condition. As far as i can tell, to find the first item in an array that meets some condition, one must use a list comprehension to create a new array of all the items that meet the condition, then take the first.

Describe the ideal solution

rego adds a new keyword, first, that iterates arrays in order and exits early after finding the first value that satisfies a condition

items := [0, -2, 4, 6, 2]

# should return 4
first_positive_item = item if {
  first item in items
  item > 0
}

item > 0 should only be executed three times, not 5. Once where item is 0, once where item is -2, then finally once where item is 4 and the condition is met.

if item > 0 were replaced with an expensive rule/function call, or the items array was very long, this could lead to some big performance boosts over the "Good Enough" solution

first can only be used with arrays. non ordered collections, ie sets and maps would not work with first

Describe a "Good Enough" solution

Policy authors leverage the following technique to find the first item in an array that meets some condition. A performance penalty is paid since the item > 0 condition must be executed in every item in the array.

items := [0, -2, 4, 6, 2]

first_positive_item = first_item if {
  positive_items := [item | some item in items; item > 0]
  count(positive_items) > 0
  first_item := positive_items[0]
}

Would love to hear other suggestions here if anyone has any!

Additional Context

Slack thread discussing my teams specific use case in more detail

@srenatus
Copy link
Contributor

Thanks for taking the time to write this up!

There might be further ways to approach this. So for one, EE of first_positive_item in

first_positive_item = item if {
  some item in items
  item > 0
}

would not be an option since it requires constant values, and item is a variable.

Now if we define a multi-value rule (partial set), that would be

positive_items contains item if {
  some item in items
  item > 0
}

but if we use that, like

allow {
  some item in positive_items
  whatever(item)
}

allow can exit early (it's got only one constant value, true); but when topdown evaluates some item in positive_items it'll notice that item isn't bound, so it'll eval and cache all of positive_items.

If item was already bound, some item in positive_items would not evaluate them all. 👈 Could this be an alley worth exploring?


So -- this doesn't work, but I think maybe it could:

package p

import future.keywords.contains
import future.keywords.if
import future.keywords.in

items := {-1, 0, 1, 2}

first_positive_item contains item if {
	some item in items
	item > 0
	print(item)
}

allow if {
	some item in items
	first_positive_item[item] # (a)
}

Currently, evaluating (a) still suppresses the early exit, when it should be safe to do that. This is one of the EE improvement we had considered back when implementing EE In the first place.

As a side note, I'm afraid it's an implementation detail you shouldn't have to worry about, but using this

item in first_positive_item

instead of (a) always evaluates the collection (first_positive_item) in full, since it's de-sugared into a function call, internal.member2(first_positive_item, item).

@srenatus
Copy link
Contributor

srenatus commented Sep 23, 2022

👉 #4035 this is the issue I've meant there.

@srenatus
Copy link
Contributor

Now here's another thing that bugs me -- the last paragraph. You shouldn't have to know those things about in. Perhaps we could look into some compiler rewriting to turn in into the other form during compilation. The tricky bit is the return value, if set[a] fails because a is not a member of the set, it'll be undefined; internal.member2(set, a) will be false.

But it's not really urgent without the other feature -- If #4035 was done, it would come with benefits; right now, it doesn't matter much.

@stale
Copy link

stale bot commented Oct 23, 2022

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@stale stale bot added the inactive label Oct 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants