decision log should include values from non-deterministic built-in functions like time.now_ns #1514

tsandall · 2019-06-21T15:58:25Z

Decision log events should include all of the information necessary to re-execute policy evaluation. Today if policies depend on time.now_ns() or other (custom) non-deterministic built-in functions, it is not easy to accurately re-execute policy evaluation. We should extend the decision log event structure to include additional values supplied to or computed during policy evaluation that are not contained in the input, policy, or contextual data (e.g., time.now_ns() result, opa.runtime() result, http.send() result, etc.)

philipaconrad · 2022-07-18T23:44:51Z

Built-in Functions with non-deterministic outputs

rand.intn (relies on RNG)
~~io.jwt.encode_sign, io.jwt.encode_sign_raw (relies on RNG internally)~~
~~io.jwt.decode_verify (relies on time internally)~~
time.now_ns (relies on time directly)
http.send (unpredictable HTTP request results)
net.lookup_ip_addr (unpredictable DNS resolution results)
opa.runtime (unpredictable config/environment-dependent results)
uuid.rfc4122 (relies on RNG)

Based on a cursory look through the built-ins list, these are the functions I found that have obvious non-deterministic behavior. The rest should be deterministic.

srenatus · 2022-07-19T07:29:53Z

FWIW the RNG source should come from the BuiltinContext. We've had to bend io.jwt.encode_sign and _raw to use it, though: e732b0b. But controlling that alone doesn't give us completely reproducible signatures, since there's an uncontrollable-random step that will either discard or use the first byte. (See this comment) The code path for this is Sign -> randutil.MaybeReadByte

anderseknert · 2022-07-19T08:35:57Z

Interesting. I guess it makes sense for consistency, but do users really want reproducible token signing? Would that not entail having their private keys logged with each decision? 🤔

srenatus · 2022-07-19T09:02:18Z

Hah, good point. I guess it would make sense to not worry about the io.jwt.encode_sign[_raw] builtins too much.

But thinking about this again, we also can't ever control http.send and net.lookup_ip_addr's inputs (those being outside servers) -- so, reading the top issue again, the proper way to go here might be to record the outcomes, and provide ways to stub those out for decision log replay. For that, the RNG doesn't matter as much -- we need to know what the result was -- and then io.jwt.encode_sign[_raw] aren't so different from the other builtins @philipaconrad has collected there.

anderseknert · 2022-07-19T12:02:15Z

Agreed, persisting the output of those rather than the input seems like a better approach 👍

philipaconrad · 2022-07-21T16:57:05Z

I'm considering a simple K/V cache for implementing aspects of this. Do we want a max size restriction on the cache? I'd assume that if someone enables recording non-deterministic builtin results that memory consumption isn't exactly a great concern, since you wouldn't turn it on unless you wanted to get a potentially large pile of results. 🤔

philipaconrad · 2022-07-21T16:59:09Z

Also, because this feature will potentially result in massive additional memory consumption (and equally massive decision logs in some cases!), I am planning to gate this feature of decision logging behind a config file option, like decision.record_non_deterministic_builtin_results = true/false, for example.

srenatus · 2022-07-21T17:08:38Z

Would it be that much, though? One time ns integer if time us used, one string for each encode_sign call... I suppose it would go into the decision logs entry, so we don't need to hold onto it anywhere else. 🤔

IIRC decision logs are flushed when their buffer is full, and that would then occur more often...?

srenatus · 2022-07-21T17:09:14Z

Hmm http.send is different, though. Those response payloads can be very large.

philipaconrad · 2022-07-21T18:18:18Z

Hmm http.send is different, though. Those response payloads can be very large.

Right, I'm thinking of the cases where a user hits several unique webpages.

IIRC decision logs are flushed when their buffer is full, and that would then occur more often...?

This would definitely increase how often those buffers flush, to such a degree that I think it's worth feature-gating behind a config option, with the default being to leave it turned off. 😰

philipaconrad · 2022-07-21T18:23:23Z

My current plan (I'm copying this over from the design doc) is to only cache the first result from a non-deterministic builtin. That at least bounds how much memory we're burning to cache stuff. The main builtins that might be affected by this behavior are probably http.send, net.lookup_ip_addr, and possibly opa.runtime(), since those all could potentially have non-idempotent results.

I think this is a reasonable, scope-limiting behavior decision because if a single webpage gives back different results each time it's queried with http.send, there's no way to replicate that in a decision replay down the road unless you're recording every result from http.send, and then we're basically just doing selective tracing with more steps, and will still have to account for cases where the user queries that webpage one more time than we have recorded results for. 🙁

srenatus · 2022-07-21T19:52:14Z

because if a single webpage gives back different results each time it's queried with http.send,

I don't think we'd notice 🙃 We shouldn't do two requests for

p {
    obj := {"method": "GET", "url": "https://example.com"}
    http.send(obj)
    http.send(obj)
}

so intra-query caching should make us not send two requests here (even if it was different rules etc; same input to http.send implies only one result, only one request sent. -- here's the code, I think

Same for net.lookup_ip_addr, see this.

philipaconrad · 2022-07-22T19:31:56Z

Well, that's perfect then! Since we already have caching in place for these things, it's mostly going to be an exercise in plumbing work to get those cached values up to where they need to go for decision logging.

This commit includes evaluator support for an opt-in, non-deterministic builtins caching system, designed to help with future replay of decision logs. The cache allows early-exit in the evaluator if the builtin is non-deterministic, and has already cached a result. Since the cache can be pre-populated by `rego` module users, this should make offline policy testing and future work around decision replay more straightforward. Fixes: open-policy-agent#1514 Signed-off-by: Philip Conrad <[email protected]>

This commit includes evaluator support for an opt-in, non-deterministic builtins caching system, designed to help with future replay of decision logs. The cache allows early-exit in the evaluator if the builtin is non-deterministic, and has already cached a result. Since the cache can be pre-populated by `rego` module users, this should make offline policy testing and future work around decision replay more straightforward. Fixes: #1514 Signed-off-by: Philip Conrad <[email protected]>

srenatus · 2022-09-01T18:31:33Z

Is this fixed already? I thought the merged PR was the first step....

philipaconrad · 2022-09-01T18:46:08Z

Ah, nuts. I accidentally left in the Fixes #1514 in the PR, because when it was first written, the plan had been to go all the way to decision logging integration. Yeah, let's reopen this one.

philipaconrad · 2022-09-01T18:47:32Z

We'll put this zombie issue back to rest when decision logging integrates the non-deterministic builtin cache work from #4926. 🧟

This commit integrates the ND builtins caching system into decision logging, both in the server and sdk packages. Some reworking of the NDBCache's serialization format were required to accommodate this. The feature is disabled by default, and must be opted into by user configuration. The feature can be enabled via a top-level config key: nd_builtin_cache=true The NDBCache is exposed to the masking system under the `/nd_builtin_cache` path, which allows masking or dropping sensitive values from decision logs selectively. Note: If a decision log event exceeds the `upload_size_limit_bytes` value for the OPA instance, OPA will reattempt uploading it, after dropping the NDBCache from the event. This behavior will trigger a log error, and will increment the `decision_logs_nd_builtin_cache_dropped` metrics counter. Fixes: open-policy-agent#1514 Signed-off-by: Philip Conrad <[email protected]>

This commit integrates the non-deterministic builtins caching system into decision logging, both in the server and sdk packages. Some reworking of the NDBCache's serialization format were required to accommodate this. The feature is disabled by default, and must be opted into by user configuration. The feature can be enabled via a top-level config key: nd_builtin_cache=true The NDBCache is exposed to the masking system under the `/nd_builtin_cache` path, which allows masking or dropping sensitive values from decision logs selectively. Note: If a decision log event exceeds the `upload_size_limit_bytes` value for the OPA instance, OPA will reattempt uploading it, after dropping the NDBCache from the event. This behavior will trigger a log error, and will increment the `decision_logs_nd_builtin_cache_dropped` metrics counter. Fixes: #1514 Signed-off-by: Philip Conrad <[email protected]>

…olicy-agent#5147) This commit integrates the non-deterministic builtins caching system into decision logging, both in the server and sdk packages. Some reworking of the NDBCache's serialization format were required to accommodate this. The feature is disabled by default, and must be opted into by user configuration. The feature can be enabled via a top-level config key: nd_builtin_cache=true The NDBCache is exposed to the masking system under the `/nd_builtin_cache` path, which allows masking or dropping sensitive values from decision logs selectively. Note: If a decision log event exceeds the `upload_size_limit_bytes` value for the OPA instance, OPA will reattempt uploading it, after dropping the NDBCache from the event. This behavior will trigger a log error, and will increment the `decision_logs_nd_builtin_cache_dropped` metrics counter. Fixes: open-policy-agent#1514 Signed-off-by: Philip Conrad <[email protected]> Signed-off-by: Byron Lagrone <[email protected]>

tsandall added the design label Jun 21, 2019

tsandall mentioned this issue Mar 30, 2021

Add print function #3319

Closed

tsandall added the auditing label Dec 2, 2021

anderseknert mentioned this issue May 19, 2022

Add support to use print with a log level of choice. #4670

Closed

philipaconrad mentioned this issue Aug 5, 2022

Support Caching Output of Non-Deterministic Builtins #4926

Merged

3 tasks

philipaconrad closed this as completed in #4926 Sep 1, 2022

philipaconrad reopened this Sep 1, 2022

philipaconrad mentioned this issue Sep 16, 2022

server+plugins: Integrate server + NDBCache in decision logs. #5147

Merged

7 tasks

philipaconrad closed this as completed in #5147 Oct 6, 2022

ashutosh-narkar added this to Open Policy Agent Aug 5, 2024

ashutosh-narkar moved this to Done in Open Policy Agent Aug 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

decision log should include values from non-deterministic built-in functions like time.now_ns #1514

decision log should include values from non-deterministic built-in functions like time.now_ns #1514

tsandall commented Jun 21, 2019

philipaconrad commented Jul 18, 2022 •

edited

Loading

srenatus commented Jul 19, 2022

anderseknert commented Jul 19, 2022

srenatus commented Jul 19, 2022

anderseknert commented Jul 19, 2022

philipaconrad commented Jul 21, 2022

philipaconrad commented Jul 21, 2022

srenatus commented Jul 21, 2022

srenatus commented Jul 21, 2022

philipaconrad commented Jul 21, 2022

philipaconrad commented Jul 21, 2022 •

edited

Loading

srenatus commented Jul 21, 2022

philipaconrad commented Jul 22, 2022

srenatus commented Sep 1, 2022

philipaconrad commented Sep 1, 2022

philipaconrad commented Sep 1, 2022

decision log should include values from non-deterministic built-in functions like time.now_ns #1514

decision log should include values from non-deterministic built-in functions like time.now_ns #1514

Comments

tsandall commented Jun 21, 2019

philipaconrad commented Jul 18, 2022 • edited Loading

Built-in Functions with non-deterministic outputs

srenatus commented Jul 19, 2022

anderseknert commented Jul 19, 2022

srenatus commented Jul 19, 2022

anderseknert commented Jul 19, 2022

philipaconrad commented Jul 21, 2022

philipaconrad commented Jul 21, 2022

srenatus commented Jul 21, 2022

srenatus commented Jul 21, 2022

philipaconrad commented Jul 21, 2022

philipaconrad commented Jul 21, 2022 • edited Loading

srenatus commented Jul 21, 2022

philipaconrad commented Jul 22, 2022

srenatus commented Sep 1, 2022

philipaconrad commented Sep 1, 2022

philipaconrad commented Sep 1, 2022

philipaconrad commented Jul 18, 2022 •

edited

Loading

philipaconrad commented Jul 21, 2022 •

edited

Loading