Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental changes for Crosshair support #4164

Merged
merged 2 commits into from
Nov 29, 2024

Conversation

Zac-HD
Copy link
Member

@Zac-HD Zac-HD commented Nov 10, 2024

cc @pschanely

  • following Various small fixes #4165, the Tracer no longer branches (in a way visible to Crosshair) on whether sys.settrace() is active
  • span_start() and span_end() methods to make recognising recursive strategies easier.
    • not yet tested, but that's why the PR is marked experimental.
  • experimental engine change which I think should let us raise Unsatisfiable after an alternative backend gives up.
    • I'm not sure of this though, and would appreciate a pointer to the affected tests so that we can get this behavior deliberately checked by our test suite.

@Zac-HD Zac-HD added the interop how to play nicely with other packages label Nov 10, 2024
@@ -1334,6 +1334,29 @@ def draw_bytes(
) -> bytes:
raise NotImplementedError

def span_start(self, label: int, /) -> None:
Copy link
Member

@tybug tybug Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big fan of the "span" terminology, over "examples".

In the future we should probably provide a more meaningful representation of the strategy than an integer label, so providers can do particular things for e.g. lists. Pass through the strategy instance wholesale, for examples with a corresponding strategy? I'd guess providers are unlikely to check anything beyond type(strategy) is ListStrategy, but could also check attributes like strategy.min_size in principle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(maybe we should rename the existing code to match? big diff but it'd make it easier to work with the internals...)

The PrimitiveProvider interface is explicitly unstable for now, so we can change the label type when we get around to that ourselves. Nonetheless IMO passing the strategy instance exposes too much accidental complexity and implementation detail; I do want to at least try evolving towards a clean well-abstracted interface.

Copy link
Member

@tybug tybug Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass the strategy type + relevant type-specific kwargs as a dict?

def span_start(self, label: int, strategy: type[SearchStrategy] | None, strategy_kwargs: dict[str, Any])

Agree we can just change it later though, not something we have to decide now. And the label will likely have to stay forever because not all spans are associated with a strategy (eg stateful rules).

-0 on a patch in the near future for selfish reasons; I have some local work towards excising all buffer usages, which is very large and would require conflict resolution. But I wouldn't stop such a pull.

@pschanely
Copy link
Contributor

  • experimental engine change which I think should let us raise Unsatisfiable after an alternative backend gives up.
    • I'm not sure of this though, and would appreciate a pointer to the affected tests so that we can get this behavior deliberately checked by our test suite.

Thanks for this! Yeah, this sort of change looks like the kind of thing that would help, but my example cases aren't yet passing. They are test_erroring_rewrite_unsatisfiable_filter, test_unsat_filtered_sampling, and test_unsat_filtered_sampling_in_rejection_stage. It's also possible that my diagnosis is just wrong. I'll doublecheck today.

@pschanely
Copy link
Contributor

  • experimental engine change which I think should let us raise Unsatisfiable after an alternative backend gives up.

    • I'm not sure of this though, and would appreciate a pointer to the affected tests so that we can get this behavior deliberately checked by our test suite.

Thanks for this! Yeah, this sort of change looks like the kind of thing that would help, but my example cases aren't yet passing. They are test_erroring_rewrite_unsatisfiable_filter, test_unsat_filtered_sampling, and test_unsat_filtered_sampling_in_rejection_stage. It's also possible that my diagnosis is just wrong. I'll doublecheck today.

Ok! What you have is almost enough; however I also need BackendCannotProceed exception cases to count as INVALID; something like this?:

diff --git a/hypothesis-python/src/hypothesis/internal/conjecture/engine.py b/hypothesis-python/src/hypothesis/internal/conjecture/engine.py
index d12ad4b5b..180ac3f81 100644
--- a/hypothesis-python/src/hypothesis/internal/conjecture/engine.py
+++ b/hypothesis-python/src/hypothesis/internal/conjecture/engine.py
@@ -444,6 +444,7 @@ class ConjectureRunner:
             interrupted = True
             raise
         except BackendCannotProceed as exc:
+            data.status = Status.INVALID
             if exc.scope in ("verified", "exhausted"):
                 self._switch_to_hypothesis_provider = True
                 if exc.scope == "verified":

This solution would imply that we're declaring that exhausted/verified should be raised outside of a normal execution. But I think we're ok with that?

@Zac-HD
Copy link
Member Author

Zac-HD commented Nov 18, 2024

Ok! What you have is almost enough; however I also need BackendCannotProceed exception cases to count as INVALID

Hmm. It's not really Status.INVALID either; that would imply that we've had a filter/assume predicate reject this input - instead I've tried using the interrupted = True code path that we have for KeyboardInterrupt to indicate that we should just throw out that execution entirely; that should work but I'm honestly not sure whether it does.

@pschanely
Copy link
Contributor

Ok! What you have is almost enough; however I also need BackendCannotProceed exception cases to count as INVALID

Hmm. It's not really Status.INVALID either; that would imply that we've had a filter/assume predicate reject this input - instead I've tried using the interrupted = True code path that we have for KeyboardInterrupt to indicate that we should just throw out that execution entirely; that should work but I'm honestly not sure whether it does.

Sounds good. I will franken-merge the things together, apply electricity, and report back.

@pschanely
Copy link
Contributor

Hmm. It's not really Status.INVALID either; that would imply that we've had a filter/assume predicate reject this input - instead I've tried using the interrupted = True code path that we have for KeyboardInterrupt to indicate that we should just throw out that execution entirely; that should work but I'm honestly not sure whether it does.

Makes sense. This tweak is not solving the issue for me right now however. Setting interrupted = True indeed skips some logic, but because we aren't actually raising an exception, much of ConjectureRunner.test_function still runs; critically the line that increments self.valid_examples. Adding an explicit return statement at the bottom of the except BackendCannotProceed block seems to make it work, but I have very low confidence in my ability to suggest the correct fix in this situation!

@Zac-HD
Copy link
Member Author

Zac-HD commented Nov 28, 2024

Looks like the early return was indeed the key; with some minor tweaks I've confirmed that this fixes the can't-raise-Unsatisfiable problem we've had 🎉

Once you confirm, I'll merge!

@pschanely
Copy link
Contributor

Looks like the early return was indeed the key; with some minor tweaks I've confirmed that this fixes the can't-raise-Unsatisfiable problem we've had 🎉

Once you confirm, I'll merge!

Yes! These work for me too!

@Zac-HD Zac-HD marked this pull request as ready for review November 29, 2024 22:57
@Zac-HD Zac-HD merged commit 1e91394 into HypothesisWorks:master Nov 29, 2024
49 checks passed
@Zac-HD Zac-HD deleted the report-spans branch November 29, 2024 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interop how to play nicely with other packages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants