-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PLT-936: Add Arbitrary Data
instance
#4922
Conversation
Arbitrary Data
instanceArbitrary Data
instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you show what kind of Data
objects this generator produces? Via something like
replicateM 10 . generate . resize 80 $ arbitrary @Data
Sure, here it is:
|
@@ -13,7 +14,8 @@ import Test.Tasty.QuickCheck | |||
-- The default for the argument is @1@. | |||
generators :: Int -> TestNested | |||
generators factor = return $ testGroup "generators" | |||
[ testProperty "prop_genKindCorrect" $ withMaxSuccess (factor*100000) (prop_genKindCorrect False) | |||
[ testProperty "prop_genData" $ withMaxSuccess (factor*10000) prop_genData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plutus-ir/test
is a weird place for this, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Particularly since the Data
generator comes from PlutusCore.Generators.QuickCheck.Builtin
.
You mean that the existing list generators don't let you provide a lower bound? Maybe that's so it doesn't interfere with shrinking. |
I don't know if you're seen this, but there's a QuickCheck generator for |
It requires using |
I think there's also another generator for data somewhere else... ah yes: https://github.com/input-output-hk/plutus/blob/master/plutus-tx/test/Spec.hs#L133 |
Yeah, there are two hedgehog generators for Data, which should be merged into one. |
@kwxm Yeah, I saw that. Is it necessary to bound the |
I think the main problem is that the size function for A better approach might be to fix the size measure and even have multiple measures of size for the same object (eg, number of nodes, total size of integer fields, total size of bytestring fields). There are some tickets to look into that. |
I took a closer look at Anyway, I think it makes sense for the budgeting benchmark to have its own |
There's a lot to discuss here... The most obvious thing is that distribution of integers and bytestrings is horribly skewed. It's especially clear in this example:
This is because you start with a big size and generate big constants thereof, but as the size gets smaller, the constants get smaller and smaller too. We don't want the size of constants to be tied to the size of the AST. There's a Note about that in the brand new QuviQ's generators:
As such, you virtually never generate an "outer" Which brings us to the next point: we do want to generate a lot of meaningful values in an And I don't mean to suggest that we only want to generate meaningful values: it is as important to run tests over diverse "meaningless" values as it is to run them over meaningful ones. And meaningful and meaningless are not the only kinds of possible values! Perhaps there's something in the middle: e.g. a Perhaps we don't want to go this far for an initial implementation at least, but we definitely should have a non-negligible amount of Another issue is that this generator is exponential. Here are the line lengths (not a very scientific metric, but will do for the purpose of the argument) of the output you pasted:
The size of values grows extremely fast. 30% of cases are trivial ones and then we immediately jump to thousands. What this means is that we extremely rarely generate, say, lists with multiple values, all of which are not huge. You can of course say that with the size And the reason why those are not of hundreds of thousands of size is lists being always of length We have a generator in the distribution
As you can see we get diverse and reasonably distributed types despite having pairs which potentially could lead to exponentiality and weird distribution (handling lists rather than pairs is gonna be a bit more complicated I think, but probably not much). It is absolutely fine to get some weird distribution for a part of the generator, but the meaningful values should be generated with more or less reasonable distribution. So overall, I think we can merge this PR, but if we care about having quality testing infrastructure, we'll have to devote more time to writing complex generators, up to writing tests for the generators. |
@effectfully Thanks, these are very good to know. Apparently writing a good generator is harder than I thought! |
I've got through multiple iterations of this realization already. Not feeling confident still.
Could you should some similar output? It's really hard to predict what ends up being generated. |
I agree, but let me know if you have any good ideas! I see that we have a separate issue (SCP-3653) for doing that for costing. I'll update the title of that to make it clear that it's for costing. |
We should remember to do PLT-118 so that we get things like |
@effectfully 5f08336 contains a new generator that is an improved version of the Hedgehog generator, in that the longer the list, the smaller the elements. Here are 20 random Data values, generated via
It looks reasonable to me but let me know what you think. |
@zliu41 using the stupid metric of the rendered string's length I get multiple numbers higher than 100000 symbols:
so there's still some exponentiality going on, but now it's bound by the constant depth rather than the dynamic size, which is good, because we're not going to generate an obscenely huge As for the exponentiality, depth-based generation is inherently exponential if BTW, one thing I noticed is that you decrease the depth for each recursive generator twice. Once in, say, This generator is of course completely useless for testing any kind of business logic. There's virtually zero chance that a validator will not fail on a Overall, I like the new version more, but it has a lot of the same problems as the previous one, plus there's no way of dynamically controlling the size of the AST of the generated I'm still completely fine with us merging it as is and coming back to it later. It's not wrong, just skewed. And if we want to make a more balanced generator that does not blow up and we get both deep and wide |
Again, great points. I like the |
@effectfully updated - using Here's what I got: Default size:
Resize 80:
This one should be better than the previous two attempts, since we can control the AST size, and the constant sizes are not affected by AST sizes (there's no |
This probably contributed to the exponential behavior, at least in part: [ (1, genList 0 5 (gen (depth - 1)))
, (1, genList 0 50 (gen (depth - 2)))
, (1, genList 0 500 (gen (depth - 3)))
] Changing it to this seems to help: [ (100, genList 0 5 (gen depth))
, (10, genList 0 50 (gen (depth `div` 2)))
, (1, genList 0 500 (gen (depth `div` 4)))
] Of course, to get the optimal numbers we'd need to do some more rigorous analysis. |
* Add `Arbitrary Data` instance * New generator * Update `genData` include using ``depth `div` 2``
This is the first time I use QuickCheck, and I find it surprising that it doesn't have something equivalent to
Hedgehog.Gen.list
.