Show some output validator feedback #140
If the output validator put something in feedbackdir/judgemessage.txt or feedbackdir/teammessage.txt, then show (a summary of) that information, preferring the former. The summary is just a truncation of the txt-file, except if it is recognisable as the output from the standard output validator, in which case it is summarised more carefully.
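A minimal sketch of the truncation described above, not the code in this PR. The function name `feedback_digest` and the 40-character default width are assumptions made for illustration, and the special handling of the standard output validator's messages is left out.

```python
from pathlib import Path
from typing import Optional


def feedback_digest(feedbackdir: Path, max_len: int = 40) -> Optional[str]:
    """Return a one-line digest of the validator feedback, or None if there is none.

    Hypothetical helper: the name and the default width are illustrative only.
    """
    # The judge message is preferred over the team message.
    for filename in ("judgemessage.txt", "teammessage.txt"):
        path = feedbackdir / filename
        if not path.is_file():
            continue
        # Collapse all whitespace so the digest fits on a single line.
        text = " ".join(path.read_text(errors="replace").split())
        if not text:
            # An empty file carries no information; try the next candidate.
            continue
        if len(text) > max_len:
            text = text[: max_len - 3] + "..."
        return text
    return None
```

Returning None rather than an empty string lets the caller decide whether to print the square brackets at all.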
Sorry for slow response; various other duties have demanded attention recently. If I understand correctly, the purpose of this is to show a digest of the validator feedback even for submissions which get their expected judgement? Can you post a screenshot of what it is supposed to look like? I'm not necessarily against adding this, but am slightly hesitant because if anything I think that the submission summary is already a bit too verbose. But I'm thinking maybe there should be a command-line flag controlling whether or not this is shown? |
Here’s the intended output from an interactive problem, askmarilyn, which is basically the Monty Hall problem. The judge (or team) messages are shown in square brackets. The interesting part is the WA section, where you can see the (wrong) submissions implementing various broken strategies or protocols and triggering various judge/team messages. For instance, the WA submission “random door”, which just picks doors at random, receives 346 cars/drinks (out of 1000), which is too few. (More than 600 are needed.) This helps increase my confidence during problem development that the WA submissions achieve full code coverage in the output validator, and that the team messages make sense.
Here’s the output on a simpler, non-interactive problem. The judge message from the standard output validator is shown, in a truncated but still useful way, in square brackets.
All of these examples show that |
While we’re here, the way verification summaries are shown on https://github.com/RagnarGrootKoerkamp/BAPCtools (colours, alignment) is a big improvement for readability. I’m happy to submit a PR with a similar result. (But I’m a big fan of not interrupting workflows and UI conventions of experienced users. It’s highly plausible that I’m overlooking things, and I would prefer to have a lot more experience with actually using verifyproblem. I’m very much a newcomer.)
As for the command line flag, the cleanest way is an option to verifyproblem that takes a format string, and a sensible default. This avoids overloading the options to verifyproblem. git’s |
That would be great! I've been meaning to do things like that for a long time but never get around to it. I know also @RagnarGrootKoerkamp has been wanting to merge those features of BAPCtools into problemtools.
I agree, was also thinking that a format string is the way to go. Something along the lines of git's pretty sounds perfectly reasonable to me.
I think these conventions you refer to are more localized than you think and I don't really like this part. But of course this becomes near-moot with the format string, then everyone can use whatever abbreviations they want. |
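To make the format-string idea concrete, here is a rough sketch of how such an option could be wired up. Everything in it is hypothetical: the flag name `--submission-format`, the default string, and the helper `render` are invented for illustration and are not existing verifyproblem options.

```python
import argparse

# Hypothetical default; the field names are placeholders, not an agreed notation.
DEFAULT_FORMAT = "{name}: {verdict} [{max_time:.2f}s]"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single, made-up formatting option."""
    parser = argparse.ArgumentParser(prog="verifyproblem-sketch")
    parser.add_argument(
        "--submission-format",
        default=DEFAULT_FORMAT,
        help="format string used for each submission result line",
    )
    return parser


def render(fmt: str, **fields) -> str:
    """Fill the format string; fields that the string does not mention are ignored."""
    return fmt.format(**fields)


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(render(args.submission_format,
                 name="binarysearch.py", verdict="AC", max_time=0.455411))
```

With a sensible default, existing users see no change, while anyone who prefers other abbreviations or a different layout can pass their own string.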
OK, I think I know what to do. The cleanest way would be to implement Once that is done, (some of) the formatting strings can become command-line options to Once that is done, I can add “my” team and judge message digests to the formatting mini language. I suggest we close this PR and I task myself with (some version) of the above plan. |
Note to self (or others who understand the code better): the fields |
Sounds good. Would probably be good to settle on the details of the formatting notation. It may be more Pythonesque to use |
I’m doing that right now. Here’s a teaser from my current code:
But I have to play around with this a bit and field-test it before I make a concrete suggestion. Any comment on Using |
After thinking a bit more: Would you object to either of
Hmm I think I would actually prefer the second option, since there are other things (validators) that are |
Where does the new |
1st proposal for syntax of formatting strings: There are (at least) two formatting strings,
Now for the problem. The format for results is much, much harder to specify, because not all fields are present in all cases. It would make sense to specify the two standard cases (AC and anything else with reason). Even then there would be whitespace issues. Check out these, just to get the ball rolling: This would reproduce the current behaviour:
And this would be BAPC-tools:
And here is a suggestion with judge or team messages and pythonesque truncation
These examples show some of the fields I have currently in play. But what about AC submissions with And what about non-AC submissions? Do we really need REJECTED_RESULT_FORMAT_WITH_REASON and REJECTED_RESULT_WITHOUT_REASON_BUT_POSITIVE_RUNTIME ? |
Here’s a concrete suggestion for which keywords to recognise. Expected usage is a format string like this:
'''Return submission result in the specified format.

The following keywords are recognised in the format string:

    keyword             type    examples
    verdict *           str     AC
    score               int     12
    verdict_score *     str     AC, AC(12)
    expected_verdict *  str     TLE
    name *              str     binarysearch.py
    directory *         str     accepted
    language *          str     Python 3
    max_time            float   .455411
    total_time          float   10.125315
    max_time_tc         str     group1/021-n-100000.in
    reason              str     validator produced "score.txt" ...
    reason_tc           str     group2/03-overflow.in
    judge_message       str     l 43: int expected
    team_message        str     Congratulation! 534 drinks

The starred keywords are never None; the others may be None.
If a numerical keyword (score, max_time, total_time) is None,
it degrades gracefully to minuses, but still respects
formatting directives. For instance, {score:03d} will
result in '012' if score is 12, and '---' if score is None.
'''
Related: the current version of
That’s way easier to implement, of course. Still, something is fishy in that part of the code, and I don’t quite understand the expected semantics of running times being -1, None, or something else. |
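The "degrade gracefully to minuses" behaviour for missing numerical fields can be sketched with a small stand-in object whose `__format__` reads the width from the format spec. This is only one possible approach; the names `Dashes`, `NUMERIC_KEYS`, and `format_result` are invented here. It covers only the three numerical keywords, so the question of how None-valued string fields should be printed stays open.

```python
import re

# Leading part of Python's format spec: [[fill]align][sign][z][#][0][width]
_SPEC_PREFIX = re.compile(r"(?:.?[<>=^])?[-+ ]?z?#?0?(?P<width>\d+)?")

NUMERIC_KEYS = ("score", "max_time", "total_time")


class Dashes:
    """Stand-in for a missing number: formats as a run of minus signs."""

    def __format__(self, spec: str) -> str:
        match = _SPEC_PREFIX.match(spec)
        # Without an explicit width, fall back to a single minus sign.
        width = int(match.group("width") or 1)
        return "-" * max(width, 1)


def format_result(fmt: str, **fields) -> str:
    """Format the fields, replacing None in numerical fields by Dashes()."""
    safe = dict(fields)
    for key in NUMERIC_KEYS:
        if key in safe and safe[key] is None:
            safe[key] = Dashes()
    return fmt.format(**safe)


if __name__ == "__main__":
    print(format_result("{score:03d}", score=12))
    print(format_result("{score:03d}", score=None))
```

Because the width is taken from the spec, columns stay aligned whether or not a value is present.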
@thorehusfeldt Can you rebase and clean up this PR as a preparation to get it merged? |
I’d be very happy to take this up again. My own conclusion is that there are two subtasks with well-defined but different goals:
1 precedes 2, and solving 1 well trivialises 2. So I guess the best way to proceed is to close the present pull request and task myself with solving 1 (resulting in a new pull request). |
@thorehusfeldt Sounds like a good plan. |
@pehrsoderman : I would need an answer to the question from 12 Nov 2019. The conversation with @austrin points to the solution of introducing Would it be useful to have that as a first subgoal? The counterargument is that this would lead to quite a lot of changes to the code. (Thumbs-up is enough.) |
@pehrsoderman : I have now thought through one way of making Before continuing along this avenue I’d like you (or anybody else) to comment on that suggestion. (A next step would then be to separate |