Parser tests without error cases #102
Comments
Testing of invalid HTML is one of the reasons why this test suite exists in the first place. However, we are currently working on standardisation of parse errors, both in the spec and in the tests. The tokenisation part has already landed, and we have just started work on the tree construction stage. So eventually you will be able to get a subset of the test suite for valid HTML just by filtering out tests that have parse errors specified.
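To illustrate the filtering idea above, here is a minimal sketch. It assumes the tokenizer-test JSON shape used by the suite (a top-level `"tests"` array, with an `"errors"` list on invalid inputs once error codes are standardised); the sample test content itself is made up for illustration.

```python
import json

# Inline sample in the tokenizer test format; the inputs are invented.
SAMPLE = json.loads("""
{
  "tests": [
    {"description": "valid text", "input": "hello",
     "output": [["Character", "hello"]]},
    {"description": "eof in tag", "input": "<a", "output": [],
     "errors": [{"code": "eof-in-tag", "line": 1, "col": 3}]}
  ]
}
""")

def valid_only(suite):
    """Keep only tests that specify no parse errors."""
    return [t for t in suite["tests"] if not t.get("errors")]

print([t["description"] for t in valid_only(SAMPLE)])
```

An implementation that rejects invalid input could run only the tests this filter returns.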
I certainly want parse errors to be specified and in fact it's critical to me that I report the correct parse errors in my offline compiler.
Today, many tests use parse errors gratuitously. For example, it's hard (maybe impossible, but I can't tell) to find a test for the (valid) case of inserting a… Here's an example test:
This test is really testing a very simple, valid thing, but because of the way it happens to be written, it ends up producing a lot of errors that I would somehow have to know to ignore. It's not obvious to me how to ignore these errors reliably. Maybe there's a good trick? Also, is the idea of the standardization to make tests like this one avoid gratuitously invalid content? That's more or less what I'm asking for.
OK, I'm a bit confused now. What's the problem with errors if your parser can recover from them like any spec-compatible parser does? Or am I missing something?
Or do you just want more test cases to be added for valid markup? If so, that's up to you; we are happy to accept PRs.
According to the spec:
We abort, in conformance with the spec, but the tests don't have a mode for testing this conformant scenario.
Thank you for the clarification. The only thing we can do at the moment is add more tests for valid scenarios. Any contribution is welcome.
@inikulin would you take patches that add the missing doctypes and closing tags to existing tests that gratuitously omit them?
@inikulin another possible refactoring would be to differentiate between real errors and errors that arise only as a result of early termination of the HTML document.
We would like to keep the existing tests as they are, since they are widely used by implementations that gracefully handle these errors (i.e. browsers). But we are open to accepting new tests even if they are just valid versions of old ones; you can place them in a separate file, e.g.
@inikulin would you be open to adding more metadata to existing tests, like indicating whether an error is due to early EOF? Can you think of a way to do that that wouldn't break existing consumers? |
Well, parse errors are exactly that metadata. It is well documented which tokenisation errors occur due to unexpected EOF.
This is not yet implemented for the tree construction stage, though.
@inikulin what I'm saying is that from my perspective there are "errors I am specifically trying to assert occur" and "errors that occur as a side effect of the way we wrote the test". Errors that occur due to unexpected EOF can be treated specially in my tests, because I have the partially constructed tree to compare against. Maybe all I really need (and maybe this is already true) is that all errors that occur due to unexpected EOF end in something like
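The special treatment described above could be sketched as follows. The assumption here is that, as in the spec's parse-error list, EOF-related tokenisation error codes all begin with `eof-` (e.g. `eof-in-tag`, `eof-before-tag-name`), so a bail-out implementation can classify them by prefix:

```python
# Real error codes from the spec's parse-error list, used as a sanity check.
SPEC_EOF_CODES = {
    "eof-before-tag-name", "eof-in-tag", "eof-in-comment",
    "eof-in-doctype", "eof-in-cdata",
}

def is_eof_error(code: str) -> bool:
    """Treat any "eof-"-prefixed code as an early-termination error
    that a bail-out parser may handle specially (assumption: the
    naming convention holds for all such codes)."""
    return code.startswith("eof-")

assert all(is_eof_error(code) for code in SPEC_EOF_CODES)
```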
It's very unlikely that new errors will be added in the near future, and if they are, it will be a significant change to the HTML syntax itself, so you would need to update your parser anyway. Meanwhile, you can find the standard parse errors and their descriptions in the spec: https://html.spec.whatwg.org/multipage/parsing.html#parse-errors
(Apart from the tree construction stage errors, which are not in the spec yet.)
My vision is that if any implementation requires some subset of the tests based on some assertions, it should create that subset on its side, and we provide all the required prerequisites for it, such as standard and well-documented error codes. Or, if it's missing some test cases, we are open to accepting new tests. I don't see any reason why ad hoc solutions should get into this test suite.
Thinking about it a bit more, we could add support for implementations that bail out on parse errors. We can achieve this by inserting a marker into the expected markup that shows the position of the first parse error. During testing it would then be possible to skip comparison of the AST after this marker. However, it might not work in all cases, such as when the error recovery mechanism modifies AST that was already parsed before the error point, e.g. in the case of foster parenting. But we can mark those tests somehow as well, so they can be skipped completely. \cc @gsnedders
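The marker proposal above might look roughly like this. Everything here is hypothetical: the marker string `#err` and the serialized-tree strings are invented for illustration, not part of the suite's format.

```python
MARKER = "#err"  # hypothetical marker at the position of the first parse error

def trees_match(expected: str, actual: str) -> bool:
    """Compare serialized trees, ignoring everything after the marker,
    so a bail-out implementation only needs to match the prefix."""
    if MARKER in expected:
        prefix = expected.split(MARKER, 1)[0]
        return actual.startswith(prefix)
    return expected == actual

# A bail-out parser stops at the error point, so only the prefix must match:
assert trees_match("<html><body><p>ok#err<table>", "<html><body><p>ok")
assert not trees_match("<html><body>", "<html><head>")
```

As the comment notes, this prefix comparison breaks down when recovery (e.g. foster parenting) rewrites nodes parsed before the error point, which is why such tests would need to be flagged for skipping instead.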
@inikulin if there's a strategy that you're happy with, I'm willing to do some work to update the tests in that way. One issue with the current tests is that a lot of the "first parse error" cases are "missing doctype". I've been thinking more about the issue of keeping the existing suite working for existing impls. One approach might be to accept some improvements to the metadata description, but write a script that emits the original format. We could then confirm zero diffs in the original format, while I enrich the test definitions with more information.
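A minimal sketch of that round-trip idea: keep an enriched test description and emit the original tree-construction `.dat` format (its `#data` / `#errors` / `#document` sections) from it, so existing consumers can verify zero diffs. The enriched field names (`data`, `errors`, `eof`, `document`) are invented here, not an agreed schema.

```python
def emit_dat(test: dict) -> str:
    """Flatten a hypothetical enriched test record back into the
    original .dat section layout used by the tree-construction tests."""
    lines = ["#data", test["data"], "#errors"]
    lines += [e["message"] for e in test["errors"]]  # drop the extra metadata
    lines += ["#document"] + test["document"]
    return "\n".join(lines) + "\n"

enriched = {
    "data": "<p>hi",
    "errors": [{"message": "(1,0): expected-doctype-but-got-start-tag",
                "eof": False}],  # extra flag an enriched format might carry
    "document": ["| <html>", "|   <head>", "|   <body>",
                 "|     <p>", '|       "hi"'],
}
print(emit_dat(enriched))
```

Running the emitter over every enriched record and diffing against the checked-in `.dat` files would confirm the enrichment is purely additive.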
@wycats Sorry, I somehow missed your last reply. In my opinion, for now the best solution is one that doesn't require any change to the existing test format. So I vote for adding new tests in a separate file (they can be modified versions of existing tests).
We are attempting to implement a parser for a large subset of valid HTML in Glimmer (including SVG and foreign elements), but unfortunately we cannot use these tests.
The reason is that the tests make liberal use of invalid HTML (missing doctypes, no closing tags, etc.) for convenience, but since Glimmer is an offline build tool, we report real errors in these cases.
Right now, we make a best-effort attempt at conformance (we fix bugs when they're reported) but I just noticed a bug in our table handling that makes me wish we could just use an exhaustive suite directly.
Is there any interest in avoiding gratuitous use of invalid HTML in the tests, or is it too late for that?