Parser agnostic i18n Locale transform #12238
Allows us to stop relying on strange hacks that are RST-dependent. There is still an issue with the warning about a missing literal block.
So we trim the literal suffix to avoid warnings and add it back at the end.
sphinx/parsers.py
self.statemachine = states.RSTStateMachine(
    state_classes=self.state_classes,
    initial_state='Text',
    debug=document.reporter.debug_flag,
)

inputlines = StringList([inputstring], document.current_source)

self.decorate(inputlines)
self.statemachine.run(inputlines, document, inliner=self.inliner)
self.finish_parse()
if has_literal:
    p = document[0]
    assert isinstance(p, nodes.paragraph)
    p += nodes.Text(':')
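The trim-and-restore step around the parse call can be sketched in isolation. This is a simplified illustration on plain strings, not the PR's code: the helper names trim_literal_suffix and restore_literal_suffix are hypothetical, and the sketch ignores other forms of the marker (for example a "::" preceded by whitespace).

```python
from __future__ import annotations


def trim_literal_suffix(text: str) -> tuple[str, bool]:
    """Strip a trailing '::' so the parser does not expect a literal block.

    Returns the (possibly shortened) text and a flag recording whether the
    suffix was present. Hypothetical helper, for illustration only.
    """
    has_literal = text.endswith('::')
    if has_literal:
        text = text[:-2]
    return text, has_literal


def restore_literal_suffix(rendered: str, has_literal: bool) -> str:
    """Add back the single ':' that 'paragraph::' leaves in the output."""
    return rendered + ':' if has_literal else rendered


# The message is parsed without its suffix, then the colon is re-attached.
message, flag = trim_literal_suffix('Run the following command::')
result = restore_literal_suffix(message, flag)
```

In the snippet above the same idea is applied to the docutils tree: the flag decides whether a Text(':') node is appended to the parsed paragraph.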
This should all be using the self.inliner to only parse inline syntaxes: https://github.com/live-clones/docutils/blob/d50e1676a87f5a495f1a5a0f447e8da9317e1195/docutils/docutils/parsers/rst/states.py#L614
Well, this was my initial idea, but it was a lot more complicated to implement. I don't remember if I managed to make it work in the end, but if I did, then it had the same result as using Text as the initial_state (the :: without a literal block was still causing issues), as I did on line 77:
Line 77 in 5994ca5
initial_state='Text',
Ok yes I indeed managed to make it work, but it looked like this:
diff --git a/sphinx/parsers.py b/sphinx/parsers.py
index 09ee7e8ff..a99c6d498 100644
--- a/sphinx/parsers.py
+++ b/sphinx/parsers.py
@@ -7,7 +7,7 @@ from typing import TYPE_CHECKING
import docutils.parsers
import docutils.parsers.rst
from docutils import nodes
-from docutils.parsers.rst import states
+from docutils.parsers.rst import states, languages
from docutils.statemachine import StringList
from docutils.transforms.universal import SmartQuotes
@@ -71,18 +71,26 @@ class RSTParser(docutils.parsers.rst.Parser, Parser):
if has_literal:
inputstring = inputstring[:-2]
- self.setup_parse(inputstring, document) # type: ignore[arg-type]
- self.statemachine = states.RSTStateMachine(
- state_classes=self.state_classes,
- initial_state='Text',
- debug=document.reporter.debug_flag,
- )
-
- inputlines = StringList([inputstring], document.current_source)
-
- self.decorate(inputlines)
- self.statemachine.run(inputlines, document, inliner=self.inliner)
- self.finish_parse()
+ language = languages.get_language(
+ document.settings.language_code, document.reporter)
+ if self.inliner is None:
+ inliner = states.Inliner()
+ else:
+ inliner = self.inliner
+ inliner.init_customizations(document.settings)
+ memo = states.Struct(document=document,
+ reporter=document.reporter,
+ language=language,
+ title_styles=[],
+ section_level=0,
+ section_bubble_up_kludge=False,
+ inliner=inliner)
+ memo.reporter.get_source_and_line = lambda x: (document.source, x)
+ textnodes, _ = inliner.parse(inputstring, 1, memo, document)
+ p = nodes.paragraph(inputstring, '', *textnodes)
+ p.source = document.source
+ p.line = 1
+ document.append(p)
if has_literal:
p = document[0]
assert isinstance(p, nodes.paragraph)
> then it had the same result as using Text
Text also passes definition lists and section titles; it's definitely much cleaner to use the proper inline parsing, even if docutils does not make this as easy 😒
> textnodes, _ = inliner.parse(inputstring, 1, memo, document)
Here you should also have parse_inline take the actual line number and use that; plus, the _ is system_message nodes that should be appended to the paragraph.
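The two review points (pass the real line number, and keep the returned messages instead of discarding them) can be illustrated with a toy model. Everything here is a hypothetical stand-in written for this sketch: the Paragraph class, toy_inline_parse, and build_paragraph are not docutils or Sphinx APIs, and nodes are represented as plain tuples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Paragraph:
    """Minimal stand-in for a paragraph node (not docutils.nodes.paragraph)."""
    line: int
    children: list = field(default_factory=list)


def toy_inline_parse(text: str, lineno: int) -> tuple[list, list]:
    """Hypothetical inline parser returning (nodes, messages).

    It reports an odd number of backquotes as a problem, tagged with the
    line number it was given.
    """
    nodes = [('text', text)]
    messages = []
    if text.count('`') % 2:
        messages.append(
            ('system_message', f'line {lineno}: unbalanced backquote'))
    return nodes, messages


def build_paragraph(text: str, lineno: int) -> Paragraph:
    """Use the caller's line number and keep the messages in the paragraph."""
    nodes, messages = toy_inline_parse(text, lineno)
    para = Paragraph(line=lineno)
    para.children.extend(nodes)
    para.children.extend(messages)  # appended rather than dropped as `_`
    return para


para = build_paragraph('broken `role', 12)
```

With a hard-coded line number of 1 and the second return value bound to `_`, any problem in the translated text would be reported at the wrong line or not at all.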
> Got it, I'll try to add it, but not sure how to test it.
Well, if this proposal is to be accepted, then it really needs to have proper "generic" tests, not just ones specific to the i18n use case, as it could obviously be used for other use cases.
Ok I get it. Sorry I was too focused on the problem I was trying to solve, but you are right.
Regardless of the implementation details, what do you think about this proposal? This is more of a proof of concept than a finished product.
Yeh I mean, coming from the MyST perspective, in principle I am certainly in favor of removing "rST hard-coded" aspects of the code base 😄 (the other big problematic aspect of sphinx for this is executablebooks/MyST-Parser#228)
But indeed, it is quite a "core" addition to sphinx, more broad reaching than just this use case, so I would obviously want to be very careful (and have good agreement from other maintainers) before merging anything.
Of course, I marked it as "draft" to make it clearer that it is not ready yet.
> This should all be using the self.inliner to only parse inline syntaxes
☑️
> here you should also have parse_inline take the actual line number and use that
☑️
> plus the _ is system_message nodes that should be appended to the paragraph
☑️
Also I started to add some tests for this new function.
As recommended by @chrisjsewell. Because even if they are ignored by the i18n transform, parse_inline() could still be used in other places.
As the frontend.get_default_settings() function recommended in docutils's docs [1] is not there. [1] https://docutils.sourceforge.io/docs/dev/hacking.html#parsing-the-document
I don't think there'll be time to get this in for 8.0.0. We can introduce it in a back-compatible way with a config option during 8.x. A |
@n-peugnet I think in some way this should be linked with #12492 (comment), i.e. that it's made possible for the parser to determine how it handles parsing. The sphinx/sphinx/util/docutils.py Line 467 in 49bf65f
I think these two things need to be reconciled somehow |
This is a proof of concept to fix #8852 based on my idea: #8852 (comment)
Warning
BREAKING CHANGE: The new parse_inline() function must now be implemented by third-party parsers, at least when i18n is used.
Feature or Bugfix
Purpose
By adding a parse_inline() function to the Parser, we can get rid of all the RST-specific hacks that the i18n Locale transform contained.
I also successfully implemented this function in MyST-Parser, which I think is the second most used Sphinx parser: executablebooks/MyST-Parser@master...n-peugnet:MyST-Parser:add-parse-inline
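The shape of the proposal, a base class that declares parse_inline() and parsers that each implement it for their own syntax, might look roughly like the following. This is a self-contained toy, not the PR's code: the class names, the (kind, text) tuple representation, and the single-argument signature are all assumptions made for this sketch.

```python
from __future__ import annotations


class Parser:
    """Stand-in for a parser base class (hypothetical, heavily simplified)."""

    def parse_inline(self, text: str) -> list[tuple[str, str]]:
        # Third-party parsers are expected to override this.
        raise NotImplementedError(
            'Parser subclasses must implement parse_inline()')


class StarEmphasisParser(Parser):
    """Toy markup: *text* is emphasis, everything else is plain text."""

    def parse_inline(self, text: str) -> list[tuple[str, str]]:
        nodes = []
        # Splitting on '*' puts emphasized chunks at odd indices.
        for index, chunk in enumerate(text.split('*')):
            if chunk:
                kind = 'emphasis' if index % 2 else 'text'
                nodes.append((kind, chunk))
        return nodes


def parse_translation(parser: Parser, translated: str) -> list[tuple[str, str]]:
    """What a parser-agnostic transform would do: delegate to the parser."""
    return parser.parse_inline(translated)


nodes = parse_translation(StarEmphasisParser(), 'Bonjour *le monde* !')
```

The transform only ever calls parse_inline(), so it needs no knowledge of the markup; a parser that does not override the method fails with NotImplementedError, which is the breaking change noted in the warning.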
Detail
parse_inline()), so I chose to instead only emit the URL as the message to be translated.
diff -r the results of Sphinx's docs' French translation and the result is identical except for the autodoc and python module parts. To make this check I recommend doing a git reset master to keep the same commit hash (otherwise a lot of pages are different) and commenting out the non-implemented parse_inline() method of the Parser class. This keeps the same inventory as the master branch.
Relates
From my testing, it fixes at least executablebooks/MyST-Parser#852, and could potentially fix all the issues linked in #8852 (not checked yet).
Fixes #8852
Fixes executablebooks/MyST-Parser#444
Fixes executablebooks/MyST-Parser#852
Fixes #12287