Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser agnostic i18n Locale transform #12238

Open
wants to merge 30 commits into
base: master
Choose a base branch
from

Conversation

n-peugnet
Copy link
Contributor

@n-peugnet n-peugnet commented Apr 7, 2024

This is a proof of concept to fix #8852 based on my idea : #8852 (comment)

Warning

BREAKING CHANGE: The new parse_inline() function must now be implemented by third-party parsers. At least when i18n is used.

Feature or Bugfix

  • Bugfix

Purpose

By adding a parse_inline() function to the Parser, we can get rid of all the RST specific hacks that the i18n Locale transform contained:

  • The title hack is not needed any-more, since we only parse inline elements.
  • The literal block hack has been moved to RSTParser.parse_inline, leaving the Locale transform parser agnostic.

I also successfully implemented this function in MyST-Parser which is I think the second most used Sphinx parser: executablebooks/MyST-Parser@master...n-peugnet:MyST-Parser:add-parse-inline

Detail

  • I am not yet sure about the API and I am absolutely open to discussions about this.
  • In the process of removing RST specific code, I had to stop emitting RST code for images (now they won't be parsed anyway with parse_inline()), so I chose to instead only emit the url as the message to be translated.
  • I checked with diff -r the results of Sphinx's doc's french translation and the result is identical except for the autodoc and python module parts. To make this check I recommend doing a git reset master to keep the same commit hash (otherwise a lot of pages are different) and to comment out the non-implemented parse_inline() method of the Parser class. This allows to keep the same inventory as the master branch.

Relates

From my testing, it allows to fix at least executablebooks/MyST-Parser#852, but could potentially fix all the issues linked in #8852 (didn't check yet).

Fixes #8852
Fixes executablebooks/MyST-Parser#444
Fixes executablebooks/MyST-Parser#852
Fixes #12287

Comment on lines 75 to 89
self.statemachine = states.RSTStateMachine(
state_classes=self.state_classes,
initial_state='Text',
debug=document.reporter.debug_flag,
)

inputlines = StringList([inputstring], document.current_source)

self.decorate(inputlines)
self.statemachine.run(inputlines, document, inliner=self.inliner)
self.finish_parse()
if has_literal:
p = document[0]
assert isinstance(p, nodes.paragraph)
p += nodes.Text(':')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this was my initial idea, but it was a lot more complicated to implement. I don't remember if I managed to make it work in the end, but if I did, then it had the same result as using Text as the initial_state (the :: without literal block were still causing issues), as I did on line 77

initial_state='Text',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yes I indeed managed to make it work, but it looked like this:

diff --git a/sphinx/parsers.py b/sphinx/parsers.py
index 09ee7e8ff..a99c6d498 100644
--- a/sphinx/parsers.py
+++ b/sphinx/parsers.py
@@ -7,7 +7,7 @@ from typing import TYPE_CHECKING
 import docutils.parsers
 import docutils.parsers.rst
 from docutils import nodes
-from docutils.parsers.rst import states
+from docutils.parsers.rst import states, languages
 from docutils.statemachine import StringList
 from docutils.transforms.universal import SmartQuotes
 
@@ -71,18 +71,26 @@ class RSTParser(docutils.parsers.rst.Parser, Parser):
         if has_literal:
             inputstring = inputstring[:-2]
 
-        self.setup_parse(inputstring, document)  # type: ignore[arg-type]
-        self.statemachine = states.RSTStateMachine(
-            state_classes=self.state_classes,
-            initial_state='Text',
-            debug=document.reporter.debug_flag,
-        )
-
-        inputlines = StringList([inputstring], document.current_source)
-
-        self.decorate(inputlines)
-        self.statemachine.run(inputlines, document, inliner=self.inliner)
-        self.finish_parse()
+        language = languages.get_language(
+            document.settings.language_code, document.reporter)
+        if self.inliner is None:
+            inliner = states.Inliner()
+        else:
+            inliner = self.inliner
+        inliner.init_customizations(document.settings)
+        memo = states.Struct(document=document,
+                             reporter=document.reporter,
+                             language=language,
+                             title_styles=[],
+                             section_level=0,
+                             section_bubble_up_kludge=False,
+                             inliner=inliner)
+        memo.reporter.get_source_and_line = lambda x: (document.source, x)
+        textnodes, _ = inliner.parse(inputstring, 1, memo, document)
+        p = nodes.paragraph(inputstring, '', *textnodes)
+        p.source = document.source
+        p.line = 1
+        document.append(p)
         if has_literal:
             p = document[0]
             assert isinstance(p, nodes.paragraph)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it had the same result as using Text

Text also passes definition lists and section titles; its definitely much cleaner to use the proper inline parsing, even if docutils does not make this as easy 😒

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

textnodes, _ = inliner.parse(inputstring, 1, memo, document)

here you should also have parse_inline take the actual line number and use that, plus the _ is system_message nodes that should be appended to the paragraph

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I'll try to add it, but not sure how to test it.

Well.. if this proposal is to be accepted, then really it needs to have proper "generic" tests, not just specific to the i18n use case, as obviously it could be used for other use cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I get it. Sorry I was too focused on the problem I was trying to solve, but you are right.

Regardless of the implementation details, what do you think about this proposal? As this is more of a proof of concept than a finished product.

Copy link
Member

@chrisjsewell chrisjsewell Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh I mean, coming from the MyST perspective, in principle I am certainly in favor of removing "rST hard-coded" aspects of the code base 😄 (the other big problematic aspect of sphinx for this is executablebooks/MyST-Parser#228)

But indeed, it is quite a "core" addition to sphinx, more broad reaching than just this use case,
so I would obviously want to be very careful (and have good agreement from other maintainers) before merging anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, I marked it as "draft" to make it clearer that it is not ready yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisjsewell

This should all be using the self.inliner to only parse inline syntaxes

☑️

here you should also have parse_inline take the actual line number and use that

☑️

plus the _ is system_message nodes that should be appended to the paragraph

☑️

Also I started to add some tests for this new function.

@n-peugnet n-peugnet marked this pull request as draft April 8, 2024 20:59
@n-peugnet n-peugnet changed the title Parser agnostic i18n post transform Parser agnostic i18n Locale transform May 11, 2024
@n-peugnet n-peugnet marked this pull request as ready for review May 11, 2024 20:14
@n-peugnet n-peugnet marked this pull request as ready for review May 19, 2024 19:32
@n-peugnet n-peugnet changed the title Parser agnostic i18n Locale transform [8.x] Parser agnostic i18n Locale transform May 23, 2024
@n-peugnet n-peugnet requested a review from chrisjsewell June 26, 2024 15:33
@AA-Turner AA-Turner added this to the 8.x milestone Jul 14, 2024
@n-peugnet n-peugnet mentioned this pull request Jul 21, 2024
@AA-Turner AA-Turner modified the milestones: 8.x, 8.0.0 Jul 23, 2024
@AA-Turner AA-Turner modified the milestones: 8.0.0, 8.x Jul 23, 2024
@AA-Turner
Copy link
Member

I don't think there'll be time to get this in for 8.0.0. We can introduce it in a back-compatible way with a config option during 8.x.

A

@chrisjsewell
Copy link
Member

chrisjsewell commented Jul 23, 2024

@n-peugnet I think in some way this should be linked with #12492 (comment), i.e. that it's made possible for the parser to determine how it handles parsing.
I think this could be made back compatible, in that you could check for the method on the parser, but if missing it would revert to using a default implementation.

The parse_inline function here though is a bit at odds with

def parse_inline(
though:

  • that inline_parse is used to parse a string as part of an already proceeding parse, whereas
  • this inline_parse is trying to parse the string as a "standalone" entity

I think these two things need to be reconciled somehow

@AA-Turner AA-Turner changed the title [8.x] Parser agnostic i18n Locale transform Parser agnostic i18n Locale transform Oct 21, 2024
@AA-Turner AA-Turner modified the milestones: 8.x, some future version Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants