Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stateless parser API #11147

Merged
merged 18 commits into from
Sep 27, 2024
Merged

Stateless parser API #11147

merged 18 commits into from
Sep 27, 2024

Conversation

kazcw
Copy link
Contributor

@kazcw kazcw commented Sep 20, 2024

Pull Request Description

Stateless (static) parser interface. Buffer-reuse optimization is now hidden within Parser implementation. Fixes #11121 and prevents similar bugs.

Important Notes

  • Also simplify EnsoParser API, exposing only a higher-level interface.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

@kazcw kazcw self-assigned this Sep 20, 2024
@kazcw kazcw added the CI: No changelog needed Do not require a changelog entry for this PR. label Sep 20, 2024
@kazcw kazcw force-pushed the wip/kw/parserless-parser branch from ebd4caf to 234a2c4 Compare September 20, 2024 22:01
@kazcw kazcw added the CI: Clean build required CI runners will be cleaned before and after this PR is built. label Sep 20, 2024
@kazcw kazcw force-pushed the wip/kw/parserless-parser branch 3 times, most recently from 9c0a0fe to b67db01 Compare September 20, 2024 22:38
Stateless (static) parser interface. Buffer-reuse optimization is now hidden
behind JNI FFI implementation. Fixes #11121 and prevents similar bugs.
@kazcw kazcw force-pushed the wip/kw/parserless-parser branch from e10fb43 to ce95e4e Compare September 21, 2024 00:00
@kazcw kazcw force-pushed the wip/kw/parserless-parser branch from f856b49 to 285e2e6 Compare September 21, 2024 04:25
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing use of the Rust parser from multiple threads at (some small cost) will simplify its (clueless) usage from JVM.

@kazcw
Copy link
Contributor Author

kazcw commented Sep 23, 2024

I moved the thread-local buffer-recycling logic to the Java bindings, to keep the Rust code threading-agnostic in case we would like to compile it to WASM in the future.

@@ -69,7 +70,6 @@ class Compiler(
if (config.outputRedirect.isDefined)
new PrintStream(config.outputRedirect.get)
else context.getOut
private lazy val ensoCompiler: EnsoParser = new EnsoParser()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the original code was buggy. The Compiler should have implemented Closable interface and call ensoCompiler.close() on the EnsoParser. I am not sure what's the overhead of a single EnsoParser instance and its natively allocated memory, but this must have leaked a lot during unit test execution!

@hubertp did reduce the memory, but the Rust parser native part might now even have been visible in the JVM heap dump - only in RSS...

} catch (URISyntaxException ex) {
root = new File(".").getAbsoluteFile();
private static final FinalizationManager finalizationManager = new FinalizationManager();
private static final Thread finalizationThread = new Thread(finalizationManager.createRunner());
Copy link
Member

@JaroslavTulach JaroslavTulach Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thread is unfortunate overhead. These are some options to avoid it:

  • check the FinalizationManager queue before parsing starts - used by WeakHashMap.exprungeStaleEntries()
  • use Cleaner
  • leave the cleanup up to the user - e.g. continue to provide instances and close() operations

Own Thread adds overhead. The Cleaner might be better, but there seems to be a thread per each Cleaner anyway. exprungingStaledParser might be good, but may leave some of the latest parser(s) pending for cleanup. We do get a shutdown callback in the interpreter - e.g. we can explicitly request cleanup of EnsoContext.getCompiler() parser workers...

Best Option? Exprunging stale & Close

Having exprungeStaleParsers() check and a way to trigger it (or request shutdown of all existing) when EnsoContext.shutdown() would do the trick too without need for any special Thread or decisions of GC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name it? I often look into thread dump and hunting down unnamed ones is a real PITA.

@@ -700,7 +696,7 @@ class Compiler(
* @return A Tree representation of `source`
*/
def parseInline(source: CharSequence): Tree =
ensoCompiler.parse(source)
Parser.parse(source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having EnsoParser a few lines above and here Parser it really begs a question: "What's the difference?"/"Which one should I use?" Wouldn't it better for EnsoParser.parse to simply forward to Parser.parse to hide that?

Copy link
Contributor Author

@kazcw kazcw Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to expose two different levels of abstraction from EnsoParser, so that EnsoParser methods output IR but not Tree. Backend parser-users work with IR exclusively, except the single caller of this parseInline method. The caller uses the Tree API to validate whether an input is inline. If we were to rewrite those few lines of code to use the IR API, we would be able to treat the entire Tree API as an internal implementation detail of the parser.

@kazcw kazcw marked this pull request as draft September 26, 2024 19:18
@@ -1351,6 +1351,17 @@ class RuntimeVisualizationsTest extends AnyFlatSpec with Matchers {
)
)

val attachVisualizationResponses =
Copy link
Contributor Author

@kazcw kazcw Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing in CI. It looks to me like it is a nondeterministic failure related to sequentialExecution = false command delays, and not related to this changeset. The observed failure mode is for all execution updates to show the result of executing the post-modification module. I think this would occur if delaying is applied to the attachVisualization commands, and not to the textEdit command, so that the edit is performed before the visualizations are attached.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCing @4e6

@kazcw kazcw marked this pull request as ready for review September 27, 2024 04:28
private static native ByteBuffer parseTreeLazy(long state, ByteBuffer input);
@Override
public void run() {
freeState(state.getAndSet(0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This guarantees each Rust pointer is cleaned only once.

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call runPendingFinalizers() more frequently. Anyway overall it looks fine.

@JaroslavTulach
Copy link
Member

What happens when there is a Rust code working on a buffer and some other thread drops it?

@kazcw
Copy link
Contributor Author

kazcw commented Sep 27, 2024

What happens when there is a Rust code working on a buffer and some other thread drops it?

The parsing thread "checks out" the state for the duration of the operation:
https://github.com/enso-org/enso/pull/11147/files#diff-e237fc34931809c65628b8bcde58e32add8959b8e66940c46a5f63f557530302R169-R181

So that particular parsing thread's buffer won't be dropped by the freeAll call, but will eventually dropped if the parser becomes unreferenced.

GitHub
Pull Request Description Stateless (static) parser interface. Buffer-reuse optimization is now hidden within Parser implementation. Fixes #11121 and prevents similar bugs. Important Notes

Also sim...

@kazcw kazcw added the CI: Ready to merge This PR is eligible for automatic merge label Sep 27, 2024
@mergify mergify bot merged commit 2891981 into develop Sep 27, 2024
42 checks passed
@mergify mergify bot deleted the wip/kw/parserless-parser branch September 27, 2024 15:58
@JaroslavTulach JaroslavTulach mentioned this pull request Oct 4, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Clean build required CI runners will be cleaned before and after this PR is built. CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parser crashing in native code due to multi-threaded access
4 participants