Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: faster processing of large file #543

Merged
merged 12 commits into from
Jun 9, 2023
Merged

Conversation

sebthom
Copy link
Member

@sebthom sebthom commented Jun 7, 2023

This PR addresses #540 and additionally improves general rendering performance.

Full colorization of this file https://github.com/json-iterator/test-data/blob/master/large-file.json (26MB, 11k lines) took 29 seconds on my machine before this PR and takes 20 seconds with this PR applied. Also short UI freezes encountered without the PR are gone with the PR applied.

@angelozerr
Copy link
Contributor

@sebthom have you some benshmark about the performance improvement?

To be honnest with you I dont understand where you have optimized the code. Is it just use of ArrayList instead of LinkedArrayList,?

Many thankq for your clarification.

@sebthom
Copy link
Member Author

sebthom commented Jun 7, 2023

@sebthom have you some benshmark about the performance improvement?

To be honnest with you I dont understand where you have optimized the code. Is it just use of ArrayList instead of LinkedArrayList,?

Many thankq for your clarification.

This commit replaces LinkedList with ArrayList in AbstractModelLines to address #540. AbstractModelLines mostly accesses internal list using random access via List#get(int). This results into UI freezes for files with a lot of lines.

There is actually only one usecase where LinkedList is faster than ArrayList: when adding/removing elements at the beginning of a list. Even for insertions and removal within large lists ArrayList is faster because with LinkedList the list must be traversed up to the point where elements are removed/added.

@sebthom sebthom force-pushed the performance branch 3 times, most recently from dc01954 to 19a562e Compare June 8, 2023 11:39
@@ -37,7 +37,7 @@ static final class ModelLine {
List<TMToken> tokens = Collections.emptyList();
}

private final List<ModelLine> list = new LinkedList<>();
private final List<ModelLine> list = new ArrayList<>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice starting point.
I suggest to go a bit further and try to work based on addAll(int idx, Collection elements), and list.subList().clear();

Two examples:

protected void addLines(final int lineIndex, final int count) {
		if (count < 1)
			return;
		// prepare temporary list;
		var newLines =    new ArrayList<>(count)
		for (int i = 0; i < count; i++) {
			newLines.add(new ModelLine());
		}
		synchronized (list) {
			final var firstLine = getOrNull(lineIndex);
			list.addAll(lineIndex, newLines);
			if (firstLine != null) {
				list.get(lineIndex).startState = firstLine.startState;
			}
			updateLine(lineIndex);
		}
	}

similarly removeLines could benefit from list.sublist(lineIndex, lineIndex + count).clear()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise there is potential for optimizations in replaceLines via subList().replaceAll(..) followed by an addAll or clear depending on the requested changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took 29 seconds on my machine before this PR and takes 20 seconds with this PR

Could you please provide updated numbers? Because to be honest, 20 seconds is still quite slow.

I shaved of a few more seconds by adding a token caching on the hot path but I don't see more low hanging fruits that will have a noticeable. Because of it's async nature profiling is a bit difficult. Most of the time seems now to be spend in the joni library for TextMate regex searches and in eclipse UI classes doing the rendering. I see one more possible optimization but I am not sure how/if this will result in a noticeable performance improvement #38

@sebthom sebthom marked this pull request as draft June 8, 2023 21:37
@sebthom sebthom force-pushed the performance branch 2 times, most recently from 380d026 to 94acfc4 Compare June 8, 2023 21:46
@szarnekow
Copy link

took 29 seconds on my machine before this PR and takes 20 seconds with this PR

Could you please provide updated numbers? Because to be honest, 20 seconds is still quite slow.

@sebthom sebthom marked this pull request as ready for review June 9, 2023 21:43
@sebthom sebthom merged commit 205c39b into eclipse-tm4e:main Jun 9, 2023
@sebthom sebthom deleted the performance branch June 9, 2023 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants