Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor document delta handling #1745

Closed
wants to merge 18 commits into from
Closed

Refactor document delta handling #1745

wants to merge 18 commits into from

Conversation

aldendaniels
Copy link
Contributor

The changes proposed below seek to simplify Ace's handling of document mutations in the interest of bug-free code and easy integration. I hope that the benefits of the recommended changes will be clear, but I'd be glad to elaborate as necessary.

Feedback and review of these changes from a core contributor would be appreciated. Having extensive experience with web-based editing, I realize the risk of destabilization inherent in touching the core document code. In this case, I think it's worth the trade-off, but the changes will need careful scrutiny.

Refactor delta types

  • Combine the insertText and insertLines delta types into a single insert delta type
  • Combine the removeText and removeLines delta types into a single remove delta type

The new delta types will store inserted or deleted text has an array of lines (split on the detected new line character). This is efficient and has no presuppositions regarding newlines. Pressing ENTER in an empty document, therefore, will create the following delta:

{
    "action": "insert",
    "range": {
        "start": {
            "row": 0,
            "column": 0
        },
        "end": {
            "row": 1,
            "column": 0
        }
    },
    "lines": ["", ""]
}

Handle all document changes in a single spot

  • Make all document mutations in a single applyDelta function.
  • Add delta validation

Basically insert(), insertLines(), insertNewLine(), remove(), etc. should all wrap the applyDelta() function instead of the inverse. This way it's easy to ensure that all deltas are being applied in the same way and are completely reversible. This also makes it easy to validate deltas in a single location.

Supporting changes

  • Rework / Simplify anchor logic to handle new delt types
  • Rename:
    • insert() to insertText()
    • remove() to removeText()
    • insertLines() to insertFullLines()
    • removeLines() to removeFullLines()

Refactor delta handling code to:

- Combine the "insertText" and "insertLines" delta types into a single
"insert" delta type

- Combine the "removeText" and "removeLines" delta types into a single
"remove" delta type

- Make all document mutations in a single applyDelta function.

- Add basic delta validation (more needed . . . see TODOs)

- Rework anchor logic to handle new delta types (also simplified)

- Rename "insert()" to "insertText()" and "remove()" to "removeText()"

- Rename "insertLines()" to "insertFullLines()" and "removeLines()" to
"removeFullLines()"

See related issue for more information. All tests are passing and the
changes appear functional under preliminary testing, but careful review
and testing will be necessary.
if (range.start.row == range.end.row && delta.action != "insertLines" && delta.action != "removeLines")
lastRow = range.end.row;
else
lastRow = Infinity;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this breaks screen updating after removing a line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sure enough. I'll update this to:

var range = e.data.range;
var lastRow = (range.start.row == range.end.row ? range.end.row : Infinity);
this.renderer.updateLines(range.start.row, lastRow);

I hadn't realized that multi-line actions require regenerating everything below, but it makes sense that they do.

@nightwing
Copy link
Member

This is great!
The old way of handling deltas was very limiting, making it very hard to extend editor behavior.
To merge this i have to ask you to send a cla to [email protected]. See contributing.md for details

Also this still needs some work:

  • new api is significantly slower, indenting a large file takes 1.57s instead of 321ms
  • changes to public interface should be minimal
  • would be good to find a way to not have lines array for single line changes

this._emit("change", { data: delta });

if (row < this.getLength() - 1 && row >= 0)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use style matching the rest of code in ace

if (...) {
}

@aldendaniels
Copy link
Contributor Author

Hi @nightwing, thanks for the feedback.

I'm working on improvements based on your suggestions above. I'm not sure I understand your final point, however. Which of the following are you suggesting?

  1. Bring back the insertInLine() document method
    Doesn't this duplicate insertText()?
  2. Single-line insertions should have a .text attribute instead of a .lines attribute
    This would seem to undercut the goal of consistent delta formats. A single-line delta isn't fundamentally different from a mutli-line delta. It's nice when extending the editor behavior to be able to consistently determine the number of lines inserted (including 0) by checking delta.lines.length - 1. Am I missing something?

I've emailed the cla agreement.

This seeks to keep the public API in-tact while improving method names
within ace by keeping the old methods as wrappers around the new
better-named methods.

For example, document.insert() now simply calls document.insertText()
and warns the caller via a console.log() that they are using a
deprecated method.

I've also updated the coding style of my changes (where I noticed
discrepancies) to match the rest of Ace.
Both were introduced in 2e6f127.
console.warn makes better sense than console.log and matches similar
warnings in ace (see gutter.js for example).
@nightwing
Copy link
Member

Bring back the insertInLine() document method Doesn't this duplicate insertText()?

Not quite since insertInLine doesn't have to do additional split, without that it might be impossible to make indenting or commenting long text fast again. but let's leave this for later.

It's nice when extending the editor behavior to be able to consistently determine the number of lines inserted (including 0) by checking delta.lines.length - 1

Yes, that's why i am saying would be good to find a way to do that and still keep api easy to use.
E.g. we might remove delta.range and have delta = {start, end, lines|text, action} instead
delta.end.row - delta.start.row is almost as good as delta.lines.length - 1

I've emailed the cla agreement.

Thanks!

2e6f127 slowed down the application of
deltas that only affect a single line. The slow-down, though trivial for
a single line, is significant for operations than separately modify
thousands of rows (such as indenting a large document).

This commit speeds up single-line deltas by avoiding unnecessary calls
to splitLine() and joinLineWithNext().
@aldendaniels
Copy link
Contributor Author

Not quite since insertInLine doesn't have to do additional split, without that it might be impossible to make indenting or commenting long text fast again. but let's leave this for later.

Good point. I was concerned about that initially but (testing in Chrome) I didn't see any appreciable slowdown. Most of the slowdown was from inefficiencies in applyDelta() which my latest commit resolves. If you're concerned about performance, however, I'll resurrect insertInLine(). My main concern is that users will erroneously call insertInLine() with multi-line text.

Yes, that's why i am saying would be good to find a way to do that and still keep api easy to use.
E.g. we might remove delta.range and have delta = {start, end, lines|text, action} instead
delta.end.row - delta.start.row is almost as good as delta.lines.length - 1

That's fair. I like using .lines even for single-line deltas because it avoids cluttering code with if(delta.lines){...} else{...}. I'm probably bit biased, however, because I'm working on a project that involves collaborative editing with OT. For the OT algorithm, it's really useful to think of every delta as an action, a range, and an array of lines.

@nightwing
Copy link
Member

My main concern is that users will erroneously call insertInLine() with multi-line text.

every function can be misused, there's nothing we can do to prevent it. But lets leave it out for now, since there are still other ways to improve performance.

it avoids cluttering code with if(delta.lines){...} else{...}.

one could use lines = delta.lines || [delta.text] instead

I'm probably bit biased

I must be biased too. But cutting down range object, and an array for most of deltas saves lots of memory, which is worth some added complexity.
We could also add delta.rowDiff.

For the OT algorithm, it's really useful to think of every delta as an action, a range, and an array of lines.

Do you use ace deltas in the OT implementation or convert them to something else in change handler?


this.applyDelta = function(delta) {

function splitLine(lines, point)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good to move these helper function to outer scope, since that is faster

This makes it possible to break out helper functions without exposing
them to the rest of the document class. Also, long term, we may want to
have a stand-alone test suite for applyDelta, so it makes sense in its
own file.

All other changes involve syntax corrections (some syntax issues were
mine, others pre-existed) to make the documentation compilation work.
Since .apply() can't handle more than 65535 parameters, splice.apply()
is brittle. It's also hard to read. This replaces splice.apply() calls
throughout ace code with lang.spliceIntoArray().
@aldendaniels
Copy link
Contributor Author

But cutting down range object, and an array for most of deltas saves lots of memory, which is worth some added complexity.

Yes, if the difference is appreciable. In terms of speed, the difference in delta-creation time between an array and a string isn't significant from what I can tell. I'm seeing a difference of about 500 milliseconds across 10 million delta objects in Chrome and FF: http://jsfiddle.net/y24yq/2/. I'm not certain about memory usage.

one could use lines = delta.lines || [delta.text] instead

True, although if you do that very much you'll blow away any performance benefit gained.

Do you use ace deltas in the OT implementation or convert them to something else in change handler?

I convert them, but they closely resemble the original ace delta in structure. If we use .text for single-line deltas, I can easily convert them back to .lines in the change handler.


My preference is for using .lines for the sake of consistency, but it's ultimately your call. If you'd prefer .text, I'll make the change. At this point, what remains to be done before these changes are ready to merge?

@nightwing
Copy link
Member

I'm seeing a difference of about 500 milliseconds across 10 million delta objects in Chrome and FF

that's jit compiler being smart and creating one array for all deltas, when it can't cheat array version takes 2-3 times more http://jsfiddle.net/y24yq/4/, but yeah i wasn't talking about performance, but about memory usage of deltas stored in undoManager, if we go with lines we'll have to modify them for storing in the undoManager.

True, although if you do that very much you'll blow away any performance benefit gained.

code that cares about performance, have to special case one line deltas anyway, and if it doesn't we can't help it.

At this point, what remains to be done before these changes are ready to merge?

Since this changes api, we need to release current version first, likely next week.

Also we need to make this as fast as the current implementation. Now for one line deltas, this is only 1.5 times slower, but multiline deltas are very slow, due to line by line splicing.

It would be good to add beforeChange event. which is useful for implementing readonly ranges
The event should be fired before applyDelta and should allow it's listener to split delta into multiple parts.
This will allow to remove validateDelta, which isn't useful in general case, and can be added as a beforeChange event listener if user wants.

As I said, I don't think renaming insert is possible, it breaks too much code. why do you think insertText is a much better name?

@aldendaniels
Copy link
Contributor Author

OK, please review the To Do list below to make sure we're completely on the same page. This corresponds basically to your suggestions above, but diverges slightly with regards to validateDelta handling.

Note that I think the changes in the Deferred group are good changes. I'm going to be personally crimped for time over the next six weeks, however, so I'd like to focus on getting the core changes thoroughly vetted. The deferred changes can be subsequently handled.


To Do:

  1. Rename insertText back to insert
    I'm fine doing this. I liked insertText better because it's more precise and had hoped that keeping insert() around as a deprecated method would be good enough, but I don't care much. Besides, insert corresponds nicely to remove.

  2. Add back insertInLine to avoid an extra $split call.

  3. Modify spliceIntoArray to use Array.splice.apply() instead of line-by-line splicing.

    I'm assuming this is the line-by-line splicing you were referring to? I thought I had tested this and found it fast, but perhaps I was fooled by the clever jit compiler again.

    Alternatively, I can simply revert 8624ab8. Which would you prefer I do?

  4. Fix / complete validateDelta.

  5. Add a doNotValidate param to applyDelta and set it to true in insertInLine/removeInLine.

    I don't see a reason not to validate multi-line insertions since those will rarely if-ever occur in bulk the way single-line deltas do. Since the greatest potential for error is with multi-line deltas, this seems like a good trade-off to me.

  6. Store single-line deltas as .text instead of .lines in undo history . . . or everywhere?

    I did some memory profiling on this using Chromes Dev Tools (see http://jsfiddle.net/RHZYn/14/show/). If my test is good, using an array economizes 8.3 MB of memory across 1 million lines. Do you still think the savings is worth it at 8.3 bytes per line ?

    If so, would you like this to be a localized change to undo history only (my preference) or a global change?

I propose deferring:

  • Add a beforeChange event. This should receive the delta as the parameter and return either:
    a. The delta(s) to be applied or . . .
    b. false to preventDefault.
  • Modify the onChange event and the beforeChange event (or create new change event types) to provide a list of all deltas emitted (like we do for UndoManager.execute). This is esp. useful for onBeforeChange where you would probably not want to prevent a single delta in a multi-delta action.

Your input on items 3, 5, and 6 would be particularly appreciated.

@nightwing
Copy link
Member

I didn't mean that beforeChange event is required for merging. It's a different feature, and this is already quite a big pull request.
All I was saying, is that applyDelta can't guarantee that it will produce exactly one delta event for each call, so keeping the old way of handling splice.apply is good enough.

I'm assuming this is the line-by-line splicing you were referring to? I thought I had tested this and found it fast, but perhaps I was fooled by the clever jit compiler again.

Try pasting large text at the start of a large file. With line by line splicing it takes O(PastedLines*ExistingLines)
which is very slow.
applyDelta does line by line splicing as well [https://github.com//pull/1745/files#diff-e181f93c51f60011e7c3a151257e66bdR117].

so for 3 I would prefer to revert 8624ab8.

6 is the hardest one.
I've measured the difference again at http://jsfiddle.net/RHZYn/17/show/
memory

and i am getting around 16 mb, (20.234 instead of 4.29959) on 64 bit chrome it will be twice of that.
This is noticeable difference in cloud9, where i often have many open tabs and they retain undo history across reloads.

So, unless i measure something wrong, undo history have to use text.
Deciding if we should do the same everywhere, depends on how fast converting will be, and if there will be any jit deoptimizations because of deltas having different shape.
I think it's ok, if you leave lines as is for now, but i may change it when merging.

For 4,5 I too don't see a reason not to validate deltas, but i don't see a reason to validate deltas either. So far i never saw a bug in Ace, that could be caught by the validate function. So if it introduces any noticeable difference in performance we most likely will disable it in the future.

For 1,2 please restore old names, also i think using action name 'remove' instead of 'delete' will be more consistent.

@aldendaniels
Copy link
Contributor Author

To summarize:

To Do:

  1. Revert 8624ab8
  2. Rename insertText back to insert
  3. Rename the 'delete' delta action to 'remove' (matches previous naming convention)
  4. Bring back insertInLine (avoids an extra $split call)
  5. Stop doing a line-by-line splicing in applyDelta
  6. Fix / complete validateDelta
  7. Add a doNotValidate param to applyDelta and set it to true in insertInLine/removeInLine
  8. Store single-line deltas as .text instead of .lines in undo history
  9. Treat deltas with empty ranges as NOOPs
  10. Review changes for unconventional { formatting

Note: I'm aware of one bug that validateDelta would have caught: ace used to create a delta with a range that went past the end of the document when Alt + Down Arrow was pressed on the last line of a document.

Question: Once the above is done, will you merge this as-is or will you need me to submit a new pull request with all these commits squashed into one?

@nightwing
Copy link
Member

Yes, i agree about the todo. (also would be nice if you could remove all uses of { on own line:)

There is no need to create a new pull request, and you can --force push to this branch however you want.

when this is ready i'll merge this it into v1.2 branch, to add several more non backward compatible changes in other parts of the api, and test it some more before merging into the master.

Modify the onChange event and the beforeChange event (or create new change event types) to provide a list of all deltas emitted (like we do for UndoManager.execute).

Why? I actually want to modify UndoManager.execute to take deltas one by one, and keep them in a flat list.

@aldendaniels
Copy link
Contributor Author

Ha! Old coding practices die hard. I thought I had brought all my { into (or should I say out of?) line. Todo 9 added.

when this is ready i'll merge this it into v1.2 branch, to add several more non backward compatible changes in other parts of the api, and test it some more before merging into the master.

That sounds good. Any idea when the v1.2 will be released?

Matches previous naming convention.
Avoids an extra $split call.
Also brings back the functionality where large deltas are split into
smaller deltas so that .splice.apply() calls will work.
This uncovered the fact that until now delta.range had not always been a
Range object. This inconsistency has been resolved by my changes in
mirror.js.
Set it to true in insertInLine/removeInLine.

Also sped up indent/dedent by using insertInLine and removeInLine.
Stores single-line delta content as .text instead of .lines in undo
history. This is done without modifying the original delta object in
case the caller still retains a handle to the original.
- Fix unconventional '{' formatting
- Reformat `UndoManager` changes
- Revert change from `insertInLine` to `insert` in text.js
This should be faster since we don't have to re-initialize the helper
functions each time Anchor.onChange is fired.
@aldendaniels
Copy link
Contributor Author

OK, all of the above TODOs are done.

@aldendaniels
Copy link
Contributor Author

Any ETA on when this can get merged into the 1.2 branch? As soon as it is, I'll check out the branch and start using / testing it in my project.

nightwing added a commit that referenced this pull request Jan 15, 2014
Refactor document delta handling
@nightwing
Copy link
Member

Sorry for the delay, i've created v-1.2 branch and expect to merge it into master late February.
After your changes performance is mostly good, except that ace causes more gc than before.

My todo for merging this is:

  • remove delta.range in favor of delta.start delta.end
  • get rid of for in loops in delta converting functions
  • experiment with allowing delta.text for (i am almost convinced that always having lines is better but want to check)
  • rename insertMergedLines, too bad that insertLines name is already taken (maybe allow passing an array to insert?)
  • add simpler way for inserting at the end of document/line: either add getEndPosition or allow undefined to mean the last row column.

// 2. fn.apply() doesn't work for a large number of params. The mallest threshold is on safari 0xFFFF.
//
// To Do: Ideally we'd be consistent and also split 'delete' deltas. We don't do this now, because delete
// delta handling is too slow. If we make delete delta handling faster we can split all large deltas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is delete delta handling is too slow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On very long documents (like 1,000,000 lines), deleting the entire document all in one delta is fast but deleting it in multiple deltas is slow. I'm not sure why. I'm guessing it's related to inefficiencies in the rendering layer, but IDK.

@aldendaniels
Copy link
Contributor Author

Hi @nightwing, no problem. Thanks for staying on top of this. See responses below:

My todo for merging this is:

Just to clarify: you're planning on making these changes? I'm happy to help as you find issues, but I won't be able to give this much serious time in the near future.

remove delta.range in favor of delta.start delta.end

Makes sense. This will allow us to remove my fix in in mirror.js added in ef0e8da. See my comment above, however, for why relying on window.message still seems like a bad idea.

rename insertMergedLines, too bad that insertLines name is already taken (maybe allow passing an array to insert?)

I'm confused. insertMergedLines() does take an array of lines to insert. What am I missing?

add simpler way for inserting at the end of document/line: either add getEndPosition or allow undefined to mean the last row column.

I personally prefer the former (getEndPosition) approach because it's more explicit / consistent. Either way though, it would be nice if by the time the delta makes it to applyDelta there are no undefined offsets. That way 3rd party apps that integrate won't have to check for undefined end.row and end.column values.

@nightwing
Copy link
Member

Hi
i have pushed an update to v-1.2 branch, removing delta.range and doing some cleanups.
Now this is faster than what we had before.

I'm confused. insertMergedLines() does take an array of lines to insert. What am I missing?

i don't like the word merged in it's name, because lines it takes are not merged.

Btw validateDelta helped to find a bug in setValue, too, so thanks for keeping it!

@aldendaniels
Copy link
Contributor Author

Nice work! Looking forward to seeing this released. I'm glad validateDelta turned out useful after-all.

i don't like the word merged in it's name, because lines it takes are not merged.

Yeah, "merged" isn't great. The idea was to communicate that the first and last lines are merged with the existing document (e.g. insertMergedLines({row: 1, column: 1}, ['', '']) simply breaks line 1 at position 1.

@lennartcl
Copy link
Contributor

Closing this in favor of #1819. Also note the link in the changelog there.

@lennartcl lennartcl closed this May 21, 2015
jpallen pushed a commit to overleaf/web that referenced this pull request Sep 7, 2015
This required a refactor of all code that listen to events changes since the API
has changed. See ajaxorg/ace#1745 for more details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants