-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve calculate dirty performance #2505
improve calculate dirty performance #2505
Conversation
cd9954a
to
3cae829
Compare
This works well for solving the problem of having to recalculate paths continuously, but sidesteps the benefits of using paths to access nodes in the tree. At this time, the larger performance issue is unfortunately keeping track of the paths of dirty nodes and eliminating this is a large performance improvement. This is good in the short term for folks who have this performance issue, but I unsure it is acceptable to be included into master as is. I would suggest finding a way to "unpack" the keys back into paths for use in normalize functions, or at the very least updating the terminology and variable names to show that they are keys at this level and not paths. |
@CameronAckermanSEL @ianstormtaylor I agree that this might not be ideal code state for long-term fixes of removing |
Hi @yunwuxin, I'm not sure if you've been following Slack or not, but we've been chatting about this one and think we should get ready to merge it. @CameronAckermanSEL has been running this in production for a few months and I plan to put it into production soon at ConvertKit as well. I agree with the feedback so far, and once that's addressed, I think we should be good to go. Specifically, let's rename things that say After this, we can make a follow up issue if we think it's worth finding a way to get this back to paths. Let me know if you don't have time for this and I'd be happy to make changes also. Thanks! |
@brendancarney Of course you can |
3cae829
to
bd0e41f
Compare
I updated this PR to change the naming from path to key where those were changed. |
bd0e41f
to
c4b950b
Compare
@@ -582,11 +582,16 @@ function normalizeDirtyPaths(editor) { | |||
* @param {Array} path | |||
*/ | |||
|
|||
function normalizeNodeByPath(editor, path) { | |||
function normalizeNodeByKey(editor, key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
light recommendation: I believe there's a naming convention for functions that can accept a path or a key interchangably (I believe this is driven by getNode operating on either). If that's the case then I think we should borrow on that naming convention for this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I may have actually broken this functions ability to work with either based on some other code I removed. I can add that back in though if it makes sense for this to accept either a path or a key.
Hey folks, we're trying to get rid of |
I don't disagree, if we can use paths that'd be nice. That said, I don't really have a lot of ideas on how to deal with this problem in an alternative way. So let's maybe start by explaining the issue. On insert fragment, because we're inserting a bunch of nodes within a When we leave the
So this shakes out to be an O(N^2) operation. I've observed also that somewhere along the line we're inserting duplicate paths, so we do have an opportunity to improve that by deduplicating them somehow. This is prone to creating out of memory exceptions because of how much we're doing in this loop. Even if we do something special for insert fragment like just shove all of the paths of the nodes in the fragment post insert into the dirty collection, there will still be the corner case where the insert fragment happens within a Tricky |
@CameronAckermanSEL it seems like dropping Immutable.js will reduce some of that extra unnecessary serialization, so that's a win. As for the duplication, I agree. It seems like if we had a nested representation for dirty paths (instead of an array of dirty paths), we'd automatically get de-duplication for parent paths that will be normalized if any of their children are normalized? |
Yep, I agree on both counts. However, having to calculate new paths for nodes that are EDIT: Noticed a typo :P |
@CameronAckermanSEL I'm not sure what you mean by that one. |
No problem. Here's a contrived one
All the nodes from fragment that are inserted are inserted as an Does that make more sense? |
@CameronAckermanSEL I think I see, thanks! In the case that we drop Immutable.js, then wouldn't the "recalculation" involve just But potentially if the paths are stored as a nested tree, that use case might actually get slower. There may be some other way to avoid the recalculation altogether, but I'm not sure of any off the top of my head, since any operation technically could affect it. Allowing an |
@ianstormtaylor Maybe they way we track dirty nodes can differ based on operation. Like for an insert fragment, we only need the start and the end, then we normalize everything in between, right? I'm not sure what that would look like now, but I think exposing a lower level API could maybe allow userland to figure it out. This is one of the benefits from my proposal in #2658. We could allow someone to completely disable normalization, and handle it themselves:
If that proposal doesn't work out, some other way to be able to disable the dirtiness tracking would be really helpful for performance sensitive operations. As it stands now, we're having to do super hacky workarounds to avoid minute long pastes in production. This is on ~1200 line plain text pastes where each line is a block. |
@brendancarney I’d rather not have userland need to care. If you’re interested in speeding up that code path though, could you profile one of those super slow pastes to see exactly which line(s) of code are taking the majority of the time? I’d rather not do a ton of unrelated work if it’s mostly Immutable.js for instance. |
@ianstormtaylor I agree that most users shouldn't have to care, but I think the pros of changing the API a bit and exposing that outweigh the potential maintenance cons. The current way dirty paths are tracked isn't a problem 80% of the time. Unfortunately, there's a good possibility that switching up how it works negatively impacts other cases. @jtadmor may have more info, but I know he worked on a tree based strategy that helped a lot here, but slowed down the normal case. If we had an API that allowed more control over normalization, we could probably make some better decisions based on the path. For example, here's some pseudocode insert fragment.
I think that this ability is valuable internally, and to users of Slate. By exposing it, we allow people who hit performance issues here (which will probably happen in some cases no matter what we do) to figure out solutions to them, implement them in the real world, and see if they make sense to contribute back to core. I'd love to get your thoughts on an API like this: #2658, but if that's probably not going to work out, I'd really like the ability to disable normalization and tracking of dirty paths. Onto the profiling of this specific issue. There are a few layers of issues (probably more, but I haven't gone deeper:
I've tested #2559 and this PR (1 & 2), and combined, they bring some >60s pastes down to ~4-8s, which while slow, is almost passable. |
@brendancarney thanks for pointing out that other PR, I must have read it and forgot to check back when it settled. That one looks great to me, since is just a performance-related change that appears to help us. As for this one, I'm still not convinced yet. For small performance tweaks that don't alter the intent or simplicity of our logic, I'm always down. But this is different, because it changes the logic more drastically and reverts to using keys. Whereas, I think dropping Immutable.js will be very helpful, and is something we already have planned, so it doesn't increase our complexity or maintenance burden at all. I'll personally need more profiling data before I can make a decision about specific improvements or combinations of improvements, if they aren't straightforward ones purely for performance. |
@ianstormtaylor I totally understand. Any thoughts on allowing a way to disable normalization? I'd even be fine with an undocumented, not officially supported way for now just so we don't have to run our own fork. (To be clear, I think it would be a good addition to the official API somehow, but I understand if that requires more thought) |
Hi y'all, Seems like I missed this conversation. @ianstormtaylor As brendan pointed out, I did experiment with a tree-based approach to storing dirty paths. I had hoped it would provide fairly dramatic improvements, and it did perform much better than the current approach to normalization when the tree needs a lot of fixes and on large inserts. Conceptually, I also really like the idea. But there were several downsides:
|
I faced with the same problem. Will this changes be merged in the near future ? |
@ianstormtaylor |
I don't see this PR being merged unfortunately. It's reliant on keys when keys are being removed. I do have an alternative solution in #2658. This issue potentially simplifies the API and gives you more control over normalization. Example:
|
How will that influence the work of editor in general if we turn off the normalization with this new feature? Here is example of using nested structure. If to uncomment line 9-19, you most probably notice increase in scripting time. |
@brendancarney Will be the way to turn this normalization off with the solution you had suggested ? Probably there is an approach how to turn it off now (in general or for some blocks), but I don't know it, please point if I missed that in the documentation or examples. |
As of #3093 (which was just merged), I believe this issue is no longer applicable, because a lot has changed. I'm going through and closing out any potential issues that are not out of date with the overhaul. Thanks for understanding. |
improve calculate dirty performance
yarn test
.yarn lint
. (Fix errors withyarn prettier
.)yarn watch
.)Does this fix any issues or need any specific reviewers?
Fixes: #2414