Efficient storage of CSS #1923
I would like to know the following.
Please let me know if I made a mistake, for example if I should have opened an issue first.
Hi! Sorry for the delay, we’re currently very busy working on other projects. Thanks a lot for your pull request, we’ll take the time to review it and get back to you as soon as possible 💜.
Hi! Thanks a lot for the pull request. I’ve finally taken the time to review your code, and it’s a pretty good idea. You’ll probably love to read this article that takes this idea even a bit further! There are many problems that prevent your idea from being simple to implement. Here are the 2 principal ones I know about:
So it’s a bit too early to implement your idea in a clean way. Working on the first item is possible and will at least help to get a simpler cache structure. The other one is more complicated.
Thank you for your review.
However, I think it is worth first doing just the part that shares the saving and retrieval of ComputedStyle, without considering sharing for each generated box (e.g., efficient sharing of AnonymousStyle). Specifically, I'm thinking of splitting the There is a possibility that the parent style call may require some changes in code other than If the attempt does not result in clean code, I will give it some time until I can come up with a good solution.
Sorry, I did not read the article. I don't know if I can do it, but I will try the following first.
I had first thought about sharing by structure, but since I am not familiar with CSS, I was looking for a way to build on the current code while changing it as little as possible.
👍
That’s just an article I find very inspiring on the topic, but there’s no need to follow it for your code! Maybe one day, but for now we’re far from what Firefox does. Thanks a lot!
Thanks for the continued support. I have also tried modifying the ComputedStyle, but I am getting errors in the following three cases. I would appreciate if someone could give me some advice. The simple calculation of each code seems to be working, but I think it is wrong that the specified px is automatically calculated in the
It looks like it could be done nicely with fewer modifications than I thought. I couldn't decide whether AnonymousStyle should also be shared, so AnonymousStyle is mostly intact. Even with the current code, memory usage is reduced significantly, but there are many issues left to address. (It seems to be about 1/4 the size in my test environment.)
I'm sorry. The test case we compared was incorrect. The slow deepcopy is causing the execution time to increase significantly. I will try to improve the code.
Table: 2500 rows
Table: 2500 rows
As far as the breakdown of memory usage is concerned, as confirmed in the _render method of document.py, the larger the table, the closer memory usage gets to 1/3 of the current amount. If the memory used by the following code can be reduced, it may be possible to make it even smaller:

    rendering = cls(
        [Page(page_box) for page_box in page_boxes],
        DocumentMetadata(**get_html_metadata(html)),
        html.url_fetcher, font_config)

A more efficient logic would be desirable if possible, as some people may not tolerate the slowness. I can see that the following is occurring in the
I also tried sharing AnonymousStyle. It still did not compress better than the others.
Table: 2500 rows
Significant memory compression is expected for large tables.
Table: 13000 rows
For ComputedStyle with the same cascading style index, would it reduce processing if I could mark it as computed, or would that not be very effective since it would have no effect on AnonymousStyles? I'll try to understand the following process mentioned in this article, which may be necessary, by following the code that is running.
The test is now passing, but I'm not very confident in the implementation.
Thanks for your hard work on this. I like the overall goal of this pull request, but I have two main problems with it:
The global idea is, for each tag, to list all the matching selectors. If the list of selectors is exactly the same, then we can use the same computed values (and avoid a lot of copies). To be honest, I’m not sure that it can really work this way anymore, as more and more computed values depend on the content of the tag (for example when using
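The global idea described above can be sketched roughly as follows. This is only a toy illustration, not WeasyPrint code: `Element`, `matches`, `compute` and `share_styles` are hypothetical names, the matcher only understands `tag` and `.class` selectors, and the sketch ignores inheritance and values that depend on the content of the tag, which is where the approach gets harder.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a parsed HTML element."""
    tag: str
    classes: frozenset = frozenset()


def matches(element, selector):
    """Toy matcher: supports only 'tag' and '.class' selectors."""
    if selector.startswith('.'):
        return selector[1:] in element.classes
    return selector == element.tag


def compute(declaration_blocks):
    """Toy 'computation': later declarations override earlier ones."""
    merged = {}
    for block in declaration_blocks:
        merged.update(block)
    return MappingProxyType(merged)  # read-only, so it is safe to share


def share_styles(elements, rules):
    """Return one style per element, reusing it when the matched rules are identical."""
    cache = {}  # tuple of matched rule indexes -> shared style
    styles = []
    for element in elements:
        key = tuple(
            index for index, (selector, _) in enumerate(rules)
            if matches(element, selector))
        if key not in cache:
            cache[key] = compute(rules[index][1] for index in key)
        styles.append(cache[key])
    return styles, cache


rules = [
    ('td', {'padding': '2px'}),
    ('.odd', {'background': '#eee'}),
]
cells = [Element('td', frozenset({'odd'})) if row % 2 else Element('td')
         for row in range(2500)]
styles, cache = share_styles(cells, rules)
```

Here the 2500 cells end up pointing at only two style objects, one per distinct list of matched rules. A real implementation would probably need a richer cache key (for instance one that also accounts for the parent style), which is an assumption on my part and not something settled in this thread.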
Thank you! Sorry, I was not able to respond to your other issue. I will try to organize it so that it can be corrected in I will try to make some of the processing done in
The If we can do what is specified in the |
Closing for now, please add a comment if you want to share new commits on this topic.
Large tables require huge memory to store CSS.
We believe this problem can be solved by streamlining the memory storage of Cascaded styles.
The fix is an attempt to compress what is loaded the first time, but I think it contains a problem that requires deepcopy and is not very fast.
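One generic way to avoid a deepcopy per element is to keep the shared data read-only and layer per-element changes on top of it. The sketch below uses only the standard library (`collections.ChainMap` and `types.MappingProxyType`); `shared`, `derive`, `row_a` and `row_b` are hypothetical names, and this is not necessarily how the pull request handles it.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical shared style, built once; the proxy makes it read-only.
shared = MappingProxyType({'color': 'black', 'font_size': '12px'})


def derive(base, **overrides):
    """Return a cheap view: own overrides first, then the shared base."""
    # Writes to a ChainMap only ever touch its first mapping,
    # so the shared base is never modified and never copied.
    return ChainMap(overrides, base)


row_a = derive(shared, color='red')
row_b = derive(shared)
row_b['margin'] = '0'  # stored in row_b's own dict only
```

This only helps when the stored values are themselves immutable (strings, numbers, tuples); mutable values inside the shared mapping would still be visible to every view.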
The same kind of fix for ComputedStyle compressed the relevant processing part from 178 MB to about 13 MB for a table with about 3000 rows. Because ComputedStyle shares cascaded styles, it must be saved once without the element, parent style, and root style. What I tried at this point was to create and use a separate class for relationships. I think a complete modification would also require sharing Box and Page styles. (I'm also wondering if it would be better to change the parenting scheme for the class itself.)
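The split between shared values and per-element relationships could look something like the sketch below. `SharedStyle` and `StyleRef` are hypothetical classes written for illustration; they are not WeasyPrint's ComputedStyle, and the handling of `inherit` is deliberately simplistic.

```python
class SharedStyle:
    """Hypothetical shared part: property values only, no element references."""
    __slots__ = ('values',)

    def __init__(self, values):
        self.values = dict(values)


class StyleRef:
    """Hypothetical per-element part: relationships plus a pointer to shared values."""
    __slots__ = ('element', 'parent', 'shared')

    def __init__(self, element, parent, shared):
        self.element = element
        self.parent = parent  # parent StyleRef, or None for the root
        self.shared = shared

    def __getitem__(self, name):
        value = self.shared.values.get(name)
        # Resolve inheritance at lookup time through the relationship object,
        # so the shared values never need to know about a specific parent.
        if value == 'inherit' and self.parent is not None:
            return self.parent[name]
        return value


table_style = StyleRef('table', None, SharedStyle({'color': 'black'}))
cell_values = SharedStyle({'color': 'inherit', 'padding': '2px'})
cell_styles = [StyleRef(f'td{i}', table_style, cell_values) for i in range(3000)]
```

Each cell then costs one small `StyleRef` with three slots, while the property values exist once for all 3000 cells.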
I need to handle a table with about 20,000 rows. Currently, I estimate that 4-6 GB will be used. I think I can get it down to under 500 MB once it is fully modified.
Thanks for reading.