size of serialized DOM #151
Reducing the size of the serialized DOM would be great. Did you want to create a PR? |
+1 on this proposal. Besides, what we can do is: |
@bingjie3216 There's an option to keep absolute CSS paths within the HTML. Even with this option on, the DOM size is sizeable. |
@eoghanmurray @IMFIL @bingjie3216 Thanks for the feedback! I believe there is huge potential to reduce the size of the recorded events (I always see around a 90% size reduction when I gzip the events). So the work to reduce size may include the following parts:
Some pack/unpack strategies I know of include:
|
Sorry for the late reply. I think this issue is the most important one at the current stage, and I would like to provide a solution in the next major release. Based on the ideas I illustrated above, I have written some POC code in this repo. Currently, I have implemented an analysis framework and several packers:
Right now the msgpack packer is not working as intended and I'm still checking my implementation. The other two show good results when tested on two real-world event logs, which I'm using to benchmark the packers:
[Benchmark table: `simple` and `pako` packers on event logs e1 and e2]
|
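A minimal sketch of what such an analysis harness could look like, assuming every packer is a plain function from an array of events to a string or a Buffer. The `benchmark` helper, the packer names, and the synthetic events are illustrative assumptions rather than the POC code, and Node's built-in `zlib` stands in for pako.

```javascript
const zlib = require('zlib');

// Hypothetical packers under test: each maps an array of events to a string or a Buffer.
const packers = {
  // Baseline: plain JSON with no packing.
  json: (events) => JSON.stringify(events),
  // Zlib-style compression; Node's built-in zlib stands in for pako here.
  deflate: (events) => zlib.deflateSync(JSON.stringify(events)),
};

// Report each packer's output size relative to the unpacked JSON size.
function benchmark(events, packersByName) {
  const baseline = Buffer.byteLength(JSON.stringify(events));
  return Object.entries(packersByName).map(([name, packFn]) => {
    const packed = packFn(events);
    const bytes = typeof packed === 'string' ? Buffer.byteLength(packed) : packed.length;
    return { name, bytes, ratio: bytes / baseline };
  });
}

// A synthetic, repetitive event log standing in for a real recording.
const sampleEvents = Array.from({ length: 200 }, (_, i) => ({
  type: 3,
  timestamp: 1580000000000 + i * 16,
  data: { source: 1, positions: [{ x: i % 50, y: 100, id: 12, timeOffset: 0 }] },
}));

console.table(benchmark(sampleEvents, packers));
```

Real recordings would replace `sampleEvents`; new packers only need to be added to the `packers` object.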
Pako seems like quite a big dependency (63% of the size of rrweb itself, if npm is to be believed). Are there any lighter-weight alternatives? I bring it up because the bigger this library becomes, the less attractive it is to bundle for end users. |
@ChuckJonas Thanks for the feedback. I believe there are several important aspects to consider when designing the packer plugin.

Efficiency

Judging by the experiments mentioned above and others, Zlib-like compression algorithms seem to be the most efficient at processing rrweb's events. Data compression is not the only way to reduce the size of events. Users can still do things like:
But data compression is the simplest and most versatile way to do this. Users can add one line of code and see up to a 90% reduction in event size. Although there are trade-offs between efficiency and other aspects, efficiency is still the most important one, since it affects data at the MiB to TiB level.

Browser runtime size

One of rrweb's advantages is its minimal runtime size (5.9 KiB gzipped for the recorder). So if we decide to add a packer to rrweb, we will:
Besides that, users can still choose to run the packing process on their server instead of in end users' browsers (think of it as the reverse of edge computing).

```javascript
import { pack } from 'rrweb'

// `server` is an Express-style app and `saveDataToDB` stands in for your own storage code
server.post('/events', (req, res) => {
  const packedData = pack(req.body)
  saveDataToDB(packedData)
  res.send('Ok')
})
```

The trade-off here is that end users will not load the packer plugin bundle, but they will still transfer a relatively large amount of data, and your server will become a centralized packing factory.

Simplicity

I prefer to provide an easy-to-use API for rrweb users, which means they can decide whether to pack with a simple boolean flag. A table of several packer plugin choices along with their trade-offs in bundle size, efficiency, CPU cost, etc. is not something I would like to ship in rrweb. |
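A minimal sketch of the server-side variant, assuming a Node backend that receives an array of events per request. `packBatch` and `unpackBatch` are hypothetical helpers, Node's built-in `zlib` gzip stands in for whatever packer rrweb ends up shipping, and the HTTP framework and storage layer are left out so the sketch stays self-contained.

```javascript
const zlib = require('zlib');

// Compress one uploaded batch of events before it is stored.
function packBatch(events) {
  return zlib.gzipSync(JSON.stringify(events));
}

// Restore the original events when a recording is loaded for replay.
function unpackBatch(buffer) {
  return JSON.parse(zlib.gunzipSync(buffer).toString('utf8'));
}

// What a POST /events handler might do with the request body.
const batch = [
  { type: 4, timestamp: 1, data: { href: 'https://example.com', width: 1280, height: 720 } },
  { type: 3, timestamp: 2, data: { source: 1, positions: [{ x: 10, y: 20, id: 5, timeOffset: 0 }] } },
];
const stored = packBatch(batch);
console.log(`stored ${stored.length} bytes`);
```

The browser still uploads uncompressed JSON in this setup, which is the trade-off described above: no packer in the client bundle, but more bytes on the wire.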
I appreciate the detailed response, thank you; the planned plugin nature of it alleviates any concerns. |
I like the idea of packing server-side to offload the work, but I also like the idea of doing it on the client to speed up the transfer of events and, ideally, slim down network traffic. Trade-offs, for sure. I'll probably implement it on the server personally, but I would love documentation around this concept. |
I have a question.
Compared to the original unpacked object sent to the backend, the compression ratio is about 0.1 to 0.5. |
Just a reminder that my original proposal was about being a bit more careful/efficient with the JSON format itself. Reducing the repetitive aspects of the original JSON would provide advantages in transmission as well as preempt much of the need for zipping, either client-side or server-side.
Here's a quick analysis of a sample JSON DOM structure showing repetitive keys:
And here are the empty nodes, e.g. (here's the code I executed at the console to come up with these figures:
) So by, e.g., abbreviating This could be done in a backwards-compatible way so that it's still possible to play back non-abbreviated content. |
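A comparable key-frequency analysis can be sketched as a recursive walk that counts how often each key appears in a serialized snapshot. `countKeys` and the tiny sample tree are illustrative assumptions, not the original console code; the tree only follows the general shape of rrweb's serialized nodes.

```javascript
// Recursively count how often each object key occurs in a JSON-like value.
function countKeys(value, counts = new Map()) {
  if (Array.isArray(value)) {
    for (const item of value) countKeys(item, counts);
  } else if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      counts.set(key, (counts.get(key) || 0) + 1);
      countKeys(child, counts);
    }
  }
  return counts;
}

// A tiny serialized DOM fragment: a list with two items, each holding one text node.
const snapshot = {
  type: 2, tagName: 'ul', attributes: {}, id: 1,
  childNodes: [
    { type: 2, tagName: 'li', attributes: {}, childNodes: [{ type: 3, textContent: 'a', id: 3 }], id: 2 },
    { type: 2, tagName: 'li', attributes: {}, childNodes: [{ type: 3, textContent: 'b', id: 5 }], id: 4 },
  ],
};

// Sort by frequency so the most repeated keys come first.
const ranked = [...countKeys(snapshot)].sort((a, b) => b[1] - a[1]);
console.log(ranked);
```

Even in this small fragment, `type` and `id` appear on every node, and every `attributes` object is empty.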
Yes, shortening the JSON keys will help in some cases. But we are also seeing size issues in cases like:
Consider a situation like this:
rrweb will collect a lot of data in the process, which can be greatly compressed by gzip (because the data are quite similar, e.g., every row of the table). So I think introducing pako is a more general solution. But I'm very open regarding the packer plugin system; anyone can build a compatible ad-hoc packer plugin based on its interface. |
Cool. For my use case I'll be sending events over a WebSocket connection as they happen, so I was only considering compressing single events at a time, in particular the event that contains the initial DOM tree. Adding pako or similar wouldn't be an option, as for my project the size of the .js deliverable is a big factor. |
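To illustrate the point about repetitive data such as table rows, here is a sketch comparing key shortening against gzip on a synthetic table. The node shape, the `shortenKeys` helper, and its key map are assumptions made for the demo, and Node's `zlib` stands in for pako.

```javascript
const zlib = require('zlib');

// Hypothetical key map used only for this comparison.
const SHORT = new Map([
  ['type', 't'], ['tagName', 'n'], ['attributes', 'a'],
  ['childNodes', 'c'], ['textContent', 'x'], ['id', 'i'],
]);

// Rename keys recursively, leaving values untouched.
function shortenKeys(value) {
  if (Array.isArray(value)) return value.map(shortenKeys);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [SHORT.has(k) ? SHORT.get(k) : k, shortenKeys(v)]),
    );
  }
  return value;
}

// A table with many near-identical rows, as in the scenario above.
let nextId = 1;
const rows = Array.from({ length: 300 }, (_, r) => ({
  type: 2, tagName: 'tr', attributes: { class: 'row' }, id: nextId++,
  childNodes: [0, 1, 2].map((c) => ({
    type: 2, tagName: 'td', attributes: {}, id: nextId++,
    childNodes: [{ type: 3, textContent: `cell ${r}-${c}`, id: nextId++ }],
  })),
}));

const raw = Buffer.byteLength(JSON.stringify(rows));
const shortened = Buffer.byteLength(JSON.stringify(shortenKeys(rows)));
const gzipped = zlib.gzipSync(JSON.stringify(rows)).length;
console.log({ raw, shortened, gzipped });
```

Key shortening only removes the repeated key names, while gzip also exploits the repeated values and structure across rows.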
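The difference between compressing events one at a time (as when streaming over a WebSocket) and compressing a batch can be sketched as follows; the synthetic events and the use of Node's `zlib.deflateSync` are assumptions for illustration.

```javascript
const zlib = require('zlib');

// Synthetic incremental events with a lot of shared structure.
const events = Array.from({ length: 100 }, (_, i) => ({
  type: 3,
  timestamp: 1580000000000 + i * 50,
  data: { source: 1, positions: [{ x: 100 + i, y: 200, id: 42, timeOffset: -i }] },
}));

// Streaming: each event is compressed on its own, so redundancy between events is not shared.
const perEventBytes = events
  .map((e) => zlib.deflateSync(JSON.stringify(e)).length)
  .reduce((sum, n) => sum + n, 0);

// Batching: one compression pass can reuse repeated keys and values across events.
const batchBytes = zlib.deflateSync(JSON.stringify(events)).length;

console.log({ perEventBytes, batchBytes });
```

A middle ground for streaming setups is to buffer a short window of events and send them as one compressed message, trading a little latency for size.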
For anyone who is interested in this issue, the packer plugin API has finally been stabilized. The proposed API looks like this:

```javascript
/**
 * Now you can import the official pack and unpack functions from the rrweb package.
 *
 * The pack and unpack code is implemented in separate modules, so bundlers
 * can tree-shake them if you do not import them. This means there should be
 * no bundle size difference when you are not using the packer feature.
 */
import { record, pack, unpack, Replayer } from 'rrweb';

/**
 * When recording, you just need to pass pack as the packFn property to the
 * record function.
 */
record({
  emit(event) {
    // event is the result returned by the pack function
  },
  packFn: pack,
});

/**
 * When replaying, you just need to pass unpack as the unpackFn property to
 * the replayer.
 *
 * The official unpack function can process both non-packed and packed events.
 * This is strongly recommended if you are going to implement your own packer.
 */
// `events` is the array of events collected while recording
const player = new Replayer(events, {
  root: document.body,
  unpackFn: unpack,
});
player.play();

/**
 * Since these are only the 'official' functions, you can also implement your own
 * pack/unpack functions. For example, you can pack the data by replacing the
 * 'type' property name with a shorter one like 't'.
 *
 * Your unpack function must turn the event back into a valid rrweb event, and
 * it should also accept events that were never packed.
 */
function myPack(event) {
  // copy the event instead of mutating the recorder's object
  const { type, ...rest } = event;
  return { ...rest, t: type };
}

function myUnpack(event) {
  // events recorded without myPack have no 't' key; pass them through unchanged
  if (!('t' in event)) return event;
  const { t, ...rest } = event;
  return { ...rest, type: t };
}
```

I plan to merge the packer PR tomorrow; any feedback is welcome. |
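Extending the `myPack`/`myUnpack` idea, here is a sketch that abbreviates several top-level event keys through one dictionary and marks packed events so that unpacking can pass plain events through unchanged. The key map and the `_p` marker field are hypothetical choices, not part of rrweb, and the sketch assumes events never use the short names as real keys.

```javascript
// Hypothetical abbreviations for top-level rrweb event keys.
const KEY_MAP = new Map([['type', 't'], ['data', 'd'], ['timestamp', 's']]);
const REVERSE_MAP = new Map([...KEY_MAP].map(([long, short]) => [short, long]));
// Hypothetical marker field that tells the unpacker an event was packed.
const PACKED_MARK = '_p';

function dictPack(event) {
  const packed = { [PACKED_MARK]: 1 };
  for (const [key, value] of Object.entries(event)) {
    packed[KEY_MAP.has(key) ? KEY_MAP.get(key) : key] = value;
  }
  return packed;
}

function dictUnpack(event) {
  // Events recorded without dictPack carry no marker and are returned unchanged.
  if (event === null || typeof event !== 'object' || event[PACKED_MARK] !== 1) return event;
  const unpacked = {};
  for (const [key, value] of Object.entries(event)) {
    if (key === PACKED_MARK) continue;
    unpacked[REVERSE_MAP.has(key) ? REVERSE_MAP.get(key) : key] = value;
  }
  return unpacked;
}

const original = { type: 3, data: { source: 1 }, timestamp: 1580000000000 };
console.log(dictPack(original));
```

With the API above, these would be passed as `packFn: dictPack` when recording and `unpackFn: dictUnpack` when replaying.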
@Yuyz0112 I think this API proposal for the pack plugin looks great. Have you committed it somewhere I can test it out? |
@MaheshCasiraghi You can check this branch: https://github.com/rrweb-io/rrweb/tree/packer |
Is there any documentation on how to disable pako? It's being bundled now in my |
@eoghanmurray Are you using a bundler like webpack? |
We are concatenating in |
I would have thought that if I wanted the packing capabilities I'd use the new |
Since you are using dist/rrweb.min.js, do you need both the record and replay features at the same time in your app? |
No, sorry to clarify, we're concatenating |
I forgot to mention that including the current |
I have previously used pako for data compression, transferring the data to a Node backend and decompressing it there with pako as well. The compression and decompression functions I used are pako.deflate/pako.inflate. |
I think there are two things that could be explained here.

The bundle size

rrweb provides three kinds of module-system bundle files as output: With the So if people are using these bundlers with Sample code looks like this:

```javascript
// If you do not import the pack and unpack functions, they will not be bundled into your final JS file.
import { record, Replayer } from 'rrweb'

const events = []
record({
  emit(event) {
    events.push(event)
  },
})

// later, once events have been collected
new Replayer(events)
```

How about users not using ES modules?

If users are not using ES modules, especially users who just load rrweb with a script tag (which is also my favorite way), they cannot benefit from tree-shaking. So I provide some other bundle files for different use cases. For example, there is a bundle file called But as features grow, there are too many bundling combinations. So currently I provide these combinations:
If you want to bundle rrweb into website_1 for collecting events and use website_2 for replaying, you can load if you also want to pack events when recording, you can load the additional script So when I saw @eoghanmurray's question, I asked: do you need both the record and replayer code on the same page, and do you care about the bundle size of that page? If you do not really need the replayer code on the same page, things are easy: you can just use rrweb-record.js, which is small. If you do, I think there are two options:

Custom pack function

@shmilyoo First, I suggest reading the comments above about bundle size, so we are on the same page that it is possible to avoid increasing the bundle size when you are not loading the official packer plugin. Furthermore, rrweb's plugin system provides the flexibility to use your own pack/unpack functions like this:
Does that solve your problems? |
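A sketch of what such a pair of functions could look like, in the spirit of the pako.deflate/pako.inflate setup mentioned above. Node's built-in `zlib` stands in for pako so the example runs without dependencies (in a browser, the pako calls and a browser-side base64 encoding would replace the Node APIs), and `deflatePack`/`deflateUnpack` are hypothetical names. Packed events become base64 strings, so the unpacker can recognise them by type and pass plain event objects through.

```javascript
const zlib = require('zlib');

// Pack: serialize the event, deflate it, and encode the bytes as a base64 string.
function deflatePack(event) {
  return zlib.deflateSync(JSON.stringify(event)).toString('base64');
}

// Unpack: accept both packed strings and plain (never packed) event objects.
function deflateUnpack(raw) {
  if (typeof raw !== 'string') return raw;
  return JSON.parse(zlib.inflateSync(Buffer.from(raw, 'base64')).toString('utf8'));
}

const event = { type: 2, timestamp: 1580000000000, data: { node: { type: 0, childNodes: [], id: 1 } } };
const packed = deflatePack(event);
console.log(typeof packed, deflateUnpack(packed).timestamp);
```

These would plug into the `packFn`/`unpackFn` options in the same way as the official functions.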
I'm not seeing the reference error. My demo: https://codepen.io/yuyz0112/pen/BaojoMd |
Ah sorry, my bad on that one; I did |
(I'd also note that the terminology |
Also, for someone using |
I have done something similar in the rollup config (https://github.com/rrweb-io/rrweb/blob/master/rollup.config.js#L30-L46). Now adding another bundle mode is much easier.
I don't think exposing pako to rrweb users is a great idea, because we may not stick with pako forever, and different usages of pako may cause problems that are hard to debug. So a simple wrapper around the pako interface may be more stable for rrweb users. |
Hmm, now that you are including pako in the core, I imagine that recordings will begin to be created with it.
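A sketch of such a wrapper, assuming each packed payload records which codec produced it so that recordings stay readable if the underlying library is swapped later. The codec registry, the `codec`/`payload` fields, and the codec names are hypothetical, and Node's `zlib` stands in for pako.

```javascript
const zlib = require('zlib');

// Hypothetical codec registry hidden behind the wrapper; callers never touch the library directly.
const CODECS = {
  'json-1': {
    encode: (text) => text,
    decode: (payload) => payload,
  },
  'deflate-1': {
    encode: (text) => zlib.deflateSync(text).toString('base64'),
    decode: (payload) => zlib.inflateSync(Buffer.from(payload, 'base64')).toString('utf8'),
  },
};
// Codec used for newly packed events; older payloads keep their own tag.
const CURRENT_CODEC = 'deflate-1';

function wrappedPack(event) {
  return { codec: CURRENT_CODEC, payload: CODECS[CURRENT_CODEC].encode(JSON.stringify(event)) };
}

function wrappedUnpack(packed) {
  const known = packed !== null && typeof packed === 'object'
    && Object.prototype.hasOwnProperty.call(CODECS, packed.codec);
  // Anything without a known codec tag is treated as a plain, unpacked event.
  if (!known) return packed;
  return JSON.parse(CODECS[packed.codec].decode(packed.payload));
}

console.log(wrappedPack({ type: 3, data: { source: 1 }, timestamp: 7 }));
```

Switching `CURRENT_CODEC` later only changes how new events are written; older payloads still carry their own tag and decode through the registry.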
If this is the goal, then the main |
I tried yesterday to create a |
There are two choices:
So the second one sounds better? |
@Yuyz0112 Using custom pack function is ok, absolutely. |
Yes, for me anyhow; but I'd also be happy with an ENV switch which could remove pako from the (also, now that I think of it, an ENV variable which could omit all recording functionality for a purely 'replayer' version would be super!) |
I've made a new PR for the bundling changes: #199. I think we can move further discussion there. |
…o#151) Not 100% sure, but I _think_ this is the cause of publish failing here: https://github.com/getsentry/rrweb/actions/runs/7277648321/job/19830279475
I'm seeing 10x the character size for the serialization of the initial DOM state (`EventType.FullSnapshot`) compared with a plain HTML representation of the same thing. Is minimizing the size of this on the agenda as a design goal?

I'm thinking that it could be reduced as follows:

- abbreviate `attributes` to `attrs`
- omit empty `childNodes`/`attributes` lists/objects (making them implicit)
- omit `type: 2` (`type: NodeType.Element`) and similar, as that can be inferred from the presence of `childNodes`
- only include the `isSVG`/`isStyle` boolean attributes if they are unusual (i.e. `true`)

Are there any strong reasons not to do any of the above?
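The proposal above can be sketched as a pair of functions over serialized nodes. The `attrs` key and the node shape follow the list above but are assumptions, not rrweb's actual snapshot format. One deliberate deviation: because empty `childNodes` are also omitted, this sketch recognises elements by their `tagName` rather than by `childNodes`.

```javascript
// rrweb's NodeType value for elements.
const ELEMENT = 2;

// Compact one serialized node (and its descendants) following the proposal.
function compactNode(node) {
  const isElement = node.type === ELEMENT;
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    // type: 2 is implied; expandNode recognises elements by their tagName.
    if (key === 'type' && isElement) continue;
    if (key === 'attributes' && isElement) {
      if (Object.keys(value).length > 0) out.attrs = value;
      continue;
    }
    if (key === 'childNodes') {
      // Only elements drop an empty list; other nodes keep childNodes as-is.
      if (!isElement || value.length > 0) out.childNodes = value.map(compactNode);
      continue;
    }
    // false is the default for these flags, so only true is written.
    if ((key === 'isSVG' || key === 'isStyle') && value !== true) continue;
    out[key] = value;
  }
  return out;
}

// Restore the verbose shape; also accepts nodes that were never compacted.
function expandNode(node) {
  const isElement = 'tagName' in node;
  const out = isElement ? { type: ELEMENT } : {};
  for (const [key, value] of Object.entries(node)) {
    if (key === 'attrs' && isElement) out.attributes = value;
    else if (key === 'childNodes') out.childNodes = value.map(expandNode);
    else out[key] = value;
  }
  if (isElement) {
    if (!out.attributes) out.attributes = {};
    if (!out.childNodes) out.childNodes = [];
  }
  return out;
}

const tree = {
  type: 2, tagName: 'div', attributes: { class: 'box' }, id: 1,
  childNodes: [
    { type: 2, tagName: 'br', attributes: {}, childNodes: [], id: 2 },
    { type: 3, textContent: 'hello', id: 3 },
  ],
};
console.log(JSON.stringify(compactNode(tree)));
```

Because `expandNode` also accepts verbose nodes, old recordings would keep replaying; the only information not restored is an explicit `false` on `isSVG`/`isStyle`, which matches the default.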