De-duplicate images #285
Comments
Did some quick benchmarking today with our current approach. It looks promising; we should definitely do this and explore more options without killing ES.
Benchmark results
Scenario
In all the results below, both the todos and the elastic homepage journeys are run.
Todos journeys - All tests under our example/todos directory
Elastic inline journey - Go to the homepage and hover over products
Results
Index size based on the three tests
Summary
Great work, amazing to see a 3X improvement on the elastic.co website! I can only imagine that the savings increase as more runs are done. Can you say more about the pressure on ES? One simple optimization we have not yet done is to use the
It looks like de-duping on the complete buffer fared very poorly on the elastic.co website. That doesn't seem to be a viable approach if it depends on customers having very simple sites, like the todos journey, that don't vary over time.
It makes sense to stay with the blocks because we have additional avenues for optimization as well. For instance, we could do visual diffs between images to further compress individual blocks. Think I-frames in video processing. So long as the ImageRef objects are essentially lists of compositing operations, that gives us a lot of room to optimize things on the agent side.
I'll add that there are a lot of avenues to pursue in terms of optimization here, but I think the next step is to get a solid implementation down. We can iterate in the future and improve things.
++ to your points. The complete-buffer run was just for educational purposes, to see how well it performs. I don't expect users to have the exact same screenshot for each and every run.
When the benchmark was run, the ES container was killed because the ES process became unhealthy, and queries took upwards of 20 seconds even for a small match query. I believe it's due to constantly changing the id of the underlying document; maybe it's worth checking again with the
💯 Let's iterate on our current approach and also figure out the optimal block size with more benchmarks.
Ran the benchmarks with the
I have not encountered any ES pressure this time. Still unsure whether the previous default indexing strategy was the cause.
Results
Todos journey 10 runs - 481 kb - 100 kb less than the previous run.
Tried changing the block size to 16, which results in 256 blocks per image (1280*720); the results were not that great (Todos journey 10 runs - 1.2 mb). Let's stick to 8 as per the PR, which results in 64 blocks per image.
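For reference, the block counts quoted above follow from treating the "block size" as the number of blocks per side of the grid. A minimal sketch of that arithmetic, assuming the image is split evenly along each axis (the `block_grid` helper is hypothetical and not taken from the PR):

```python
def block_grid(width: int, height: int, blocks_per_side: int) -> tuple[int, int, int]:
    """Return (block count, block width, block height) for an even grid split.

    Assumes the image divides evenly; the PR may handle remainders differently.
    """
    count = blocks_per_side ** 2
    block_w = width // blocks_per_side
    block_h = height // blocks_per_side
    return count, block_w, block_h


# 8 per side: 64 blocks of 160x90 pixels for a 1280x720 screenshot
print(block_grid(1280, 720, 8))
# 16 per side: 256 blocks of 80x45 pixels
print(block_grid(1280, 720, 16))
```

Smaller blocks mean more documents per screenshot, so the per-document overhead is one plausible reason the 16-per-side run came out larger, though the thread does not confirm the cause.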
Currently screenshots take up too much storage space in synthetics. While there are many approaches to improving this, this issue covers our initial approach, which is de-duplication of images, or rather of sections of images that repeat.
This is accomplished via a content-addressable storage scheme: we take each image, slice it up into n parts, hash each part, and use that hash as an Elasticsearch document ID. We can then describe each image as a series of image processing operations compositing these parts into a final canvas, on either the client or the server side.
See this PR: #282
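The scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in the PR: the image is a raw RGB byte buffer, SHA-1 is an arbitrary choice of hash, a plain dict stands in for the Elasticsearch index, and `BlockRef`, `store_image` and `compose_image` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterator, List, NamedTuple, Tuple


class BlockRef(NamedTuple):
    """One compositing operation: draw the block with this hash at (x, y)."""
    digest: str  # content hash, standing in for the document ID
    x: int
    y: int
    width: int
    height: int


def split_into_blocks(pixels: bytes, width: int, height: int, grid: int,
                      bpp: int = 3) -> Iterator[Tuple[int, int, int, int, bytes]]:
    """Yield (x, y, w, h, data) for each cell of a grid x grid partition."""
    block_w, block_h = width // grid, height // grid
    for row in range(grid):
        for col in range(grid):
            x, y = col * block_w, row * block_h
            # The last column and row absorb any remainder pixels.
            w = width - x if col == grid - 1 else block_w
            h = height - y if row == grid - 1 else block_h
            rows = [
                pixels[((y + dy) * width + x) * bpp:((y + dy) * width + x + w) * bpp]
                for dy in range(h)
            ]
            yield x, y, w, h, b"".join(rows)


def store_image(pixels: bytes, width: int, height: int, grid: int,
                store: Dict[str, bytes], bpp: int = 3) -> List[BlockRef]:
    """Store each block under its hash and return the image as a list of refs."""
    refs = []
    for x, y, w, h, data in split_into_blocks(pixels, width, height, grid, bpp):
        digest = hashlib.sha1(data).hexdigest()
        store.setdefault(digest, data)  # identical blocks are stored only once
        refs.append(BlockRef(digest, x, y, w, h))
    return refs


def compose_image(refs: List[BlockRef], width: int, height: int,
                  store: Dict[str, bytes], bpp: int = 3) -> bytes:
    """Rebuild the full image by compositing stored blocks onto a blank canvas."""
    canvas = bytearray(width * height * bpp)
    for ref in refs:
        data = store[ref.digest]
        row_len = ref.width * bpp
        for dy in range(ref.height):
            start = ((ref.y + dy) * width + ref.x) * bpp
            canvas[start:start + row_len] = data[dy * row_len:(dy + 1) * row_len]
    return bytes(canvas)
```

With this layout, a second screenshot that differs from the first in only one block adds a single new entry to the store, and its ref list points at the existing entries for everything else.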