Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downgrade hashing from SHA-3 to SHA-1 and eliminate unnecessary unnecessary digest #5763

Closed
hubertp opened this issue Feb 24, 2023 · 3 comments · Fixed by #5791
Closed

Downgrade hashing from SHA-3 to SHA-1 and eliminate unnecessary unnecessary digest #5763

hubertp opened this issue Feb 24, 2023 · 3 comments · Fixed by #5791
Assignees
Labels
-compiler p-high Should be completed in the next sprint

Comments

@hubertp
Copy link
Collaborator

hubertp commented Feb 24, 2023

#5700 allows for a quick switch to different hashing algorithms for our CI.
Quick profiling experiments revealed that we spend over a second while computing hashing digest, just to verify that caches are up-to-date.

There are few problems with that:

  • While less susceptible to collision, SHA3 is more expensive to compute compared to SHA-1 or MD5
  • We actually don't have to compute digest of de-serialized objects, only the sources
  • We could potentially switch to timestamps to avoid computing digest all together
@hubertp hubertp added p-high Should be completed in the next sprint -compiler labels Feb 24, 2023
@hubertp hubertp self-assigned this Feb 24, 2023
@hubertp hubertp moved this from ❓New to 📤 Backlog in Issues Board Feb 24, 2023
@hubertp hubertp added this to the Beta Release milestone Feb 24, 2023
@enso-bot
Copy link

enso-bot bot commented Mar 2, 2023

Hubert Plociniczak reports a new STANDUP for yesterday (2023-03-01):

Progress: Started introducing long planned optimizations like downgrading hashing algorithm to SHA-1. Jackon initialization was taking a long time so it is done only once now. Conversions between Java and Scala collections costed us precious ms so tried to eliminate them as much as possible. Got rid of the annoying warning when generating caches. It should be finished by 2023-03-02.

Next Day: Next day I will be working on the #5620 task. In-depth analysis of the visualization problem, investigate further optimizations.

@sylwiabr sylwiabr moved this from 📤 Backlog to 🔧 Implementation in Issues Board Mar 2, 2023
@jdunkerley jdunkerley moved this from 🔧 Implementation to 👁️ Code review in Issues Board Mar 7, 2023
@mergify mergify bot closed this as completed in #5791 Mar 9, 2023
mergify bot pushed a commit that referenced this issue Mar 9, 2023
This change downgrades hashing algorithm used in caching IR and library bindings to SHA-1. It is sufficient and significantly faster for the purpose of simple checksum we use it for.

Additionally, don't calculate the digest for serialized bytes - if we get the expected object type then we are confident about the integrity.

Don't initialize Jackson's ObjectMapper for every metadata serialization/de-serialization. Initialization is very costly.

Avoid unnecessary conversions between Scala and Java. Those back-and-forth `asScala` and `asJava` are pretty expensive.

Finally fix an SBT warning when generating library cache.

Closes #5763

# Important Notes
The change cuts roughly 0.8-1s from the overall startup.
This change will certainly lead to invalidation of existing caches. It is advised to simply start with a clean slate.
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Mar 9, 2023
@enso-bot
Copy link

enso-bot bot commented Mar 9, 2023

Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2023-03-08):

Summary: There is 6 days delay in implementation of the Downgrade hashing from SHA-3 to SHA-1 and eliminate unnecessary unnecessary digest (#5763) task.
It will cause 1 days delay for the delivery of this weekly plan.

Delay Cause: Trying to do more performance testing and left the PR in semi-ready state before holiday.

@enso-bot
Copy link

enso-bot bot commented Mar 9, 2023

Hubert Plociniczak reports a new STANDUP for yesterday (2023-03-08):

Progress: Updating the PR to latest develop. More analysis of the impact on the performance. Verifying the fix for #5620. Looking for more potential improvements. Catching up on work done by others, lots of PR reviews. It should be finished by 2023-03-08.

Next Day: Next day I will be working on the #5763 task. Start working on reconnection design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-compiler p-high Should be completed in the next sprint
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants