feat(server): calculate sha1 checksum #525

Merged (16 commits) on Aug 31, 2022

@panoti (Contributor) commented Aug 23, 2022

  • Calculate the SHA-1 checksum from the upload stream
  • Store the SHA-1 checksum in the asset entity
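The "calculate by stream" idea can be sketched as a pass-through transform that feeds each chunk to a SHA-1 hash while forwarding it unchanged to the destination, so the file is written and hashed in a single pass. This is a minimal sketch assuming only Node's built-in crypto and stream modules; copyAndHashSha1 is a hypothetical name, not the code merged in this PR.

```typescript
import { createHash } from 'node:crypto';
import { Readable, Transform, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

/**
 * Hypothetical helper: copies `source` into `destination` and resolves with the
 * SHA-1 hex digest of every byte that passed through, computed during the copy.
 */
export async function copyAndHashSha1(source: Readable, destination: Writable): Promise<string> {
  const hash = createHash('sha1');

  // Pass-through transform: update the hash with each chunk, then forward it unchanged.
  const hashTap = new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      hash.update(chunk);
      callback(null, chunk);
    },
  });

  // pipeline() propagates errors from any stage and waits for the destination to finish.
  await pipeline(source, hashTap, destination);
  return hash.digest('hex');
}
```

In an upload handler the destination would typically be `fs.createWriteStream(targetPath)`, and the returned hex digest is the value that would be stored on the asset entity.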

@fyfrey (Contributor) left a comment

I really like the approach of writing the file while computing the hash on the fly.

Review comments (outdated, resolved) on:
  • server/apps/immich/src/utils/disk-storage.ts
  • server/libs/database/src/entities/asset.entity.ts
  • server/libs/database/src/entities/exif.entity.ts
@alextran1502 (Contributor) commented

Are you still planning on benchmarking the two methods of calculating the hash (coupling it into Multer's storage engine vs. running it as a separate step after the upload)?

@panoti (Contributor, Author) commented Aug 25, 2022

Yes, I am preparing the environment for this benchmark. I hope to finish it soon.

@alextran1502 (Contributor) commented

Thank you, no rush 😃

@panoti marked this pull request as draft on August 25, 2022 17:02
@panoti (Contributor, Author) commented Aug 27, 2022

Benchmark source code: https://github.com/panoti/hash-benchmark. It has two parts: a server and a client.

Server

To compare the computational cost of the two approaches, I built a Node.js app with two endpoints: POST /sequence-hasing and POST /stream-hasing.

  • POST /sequence-hasing: the hash is computed after the file upload has completed successfully.
  • POST /stream-hasing: the hash is computed on the upload stream, while the file is being received.
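For contrast with the stream approach, the sequence approach hashes the file only once it has been fully written. Assuming the saved upload is read back from disk (the benchmark's actual endpoint may do this differently), a minimal sketch looks like this; sha1OfFile is a hypothetical helper name.

```typescript
import { createHash } from 'node:crypto';
import { createReadStream } from 'node:fs';

/**
 * Hypothetical "sequence" helper: after the upload has been fully written to
 * `filePath`, read it back in chunks and compute its SHA-1 hex digest.
 * Chunked reading keeps memory usage flat even for large files.
 */
export function sha1OfFile(filePath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha1');
    createReadStream(filePath)
      .on('error', reject) // e.g. ENOENT if the file does not exist
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
}
```

The trade-off is a second full pass over the file on disk, which the stream approach avoids by hashing bytes that are already passing through memory.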

Client

The client uses k6 (k6.io) to simulate file uploads; crypto.randomBytes(<file size>) generates a buffer that stands in for the file.

  • sequence-test-1MB.js: tests POST /sequence-hasing with a 1 MB file.
  • stream-test-1MB.js: tests POST /stream-hasing with a 1 MB file.
  • sequence-test-50MB.js: tests POST /sequence-hasing with a 50 MB file.
  • stream-test-50MB.js: tests POST /stream-hasing with a 50 MB file.
  • sequence-test-200MB.js: tests POST /sequence-hasing with a 200 MB file.
  • stream-test-200MB.js: tests POST /stream-hasing with a 200 MB file.

Benchmark run on a server

System Info
----------------------------------------------------------------------
CPU model            : Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz
Number of cores      : 16
CPU frequency        : 1440.575 MHz
Total size of Disk   : 11943.9 GB (9384.6 GB Used)
Total amount of Mem  : 15934 MB (11555 MB Used)
Total amount of Swap : 4095 MB (175 MB Used)
System uptime        : 0 days, 12 hour 42 min
Load average         : 1.05, 0.80, 0.87
OS                   : Ubuntu 20.04.4 LTS
Arch                 : x86_64 (64 Bit)
Kernel               : 5.4.0-125-generic
Virt                 : No Virt

Disk Speed
----------------------------------------------------------------------
dd Test
I/O (1st run)        : 82.3 MB/s
I/O (2nd run)        : 85.5 MB/s
I/O (3rd run)        : 84.2 MB/s
Average              : 84.0 MB/s
-----------------------------------
Fio Test
Read performance     : 165MB/s
Read IOPS            : 40.3k
Write performance    : 55.1MB/s
Write IOPS           : 13.5k

Sequence hashing 1MB (08/27/2022 05:53+7)

Simulates 10 virtual users uploading a 1 MB file for 5 minutes (POST /sequence-hasing).

sudo docker run -it --rm --network mynet ghcr.io/panoti/hash-benchmark/client:main run --vus 10 --duration 5m /app/sequence-test-1MB.js

Client logs

running (5m01.1s), 00/10 VUs, 867 complete and 0 interrupted iterations
default ✓ [======================================] 10 VUs  5m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 867      ✗ 0
     data_received..................: 276 kB  916 B/s
     data_sent......................: 867 MB  2.9 MB/s
     http_req_blocked...............: avg=57.51µs min=8.29µs   med=40.77µs max=2.83ms   p(90)=73.4µs  p(95)=83.19µs
     http_req_connecting............: avg=9.95µs  min=0s       med=0s      max=2.3ms    p(90)=0s      p(95)=0s
     http_req_duration..............: avg=39.81ms min=9.32ms   med=35.16ms max=146.23ms p(90)=62.44ms p(95)=76.8ms
       { expected_response:true }...: avg=39.81ms min=9.32ms   med=35.16ms max=146.23ms p(90)=62.44ms p(95)=76.8ms
     http_req_failed................: 0.00%   ✓ 0        ✗ 867
     http_req_receiving.............: avg=1.13ms  min=65.86µs  med=408.9µs max=29.26ms  p(90)=1.14ms  p(95)=6.3ms
     http_req_sending...............: avg=3.4ms   min=472.91µs med=2.34ms  max=31ms     p(90)=6.71ms  p(95)=10.74ms
     http_req_tls_handshaking.......: avg=0s      min=0s       med=0s      max=0s       p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=35.27ms min=0s       med=30.51ms max=143.36ms p(90)=57.99ms p(95)=72.29ms
     http_reqs......................: 867     2.879463/s
     iteration_duration.............: avg=3.46s   min=2.5s     med=3.46s   max=4s       p(90)=3.61s   p(95)=3.67s
     iterations.....................: 867     2.879463/s
     vus............................: 2       min=2      max=10
     vus_max........................: 10      min=10     max=10

Server logs: [4 screenshots]

Stream hashing 1MB (08/27/2022 06:08+7)

Simulates 10 virtual users uploading a 1 MB file for 5 minutes (POST /stream-hasing).

sudo docker run -it --rm --network mynet ghcr.io/panoti/hash-benchmark/client:main run --vus 10 --duration 5m /app/stream-test-1MB.js

Client logs

running (5m01.6s), 00/10 VUs, 870 complete and 0 interrupted iterations
default ✓ [======================================] 10 VUs  5m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 870      ✗ 0
     data_received..................: 277 kB  917 B/s
     data_sent......................: 870 MB  2.9 MB/s
     http_req_blocked...............: avg=59.81µs min=8µs      med=38.59µs  max=6.18ms   p(90)=74.35µs  p(95)=83.17µs
     http_req_connecting............: avg=8.58µs  min=0s       med=0s       max=3.84ms   p(90)=0s       p(95)=0s
     http_req_duration..............: avg=32.25ms min=9.22ms   med=27.5ms   max=143.77ms p(90)=54.2ms   p(95)=65.32ms
       { expected_response:true }...: avg=32.25ms min=9.22ms   med=27.5ms   max=143.77ms p(90)=54.2ms   p(95)=65.32ms
     http_req_failed................: 0.00%   ✓ 0        ✗ 870
     http_req_receiving.............: avg=1.01ms  min=68.63µs  med=375.87µs max=25.24ms  p(90)=972.83µs p(95)=3.83ms
     http_req_sending...............: avg=3.3ms   min=498.13µs med=2.13ms   max=41.54ms  p(90)=6.37ms   p(95)=10.69ms
     http_req_tls_handshaking.......: avg=0s      min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=27.93ms min=0s       med=23.22ms  max=138.43ms p(90)=49.14ms  p(95)=59.41ms
     http_reqs......................: 870     2.884405/s
     iteration_duration.............: avg=3.45s   min=2.35s    med=3.45s    max=3.87s    p(90)=3.6s     p(95)=3.65s
     iterations.....................: 870     2.884405/s
     vus............................: 4       min=4      max=10
     vus_max........................: 10      min=10     max=10

Server logs: [4 screenshots]

Test case               Client avg duration  Server avg duration  Server CPU  Server memory
sequence-test-1MB.js    39.81 ms             34 ms                6.47%       67 MB
stream-test-1MB.js      32.25 ms             26.6 ms              4.57%       70.4 MB

@panoti (Contributor, Author) commented Aug 28, 2022

Node app hosted on a Jetson Nano 2GB:

System Info
----------------------------------------------------------------------
CPU model            : ARMv8 Processor rev 1 (v8l)
Number of cores      : 4
CPU frequency        :  MHz
Total size of Disk   : 526.0 GB (85.0 GB Used)
Total amount of Mem  : 1979 MB (492 MB Used)
Total amount of Swap : 5085 MB (0 MB Used)
System uptime        : 0 days, 0 hour 25 min
Load average         : 0,73, 0,46, 0,24
OS                   : Ubuntu 18.04.6 LTS
Arch                 : aarch64 (64 Bit)
Kernel               : 4.9.253-tegra
Virt                 : No Virt

Disk Speed
----------------------------------------------------------------------
dd Test
I/O (1st run)        : 6 MB/s
I/O (2nd run)        : 3 MB/s
I/O (3rd run)        : 7 MB/s
Average              : 5.3 MB/s
-----------------------------------
Fio Test
Read performance     : 7820kB/s
Read IOPS            : 1909
Write performance    : 2609kB/s
Write IOPS           : 636

sequence-test-1MB.js (08/28/2022 01:01+7)

Simulates 4 virtual users uploading a 1 MB file for 5 minutes (POST /sequence-hasing).

k6 run --vus 4 --duration 5m ./sequence-test-1MB.js

Client logs

running (5m01.3s), 0/4 VUs, 615 complete and 0 interrupted iterations
default ✓ [======================================] 4 VUs  5m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 615      ✗ 0
     data_received..................: 196 kB  649 B/s
     data_sent......................: 615 MB  2.0 MB/s
     http_req_blocked...............: avg=78.5µs   min=0s      med=0s       max=17.84ms  p(90)=0s       p(95)=0s
     http_req_connecting............: avg=48.68µs  min=0s      med=0s       max=17.84ms  p(90)=0s       p(95)=0s
     http_req_duration..............: avg=145.95ms min=94.43ms med=129.78ms max=353.93ms p(90)=210.27ms p(95)=232.82ms
       { expected_response:true }...: avg=145.95ms min=94.43ms med=129.78ms max=353.93ms p(90)=210.27ms p(95)=232.82ms
     http_req_failed................: 0.00%   ✓ 0        ✗ 615
     http_req_receiving.............: avg=193.92µs min=0s      med=0s       max=12.54ms  p(90)=0s       p(95)=526.33µs
     http_req_sending...............: avg=95.95ms  min=0s      med=95.19ms  max=324.19ms p(90)=168.39ms p(95)=190.15ms
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=49.8ms   min=0s      med=31.86ms  max=275.7ms  p(90)=113.65ms p(95)=127.1ms
     http_reqs......................: 615     2.041428/s
     iteration_duration.............: avg=1.95s    min=1.41s   med=1.91s    max=2.63s    p(90)=2.27s    p(95)=2.38s
     iterations.....................: 615     2.041428/s
     vus............................: 2       min=2      max=4
     vus_max........................: 4       min=4      max=4

Server logs: [4 screenshots]

stream-test-1MB.js (08/28/2022 01:12+7)

Simulates 4 virtual users uploading a 1 MB file for 5 minutes (POST /stream-hasing).

k6 run --vus 4 --duration 5m ./stream-test-1MB.js

Client logs

running (5m01.2s), 0/4 VUs, 646 complete and 0 interrupted iterations
default ✓ [======================================] 4 VUs  5m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 646      ✗ 0
     data_received..................: 205 kB  682 B/s
     data_sent......................: 646 MB  2.1 MB/s
     http_req_blocked...............: avg=288.32µs min=0s      med=0s       max=63.32ms  p(90)=0s       p(95)=0s
     http_req_connecting............: avg=284.76µs min=0s      med=0s       max=63.32ms  p(90)=0s       p(95)=0s
     http_req_duration..............: avg=136.42ms min=81.02ms med=114.99ms max=485.26ms p(90)=200.88ms p(95)=247.74ms
       { expected_response:true }...: avg=136.42ms min=81.02ms med=114.99ms max=485.26ms p(90)=200.88ms p(95)=247.74ms
     http_req_failed................: 0.00%   ✓ 0        ✗ 646
     http_req_receiving.............: avg=287.39µs min=0s      med=0s       max=22.53ms  p(90)=495.45µs p(95)=1ms
     http_req_sending...............: avg=91.64ms  min=30.31ms med=78.57ms  max=379.24ms p(90)=143.27ms p(95)=172.22ms
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=44.49ms  min=11ms    med=38.27ms  max=292.25ms p(90)=77.55ms  p(95)=106.01ms
     http_reqs......................: 646     2.144836/s
     iteration_duration.............: avg=1.86s    min=1.34s   med=1.78s    max=3.06s    p(90)=2.2s     p(95)=2.35s
     iterations.....................: 646     2.144836/s
     vus............................: 1       min=1      max=4
     vus_max........................: 4       min=4      max=4

Server logs: [4 screenshots]

sequence-test-50MB.js (08/28/2022 01:12+7)

Simulates 1 virtual user uploading a 50 MB file for 10 minutes (POST /sequence-hasing).

k6 run --vus 1 --duration 10m ./sequence-test-50MB.js

Client logs

running (10m30.4s), 0/1 VUs, 7 complete and 1 interrupted iterations
default ✓ [======================================] 1 VUs  10m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 7                    ✗ 0
     data_received..................: 2.2 kB  3.5308572697925267 B/s
     data_sent......................: 350 MB  555 kB/s
     http_req_blocked...............: avg=26.1ms   min=10.07ms med=15.8ms   max=89.59ms p(90)=50.42ms p(95)=70ms
     http_req_connecting............: avg=9.17ms   min=0s      med=10.12ms  max=13.3ms  p(90)=13.18ms p(95)=13.24ms
     http_req_duration..............: avg=5.27s    min=5.16s   med=5.27s    max=5.38s   p(90)=5.36s   p(95)=5.37s
       { expected_response:true }...: avg=5.27s    min=5.16s   med=5.27s    max=5.38s   p(90)=5.36s   p(95)=5.37s
     http_req_failed................: 0.00%   ✓ 0                    ✗ 7
     http_req_receiving.............: avg=10.19ms  min=0s      med=5.53ms   max=43.94ms p(90)=23.96ms p(95)=33.95ms
     http_req_sending...............: avg=4.3s     min=4.28s   med=4.3s     max=4.32s   p(90)=4.31s   p(95)=4.32s
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s      p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=966.01ms min=873.7ms med=960.35ms max=1.07s   p(90)=1.05s   p(95)=1.06s
     http_reqs......................: 7       0.011103/s
     iteration_duration.............: avg=1m19s    min=1m2s    med=1m23s    max=1m36s   p(90)=1m34s   p(95)=1m35s
     iterations.....................: 7       0.011103/s
     vus............................: 1       min=1                  max=1
     vus_max........................: 1       min=1                  max=1

Server logs: [4 screenshots]

stream-test-50MB.js (08/28/2022 01:12+7)

Simulates 1 virtual user uploading a 50 MB file for 10 minutes (POST /stream-hasing).

k6 run --vus 1 --duration 10m ./stream-test-50MB.js

Client logs

running (10m12.0s), 0/1 VUs, 7 complete and 0 interrupted iterations
default ✓ [======================================] 1 VUs  10m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 7                    ✗ 0
     data_received..................: 2.2 kB  3.6374924022796624 B/s
     data_sent......................: 350 MB  572 kB/s
     http_req_blocked...............: avg=13.32ms min=10.99ms med=14.09ms max=15ms    p(90)=14.99ms p(95)=15ms
     http_req_connecting............: avg=6.72ms  min=6ms     med=6.48ms  max=8.01ms  p(90)=7.77ms  p(95)=7.89ms
     http_req_duration..............: avg=4.31s   min=4.26s   med=4.3s    max=4.4s    p(90)=4.35s   p(95)=4.37s
       { expected_response:true }...: avg=4.31s   min=4.26s   med=4.3s    max=4.4s    p(90)=4.35s   p(95)=4.37s
     http_req_failed................: 0.00%   ✓ 0                    ✗ 7
     http_req_receiving.............: avg=18.87ms min=1.63ms  med=7.13ms  max=98.29ms p(90)=45.28ms p(95)=71.79ms
     http_req_sending...............: avg=4.27s   min=4.18s   med=4.29s   max=4.3s    p(90)=4.3s    p(95)=4.3s
     http_req_tls_handshaking.......: avg=0s      min=0s      med=0s      max=0s      p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=23.7ms  min=3.59ms  med=7.02ms  max=90.25ms p(90)=63.47ms p(95)=76.86ms
     http_reqs......................: 7       0.011439/s
     iteration_duration.............: avg=1m27s   min=1m16s   med=1m26s   max=1m37s   p(90)=1m35s   p(95)=1m36s
     iterations.....................: 7       0.011439/s
     vus............................: 1       min=1                  max=1
     vus_max........................: 1       min=1                  max=1

Server logs: [4 screenshots]

Summary

Test case               Client avg duration  Server avg duration  Server CPU  Server memory
sequence-test-1MB.js    145.95 ms            133 ms               7.01%       74.2 MB
stream-test-1MB.js      136.42 ms            122 ms               7.00%       78.6 MB
sequence-test-50MB.js   5.27 s               5.24 s               4.53%       75.4 MB
stream-test-50MB.js     4.31 s               4.28 s               3.50%       72.8 MB
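To express the gains in relative terms, the percentage reduction in average client latency is (sequence - stream) / sequence × 100. A small hypothetical helper (not part of the benchmark repo) applied to the http_req_duration averages from the k6 client logs above:

```typescript
/** Percentage reduction going from `before` to `after` (positive means `after` is faster). */
export function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Client-side http_req_duration averages copied from the k6 logs; each row pairs
// the sequence and stream runs made on the same host with the same file size.
const runs = [
  { host: 'Xeon E5-4650', size: '1 MB', sequence: 39.81, stream: 32.25, unit: 'ms' },
  { host: 'Jetson Nano', size: '1 MB', sequence: 145.95, stream: 136.42, unit: 'ms' },
  { host: 'Jetson Nano', size: '50 MB', sequence: 5.27, stream: 4.31, unit: 's' },
];

for (const r of runs) {
  const pct = percentReduction(r.sequence, r.stream).toFixed(1);
  console.log(`${r.host}, ${r.size}: ${r.sequence} ${r.unit} -> ${r.stream} ${r.unit} (${pct}% lower)`);
}
```

This prints reductions of about 19.0%, 6.5% and 18.2%. The runs differ in host, file size and number of virtual users, so only the sequence/stream pair within each row is directly comparable.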

@panoti marked this pull request as ready for review on August 28, 2022 09:21
@alextran1502 (Contributor) commented

What does the client's average duration measure? Upload time?

@panoti (Contributor, Author) commented Aug 28, 2022

It is the average latency of each upload request, as measured by k6 (http_req_duration).
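For reference, k6 defines http_req_duration as the sum of the sending, waiting and receiving phases, so the per-phase averages in the client logs should add up to the reported duration. Checking this against the two 50 MB Jetson runs, with values copied from the k6 output above:

```latex
\begin{aligned}
t_{\text{duration}} &= t_{\text{sending}} + t_{\text{waiting}} + t_{\text{receiving}} \\
\text{sequence, 50 MB:}\quad 4.30\,\text{s} + 966.01\,\text{ms} + 10.19\,\text{ms} &\approx 5.28\,\text{s} \\
\text{stream, 50 MB:}\quad 4.27\,\text{s} + 23.70\,\text{ms} + 18.87\,\text{ms} &\approx 4.31\,\text{s}
\end{aligned}
```

The small gap in the sequence row (5.28 s vs the reported 5.27 s) comes from k6 printing the sending average rounded to 4.3 s. Upload (sending) time is nearly identical for both methods; the difference sits almost entirely in http_req_waiting (about 966 ms vs about 24 ms), which is consistent with the sequence endpoint hashing only after the upload has finished.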

@panoti linked an issue on Aug 29, 2022 that may be closed by this pull request (4 tasks)
@panoti changed the title from "feat(server): calculate sha1 checksum at the same time uploading asset" to "feat(server): calculate sha1 checksum" on Aug 30, 2022
@alextran1502 merged commit b80dca7 into immich-app:main on Aug 31, 2022
@panoti deleted the feat/sha1-hashing branch on August 31, 2022 15:07
4 participants