feat(server): calculate sha1 checksum #525

panoti · 2022-08-23T16:05:26Z

Calculate the SHA-1 checksum from the upload stream
Store the SHA-1 checksum in the asset entity
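
As a rough illustration of hashing inside a storage layer, here is a minimal sketch. It is not the code in disk-storage.ts: `HashingDiskStorage`, `IncomingFile`, `StoredFile`, `StoreCallback`, and `storeUpload` are hypothetical names that only loosely mirror what an upload storage engine receives. It shows two points: the success callback fires only after the write has finished, so every chunk has been hashed by then, and a partial file is removed if the upload fails.

```typescript
import { createHash } from 'node:crypto';
import { createWriteStream, mkdtempSync, unlink } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { Readable } from 'node:stream';

// Hypothetical shapes for illustration only (not Multer's or the PR's real types).
interface IncomingFile { filename: string; stream: Readable; }
interface StoredFile { path: string; checksum: Buffer; }
type StoreCallback = (error: Error | null, stored?: StoredFile) => void;

class HashingDiskStorage {
  private readonly destination: string;

  constructor(destination: string) {
    this.destination = destination;
  }

  // Writes the incoming stream to disk and feeds the same chunks to a SHA-1 hash.
  handleFile(file: IncomingFile, callback: StoreCallback): void {
    const path = join(this.destination, file.filename);
    const out = createWriteStream(path);
    const hash = createHash('sha1');
    let failed = false;

    const fail = (error: Error): void => {
      if (failed) return;
      failed = true;
      // Wait until the file handle is closed, then remove the partial file (best effort).
      out.once('close', () => unlink(path, () => callback(error)));
      out.destroy();
    };

    file.stream.on('data', (chunk: Buffer) => hash.update(chunk));
    file.stream.on('error', fail);
    out.on('error', fail);
    // 'finish' fires after all data has been written, so the hash has seen every chunk.
    out.on('finish', () => callback(null, { path, checksum: hash.digest() }));
    file.stream.pipe(out);
  }
}

// Hypothetical helper: stores one stream in a fresh temporary directory.
function storeUpload(stream: Readable): Promise<StoredFile> {
  const storage = new HashingDiskStorage(mkdtempSync(join(tmpdir(), 'upload-')));
  return new Promise((resolve, reject) => {
    storage.handleFile({ filename: 'upload.bin', stream }, (error, stored) => {
      if (error || !stored) reject(error ?? new Error('no file stored'));
      else resolve(stored);
    });
  });
}
```

`hash.digest()` without an encoding returns a 20-byte Buffer for SHA-1, which is the kind of value a binary column would hold.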

server/libs/database/src/migrations/1661270339373-AddAssetChecksum.ts

server/apps/immich/src/utils/disk-storage.ts

server/apps/immich/src/utils/disk-storage.ts

fyfrey

I really like the approach of writing the file while computing the hash on-the-fly

server/apps/immich/src/utils/disk-storage.ts

server/libs/database/src/entities/asset.entity.ts

server/libs/database/src/entities/exif.entity.ts

alextran1502 · 2022-08-25T14:54:45Z

Are you still planning on benchmarking the two methods of calculating the hash (couple into Multer's storage engine vs. operating the process separately?)

panoti · 2022-08-25T15:34:51Z

> Are you still planning on benchmarking the two methods of calculating the hash (couple into Multer's storage engine vs. operating the process separately?)

Yeah, I am preparing env for this benchmark. I hope it will be done ASAP.

alextran1502 · 2022-08-25T16:04:16Z

> > Are you still planning on benchmarking the two methods of calculating the hash (couple into Multer's storage engine vs. operating the process separately?)
>
> Yeah, I am preparing env for this benchmark. I hope it will be done ASAP.

Thank you, no rush 😃

panoti · 2022-08-27T12:37:54Z

Benchmark source code: https://github.com/panoti/hash-benchmark. It has two parts: a server and a client.

Server

To compare the performance of the two approaches, I built a Node.js app with two endpoints: POST /sequence-hasing and POST /stream-hasing.

POST /sequence-hasing: the hash is computed after the file upload has completed successfully.
POST /stream-hasing: the hash is computed on the upload stream while the file is being received.
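
The difference between the two endpoints can be sketched as follows. This is not the code from the benchmark repository; `sequenceHash`, `streamHash`, and `demo` are hypothetical names, and an in-memory stream stands in for the HTTP upload.

```typescript
import { createHash } from 'node:crypto';
import { createReadStream, createWriteStream, mkdtempSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { Readable, Transform } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// "Sequence" approach: persist the upload first, then re-read the file to hash it.
async function sequenceHash(upload: Readable, destPath: string): Promise<string> {
  await pipeline(upload, createWriteStream(destPath));
  const hash = createHash('sha1');
  for await (const chunk of createReadStream(destPath)) {
    hash.update(chunk as Buffer);
  }
  return hash.digest('hex');
}

// "Stream" approach: update the hash with each chunk on its way to disk.
async function streamHash(upload: Readable, destPath: string): Promise<string> {
  const hash = createHash('sha1');
  const tap = new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      hash.update(chunk);
      callback(null, chunk); // forward the chunk unchanged to the file writer
    },
  });
  await pipeline(upload, tap, createWriteStream(destPath));
  return hash.digest('hex');
}

// Hypothetical demo: both approaches should yield the same digest for the same bytes.
async function demo(): Promise<{ sequence: string; stream: string }> {
  const dir = mkdtempSync(join(tmpdir(), 'hash-demo-'));
  const makeUpload = () => Readable.from([Buffer.from('hello '), Buffer.from('world')]);
  const sequence = await sequenceHash(makeUpload(), join(dir, 'sequence.bin'));
  const stream = await streamHash(makeUpload(), join(dir, 'stream.bin'));
  return { sequence, stream };
}
```

The sequence variant reads every byte back from disk a second time, while the stream variant touches each chunk once. That extra read is the cost the benchmark is trying to measure.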

Client

The client uses k6 (k6.io) to simulate file uploads. crypto.randomBytes(<file size>) creates a buffer that stands in for a file.

sequence-test-1MB.js: test POST /sequence-hasing with 1 MB file.
stream-test-1MB.js: test POST /stream-hasing with 1 MB file.
sequence-test-50MB.js: test POST /sequence-hasing with 50 MB file.
stream-test-50MB.js: test POST /stream-hasing with 50 MB file.
sequence-test-200MB.js: test POST /sequence-hasing with 200 MB file.
stream-test-200MB.js: test POST /stream-hasing with 200 MB file.

Run benchmark on server

System Info
----------------------------------------------------------------------
CPU model            : Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz
Number of cores      : 16
CPU frequency        : 1440.575 MHz
Total size of Disk   : 11943.9 GB (9384.6 GB Used)
Total amount of Mem  : 15934 MB (11555 MB Used)
Total amount of Swap : 4095 MB (175 MB Used)
System uptime        : 0 days, 12 hour 42 min
Load average         : 1.05, 0.80, 0.87
OS                   : Ubuntu 20.04.4 LTS
Arch                 : x86_64 (64 Bit)
Kernel               : 5.4.0-125-generic
Virt                 : No Virt

Disk Speed
----------------------------------------------------------------------
dd Test
I/O (1st run)        : 82.3 MB/s
I/O (2nd run)        : 85.5 MB/s
I/O (3rd run)        : 84.2 MB/s
Average              : 84.0 MB/s
-----------------------------------
Fio Test
Read performance     : 165MB/s
Read IOPS            : 40.3k
Write performance    : 55.1MB/s
Write IOPS           : 13.5k

Sequence hashing 1MB (08/27/2022 05:53+7)

Simulates 10 virtual users uploading a 1 MB file for 5 minutes (POST /sequence-hasing).

sudo docker run -it --rm --network mynet ghcr.io/panoti/hash-benchmark/client:main run --vus 10 --duration 5m /app/sequence-test-1MB.js

Client logs

running (5m01.1s), 00/10 VUs, 867 complete and 0 interrupted iterations
default ✓ [======================================] 10 VUs  5m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 867      ✗ 0
     data_received..................: 276 kB  916 B/s
     data_sent......................: 867 MB  2.9 MB/s
     http_req_blocked...............: avg=57.51µs min=8.29µs   med=40.77µs max=2.83ms   p(90)=73.4µs  p(95)=83.19µs
     http_req_connecting............: avg=9.95µs  min=0s       med=0s      max=2.3ms    p(90)=0s      p(95)=0s
     http_req_duration..............: avg=39.81ms min=9.32ms   med=35.16ms max=146.23ms p(90)=62.44ms p(95)=76.8ms
       { expected_response:true }...: avg=39.81ms min=9.32ms   med=35.16ms max=146.23ms p(90)=62.44ms p(95)=76.8ms
     http_req_failed................: 0.00%   ✓ 0        ✗ 867
     http_req_receiving.............: avg=1.13ms  min=65.86µs  med=408.9µs max=29.26ms  p(90)=1.14ms  p(95)=6.3ms
     http_req_sending...............: avg=3.4ms   min=472.91µs med=2.34ms  max=31ms     p(90)=6.71ms  p(95)=10.74ms
     http_req_tls_handshaking.......: avg=0s      min=0s       med=0s      max=0s       p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=35.27ms min=0s       med=30.51ms max=143.36ms p(90)=57.99ms p(95)=72.29ms
     http_reqs......................: 867     2.879463/s
     iteration_duration.............: avg=3.46s   min=2.5s     med=3.46s   max=4s       p(90)=3.61s   p(95)=3.67s
     iterations.....................: 867     2.879463/s
     vus............................: 2       min=2      max=10
     vus_max........................: 10      min=10     max=10

Server logs

Stream hashing 1MB (08/27/2022 06:08+7)

Simulates 10 virtual users uploading a 1 MB file for 5 minutes (POST /stream-hasing).

sudo docker run -it --rm --network mynet ghcr.io/panoti/hash-benchmark/client:main run --vus 10 --duration 5m /app/stream-test-1MB.js

Client logs

running (5m01.6s), 00/10 VUs, 870 complete and 0 interrupted iterations
default ✓ [======================================] 10 VUs  5m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 870      ✗ 0
     data_received..................: 277 kB  917 B/s
     data_sent......................: 870 MB  2.9 MB/s
     http_req_blocked...............: avg=59.81µs min=8µs      med=38.59µs  max=6.18ms   p(90)=74.35µs  p(95)=83.17µs
     http_req_connecting............: avg=8.58µs  min=0s       med=0s       max=3.84ms   p(90)=0s       p(95)=0s
     http_req_duration..............: avg=32.25ms min=9.22ms   med=27.5ms   max=143.77ms p(90)=54.2ms   p(95)=65.32ms
       { expected_response:true }...: avg=32.25ms min=9.22ms   med=27.5ms   max=143.77ms p(90)=54.2ms   p(95)=65.32ms
     http_req_failed................: 0.00%   ✓ 0        ✗ 870
     http_req_receiving.............: avg=1.01ms  min=68.63µs  med=375.87µs max=25.24ms  p(90)=972.83µs p(95)=3.83ms
     http_req_sending...............: avg=3.3ms   min=498.13µs med=2.13ms   max=41.54ms  p(90)=6.37ms   p(95)=10.69ms
     http_req_tls_handshaking.......: avg=0s      min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=27.93ms min=0s       med=23.22ms  max=138.43ms p(90)=49.14ms  p(95)=59.41ms
     http_reqs......................: 870     2.884405/s
     iteration_duration.............: avg=3.45s   min=2.35s    med=3.45s    max=3.87s    p(90)=3.6s     p(95)=3.65s
     iterations.....................: 870     2.884405/s
     vus............................: 4       min=4      max=10
     vus_max........................: 10      min=10     max=10

Server logs

Test case	Client avg duration	Server avg duration	Server CPU	Server Memory
sequence-test-1MB.js	39.81ms	34 ms	6.47%	67 MB
stream-test-1MB.js	32.25ms	26.6 ms	4.57%	70.4 MB

panoti · 2022-08-28T06:37:53Z

Node app hosted on server: Jetson Nano 2GB

System Info
----------------------------------------------------------------------
CPU model            : ARMv8 Processor rev 1 (v8l)
Number of cores      : 4
CPU frequency        :  MHz
Total size of Disk   : 526.0 GB (85.0 GB Used)
Total amount of Mem  : 1979 MB (492 MB Used)
Total amount of Swap : 5085 MB (0 MB Used)
System uptime        : 0 days, 0 hour 25 min
Load average         : 0,73, 0,46, 0,24
OS                   : Ubuntu 18.04.6 LTS
Arch                 : aarch64 (64 Bit)
Kernel               : 4.9.253-tegra
Virt                 : No Virt

Disk Speed
----------------------------------------------------------------------
dd Test
I/O (1st run)        : 6 MB/s
I/O (2nd run)        : 3 MB/s
I/O (3rd run)        : 7 MB/s
Average              : 5.3 MB/s
-----------------------------------
Fio Test
Read performance     : 7820kB/s
Read IOPS            : 1909
Write performance    : 2609kB/s
Write IOPS           : 636

sequence-test-1MB.js (08/28/2022 01:01+7)

Simulates 4 virtual users uploading a 1 MB file for 5 minutes (POST /sequence-hasing).

k6 run --vus 4 --duration 5m ./sequence-test-1MB.js

Client logs

running (5m01.3s), 0/4 VUs, 615 complete and 0 interrupted iterations
default ✓ [======================================] 4 VUs  5m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 615      ✗ 0
     data_received..................: 196 kB  649 B/s
     data_sent......................: 615 MB  2.0 MB/s
     http_req_blocked...............: avg=78.5µs   min=0s      med=0s       max=17.84ms  p(90)=0s       p(95)=0s
     http_req_connecting............: avg=48.68µs  min=0s      med=0s       max=17.84ms  p(90)=0s       p(95)=0s
     http_req_duration..............: avg=145.95ms min=94.43ms med=129.78ms max=353.93ms p(90)=210.27ms p(95)=232.82ms
       { expected_response:true }...: avg=145.95ms min=94.43ms med=129.78ms max=353.93ms p(90)=210.27ms p(95)=232.82ms
     http_req_failed................: 0.00%   ✓ 0        ✗ 615
     http_req_receiving.............: avg=193.92µs min=0s      med=0s       max=12.54ms  p(90)=0s       p(95)=526.33µs
     http_req_sending...............: avg=95.95ms  min=0s      med=95.19ms  max=324.19ms p(90)=168.39ms p(95)=190.15ms
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=49.8ms   min=0s      med=31.86ms  max=275.7ms  p(90)=113.65ms p(95)=127.1ms
     http_reqs......................: 615     2.041428/s
     iteration_duration.............: avg=1.95s    min=1.41s   med=1.91s    max=2.63s    p(90)=2.27s    p(95)=2.38s
     iterations.....................: 615     2.041428/s
     vus............................: 2       min=2      max=4
     vus_max........................: 4       min=4      max=4

Server logs

stream-test-1MB.js (08/28/2022 01:12+7)

Simulates 4 virtual users uploading a 1 MB file for 5 minutes (POST /stream-hasing).

k6 run --vus 4 --duration 5m ./stream-test-1MB.js

Client logs

running (5m01.2s), 0/4 VUs, 646 complete and 0 interrupted iterations
default ✓ [======================================] 4 VUs  5m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 646      ✗ 0
     data_received..................: 205 kB  682 B/s
     data_sent......................: 646 MB  2.1 MB/s
     http_req_blocked...............: avg=288.32µs min=0s      med=0s       max=63.32ms  p(90)=0s       p(95)=0s
     http_req_connecting............: avg=284.76µs min=0s      med=0s       max=63.32ms  p(90)=0s       p(95)=0s
     http_req_duration..............: avg=136.42ms min=81.02ms med=114.99ms max=485.26ms p(90)=200.88ms p(95)=247.74ms
       { expected_response:true }...: avg=136.42ms min=81.02ms med=114.99ms max=485.26ms p(90)=200.88ms p(95)=247.74ms
     http_req_failed................: 0.00%   ✓ 0        ✗ 646
     http_req_receiving.............: avg=287.39µs min=0s      med=0s       max=22.53ms  p(90)=495.45µs p(95)=1ms
     http_req_sending...............: avg=91.64ms  min=30.31ms med=78.57ms  max=379.24ms p(90)=143.27ms p(95)=172.22ms
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=44.49ms  min=11ms    med=38.27ms  max=292.25ms p(90)=77.55ms  p(95)=106.01ms
     http_reqs......................: 646     2.144836/s
     iteration_duration.............: avg=1.86s    min=1.34s   med=1.78s    max=3.06s    p(90)=2.2s     p(95)=2.35s
     iterations.....................: 646     2.144836/s
     vus............................: 1       min=1      max=4
     vus_max........................: 4       min=4      max=4

Server logs

sequence-test-50MB.js (08/28/2022 01:12+7)

Simulates 1 virtual user uploading a 50 MB file for 10 minutes (POST /sequence-hasing).

k6 run --vus 1 --duration 10m ./sequence-test-50MB.js

Client logs

running (10m30.4s), 0/1 VUs, 7 complete and 1 interrupted iterations
default ✓ [======================================] 1 VUs  10m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 7                    ✗ 0
     data_received..................: 2.2 kB  3.5308572697925267 B/s
     data_sent......................: 350 MB  555 kB/s
     http_req_blocked...............: avg=26.1ms   min=10.07ms med=15.8ms   max=89.59ms p(90)=50.42ms p(95)=70ms
     http_req_connecting............: avg=9.17ms   min=0s      med=10.12ms  max=13.3ms  p(90)=13.18ms p(95)=13.24ms
     http_req_duration..............: avg=5.27s    min=5.16s   med=5.27s    max=5.38s   p(90)=5.36s   p(95)=5.37s
       { expected_response:true }...: avg=5.27s    min=5.16s   med=5.27s    max=5.38s   p(90)=5.36s   p(95)=5.37s
     http_req_failed................: 0.00%   ✓ 0                    ✗ 7
     http_req_receiving.............: avg=10.19ms  min=0s      med=5.53ms   max=43.94ms p(90)=23.96ms p(95)=33.95ms
     http_req_sending...............: avg=4.3s     min=4.28s   med=4.3s     max=4.32s   p(90)=4.31s   p(95)=4.32s
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s      p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=966.01ms min=873.7ms med=960.35ms max=1.07s   p(90)=1.05s   p(95)=1.06s
     http_reqs......................: 7       0.011103/s
     iteration_duration.............: avg=1m19s    min=1m2s    med=1m23s    max=1m36s   p(90)=1m34s   p(95)=1m35s
     iterations.....................: 7       0.011103/s
     vus............................: 1       min=1                  max=1
     vus_max........................: 1       min=1                  max=1

Server logs

stream-test-50MB.js (08/28/2022 01:12+7)

Simulates 1 virtual user uploading a 50 MB file for 10 minutes (POST /stream-hasing).

k6 run --vus 1 --duration 10m ./stream-test-50MB.js

Client logs

running (10m12.0s), 0/1 VUs, 7 complete and 0 interrupted iterations
default ✓ [======================================] 1 VUs  10m0s

     ✓ is status 200

     checks.........................: 100.00% ✓ 7                    ✗ 0
     data_received..................: 2.2 kB  3.6374924022796624 B/s
     data_sent......................: 350 MB  572 kB/s
     http_req_blocked...............: avg=13.32ms min=10.99ms med=14.09ms max=15ms    p(90)=14.99ms p(95)=15ms
     http_req_connecting............: avg=6.72ms  min=6ms     med=6.48ms  max=8.01ms  p(90)=7.77ms  p(95)=7.89ms
     http_req_duration..............: avg=4.31s   min=4.26s   med=4.3s    max=4.4s    p(90)=4.35s   p(95)=4.37s
       { expected_response:true }...: avg=4.31s   min=4.26s   med=4.3s    max=4.4s    p(90)=4.35s   p(95)=4.37s
     http_req_failed................: 0.00%   ✓ 0                    ✗ 7
     http_req_receiving.............: avg=18.87ms min=1.63ms  med=7.13ms  max=98.29ms p(90)=45.28ms p(95)=71.79ms
     http_req_sending...............: avg=4.27s   min=4.18s   med=4.29s   max=4.3s    p(90)=4.3s    p(95)=4.3s
     http_req_tls_handshaking.......: avg=0s      min=0s      med=0s      max=0s      p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=23.7ms  min=3.59ms  med=7.02ms  max=90.25ms p(90)=63.47ms p(95)=76.86ms
     http_reqs......................: 7       0.011439/s
     iteration_duration.............: avg=1m27s   min=1m16s   med=1m26s   max=1m37s   p(90)=1m35s   p(95)=1m36s
     iterations.....................: 7       0.011439/s
     vus............................: 1       min=1                  max=1
     vus_max........................: 1       min=1                  max=1

Server logs

Summary

Test case	Client avg duration	Server avg duration	Server CPU	Server Memory
sequence-test-1MB.js	145.95ms	133 ms	7.01%	74.2 MB
stream-test-1MB.js	136.42ms	122 ms	7.00%	78.6 MB
sequence-test-50MB.js	5.27s	5.24 s	4.53%	75.4 MB
stream-test-50MB.js	4.31s	4.28 s	3.50%	72.8 MB
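
To put the latency columns in relative terms, the short sketch below computes how much lower the average client latency was with stream hashing. The numbers are the avg http_req_duration values copied from the k6 client logs in this thread; `runs` and `reductionPercent` are hypothetical names, and the host labels are shorthand for the two machines described above.

```typescript
// Average http_req_duration values (ms) taken from the k6 client logs in this thread.
const runs = [
  { host: 'Xeon server', size: '1MB', sequenceMs: 39.81, streamMs: 32.25 },
  { host: 'Jetson Nano', size: '1MB', sequenceMs: 145.95, streamMs: 136.42 },
  { host: 'Jetson Nano', size: '50MB', sequenceMs: 5270, streamMs: 4310 },
];

// Relative reduction in average latency when switching from sequence to stream hashing.
function reductionPercent(sequenceMs: number, streamMs: number): number {
  return ((sequenceMs - streamMs) / sequenceMs) * 100;
}

for (const run of runs) {
  const reduction = reductionPercent(run.sequenceMs, run.streamMs).toFixed(1);
  console.log(`${run.host}, ${run.size}: ${reduction}% lower average latency with stream hashing`);
}
```

This works out to roughly 19% on the Xeon server at 1 MB, about 6.5% on the Jetson Nano at 1 MB, and about 18% on the Jetson Nano at 50 MB. The 50 MB runs completed only 7 iterations each, so that last figure rests on a small sample.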

alextran1502 · 2022-08-28T16:12:14Z

What does the client's average duration account for? upload time?

panoti · 2022-08-28T16:37:29Z

> What does the client's average duration account for? upload time?

It is the latency per upload request.

panoti added 5 commits August 23, 2022 22:31

feat(server): override multer storage

0613f1c

feat(server): calc sha1 of uploaded file

261f98e

feat(server): add checksum into asset

802c9c0

chore(server): add package-lock for mkdirp package

bc991fd

fix(server): free hash stream

376ff1f

alextran1502 reviewed Aug 23, 2022

server/libs/database/src/migrations/1661270339373-AddAssetChecksum.ts (outdated)

panoti added 2 commits August 23, 2022 23:25

chore(server): rollback this changes, not refactor here

6eaa666

refactor(server): re-arrange import statement

51b7a31

alextran1502 reviewed Aug 23, 2022

server/apps/immich/src/utils/disk-storage.ts (outdated)

panoti added 2 commits August 24, 2022 01:10

fix(server): make sure hash done before callback

1ee72d7

refactor(server): replace varchar to char for checksum, reserve pixelChecksum for future

02bf73f

alextran1502 requested review from jbaez, bo0tzz, matthinc and zackpollard August 24, 2022 03:31

alextran1502 reviewed Aug 24, 2022

server/apps/immich/src/utils/disk-storage.ts (outdated)

alextran1502 reviewed Aug 24, 2022

server/apps/immich/src/utils/disk-storage.ts (outdated)

fyfrey reviewed Aug 24, 2022

server/apps/immich/src/utils/disk-storage.ts (outdated)

server/libs/database/src/entities/asset.entity.ts (outdated)

server/libs/database/src/entities/exif.entity.ts (outdated)

panoti marked this pull request as draft August 25, 2022 17:02

panoti added 5 commits August 28, 2022 14:47

refactor(server): remove pixelChecksum

56d506c

Merge branch 'upstream' into feat/sha1-hashing

117115a

refactor(server): convert checksum from string to bytea

8c067a1

feat(server): add index to checksum

5ed15f6

refactor(): rollback package.json changes

e26c6ce

panoti marked this pull request as ready for review August 28, 2022 09:21

feat(server): remove uploaded file when progress fail

5832a3a

panoti linked an issue Aug 29, 2022 that may be closed by this pull request

[BUG] Failed Backup - Duplicate keys #415

Closed

4 tasks

panoti mentioned this pull request Aug 29, 2022

[BUG] Uploading from album not always working #387

Closed

4 tasks

This was linked to issues Aug 29, 2022

[BUG] Upload sames images multiples times create duplicates #248

Closed

[Feature] Prevent duplicate photos while logged in from two devices #404

Closed

[BUG] Uploading from album not always working #387

Closed

panoti removed a link to an issue Aug 29, 2022

[BUG] Uploading from album not always working #387

Closed

4 tasks

panoti changed the title from "feat(server): calculate sha1 checksum at the same time uploading asset" to "feat(server): calculate sha1 checksum" on Aug 30, 2022

feat(server): calculate hash in sequence

28ee8cd

panoti requested a review from alextran1502 August 30, 2022 18:20

alextran1502 merged commit b80dca7 into immich-app:main Aug 31, 2022

panoti deleted the feat/sha1-hashing branch August 31, 2022 15:07
