
Issue #519: Fix drupal:import-db and drupal:export-db don't compress data. #581

Conversation

alexanderpatriciop
Collaborator

Relates to:

Description:

  • Fix: drupal:import-db and drupal:export-db don't compress data.

@github-actions github-actions bot temporarily deployed to pantheon-pr-581 May 27, 2024 17:13 Destroyed
@alexanderpatriciop
Collaborator Author

On the env where I tested, the drupal:import-db task deletes the .gz file after importing if it is compressed, so I made a copy to preserve the file. (I also added the --file-delete parameter because in other environments it may not clean up by itself.)
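The copy-before-import workaround described above can be sketched in shell. This is an illustration, not the exact task YAML: the $DB_DIR path, dump name, and .bak suffix are assumptions, and the drush call is left commented out since it needs a real project.

```shell
# Hedged sketch of preserving the dump before an import that may delete it.
DB_DIR=./db-demo
mkdir -p "$DB_DIR"
printf 'SELECT 1;\n' > "$DB_DIR/db.sql"
gzip -f "$DB_DIR/db.sql"                        # produces db.sql.gz, removes db.sql
cp "$DB_DIR/db.sql.gz" "$DB_DIR/db.sql.gz.bak"  # keep a copy so the import can't lose the dump
# ./vendor/bin/drush sql:query --file="$DB_DIR/db.sql.gz" --file-delete   # may remove db.sql.gz
echo "backup kept: $DB_DIR/db.sql.gz.bak"
```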

@justafish
Member

Thanks for the PR @alexanderpatriciop !

Is this the bug you're seeing? drush-ops/drush#5377
If so we could use the workaround suggested in that issue, as well as re-open it with your steps to reproduce the unwanted behaviour

@github-actions github-actions bot temporarily deployed to pantheon-pr-581 May 28, 2024 21:39 Destroyed
@alexanderpatriciop
Collaborator Author

@justafish Thank you for the feedback; you are right, it's the same bug. I added the suggested workaround in the last commit, but I noticed it has an impact on performance (at least on my local environment): with the first approach (copying the .gz file), importing the database takes around 40 seconds, while with the latest approach it takes around 4 minutes.
We may need to revert the last commit. Please let me know your thoughts.

@deviantintegral
Member

That's very odd to see such a difference using a pipe like that. Do you have testing or reproduction steps others can use to check?

@alexanderpatriciop
Collaborator Author

  • Drupal => 10.2.6
  • lullabot/drainpipe => ^3.8.0
  • drush/drush => ^12.5.2

Apply the patch https://github.com/Lullabot/drainpipe/commit/3b3bb598b96bdf73deb00d46c0900903c4337926.diff

  • Run the commands
    • ddev task drupal:export-db
    • ddev task drupal:import-db

Revert the previous patch and apply https://patch-diff.githubusercontent.com/raw/Lullabot/drainpipe/pull/581.diff

  • Run the commands
    • ddev task drupal:export-db
    • ddev task drupal:import-db

And compare the execution time.

tasks/drupal.yml Outdated
@@ -44,15 +44,15 @@ tasks:
      - echo "🚮 Dropping existing database"
      - ./vendor/bin/drush {{ .site }} sql:drop --yes
      - echo "📰 Importing database"
-     - ./vendor/bin/drush {{ .site }} sql:query --file=$DB_DIR/db.sql.gz
+     - gunzip < $DB_DIR/db.sql.gz | ./vendor/bin/drush {{ .site }} sql:cli
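The new line in the hunk above streams the decompressed dump into the SQL client's stdin instead of handing Drush the .gz path. A minimal stand-alone illustration of that pipe, with cat standing in for ./vendor/bin/drush sql:cli (which isn't available outside a project) and a toy dump file:

```shell
# Demonstrates the gunzip-pipe pattern from the diff above.
printf 'CREATE TABLE t (id INT);\n' > db.sql
gzip -f db.sql                 # leaves only db.sql.gz
gunzip < db.sql.gz | cat       # stream decompressed SQL to the client's stdin
# → CREATE TABLE t (id INT);
```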
Member


A few years ago @m4olivei suggested using pigz instead of gunzip because it had significant performance benefits.

https://www.clouvider.com/knowledge_base/compression-options-in-linux-gzip-vs-pigz/


Ohh, thanks for linking that article. I never did learn the reason why pigz was faster, just that it was 💡

Member


I don't think pigz is in the default ddev image, so we'd need to add a Dockerfile to install it. Which IMO is fine, we do it on most of our projects anyways.

I do wonder if it's worth asking about adding it to the default web images in ddev upstream.
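For reference, a sketch of what adding pigz could look like. ddev picks up extra web-container build steps from a .ddev/web-build/Dockerfile fragment (no FROM line needed); the apt package name pigz is an assumption about the image's package sources.

```shell
# Hedged sketch: install pigz into the ddev web image via a custom Dockerfile fragment.
mkdir -p .ddev/web-build
cat > .ddev/web-build/Dockerfile <<'EOF'
RUN apt-get update \
    && apt-get install -y --no-install-recommends pigz \
    && rm -rf /var/lib/apt/lists/*
EOF
```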

Collaborator Author


@mrdavidburns
I tried pigz with the following command:

  • pigz -dc file.gz

But the execution time was more than 2 minutes, so instead I unzipped the file before importing with sql:query, and that worked fine. The execution time is the same with pigz and gunzip, so I kept gunzip, since pigz would require adding a Dockerfile to install it, as @deviantintegral mentioned.
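The decompress-first variant described above might look like the following. Paths are illustrative; -k (keep the .gz) assumes GNU gzip 1.6 or later, and the drush call is commented out since it needs a real project.

```shell
# Sketch of decompressing the dump before import instead of piping it.
printf 'SELECT 1;\n' > db.sql
gzip -f db.sql                 # leaves db.sql.gz only
gunzip -kf db.sql.gz           # recreates db.sql, keeps db.sql.gz for later imports
# ./vendor/bin/drush sql:query --file=db.sql
```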

Member


@alexanderpatriciop Were you trying pigz with an existing project or just a fresh site install?

Collaborator Author


I tried it with an existing project.

Member


I was trawling my workstation for larger databases to test, and surprisingly I don't have any! The largest at hand was 130MB compressed.

$ hyperfine 'gunzip -dkc database.sql.gz > /dev/null' 'pigz -dkc database.sql.gz > /dev/null'
Benchmark 1: gunzip -dkc database.sql.gz > /dev/null
  Time (mean ± σ):     334.3 ms ±   1.2 ms    [User: 323.5 ms, System: 9.4 ms]
  Range (min … max):   332.9 ms … 336.2 ms    10 runs

Benchmark 2: pigz -dkc database.sql.gz > /dev/null
  Time (mean ± σ):     409.8 ms ±   2.2 ms    [User: 403.8 ms, System: 103.2 ms]
  Range (min … max):   407.0 ms … 412.9 ms    10 runs

Summary
  gunzip -dkc database.sql.gz > /dev/null ran
    1.23 ± 0.01 times faster than pigz -dkc database.sql.gz > /dev/null

I then manually blew it up to ~400MB, expecting to see better improvements:

$ hyperfine 'gunzip -dkc database2.sql.gz > /dev/null' 'pigz -dkc database2.sql.gz > /dev/null'
Benchmark 1: gunzip -dkc database2.sql.gz > /dev/null
  Time (mean ± σ):      1.674 s ±  0.004 s    [User: 1.621 s, System: 0.043 s]
  Range (min … max):    1.664 s …  1.678 s    10 runs

Benchmark 2: pigz -dkc database2.sql.gz > /dev/null
  Time (mean ± σ):      2.041 s ±  0.008 s    [User: 1.996 s, System: 0.552 s]
  Range (min … max):    2.026 s …  2.053 s    10 runs

Summary
  gunzip -dkc database2.sql.gz > /dev/null ran
    1.22 ± 0.01 times faster than pigz -dkc database2.sql.gz > /dev/null

No decompression difference either. When I was compressing the test file though, pigz was by far faster:

$ hyperfine 'gzip -kc database2.sql > /dev/null' 'pigz -kc database2.sql > /dev/null'
Benchmark 1: gzip -kc database2.sql > /dev/null
  Time (mean ± σ):     31.556 s ±  0.257 s    [User: 31.100 s, System: 0.176 s]
  Range (min … max):   31.266 s … 32.152 s    10 runs

Benchmark 2: pigz -kc database2.sql > /dev/null
  Time (mean ± σ):      4.462 s ±  0.183 s    [User: 33.747 s, System: 0.765 s]
  Range (min … max):    4.184 s …  4.774 s    10 runs

Summary
  pigz -kc database2.sql > /dev/null ran
    7.07 ± 0.30 times faster than gzip -kc database2.sql > /dev/null

What this tells me is that either something changed in the move to Apple Silicon, gzip itself has had performance improvements in recent years, or pigz has regressed, at least as far as decompression goes.

So, +1 to keeping gzip here, and as the default for what Drainpipe ships until we have a clear 80% use case that can benefit from pigz.

@github-actions github-actions bot temporarily deployed to pantheon-pr-581 July 17, 2024 22:06 Destroyed
@mrdavidburns mrdavidburns merged commit d52e82c into main Jul 22, 2024
36 checks passed
@mrdavidburns mrdavidburns deleted the 519--drupal-import-db-and-drupal-export-db-dont-compress-data branch July 22, 2024 14:37
Successfully merging this pull request may close these issues.

drupal:import-db and drupal:export-db don't compress data, though they use .sql.gz filenames