Make system testing github actions run faster #74

Open
ManavalanG opened this issue May 31, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@ManavalanG
Member

The GitHub Actions workflow for system testing ran for ~0.5 hr, even though it ran only one of the 4 system tests (WGS mode with no prior QC data). Potential ways to make it run faster are listed below.

During the development drop-in hours we discussed making it faster by caching the dependencies, but it was decided that it is not worthwhile at the moment to spend time on optimizing the workflow. The following were discussed during this meeting:

  • Preliminary review of workflow to identify bad practices
  • Reduce runtime where possible
  • Use caching? (both for datasets and tool dependencies)
  • Any side-effects of such long workflows - both for the repo as well as to our github org
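For reference, the caching idea from the list above could look roughly like the fragment below. This is only a sketch using the standard `actions/cache` action; the job name, the cache path `~/.singularity-cache`, and the `configs/containers.yaml` file used in the key are all assumptions for illustration, not paths taken from this repo.

```yaml
# Hypothetical workflow fragment: job name, cache path and key file are assumptions.
jobs:
  system-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Restore previously pulled singularity images, if a matching cache exists.
      - name: Cache singularity images
        uses: actions/cache@v3
        with:
          path: ~/.singularity-cache          # assumed image location
          key: singularity-${{ runner.os }}-${{ hashFiles('configs/containers.yaml') }}
          restore-keys: |
            singularity-${{ runner.os }}-
```

The key changes whenever the (assumed) container config file changes, so stale images are not reused. Whether this saves meaningful time depends on how large the images are compared with how long they take to pull.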
@ManavalanG added the bug and enhancement labels, then removed the bug label, on May 31, 2023
@ManavalanG
Member Author

I have since added the remaining 3 of the 4 system tests, and the system testing workflow took about 1.5 hrs. Note that the 4 system test jobs run serially on a single server/runner. In this scenario, the first system test sets up the necessary singularity containers, so the remaining 3 system tests do not need to set them up again, as they are already present on the server.

@ManavalanG
Member Author

I had a discussion with James about this setup of 4 system testing jobs taking so long (1.5 hrs). His recommendation was to run these jobs in parallel instead of serially.

jobs:
  job1:
  job2:
  job3:

My concern is that such a setup will end up using more resources, since the dependencies would be set up on 4 different servers before the system tests run. This is not a strong argument though.
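One way to express the parallel setup without writing out each job by hand is a build matrix. The fragment below is a sketch, not the workflow used in this repo: it assumes the 4 system tests are the combinations of two modes (WGS, exome) and two prior-QC settings, and `run_system_test.sh` is a hypothetical wrapper script.

```yaml
# Hypothetical workflow fragment: matrix values and script name are assumptions.
jobs:
  system-test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false        # let the other tests finish even if one fails
      matrix:
        mode: [wgs, exome]
        prior_qc: [with-prior-qc, no-prior-qc]
    steps:
      - uses: actions/checkout@v3

      # Each of the 4 matrix combinations runs on its own runner, in parallel.
      # run_system_test.sh is a placeholder for the real test command.
      - name: Run one system test
        run: ./run_system_test.sh "${{ matrix.mode }}" "${{ matrix.prior_qc }}"
```

With this layout, every runner sets up the singularity containers on its own, which is the extra resource cost mentioned above. Wall-clock time would drop to roughly that of the slowest single test plus setup.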

@ManavalanG
Member Author

Note that the major bottleneck at the moment is not setting up the dependencies but running QuaC itself. It takes ~16-20 mins even when the singularity containers are already set up.


Among the jobs triggered by the QuaC workflow, picard takes far longer to finish than the rest and is the major cause of the bottleneck. Here are the walltime stats for a QuaC workflow run (exome mode, prior QC mode) on Slurm (Cheaha).

| job_id | job_name | walltime (h:mm:ss) |
|---|---|---|
| 20313718 | QuaC.picard_collect_wgs_metrics.12 | 0:06:07 |
| 20313735 | QuaC.picard_collect_wgs_metrics.10 | 0:05:42 |
| 20313761 | QuaC.somalier_ancestry.4 | 0:01:21 |
| 20313728 | QuaC.qualimap_bamqc.7 | 0:00:41 |
| 20313733 | QuaC.qualimap_bamqc.8 | 0:00:41 |
| 20313723 | QuaC.picard_collect_multiple_metrics.9 | 0:00:23 |
| 20313724 | QuaC.picard_collect_multiple_metrics.11 | 0:00:23 |
| 20313794 | QuaC.multiqc_by_sample_initial_pass.24 | 0:00:13 |
| 20313797 | QuaC.multiqc_by_sample_initial_pass.28 | 0:00:12 |
| 20313731 | QuaC.somalier_extract.2 | 0:00:11 |
| 20313722 | QuaC.somalier_extract.3 | 0:00:10 |
| 20313799 | QuaC.quac_watch.27 | 0:00:10 |
| 20313803 | QuaC.multiqc_by_sample_final_pass.26 | 0:00:10 |
| 20313804 | QuaC.multiqc_aggregation_all_samples.29 | 0:00:10 |
| 20313732 | QuaC.samtools_stats.6 | 0:00:09 |
| 20313734 | QuaC.bcftools_stats.19 | 0:00:09 |
| 20313798 | QuaC.quac_watch.23 | 0:00:09 |
| 20313800 | QuaC.multiqc_by_sample_final_pass.22 | 0:00:07 |
| 20313729 | QuaC.bcftools_index.21 | 0:00:06 |
| 20313760 | QuaC.somalier_relate.1 | 0:00:06 |
| 20313721 | QuaC.verifybamid.16 | 0:00:05 |
| 20313725 | QuaC.samtools_stats.5 | 0:00:05 |
| 20313726 | QuaC.bcftools_index.20 | 0:00:05 |
| 20313727 | QuaC.mosdepth_coverage.13 | 0:00:05 |
| 20313730 | QuaC.verifybamid.17 | 0:00:05 |
| 20313736 | QuaC.mosdepth_coverage.14 | 0:00:05 |
| 20313719 | QuaC.aggregate_sample_rename_configs.30 | 0:00:04 |
| 20313720 | QuaC.bcftools_stats.18 | 0:00:04 |
| 20313737 | QuaC.mosdepth_plot.15 | 0:00:02 |

In the GitHub Actions workflow for the comparable system test, the picard_collect_wgs_metrics job took about 16 mins to complete, compared with about 6 mins on Cheaha.
