Improve performance during pre-commit --all (-a) run #309
Comments
In #290 ( |
For

- id: terrascan
  args:
    - --args=--non-recursive # avoids scan errors on subdirectories without Terraform config files
    - --args=--policy-type=azure

I noted that pre-commit-terraform/.pre-commit-hooks.yaml Line 110 in ac9299c
by adding
For testing purposes (meantime we found a solution to detect when
For example:

TEST_NUM=5
TEST_DIR='/home/carlos/devel/azure/terraform/test/terraform_big_repo'
TEST_DESCRIPTION='`terrascan` for big TF repo with pre-commit-terraform v1.62.3'
RAW_TEST_RESULTS_FILE_NAME='terrascan_big_terrfaform_repo'
# if you don't have a .tf file in the root directory, run the following:
touch ${TEST_DIR}/main-fake.tf
# set the TEST_COMMAND by defining a valid .tf file in the root directory
TEST_COMMAND="pre-commit try-repo --ref v1.62.3 https://github.com/antonbabenko/pre-commit-terraform terrascan --files ${TEST_DIR}/main-fake.tf"
# run the test
./hooks_performance_test.sh "$TEST_NUM" "$TEST_COMMAND" "$TEST_DIR" "$TEST_DESCRIPTION" "$RAW_TEST_RESULTS_FILE_NAME"

I made a test in a fake TF project (I copied and pasted the infra code a few times as you suggested) with 189 directories, 610 files:

5 runs '
Run details
Memory info ( MemTotal: 26034676 kB
MemFree: 20826292 kB
MemAvailable: 23280460 kB
Buffers: 232212 kB
Cached: 2866148 kB
SwapCached: 0 kB
CPU info:
Real procs: 4
Virtual (hyper-threading) procs: 8
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 140
model name : 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
stepping : 1
microcode : 0xffffffff
cpu MHz : 2803.198
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm avx512_vp2intersect flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs
bogomips : 5606.39
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management: |
The problem is not that hooks run recursively (at least, not in this issue); the problem is that the same check runs multiple times. Not sure why, but one recursive check works much faster than a for_each over the dirs in the repo, no matter, with
5 runs 'with
Run details
Memory info ( MemTotal: 12765352 kB
MemFree: 7107868 kB
MemAvailable: 9418472 kB
Buffers: 281768 kB
Cached: 2055276 kB
SwapCached: 0 kB
CPU info:
Real procs: 6
Virtual (hyper-threading) procs: 12
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2712.009
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 5
cpu cores : 6
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 5424.01
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
5 runs 'as is'
Run details
Memory info ( MemTotal: 12765352 kB
MemFree: 8346212 kB
MemAvailable: 10672940 kB
Buffers: 295600 kB
Cached: 2057220 kB
SwapCached: 0 kB
CPU info:
Real procs: 6
Virtual (hyper-threading) procs: 12
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2712.009
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 5
cpu cores : 6
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 5424.01
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
5 runs 'one recursive check'
Run details
Memory info ( MemTotal: 12765352 kB
MemFree: 8090392 kB
MemAvailable: 10445696 kB
Buffers: 303004 kB
Cached: 2073544 kB
SwapCached: 0 kB
CPU info:
Real procs: 6
Virtual (hyper-threading) procs: 12
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2712.009
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 5
cpu cores : 6
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 5424.01
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
The main goal of this issue is to find out how to check which args were specified to
|
I noted that inspecting by directories with
I think we should avoid the
terrascan scan -i terraform -t azure --non-recursive
2021-12-27T18:56:45.841+0100 error cli/run.go:132 scan run failed{error 26 0 1 error occurred:
* directory '/home/carlos/devel/azure/terraform/test/terraform_big_repo' has no terraform config files
Regarding this, I opened an issue in the pre-commit project to find out how we can check for
See pre-commit/pre-commit#2172 (comment) for details.
I have implemented a basic solution around that (I still need to run some tests to see whether all possible cases are covered and handled properly):
@MaxymVlasov in the meantime, could you test that branch against your repo to see if the performance has improved? I will be back later with some performance tests and more comments. |
Yeah, I double-checked
@carlosbustillordguez super! You found the solution (227f620), let's make a reusable implementation.
Test results
10 runs '
Run details
Memory info ( MemTotal: 12765352 kB
MemFree: 6614128 kB
MemAvailable: 9193352 kB
Buffers: 280404 kB
Cached: 2315700 kB
SwapCached: 0 kB
CPU info:
Real procs: 6
Virtual (hyper-threading) procs: 12
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2712.010
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 5
cpu cores : 6
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 5424.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
10 runs '
Run details
Memory info ( MemTotal: 12765352 kB
MemFree: 6633428 kB
MemAvailable: 9213592 kB
Buffers: 281244 kB
Cached: 2315700 kB
SwapCached: 0 kB
CPU info:
Real procs: 6
Virtual (hyper-threading) procs: 12
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2712.010
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 5
cpu cores : 6
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 5424.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
10 runs 'v1.62.3
Run details
Memory info ( MemTotal: 12765352 kB
MemFree: 6603292 kB
MemAvailable: 9188652 kB
Buffers: 283984 kB
Cached: 2317136 kB
SwapCached: 0 kB
CPU info:
Real procs: 6
Virtual (hyper-threading) procs: 12
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2712.010
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 5
cpu cores : 6
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 5424.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
10 runs 'v1.62.3
Run details
Memory info ( MemTotal: 12765352 kB
MemFree: 6621388 kB
MemAvailable: 9212708 kB
Buffers: 289692 kB
Cached: 2317140 kB
SwapCached: 0 kB
CPU info:
Real procs: 6
Virtual (hyper-threading) procs: 12
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 165
model name : Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2712.010
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 5
cpu cores : 6
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips : 5424.02
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

After the holidays, #310 will be merged, so I suggest relying on the structure introduced in #310 and the future "The next big thing" look, described in the description. So, what I propose:

# Check that the (optional) function is defined
# and that the hook is run via `pre-commit run --all`.
# (A comment placed after a trailing backslash would break the condition,
# so both comments sit above the `if`.)
if [[ $(type -t run_hook_on_whole_repo) == function ]] &&
  [[ $(common::is_hook_run_on_whole_repo "${files[@]}") ]]; then
  run_hook_on_whole_repo "$args"
  exit 0
fi

Where:
|
HOOK_ID=terraform_docs
files=$(awk -v HOOK_ID=$HOOK_ID '{if (match($0, "^- id: " HOOK_ID "$")) found_start=1; if (found_start == 1 && NF == 0) exit; if (found_start == 1 && $1 == "files:") {print $2; exit}}' .pre-commit-hooks.yaml)

or to retain readability though to replace

HOOK_ID=terraform_docs
files=$(sed -n "/^- id: $HOOK_ID$/,/^$/p" .pre-commit-hooks.yaml | awk '$1 == "files:" {print $2}')
|
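To try the sed/awk variant above without touching a real checkout, here is a minimal sketch. The sample hook definitions and the helper name get_hook_files_pattern are made up for illustration; the real entries live in the repo's .pre-commit-hooks.yaml and may look different.

```shell
#!/usr/bin/env bash
# Made-up sample standing in for .pre-commit-hooks.yaml (assumption, not the real file).
hooks_file=$(mktemp)
cat >"$hooks_file" <<'EOF'
- id: terraform_fmt
  name: Terraform fmt
  files: (\.tf|\.tfvars)$

- id: terraform_docs
  name: Terraform docs
  files: (\.tf|\.terraform\.lock\.hcl)$
EOF

# Hypothetical wrapper around the sed/awk pipeline from the comment above:
# print the block from "- id: <hook>" up to the next empty line (or end of file),
# then keep only the value of its "files:" key.
get_hook_files_pattern() {
  local hook_id=$1 file=$2
  sed -n "/^- id: ${hook_id}\$/,/^\$/p" "$file" | awk '$1 == "files:" {print $2}'
}

pattern=$(get_hook_files_pattern terraform_docs "$hooks_file")
echo "$pattern"
rm -f "$hooks_file"
```

This relies on hook entries being separated by an empty line and on the pattern containing no spaces, which is the same assumption the one-liners above make.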
@MaxymVlasov great! I will be back tomorrow with the changes. Should I wait for the merge of #310 to send my contribution?
Yes, that applies to
SCRIPT_DIR="$(dirname "$(realpath "${BASH_SOURCE[0]}")")"
As used here: pre-commit-terraform/terrascan.sh Lines 10 to 18 in d19a0e3
Maybe for "The next big thing" we will need to use the above approach to source the
@yermulnik thanks for the above code to catch the "files"!! |
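As a rough illustration of the SCRIPT_DIR approach mentioned above, the sketch below builds a throwaway hook script next to a shared library and sources the library relative to the script's own location. The file names (hook.sh, lib/_common.sh) and the function common::hello are hypothetical and are not the project's actual layout.

```shell
#!/usr/bin/env bash
# Throwaway demo directory: a hook script plus a library next to it (hypothetical layout).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/lib"

cat >"$demo_dir/lib/_common.sh" <<'EOF'
common::hello() { echo "loaded from lib"; }
EOF

cat >"$demo_dir/hook.sh" <<'EOF'
#!/usr/bin/env bash
# Resolve the directory of this script, following symlinks via realpath.
SCRIPT_DIR="$(dirname "$(realpath "${BASH_SOURCE[0]}")")"
# Source the shared functions relative to the script, not to the caller's cwd.
. "$SCRIPT_DIR/lib/_common.sh"
common::hello
EOF

# Run the hook from an unrelated working directory; sourcing still works.
output=$(cd / && bash "$demo_dir/hook.sh")
echo "$output"
rm -rf "$demo_dir"
```

Because the path is derived from BASH_SOURCE rather than the working directory, the hook finds its library no matter where pre-commit invokes it from.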
@carlosbustillordguez For what it's worth:
|
No need to create unreadable |
@MaxymVlasov I have implemented your suggested changes for
The implementation can read the included and excluded files from
As commented above I had to use the
Regarding global variables, should we pass them as arguments to the needed function or consume them directly inside the function? I mean for |
Hey @carlosbustillordguez Pro tip: we are looking forward to using BASH Google Style Guide, the documentation part is already done. |
Hi @MaxymVlasov! Sorry for the delay. Great! I can start working this afternoon/night. |
This issue has been resolved in version 1.63.0 🎉 |
This issue has been resolved in version 1.64.0 🎉 |
Some hooks recursively check all files in the provided dir.
So performance degradation exists only in the pre-commit run --all situation, because pre-commit provides all existing repo files to the hook:
pre-commit-terraform/terrascan.sh Lines 15 to 19 in e6ffbcd
Then the unique paths are found and terrascan is run for each repo folder:
pre-commit-terraform/terrascan.sh Lines 29 to 30 in e6ffbcd
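The per-folder flow described above can be sketched roughly as follows. This is not the code from terrascan.sh; the file list is invented and the tool invocation is replaced with an echo so the sketch runs anywhere.

```shell
#!/usr/bin/env bash
# Invented file list standing in for what pre-commit passes to the hook.
files=(main.tf modules/vpc/main.tf modules/vpc/outputs.tf modules/db/main.tf)

# Reduce the file list to the unique directories that contain those files.
mapfile -t dirs < <(for f in "${files[@]}"; do dirname "$f"; done | sort -u)

# One tool invocation per directory; echo stands in for the real scan command.
for d in "${dirs[@]}"; do
  echo "would scan: $d"
done
```

With `pre-commit run --all` the file list covers the whole repo, so a recursive tool ends up being started once for every directory even though a single run from the repo root would cover the same files.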
It works literally how it should work: it checks only diffs.
So it would be good to detect when the --all (-a) argument is passed to pre-commit and just run terrascan -d GIT_REPO_ROOT instead of running it for every directory.
Useful info
How to run hook performance test
pre-commit automatically parallelizes checks across the existing cores, so you need to run tests on a repo that has at least 2x more tf-dirs than you have CPU cores. If your repo is not that big, just copy-paste the code structure a few times and you'll get what you need.
Create the solution as a function that can be called from the terrascan_() hook function and that, depending on the result, in an if runs the other flow and stops execution with exit 0
checkov runs checks recursively; needs -d GIT_REPO_ROOT_PATH. feat: Add terraform_checkov, that run per folder. Deprecate checkov hook #290
tfsec runs recursively without any additional args. feat: Improved speed of pre-commit run -a for multiple hooks #338
terragrunt_fmt (terragrunt hclfmt) runs recursively without any additional args. feat: Improved speed of pre-commit run -a for multiple hooks #338
terrascan runs recursively without any additional args. feat: terrascan - Improve performance during pre-commit --all (-a) run #327
terragrunt_validate has the option terragrunt run-all validate, which runs checks recursively. feat: Improved speed of pre-commit run -a for multiple hooks #338
Found in #305
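One possible way to detect the whole-repo case asked for above is to compare the files handed to the hook with the full list of repo files. The sketch below is only an assumption about how such a check could look; the function name is_whole_repo_run is hypothetical and this is not the implementation that was eventually merged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the files given as arguments are exactly
# the files listed (one per line) on stdin, e.g. the output of `git ls-files`.
is_whole_repo_run() {
  local all_files passed_files
  all_files=$(sort -u)                          # reference list from stdin
  passed_files=$(printf '%s\n' "$@" | sort -u)  # files the hook received
  [[ "$all_files" == "$passed_files" ]]
}

# Fixed list standing in for `git ls-files` output, so the demo needs no repo.
tracked=$'main.tf\nmodules/vpc/main.tf\nmodules/vpc/outputs.tf'

if is_whole_repo_run main.tf modules/vpc/outputs.tf modules/vpc/main.tf <<<"$tracked"; then
  echo "whole repo: run the tool once from the repo root"
fi
if ! is_whole_repo_run main.tf <<<"$tracked"; then
  echo "partial run: check only the affected directories"
fi
```

In a real hook the reference list would presumably also have to be filtered with the hook's own include and exclude patterns, since pre-commit only passes files that match them.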