Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve perfomance during pre-commit --all (-a) run #309

Closed
5 tasks done
MaxymVlasov opened this issue Dec 22, 2021 · 15 comments · Fixed by #327 or #338
Closed
5 tasks done

Improve perfomance during pre-commit --all (-a) run #309

MaxymVlasov opened this issue Dec 22, 2021 · 15 comments · Fixed by #327 or #338
Labels
estimate/4h Need 4 hours to be done feature New feature or request good first issue Good for newcomers hook/terraform_checkov Bash hook hook/terraform_tfsec Bash hook hook/terragrunt_fmt Bash hook hook/terragrunt_validate Bash hook hook/terrascan Bash hook

Comments

@MaxymVlasov
Copy link
Collaborator

MaxymVlasov commented Dec 22, 2021

Some hooks recursively check all files in provided dir.

So performance degradation exists only in the pre-commit run --all situation, because it will provide all existing repo files to hook:

# consume modified files passed from pre-commit so that
# terrascan runs against only those relevant directories
for file_with_path in "${files[@]}"; do
file_with_path="${file_with_path// /__REPLACED__SPACE__}"
paths[index]=$(dirname "$file_with_path")

Then, unique paths are found and run terrascan for each repo folder:

# for each path run terrascan
for path_uniq in $(echo "${paths[*]}" | tr ' ' '\n' | sort -u); do

It works literally how it should work: checks only diffs.

So, good to know when the --all (-a) argument passed to pre-commit and just run terrascan -d GIT_REPO_ROOT, not all-all dirs.


Useful info

  • How to run hook performance test

  • pre-commit automatically parallel checks to exiting cores, so you need to run tests on a repo that has at least 2x more tf-dirs than CPU cores you have. If you have not so big repo - just copy-paste code-structure a few times, and you'll get needed.

  • Create solution as function, that can be called from terrascan_() hook function and depends on the result, in if run other flow and stop execution with exit 0



Founded in #305

@MaxymVlasov MaxymVlasov added feature New feature or request estimate/4h Need 4 hours to be done hook/terrascan Bash hook labels Dec 22, 2021
@MaxymVlasov MaxymVlasov added the good first issue Good for newcomers label Dec 22, 2021
@MaxymVlasov
Copy link
Collaborator Author

In #290 (checkov) we see the same issue.

@MaxymVlasov MaxymVlasov changed the title [terrascan] Improve perfomance during pre-commit --all (-a) run Improve perfomance during pre-commit --all (-a) run Dec 23, 2021
@carlosbustillordguez
Copy link
Contributor

carlosbustillordguez commented Dec 23, 2021

For terrascan run the recursive scan can be disabled by passing the --non-recursive, which does not scan directories and modules recursively. Can be defined as an argument in the hook settings:

- id: terrascan
  args:
    - --args=--non-recursive # avoids scan errors on subdirectories without Terraform config files
    - --args=--policy-type=azure

I noted that pre-commit try-repo ignores the .pre-commit-config.yaml config file, so the recursive scan (if what we want) could be disabled in the hook settings:

entry: terrascan.sh

by adding --args=--non-recursive argument at the end of the line.

For testing purposes (meantime we found a solution to detect when --all | -a was passed in pre-commit), I think we can use the --files argument (pre-commit try-repo supports all available options for pre-commit run).

For example:

TEST_NUM=5
TEST_DIR='/home/carlos/devel/azure/terraform/test/terraform_big_repo'
TEST_DESCRIPTION='`terrascan` for big TF repo with pre-commit-terraform v1.62.3'
RAW_TEST_RESULTS_FILE_NAME='terrascan_big_terrfaform_repo'

# if you don't have a .tf file in the root directory, run the following:
touch ${TEST_DIR}/main-fake.tf

# set the TEST_COMMAND by defining a valid .tf file in the root directory
TEST_COMMAND="pre-commit try-repo --ref v1.62.3 https://github.com/antonbabenko/pre-commit-terraform terrascan --files ${TEST_DIR}/main-fake.tf"

# run the test
./hooks_performance_test.sh "$TEST_NUM" "$TEST_COMMAND" "$TEST_DIR" "$TEST_DESCRIPTION" "$RAW_TEST_RESULTS_FILE_NAME"

I made a test in a fake TF project (I copied and pasted the infra code a few times as you suggested) with 189 directories, 610 files:

5 runs 'terrascan for big TF repo with pre-commit-terraform v1.62.3'

time command max min mean median
users seconds 6.67 6.08 6.476 6.56
system seconds 0.4 0.3 0.358 0.37
CPU % 165 159 162.6 163
Total time 4.33 3.89 4.192 4.29
Run details
  • Test Start: Thu Dec 23 09:12:35 UTC 2021
  • Test End: Thu Dec 23 09:12:56 UTC 2021
Variable name Value
TEST_NUM 5
TEST_COMMAND pre-commit try-repo --ref v1.62.3 https://github.com/antonbabenko/pre-commit-terraform terrascan --files /home/carlos/devel/azure/terraform/test/terraform_big_repo/main-fake.tf
TEST_DIR /home/carlos/devel/azure/terraform/test/terraform_big_repo
TEST_DESCRIPTION 5 runs 'terrascan for big TF repo with pre-commit-terraform v1.62.3'
RAW_TEST_RESULTS_FILE_NAME terrascan_big_terrfaform_repo

Memory info (head -n 6 /proc/meminfo):

MemTotal:       26034676 kB
MemFree:        20826292 kB
MemAvailable:   23280460 kB
Buffers:          232212 kB
Cached:          2866148 kB
SwapCached:            0 kB

CPU info:

Real procs: 4
Virtual (hyper-threading) procs: 8

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 140
model name	: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
stepping	: 1
microcode	: 0xffffffff
cpu MHz		: 2803.198
cache size	: 12288 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm avx512_vp2intersect flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs
bogomips	: 5606.39
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

@MaxymVlasov
Copy link
Collaborator Author

The problem is not that hooks run recursively (at least, not in this issue), problem is that the same check runs multiply times.

Not sure why, but 1 recursive check works much faster than for_each dir in the repo, no matter, with --non-recursive or without.

5 runs 'with --non-recursive'

time command max min mean median
users seconds 130.08 116.22 121.908 121.41
system seconds 8.51 7.01 7.532 7.41
CPU % 165 160 162 161
Total time 83.68 77.01 79.642 79.56
Run details
  • Test Start: Thu Dec 23 14:35:52 UTC 2021
  • Test End: Thu Dec 23 14:42:30 UTC 2021
Variable name Value
TEST_NUM 5
TEST_COMMAND pre-commit try-repo -a /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan
TEST_DIR /home/vm/code/Oslo
TEST_DESCRIPTION 5 runs 'with --non-recursive'
RAW_TEST_RESULTS_FILE_NAME pr290

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         7107868 kB
MemAvailable:    9418472 kB
Buffers:          281768 kB
Cached:          2055276 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.009
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.01
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

5 runs 'as is'

time command max min mean median
users seconds 130.08 105.84 115.695 114.535
system seconds 8.51 4.89 6.566 6.605
CPU % 168 160 163.5 163.5
Total time 83.68 67.76 74.619 74.68
Run details
  • Test Start: Thu Dec 23 16:58:08 UTC 2021
  • Test End: Thu Dec 23 17:03:56 UTC 2021
Variable name Value
TEST_NUM 5
TEST_COMMAND pre-commit try-repo -a /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan
TEST_DIR /home/vm/code/Oslo
TEST_DESCRIPTION 5 runs 'as is'
RAW_TEST_RESULTS_FILE_NAME pr290

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         8346212 kB
MemAvailable:   10672940 kB
Buffers:          295600 kB
Cached:          2057220 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.009
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.01
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

5 runs 'one recurcive check'

time command max min mean median
users seconds 10.25 9.24 9.682 9.44
system seconds 1.49 1.07 1.316 1.38
CPU % 94 85 91 91
Total time 12.44 11.64 12.016 11.93
Run details
  • Test Start: Thu Dec 23 17:15:22 UTC 2021
  • Test End: Thu Dec 23 17:16:22 UTC 2021
Variable name Value
TEST_NUM 5
TEST_COMMAND pre-commit try-repo -a /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan
TEST_DIR /home/vm/code/Oslo
TEST_DESCRIPTION 5 runs 'one recurcive check'
RAW_TEST_RESULTS_FILE_NAME pr290

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         8090392 kB
MemAvailable:   10445696 kB
Buffers:          303004 kB
Cached:          2073544 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.009
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.01
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

The main goal of this issue is to find how to check what args specified to pre-commit, not to hook.

This means that need to check pre-commit.com docs, maybe code, and if needed - open issue to repo.
And then write a check that will check is --all or -a passed and return true of false

@carlosbustillordguez
Copy link
Contributor

Not sure why, but 1 recursive check works much faster than for_each dir in the repo, no matter, with --non-recursive or without.

I noted that inspecting by directories with terrascan scan is much faster than scanning each file (terrascan scan by default inspect directory but also supports inspections by files). You are right with the above pointing.

I think we should avoid the --non-recursive argument, otherwise we can get the following error when no .tf files are present in the root directory:

terrascan scan -i terraform -t azure --non-recursive
2021-12-27T18:56:45.841+0100    error   cli/run.go:132  scan run failed{error 26 0  1 error occurred:
        * directory '/home/carlos/devel/azure/terraform/test/terraform_big_repo' has no terraform config files

The main goal of this issue is to find how to check what args specified to pre-commit, not to hook.
This means that need to check pre-commit.com docs, maybe code, and if needed - open issue to repo.
And then write a check that will check is --all or -a passed and return true of false

Regarding this, I opened an issue in the pre-comit project to know how we can check for --all | -a inside the hook, but:

there isn't a way to do that, somewhat intentionally -- the hook shouldn't care about how it is called, just that it receives a list of filenames

See pre-commit/pre-commit#2172 (comment) for details.

I have implemented a basic solution around that (I need to make some tests to see if all possible cases are covered and handled properly):

https://github.com/carlosbustillordguez/pre-commit-terraform/blob/227f6203e849b97a24daae54d99bee100eae7a36/terrascan.sh#L21-L29

@MaxymVlasov meantime could you test that branch against your repo to see if the performance is improved?

I will back later, with some performance tests and more comments.

@MaxymVlasov
Copy link
Collaborator Author

MaxymVlasov commented Dec 28, 2021

Yeah, I double-checked pre-commit code and am pretty sure that to add needed by us logic is pretty challenging and will introduce breaking changes or add not very intuitive crutches.

@carlosbustillordguez super! You found the solution (227f620), let's make a reusable realization.

Test results

10 runs 'fix/improve-perfomance-during-pre-commit-run-all pre-commit run --all:'

time command max min mean median
users seconds 7.84 7.16 7.583 7.62
system seconds 0.38 0.2 0.292 0.3
CPU % 136 117 126 124.5
Total time 6.83 5.71 6.249 6.27
Run details
  • Test Start: Mon Dec 27 23:03:20 UTC 2021
  • Test End: Mon Dec 27 23:04:22 UTC 2021
Variable name Value
TEST_NUM 10
TEST_COMMAND pre-commit try-repo -a /tmp/issue309 terrascan
TEST_DIR /home/vm/code/Oslo
TEST_DESCRIPTION 10 runs 'fix/improve-perfomance-during-pre-commit-run-all pre-commit run --all:'
RAW_TEST_RESULTS_FILE_NAME issue309

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         6614128 kB
MemAvailable:    9193352 kB
Buffers:          280404 kB
Cached:          2315700 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.010
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

10 runs 'fix/improve-perfomance-during-pre-commit-run-all pre-commit run:'

time command max min mean median
users seconds 4.49 4.21 4.336 4.36
system seconds 0.33 0.15 0.234 0.22
CPU % 69 54 64 66.5
Total time 8.7 6.48 7.183 6.925
Run details
  • Test Start: Mon Dec 27 23:04:45 UTC 2021
  • Test End: Mon Dec 27 23:05:57 UTC 2021
Variable name Value
TEST_NUM 10
TEST_COMMAND pre-commit try-repo /tmp/issue309 terrascan
TEST_DIR /home/vm/code/Oslo
TEST_DESCRIPTION 10 runs 'fix/improve-perfomance-during-pre-commit-run-all pre-commit run:'
RAW_TEST_RESULTS_FILE_NAME issue309

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         6633428 kB
MemAvailable:    9213592 kB
Buffers:          281244 kB
Cached:          2315700 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.010
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

10 runs 'v1.62.3 pre-commit run --all:'

time command max min mean median
users seconds 92.15 84.48 86.376 85.71
system seconds 5.2 3.15 4.262 4.26
CPU % 155 145 150.9 151.5
Total time 64.45 58.19 59.949 59.66
Run details
  • Test Start: Mon Dec 27 23:09:37 UTC 2021
  • Test End: Mon Dec 27 23:19:37 UTC 2021
Variable name Value
TEST_NUM 10
TEST_COMMAND pre-commit try-repo --all /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan
TEST_DIR /home/vm/code/Oslo
TEST_DESCRIPTION 10 runs 'v1.62.3 pre-commit run --all:'
RAW_TEST_RESULTS_FILE_NAME issue309

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         6603292 kB
MemAvailable:    9188652 kB
Buffers:          283984 kB
Cached:          2317136 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.010
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

10 runs 'v1.62.3 pre-commit run:'

time command max min mean median
users seconds 4.41 4.18 4.258 4.26
system seconds 0.97 0.76 0.848 0.835
CPU % 52 39 47.2 48.5
Total time 12.86 9.9 10.815 10.49
Run details
  • Test Start: Tue Dec 28 00:08:06 UTC 2021
  • Test End: Tue Dec 28 00:09:54 UTC 2021
Variable name Value
TEST_NUM 10
TEST_COMMAND pre-commit try-repo /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan
TEST_DIR /home/vm/code/Oslo
TEST_DESCRIPTION 10 runs 'v1.62.3 pre-commit run:'
RAW_TEST_RESULTS_FILE_NAME issue309

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         6621388 kB
MemAvailable:    9212708 kB
Buffers:          289692 kB
Cached:          2317140 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.010
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

After the holidays, #310 will be merged, so I suggest relying on the introduced #310 structure and future "The next big thing" look, described in the description.

So, what I propose:
In function common::per_dir_hook right after function args init, add check like:

# check is (optional) function defined
if [ $(type -t run_hook_on_whole_repo) == function ] \
   # check is hook run via `pre-commit run --all`
   && [[ $(common::is_hook_run_on_whole_repo "${files[@]}") ]]; then  
  run_hook_on_whole_repo "$args"
  exit 0
fi

Where:

  1. function common::is_hook_run_on_whole_repo - function that implement next check:

    local sorted_string
    local git_ls_files
    sorted_string=$(printf '%s\n' "${files[@]}" | sort | tr '\n' ' ')
    git_ls_files=$(git ls-files | sort | grep -e ".tf$" | tr '\n' ' ')
    # This is a workaround to improve performance when all files are passed
    # See: https://github.com/antonbabenko/pre-commit-terraform/issues/309
    if [[ "$sorted_string" == "$git_ls_files" ]]; then

    1. Return boolean (true or false)

    2. Grub and use files: and exclude: from .pre-commit-hooks.yaml, not hardcode the same things in code.

      - id: terrascan
      name: terrascan
      description: Runs terrascan on Terraform templates.
      language: script
      entry: terrascan.sh
      files: \.tf$
      exclude: \.terraform\/.*$
      require_serial: true

      git_ls_files=$(git ls-files | sort | grep -e ".tf$" | tr '\n' ' ')

      1. We need introduce readonly global variable HOOK_ID, for grub data from needed configuration

      Notes:

      • pre-commit clones the whole repo during the environment initialization process to ~/.cache/pre-commit/repo*, so you should be able to use any files in the pre-commit-terraform repo by relative path.
      • I prefer to avoid adding yq as a dependency for this small task, not because I know the reasons why node_modules is so big, but because it will introduce breaking changes for folks - they will need to go and install yq.
        So let's use things that have already been installed:
        HOOK_ID=terraform_docs
        files=$(grep -A20 "id: ${HOOK_ID}$" .pre-commit-hooks.yaml | grep -m 1 'files:' | awk '{print $2}')
  2. function run_hook_on_whole_repo - very simple function, like function per_dir_hook_unique_part, that will have inside:

    terrascan scan -i terraform $args

@yermulnik
Copy link
Collaborator

    So let's use things that have already been installed:
    ```shell
    HOOK_ID=terraform_docs
    files=$(grep -A20 "id: ${HOOK_ID}$" .pre-commit-hooks.yaml | grep -m 1 'files:' | awk '{print $2}')
    ```

awk on its own is sufficient here:

HOOK_ID=terraform_docs
files=$(awk -v HOOK_ID=$HOOK_ID '{if (match($0, "^- id: " HOOK_ID "$")) found_start=1; if (found_start == 1 && NF == 0) exit; if (found_start == 1 && $1 == "files:") {print $2; exit}}' .pre-commit-hooks.yaml)

or to retain readability though to replace grep -A20 quirk (which may introduce unexpected results for target blocks w/o files key in them):

HOOK_ID=terraform_docs
files=$(sed -n "/^- id: $HOOK_ID$/,/^$/p" .pre-commit-hooks.yaml | awk '$1 == "files:" {print $2}')

@carlosbustillordguez
Copy link
Contributor

@MaxymVlasov great! I will be back tomorrow with the changes. Should I wait for the merge of #310 to send my contribution?

Notes:

  • pre-commit clones the whole repo during the environment initialization process to ~/.cache/pre-commit/repo*, so you should be able to use any files in the pre-commit-terraform repo by relative path.

Yes, that applies to pre-commit run, but if you are using pre-commit try-repo is copied/cloned to /tmp, e.g.: /tmp/tmp9bwre37l/shadow-repo. So I think a good option to get the scripts/files inside that directory could be:

SCRIPT_DIR="$(dirname "$(realpath "${BASH_SOURCE[0]}")")"

Like is used here:

function common::initialize {
local SCRIPT_DIR
# get directory containing this script
SCRIPT_DIR="$(dirname "$(realpath "${BASH_SOURCE[0]}")")"
# source getopt function
# shellcheck source=lib_getopt
. "$SCRIPT_DIR/lib_getopt"
}

Maybe for "The next big thing" we will need to use the above approach to source the common.sh file on each hook.


HOOK_ID=terraform_docs
files=$(sed -n "/^- id: $HOOK_ID$/,/^$/p" .pre-commit-hooks.yaml | awk '$1 == "files:" {print $2}')

@yermulnik thanks for the above code to catch the "files"!!

@MaxymVlasov
Copy link
Collaborator Author

#310 will be merged, approximately, in a few days. You can start implementation based on what we have now in the #310 related branch if you want.

@yermulnik
Copy link
Collaborator

yermulnik commented Jan 3, 2022

@carlosbustillordguez For what it's worth:

  1. I have no clue whether pre-commit pre-validates its YAML-config and hence to protect against multiple files: lines in the hook config block (presuming this is not expected and is an error) it might worth to exit on first match (I added the "; exit" bit at the end of awk command statement):
files=$(sed -n "/^- id: $HOOK_ID$/,/^$/p" .pre-commit-hooks.yaml | awk '$1 == "files:" {print $2; exit}')
  1. Also I just realized this exercise can be done using sed command only w/o a need for any additional tools like awk, though at the expense of readability and compatibility (since it'd require GNU sed) and hence may be unacceptable for the sake of cross-platform compatibility (also prints only first match and exits like in item 1):
files=$(sed -rn "/^- id: terraform_docs$/,/^$/{s/[[:space:]]*files:[[:space:]]*(.+)$/\1/p;T;q}" .pre-commit-hooks.yaml)

@MaxymVlasov
Copy link
Collaborator Author

MaxymVlasov commented Jan 4, 2022

awk exists in Linux (gawk) and macOS (mawk) by default.

No need to create unreadable sed constructions.

@carlosbustillordguez
Copy link
Contributor

@MaxymVlasov I have implemented your suggested changes for terrascan.sh in https://github.com/carlosbustillordguez/pre-commit-terraform/tree/chore/refactoring

To implementation can read the included and excluded files from .pre-commit-hooks.yaml:

https://github.com/carlosbustillordguez/pre-commit-terraform/blob/e8c2220f74377ad99a6bed3b94a6ebb5bfa0d52f/terrascan.sh#L64-L66

As commented above I had to use the SCRIPT_DIR variable to parse the .pre-commit-hooks.yaml file. I think that variable should be considered as global to use when is needed.

Regarding global variables, should we pass them as arguments to the needed function or consume them directly inside the function? I mean for HOOK_ID and SCRIPT_DIR (if is defined as global)?

@MaxymVlasov
Copy link
Collaborator Author

MaxymVlasov commented Jan 11, 2022

Hey @carlosbustillordguez
Please check our comments in #314 and prepare PR to master.

Pro tip: we are looking forward to using BASH Google Style Guide, the documentation part is already done.
So good to see changes in PR closest to chosen style guide as possible.

@carlosbustillordguez
Copy link
Contributor

Hey @carlosbustillordguez Please check our comments in #314 and prepare PR to master.

Pro tip: we are looking forward to using BASH Google Style Guide, the documentation part is already done. So good to see changes in PR closest to chosen style guide as possible.

Hi @MaxymVlasov! Sorry for the delay. Great! I can start working this afternoon/night.

@antonbabenko
Copy link
Owner

This issue has been resolved in version 1.63.0 🎉

@antonbabenko
Copy link
Owner

This issue has been resolved in version 1.64.0 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
estimate/4h Need 4 hours to be done feature New feature or request good first issue Good for newcomers hook/terraform_checkov Bash hook hook/terraform_tfsec Bash hook hook/terragrunt_fmt Bash hook hook/terragrunt_validate Bash hook hook/terrascan Bash hook
Projects
None yet
4 participants