Improve perfomance during `pre-commit --all` (`-a`) run #309

MaxymVlasov · 2021-12-22T18:42:22Z

Some hooks recursively check all files in provided dir.

So performance degradation exists only in the pre-commit run --all situation, because it will provide all existing repo files to hook:

pre-commit-terraform/terrascan.sh

Lines 15 to 19 in e6ffbcd

    
           # consume modified files passed from pre-commit so that 
        
           # terrascan runs against only those relevant directories 
        
           for file_with_path in "${files[@]}"; do 
        
             file_with_path="${file_with_path// /__REPLACED__SPACE__}" 
        
             paths[index]=$(dirname "$file_with_path")

Then, unique paths are found and run terrascan for each repo folder:

pre-commit-terraform/terrascan.sh

Lines 29 to 30 in e6ffbcd

    
           # for each path run terrascan 
        
           for path_uniq in $(echo "${paths[*]}" | tr ' ' '\n' | sort -u); do

It works literally how it should work: checks only diffs.

So, good to know when the --all (-a) argument passed to pre-commit and just run terrascan -d GIT_REPO_ROOT, not all-all dirs.

Useful info

How to run hook performance test
pre-commit automatically parallel checks to exiting cores, so you need to run tests on a repo that has at least 2x more tf-dirs than CPU cores you have. If you have not so big repo - just copy-paste code-structure a few times, and you'll get needed.
Create solution as function, that can be called from terrascan_() hook function and depends on the result, in if run other flow and stop execution with exit 0

checkov run checks recursively. need -d GIT_REPO_POOT_PATH feat: Add terraform_checkov, that run per folder. Deprecate checkov hook #290
tfsec runs recursively without any additional args. feat: Improved speed of pre-commit run -a for multiple hooks #338
terragrunt_fmt ( terragrunt hclfmt) runs recursively without any additional args. feat: Improved speed of pre-commit run -a for multiple hooks #338
terrascan runs recursively without any additional args. feat: terrascan - Improve performance during pre-commit --all (-a) run #327
terragrunt_validate have option terragrunt run-all validate that run checks recursively. feat: Improved speed of pre-commit run -a for multiple hooks #338

Founded in #305

The text was updated successfully, but these errors were encountered:

MaxymVlasov · 2021-12-22T19:41:22Z

In #290 (checkov) we see the same issue.

carlosbustillordguez · 2021-12-23T09:29:48Z

For terrascan run the recursive scan can be disabled by passing the --non-recursive, which does not scan directories and modules recursively. Can be defined as an argument in the hook settings:

- id: terrascan
  args:
    - --args=--non-recursive # avoids scan errors on subdirectories without Terraform config files
    - --args=--policy-type=azure

I noted that pre-commit try-repo ignores the .pre-commit-config.yaml config file, so the recursive scan (if what we want) could be disabled in the hook settings:

pre-commit-terraform/.pre-commit-hooks.yaml

Line 110 in ac9299c

entry: terrascan.sh

by adding --args=--non-recursive argument at the end of the line.

For testing purposes (meantime we found a solution to detect when --all | -a was passed in pre-commit), I think we can use the --files argument (pre-commit try-repo supports all available options for pre-commit run).

For example:

TEST_NUM=5
TEST_DIR='/home/carlos/devel/azure/terraform/test/terraform_big_repo'
TEST_DESCRIPTION='`terrascan` for big TF repo with pre-commit-terraform v1.62.3'
RAW_TEST_RESULTS_FILE_NAME='terrascan_big_terrfaform_repo'

# if you don't have a .tf file in the root directory, run the following:
touch ${TEST_DIR}/main-fake.tf

# set the TEST_COMMAND by defining a valid .tf file in the root directory
TEST_COMMAND="pre-commit try-repo --ref v1.62.3 https://github.com/antonbabenko/pre-commit-terraform terrascan --files ${TEST_DIR}/main-fake.tf"

# run the test
./hooks_performance_test.sh "$TEST_NUM" "$TEST_COMMAND" "$TEST_DIR" "$TEST_DESCRIPTION" "$RAW_TEST_RESULTS_FILE_NAME"

I made a test in a fake TF project (I copied and pasted the infra code a few times as you suggested) with 189 directories, 610 files:

5 runs 'terrascan for big TF repo with pre-commit-terraform v1.62.3'

time command	max	min	mean	median
users seconds	6.67	6.08	6.476	6.56
system seconds	0.4	0.3	0.358	0.37
CPU %	165	159	162.6	163
Total time	4.33	3.89	4.192	4.29

Run details

Test Start: Thu Dec 23 09:12:35 UTC 2021
Test End: Thu Dec 23 09:12:56 UTC 2021

Variable name	Value
`TEST_NUM`	`5`
`TEST_COMMAND`	`pre-commit try-repo --ref v1.62.3 https://github.com/antonbabenko/pre-commit-terraform terrascan --files /home/carlos/devel/azure/terraform/test/terraform_big_repo/main-fake.tf`
`TEST_DIR`	`/home/carlos/devel/azure/terraform/test/terraform_big_repo`
`TEST_DESCRIPTION`	`5 runs 'terrascan for big TF repo with pre-commit-terraform v1.62.3'`
`RAW_TEST_RESULTS_FILE_NAME`	`terrascan_big_terrfaform_repo`

Memory info (head -n 6 /proc/meminfo):

MemTotal:       26034676 kB
MemFree:        20826292 kB
MemAvailable:   23280460 kB
Buffers:          232212 kB
Cached:          2866148 kB
SwapCached:            0 kB

CPU info:

Real procs: 4
Virtual (hyper-threading) procs: 8

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 140
model name	: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
stepping	: 1
microcode	: 0xffffffff
cpu MHz		: 2803.198
cache size	: 12288 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm avx512_vp2intersect flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs
bogomips	: 5606.39
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

MaxymVlasov · 2021-12-23T17:22:42Z

The problem is not that hooks run recursively (at least, not in this issue), problem is that the same check runs multiply times.

Not sure why, but 1 recursive check works much faster than for_each dir in the repo, no matter, with --non-recursive or without.

5 runs 'with --non-recursive'

time command	max	min	mean	median
users seconds	130.08	116.22	121.908	121.41
system seconds	8.51	7.01	7.532	7.41
CPU %	165	160	162	161
Total time	83.68	77.01	79.642	79.56

Run details

Test Start: Thu Dec 23 14:35:52 UTC 2021
Test End: Thu Dec 23 14:42:30 UTC 2021

Variable name	Value
`TEST_NUM`	`5`
`TEST_COMMAND`	`pre-commit try-repo -a /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan`
`TEST_DIR`	`/home/vm/code/Oslo`
`TEST_DESCRIPTION`	`5 runs 'with --non-recursive'`
`RAW_TEST_RESULTS_FILE_NAME`	`pr290`

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         7107868 kB
MemAvailable:    9418472 kB
Buffers:          281768 kB
Cached:          2055276 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.009
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.01
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

5 runs 'as is'

time command	max	min	mean	median
users seconds	130.08	105.84	115.695	114.535
system seconds	8.51	4.89	6.566	6.605
CPU %	168	160	163.5	163.5
Total time	83.68	67.76	74.619	74.68

Run details

Test Start: Thu Dec 23 16:58:08 UTC 2021
Test End: Thu Dec 23 17:03:56 UTC 2021

Variable name	Value
`TEST_NUM`	`5`
`TEST_COMMAND`	`pre-commit try-repo -a /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan`
`TEST_DIR`	`/home/vm/code/Oslo`
`TEST_DESCRIPTION`	`5 runs 'as is'`
`RAW_TEST_RESULTS_FILE_NAME`	`pr290`

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         8346212 kB
MemAvailable:   10672940 kB
Buffers:          295600 kB
Cached:          2057220 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.009
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.01
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

5 runs 'one recurcive check'

time command	max	min	mean	median
users seconds	10.25	9.24	9.682	9.44
system seconds	1.49	1.07	1.316	1.38
CPU %	94	85	91	91
Total time	12.44	11.64	12.016	11.93

Run details

Test Start: Thu Dec 23 17:15:22 UTC 2021
Test End: Thu Dec 23 17:16:22 UTC 2021

Variable name	Value
`TEST_NUM`	`5`
`TEST_COMMAND`	`pre-commit try-repo -a /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan`
`TEST_DIR`	`/home/vm/code/Oslo`
`TEST_DESCRIPTION`	`5 runs 'one recurcive check'`
`RAW_TEST_RESULTS_FILE_NAME`	`pr290`

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         8090392 kB
MemAvailable:   10445696 kB
Buffers:          303004 kB
Cached:          2073544 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.009
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.01
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

The main goal of this issue is to find how to check what args specified to `pre-commit`, not to hook.

This means that need to check pre-commit.com docs, maybe code, and if needed - open issue to repo.
And then write a check that will check is --all or -a passed and return true of false

carlosbustillordguez · 2021-12-27T18:22:44Z

Not sure why, but 1 recursive check works much faster than for_each dir in the repo, no matter, with --non-recursive or without.

I noted that inspecting by directories with terrascan scan is much faster than scanning each file (terrascan scan by default inspect directory but also supports inspections by files). You are right with the above pointing.

I think we should avoid the --non-recursive argument, otherwise we can get the following error when no .tf files are present in the root directory:

terrascan scan -i terraform -t azure --non-recursive
2021-12-27T18:56:45.841+0100    error   cli/run.go:132  scan run failed{error 26 0  1 error occurred:
        * directory '/home/carlos/devel/azure/terraform/test/terraform_big_repo' has no terraform config files

The main goal of this issue is to find how to check what args specified to pre-commit, not to hook.
This means that need to check pre-commit.com docs, maybe code, and if needed - open issue to repo.
And then write a check that will check is --all or -a passed and return true of false

Regarding this, I opened an issue in the pre-comit project to know how we can check for --all | -a inside the hook, but:

there isn't a way to do that, somewhat intentionally -- the hook shouldn't care about how it is called, just that it receives a list of filenames

See pre-commit/pre-commit#2172 (comment) for details.

I have implemented a basic solution around that (I need to make some tests to see if all possible cases are covered and handled properly):

https://github.com/carlosbustillordguez/pre-commit-terraform/blob/227f6203e849b97a24daae54d99bee100eae7a36/terrascan.sh#L21-L29

@MaxymVlasov meantime could you test that branch against your repo to see if the performance is improved?

I will back later, with some performance tests and more comments.

MaxymVlasov · 2021-12-28T02:31:42Z

Yeah, I double-checked pre-commit code and am pretty sure that to add needed by us logic is pretty challenging and will introduce breaking changes or add not very intuitive crutches.

@carlosbustillordguez super! You found the solution (227f620), let's make a reusable realization.

Test results

10 runs 'fix/improve-perfomance-during-pre-commit-run-all pre-commit run --all:'

time command	max	min	mean	median
users seconds	7.84	7.16	7.583	7.62
system seconds	0.38	0.2	0.292	0.3
CPU %	136	117	126	124.5
Total time	6.83	5.71	6.249	6.27

Run details

Test Start: Mon Dec 27 23:03:20 UTC 2021
Test End: Mon Dec 27 23:04:22 UTC 2021

Variable name	Value
`TEST_NUM`	`10`
`TEST_COMMAND`	`pre-commit try-repo -a /tmp/issue309 terrascan`
`TEST_DIR`	`/home/vm/code/Oslo`
`TEST_DESCRIPTION`	`10 runs 'fix/improve-perfomance-during-pre-commit-run-all pre-commit run --all:'`
`RAW_TEST_RESULTS_FILE_NAME`	`issue309`

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         6614128 kB
MemAvailable:    9193352 kB
Buffers:          280404 kB
Cached:          2315700 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.010
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

10 runs 'fix/improve-perfomance-during-pre-commit-run-all pre-commit run:'

time command	max	min	mean	median
users seconds	4.49	4.21	4.336	4.36
system seconds	0.33	0.15	0.234	0.22
CPU %	69	54	64	66.5
Total time	8.7	6.48	7.183	6.925

Run details

Test Start: Mon Dec 27 23:04:45 UTC 2021
Test End: Mon Dec 27 23:05:57 UTC 2021

Variable name	Value
`TEST_NUM`	`10`
`TEST_COMMAND`	`pre-commit try-repo /tmp/issue309 terrascan`
`TEST_DIR`	`/home/vm/code/Oslo`
`TEST_DESCRIPTION`	`10 runs 'fix/improve-perfomance-during-pre-commit-run-all pre-commit run:'`
`RAW_TEST_RESULTS_FILE_NAME`	`issue309`

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         6633428 kB
MemAvailable:    9213592 kB
Buffers:          281244 kB
Cached:          2315700 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.010
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

10 runs 'v1.62.3 pre-commit run --all:'

time command	max	min	mean	median
users seconds	92.15	84.48	86.376	85.71
system seconds	5.2	3.15	4.262	4.26
CPU %	155	145	150.9	151.5
Total time	64.45	58.19	59.949	59.66

Run details

Test Start: Mon Dec 27 23:09:37 UTC 2021
Test End: Mon Dec 27 23:19:37 UTC 2021

Variable name	Value
`TEST_NUM`	`10`
`TEST_COMMAND`	`pre-commit try-repo --all /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan`
`TEST_DIR`	`/home/vm/code/Oslo`
`TEST_DESCRIPTION`	`10 runs 'v1.62.3 pre-commit run --all:'`
`RAW_TEST_RESULTS_FILE_NAME`	`issue309`

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         6603292 kB
MemAvailable:    9188652 kB
Buffers:          283984 kB
Cached:          2317136 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.010
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

10 runs 'v1.62.3 pre-commit run:'

time command	max	min	mean	median
users seconds	4.41	4.18	4.258	4.26
system seconds	0.97	0.76	0.848	0.835
CPU %	52	39	47.2	48.5
Total time	12.86	9.9	10.815	10.49

Run details

Test Start: Tue Dec 28 00:08:06 UTC 2021
Test End: Tue Dec 28 00:09:54 UTC 2021

Variable name	Value
`TEST_NUM`	`10`
`TEST_COMMAND`	`pre-commit try-repo /mnt/c/Users/vm/code/open-source/pre-commit-terraform terrascan`
`TEST_DIR`	`/home/vm/code/Oslo`
`TEST_DESCRIPTION`	`10 runs 'v1.62.3 pre-commit run:'`
`RAW_TEST_RESULTS_FILE_NAME`	`issue309`

Memory info (head -n 6 /proc/meminfo):

MemTotal:       12765352 kB
MemFree:         6621388 kB
MemAvailable:    9212708 kB
Buffers:          289692 kB
Cached:          2317140 kB
SwapCached:            0 kB

CPU info:

Real procs: 6
Virtual (hyper-threading) procs: 12

processor	: 11
vendor_id	: GenuineIntel
cpu family	: 6
model		: 165
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
stepping	: 2
microcode	: 0xffffffff
cpu MHz		: 2712.010
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 5
cpu cores	: 6
apicid		: 11
initial apicid	: 11
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit
bogomips	: 5424.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

After the holidays, #310 will be merged, so I suggest relying on the introduced #310 structure and future "The next big thing" look, described in the description.

So, what I propose:
In function common::per_dir_hook right after function args init, add check like:

# check is (optional) function defined
if [ $(type -t run_hook_on_whole_repo) == function ] \
   # check is hook run via `pre-commit run --all`
   && [[ $(common::is_hook_run_on_whole_repo "${files[@]}") ]]; then  
  run_hook_on_whole_repo "$args"
  exit 0
fi

Where:

function common::is_hook_run_on_whole_repo - function that implement next check:

pre-commit-terraform/terrascan.sh

Lines 15 to 23 in 227f620

    
           local sorted_string 
        
           local git_ls_files 
        
           sorted_string=$(printf '%s\n' "${files[@]}" | sort | tr '\n' ' ') 
        
           git_ls_files=$(git ls-files | sort | grep -e ".tf$" | tr '\n' ' ') 
        
           # This is a workaround to improve performance when all files are passed 
        
           # See: https://github.com/antonbabenko/pre-commit-terraform/issues/309 
        
           if [[ "$sorted_string" == "$git_ls_files" ]]; then

Return boolean (true or false)

Grub and use files: and exclude: from .pre-commit-hooks.yaml, not hardcode the same things in code.

pre-commit-terraform/.pre-commit-hooks.yaml

Lines 106 to 113 in ac9299c

    
           - id: terrascan 
        
             name: terrascan 
        
             description: Runs terrascan on Terraform templates. 
        
             language: script 
        
             entry: terrascan.sh 
        
             files: \.tf$ 
        
             exclude: \.terraform\/.*$ 
        
             require_serial: true

pre-commit-terraform/terrascan.sh

Line 19 in 227f620

git_ls_files=$(git ls-files | sort | grep -e ".tf$" | tr '\n' ' ')

We need introduce readonly global variable HOOK_ID, for grub data from needed configuration

Notes:

pre-commit clones the whole repo during the environment initialization process to ~/.cache/pre-commit/repo*, so you should be able to use any files in the pre-commit-terraform repo by relative path.
I prefer to avoid adding yq as a dependency for this small task, not because I know the reasons why node_modules is so big, but because it will introduce breaking changes for folks - they will need to go and install yq.
So let's use things that have already been installed:
```
HOOK_ID=terraform_docs
files=$(grep -A20 "id: ${HOOK_ID}$" .pre-commit-hooks.yaml | grep -m 1 'files:' | awk '{print $2}')
```

function run_hook_on_whole_repo - very simple function, like function per_dir_hook_unique_part, that will have inside:

pre-commit-terraform/terrascan.sh

Line 26 in 227f620

terrascan scan -i terraform $args

yermulnik · 2021-12-28T12:43:18Z

    So let's use things that have already been installed:
    ```shell
    HOOK_ID=terraform_docs
    files=$(grep -A20 "id: ${HOOK_ID}$" .pre-commit-hooks.yaml | grep -m 1 'files:' | awk '{print $2}')
    ```

awk on its own is sufficient here:

HOOK_ID=terraform_docs
files=$(awk -v HOOK_ID=$HOOK_ID '{if (match($0, "^- id: " HOOK_ID "$")) found_start=1; if (found_start == 1 && NF == 0) exit; if (found_start == 1 && $1 == "files:") {print $2; exit}}' .pre-commit-hooks.yaml)

or to retain readability though to replace grep -A20 quirk (which may introduce unexpected results for target blocks w/o files key in them):

HOOK_ID=terraform_docs
files=$(sed -n "/^- id: $HOOK_ID$/,/^$/p" .pre-commit-hooks.yaml | awk '$1 == "files:" {print $2}')

carlosbustillordguez · 2022-01-03T18:49:48Z

@MaxymVlasov great! I will be back tomorrow with the changes. Should I wait for the merge of #310 to send my contribution?

Notes:

pre-commit clones the whole repo during the environment initialization process to ~/.cache/pre-commit/repo*, so you should be able to use any files in the pre-commit-terraform repo by relative path.

Yes, that applies to pre-commit run, but if you are using pre-commit try-repo is copied/cloned to /tmp, e.g.: /tmp/tmp9bwre37l/shadow-repo. So I think a good option to get the scripts/files inside that directory could be:

SCRIPT_DIR="$(dirname "$(realpath "${BASH_SOURCE[0]}")")"

Like is used here:

pre-commit-terraform/terrascan.sh

Lines 10 to 18 in d19a0e3

    
           function common::initialize { 
        
             local SCRIPT_DIR 
        
             # get directory containing this script 
        
             SCRIPT_DIR="$(dirname "$(realpath "${BASH_SOURCE[0]}")")" 
        
             # source getopt function 
        
             # shellcheck source=lib_getopt 
        
             . "$SCRIPT_DIR/lib_getopt" 
        
           }

Maybe for "The next big thing" we will need to use the above approach to source the common.sh file on each hook.

HOOK_ID=terraform_docs
files=$(sed -n "/^- id: $HOOK_ID$/,/^$/p" .pre-commit-hooks.yaml | awk '$1 == "files:" {print $2}')

@yermulnik thanks for the above code to catch the "files"!!

MaxymVlasov · 2022-01-03T19:17:14Z

#310 will be merged, approximately, in a few days. You can start implementation based on what we have now in the #310 related branch if you want.

yermulnik · 2022-01-03T20:19:01Z

@carlosbustillordguez For what it's worth:

I have no clue whether pre-commit pre-validates its YAML-config and hence to protect against multiple files: lines in the hook config block (presuming this is not expected and is an error) it might worth to exit on first match (I added the "; exit" bit at the end of awk command statement):

files=$(sed -n "/^- id: $HOOK_ID$/,/^$/p" .pre-commit-hooks.yaml | awk '$1 == "files:" {print $2; exit}')

Also I just realized this exercise can be done using sed command only w/o a need for any additional tools like awk, though at the expense of readability and compatibility (since it'd require GNU sed) and hence may be unacceptable for the sake of cross-platform compatibility (also prints only first match and exits like in item 1):

files=$(sed -rn "/^- id: terraform_docs$/,/^$/{s/[[:space:]]*files:[[:space:]]*(.+)$/\1/p;T;q}" .pre-commit-hooks.yaml)

MaxymVlasov · 2022-01-04T10:10:10Z

awk exists in Linux (gawk) and macOS (mawk) by default.

No need to create unreadable sed constructions.

carlosbustillordguez · 2022-01-04T16:28:43Z

@MaxymVlasov I have implemented your suggested changes for terrascan.sh in https://github.com/carlosbustillordguez/pre-commit-terraform/tree/chore/refactoring

To implementation can read the included and excluded files from .pre-commit-hooks.yaml:

https://github.com/carlosbustillordguez/pre-commit-terraform/blob/e8c2220f74377ad99a6bed3b94a6ebb5bfa0d52f/terrascan.sh#L64-L66

As commented above I had to use the SCRIPT_DIR variable to parse the .pre-commit-hooks.yaml file. I think that variable should be considered as global to use when is needed.

Regarding global variables, should we pass them as arguments to the needed function or consume them directly inside the function? I mean for HOOK_ID and SCRIPT_DIR (if is defined as global)?

MaxymVlasov · 2022-01-11T12:37:07Z

Hey @carlosbustillordguez
Please check our comments in #314 and prepare PR to master.

Pro tip: we are looking forward to using BASH Google Style Guide, the documentation part is already done.
So good to see changes in PR closest to chosen style guide as possible.

carlosbustillordguez · 2022-01-11T15:03:59Z

Hey @carlosbustillordguez Please check our comments in #314 and prepare PR to master.

Pro tip: we are looking forward to using BASH Google Style Guide, the documentation part is already done. So good to see changes in PR closest to chosen style guide as possible.

Hi @MaxymVlasov! Sorry for the delay. Great! I can start working this afternoon/night.

antonbabenko · 2022-02-10T15:55:26Z

This issue has been resolved in version 1.63.0 🎉

antonbabenko · 2022-02-10T18:48:59Z

This issue has been resolved in version 1.64.0 🎉

MaxymVlasov added feature New feature or request estimate/4h Need 4 hours to be done hook/terrascan Bash hook labels Dec 22, 2021

MaxymVlasov added this to the Features implementing and cleanup without breaking changes milestone Dec 22, 2021

MaxymVlasov added the good first issue Good for newcomers label Dec 22, 2021

MaxymVlasov added the hook/terraform_checkov Bash hook label Dec 22, 2021

MaxymVlasov mentioned this issue Dec 22, 2021

feat: Add terraform_checkov, that run per folder. Deprecate checkov hook #290

Merged

4 tasks

MaxymVlasov added hook/terragrunt_validate Bash hook hook/terraform_tfsec Bash hook labels Dec 22, 2021

MaxymVlasov changed the title ~~[terrascan] Improve perfomance during pre-commit --all (-a) run~~ Improve perfomance during pre-commit --all (-a) run Dec 23, 2021

MaxymVlasov added the hook/terragrunt_fmt Bash hook label Dec 23, 2021

MaxymVlasov mentioned this issue Jan 5, 2022

[PR to discuss the concept] Improve performance for hooks when all files are scanned #314

Closed

4 tasks

carlosbustillordguez mentioned this issue Jan 13, 2022

feat: terrascan - Improve performance during pre-commit --all (-a) run #327

Merged

4 tasks

antonbabenko closed this as completed in #327 Feb 10, 2022

MaxymVlasov reopened this Feb 10, 2022

MaxymVlasov mentioned this issue Feb 10, 2022

feat: Improved speed of pre-commit run -a for multiple hooks #338

Merged

4 tasks

antonbabenko closed this as completed in #338 Feb 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve perfomance during `pre-commit --all` (`-a`) run #309

Improve perfomance during `pre-commit --all` (`-a`) run #309

MaxymVlasov commented Dec 22, 2021 •

edited

Loading

MaxymVlasov commented Dec 22, 2021

carlosbustillordguez commented Dec 23, 2021 •

edited

Loading

MaxymVlasov commented Dec 23, 2021

carlosbustillordguez commented Dec 27, 2021

MaxymVlasov commented Dec 28, 2021 •

edited

Loading

yermulnik commented Dec 28, 2021

carlosbustillordguez commented Jan 3, 2022

MaxymVlasov commented Jan 3, 2022

yermulnik commented Jan 3, 2022 •

edited

Loading

MaxymVlasov commented Jan 4, 2022 •

edited

Loading

carlosbustillordguez commented Jan 4, 2022

MaxymVlasov commented Jan 11, 2022 •

edited

Loading

carlosbustillordguez commented Jan 11, 2022

antonbabenko commented Feb 10, 2022

antonbabenko commented Feb 10, 2022

Improve perfomance during pre-commit --all (-a) run #309

Improve perfomance during pre-commit --all (-a) run #309

Comments

MaxymVlasov commented Dec 22, 2021 • edited Loading

Useful info

MaxymVlasov commented Dec 22, 2021

carlosbustillordguez commented Dec 23, 2021 • edited Loading

MaxymVlasov commented Dec 23, 2021

The main goal of this issue is to find how to check what args specified to pre-commit, not to hook.

carlosbustillordguez commented Dec 27, 2021

MaxymVlasov commented Dec 28, 2021 • edited Loading

yermulnik commented Dec 28, 2021

carlosbustillordguez commented Jan 3, 2022

MaxymVlasov commented Jan 3, 2022

yermulnik commented Jan 3, 2022 • edited Loading

MaxymVlasov commented Jan 4, 2022 • edited Loading

carlosbustillordguez commented Jan 4, 2022

MaxymVlasov commented Jan 11, 2022 • edited Loading

carlosbustillordguez commented Jan 11, 2022

antonbabenko commented Feb 10, 2022

antonbabenko commented Feb 10, 2022

Improve perfomance during `pre-commit --all` (`-a`) run #309

Improve perfomance during `pre-commit --all` (`-a`) run #309

MaxymVlasov commented Dec 22, 2021 •

edited

Loading

carlosbustillordguez commented Dec 23, 2021 •

edited

Loading

The main goal of this issue is to find how to check what args specified to `pre-commit`, not to hook.

MaxymVlasov commented Dec 28, 2021 •

edited

Loading

yermulnik commented Jan 3, 2022 •

edited

Loading

MaxymVlasov commented Jan 4, 2022 •

edited

Loading

MaxymVlasov commented Jan 11, 2022 •

edited

Loading