[CI] Split Integration tests out of first phase of pipeline #9128

Merged
merged 2 commits into from
Sep 29, 2021

Mousius (Member) commented Sep 26, 2021

I took a look at the time taken by each stage in the Jenkins pipeline and what makes up the 6 hour CI build time. CPU Integration tests took 60 minutes of the 100 minutes of `Build: CPU`. Moving just those Integration tests into a new `python3: CPU` stage lines it up with `python3: GPU` and `python3: i386`, which both take a similar amount of time, and takes roughly 60 minutes off the overall run time.

Numbers copied from sample successful run (final time approx: 358 minutes):

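| Phase | ID | Job | Minutes | Start |
|-------|----|-----|---------|-------|
| 0 | 0 | Sanity | 3 | 0 |
| 1 | 0 | BUILD: arm | 2 | 3 |
| 1 | 1 | BUILD: i386 | 33 | 3 |
| 1 | 2 | BUILD: CPU | 100 | 3 |
| 1 | 3 | BUILD: GPU | 25 | 3 |
| 1 | 4 | BUILD: QEMU | 6 | 3 |
| 1 | 5 | BUILD: WASM | 2 | 3 |
| 2 | 0 | java: GPU | 1 | 103 |
| 2 | 1 | python3: GPU | 66 | 103 |
| 2 | 2 | python3: arm | 22 | 103 |
| 2 | 3 | python3: i386 | 70 | 103 |
| 3 | 0 | docs: GPU | 3 | 173 |
| 3 | 1 | frontend: CPU | 40 | 173 |
| 3 | 2 | frontend: GPU | 185 | 173 |
| 3 | 3 | topi: GPU | 110 | 173 |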

Numbers predicted after change (final time approx: 293 minutes):

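| Phase | ID | Job | Minutes | Start |
|-------|----|-----|---------|-------|
| 0 | 0 | Sanity | 3 | 0 |
| 1 | 0 | BUILD: arm | 2 | 3 |
| 1 | 1 | BUILD: i386 | 33 | 3 |
| 1 | 2 | BUILD: CPU | 35 | 3 |
| 1 | 3 | BUILD: GPU | 25 | 3 |
| 1 | 4 | BUILD: QEMU | 6 | 3 |
| 1 | 5 | BUILD: WASM | 2 | 3 |
| 2 | 0 | java: GPU | 1 | 38 |
| 2 | 1 | python3: GPU | 66 | 38 |
| 2 | 2 | python3: arm | 22 | 38 |
| 2 | 3 | python3: i386 | 70 | 38 |
| 2 | 4 | python3: CPU | 60 | 38 |
| 3 | 0 | docs: GPU | 3 | 108 |
| 3 | 1 | frontend: CPU | 40 | 108 |
| 3 | 2 | frontend: GPU | 185 | 108 |
| 3 | 3 | topi: GPU | 110 | 108 |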
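
The totals in the two tables above can be checked with a small sketch. It assumes, as the Start column suggests, that each phase begins only once the slowest job of the previous phase has finished. The function name `pipeline_minutes` and the tuple layout are illustrative and not part of the TVM Jenkinsfile.

```python
from collections import defaultdict


def pipeline_minutes(jobs):
    """Total wall-clock minutes, assuming each phase waits for the
    slowest job of the previous phase (phases act as barriers)."""
    slowest = defaultdict(int)
    for phase, _name, minutes in jobs:
        slowest[phase] = max(slowest[phase], minutes)
    return sum(slowest.values())


# (phase, job, minutes) copied from the sample successful run.
before = [
    (0, "Sanity", 3),
    (1, "BUILD: arm", 2),
    (1, "BUILD: i386", 33),
    (1, "BUILD: CPU", 100),
    (1, "BUILD: GPU", 25),
    (1, "BUILD: QEMU", 6),
    (1, "BUILD: WASM", 2),
    (2, "java: GPU", 1),
    (2, "python3: GPU", 66),
    (2, "python3: arm", 22),
    (2, "python3: i386", 70),
    (3, "docs: GPU", 3),
    (3, "frontend: CPU", 40),
    (3, "frontend: GPU", 185),
    (3, "topi: GPU", 110),
]

# Predicted layout: BUILD: CPU shrinks to 35 minutes and the integration
# tests run as a new phase 2 job, python3: CPU, at 60 minutes.
after = [
    (phase, name, 35 if name == "BUILD: CPU" else minutes)
    for phase, name, minutes in before
] + [(2, "python3: CPU", 60)]

print(pipeline_minutes(before))  # 358
print(pipeline_minutes(after))   # 293
```

Under this model the new `python3: CPU` job does not lengthen phase 2, because `python3: i386` at 70 minutes is still the slowest job there; the saving comes entirely from the shorter `BUILD: CPU` in phase 1.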

@Mousius Mousius requested a review from a team as a code owner September 26, 2021 14:27
areusch (Contributor) commented Sep 27, 2021

@tqchen can you comment on why we have integration tests in the first part? IIRC it was originally due to scarcity of GPU nodes, but now perhaps we don't need to worry so much. wdyt? I agree with @Mousius's assessment that the CPU build is the long pole in the first phase, and switching to xdist will only make that more obvious.

tqchen (Member) commented Sep 28, 2021

i agree we can do that, the main thing is to be able to test on staging before merge

Mousius (Member, Author) commented Sep 28, 2021

Thanks for taking a look @areusch / @tqchen - could someone trigger the staging job so we can see if I've managed to get the incantation right? 😸

jroesch (Member) commented Sep 28, 2021

This is building: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/160/pipeline. Only merge after this has gone green.

@jroesch jroesch left a comment

Wait for CI staging run before merging.

@Mousius Mousius force-pushed the optimise-jenkinsfile branch from e108fcf to 8cbf84b Compare September 29, 2021 08:41
Mousius (Member, Author) commented Sep 29, 2021

Had to re-push due to a flaky unit test on the PR build; the docker-staging build is here now:
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/161/pipeline

jroesch (Member) commented Sep 29, 2021

In bias for action, I am going to merge this one; feel free to follow up if other things need to happen.

@jroesch jroesch merged commit 7a08ae4 into apache:main Sep 29, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
* [CI] Split Integration tests out of first phase of pipeline

* Fix typo in ci_cpu commands
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
* [CI] Split Integration tests out of first phase of pipeline

* Fix typo in ci_cpu commands