Dciborow/linux template testing #1008

Closed · wants to merge 20 commits
21 changes: 5 additions & 16 deletions tests/ci/azure_pipeline_test/dsvm_nightly_linux_cpu.yml
@@ -45,22 +45,11 @@ jobs:
parameters:
conda_env: nightly_reco_base

- script: |
. /anaconda/etc/profile.d/conda.sh && \
conda activate nightly_reco_base && \
echo "Smoke tests" && \
pytest tests/smoke --durations 0 -m "smoke and not spark and not gpu" --junitxml=reports/test-smoke.xml && \
echo "Integration tests" && \
pytest tests/integration --durations 0 -m "integration and not spark and not gpu" --junitxml=reports/test-integration.xml && \
conda deactivate
displayName: 'Run Tests'

- task: PublishTestResults@2
displayName: 'Publish Test Results '
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
condition: succeededOrFailed()
- template: steps/conda_pytest_linux.yml
Collaborator

I'm trying to understand the workflow: this file will call tests/ci/azure_pipeline_test/steps/conda_pytest_linux.yml, right?
Two questions:

  1. What does coalesce do?
  2. Are PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON always set, even if they are not in the Spark environment?

Contributor Author

  1. It fills in the first string that is not empty. I am using it to minimize duplicated inputs; in particular, the notebooks have a strange mix of params, where in most cases I need to fill in "unit", but in other cases "notebook".
  2. Yes. Since they are just env vars (and I'm still working on my DevOps jujitsu to do conditional addition of params), I decided to just set them.
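
For illustration, a minimal sketch of how coalesce() picks the first non-empty value in an Azure Pipelines template expression; the parameter names below are made up for the example, not taken from conda_pytest_linux.yml:

parameters:
  notebook_marker: ''            # set only by the notebook pipelines (illustrative name)
  default_marker: 'not notebooks'

steps:
- script: >
    pytest tests/unit
    -m "${{ coalesce(parameters.notebook_marker, parameters.default_marker) }}"
    --junitxml=reports/test-unit.xml
  displayName: 'Run tests with the first non-empty marker'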

Contributor Author

Foot in mouth: I figured out how to insert the params only for Spark.
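
A rough sketch of how conditional insertion can work in an Azure Pipelines template, assuming a spark parameter like the one used in this PR; this is only an illustration, not necessarily the exact change:

parameters:
  conda_env: reco_pyspark
  spark: 'not spark'    # or 'spark'

steps:
- script: |
    . /anaconda/etc/profile.d/conda.sh && \
    conda activate ${{ parameters.conda_env }} && \
    pytest tests/unit --durations 0 -m "not notebooks and ${{ parameters.spark }} and not gpu" --junitxml=reports/test-unit.xml
  displayName: 'Run Tests'
  # Only Spark runs get the PySpark env vars inserted:
  ${{ if eq(parameters.spark, 'spark') }}:
    env:
      PYSPARK_PYTHON: /anaconda/envs/${{ parameters.conda_env }}/bin/python
      PYSPARK_DRIVER_PYTHON: /anaconda/envs/${{ parameters.conda_env }}/bin/python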

parameters:
conda_env: nightly_reco_base
smoke_tests: true
integration_tests: true

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
22 changes: 6 additions & 16 deletions tests/ci/azure_pipeline_test/dsvm_nightly_linux_gpu.yml
@@ -44,22 +44,12 @@ jobs:
parameters:
conda_env: nightly_reco_gpu

- script: |
. /anaconda/etc/profile.d/conda.sh && \
conda activate nightly_reco_gpu && \
echo "Smoke tests" && \
pytest tests/smoke --durations 0 -m "smoke and not spark and gpu" --junitxml=reports/test-smoke.xml && \
echo "Integration tests" && \
pytest tests/integration --durations 0 -m "integration and not spark and gpu" --junitxml=reports/test-integration.xml && \
conda deactivate
displayName: 'Run Tests'

- task: PublishTestResults@2
displayName: 'Publish Test Results '
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
condition: succeededOrFailed()
- template: steps/conda_pytest_linux.yml
parameters:
conda_env: nightly_reco_gpu
gpu: gpu
smoke_tests: true
integration_tests: true

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
43 changes: 43 additions & 0 deletions tests/ci/azure_pipeline_test/dsvm_nightly_linux_multistage.yml
@@ -0,0 +1,43 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
#
# A Github Service Connection must also be created with the name "AI-GitHub"
# https://docs.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml

resources:
repositories:
- repository: aitemplates
type: github
name: microsoft/AI
endpoint: AI-GitHub

schedules:
- cron: "7 0 * * *"
displayName: Nightly build master
branches:
include:
- master
always: true
- cron: "7 12 * * *"
displayName: Nightly build staging
branches:
include:
- staging
always: true

trigger: none

pr: none

variables:
- group: LinuxAgentPool

stages:
- template: stages/linux_test_stages.yml
parameters:
stage_name: "Nightly"
job_name: nightly
conda_env_root: nightly_reco
Agent_Pool: $(Agent_Pool)
smoke_tests: true
integration_tests: true
22 changes: 6 additions & 16 deletions tests/ci/azure_pipeline_test/dsvm_nightly_linux_pyspark.yml
@@ -44,22 +44,12 @@ jobs:
parameters:
conda_env: nightly_reco_pyspark

- script: |
. /anaconda/etc/profile.d/conda.sh && \
conda activate nightly_reco_pyspark && \
echo "Smoke tests" && \
pytest tests/smoke --durations 0 -m "smoke and spark and not gpu" --junitxml=reports/test-smoke.xml && \
echo "Integration tests" && \
pytest tests/integration --durations 0 -m "integration and spark and not gpu" --junitxml=reports/test-integration.xml && \
conda deactivate
displayName: 'Run Tests'

- task: PublishTestResults@2
displayName: 'Publish Test Results '
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
condition: succeededOrFailed()
- template: steps/conda_pytest_linux.yml
parameters:
conda_env: nightly_reco_pyspark
spark: spark
smoke_tests: true
integration_tests: true

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
16 changes: 4 additions & 12 deletions tests/ci/azure_pipeline_test/dsvm_notebook_linux_cpu.yml
@@ -37,18 +37,10 @@ jobs:
conda_env: reco_base
install_conda: false

- script: |
. /anaconda/etc/profile.d/conda.sh && \
conda activate reco_base && \
pytest tests/unit --durations 0 -m "notebooks and not spark and not gpu" --junitxml=reports/test-unit.xml
displayName: 'Run Tests'

- task: PublishTestResults@2
displayName: 'Publish Test Results **/test-*.xml'
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
condition: succeededOrFailed()
- template: steps/conda_pytest_linux.yml
parameters:
conda_env: reco_base
notebook_tests: true

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
17 changes: 5 additions & 12 deletions tests/ci/azure_pipeline_test/dsvm_notebook_linux_gpu.yml
@@ -37,18 +37,11 @@ jobs:
conda_env: reco_gpu
install_conda: false

- script: |
. /anaconda/etc/profile.d/conda.sh && \
conda activate reco_gpu && \
pytest tests/unit --durations 0 -m "notebooks and not spark and gpu" --junitxml=reports/test-unit.xml
displayName: 'Run Tests'

- task: PublishTestResults@2
displayName: 'Publish Test Results **/test-*.xml'
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
condition: succeededOrFailed()
- template: steps/conda_pytest_linux.yml
parameters:
conda_env: reco_gpu
gpu: gpu
notebook_tests: true

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
22 changes: 6 additions & 16 deletions tests/ci/azure_pipeline_test/dsvm_notebook_linux_pyspark.yml
@@ -34,24 +34,14 @@ jobs:
steps:
- template: .ci/steps/reco_config_conda_linux.yml@aitemplates
parameters:
conda_env: nightly_reco_base
conda_env: reco_pyspark
install_conda: false

- script: |
. /anaconda/etc/profile.d/conda.sh && \
conda activate reco_pyspark && \
pytest tests/unit --durations 0 -m "notebooks and spark and not gpu" --junitxml=reports/test-unit.xml
displayName: 'Run Tests'
env:
PYSPARK_PYTHON: /anaconda/envs/reco_pyspark/bin/python
PYSPARK_DRIVER_PYTHON: /anaconda/envs/reco_pyspark/bin/python

- task: PublishTestResults@2
displayName: 'Publish Test Results **/test-*.xml'
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
condition: succeededOrFailed()
- template: steps/conda_pytest_linux.yml
parameters:
conda_env: reco_pyspark
spark: true
notebook_tests: true

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
42 changes: 42 additions & 0 deletions tests/ci/azure_pipeline_test/dsvm_trigger_linux_multistage.yml
@@ -0,0 +1,42 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
Collaborator

where is this used?

Contributor Author (@dciborow, Dec 14, 2019)

https://dev.azure.com/AZGlobal/Azure%20Global%20CAT%20Engineering/_build?definitionId=134&_a=summary

We can create a pipeline off of it in the other ADO as well, but I didn't want to do too much there without everyone on the team understanding it first (I only did the minimum I needed to parametrize the agents).

Contributor Author

(image attachment)

# Licensed under the MIT License.
#
# A Github Service Connection must also be created with the name "AI-GitHub"
# https://docs.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml

resources:
repositories:
- repository: aitemplates
type: github
name: microsoft/AI
endpoint: AI-GitHub

# Pull request against these branches will trigger this build
pr:
- master
- staging

#Any commit to this branch will trigger the build.
trigger:
- staging
- master

variables:
- group: LinuxAgentPool

stages:
- template: stages/linux_test_stages.yml
parameters:
stage_name: "Unit"
job_name: unit
conda_env_root: reco
Agent_Pool: $(Agent_Pool)
notebook_tests: true

- template: stages/linux_test_stages.yml
parameters:
stage_name: "Notebook"
job_name: notebook
conda_env_root: reco
Agent_Pool: $(Agent_Pool)
notebook_tests: true
16 changes: 4 additions & 12 deletions tests/ci/azure_pipeline_test/dsvm_unit_linux_cpu.yml
@@ -37,18 +37,10 @@ jobs:
conda_env: reco_base
install_conda: false

- script: |
. /anaconda/etc/profile.d/conda.sh && \
conda activate reco_base && \
pytest tests/unit --durations 0 -m "not notebooks and not spark and not gpu" --junitxml=reports/test-unit.xml
displayName: 'Run Tests'

- task: PublishTestResults@2
displayName: 'Publish Test Results **/test-*.xml'
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
condition: succeededOrFailed()
- template: steps/conda_pytest_linux.yml
parameters:
conda_env: reco_base
unit_tests: true

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
17 changes: 5 additions & 12 deletions tests/ci/azure_pipeline_test/dsvm_unit_linux_gpu.yml
@@ -37,18 +37,11 @@ jobs:
conda_env: reco_gpu
install_conda: false

- script: |
. /anaconda/etc/profile.d/conda.sh && \
conda activate reco_gpu && \
pytest tests/unit --durations 0 -m "not notebooks and not spark and gpu" --junitxml=reports/test-unit.xml
displayName: 'Run Tests'

- task: PublishTestResults@2
displayName: 'Publish Test Results **/test-*.xml'
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
condition: succeededOrFailed()
- template: steps/conda_pytest_linux.yml
parameters:
conda_env: reco_gpu
gpu: gpu
unit_tests: true

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
20 changes: 5 additions & 15 deletions tests/ci/azure_pipeline_test/dsvm_unit_linux_pyspark.yml
@@ -37,21 +37,11 @@ jobs:
conda_env: reco_pyspark
install_conda: false

- script: |
. /anaconda/etc/profile.d/conda.sh && \
conda activate reco_pyspark && \
pytest tests/unit --durations 0 -m "not notebooks and spark and not gpu" --junitxml=reports/test-unit.xml
displayName: 'Run Tests'
env:
PYSPARK_PYTHON: /anaconda/envs/reco_pyspark/bin/python
PYSPARK_DRIVER_PYTHON: /anaconda/envs/reco_pyspark/bin/python

- task: PublishTestResults@2
displayName: 'Publish Test Results **/test-*.xml'
inputs:
testResultsFiles: '**/test-*.xml'
failTaskOnFailedTests: true
condition: succeededOrFailed()
- template: steps/conda_pytest_linux.yml
parameters:
conda_env: reco_pyspark
spark: spark
unit_tests: true

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
41 changes: 41 additions & 0 deletions tests/ci/azure_pipeline_test/stage/linux_test_stage.yml
@@ -0,0 +1,41 @@
parameters:
Collaborator

This file is similar but not exactly the same as tests/ci/azure_pipeline_test/stages/linux_test_stages.yml.

Contributor Author

linux_test_stage is the generic template for one stage, while linux_test_stages specifically calls that template three different ways, for CPU, GPU, and Spark tests.
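
For reference, stages/linux_test_stages.yml is not shown in this excerpt, so the following is only a rough sketch of how such a "stages" template might call the single-stage template for CPU, GPU, and Spark; paths, name suffixes, and env names are assumptions:

parameters:
  stage_name: ''
  job_name: ''
  conda_env_root: reco
  Agent_Pool: ''
  unit_tests: false
  smoke_tests: false
  integration_tests: false
  notebook_tests: false

stages:
- template: ../stage/linux_test_stage.yml
  parameters:
    stage_name: ${{ parameters.stage_name }}_CPU
    job_name: ${{ parameters.job_name }}_cpu
    conda_env: ${{ parameters.conda_env_root }}_base
    Agent_Pool: ${{ parameters.Agent_Pool }}
    unit_tests: ${{ parameters.unit_tests }}
    smoke_tests: ${{ parameters.smoke_tests }}
    integration_tests: ${{ parameters.integration_tests }}
    notebook_tests: ${{ parameters.notebook_tests }}

- template: ../stage/linux_test_stage.yml
  parameters:
    stage_name: ${{ parameters.stage_name }}_GPU
    job_name: ${{ parameters.job_name }}_gpu
    conda_env: ${{ parameters.conda_env_root }}_gpu
    gpu: gpu
    Agent_Pool: ${{ parameters.Agent_Pool }}
    unit_tests: ${{ parameters.unit_tests }}
    smoke_tests: ${{ parameters.smoke_tests }}
    integration_tests: ${{ parameters.integration_tests }}
    notebook_tests: ${{ parameters.notebook_tests }}

- template: ../stage/linux_test_stage.yml
  parameters:
    stage_name: ${{ parameters.stage_name }}_Spark
    job_name: ${{ parameters.job_name }}_pyspark
    conda_env: ${{ parameters.conda_env_root }}_pyspark
    spark: spark
    Agent_Pool: ${{ parameters.Agent_Pool }}
    unit_tests: ${{ parameters.unit_tests }}
    smoke_tests: ${{ parameters.smoke_tests }}
    integration_tests: ${{ parameters.integration_tests }}
    notebook_tests: ${{ parameters.notebook_tests }}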

Collaborator (@miguelgfierro, Dec 16, 2019)

So, first question: we have dsvm_trigger_linux_multistage.yml calling stages/linux_test_stages.yml, which calls stage/linux_test_stage.yml three times, for CPU, GPU, and Spark tests, with each of them calling steps/conda_pytest_linux.yml; this is for the unit tests. Then we have dsvm_nightly_linux_multistage.yml doing the same thing but for the nightly tests?

Second question: who is calling conda_pytest_linux_steps.yml?

Third question: the original yml files are still there; are they still needed if we have (1st Q)?

Fourth question: if we don't need the yamls in (3rd Q), is GitHub going to identify the 12 original pipelines independently so they will appear when someone does a PR?

Collaborator

As a general comment, I'm honestly making an effort to understand the value of all this parametrization, but to me it is increasing the complexity. I'm trying to understand how we are making things easier for everyone, but so far I can't.

Contributor Author (@dciborow, Dec 17, 2019)

So, I'm slowly trying to expand the test harness that we created for the HPs to the Best Practice repos. I'm looking to minimize the configuration differences, as well as the duplicated code, across all of the build yamls that I need to bring together to do this. I included the additional templates in this repo, for this PR, to try to make it clearer than my previous PR how the templates fit together. Then, as I look to the next BP repo, I'll start pulling common pieces into the central microsoft/ai location.

for your questions...
1 - Let me try to create a diagram for this. It's funny, but I think AML Studio would be a great palette to show this on; still, I can create something in PowerPoint that will also show how Nightly and Trigger both use the same stages.
Which of these particular levels do you think is necessary, and does any layer feel like too much?

2 - I had a typo in linux_test_stage; it should have been calling conda_pytest_linux_steps. Updated it.

3 - I did not want to touch these, because they are connected with your CI/CD system. We could do a separate PR to remove them, but only after the multistage versions are committed and a new ADO pipeline has been created, if you are interested in that.
For my expanding test harness, I did not particularly need separate pipelines for each test, so I preferred having a single pipeline connected with the repo.
I want to leave the decision about what to do with the individual pipelines completely up to the reco team, and just leave what I am using next to it.

4 - If someone does a PR and you use the multistage pipeline, GitHub will roll this into a single check, but if the check fails, the user can check the logs to see which stage failed. "Stages" in ADO pipelines are a new feature that they haven't included in any type of badge yet.

Collaborator

I see.

I think ideally we would like to have the same 12 pipelines for unit tests + 6 for nightly builds, but make them simpler.

An idea that came from your development would be this. First, there will be some components in this repo and some in the common repo (microsoft/ai). The common components will be:

  1. schedule.yml -> https://github.com/microsoft/recommenders/blob/master/tests/ci/azure_pipeline_test/dsvm_nightly_linux_gpu.yml#L5
  2. conda remove linux https://github.com/microsoft/recommenders/blob/master/tests/ci/azure_pipeline_test/dsvm_nightly_linux_gpu.yml#L54
  3. conda remove windows
  4. publish tests
    these components can be reused in other repos.

In this repo we will have:

  1. template for creating the conda env https://github.com/microsoft/recommenders/blob/master/tests/ci/azure_pipeline_test/dsvm_nightly_linux_gpu.yml#L29
  2. template for creating the unit tests https://github.com/microsoft/recommenders/blob/master/tests/ci/azure_pipeline_test/dsvm_unit_linux_cpu.yml#L37
  3. template for creating the nightly tests https://github.com/microsoft/recommenders/blob/master/tests/ci/azure_pipeline_test/dsvm_nightly_linux_gpu.yml#L36
    These components are specific to this repo and are the core controllers of the tests; if we want to change something, we don't need to go to an external repo to change them.

These are the bricks; then, for each of the unit and nightly tests, we will have a yaml file, and this will be connected to GitHub.

A final thing: I think we will save a lot of lines of code if we don't use a true/false flag for each component but group them instead. As an example, in this file: https://github.com/microsoft/recommenders/blob/master/tests/ci/azure_pipeline_test/dsvm_unit_linux_gpu.yml, this code:

- bash: |
    python scripts/generate_conda_file.py  --gpu
    conda env update -n reco_gpu -f reco_gpu.yaml
  displayName: 'Creating Conda Environment with dependencies'
- script: |
   . /anaconda/etc/profile.d/conda.sh && \
   conda activate reco_gpu && \
   pytest tests/unit -m "not notebooks and not spark and gpu" --junitxml=reports/test-unit.xml

could be parametrized with conda_params (which can be empty, --gpu, or --pyspark), env_name, and test_string (which would be "not notebooks and not spark and gpu").

would this make sense?
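
For illustration, a sketch of what a step template parametrized along these lines might look like; conda_params, env_name, and test_string are the names suggested above, and this file does not exist in the PR:

parameters:
  conda_params: ''        # '', '--gpu', or '--pyspark'
  env_name: reco_base
  test_string: 'not notebooks and not spark and not gpu'

steps:
- bash: |
    python scripts/generate_conda_file.py ${{ parameters.conda_params }}
    conda env update -n ${{ parameters.env_name }} -f ${{ parameters.env_name }}.yaml
  displayName: 'Creating Conda Environment with dependencies'
- script: |
    . /anaconda/etc/profile.d/conda.sh && \
    conda activate ${{ parameters.env_name }} && \
    pytest tests/unit -m "${{ parameters.test_string }}" --junitxml=reports/test-unit.xml
  displayName: 'Run Tests'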

Collaborator

@anargyri proposed the following idea: we would have only 2 yaml files, one for unit tests and another for nightly builds. Similarly to the previous comment, we would parametrize all the content, like conda_params, conda_env, test_instruction, etc.

Now we have two possibilities:

  1. we can inject the correct parameters from ADO into the template, and GitHub is able to link each test
  2. if 1) doesn't work, we will need to create a yaml file for each pipeline with parameters and a template.

If this idea makes sense to people, the next step would be to explore whether 1 is feasible.
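
As a point of reference for option 1, Azure DevOps runtime parameters look roughly like the sketch below (parameter names are illustrative, not from this PR); one caveat is that triggered and scheduled runs fall back to the parameter defaults, so injecting different values would mainly apply to manually queued runs:

parameters:
- name: conda_env
  displayName: 'Conda environment'
  type: string
  default: reco_base
- name: test_instruction
  displayName: 'Pytest marker expression'
  type: string
  default: 'not notebooks and not spark and not gpu'

trigger:
- staging
- master

jobs:
- job: unit
  pool:
    name: $(Agent_Pool)
  steps:
  - script: |
      . /anaconda/etc/profile.d/conda.sh && \
      conda activate ${{ parameters.conda_env }} && \
      pytest tests/unit -m "${{ parameters.test_instruction }}" --junitxml=reports/test-unit.xml
    displayName: 'Run Tests'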

Collaborator

I would also add that if (1) fails, you could still use two templates and create the yml files with a script (and check them in, so that GitHub can find them).
A similar approach to what I am suggesting is used frequently in other contexts, especially when there are many parameters to select. E.g., Jun did something similar in this repo for creating yml files programmatically and passing them to Kubeflow:
https://github.com/microsoft/HyperTune/blob/master/src/kubeflow/manifest/hypertune.template
https://github.com/microsoft/HyperTune/blob/a40e8e07b308f878013c812d069bbcececb948b7/src/kubeflow/utils.py#L187

Agent_Pool: #
stage_name: #
job_name: #
conda_env: #
spark: not spark # | spark
gpu: not gpu # | gpu
unit_tests: false
smoke_tests: false
integration_tests: false
notebook_tests: false

stages:
- stage: ${{parameters.stage_name}}
dependsOn: []

jobs:
- job: ${{parameters.job_name}}
displayName: 'Nightly tests Linux CPU'
timeoutInMinutes: 180 # how long to run the job before automatically cancelling
pool:
name: ${{parameters.Agent_Pool}}

steps:
- template: .ci/steps/reco_config_conda_linux.yml@aitemplates
parameters:
conda_env: ${{parameters.conda_env}}

- template: ../steps/conda_pytest_linux.yml
parameters:
conda_env: ${{parameters.conda_env}}
spark: ${{parameters.spark}}
gpu: ${{parameters.gpu}}
unit_tests: ${{parameters.unit_tests}}
smoke_tests: ${{parameters.smoke_tests}}
integration_tests: ${{parameters.integration_tests}}
notebook_tests: ${{parameters.notebook_tests}}

- template: .ci/steps/reco_conda_clean_linux.yml@aitemplates
parameters:
conda_env: ${{parameters.conda_env}}