[CI] Add Codecov and Test Logs #1349

barry-jin · 2020-09-03T23:08:13Z

Description

Add codecov report and upload batch job logs as build artifacts.

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

codecov · 2020-09-03T23:32:15Z

Codecov Report

Merging #1349 into master will increase coverage by 0.10%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1349      +/-   ##
==========================================
+ Coverage   81.65%   81.75%   +0.10%     
==========================================
  Files          52       52              
  Lines        6862     6862              
==========================================
+ Hits         5603     5610       +7     
+ Misses       1259     1252       -7

Impacted Files	Coverage Δ
src/gluonnlp/data/tokenizers/subword_nmt.py	`79.43% <0.00%> (+0.93%)`	⬆️
src/gluonnlp/data/loading.py	`83.39% <0.00%> (+2.26%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 203d84c...cab4c6d.
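The percentages in the report can be reproduced from the hit and line counts it lists. The sketch below assumes coverage is simply hits divided by total lines; the function name is made up for illustration and is not part of Codecov.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage (assumed definition: hits / lines)."""
    return 100.0 * hits / lines


# Counts copied from the report: master has 5603 hits, this PR has 5610,
# both out of 6862 lines.
base = coverage_percent(5603, 6862)
head = coverage_percent(5610, 6862)

# Hits and misses should add up to the total line count on both sides.
assert 5603 + 1259 == 6862
assert 5610 + 1252 == 6862

print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
```

The seven extra hits account for the whole 0.10% change, since the number of lines is the same on both sides.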

sxjscience

This looks good to me.

sxjscience · 2020-09-04T03:53:17Z

@szha Would you review/merge?

szha · 2020-09-04T04:20:01Z

.github/workflows/unittests-gpu.yml

+          echo "Start submitting job"
+          python ./tools/batch/submit-job.py --region us-east-1 \
+                                             --job-type g4dn.4x \
+                                             --name GluonNLP-${{ github.ref }}-${{ github.run_id }} \


Does the ref contain the PR number?

barry-jin · 2020-09-04

For a push event, ${{ github.ref }} will only contain the name of the branch the PR was merged into, like 'master'.

szha · 2020-09-04

Having the PR number in the job name will help identify the jobs in AWS Batch.
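One way to get a PR number into the job name is to parse it out of the ref when there is one. The sketch below assumes the usual GitHub Actions ref shapes (`refs/pull/<number>/merge` for pull_request events, `refs/heads/<branch>` for pushes). Both helper names are hypothetical and are not part of this PR or of `submit-job.py`.

```python
import re
from typing import Optional

# Assumed ref shapes (GitHub Actions conventions, not taken from this PR):
#   pull_request event -> "refs/pull/<number>/merge"
#   push event         -> "refs/heads/<branch>"
_PR_REF = re.compile(r"^refs/pull/(\d+)/(?:merge|head)$")


def pr_number_from_ref(ref: str) -> Optional[int]:
    """Return the PR number encoded in a pull-request ref, or None."""
    match = _PR_REF.match(ref)
    return int(match.group(1)) if match else None


def batch_job_name(ref: str, run_id: str) -> str:
    """Hypothetical helper: build a job name that shows the PR number."""
    pr = pr_number_from_ref(ref)
    if pr is not None:
        label = f"PR{pr}"
    elif ref.startswith("refs/heads/"):
        label = ref[len("refs/heads/"):]
    else:
        label = ref
    # Assumption: job names should contain only letters, digits, '-' and '_',
    # so any other character (such as '/') is replaced.
    safe_label = re.sub(r"[^A-Za-z0-9_-]", "-", label)
    return f"GluonNLP-{safe_label}-{run_id}"
```

With these assumptions, `batch_job_name("refs/pull/1349/merge", "42")` gives `GluonNLP-PR1349-42`, while a push to master gives `GluonNLP-master-42`.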

szha · 2020-09-04T04:23:47Z

.github/workflows/unittests-gpu.yml


      - name: Upload log file for AWS Batch test results
+        if: ${{ failure() || success() }}
        uses: actions/upload-artifact@v2
        with:


Maybe have an output folder that tests can dump outputs to, and then always expose that folder as an artifact? This should help simplify the maintenance of build artifacts.

barry-jin · 2020-09-04

You mean redirecting the test's stdout and stderr to different files in a folder and then uploading this folder as a build artifact, so that developers can access it through the GitHub workflow's artifacts?

szha · 2020-09-04

Yes, exactly.

sxjscience · 2020-09-04

I think both look fine; we may merge this first and improve it later.

leezu · 2020-09-04

That's the existing practice in 0.x, but it doesn't work well because it doesn't capture errors outside the immediate test execution. Instead, we need to capture the full CloudWatch log.
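For reference, the output-folder idea discussed in this thread could look roughly like the sketch below. It uses Python's `subprocess` rather than shell redirection, which is a substitution made here for illustration. The function name and the default `gputest` stem are assumptions, with the stem borrowed from the log file names in this PR's workflow diff. As the last comment notes, this captures only the test command's own streams, not errors from the surrounding batch job.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def run_with_split_logs(command: List[str], out_dir: str,
                        stem: str = "gputest") -> Tuple[int, Path, Path]:
    """Run `command`, writing its stdout and stderr to separate files.

    Returns the exit code and the two log paths. The folder is created if
    needed, so it can be uploaded as a single build artifact afterwards.
    """
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    stdout_path = folder / f"{stem}.stdout.log"
    stderr_path = folder / f"{stem}.stderr.log"
    with open(stdout_path, "wb") as out, open(stderr_path, "wb") as err:
        # check=False: a failing test run should still leave both logs behind.
        completed = subprocess.run(command, stdout=out, stderr=err, check=False)
    return completed.returncode, stdout_path, stderr_path


# Example call (the script path is a placeholder):
# code, out_log, err_log = run_with_split_logs(["bash", "./test.sh"], "artifacts")
```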

leezu · 2020-09-04T17:44:51Z

.github/workflows/unittests-gpu.yml

                                             --source-ref ${{ github.event.pull_request.head.ref }} \
                                             --work-dir tools/batch \
                                             --save-path temp \
                                             --remote https://github.com/${{ github.event.pull_request.head.repo.full_name }} \
-                                             --command "./batch_states/test.sh | tee > ./script.log" \
+                                             --command "(./batch_states/test.sh | tee > ./gputest.stdout.log) 3>&1 1>&2 2>&3 | tee ./gputest.stderr.log" \


You need to retrieve the log from CloudWatch. Just looking at the stdout and stderr of this command is not sufficient.
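A minimal sketch of pulling a job's full log from CloudWatch follows. It is not the code this PR added. It assumes a boto3 CloudWatch Logs client (or any object with a compatible `get_log_events` method) is passed in, and that the caller already knows the log group and stream names. The function names are made up for illustration.

```python
from typing import Iterator


def iter_log_messages(logs_client, log_group: str, log_stream: str) -> Iterator[str]:
    """Yield every message in a log stream, oldest first.

    `logs_client` is assumed to behave like boto3.client("logs"): paging
    stops once the service returns the same forward token that was sent.
    """
    kwargs = {
        "logGroupName": log_group,
        "logStreamName": log_stream,
        "startFromHead": True,
    }
    previous_token = None
    while True:
        response = logs_client.get_log_events(**kwargs)
        for event in response["events"]:
            yield event["message"]
        token = response.get("nextForwardToken")
        if token is None or token == previous_token:
            break
        previous_token = token
        kwargs["nextToken"] = token


def save_log(logs_client, log_group: str, log_stream: str, path: str) -> int:
    """Write the whole stream to `path`; return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for message in iter_log_messages(logs_client, log_group, log_stream):
            handle.write(message.rstrip("\n") + "\n")
            count += 1
    return count


# Possible usage (requires boto3 and AWS credentials; names are assumptions):
# import boto3
# client = boto3.client("logs", region_name="us-east-1")
# save_log(client, "/aws/batch/job", "<log stream of the job>", "cloudwatch.log")
```

Passing the client in keeps the paging logic testable without AWS access. The resulting file can then be uploaded as the build artifact.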

leezu

You need to pass the --quiet option to pip to avoid large output. Do not discard the output of the pip installation, as is currently done by logging only the stdout and stderr of test.sh.

sxjscience · 2020-09-04T18:11:35Z

I feel that we can have two artifacts: 1) the log of the unit tests, and 2) the complete log generated by the batch job. The advantage is that we can choose to download only one of them, which reduces the time spent investigating the log.

leezu · 2020-09-04T20:27:18Z

Having two logs may be confusing. The problem is that the unit tests can fail with cryptic errors if there is a problem in the setup phase of the batch job. People may not understand that they need to check a separate log file to figure out the error.

tools/batch/docker/gluon_nlp_job.sh

sxjscience · 2020-09-04T20:29:13Z

@leezu The problem here is that the full log can have >400MB, which looks scary to most people. We may write a README to teach users how to investigate batch-related problems.

leezu · 2020-09-04T20:31:45Z

The problem here is that the full log can have >400MB

That's due to misconfiguration.

leezu · 2020-09-04T21:22:03Z

tools/batch/docker/gluon_nlp_cpu_job.sh

-python3 -m pip install -U --pre "mxnet>=2.0.0b20200802" -f https://dist.mxnet.io/python
-pip3 install -v -e .[extras]
+python3 -m pip install -U --quiet --pre "mxnet>=2.0.0b20200802" -f https://dist.mxnet.io/python
+pip3 --quiet install -v -e .[extras]


Besides adding quiet, you should also remove the verbose option. Maybe removing the verbose option is already enough and the quiet option is not needed.

You also need to rebuild the Docker container after changing these scripts

barry-jin added 6 commits September 3, 2020 14:35

[CI] Add codecov and log

078b031

[CI] Update CI

55ec073

[CI] Update CI

72da3da

[CI] Update CI

1d53c10

[CI] Update CI

878b790

[CI] Update CI

a03d8a8

barry-jin requested a review from a team as a code owner September 3, 2020 23:08

barry-jin added 3 commits September 3, 2020 16:43

[CI] Update CI

1ac85cb

[CI] Update CI and add failure test

2737888

[CI] Remove assert

6838434

sxjscience approved these changes Sep 4, 2020

szha reviewed Sep 4, 2020

[CI] Update gpu workflow script

803bb0b

leezu reviewed Sep 4, 2020

leezu suggested changes Sep 4, 2020

barry-jin added 2 commits September 4, 2020 11:22

[CI] Add cloudwatch log as build artifact

97a78ca

[CI] Update unittests-gpu.yml

6c09790

leezu reviewed Sep 4, 2020

tools/batch/docker/gluon_nlp_job.sh (outdated)

[CI] Mute pip install in gluon-nlp setup

f9a607f

leezu reviewed Sep 4, 2020

barry-jin added 4 commits September 4, 2020 14:40

[CI] Remover verbose

07b880e

[CI] Update Build Artifacts files

5a2f9fa

Update Docker

a247c93

Merge remote-tracking branch 'upstream/master' into gpu-ci-logs-codecov

38b77f6

[CI] Update

cab4c6d

[CI] Update

leezu approved these changes Sep 8, 2020

leezu merged commit 4c2867d into dmlc:master Sep 8, 2020

barry-jin mentioned this pull request Sep 8, 2020

expose batch job logs as build artifacts #1341

Closed
