[Fix][Docker] Fix the docker image + Fix pretrain_corpus document. #1378

sxjscience · 2020-09-30T04:16:15Z

Description

Now that Horovod support has been fixed, this PR improves our Docker images.
The CI Docker image now builds on the base Docker image, which supports:

horovod training
TVM
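
Whether an environment provides the two optional capabilities listed above can be probed from Python. This is a minimal sketch, assuming the packages are importable under the module names `horovod` and `tvm`; the helper names `optional_features` and `try_import` are hypothetical and are not taken from this PR.

```python
import importlib
import importlib.util


def optional_features(modules=("horovod", "tvm")):
    """Report which optional top-level modules can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def try_import(module_name):
    """Import an optional dependency on first use and fail with a clear message.

    Hypothetical helper for illustration; it only wraps the standard importlib calls.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Optional dependency '{module_name}' is not installed in this environment."
        )
    return importlib.import_module(module_name)


if __name__ == "__main__":
    # Prints a dict such as {'horovod': ..., 'tvm': ...}; the values depend on the image.
    print(optional_features())
```

Deferring the import until the feature is actually used keeps the rest of the package usable on images that lack Horovod or TVM.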

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

cc @dmlc/gluon-nlp-team

github-actions · 2020-09-30T04:33:01Z

The documentation website for preview: http://gluon-nlp-dev.s3-accelerate.amazonaws.com/PR1378/fix_docker/index.html

codecov · 2020-09-30T04:40:40Z

Codecov Report

Merging #1378 into master will decrease coverage by 0.09%.
The diff coverage is 19.04%.

@@            Coverage Diff             @@
##           master    #1378      +/-   ##
==========================================
- Coverage   71.09%   71.00%   -0.10%     
==========================================
  Files         107      107              
  Lines       12607    12619      +12     
==========================================
- Hits         8963     8960       -3     
- Misses       3644     3659      +15
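
As a sanity check on the figures in the diff above, the snippet below recomputes coverage as hits divided by lines. It is only a sketch: truncating to two decimals is an assumption that happens to reproduce the reported values, not a statement about how Codecov computes them.

```python
def coverage_x100(hits, lines):
    """Coverage in hundredths of a percent, truncated, using integer arithmetic."""
    return hits * 10000 // lines


def fmt(x100):
    """Format hundredths of a percent as a percentage string with two decimals."""
    return f"{x100 // 100}.{x100 % 100:02d}%"


master = coverage_x100(8963, 12607)  # hits and lines on master
pr = coverage_x100(8960, 12619)      # hits and lines after merging #1378

print(fmt(master), fmt(pr))          # coverage before and after
print(12607 - 8963, 12619 - 8960)    # misses = lines - hits
```

The two truncated values differ by 0.09 percentage points, which matches the "decrease coverage by 0.09%" sentence in the report.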

Impacted Files	Coverage Δ
...ipts/datasets/pretrain_corpus/prepare_gutenberg.py	`0.00% <0.00%> (ø)`
...ts/datasets/question_answering/prepare_searchqa.py	`0.00% <0.00%> (ø)`
...ripts/datasets/question_answering/prepare_squad.py	`0.00% <0.00%> (ø)`
src/gluonnlp/models/__init__.py	`96.87% <ø> (ø)`
src/gluonnlp/utils/lazy_imports.py	`54.32% <20.00%> (-2.26%)`	⬇️
tests/test_models.py	`100.00% <100.00%> (ø)`
src/gluonnlp/data/filtering.py	`78.26% <0.00%> (-4.35%)`	⬇️
src/gluonnlp/data/tokenizers/subword_nmt.py	`79.43% <0.00%> (+0.93%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4fb41d7...3e1a326.

zheyuye

Looks good

szha · 2020-10-09T20:25:18Z

@sxjscience ready to merge?

sxjscience · 2020-10-09T20:26:37Z

No, there are still some problems with the GPU Docker image because "libcuda is required just to import mxnet"; see apache/mxnet#19139 (comment).

sxjscience · 2020-10-09T20:32:19Z

Basically, Horovod relies on a runtime check of MXNet to fill in its CMake flags, but `import mxnet` fails when it is called inside `docker build`. I'm working on a solution.
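
The failure mode described here can be detected before attempting the import. The following is a minimal sketch, assuming the CUDA driver library is discoverable under the name `cuda` (that is, `libcuda`); the function `driver_library_present` and its `finder` parameter are hypothetical and are not the fix adopted in this PR.

```python
import ctypes.util


def driver_library_present(name="cuda", finder=ctypes.util.find_library):
    """Return True if the shared library `name` (e.g. libcuda) can be located.

    `finder` is injectable so the check can be exercised without a GPU.
    """
    return finder(name) is not None


def describe_import_risk(finder=ctypes.util.find_library):
    """Summarize whether importing a GPU build of MXNet is likely to need a missing driver."""
    if driver_library_present(finder=finder):
        return "libcuda found: a GPU build of MXNet should be importable"
    return "libcuda not found: importing a GPU build of MXNet may fail here"


if __name__ == "__main__":
    # The driver library is typically supplied by the host at container run time,
    # so it may be absent while the image is being built.
    print(describe_import_risk())
```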

scripts/benchmarks/benchmark_utils.py

The related part has changed.

tools/batch/batch_states/test.sh

barry-jin

LGTM

tools/docker/gluon_nlp_job.sh

* Fix BERT fp16 bugs, add test (dmlc#1270) * Fix fp16 bug: not passing dtype to TransformerEncoderLayer * Re-hybridize after casting & add BERT test * Skip fp16 test if CPU ctx * remove debugging messages Co-authored-by: root <[email protected]> * [Fix][SageMaker] Make sure that the installation works in SageMaker (dmlc#1348) * Fasttext to 0.9.1 * Update setup.py * [CI] Add Codecov and Test Logs (dmlc#1349) * [Fix] Some minor fixes for AMLC Tutorial (dmlc#1355) * update update update update * Update test_utils_misc.py * update * update * Update test_layers.py * Update misc.py * Update mobilebert.py * add in_units and in_channels * Update __init__.py * Update mobilebert.py * Update README.md * fix test case * fix * Update test_utils_misc.py * fix bug * [FEATURE] gpt2 generation scripts (dmlc#1354) * remove prev_len in hybrid_forward parameters * update * sample * update * add gpt2_1558M * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update Co-authored-by: Hu <[email protected]> * [Fix] Minor fix for AMLC Tutorial - QA (dmlc#1359) * update Update README.md update try to use dataclasses * Update squad_utils.py * Update preprocessing.py * Update squad_utils.py * Update run_squad.py * [Log Message Improvement] Improve nlp process (dmlc#1362) * Update learn_subword.py * Update learn_subword.py * Update learn_subword.py * Update apply_subword.py * Set default ctx in conftest (dmlc#1363) * Fix the correctness of the Horovod support on squad (dmlc#1353) * revise squad * tiny fix * fix total_norm logging * shuffle before and after splitting * make pre_shuffle_seed fixed * fix flags * remove do_pre_shuffle * remove inside_split_shuffle Co-authored-by: Ubuntu <[email protected]> * [CI][BUGFIX] Custom Step for Uploading Code Coverage in Pull Request Event (dmlc#1364) * [FEATURE]Generation script improvement (dmlc#1365) * update * update * update * update * update * udpate * update 
* update * update * update Co-authored-by: Hu <[email protected]> * [Website][CI] Build Website without Warnings + Add Workflow for Building Website (dmlc#1327) * [Website] Documentation warnings Fixed + Create Makefile [Website] Documentation bug fix [Website] Bug fix [Website] Build without model_zoo [Website] Fix notebook * [Website][CI] Add workflow for building website * [CI] Add more dependencies * [CI] Update buildwebsite.yml [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml [CI] Update buildwebsite.yml [CI] Update buildwebsite.yml [CI] Update buildwebsite.yml [CI] Update buildwebsite.yml [CI] Update buildwebsite.yml [CI] Update buildwebsite.yml [CI] Update buildwebsite.yml [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml * [CI] Update buildwebsite.yml * [Website] Add more dependencies * [Website][CI] Add Compile notebook step + Preview website * [CI] Add shell script for compiling notebooks * [CI] Add permission for shell script * [Website] Update * [Website] Update * [CI] Add uploading build artifacts * [CI] Update * [CI] Update Indentation * [CI] Remove some dependencies * [BUGFIX] Fix URL encoding (dmlc#1370) * [FEATURE]Update readme of nmt (dmlc#1373) * update * update * update * update * update * update * update * update Co-authored-by: Hu <[email protected]> * [CI] Improve website building workflow (dmlc#1377) * BERT pretraining (dmlc#1376) * bert * update * address comments * update * [Fix][Docker] Fix the docker image + Fix pretrain_corpus document. 
(dmlc#1378) * update * Update ubuntu18.04-devel-gpu.Dockerfile * fix the docker image * Update README.md * Update ubuntu18.04-devel-gpu.Dockerfile * Update README.md * fix readme * Add CPU DockerFile * update * update * Update ubuntu18.04-devel-gpu.Dockerfile * update * prepare to add TVM to docker * try to update * Update ubuntu18.04-devel-gpu.Dockerfile * Update ubuntu18.04-devel-gpu.Dockerfile * Update install_openmpi.sh * update * Create install_llvm.sh * Update ubuntu18.04-base-gpu.Dockerfile * Update ubuntu18.04-base-gpu.Dockerfile * Update run_squad2_albert_base.sh * Update prepare_squad.py * Update prepare_squad.py * Update prepare_squad.py * fix * Update README.md * update * update * Update README.md * Update README.md * Update ubuntu18.04-devel-gpu.Dockerfile * update * Update README.md * fix * Update ubuntu18.04-base-cpu.Dockerfile * update * add tvm to lazy import * update * Update README.md * update * Update README.md * Update run_squad2_albert_base.sh * update * update * update * update * update * Update README.md * Update install_ubuntu18.04_core.sh * update * update * update * fix * Update README.md * Update run_batch_squad.sh * update * Update run_batch_squad.sh * Update run_batch_squad.sh * update * Update README.md * fix * Update gluon_nlp_job.sh * update * Update README.md * Update README.md * Update README.md * update * Update README.md * update * Update install_python_packages.sh * Update install_llvm.sh * Update install_python_packages.sh * Update install_llvm.sh * update * Update install_ubuntu18.04_core.sh * fix * Update submit-job.py * Update submit-job.py * Update README.md * Update README.md * Update prepare_gutenberg.py * Delete gluon_nlp_cpu_job.sh * Update prepare_gutenberg.py * Update prepare_gutenberg.py * Update prepare_gutenberg.py * Update conf.py * update * Update generate_commands.py * fix readme * use os.link for hard link * Update README.md * Update README.md * Update gluon_nlp_job.sh * Update __init__.py * Update 
benchmark_utils.py * try to use multi-stage build * Update benchmark_utils.py * multi-stage build * Update README.md * Update README.md * update * Update submit-job.py * fix documentation * fix * update * Update test.sh * Update test.sh * Update test.sh * Update test.sh * Update README.md * Update test.sh * fix * Update README.md * Update gluon_nlp_job.sh * [Website] Add AMLC Tutorial to Website (dmlc#1379) * [Website] Add AMLC Tutorial * [Website] Add tsv encoding * [Website] Add model zoo * [Website] Update Makefile * [Website] Update Makefile * [Website] Update Makefile * [Website] Update compile_notebooks.sh * [Website] Update Makefile * [Website] Add title to generation * [Website] Update workflow * update * [Website] Update model_zoo.rst * [Website] Update model_zoo.rst * [BUGFIX] Fix Codecov (dmlc#1391) * Update coveragerc * Update coveragerc * Update coveragerc * Update workflow * Update workflow * update * update Co-authored-by: MoisesHer <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: Xingjian Shi <[email protected]> Co-authored-by: barry-jin <[email protected]> Co-authored-by: ht <[email protected]> Co-authored-by: Hu <[email protected]> Co-authored-by: Leonard Lausen <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ziyue Huang <[email protected]>

sxjscience added 3 commits September 29, 2020 01:37

update

df1480f

Update ubuntu18.04-devel-gpu.Dockerfile

120c4f4

fix the docker image

9f0b129

sxjscience requested a review from a team as a code owner September 30, 2020 04:16

sxjscience changed the title ~~Fix our docker image~~ Fix the docker image Sep 30, 2020

sxjscience changed the title ~~Fix the docker image~~ [Fix] Fix the docker image Sep 30, 2020

zheyuye approved these changes Sep 30, 2020

sxjscience added 2 commits September 30, 2020 10:35

Update README.md

47c1676

Update ubuntu18.04-devel-gpu.Dockerfile

be6aa35

Update README.md

07d9e0f

fix readme

3d18977

sxjscience changed the title ~~[Fix] Fix the docker image~~ [Fix] Fix the docker image + Fix pretrain_corpus document. Oct 1, 2020

Add CPU DockerFile

146b826

update

487e88e

sxjscience added 2 commits October 8, 2020 16:08

update

0fbecd4

Update ubuntu18.04-devel-gpu.Dockerfile

9b454bd

update

0e6d40a

sxjscience added 7 commits October 13, 2020 14:28

Update benchmark_utils.py

9233326

try to use multi-stage build

6c604ea

Update benchmark_utils.py

fe4d089

multi-stage build

c381eae

Update README.md

eadf268

Update README.md

aadd03d

update

207d018

sxjscience commented Oct 14, 2020

scripts/benchmarks/benchmark_utils.py (resolved)

sxjscience changed the title ~~[WIP][Fix][Docker] Fix the docker image + Fix pretrain_corpus document.~~ [Fix][Docker] Fix the docker image + Fix pretrain_corpus document. Oct 14, 2020

sxjscience added 8 commits October 14, 2020 12:06

Update submit-job.py

2c9e84e

fix documentation

bee78a6

fix

e9889ec

update

f52fbf6

Update test.sh

bbe13f7

Update test.sh

b8046a0

Update test.sh

ce551c8

Update test.sh

3ac97b6

sxjscience commented Oct 14, 2020

tools/batch/batch_states/test.sh (resolved)

sxjscience added 4 commits October 14, 2020 13:26

Update README.md

d34d693

Update test.sh

899c613

fix

c73d3ed

Update README.md

42c8e41

Update gluon_nlp_job.sh

3e1a326

barry-jin approved these changes Oct 15, 2020

tools/docker/gluon_nlp_job.sh (outdated, resolved)

sxjscience merged commit 02c0ef8 into dmlc:master Oct 15, 2020

sxjscience mentioned this pull request Oct 15, 2020

[doc] File is missing in the README.md #1380

Closed
