[CI] remove data.mxnet.io usage for CI stability (apache#18871)
* remove duplicate mnist functions

* remove data.mxnet.io usage in tests

* add waitall
szha authored Aug 7, 2020
1 parent 708a900 commit 1694d2f
Showing 10 changed files with 38 additions and 138 deletions.
@@ -18,7 +18,7 @@
Run on Amazon SageMaker
-----------------------

This chapter will give a high level overview about Amazon SageMaker,
This chapter will give a high level overview about running MXNet on Amazon SageMaker,
in-depth tutorials can be found on the `Sagemaker
website <https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html>`__.

@@ -29,16 +29,7 @@ charged by time. Within this notebook you can `fetch, explore and
prepare training
data <https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-notebooks-instances.html>`__.

::

import mxnet as mx
import sagemaker
mx.test_utils.get_cifar10() # Downloads Cifar-10 dataset to ./data
sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='data/cifar',
key_prefix='data/cifar10')

Once the data is ready, you can easily launch training via the SageMaker
With your own data on the notebook instance, you can easily launch training via the SageMaker
SDK. So there is no need to manually configure and log into EC2
instances. You can either bring your own model or use SageMaker's
`built-in
@@ -51,11 +42,11 @@ instance:
::

from sagemaker.mxnet import MXNet as MXNetEstimator
estimator = MXNetEstimator(entry_point='train.py',
estimator = MXNetEstimator(entry_point='train.py',
role=sagemaker.get_execution_role(),
train_instance_count=1,
train_instance_count=1,
train_instance_type='local',
hyperparameters={'batch_size': 1024,
hyperparameters={'batch_size': 1024,
'epochs': 30})
estimator.fit(inputs)

32 changes: 4 additions & 28 deletions docs/static_site/src/pages/api/faq/cloud.md
@@ -37,40 +37,16 @@ and maintain the resources for precisely the amount of time needed.
In this document, we provide a step-by-step guide that will teach you
how to set up an AWS cluster with _MXNet_. We show how to:

- [Use Amazon S3 to host data](#use-amazon-s3-to-host-data)
- [Set up an EC2 GPU instance with all dependencies installed](#set-up-an-ec2-gpu-instance)
- [Use Pre-installed EC2 GPU Instance](#use-pre-installed-ec2-gpu-instance)
- [Build and run MXNet on a single computer](#build-and-run-mxnet-on-a-gpu-instance)
- [Set up an EC2 GPU cluster for distributed training](#set-up-an-ec2-gpu-cluster-for-distributed-training)

### Use Amazon S3 to Host Data

Amazon S3 provides distributed data storage which proves especially convenient for hosting large datasets.
To use S3, you need [AWS credentials](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html),
including an `ACCESS_KEY_ID` and a `SECRET_ACCESS_KEY`.

To use _MXNet_ with S3, set the environment variables `AWS_ACCESS_KEY_ID` and
`AWS_SECRET_ACCESS_KEY` by adding the following two lines in
`~/.bashrc` (replacing the strings with the correct ones):

```bash
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```

There are several ways to upload data to S3. One simple way is to use
[s3cmd](https://s3tools.org/s3cmd). For example:

```bash
wget http://data.mxnet.io/mxnet/data/mnist.zip
unzip mnist.zip && s3cmd put t*-ubyte s3://dmlc/mnist/
```

### Use Pre-installed EC2 GPU Instance
The [Deep Learning AMIs](https://aws.amazon.com/marketplace/search/results?x=0&y=0&searchTerms=Deep+Learning+AMI)
are a series of images supported and maintained by Amazon Web Services for use
on Amazon Elastic Compute Cloud (Amazon EC2) and contain the latest MXNet release.

Now you can launch _MXNet_ directly on an EC2 GPU instance.
Now you can launch _MXNet_ directly on an EC2 GPU instance.
You can also use [Jupyter](https://jupyter.org) notebook on EC2 machine.
Here is a [good tutorial](https://github.com/dmlc/mxnet-notebooks)
on how to connect to a Jupyter notebook running on an EC2 instance.
@@ -81,7 +57,7 @@ on how to connect to a Jupyter notebook running on an EC2 instance.
provide a foundational image with NVIDIA CUDA, cuDNN, GPU drivers, Intel
MKL-DNN, Docker and Nvidia-Docker, etc. for deploying your own custom deep
learning environment. You may follow the [MXNet Build From Source
instructions](<https://mxnet.apache.org/get_started/build_from_source easily on
instructions](https://mxnet.apache.org/get_started/build_from_source) easily on
the Deep Learning Base AMIs.

### Set Up an EC2 GPU Cluster for Distributed Training
@@ -146,7 +122,7 @@ Put all of the record files into a folder, and point the data path to the folder

#### Use YARN and SGE
Although using SSH can be simple when you don't have a cluster scheduling framework,
_MXNet_ is designed to be portable to various platforms.
_MXNet_ is designed to be portable to various platforms.
We provide scripts available in [tracker](https://github.com/dmlc/dmlc-core/tree/master/tracker)
to allow running on other cluster frameworks, including Hadoop (YARN) and SGE.
We welcome contributions from the community of examples of running _MXNet_ on your favorite distributed platform.
3 changes: 2 additions & 1 deletion example/gluon/image_classification.py
@@ -193,7 +193,8 @@ def train(opt, ctx):
ctx = [ctx]

train_data, val_data = get_data_iters(dataset, batch_size, opt)
net.collect_params().reset_ctx(ctx)
for p in net.collect_params().values():
p.reset_ctx(ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd',
optimizer_params={'learning_rate': opt.lr,
'wd': opt.wd,
73 changes: 21 additions & 52 deletions python/mxnet/test_utils.py
@@ -1764,10 +1764,15 @@ def download(url, fname=None, dirname=None, overwrite=False, retries=5):
def get_mnist(path='data'):
"""Download and load the MNIST dataset
Parameters
----------
path : str
Path in which to save the files.
Returns
-------
dict
A dict containing the data
A dict containing the data.
"""
def read_data(label_url, image_url):
if not os.path.isdir(path):
@@ -1782,26 +1787,14 @@ def read_data(label_url, image_url):
return (label, image)

# changed to mxnet.io for more stable hosting
# path = 'http://yann.lecun.com/exdb/mnist/'
url_path = 'http://data.mxnet.io/data/mnist/'
url_path = 'https://repo.mxnet.io/gluon/dataset/mnist/'
(train_lbl, train_img) = read_data(
url_path+'train-labels-idx1-ubyte.gz', url_path+'train-images-idx3-ubyte.gz')
(test_lbl, test_img) = read_data(
url_path+'t10k-labels-idx1-ubyte.gz', url_path+'t10k-images-idx3-ubyte.gz')
return {'train_data':train_img, 'train_label':train_lbl,
'test_data':test_img, 'test_label':test_lbl}
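The `read_data` helper is only partly visible in this hunk. As a rough illustration of what reading one of the downloaded `*-labels-idx1-ubyte.gz` files involves, here is a minimal standard-library sketch. It assumes the usual IDX label layout (two big-endian 32-bit header fields, then one byte per label). The function name `read_idx_labels` is hypothetical and is not part of `mxnet.test_utils`.

```python
import gzip
import struct


def read_idx_labels(gz_path):
    """Parse a gzip-compressed IDX label file into a list of ints.

    Hypothetical helper for illustration; not the MXNet implementation.
    """
    with gzip.open(gz_path, 'rb') as f:
        # Header: two big-endian uint32 values (magic number, item count).
        magic, count = struct.unpack('>II', f.read(8))
        # 2049 is the magic number assumed here for IDX label files.
        if magic != 2049:
            raise ValueError('unexpected magic number: %d' % magic)
        # In Python 3, iterating over bytes yields integers.
        labels = list(f.read(count))
    if len(labels) != count:
        raise ValueError('label file is truncated')
    return labels
```

Image files follow the same pattern with a longer header, so a similar check on the header fields can catch a corrupted or partial download early.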

def get_mnist_pkl(path='data'):
"""Downloads MNIST dataset as a pkl.gz into a directory in the current directory
with the name `data`
"""
if not os.path.isdir(path):
os.makedirs(path)
if not os.path.exists(os.path.join(path, 'mnist.pkl.gz')):
mx.gluon.utils.download('http://deeplearning.net/data/mnist/mnist.pkl.gz',
sha1_hash='0b07d663e8a02d51849faa39e226ed19d7b7ed23',
path=path)

def get_mnist_ubyte(path='data'):
"""Downloads ubyte version of the MNIST dataset into a directory in the current directory
with the name `data` and extracts all files in the zip archive to this directory.
@@ -1811,12 +1804,13 @@ def get_mnist_ubyte(path='data'):
files = ['train-images-idx3-ubyte', 'train-labels-idx1-ubyte',
't10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte']
if not all(os.path.exists(os.path.join(path, f)) for f in files):
url = 'http://data.mxnet.io/mxnet/data/mnist.zip'
sha1 = '74fc763958b9d6e04eb32717f80355bf895f0561'
zip_file_path = mx.gluon.utils.download(url, path=path, sha1_hash=sha1,
verify_ssl=False)
with zipfile.ZipFile(zip_file_path) as zf:
zf.extractall(path)
get_mnist(path)
for f in files:
ubyte_file_path = os.path.join(path, f)
zip_file_path = ubyte_file_path + '.gz'
with gzip.GzipFile(zip_file_path) as zf:
with open(ubyte_file_path, 'wb') as ubyte_file:
ubyte_file.write(zf.read())
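The new `get_mnist_ubyte` body decompresses each `.gz` file by reading it fully into memory with `zf.read()`, which is fine for files of MNIST's size. For larger archives, a streaming variant may be preferable. The sketch below shows one way to do that with the standard library only; `gunzip_file` is a hypothetical name, not a function in this commit.

```python
import gzip
import shutil


def gunzip_file(gz_path, out_path):
    """Decompress gz_path into out_path in chunks.

    Hypothetical helper; streams the data instead of holding the
    whole decompressed file in memory.
    """
    with gzip.open(gz_path, 'rb') as src, open(out_path, 'wb') as dst:
        # copyfileobj copies in fixed-size chunks until EOF.
        shutil.copyfileobj(src, dst)
    return out_path
```

A call such as `gunzip_file(ubyte_file_path + '.gz', ubyte_file_path)` would play the same role as the nested `with` blocks in the diff.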

def get_cifar10(path='data'):
"""Downloads CIFAR10 dataset into a directory in the current directory with the name `data`,
@@ -1828,23 +1822,23 @@ def get_cifar10(path='data'):
(not os.path.exists(os.path.join(path, 'cifar', 'test.rec'))) or \
(not os.path.exists(os.path.join(path, 'cifar', 'train.lst'))) or \
(not os.path.exists(os.path.join(path, 'cifar', 'test.lst'))):
url = 'http://data.mxnet.io/mxnet/data/cifar10.zip'
url = 'https://repo.mxnet.io/gluon/dataset/cifar10/cifar10-b9ac2870.zip'
sha1 = 'b9ac287012f2dad9dfb49d8271c39ecdd7db376c'
zip_file_path = mx.gluon.utils.download(url, path=path, sha1_hash=sha1,
verify_ssl=False)
with zipfile.ZipFile(zip_file_path) as zf:
zf.extractall(path)
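The replacement CIFAR-10 URL ends in `cifar10-b9ac2870.zip`, and `b9ac2870` matches the first characters of the `sha1` value passed to `mx.gluon.utils.download`. The sketch below shows how such a check could be done by hand with `hashlib`. Both function names are hypothetical, and whether every file on `repo.mxnet.io` follows this naming pattern is an assumption, not something the commit states.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            sha1.update(chunk)
    return sha1.hexdigest()


def name_embeds_hash(file_name, sha1_hex):
    """Check whether 'name-<hexprefix>.ext' carries a prefix of sha1_hex.

    Assumes the hash prefix is the text after the last '-' in the stem.
    """
    stem = file_name.split('.', 1)[0]
    suffix = stem.rsplit('-', 1)[-1]
    return len(suffix) > 0 and sha1_hex.startswith(suffix)
```

For the values in this hunk, `name_embeds_hash('cifar10-b9ac2870.zip', 'b9ac287012f2dad9dfb49d8271c39ecdd7db376c')` evaluates to `True`. Comparing `sha1_of_file(zip_file_path)` with the expected digest is a manual version of the check that the `sha1_hash` argument requests.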

def get_mnist_iterator(batch_size, input_shape, num_parts=1, part_index=0):
def get_mnist_iterator(batch_size, input_shape, num_parts=1, part_index=0, path='data'):
"""Returns training and validation iterators for MNIST dataset
"""

get_mnist_ubyte()
get_mnist_ubyte(path)
flat = len(input_shape) != 3

train_dataiter = mx.io.MNISTIter(
image="data/train-images-idx3-ubyte",
label="data/train-labels-idx1-ubyte",
image=os.path.join(path, "train-images-idx3-ubyte"),
label=os.path.join(path, "train-labels-idx1-ubyte"),
input_shape=input_shape,
batch_size=batch_size,
shuffle=True,
@@ -1853,8 +1847,8 @@ def get_mnist_iterator(batch_size, input_shape, num_parts=1, part_index=0):
part_index=part_index)

val_dataiter = mx.io.MNISTIter(
image="data/t10k-images-idx3-ubyte",
label="data/t10k-labels-idx1-ubyte",
image=os.path.join(path, "t10k-images-idx3-ubyte"),
label=os.path.join(path, "t10k-labels-idx1-ubyte"),
input_shape=input_shape,
batch_size=batch_size,
flat=flat,
@@ -1863,31 +1857,6 @@ def get_mnist_iterator(batch_size, input_shape, num_parts=1, part_index=0):

return (train_dataiter, val_dataiter)
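With the new `path` argument, the iterator no longer depends on a hard-coded `data/` directory, so a test can point it at a temporary location. The following sketch mirrors the `os.path.join(path, ...)` pattern from the diff to list which of the four ubyte files are still missing under a given directory. `MNIST_FILES` and `missing_mnist_files` are hypothetical names introduced here for illustration.

```python
import os

# File names as listed in get_mnist_ubyte in this commit.
MNIST_FILES = ['train-images-idx3-ubyte', 'train-labels-idx1-ubyte',
               't10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte']


def missing_mnist_files(path='data'):
    """Return the MNIST ubyte file names not yet present under path."""
    return [f for f in MNIST_FILES
            if not os.path.exists(os.path.join(path, f))]
```

An empty result corresponds to the case where `get_mnist_ubyte(path)` has nothing left to download or extract.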

def get_zip_data(data_dir, url, data_origin_name):
"""Download and extract zip data.
Parameters
----------
data_dir : str
Absolute or relative path of the directory name to store zip files
url : str
URL to download data from
data_origin_name : str
Name of the downloaded zip file
Examples
--------
>>> get_zip_data("data_dir",
"http://files.grouplens.org/datasets/movielens/ml-10m.zip",
"ml-10m.zip")
"""
data_origin_name = os.path.join(data_dir, data_origin_name)
if not os.path.exists(data_origin_name):
download(url, dirname=data_dir, overwrite=False)
zip_file = zipfile.ZipFile(data_origin_name)
zip_file.extractall(path=data_dir)

def get_bz2_data(data_dir, data_name, url, data_origin_name):
"""Download and extract bz2 data.
38 changes: 0 additions & 38 deletions tests/nightly/download.sh

This file was deleted.

2 changes: 1 addition & 1 deletion tests/python/gpu/test_gluon_model_zoo_gpu.py
@@ -36,7 +36,7 @@ def eprint(*args, **kwargs):
VAL_DATA='data/val-5k-256.rec'
def download_data():
return mx.test_utils.download(
'http://data.mxnet.io/data/val-5k-256.rec', VAL_DATA)
'https://repo.mxnet.io/gluon/dataset/test/val-5k-256-9e70d85e0.rec', VAL_DATA)

@with_seed()
@pytest.mark.serial
3 changes: 2 additions & 1 deletion tests/python/unittest/test_contrib_gluon_data_vision.py
@@ -51,7 +51,7 @@ def _generate_objects():


class TestImage(unittest.TestCase):
IMAGES_URL = "http://data.mxnet.io/data/test_images.tar.gz"
IMAGES_URL = "https://repo.mxnet.io/gluon/dataset/test/test_images-9cebe48a.tar.gz"

def setUp(self):
self.IMAGES_DIR = tempfile.mkdtemp()
@@ -146,3 +146,4 @@ def test_bbox_augmenters(self):
max_attempts=50)
for batch in det_iter:
pass
mx.nd.waitall()
2 changes: 1 addition & 1 deletion tests/python/unittest/test_gluon_data.py
@@ -52,7 +52,7 @@ def test_array_dataset():
def prepare_record(tmpdir_factory):
test_images = tmpdir_factory.mktemp("test_images")
test_images_tar = test_images.join("test_images.tar.gz")
gluon.utils.download("http://data.mxnet.io/data/test_images.tar.gz", str(test_images_tar))
gluon.utils.download("https://repo.mxnet.io/gluon/dataset/test/test_images-9cebe48a.tar.gz", str(test_images_tar))
tarfile.open(test_images_tar).extractall(str(test_images))
imgs = os.listdir(str(test_images.join("test_images")))
record = mx.recordio.MXIndexedRecordIO(str(test_images.join("test.idx")), str(test_images.join("test.rec")), 'w')
2 changes: 1 addition & 1 deletion tests/python/unittest/test_image.py
@@ -110,7 +110,7 @@ def _test_imageiter_last_batch(imageiter_list, assert_data_shape):


class TestImage(unittest.TestCase):
IMAGES_URL = "http://data.mxnet.io/data/test_images.tar.gz"
IMAGES_URL = "https://repo.mxnet.io/gluon/dataset/test/test_images-9cebe48a.tar.gz"

def setUp(self):
self.IMAGES_DIR = tempfile.mkdtemp()
@@ -51,7 +51,7 @@ def _generate_objects():


class TestImage(unittest.TestCase):
IMAGES_URL = "http://data.mxnet.io/data/test_images.tar.gz"
IMAGES_URL = "https://repo.mxnet.io/gluon/dataset/test/test_images-9cebe48a.tar.gz"

def setUp(self):
self.IMAGES_DIR = tempfile.mkdtemp()
