Migrate the legacy examples to the Merlin repo (#1711)
* Migrate the legacy examples to the Merlin repo

We may (or may not) want to keep these examples, but they've overstayed their welcome in the NVTabular repo, which has accumulated a lot of historical cruft. Since some of these examples use inference code that's moving to Systems, it makes more sense for them to live in the Merlin repo (if we want to keep them).

* update READMEs

* docs: Contribute to examples clean up

- Fix difficult-to-detect broken links.
- Revise TOC.

* Handle data loader as an iterator (#1720)

* Update test_gpu_dl_break to handle data loader as an iterator

* Use peek method to look at first batch in notebooks

* Revert whitespace change to image cell

* Revert change to PyTorch training example notebook

* Call peek on data iter to get batch
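A minimal sketch of the iterator/peek pattern these data loader changes describe (the import paths, dataset path, batch size, and batch layout below are assumptions for illustration, not code from this commit):

```python
# Hypothetical example: treat the Merlin data loader as an iterator and use
# peek() to inspect the first batch without consuming it.
from merlin.io import Dataset
from merlin.dataloader.torch import Loader  # assumed import path

loader = Loader(Dataset("train/*.parquet"), batch_size=1024)

features, labels = loader.peek()   # look at the first batch without advancing
print(features)

for features, labels in loader:    # then consume the loader as an iterator
    ...                            # training step goes here
```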

* Describe how to check for broken links (#1719)

This is one way to check for broken links,
but I'm happy to adopt something that is
better.
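One possible way to automate such a check (an illustrative sketch, not necessarily the approach documented in #1719) is to scan the docs for URLs and report any that fail to resolve:

```python
# Hypothetical link checker: find http(s) URLs in Markdown files and flag
# responses with status >= 400 (or no response at all).
import glob
import re
import requests

URL_RE = re.compile(r'https?://[^\s)">]+')

for path in glob.glob("**/*.md", recursive=True):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for url in sorted(set(URL_RE.findall(text))):
        try:
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            status = None
        if status is None or status >= 400:
            print(f"{path}: {url} -> {status}")
```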

Co-authored-by: Karl Higley <[email protected]>

Co-authored-by: Benedikt Schifferer <[email protected]>
Co-authored-by: Mike McKiernan <[email protected]>
Co-authored-by: Oliver Holworthy <[email protected]>
4 people authored Dec 6, 2022
1 parent 51af616 commit 0f3a9b8
Showing 56 changed files with 116 additions and 14,271 deletions.
15 changes: 9 additions & 6 deletions README.md
@@ -78,13 +78,16 @@ To use these Docker containers, you'll first need to install the [NVIDIA Contain

### Notebook Examples and Tutorials

We provide a [collection of examples, use cases, and tutorials](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) as Jupyter notebooks covering:

* Feature engineering and preprocessing with NVTabular
We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) as Jupyter notebooks that demonstrate feature engineering with NVTabular:
* Introduction to NVTabular's High-Level API
* Advanced workflows with NVTabular
* Scaling to multi-GPU and multi-node systems
* Integrating NVTabular with HugeCTR
* Deploying to inference with Triton
* NVTabular on CPU
* Scaling NVTabular to multi-GPU systems
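As a quick illustration of the basic API that these notebooks cover, a minimal (hypothetical) workflow looks like this; the column names and paths are made up for the sketch:

```python
import nvtabular as nvt

# Define a feature-engineering graph: categorify ID columns, normalize a numeric column.
cats = ["user_id", "item_id"] >> nvt.ops.Categorify()
conts = ["price"] >> nvt.ops.Normalize()
workflow = nvt.Workflow(cats + conts)

dataset = nvt.Dataset("train/*.parquet")
workflow.fit(dataset)                                   # compute categories, means, stds
workflow.transform(dataset).to_parquet("train_processed/")
```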

In addition, NVTabular is used in many of our examples in other Merlin libraries:
- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples)
- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/main/examples)
- [Training Examples with Transformers4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples)

### Feedback and Support

2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -87,7 +87,7 @@
#
html_theme = "sphinx_rtd_theme"
html_theme_options = {
"navigation_depth": 3,
"navigation_depth": 2,
"analytics_id": "G-NVJ1Y1YJHK",
}
html_copy_source = False
2 changes: 1 addition & 1 deletion docs/source/core_features.md
@@ -37,7 +37,7 @@ workflow = nvt.Workflow(..., client=client)

Currently, there are many ways to deploy a "cluster" for Dask. This [article](https://blog.dask.org/2020/07/23/current-state-of-distributed-dask-clusters) gives a summary of all the practical options. For a single machine with multiple GPUs, the `dask_cuda.LocalCUDACluster` API is typically the most convenient option.

Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/dask-cudf.html) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/main/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.
Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/main/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.
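For reference, a minimal single-machine multi-GPU sketch of the pattern described above (the paths, column names, and partition size are illustrative):

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import nvtabular as nvt

cluster = LocalCUDACluster()        # one Dask-CUDA worker per visible GPU
client = Client(cluster)

# Attach the client to the workflow, as in the snippet above.
features = ["user_id", "item_id"] >> nvt.ops.Categorify()
workflow = nvt.Workflow(features, client=client)

dataset = nvt.Dataset("data/*.parquet", part_size="256MB")
workflow.fit(dataset)
workflow.transform(dataset).to_parquet("processed/")
```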

## Multi-Node Support ##

23 changes: 15 additions & 8 deletions docs/source/resources/cloud_integration.md
@@ -59,8 +59,9 @@ To run NVTabular on the cloud using GCP, do the following:
* **Boot Disk**: Ubuntu version 18.04
* **Storage**: Local 8xSSD-NVMe

2. [Install the appropriate NVIDIA drivers and CUDA](https://cloud.google.com/compute/docs/gpus/install-drivers-gpu#ubuntu-driver-steps) by running the following commands:
```
2. Install the NVIDIA drivers and CUDA by running the following commands:

```shell
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
@@ -70,8 +71,12 @@ To run NVTabular on the cloud using GCP, do the following:
nvidia-smi # Check installation
```

> For more information, refer to [Install GPU drivers](https://cloud.google.com/compute/docs/gpus/install-drivers-gpu)
> in the Google Cloud documentation.
3. [Install Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) by running the following commands:
```

```shell
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia-merlin.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia-merlin.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
@@ -82,7 +87,8 @@ To run NVTabular on the cloud using GCP, do the following:
```

4. Configure the storage as RAID 0 by running the following commands:
```

```shell
sudo mdadm --create --verbose /dev/md0 --level=0 --name=MY_RAID --raid-devices=2 /dev/nvme0n1 /dev/nvme0n2
sudo mkfs.ext4 -L MY_RAID /dev/md0
sudo mkdir -p /mnt/raid
@@ -94,7 +100,8 @@ To run NVTabular on the cloud using GCP, do the following:
```

5. Run the container by running the following command:
```

```shell
docker run --gpus all --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE -v /mnt/raid:/raid nvcr.io/nvidia/nvtabular:0.3 /bin/bash
```

@@ -179,12 +186,12 @@ conda activate nvtabular
8. Install additional packages, such as TensorFlow or PyTorch

```
pip install tensorflow-gpu
pip install torch
pip install graphviz
```

9. Install Transformers4Rec, torchmetrics, and ipykernel

```
conda install -y -c nvidia -c rapidsai -c numba -c conda-forge transformers4rec
@@ -197,6 +204,6 @@ conda install -y torchmetrics ipykernel
python -m ipykernel install --user --name=nvtabular
```

11. You can switch to the `nvtabular` kernel in JupyterLab and run the [movielens example](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples/getting-started-movielens).

This workflow enables NVTabular ETL and training with TensorFlow or PyTorch. Deployment with Triton Inference Server will follow soon.
2 changes: 1 addition & 1 deletion docs/source/resources/links.md
Expand Up @@ -16,7 +16,7 @@ Talks
Blog posts
----------

We frequently post updates on [our blog](https://medium.com/nvidia-merlin) and on the [NVIDIA Developer News](https://news.developer.nvidia.com/tag/recommendation-systems/).
We frequently post updates on [our blog](https://medium.com/nvidia-merlin) and on the [NVIDIA Developer Technical Blog](https://developer.nvidia.com/blog?r=1&tags=&categories=recommendation-systems).

Some highlights:

44 changes: 7 additions & 37 deletions docs/source/toc.yaml
@@ -8,43 +8,13 @@ subtrees:
- file: training/index.rst
- file: examples/index.md
title: Example Notebooks
subtrees:
- entries:
- file: examples/getting-started-movielens/index.md
title: Getting Started with MovieLens
entries:
- file: examples/getting-started-movielens/01-Download-Convert.ipynb
title: Download and Convert
- file: examples/getting-started-movielens/02-ETL-with-NVTabular.ipynb
title: ETL with NVTabular
- file: examples/getting-started-movielens/03-Training-with-HugeCTR.ipynb
title: Train with HugeCTR
- file: examples/getting-started-movielens/03-Training-with-TF.ipynb
title: Train with TensorFlow
- file: examples/getting-started-movielens/03-Training-with-PyTorch.ipynb
title: Train with PyTorch
- file: examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
title: Serve a HugeCTR Model
- file: examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb
title: Serve a TensorFlow Model
- file: examples/scaling-criteo/index.md
entries:
- file: examples/scaling-criteo/01-Download-Convert.ipynb
title: Download and Convert
- file: examples/scaling-criteo/02-ETL-with-NVTabular.ipynb
title: ETL with NVTabular
- file: examples/scaling-criteo/03-Training-with-HugeCTR.ipynb
title: Train with HugeCTR
- file: examples/scaling-criteo/03-Training-with-TF.ipynb
title: Train with TensorFlow
- file: examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
title: Serve a HugeCTR Model
- file: examples/scaling-criteo/04-Triton-Inference-with-TF.ipynb
title: Serve a TensorFlow Model
- file: examples/multi-gpu-movielens/index.md
entries:
- file: examples/multi-gpu-movielens/01-03-MultiGPU-Download-Convert-ETL-with-NVTabular-Training-with-TensorFlow.ipynb
- file: examples/multi-gpu-toy-example/multi-gpu_dask.ipynb
entries:
- file: examples/01-Getting-started.ipynb
title: Getting Started with NVTabular
- file: examples/02-Advanced-NVTabular-workflow.ipynb
title: Advanced NVTabular Workflow
- file: examples/03-Running-on-multiple-GPUs-or-on-CPU.ipynb
title: Run on multi-GPU or CPU-only
- file: api
title: API Documentation
- file: resources/index
13 changes: 6 additions & 7 deletions docs/source/training/hugectr.rst
@@ -2,18 +2,18 @@ Accelerated Training with HugeCTR
=================================

A real-world production model serves hundreds of millions of users
and contains embedding tables that are 100GB to 1TB in size. Training deep
learning recommender system models with such large embedding tables can be challenging
as they do not fit into the memory of a single GPU.

To combat that challenge, we’ve developed HugeCTR, which is an open-source deep learning framework that is a highly optimized library
To combat that challenge, we developed HugeCTR, which is an open-source deep learning framework that is a highly optimized library
written in CUDA C++, specifically for recommender systems. It supports
an optimized dataloader and is able to scale embedding tables using
multiple GPUs and nodes. As a result, there’s no embedding table size
multiple GPUs and nodes. As a result, there is no embedding table size
limitation. HugeCTR also offers the following:

- Model oversubscription for training embedding tables with
single nodes that don't fit within the GPU or CPU memory (only
required embeddings are prefetched from a parameter server per
batch).
- Asynchronous and multithreaded data pipelines.
@@ -126,6 +126,5 @@ When training is accelerated with HugeCTR, the following happens:
metrics = sess.evaluation()
print("[HUGECTR][INFO] iter: {}, {}".format(i, metrics))
Additional examples can be found `here`_.
.. _here: https://github.com/NVIDIA/NVTabular/tree/main/examples/hugectr
For more information, refer to the `HugeCTR documentation <https://nvidia-merlin.github.io/HugeCTR/main/hugectr_user_guide.html>`_
or the `HugeCTR repository <https://github.com/NVIDIA-Merlin/HugeCTR>`_ on GitHub.
22 changes: 10 additions & 12 deletions docs/source/training/pytorch.rst
@@ -9,7 +9,7 @@ PyTorch. The NVTabular dataloader is capable of:

- removing bottlenecks from dataloading by processing large chunks of
data at a time instead of item by item
- processing datasets that don't fit within the GPU or CPU memory by
streaming from the disk
- reading data directly into the GPU memory and removing CPU-GPU
communication
@@ -42,9 +42,9 @@ happens:
TRAIN_PATHS = glob.glob("./train/*.parquet")
train_dataset = TorchAsyncItr(
    nvt.Dataset(TRAIN_PATHS),
    cats=CATEGORICAL_COLUMNS,
    conts=CONTINUOUS_COLUMNS,
    labels=LABEL_COLUMNS,
    batch_size=BATCH_SIZE
)
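For context, a hedged sketch of the imports and definitions that the snippet above relies on (the column names and batch size are illustrative, not taken from the original example):

.. code:: python

   import glob
   import nvtabular as nvt
   from nvtabular.loader.torch import TorchAsyncItr, DLDataLoader

   CATEGORICAL_COLUMNS = ["userId", "movieId"]
   CONTINUOUS_COLUMNS = []
   LABEL_COLUMNS = ["rating"]
   BATCH_SIZE = 65536

   # A pass-through collate function is typically sufficient because
   # TorchAsyncItr already yields fully formed batches.
   def collate_fn(batch):
       return batch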
@@ -54,10 +54,10 @@ happens:
.. code:: python
train_loader = DLDataLoader(
    train_dataset,
    batch_size=None,
    collate_fn=collate_fn,
    pin_memory=False,
    num_workers=0
)
@@ -79,8 +79,6 @@ happens:
5. The ``TorchAsyncItr`` dataloader can be initialized for the
validation dataset using the same structure.

You can find additional examples in our repository such as `MovieLens`_
and `Criteo`_.
You can find additional `examples`_ in our repository.

.. _MovieLens: ../examples/getting-started-movielens/
.. _Criteo: ../examples/scaling-criteo/
.. _examples: ../examples/
7 changes: 4 additions & 3 deletions docs/source/training/tensorflow.rst
Expand Up @@ -100,7 +100,7 @@ following happens:
dataloader.

.. code:: python
history = model.fit(train_dataset_tf, epochs=5)
**Note**: If using the NVTabular dataloader for the validation dataset,
@@ -112,5 +112,6 @@ a callback can be used for it.
validation_callback = KerasSequenceValidater(valid_dataset_tf)
history = model.fit(train_dataset_tf, callbacks=[validation_callback], epochs=5)
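A hedged, self-contained sketch of constructing the two dataloaders used above (the paths and column names are illustrative; ``model`` is assumed to be the Keras model built earlier on this page):

.. code:: python

   import glob
   import nvtabular as nvt
   from nvtabular.loader.tensorflow import KerasSequenceLoader, KerasSequenceValidater

   TRAIN_PATHS = glob.glob("./train/*.parquet")
   VALID_PATHS = glob.glob("./valid/*.parquet")

   train_dataset_tf = KerasSequenceLoader(
       nvt.Dataset(TRAIN_PATHS),
       batch_size=65536,
       label_names=["rating"],
       cat_names=["userId", "movieId"],
       cont_names=[],
       shuffle=True,
   )
   valid_dataset_tf = KerasSequenceLoader(
       nvt.Dataset(VALID_PATHS),
       batch_size=65536,
       label_names=["rating"],
       cat_names=["userId", "movieId"],
       cont_names=[],
       shuffle=False,
   )

   validation_callback = KerasSequenceValidater(valid_dataset_tf)
   history = model.fit(train_dataset_tf, callbacks=[validation_callback], epochs=5)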
You can find additional examples in our repository such as
`MovieLens <../examples/getting-started-movielens/>`__.
You can find additional `examples`_ in our repository.

.. _examples: ../examples/