Migrate the legacy examples to the Merlin repo (#1711)
* Migrate the legacy examples to the Merlin repo

We may (or may not) want to keep these examples, but they've overstayed their welcome in the NVTabular repo, which has accumulated a lot of historical cruft. Since some of these examples use inference code that's moving to Systems, it makes more sense for them to live in the Merlin repo (if we want to keep them).

* update READMEs

* docs: Contribute to examples clean up

- Fix difficult-to-detect broken links.
- Revise TOC.

* Handle data loader as an iterator (#1720)

* Update test_gpu_dl_break to handle data loader as an iterator

* Use peek method to look at first batch in notebooks

* Revert whitespace change to image cell

* Revert change to PyTorch training example notebook

* Call peek on data iter to get batch
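A minimal sketch of the iterator/peek pattern these data loader changes describe (the import paths, dataset path, batch size, and batch layout below are assumptions for illustration, not code from this commit):

```python
# Hypothetical example: treat the Merlin data loader as an iterator and use
# peek() to inspect the first batch without consuming it.
from merlin.io import Dataset
from merlin.dataloader.torch import Loader  # assumed import path

loader = Loader(Dataset("train/*.parquet"), batch_size=1024)

features, labels = loader.peek()   # look at the first batch without advancing
print(features)

for features, labels in loader:    # then consume the loader as an iterator
    ...                            # training step goes here
```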

* Describe how to check for broken links (#1719)

This is one way to check for broken links,
but I'm happy to adopt something that is
better.
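One possible way to automate such a check (an illustrative sketch, not necessarily the approach documented in #1719) is to scan the docs for URLs and report any that fail to resolve:

```python
# Hypothetical link checker: find http(s) URLs in Markdown files and flag
# responses with status >= 400 (or no response at all).
import glob
import re
import requests

URL_RE = re.compile(r'https?://[^\s)">]+')

for path in glob.glob("**/*.md", recursive=True):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for url in sorted(set(URL_RE.findall(text))):
        try:
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            status = None
        if status is None or status >= 400:
            print(f"{path}: {url} -> {status}")
```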

Co-authored-by: Karl Higley <[email protected]>

Co-authored-by: Benedikt Schifferer <[email protected]>
Co-authored-by: Mike McKiernan <[email protected]>
Co-authored-by: Oliver Holworthy <[email protected]>
4 people authored Dec 6, 2022
1 parent 51af616 commit 0f3a9b8
Showing 56 changed files with 116 additions and 14,271 deletions.
15 changes: 9 additions & 6 deletions README.md
@@ -78,13 +78,16 @@ To use these Docker containers, you'll first need to install the [NVIDIA Contain

### Notebook Examples and Tutorials

We provide a [collection of examples, use cases, and tutorials](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) as Jupyter notebooks covering:

* Feature engineering and preprocessing with NVTabular
We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) as Jupyter notebooks that demonstrate feature engineering with NVTabular:
* Introduction to NVTabular's High-Level API
* Advanced workflows with NVTabular
* Scaling to multi-GPU and multi-node systems
* Integrating NVTabular with HugeCTR
* Deploying to inference with Triton
* NVTabular on CPU
* Scaling NVTabular to multi-GPU systems
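As a quick illustration of the basic API that these notebooks cover, a minimal (hypothetical) workflow looks like this; the column names and paths are made up for the sketch:

```python
import nvtabular as nvt

# Define a feature-engineering graph: categorify ID columns, normalize a numeric column.
cats = ["user_id", "item_id"] >> nvt.ops.Categorify()
conts = ["price"] >> nvt.ops.Normalize()
workflow = nvt.Workflow(cats + conts)

dataset = nvt.Dataset("train/*.parquet")
workflow.fit(dataset)                                   # compute categories, means, stds
workflow.transform(dataset).to_parquet("train_processed/")
```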

In addition, NVTabular is used in many of our examples in other Merlin libraries:
- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples)
- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/main/examples)
- [Training Examples with Transformers4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples)

### Feedback and Support

2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -87,7 +87,7 @@
#
html_theme = "sphinx_rtd_theme"
html_theme_options = {
"navigation_depth": 3,
"navigation_depth": 2,
"analytics_id": "G-NVJ1Y1YJHK",
}
html_copy_source = False
2 changes: 1 addition & 1 deletion docs/source/core_features.md
@@ -37,7 +37,7 @@ workflow = nvt.Workflow(..., client=client)

Currently, there are many ways to deploy a "cluster" for Dask. This [article](https://blog.dask.org/2020/07/23/current-state-of-distributed-dask-clusters) gives a summary of all the practical options. For a single machine with multiple GPUs, the `dask_cuda.LocalCUDACluster` API is typically the most convenient option.

Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/dask-cudf.html) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/main/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.
Since NVTabular already uses [Dask-CuDF](https://docs.rapids.ai/api/cudf/stable/) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/main/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.
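For reference, a minimal single-machine multi-GPU sketch of the pattern described above (the paths, column names, and partition size are illustrative):

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import nvtabular as nvt

cluster = LocalCUDACluster()        # one Dask-CUDA worker per visible GPU
client = Client(cluster)

# Attach the client to the workflow, as in the snippet above.
features = ["user_id", "item_id"] >> nvt.ops.Categorify()
workflow = nvt.Workflow(features, client=client)

dataset = nvt.Dataset("data/*.parquet", part_size="256MB")
workflow.fit(dataset)
workflow.transform(dataset).to_parquet("processed/")
```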

## Multi-Node Support ##

23 changes: 15 additions & 8 deletions docs/source/resources/cloud_integration.md
@@ -59,8 +59,9 @@ To run NVTabular on the cloud using GCP, do the following:
* **Boot Disk**: Ubuntu version 18.04
* **Storage**: Local 8xSSD-NVMe

2. [Install the appropriate NVIDIA drivers and CUDA](https://cloud.google.com/compute/docs/gpus/install-drivers-gpu#ubuntu-driver-steps) by running the following commands:
```
2. Install the NVIDIA drivers and CUDA by running the following commands:

```shell
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
@@ -70,8 +71,12 @@ To run NVTabular on the cloud using GCP, do the following:
nvidia-smi # Check installation
```

> For more information, refer to [Install GPU drivers](https://cloud.google.com/compute/docs/gpus/install-drivers-gpu)
> in the Google Cloud documentation.
3. [Install Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) by running the following commands:
```

```shell
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia-merlin.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia-merlin.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
@@ -82,7 +87,8 @@ To run NVTabular on the cloud using GCP, do the following:
```

4. Configure the storage as RAID 0 by running the following commands:
```

```shell
sudo mdadm --create --verbose /dev/md0 --level=0 --name=MY_RAID --raid-devices=2 /dev/nvme0n1 /dev/nvme0n2
sudo mkfs.ext4 -L MY_RAID /dev/md0
sudo mkdir -p /mnt/raid
@@ -94,7 +100,8 @@ To run NVTabular on the cloud using GCP, do the following:
```

5. Run the container by running the following command:
```

```shell
docker run --gpus all --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE -v /mnt/raid:/raid nvcr.io/nvidia/nvtabular:0.3 /bin/bash
```

@@ -179,12 +186,12 @@ conda activate nvtabular
8. Install additional packages, such as TensorFlow or PyTorch

```
pip install tensorflow-gpu
pip install torch
pip install graphviz
```

9. Install Transformers4Rec, torchmetrics, and ipykernel

```
conda install -y -c nvidia -c rapidsai -c numba -c conda-forge transformers4rec
@@ -197,6 +204,6 @@ conda install -y torchmetrics ipykernel
python -m ipykernel install --user --name=nvtabular
```

11. You can switch to the `nvtabular` kernel in JupyterLab and run the [movielens example](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples/getting-started-movielens).

This workflow enables NVTabular ETL and training with TensorFlow or PyTorch. Deployment with Triton Inference Server will follow soon.
2 changes: 1 addition & 1 deletion docs/source/resources/links.md
Expand Up @@ -16,7 +16,7 @@ Talks
Blog posts
----------

We frequently post updates on [our blog](https://medium.com/nvidia-merlin) and on the [NVIDIA Developer News](https://news.developer.nvidia.com/tag/recommendation-systems/).
We frequently post updates on [our blog](https://medium.com/nvidia-merlin) and on the [NVIDIA Developer Technical Blog](https://developer.nvidia.com/blog?r=1&tags=&categories=recommendation-systems).

Some highlights:

44 changes: 7 additions & 37 deletions docs/source/toc.yaml
@@ -8,43 +8,13 @@ subtrees:
- file: training/index.rst
- file: examples/index.md
title: Example Notebooks
subtrees:
- entries:
- file: examples/getting-started-movielens/index.md
title: Getting Started with MovieLens
entries:
- file: examples/getting-started-movielens/01-Download-Convert.ipynb
title: Download and Convert
- file: examples/getting-started-movielens/02-ETL-with-NVTabular.ipynb
title: ETL with NVTabular
- file: examples/getting-started-movielens/03-Training-with-HugeCTR.ipynb
title: Train with HugeCTR
- file: examples/getting-started-movielens/03-Training-with-TF.ipynb
title: Train with TensorFlow
- file: examples/getting-started-movielens/03-Training-with-PyTorch.ipynb
title: Train with PyTorch
- file: examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
title: Serve a HugeCTR Model
- file: examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb
title: Serve a TensorFlow Model
- file: examples/scaling-criteo/index.md
entries:
- file: examples/scaling-criteo/01-Download-Convert.ipynb
title: Download and Convert
- file: examples/scaling-criteo/02-ETL-with-NVTabular.ipynb
title: ETL with NVTabular
- file: examples/scaling-criteo/03-Training-with-HugeCTR.ipynb
title: Train with HugeCTR
- file: examples/scaling-criteo/03-Training-with-TF.ipynb
title: Train with TensorFlow
- file: examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
title: Serve a HugeCTR Model
- file: examples/scaling-criteo/04-Triton-Inference-with-TF.ipynb
title: Serve a TensorFlow Model
- file: examples/multi-gpu-movielens/index.md
entries:
- file: examples/multi-gpu-movielens/01-03-MultiGPU-Download-Convert-ETL-with-NVTabular-Training-with-TensorFlow.ipynb
- file: examples/multi-gpu-toy-example/multi-gpu_dask.ipynb
entries:
- file: examples/01-Getting-started.ipynb
title: Getting Started with NVTabular
- file: examples/02-Advanced-NVTabular-workflow.ipynb
title: Advanced NVTabular Workflow
- file: examples/03-Running-on-multiple-GPUs-or-on-CPU.ipynb
title: Run on multi-GPU or CPU-only
- file: api
title: API Documentation
- file: resources/index
13 changes: 6 additions & 7 deletions docs/source/training/hugectr.rst
@@ -2,18 +2,18 @@ Accelerated Training with HugeCTR
=================================

A real-world production model serves hundreds of millions of users
and contains embedding tables that are 100GB to 1TB in size. Training deep
learning recommender system models with such large embedding tables can be challenging
as they do not fit into the memory of a single GPU.

To combat that challenge, we’ve developed HugeCTR, which is an open-source deep learning framework that is a highly optimized library
To combat that challenge, we developed HugeCTR, which is an open-source deep learning framework that is a highly optimized library
written in CUDA C++, specifically for recommender systems. It supports
an optimized dataloader and is able to scale embedding tables using
multiple GPUs and nodes. As a result, there’s no embedding table size
multiple GPUs and nodes. As a result, there is no embedding table size
limitation. HugeCTR also offers the following:

- Model oversubscription for training embedding tables with
single nodes that don't fit within the GPU or CPU memory (only
required embeddings are prefetched from a parameter server per
batch).
- Asynchronous and multithreaded data pipelines.
@@ -126,6 +126,5 @@ When training is accelerated with HugeCTR, the following happens:
metrics = sess.evaluation()
print("[HUGECTR][INFO] iter: {}, {}".format(i, metrics))
Additional examples can be found `here`_.
.. _here: https://github.com/NVIDIA/NVTabular/tree/main/examples/hugectr
For more information, refer to the `HugeCTR documentation <https://nvidia-merlin.github.io/HugeCTR/main/hugectr_user_guide.html>`_
or the `HugeCTR repository <https://github.com/NVIDIA-Merlin/HugeCTR>`_ on GitHub.
22 changes: 10 additions & 12 deletions docs/source/training/pytorch.rst
@@ -9,7 +9,7 @@ PyTorch. The NVTabular dataloader is capable of:

- removing bottlenecks from dataloading by processing large chunks of
data at a time instead of item by item
- processing datasets that don't fit within the GPU or CPU memory by
streaming from the disk
- reading data directly into the GPU memory and removing CPU-GPU
communication
@@ -42,9 +42,9 @@ happens:
TRAIN_PATHS = glob.glob("./train/*.parquet")
train_dataset = TorchAsyncItr(
    nvt.Dataset(TRAIN_PATHS),
    cats=CATEGORICAL_COLUMNS,
    conts=CONTINUOUS_COLUMNS,
    labels=LABEL_COLUMNS,
    batch_size=BATCH_SIZE
)
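For context, a hedged sketch of the imports and definitions that the snippet above relies on (the column names and batch size are illustrative, not taken from the original example):

.. code:: python

   import glob
   import nvtabular as nvt
   from nvtabular.loader.torch import TorchAsyncItr, DLDataLoader

   CATEGORICAL_COLUMNS = ["userId", "movieId"]
   CONTINUOUS_COLUMNS = []
   LABEL_COLUMNS = ["rating"]
   BATCH_SIZE = 65536

   # A pass-through collate function is typically sufficient because
   # TorchAsyncItr already yields fully formed batches.
   def collate_fn(batch):
       return batch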
@@ -54,10 +54,10 @@ happens:
.. code:: python
train_loader = DLDataLoader(
    train_dataset,
    batch_size=None,
    collate_fn=collate_fn,
    pin_memory=False,
    num_workers=0
)
@@ -79,8 +79,6 @@ happens:
5. The ``TorchAsyncItr`` dataloader can be initialized for the
validation dataset using the same structure.

You can find additional examples in our repository such as `MovieLens`_
and `Criteo`_.
You can find additional `examples`_ in our repository.

.. _MovieLens: ../examples/getting-started-movielens/
.. _Criteo: ../examples/scaling-criteo/
.. _examples: ../examples/
7 changes: 4 additions & 3 deletions docs/source/training/tensorflow.rst
Expand Up @@ -100,7 +100,7 @@ following happens:
dataloader.

.. code:: python
history = model.fit(train_dataset_tf, epochs=5)
**Note**: If using the NVTabular dataloader for the validation dataset,
@@ -112,5 +112,6 @@ a callback can be used for it.
validation_callback = KerasSequenceValidater(valid_dataset_tf)
history = model.fit(train_dataset_tf, callbacks=[validation_callback], epochs=5)
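A hedged, self-contained sketch of constructing the two dataloaders used above (the paths and column names are illustrative; ``model`` is assumed to be the Keras model built earlier on this page):

.. code:: python

   import glob
   import nvtabular as nvt
   from nvtabular.loader.tensorflow import KerasSequenceLoader, KerasSequenceValidater

   TRAIN_PATHS = glob.glob("./train/*.parquet")
   VALID_PATHS = glob.glob("./valid/*.parquet")

   train_dataset_tf = KerasSequenceLoader(
       nvt.Dataset(TRAIN_PATHS),
       batch_size=65536,
       label_names=["rating"],
       cat_names=["userId", "movieId"],
       cont_names=[],
       shuffle=True,
   )
   valid_dataset_tf = KerasSequenceLoader(
       nvt.Dataset(VALID_PATHS),
       batch_size=65536,
       label_names=["rating"],
       cat_names=["userId", "movieId"],
       cont_names=[],
       shuffle=False,
   )

   validation_callback = KerasSequenceValidater(valid_dataset_tf)
   history = model.fit(train_dataset_tf, callbacks=[validation_callback], epochs=5)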
You can find additional examples in our repository such as
`MovieLens <../examples/getting-started-movielens/>`__.
You can find additional `examples`_ in our repository.

.. _examples: ../examples/