Migrate the legacy examples to the Merlin repo #1711

Merged on Dec 6, 2022 (10 commits)
15 changes: 9 additions & 6 deletions README.md
@@ -78,13 +78,16 @@ To use these Docker containers, you'll first need to install the [NVIDIA Contain

### Notebook Examples and Tutorials

We provide a [collection of examples, use cases, and tutorials](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) as Jupyter notebooks covering:

* Feature engineering and preprocessing with NVTabular
We provide a [collection of examples](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) to demonstrate feature engineering with NVTabular as Jupyter notebooks:
* Introduction to NVTabular's High-Level API
* Advanced workflows with NVTabular
* Scaling to multi-GPU and multi-node systems
* Integrating NVTabular with HugeCTR
* Deploying to inference with Triton
* NVTabular on CPU
* Scaling NVTabular to multi-GPU systems

In addition, NVTabular is used in many of our examples in other Merlin libraries:
- [End-To-End Examples with Merlin](https://github.com/NVIDIA-Merlin/Merlin/tree/main/examples)
- [Training Examples with Merlin Models](https://github.com/NVIDIA-Merlin/models/tree/main/examples)
- [Training Examples with Transformers4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples)

### Feedback and Support

52 changes: 37 additions & 15 deletions docs/README.md
@@ -9,24 +9,12 @@ Follow the instructions below to build the docs.

## Steps to follow:

1. To build the docs, you need to install a developer environment:
1. To build the docs, you need to install a developer environment and run `tox`:

```shell
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt
```

> If you add or change dependencies, review the `ci/build_and_test.sh` file
> and make a similar change to the `pip install` stanzas.

Alternatively, you might be able to use a Conda environment. See the [installation instructions](https://github.com/NVIDIA/NVTabular).

1. Build the documentation:

```shell
make -C docs clean html
tox -e docs
```

This runs Sphinx in your shell and outputs to `docs/build/html/`.
@@ -43,6 +31,40 @@ Follow the instructions below to build the docs.

Check that your docs edits are formatted correctly and read well.

## Checking for broken links

1. Build the documentation, as described in the preceding section, but use the following command:

```shell
tox -e docs -- linkcheck
```

1. Run the link-checking script:

```shell
./docs/check_for_broken_links.sh
```

If there are no broken links, then the script exits with `0`.

If the script produces any output, cut and paste the `uri` value into your browser to confirm
that the link is broken.

```json
{
  "filename": "hugectr_core_features.md",
  "lineno": 88,
  "status": "broken",
  "code": 0,
  "uri": "https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/build-hadoop.sh",
  "info": "404 Client Error: Not Found for url: https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/build-hadoop.sh"
}
```

If the link is OK (this is often the case for URLs that point to headings in GitHub repository files),
copy the JSON entry from the output and add it to `docs/false_positives.json`.
Run the script again to confirm that the URL is no longer reported as a broken link.
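The filtering that `check_for_broken_links.sh` performs with `jq` can also be expressed in plain Python, which may help when debugging why a URL is or is not reported. This is a minimal sketch, not part of this PR, and the function names are hypothetical. It assumes both files are streams of concatenated JSON objects: Sphinx writes one object per line to `build/linkcheck/output.json`, and `false_positives.json` holds pretty-printed objects back to back.

```python
import json
from pathlib import Path


def iter_json_objects(text):
    """Yield each JSON value from a stream of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def find_unexpected_broken_links(linkcheck_text, false_positives_text):
    """Return broken entries whose URI is not listed as a known false positive."""
    known_uris = {entry["uri"] for entry in iter_json_objects(false_positives_text)}
    return [
        entry
        for entry in iter_json_objects(linkcheck_text)
        if entry.get("status") == "broken" and entry.get("uri") not in known_uris
    ]


def main(docs_dir="docs"):
    """Print unexpected broken links and return how many were found."""
    docs = Path(docs_dir)
    linkcheck = (docs / "build" / "linkcheck" / "output.json").read_text()
    false_positives = (docs / "false_positives.json").read_text()
    unexpected = find_unexpected_broken_links(linkcheck, false_positives)
    for entry in unexpected:
        print(json.dumps(entry, indent=2))
    # As with the shell script, zero means no unexpected broken links.
    return len(unexpected)
```

Calling `main()` from the repository root after `tox -e docs -- linkcheck` prints each broken link that is not yet listed in `docs/false_positives.json`.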

## Decisions

### Source management: README and index files
@@ -65,7 +87,7 @@ Follow the instructions below to build the docs.
* Add the file to the `docs/source/toc.yaml` file. Keep in mind that notebooks are
copied into the `docs/source/` directory, so the paths are relative to that location.
Follow the pattern that is already established and you'll be fine.
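After editing `docs/source/toc.yaml`, a quick sanity check is to confirm that every `file:` entry resolves to a real file under `docs/source/`. The sketch below is hypothetical (no such helper exists in this PR). It scans lines with a regular expression instead of a YAML parser, so it only checks `file:` keys, and it assumes an entry may omit its extension, as `api` and `resources/index` do in `toc.yaml`.

```python
import re
from pathlib import Path

# Matches lines such as "- file: examples/index.md" in toc.yaml.
FILE_ENTRY = re.compile(r"^\s*-?\s*file:\s*(\S+)\s*$")
# Extensions tried when an entry omits one (an assumption about the docs tree).
CANDIDATE_SUFFIXES = ("", ".md", ".rst", ".ipynb")


def toc_files(toc_text):
    """Return every path listed after a 'file:' key, in document order."""
    entries = []
    for line in toc_text.splitlines():
        match = FILE_ENTRY.match(line)
        if match:
            entries.append(match.group(1))
    return entries


def missing_toc_files(toc_text, source_dir):
    """List toc entries that do not resolve to a file under source_dir."""
    source = Path(source_dir)
    missing = []
    for entry in toc_files(toc_text):
        if not any((source / f"{entry}{suffix}").is_file() for suffix in CANDIDATE_SUFFIXES):
            missing.append(entry)
    return missing
```

For example, `missing_toc_files(Path("docs/source/toc.yaml").read_text(), "docs/source")` returns the entries to fix, and an empty list means every path resolved. Because notebooks are copied into `docs/source/`, run the check after they have been copied, or notebook entries will be reported as missing.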

### Adding links

TIP: When adding a link to a method or any heading that has underscores in it, repeat
50 changes: 50 additions & 0 deletions docs/check_for_broken_links.sh
@@ -0,0 +1,50 @@
#!/usr/bin/env bash

DOCS_DIR=$(dirname "${BASH_SOURCE[0]}")
FALSE_POSITIVES_JSON="${DOCS_DIR}/false_positives.json"
LINKCHECK_JSON="${DOCS_DIR}/build/linkcheck/output.json"

function check_environment {
  local err=0
  if ! [ -x "$(command -v jq)" ]; then
    >&2 echo "jq is required but is not found."
    ((err++))
  fi
  if [ ! -f "${FALSE_POSITIVES_JSON}" ]; then
    >&2 echo "A JSON file with false positives is required: ${FALSE_POSITIVES_JSON}"
    ((err++))
  fi
  if [ ! -f "${LINKCHECK_JSON}" ]; then
    >&2 echo "Did not find linkcheck output JSON file: ${LINKCHECK_JSON}."
    >&2 echo "Run Sphinx with the linkcheck builder: tox -e docs -- linkcheck"
    ((err++))
  fi
  if [ "${err}" -gt 0 ]; then
    exit 2
  fi
}

function check_links {
  local err=0
  # Slurp the one-object-per-line output into a single array and keep only broken entries.
  local broken; broken=$(jq -s 'map(select(.status == "broken"))' "${LINKCHECK_JSON}")
  local count; count=$(echo "${broken}" | jq 'length')
  for i in $(seq 0 $((count - 1)))
  do
    local entry; entry=$(echo "${broken}" | jq ".[${i}]")
    local link; link=$(echo "${entry}" | jq -r '.uri')
    [ -n "${DEBUG}" ] && {
      echo >&2 "Checking for false positive: ${link}"
    }
    local resp; resp=$(jq --arg check "${link}" -s 'any(.uri == $check)' < "${FALSE_POSITIVES_JSON}")
    # "false" indicates that the URL did not match any of the URIs in the false positive file.
    if [ "false" = "${resp}" ]; then
      ((err++))
      echo "${entry}"
    fi
  done
  exit "${err}"
}

check_environment
check_links
32 changes: 32 additions & 0 deletions docs/false_positives.json
@@ -0,0 +1,32 @@
{
  "filename": "index.rst",
  "lineno": 7,
  "status": "broken",
  "code": 0,
  "uri": "Introduction.html",
  "info": ""
}
{
  "filename": "examples/index.md",
  "lineno": 20,
  "status": "broken",
  "code": 0,
  "uri": "https://github.com/NVIDIA/NVTabular#installation",
  "info": "Anchor 'installation' not found"
}
{
  "filename": "resources/troubleshooting.md",
  "lineno": 24,
  "status": "broken",
  "code": 0,
  "uri": "https://github.com/rapidsai/cudf/pull/6796#issue-522934284",
  "info": "Anchor 'issue-522934284' not found"
}
{
  "filename": "resources/links.md",
  "lineno": 24,
  "status": "broken",
  "code": 0,
  "uri": "https://news.developer.nvidia.com/democratizing-deep-learning-recommenders-resources/?ncid=so-link-59588#cid=dl19_so-link_en-us",
  "info": "Anchor 'cid=dl19_so-link_en-us' not found"
}
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -87,7 +87,7 @@
#
html_theme = "sphinx_rtd_theme"
html_theme_options = {
    "navigation_depth": 3,
    "navigation_depth": 2,
    "analytics_id": "G-NVJ1Y1YJHK",
}
html_copy_source = False
2 changes: 1 addition & 1 deletion docs/source/core_features.md
@@ -37,7 +37,7 @@ workflow = nvt.Workflow(..., client=client)

Currently, there are many ways to deploy a "cluster" for Dask. This [article](https://blog.dask.org/2020/07/23/current-state-of-distributed-dask-clusters) gives a summary of all the practical options. For a single machine with multiple GPUs, the `dask_cuda.LocalCUDACluster` API is typically the most convenient option.

Since NVTabular already uses [Dask-cuDF](https://docs.rapids.ai/api/cudf/stable/dask-cudf.html) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/main/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.
Since NVTabular already uses [Dask-cuDF](https://docs.rapids.ai/api/cudf/stable/) for internal data processing, there are no other requirements for multi-GPU scaling. With that said, the parallel performance can depend strongly on (1) the size of `Dataset` partitions, (2) the shuffling procedure used for data output, and (3) the specific arguments used for both global-statistics and transformation operations. For additional information, see [Multi-GPU](https://github.com/NVIDIA/NVTabular/blob/main/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb) for a simple step-by-step example.

## Multi-Node Support ##

23 changes: 15 additions & 8 deletions docs/source/resources/cloud_integration.md
@@ -59,8 +59,9 @@ To run NVTabular on the cloud using GCP, do the following:
* **Boot Disk**: Ubuntu version 18.04
* **Storage**: Local 8xSSD-NVMe

2. [Install the appropriate NVIDIA drivers and CUDA](https://cloud.google.com/compute/docs/gpus/install-drivers-gpu#ubuntu-driver-steps) by running the following commands:
```
2. Install the NVIDIA drivers and CUDA by running the following commands:

```shell
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
@@ -70,8 +71,12 @@ To run NVTabular on the cloud using GCP, do the following:
nvidia-smi # Check installation
```

> For more information, refer to [Install GPU drivers](https://cloud.google.com/compute/docs/gpus/install-drivers-gpu)
> in the Google Cloud documentation.
3. [Install Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) by running the following commands:
```

```shell
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
@@ -82,7 +87,8 @@ To run NVTabular on the cloud using GCP, do the following:
```

4. Configure the storage as RAID 0 by running the following commands:
```

```shell
sudo mdadm --create --verbose /dev/md0 --level=0 --name=MY_RAID --raid-devices=2 /dev/nvme0n1 /dev/nvme0n2
sudo mkfs.ext4 -L MY_RAID /dev/md0
sudo mkdir -p /mnt/raid
@@ -94,7 +100,8 @@ To run NVTabular on the cloud using GCP, do the following:
```

5. Start the container with the following command:
```

```shell
docker run --gpus all --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE -v /mnt/raid:/raid nvcr.io/nvidia/nvtabular:0.3 /bin/bash
```

@@ -179,12 +186,12 @@ conda activate nvtabular
8. Install additional packages, such as TensorFlow or PyTorch:

```
pip install tensorflow-gpu
pip install torch
pip install graphviz
```

9. Install Transformers4Rec, torchmetrics, and ipykernel:

```
conda install -y -c nvidia -c rapidsai -c numba -c conda-forge transformers4rec
@@ -197,6 +204,6 @@ conda install -y torchmetrics ipykernel
python -m ipykernel install --user --name=nvtabular
```

11. In JupyterLab, switch to the `nvtabular` kernel and run the [MovieLens example](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples/getting-started-movielens).

This workflow enables NVTabular ETL and training with TensorFlow or PyTorch. Deployment with Triton Inference Server will follow soon.
2 changes: 1 addition & 1 deletion docs/source/resources/links.md
@@ -16,7 +16,7 @@ Talks
Blog posts
----------

We frequently post updates on [our blog](https://medium.com/nvidia-merlin) and on the [NVIDIA Developer News](https://news.developer.nvidia.com/tag/recommendation-systems/).
We frequently post updates on [our blog](https://medium.com/nvidia-merlin) and on the [NVIDIA Developer Technical Blog](https://developer.nvidia.com/blog?r=1&tags=&categories=recommendation-systems).

Some highlights:

44 changes: 7 additions & 37 deletions docs/source/toc.yaml
@@ -8,43 +8,13 @@ subtrees:
- file: training/index.rst
- file: examples/index.md
title: Example Notebooks
subtrees:
- entries:
- file: examples/getting-started-movielens/index.md
title: Getting Started with MovieLens
entries:
- file: examples/getting-started-movielens/01-Download-Convert.ipynb
title: Download and Convert
- file: examples/getting-started-movielens/02-ETL-with-NVTabular.ipynb
title: ETL with NVTabular
- file: examples/getting-started-movielens/03-Training-with-HugeCTR.ipynb
title: Train with HugeCTR
- file: examples/getting-started-movielens/03-Training-with-TF.ipynb
title: Train with TensorFlow
- file: examples/getting-started-movielens/03-Training-with-PyTorch.ipynb
title: Train with PyTorch
- file: examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
title: Serve a HugeCTR Model
- file: examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb
title: Serve a TensorFlow Model
- file: examples/scaling-criteo/index.md
entries:
- file: examples/scaling-criteo/01-Download-Convert.ipynb
title: Download and Convert
- file: examples/scaling-criteo/02-ETL-with-NVTabular.ipynb
title: ETL with NVTabular
- file: examples/scaling-criteo/03-Training-with-HugeCTR.ipynb
title: Train with HugeCTR
- file: examples/scaling-criteo/03-Training-with-TF.ipynb
title: Train with TensorFlow
- file: examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
title: Serve a HugeCTR Model
- file: examples/scaling-criteo/04-Triton-Inference-with-TF.ipynb
title: Serve a TensorFlow Model
- file: examples/multi-gpu-movielens/index.md
entries:
- file: examples/multi-gpu-movielens/01-03-MultiGPU-Download-Convert-ETL-with-NVTabular-Training-with-TensorFlow.ipynb
- file: examples/multi-gpu-toy-example/multi-gpu_dask.ipynb
entries:
- file: examples/01-Getting-started.ipynb
title: Getting Started with NVTabular
- file: examples/02-Advanced-NVTabular-workflow.ipynb
title: Advanced NVTabular Workflow
- file: examples/03-Running-on-multiple-GPUs-or-on-CPU.ipynb
title: Run on multi-GPU or CPU-only
- file: api
title: API Documentation
- file: resources/index
13 changes: 6 additions & 7 deletions docs/source/training/hugectr.rst
@@ -2,18 +2,18 @@ Accelerated Training with HugeCTR
=================================

A real-world production model serves hundreds of millions of users and contains
embedding tables that range from 100 GB to 1 TB in size. Training deep
learning recommender system models with such large embedding tables can be challenging
as they do not fit into the memory of a single GPU.
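To see why tables in that range overflow a single GPU, a rough size estimate is rows × embedding dimension × bytes per value. The sketch below uses hypothetical numbers (250 million rows, dimension 128, FP32 values, and an 80 GB GPU) and ignores optimizer state and activations, so real memory requirements are higher.

```python
import math

GB = 10**9


def embedding_table_bytes(num_rows, embedding_dim, bytes_per_value=4):
    """Approximate size of one dense embedding table (FP32 by default)."""
    return num_rows * embedding_dim * bytes_per_value


def min_gpus_for_table(table_bytes, gpu_memory_bytes):
    """Lower bound on GPUs needed just to hold the table (no optimizer state)."""
    return math.ceil(table_bytes / gpu_memory_bytes)


# Hypothetical table: 250 million categories with 128-dimensional FP32 vectors.
table = embedding_table_bytes(250_000_000, 128)
print(table / GB)  # 128.0
print(min_gpus_for_table(table, 80 * GB))  # 2
```

Even this modest example needs at least two 80 GB GPUs, which is why sharding embedding tables across GPUs and nodes matters at the 100 GB to 1 TB scale.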

To combat that challenge, we’ve developed HugeCTR, which is an open-source deep learning framework that is a highly optimized library
To combat that challenge, we developed HugeCTR, an open-source deep learning framework and highly optimized library
written in CUDA C++, specifically for recommender systems. It supports
an optimized dataloader and is able to scale embedding tables using
multiple GPUs and nodes. As a result, there’s no embedding table size
multiple GPUs and nodes. As a result, there is no embedding table size
limitation. HugeCTR also offers the following:

- Model oversubscription for training embedding tables with
single nodes that dont fit within the GPU or CPU memory (only
single nodes that don't fit within the GPU or CPU memory (only
required embeddings are prefetched from a parameter server per
batch).
- Asynchronous and multithreaded data pipelines.
@@ -126,6 +126,5 @@ When training is accelerated with HugeCTR, the following happens:
metrics = sess.evaluation()
print("[HUGECTR][INFO] iter: {}, {}".format(i, metrics))
Additional examples can be found `here`_.
.. _here: https://github.com/NVIDIA/NVTabular/tree/main/examples/hugectr
For more information, refer to the `HugeCTR documentation <https://nvidia-merlin.github.io/HugeCTR/main/hugectr_user_guide.html>`_
or the `HugeCTR repository <https://github.com/NVIDIA-Merlin/HugeCTR>`_ on GitHub.