The purpose of this scenario is to demonstrate how to operationalize Jupyter notebooks using the Versatile Data Kit (VDK) Jupyter integration. By the end of this guide, you'll understand how to:
- Create a data job with VDK within a Jupyter notebook.
- Write a data workflow in a notebook and prepare it for a production environment.
All of the following steps run inside a Jupyter notebook:
- Retrieve Data: Extract data from the specified URL using pandas.
- Data Cleansing: Eliminate records associated with 'testuser'.
- Score Classification: Assign scores to predefined categories for clarity.
- Data Ingestion: Use VDK job_input to ingest the organized data.
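The four steps above can be sketched in Python as follows. This is a minimal illustration, not the code from the tutorial-job directory: the column names `user` and `score`, the score bins and labels, the `scores` destination table, and the assumption that the source is a CSV file are all hypothetical. The `job_input` object is expected to be the one VDK provides to the notebook; how it is obtained is left to the VDK guide.

```python
import pandas as pd


def load_data(url: str) -> pd.DataFrame:
    """Retrieve the data; assumes the URL points to a CSV file."""
    return pd.read_csv(url)


def remove_test_users(df: pd.DataFrame, user_column: str = "user") -> pd.DataFrame:
    """Drop every record that belongs to 'testuser'."""
    return df[df[user_column] != "testuser"].reset_index(drop=True)


def classify_scores(df: pd.DataFrame, score_column: str = "score") -> pd.DataFrame:
    """Add a 'score_category' column by binning the numeric scores.

    The bins [0, 6], (6, 8], (8, 10] and their labels are illustrative only.
    """
    result = df.copy()
    result["score_category"] = pd.cut(
        result[score_column],
        bins=[0, 6, 8, 10],
        labels=["low", "medium", "high"],
        include_lowest=True,
    ).astype(str)
    return result


def ingest(job_input, df: pd.DataFrame, destination_table: str = "scores") -> None:
    """Send the prepared rows to the configured ingestion target via VDK."""
    job_input.send_tabular_data_for_ingestion(
        rows=df.values.tolist(),
        column_names=df.columns.tolist(),
        destination_table=destination_table,
    )
```

In a notebook, these would typically be called in order: `ingest(job_input, classify_scores(remove_test_users(load_data(url))))`, where `url` is the data source given in the tutorial.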
For detailed instructions on working with VDK, please refer to the guide from the provided link.
The tutorial-job directory contains the ready-to-use code from this demo. Explore it for hands-on experience with the objectives and the VDK Jupyter integration discussed in this guide. Open MyBinder to get started on the exercises!
If the link does not work, try this one:
Throughout this scenario, you've:
- Explored the capabilities of the VDK Jupyter integration.
- Retrieved, cleaned, and processed data using Jupyter and VDK tools.
- Classified scores into meaningful categories.
- Understood the process of ingesting data through VDK within a Jupyter environment.
Congratulations!