Vespa for Data Scientists

IMPORTANT: This repository is deprecated and will be archived. Please use vespa.ai to find other resources for ranking.

See documentation at vespa-engine.github.io/learntorank/

Motivation

This library contains application-specific code for data manipulation and analysis across different Vespa use cases. It uses the Vespa Python API to interact with Vespa applications from Python for faster exploration.

The main goal of this space is to facilitate prototyping and experimentation for data scientists. Please visit Vespa sample apps for production-ready use cases and Vespa docs for in-depth Vespa documentation.

Install

Code to support and reproduce the use cases documented here can be found in the learntorank library.

Install via PyPI:

pip install learntorank
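
To confirm that the install succeeded, the package metadata can be queried from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of learntorank.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("learntorank")
    if version is None:
        print("learntorank is not installed in this environment")
    else:
        print(f"learntorank version: {version}")
```

Run it with the same interpreter that `pip` installed into, otherwise the lookup inspects a different environment.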

Development

All code and content in this repo are created with nbdev by editing notebooks. The summary below covers the main points required to contribute, but we suggest going through the nbdev tutorials to learn more.

Setting up environment

Create and activate a virtual environment of your choice. We recommend pipenv.
```
pipenv shell
```
Install Jupyter Lab (or Jupyter Notebook if you prefer).
```
pip3 install jupyterlab
```
Create a new kernel for Jupyter that uses the virtual environment created in the first step.
- Check where the current list of kernels is located with `jupyter kernelspec list`.
- Copy one of the existing folders and rename it to `learntorank`.
- Modify the `kernel.json` file inside the new folder to point to the `python3` executable associated with your virtual env.
Install nbdev library:
```
pip3 install nbdev
```
Install learntorank in development mode:
```
pip3 install -e ".[dev]"
```
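
The kernel step above (copy a kernel folder, then edit `kernel.json`) can also be scripted. The sketch below writes a `kernel.json` that points at a given interpreter. The function name `write_kernel_spec` and its parameters are hypothetical, and it assumes `ipykernel` is installed in the virtual environment so that the `ipykernel_launcher` module is available.

```python
import json
import sys
from pathlib import Path


def write_kernel_spec(kernels_dir: Path, python_executable: str,
                      name: str = "learntorank") -> Path:
    """Create <kernels_dir>/<name>/kernel.json pointing at the given interpreter."""
    spec = {
        # Command Jupyter runs to start the kernel; {connection_file} is
        # substituted by Jupyter at launch time.
        "argv": [python_executable, "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": name,
        "language": "python",
    }
    kernel_dir = kernels_dir / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    kernel_file = kernel_dir / "kernel.json"
    kernel_file.write_text(json.dumps(spec, indent=2))
    return kernel_file


# Example (not executed here): run inside the activated virtual environment so
# that sys.executable is the environment's interpreter, and pass the kernels
# directory reported by `jupyter kernelspec list`.
# write_kernel_spec(Path("<kernels directory>"), sys.executable)
```

Writing the file directly avoids hand-editing a copied folder, but the result is the same: a `learntorank` kernel entry whose first `argv` element is the virtual environment's Python.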

Most used nbdev commands

From your terminal:

- `nbdev_help`: List all nbdev commands available.
- `nbdev_readme`: Update README.md based on index.ipynb.
Preview documentation while editing the notebooks:
- `nbdev_preview --port 3000`
Workflow before pushing code:
- `nbdev_test --n_workers 2`: Execute all the tests inside notebooks.
  - Tests can run in parallel, but since they create Docker containers we suggest a low number of workers to preserve memory.
- `nbdev_export`: Export code from notebooks to the Python library.
- `nbdev_clean`: Clean notebooks to avoid merge conflicts.
Publish library:
- `nbdev_bump_version`: Bump library version.
- `nbdev_pypi`: Publish library to PyPI.
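
The pre-push workflow above can be chained so that a failing step stops the sequence. This is a minimal sketch, not part of the repository: the names `PRE_PUSH_COMMANDS` and `run_pre_push` are hypothetical, and the commands are copied from the list above. By default it only reports what it would run.

```python
import subprocess
from typing import List

# Commands taken from the workflow list above, in the same order.
PRE_PUSH_COMMANDS: List[List[str]] = [
    ["nbdev_test", "--n_workers", "2"],
    ["nbdev_export"],
    ["nbdev_clean"],
]


def run_pre_push(dry_run: bool = True) -> List[str]:
    """Process each step in order and return the command lines handled.

    With dry_run=True nothing is executed. With dry_run=False each command
    is run and the first failure raises subprocess.CalledProcessError,
    so later steps are skipped.
    """
    handled = []
    for command in PRE_PUSH_COMMANDS:
        if not dry_run:
            subprocess.run(command, check=True)
        handled.append(" ".join(command))
    return handled


if __name__ == "__main__":
    for line in run_pre_push(dry_run=True):
        print(line)
```

Call `run_pre_push(dry_run=False)` from the repository root, with nbdev installed, to actually execute the steps.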
