Adding in local full stack deployment changes. #385

Merged
5 changes: 4 additions & 1 deletion .env.example
@@ -9,7 +9,10 @@ ENABLE_API_DOCS=True
 DOCUMENT_CACHE_TTL_MS=0
 
 # Vespa config
-VESPA_URL=http://vespatest:19071
+# Search and Feed Interfaces endpoint
+VESPA_URL=http://vespatest:8080
+# Config Server
+VESPA_CONFIG_URL=http://vespatest:19071
 VESPA_SECRETS_LOCATION=/secrets
 # VESPA_CERT=
 # VESPA_KEY=
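This change splits Vespa access across two ports. With Vespa's default ports, 8080 is the container port that serves the query and document (feed) APIs, and 19071 is the config server used for deployments. A minimal sketch of a sanity check for an environment, assuming those default ports; `check_vespa_urls` is a hypothetical helper, not part of the backend:

```python
from urllib.parse import urlparse

# Assumption: standard Vespa default ports, not read from this repo.
# 8080 is the container port serving the search and document (feed) APIs,
# 19071 is the config server port used for deployments.
EXPECTED_PORTS = {"VESPA_URL": 8080, "VESPA_CONFIG_URL": 19071}


def check_vespa_urls(env: dict) -> list:
    """Return human-readable problems with the Vespa URLs in env (empty if none)."""
    problems = []
    for name, expected_port in EXPECTED_PORTS.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        port = urlparse(value).port
        if port != expected_port:
            problems.append(f"{name} uses port {port}, expected {expected_port}")
    return problems
```

For example, `check_vespa_urls(dict(os.environ))` would flag the old `VESPA_URL=http://vespatest:19071` value, since it points the query/feed client at the config server.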
3 changes: 3 additions & 0 deletions .gitignore
@@ -193,3 +193,6 @@ backend/models
 .idea/
 
 requirements.txt
+
+# Vespa secrets directory
+secrets/*
2 changes: 1 addition & 1 deletion docker-compose.dev.yml
@@ -15,7 +15,7 @@ services:
   backend:
     environment:
       PYTHONDONTWRITEBYTECODE: 0
-      VESPA_URL: http://vespatest:19071
+      VESPA_URL: http://vespatest:8080
     volumes:
       - ./:/cpr-backend/:ro
     depends_on:
140 changes: 140 additions & 0 deletions docs/local_full_stack_setup/README.md
@@ -0,0 +1,140 @@
# Navigator Full Stack Local Setup

## Introduction

Before pushing to staging or production, it can be useful to test complicated
searches locally. This avoids problems such as subsequent PRs being merged on
top of erroneous commits and deployed, which leaves a rather sticky mess to
unpick.

The Navigator stack consists of the following subsystems:

- frontend
- backend API
- Postgres database
- Vespa search database

This document outlines how to set up your stack locally to test such changes.

## Setup Overview

When we spin up the stack locally via `make start`, the Postgres database is
populated with only the data required for the admin service to function.
For example, we load no families and therefore no family-related data such as
`family_geographies`, but the migrations in `navigator-db-client` do populate
tables such as the geography table. It is therefore necessary to load the local
Postgres database from a dump of production or staging (this takes the form of
a `.sql` file and requires SSH-ing into the bastion to interact with the database).
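A plain-text `.sql` dump is typically replayed with `psql`. The sketch below only builds the command; the connection string is a hypothetical placeholder, so substitute the credentials, port and database name your local docker compose Postgres actually uses:

```python
import shlex

# Hypothetical connection string: replace user, password, port and database
# name with whatever your local docker compose Postgres service uses.
LOCAL_DB_URL = "postgresql://user:password@localhost:5432/navigator"


def build_restore_command(dump_path: str, db_url: str = LOCAL_DB_URL) -> list:
    """Build a psql invocation that replays a plain-text .sql dump into db_url."""
    # psql accepts a full connection URI as the --dbname value.
    return ["psql", "--dbname", db_url, "--file", dump_path]


if __name__ == "__main__":
    # Print the command for review; run it with subprocess.run(cmd, check=True).
    cmd = build_restore_command("staging_dump.sql")  # hypothetical file name
    print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to review the exact invocation before pointing it at a database.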

Vespa is spun up with some test data that can be found in this repo at
`tests/search/vespa/fixtures`. The issue is that the corpora, family documents,
etc. won't match what's in the Postgres instance loaded from production or
staging. Therefore, it's necessary to take a similar sample or dump of data from
Vespa that matches the local Postgres instance and load it into your local Vespa
instance. We'll go through how to do that below.
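Once you have a Vespa sample, it can help to confirm that it only references corpora that exist in your local Postgres. A minimal sketch, assuming each line is a `vespa visit` result with a `fields` object containing `corpus_import_id`, and that you have already fetched the corpus IDs from Postgres yourself; `corpus_ids_missing_from_postgres` is a hypothetical helper:

```python
import json
from typing import Iterable, Set


def corpus_ids_missing_from_postgres(
    vespa_jsonl_lines: Iterable[str], postgres_corpus_ids: Set[str]
) -> Set[str]:
    """Return corpus_import_id values found in a Vespa sample but absent from Postgres."""
    vespa_ids = set()
    for line in vespa_jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        # Assumption: visited documents nest their data under "fields".
        fields = json.loads(line).get("fields", {})
        corpus_id = fields.get("corpus_import_id")
        if corpus_id is not None:
            vespa_ids.add(corpus_id)
    return vespa_ids - postgres_corpus_ids
```

An empty result means every corpus in the sample has a matching row locally.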

## 1. Set Config to Staging Vespa

The following commands configure your Vespa CLI to connect to the staging environment.

### Spin up your local Vespa, Postgres and backend

> ⚠️ **Warning:** You will need to set `SECRET_KEY` and `TOKEN_SECRET_KEY`.
> I believe these can be taken from the staging stack in navigator-infra.

```shell
cp .env.example .env
make start
```
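A minimal sketch for checking that both keys are present and non-empty in `.env` before running `make start`. `missing_env_keys` is a hypothetical helper, and the parser is deliberately simple: it ignores quoting and `export` prefixes.

```python
from typing import List, Tuple

REQUIRED_KEYS = ("SECRET_KEY", "TOKEN_SECRET_KEY")


def missing_env_keys(env_text: str, required: Tuple[str, ...] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or empty in the given .env contents."""
    values = {}
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments such as "# VESPA_CERT=", and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [key for key in required if not values.get(key)]
```

For example, `missing_env_keys(Path(".env").read_text())` returns an empty list once both keys are filled in.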

### Authenticate with Vespa

```shell
vespa auth login
```

### Set the target to the Vespa Cloud

```shell
vespa config set target cloud
```

### Set the specific Vespa Cloud target URL

Note: this URL can be found in the Vespa Cloud console.

```shell
vespa config set target $VESPA_ENDPOINT
```

### List the Vespa configuration directory

You should have a directory named after the application you're connecting to;
it should contain a public certificate and a private key.

```shell
ls ~/.vespa/
```
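A minimal sketch for checking that directory programmatically. The file names are an assumption (the names the Vespa CLI's `vespa auth cert` command typically writes), so compare them against what `ls` actually shows; `missing_vespa_credentials` is a hypothetical helper:

```python
from pathlib import Path
from typing import List

# Assumed file names written by `vespa auth cert`; adjust if your directory differs.
EXPECTED_FILES = ("data-plane-public-cert.pem", "data-plane-private-key.pem")


def missing_vespa_credentials(app_dir: Path) -> List[str]:
    """Return the expected credential files that are missing from app_dir."""
    return [name for name in EXPECTED_FILES if not (app_dir / name).is_file()]
```

Usage: `missing_vespa_credentials(Path.home() / ".vespa" / "climate-policy-radar.<app>.<instance>")`, substituting your own application and instance names.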

### Set the application to the staging environment

```shell
vespa config set application climate-policy-radar.${APP}.${INSTANCE}
```

### Verify the current Vespa configuration

```shell
vespa config get
```

### Run a test query to verify connection

```shell
vespa query 'select corpus_import_id from family_document where true limit 10'
```
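The same YQL can also be sent over HTTP to Vespa's `/search/` query API, which can be handy from a notebook. A minimal sketch using only the standard library; the endpoint and certificate paths are placeholders, and the client certificate is only needed for Vespa Cloud (mutual TLS), not for a local instance:

```python
import json
import ssl
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def build_query_url(endpoint: str, yql: str, hits: int = 10) -> str:
    """Build a GET URL for Vespa's /search/ query API."""
    params = urlencode({"yql": yql, "hits": hits})
    return f"{endpoint.rstrip('/')}/search/?{params}"


def run_query(url: str, cert_file: Optional[str] = None, key_file: Optional[str] = None) -> dict:
    """Send the query, presenting a client certificate when one is given."""
    context = ssl.create_default_context()
    if cert_file and key_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    with urllib.request.urlopen(url, context=context) as response:
        return json.load(response)
```

For a local instance, `run_query(build_query_url("http://localhost:8080", yql))` should work without certificates.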

### Sample some documents from the family_document table

```shell
vespa visit --slices 1000 --slice-id 0 --selection "family_document" > family_document_sample.jsonl
```
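`vespa feed` expects one JSON operation per line in a `.jsonl` file. If several documents end up on a single line (for example because shell command substitution collapsed the newlines), feeding fails. A minimal sketch for spotting such lines before feeding; `invalid_jsonl_lines` is a hypothetical helper:

```python
import json
from typing import Iterable, List


def invalid_jsonl_lines(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of non-blank lines that are not a single JSON object."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            # Includes "Extra data" errors when two objects share one line.
            bad.append(number)
            continue
        if not isinstance(parsed, dict):
            bad.append(number)
    return bad
```

Usage: `with open("family_document_sample.jsonl") as f: print(invalid_jsonl_lines(f))`; an empty list means every line parsed as a single object.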

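### Read the related documents based on document_import_id

Run this from `docs/local_full_stack_setup` (with `family_document_sample.jsonl` in the same directory) so that `python -m` can find the `extract_document_import_ids` script:

```shell
python -m extract_document_import_ids family_document_sample.jsonl | xargs -I {} vespa visit --selection 'document_passage AND id="id:doc_search:document_passage::{}.1"' > document_passage_sample.jsonl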
```
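The `xargs` pipeline above can also be written in Python, which sidesteps shell quoting and keeps the output newline-delimited. A minimal sketch that reuses the same selection expression (it still visits only the passage whose ID ends in `.1` for each document); `passage_selections` and `visit_passages` are hypothetical helpers:

```python
import subprocess
from typing import Iterable, List

# Same selection as the shell command: one passage (suffix ".1") per document.
SELECTION_TEMPLATE = 'document_passage AND id="id:doc_search:document_passage::{doc_id}.1"'


def passage_selections(document_ids: Iterable[str]) -> List[str]:
    """Build one Vespa document selection per document_import_id."""
    return [SELECTION_TEMPLATE.format(doc_id=doc_id) for doc_id in document_ids]


def visit_passages(document_ids: Iterable[str], output_path: str) -> None:
    """Run `vespa visit` once per selection, writing all results to output_path."""
    with open(output_path, "w", encoding="utf-8") as out:
        for selection in passage_selections(document_ids):
            subprocess.run(["vespa", "visit", "--selection", selection], stdout=out, check=True)
```

`check=True` stops at the first failing visit instead of silently producing a partial sample.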

### Set your Vespa CLI to point to local

Run `vespa config set target local`, then `vespa config set application default.application.default`.

### Feed the documents into the local instance

> ⚠️ **Warning:** I had some issues with `family_document_sample.jsonl`:
> I copied the first line into `family_document_sample_1.json` and ensured
> that `family_publication_ts` was correct, as it initially parsed incorrectly.

```shell
vespa feed family_document_sample.jsonl
vespa feed document_passage_sample.jsonl
```

### Run the frontend application

Load up the frontend repository and run:

> ⚠️ **Warning:** You will need to set the following token: `NEXT_PUBLIC_APP_TOKEN`.
> Also check that the backend API URL is correct.
> You may also need to clear your browser cache, as it may have cached production or staging content.

```shell
cp .env.example .env
make start
```

### Test the application

Navigate to the frontend endpoint in your browser and test your feature.
43 changes: 43 additions & 0 deletions docs/local_full_stack_setup/extract_document_import_ids.py
@@ -0,0 +1,43 @@
import re
import sys
from typing import List

# Regular expression pattern to extract the document import ID
DOCUMENT_IMPORT_ID_PATTERN = r'"document_import_id":"([A-Za-z0-9._-]+)"'


def extract_document_ids_from_file(
    file_path: str, pattern: str = DOCUMENT_IMPORT_ID_PATTERN
) -> List[str]:
    """Extracts document import IDs from a JSONL file.

    Args:
        file_path (str): The path to the JSONL file.
        pattern (str): The regular expression pattern to match document IDs.

    Returns:
        List[str]: A list of extracted document import IDs.
    """
    try:
        with open(file_path, "r", encoding="utf-8") as file:
            return [match for line in file for match in re.findall(pattern, line)]
    except FileNotFoundError:
        # Errors go to stderr so they are never piped into `xargs` as IDs.
        print(f"Error: The file '{file_path}' was not found.", file=sys.stderr)
        return []
    except OSError:
        print(f"Error: An error occurred while reading the file '{file_path}'.", file=sys.stderr)
        return []


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"Usage: python {sys.argv[0]} <file_path>", file=sys.stderr)
        sys.exit(1)

    input_file_path = sys.argv[1]
    document_ids = extract_document_ids_from_file(input_file_path)
    if document_ids:
        # One ID per line on stdout, ready to be consumed by `xargs`.
        for document_id in document_ids:
            print(document_id)
    else:
        # Report on stderr and exit non-zero so nothing bogus reaches stdout.
        print("No document import IDs found in the file.", file=sys.stderr)
        sys.exit(1)
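The regular expression assumes compact JSON with no whitespace around the colon, which is what the `vespa visit` output is expected to look like, but it would silently miss pretty-printed lines. A minimal sketch of a JSON-parsing alternative, assuming each line nests its data under `fields`; `extract_document_ids_from_lines` is a hypothetical function, not part of this PR:

```python
import json
from typing import Iterable, List


def extract_document_ids_from_lines(lines: Iterable[str]) -> List[str]:
    """Parse each JSONL line and collect fields.document_import_id, if present."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = json.loads(line).get("fields", {})
        doc_id = fields.get("document_import_id")
        if doc_id:
            ids.append(doc_id)
    return ids
```

The trade-off is that a malformed line raises `json.JSONDecodeError` instead of being skipped, which surfaces bad samples early.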
9 changes: 8 additions & 1 deletion makefile-docker.defs
@@ -3,7 +3,14 @@
 # ----------------------------------
 # starting, stopping, migrating DB
 # ----------------------------------
-start:
+
+create_mock_vespa_certs:
+	mkdir -p secrets
+	[ -f secrets/cert.pem ] || touch secrets/cert.pem
+	[ -f secrets/key.pem ] || touch secrets/key.pem
+
+
+start: create_mock_vespa_certs
 	# Build & run containers and setup vespa
 	docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d --remove-orphans
 	$(MAKE) vespa_setup
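The `create_mock_vespa_certs` target makes `start` idempotently ensure that `secrets/cert.pem` and `secrets/key.pem` exist, presumably so that anything expecting Vespa credentials under `VESPA_SECRETS_LOCATION=/secrets` finds placeholder files when running locally without real certificates. The same logic in Python, as a hypothetical helper for anyone scripting their own setup:

```python
from pathlib import Path
from typing import List


def create_mock_vespa_certs(secrets_dir: Path = Path("secrets")) -> List[Path]:
    """Create secrets_dir with empty cert.pem/key.pem placeholders if missing.

    Mirrors `mkdir -p` plus `[ -f file ] || touch file`: existing files are left
    untouched, and the newly created files are returned.
    """
    secrets_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in ("cert.pem", "key.pem"):
        path = secrets_dir / name
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Running it twice is safe: the second call returns an empty list.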