Maintenance updates (#207)
Why these changes are being introduced:
* Routine maintenance updates

How this addresses that need:
* Upgrade to Python 3.12

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1079
jonavellecuerdo authored Oct 30, 2024
1 parent 395e612 commit e00c46e
Showing 8 changed files with 908 additions and 832 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,5 +1,5 @@
default_language_version:
python: python3.11 # set for project python version
python: python3.12 # set for project python version
repos:
- repo: local
hooks:
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.11.2
3.12
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
FROM python:3.11-slim as build
FROM python:3.12-slim as build
WORKDIR /app
COPY . .

2 changes: 1 addition & 1 deletion Pipfile
@@ -26,7 +26,7 @@ ruff = "*"
safety = "*"

[requires]
python_version = "3.11"
python_version = "3.12"

[scripts]
transform = "python -c \"from transmogrifier.cli import main; main()\""
1,666 changes: 870 additions & 796 deletions Pipfile.lock

Large diffs are not rendered by default.

60 changes: 31 additions & 29 deletions README.md
@@ -2,8 +2,6 @@

An application to transform source records to the TIMDEX data model to facilitate ingest into an OpenSearch index.

## Description

TIMDEX ingests records from various sources with different metadata formats, necessitating an application to transform those source records to a common metadata format, the TIMDEX data model in this case. This application processes both XML and JSON source records and outputs a JSON file of records formatted according to the TIMDEX data model.

```mermaid
@@ -18,10 +16,10 @@ flowchart TD
transmogrifier((transmogrifier))
JSON
timdex-index-manager
ArchivesSpace[("ArchivesSpace\n(EAD XML)")] --> transmogrifier
DSpace[("DSpace\n(METS XML)")] --> transmogrifier
GeoData[("GeoData\n(Aardvark JSON)")] --> transmogrifier
MARC[("Alma\n(MARCXML)")] --> transmogrifier
ArchivesSpace[("ArchivesSpace<br>(EAD XML)")] --> transmogrifier
DSpace[("DSpace<br>(METS XML)")] --> transmogrifier
GeoData[("GeoData<br>(Aardvark JSON)")] --> transmogrifier
MARC[("Alma<br>(MARCXML)")] --> transmogrifier
transmogrifier --> JSON["TIMDEX JSON"]
JSON[TIMDEX JSON file] --> timdex-index-manager((timdex-index-manager))
```
@@ -34,34 +32,38 @@ After the JSON file of transformed records is produced, it is processed by `timd

## Development

To install with dev dependencies:

```
make install
```

To run unit tests:

```
make test
```

To lint the repo:
- To preview a list of available Makefile commands: `make help`
- To install with dev dependencies: `make install`
- To update dependencies: `make update`
- To run unit tests: `make test`
- To lint the repo: `make lint`
- To run the app: `pipenv run transform <command>`

```
make lint
```
## Environment Variables

To run the app:
### Required

```
pipenv run transform <command>
```shell
SENTRY_DSN=### If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
STATUS_UPDATE_INTERVAL=### The transform process logs the # of records transformed every nth record (1000 by default). Set this env variable to any integer to change the frequency of logging status updates. Can be useful for development/debugging.
WORKSPACE=### Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform.
```

## Required ENV
## CLI commands

`SENTRY_DSN` = If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
### `transform`

`STATUS_UPDATE_INTERVAL` = The transform process logs the # of records transformed every nth record (1000 by default). Set this env variable to any integer to change the frequency of logging status updates. Can be useful for development/debugging.
```text
Usage: -c [OPTIONS]
`WORKSPACE` = Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform.
Options:
-i, --input-file TEXT Filepath for harvested input records to
transform [required]
-o, --output-file TEXT Filepath to write output TIMDEX JSON records
to [required]
-s, --source [alma|aspace|dspace|jpal|libguides|gismit|gisogm|researchdatabases|whoas|zenodo]
Source records were harvested from, must
choose from list of options [required]
-v, --verbose Pass to log at debug level instead of info
--help Show this message and exit.
```
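The updated README describes `STATUS_UPDATE_INTERVAL` as controlling how often the transform process logs a running count of transformed records (every 1000 records by default). The following is a minimal sketch of that behavior. It assumes the variable is read with a plain `int()` conversion. `status_update_interval` and `log_progress` are hypothetical helpers for illustration, not transmogrifier's actual code.

```python
import logging
import os
from collections.abc import Iterable, Iterator

logger = logging.getLogger(__name__)


def status_update_interval() -> int:
    """Hypothetical helper: read STATUS_UPDATE_INTERVAL, defaulting to 1000."""
    return int(os.getenv("STATUS_UPDATE_INTERVAL", "1000"))


def log_progress(records: Iterable[dict], interval: int) -> Iterator[dict]:
    """Hypothetical helper: yield records unchanged, logging a count every `interval` records."""
    count = 0
    for record in records:
        count += 1
        if count % interval == 0:
            logger.info("Status update: %d records transformed so far", count)
        yield record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # With an interval of 10, 25 records produce status lines at 10 and 20.
    transformed = list(log_progress(({"id": i} for i in range(25)), interval=10))
```

A smaller interval (for example `STATUS_UPDATE_INTERVAL=10`) makes progress visible on small local test files. That is the development/debugging use case the README mentions.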
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -10,7 +10,7 @@ exclude = ["tests/"]
log_level = "INFO"

[tool.ruff]
target-version = "py311"
target-version = "py312"

# set max line length
line-length = 90
4 changes: 2 additions & 2 deletions transmogrifier/sources/transformer.py
@@ -7,7 +7,7 @@
import os
from abc import ABC, abstractmethod
from importlib import import_module
from typing import TYPE_CHECKING, TypeAlias, final
from typing import TYPE_CHECKING, final

import smart_open # type: ignore[import-untyped]
from attrs import asdict
@@ -24,7 +24,7 @@

logger = logging.getLogger(__name__)

JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None
type JSON = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None


class Transformer(ABC):