Testing the Datami widget (https://datami.multi.coop/) to edit CSV files, with additional validation from the CLI or through a GitHub Action.
Demonstrate and document how we can use Datami and other components to:
- ease the editing of a CSV file stored on GitHub
- constrain the display of fields in the Datami widget
- investigate how GitHub Actions can ensure that the CSV file structure and/or content is valid according to a model.
The approach of the Datami component is to:
- rely on GitHub to store the CSV file
- offer an HTML widget to visualize or edit the content of the file, for users who may not want to use GitHub directly
- automate the push of modifications made via the widget as GitHub pull requests
- Data and data validation:
  - examples/csv/data: data files and related resource files to validate in CI
    - data (CSV file): project-list.csv
    - definition (model file) for data validation (CLI or CI): project-list.resources.yaml
  - .github/workflows: actions to automate validation
    - example GitHub Action that validates the data file on pull request: validate-sample-data.yml
- Data visualisation and editing (configuration of the Datami widget):
  - examples/csv/model: model for the CSV data (used by the widget)
    - Table schema: project-list.frictionless-table-schema.json
  - example/csv/widget: widget and widget configuration examples
    - configuration file for the widget: project-list.fields-custom-properties.json
    - configured widget: project-list-widget.html
Warning
Data validation and data editing (widget) are configured using different sets of files, or data models. These data models use different syntaxes but have to be kept in sync manually.
Note
This validation is unrelated to Datami or the use of the widget.
The goal here is to validate that the CSV file is consistent with the data model.
We can validate the file from the CLI (local mode) and/or as a GitHub Action.
The validation output is easier to read and analyse in local mode, but the GitHub Action still provides a report file that can be downloaded if validation fails.
The model describing the file is project-list.resources.yaml.
Warning
For multi-valued columns, validation requires describing the allowed values as a regular expression.
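As an illustration of the warning above, here is a minimal Python sketch of how such a regular expression could be built from a list of allowed values. The helper name `build_multi_value_pattern` is hypothetical and not part of frictionless or Datami; the allowed values and the `|` separator are taken from the pattern shown in the validation report further down.

```python
import re


def build_multi_value_pattern(allowed, separator="|"):
    """Build a regex accepting zero or more allowed values joined by `separator`.

    Hypothetical helper for illustration only.
    """
    # Escape each value so spaces and special characters are matched literally.
    alternatives = "|".join(re.escape(value) for value in allowed)
    sep = re.escape(separator)
    # Optional first value, then any number of "<separator><value>" groups.
    return f"^({alternatives})?({sep}({alternatives}))*$"


pattern = build_multi_value_pattern(["rust", "python", "docker", "other tek"])
print(pattern)

# Check a few candidate cell values against the generated pattern.
for cell in ["python|docker", "csharp", "", "other tek|rust"]:
    print(repr(cell), bool(re.match(pattern, cell)))
```

Generating the pattern from a single list of values is one way to reduce the risk of the two alternations in the expression drifting apart when it is edited by hand.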
pip install "frictionless[excel,json]" --pre
Example with a file that contains an invalid (unauthorized) value.
Line 3 of the data file contains the value csharp, which does not match the pattern of authorized values defined in project-list.resources.yaml.
Boaviztapi,https://github.com/Boavizta/,ready,csharp
cd examples/csv/data
frictionless validate project-list.resources.yaml
─────────────────────────────────────────────────────── Dataset ────────────────────────────────────────────────────────
dataset
┏━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ project-list │ table │ project-list.csv │ INVALID │
└──────────────┴───────┴──────────────────┴─────────┘
──────────────────────────────────────────────────────── Tables ────────────────────────────────────────────────────────
project-list
┏━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row ┃ Field ┃ Type ┃ Message ┃
┡━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 3 │ 4 │ constraint-error │ The cell "csharp" in row at position "3" and field "languages" at position "4" does │
│ │ │ │ not conform to a constraint: constraint "pattern" is "^(rust|python|docker|other\ │
│ │ │ │ tek)?(\|(rust|python|docker|other\ tek))*$" │
└─────┴───────┴──────────────────┴─────────────────────────────────────────────────────────────────────────────────────┘
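To make the report above easier to interpret, the following standard-library sketch approximates the kind of check a "pattern" constraint performs. It is not frictionless code. The header names `name`, `url` and `status` and the first data row are hypothetical; only the Boaviztapi row, the `languages` field name and the pattern come from this document. It also assumes the header counts as row position 1, which is consistent with the report.

```python
import csv
import io
import re

# Pattern copied from the validation report above.
PATTERN = re.compile(
    r"^(rust|python|docker|other\ tek)?(\|(rust|python|docker|other\ tek))*$"
)

# Hypothetical sample file: header names and the first data row are invented.
SAMPLE = """name,url,status,languages
example-project,https://example.org/,ready,rust|python
Boaviztapi,https://github.com/Boavizta/,ready,csharp
"""


def find_pattern_errors(csv_text, field, pattern):
    """Return (row position, field position, cell) for each non-matching cell."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    field_pos = header.index(field) + 1  # 1-based, as in the report
    errors = []
    # The header is treated as row position 1, so data rows start at 2.
    for row_pos, row in enumerate(reader, start=2):
        cell = row[field_pos - 1]
        if not pattern.match(cell):
            errors.append((row_pos, field_pos, cell))
    return errors


errors = find_pattern_errors(SAMPLE, "languages", PATTERN)
print(errors)  # [(3, 4, 'csharp')]
```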
After fixing the data file (replacing the csharp value with python|docker):
Boaviztapi,https://github.com/Boavizta/,ready,python|docker
frictionless validate project-list.resources.yaml
─────────────────────────────────────────────────────── Dataset ────────────────────────────────────────────────────────
dataset
┏━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ project-list │ table │ project-list.csv │ VALID │
└──────────────┴───────┴──────────────────┴────────┘
See the sample GitHub Action (.github/workflows/validate-sample-data.yml):
jobs:
  # Validate
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2
      - name: Validate data
        uses: frictionlessdata/repository@v2
        with:
          resources: examples/csv/data/project-list.resources.yaml
The widget is mainly configured using 2 distinct files:
- a data model (table schema) project-list.frictionless-table-schema.json
- a configuration file for the widget: project-list.fields-custom-properties.json
Tip
The model itself makes no distinction between mono-valued and multi-valued fields: both are described as enum, without cardinality. To distinguish them when editing, the widget configuration file uses the subtype tag (singular) for mono-valued fields and tags (plural, with an optional field separator) for multi-valued fields.
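The distinction described in this tip can be sketched in Python as follows. This is an illustration of the idea only, not Datami's implementation: the function `parse_cell` is hypothetical, and the default `|` separator is an assumption based on the sample data (python|docker).

```python
def parse_cell(value, subtype, separator="|"):
    """Interpret a raw CSV cell according to a widget subtype.

    Hypothetical helper: "tag" means a single value, "tags" means several
    values joined by `separator`.
    """
    if subtype == "tag":
        # Mono-valued field: the whole cell is one value (or nothing).
        return [value] if value else []
    if subtype == "tags":
        # Multi-valued field: split on the separator and drop empty parts.
        return [part for part in value.split(separator) if part]
    raise ValueError(f"unsupported subtype: {subtype!r}")


print(parse_cell("ready", "tag"))           # ['ready']
print(parse_cell("python|docker", "tags"))  # ['python', 'docker']
```

Because the table schema lists the same enum values in both cases, the separator and the mono/multi choice live only in the widget configuration, which is why the validation pattern has to be kept in sync with it manually.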