The artsdata-planet-lavitrine pipeline transforms data from Artsdata.ca into the LaVitrine data model. All the configs and batch scripts are managed in this repo.
The LaVitrine data model differs slightly from the Artsdata data model, so the pipeline reshapes Artsdata data to fit it. For a discussion, see GitHub Discussions.
This repo generates the following artifacts derived from Artsdata and refreshes them on a daily schedule. An artifact is a versioned dump of all events on a website including nested places, people, organizations and event types.
Table of source dataset uploads here.
View the history of daily workflow dumps here.
Click on any of the following artifacts to get a link to download the JSON data file.
- grandtheatre-qc-ca - Website events from https://grandtheatre.qc.ca
- placedesarts-com - Website events from https://placedesarts.com
- dia-logGraphs - 21 Quebec websites that are part of dia-log
- hector-charland-com - Website events from https://hector-charland.com
- theatredumarais-com - Website events from https://theatredumarais.com
- tout-culture-cms-events - Outaouais region events from https://toutculture.ca
- signe-laval-cms-events - Laval region events from https://signelaval.com/fr
- culture-mauricie-cms-events - Culture Mauricie region events
- Organizations - coming soon
- Venues - coming soon
- Artists - coming soon
For formal documentation of the classes and properties of the LaVitrine data model used in this pipeline, please consult the JSON-LD Frame lavitrine_event_frame.jsonld.
All event data is passed through this JSON-LD Frame. The concept is similar to GraphQL. The Frame selects the properties to be extracted for each class including nested classes.
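To make the "select properties per class, including nested classes" idea concrete, here is a simplified Ruby sketch. It is not the JSON-LD framing algorithm, which also handles type matching, embedding and contexts, and it does not reproduce the contents of lavitrine_event_frame.jsonld. The `select_properties` helper, the sample event and the sample frame are all hypothetical and exist only for illustration.

```ruby
# Toy illustration only: keep the keys named in `frame`, recursing into
# nested hashes and arrays. An empty hash in the frame means "keep as is".
def select_properties(node, frame)
  case node
  when Array
    node.map { |item| select_properties(item, frame) }
  when Hash
    frame.each_with_object({}) do |(key, sub_frame), out|
      next unless node.key?(key)

      value = node[key]
      if sub_frame.is_a?(Hash) && !sub_frame.empty?
        out[key] = select_properties(value, sub_frame)
      else
        out[key] = value
      end
    end
  else
    node
  end
end

# Hypothetical input: an event with one property the frame does not ask for
# and a nested place carrying more detail than the frame asks for.
event = {
  "@type" => "Event",
  "name" => "Concert",
  "internalNote" => "not part of the target model",
  "location" => {
    "@type" => "Place",
    "name" => "Salle A",
    "geo" => { "latitude" => 46.8 }
  }
}

# Hypothetical frame: which properties to keep at each level.
frame = {
  "@type" => {},
  "name" => {},
  "location" => { "@type" => {}, "name" => {} }
}

selected = select_properties(event, frame)
```

In this sketch `selected` keeps `@type`, `name` and a trimmed `location`, while `internalNote` and `geo` are dropped because the frame does not name them.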
The pipeline also checks the artifacts for violations based on the data model SHACL here. The list of violations for each artifact can be viewed by switching the extension of the download URL from .json to .yml. If the file is empty then there are no violations. The list of violations contains a reference to the event URI and the related SHACL rules that are in violation.
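The extension swap described above can be sketched in Ruby as follows. The helper names `violations_url` and `violations?` are hypothetical, and any URL passed in is whatever download URL you obtained for an artifact. Only the .json to .yml convention and the "empty file means no violations" rule come from this README.

```ruby
require "uri"

# Derive the violations report URL by swapping a trailing .json for .yml.
def violations_url(download_url)
  uri = URI.parse(download_url)
  unless uri.path.end_with?(".json")
    raise ArgumentError, "expected a .json URL: #{download_url}"
  end

  uri.path = uri.path.sub(/\.json\z/, ".yml")
  uri.to_s
end

# An empty (or whitespace-only) report body means there are no violations.
def violations?(report_body)
  !report_body.strip.empty?
end
```

Operating on the URI path rather than the whole string keeps any query string intact and avoids touching a ".json" that appears elsewhere in the URL.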
As new data arrives, new violations will occur due to the distributed, community-based nature of Artsdata. Most issues can be fixed directly in this pipeline. Other issues will have to be communicated to the source data contributor.
Please raise an issue in this GitHub repo if you find data model violations blocking your reuse of the data dumps.
This website is crawled by an agent on the Artsdata platform. See https://github.com/culturecreates/artsdata-planet-gtq for details. It also has a taxonomy gtq-event-type-mapping.ttl to map strings from the original website to Artsdata event types.
One way to visually check if the event data looks good is to compare event images between the source website and Artsdata using the generic Artsdata Viewer (new version coming end of 2023).
Source | Link |
---|---|
GTQ | https://grandtheatre.qc.ca/programmation/ |
Artsdata Viewer | https://api.artsdata.ca/events?source=http://kg.artsdata.ca/culture-creates/huginn/derived-grandtheatre-qc-ca |
The GTQ event data from Artsdata is saved to a dump daily using GitHub workflows managed in this repo. To download a dump, call the Artsdata Databus API and pass the URI of the artifact. The response contains the downloadUrl of the data dump.
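The two-step download (ask the Databus API for an artifact, then read downloadUrl from the reply) might look roughly like this in Ruby. This is a sketch under stated assumptions: the endpoint URL, the query parameter name and the shape of the JSON response are not documented here, so `endpoint` and `param` are placeholders you must supply, and `downloadUrl` is assumed to be a top-level key. Check the Artsdata Databus API documentation before relying on any of it.

```ruby
require "json"
require "net/http"
require "uri"

# Read downloadUrl from a JSON response body.
# Assumption: downloadUrl is a top-level key in the response.
def extract_download_url(response_body)
  JSON.parse(response_body).fetch("downloadUrl")
end

# Hypothetical request helper. `endpoint` and `param` are placeholders,
# not the documented Databus API.
def fetch_download_url(endpoint, artifact_uri, param: "artifact")
  uri = URI.parse(endpoint)
  uri.query = URI.encode_www_form(param => artifact_uri)

  response = Net::HTTP.get_response(uri)
  unless response.is_a?(Net::HTTPSuccess)
    raise "Databus request failed with status #{response.code}"
  end

  extract_download_url(response.body)
end
```

Keeping the parsing step separate from the HTTP call makes it possible to check the response handling against a saved reply without touching the network.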
Data dump latest artifact | all artifacts
This website is crawled using Footlight, a managed service by Culture Creates. Here is a link to the Footlight Console: https://console.footlight.io/events?seedurl=placedesarts-com
One way to visually check if the event data looks good is to compare event images between the source website and Artsdata using the generic Artsdata Event viewer (new version coming end of 2023).
The PDA event data from Artsdata is saved to a dump daily using GitHub workflows managed in this repo. To download a dump, call the Artsdata Databus API and pass the URI of the artifact. The response contains the downloadUrl of the data dump.
Data dump latest artifact | all artifacts
The pipeline can be run locally for development and testing. You will need Ruby 3 installed.
- Clone the repository from GitHub.
bundle install
rake test
cd src ; ./batch.sh