Omeka import #65
Conversation
Walkthrough

This update introduces significant enhancements to the GitHub workflow for a Jekyll site, focusing on Omeka S integration and improved deployment processes. It adds a new data-processing job that uses Python to interact with the Omeka API, and it updates the build and deployment jobs to ensure a smooth workflow. The changes aim to streamline deployment of the Jekyll site while enabling efficient data processing and integration with Omeka S collections.
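As a rough orientation, the job graph described above can be sketched as follows. This is a hypothetical skeleton, not the actual `.github/workflows/jekyll.yml` from this PR; step details (Python setup, Jekyll build, Pages deployment) are elided and action versions are illustrative.

```yaml
name: Deploy Jekyll site with Data Processing

on:
  push:
    branches: [main, omeka-import]

env:
  OMEKA_API_URL: ${{ secrets.OMEKA_API_URL }}
  KEY_IDENTITY: ${{ secrets.KEY_IDENTITY }}
  KEY_CREDENTIAL: ${{ secrets.KEY_CREDENTIAL }}
  ITEM_SET_ID: ${{ secrets.ITEM_SET_ID }}

jobs:
  data-processing:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      # Fetch items from the Omeka S API and write the CollectionBuilder CSV.
      - run: python .github/workflows/process_data.py
      # Hand the generated data to the build job.
      - uses: actions/upload-artifact@v4
        with:
          name: processed-data
          path: _data/

  build:
    needs: data-processing # build only after data processing succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: processed-data
          path: _data/
      # Ruby setup and `jekyll build` elided.

  deploy:
    needs: build # deploy only after a successful build
    runs-on: ubuntu-latest
    steps:
      # GitHub Pages deployment step elided.
      - run: echo "deploy"
```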
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to path filters (41)
- _data/config-metadata.csv is excluded by !**/*.csv
- _data/demo-compoundobjects-metadata.csv is excluded by !**/*.csv
- _data/demo-metadata.csv is excluded by !**/*.csv
- _data/sgb-metadata.csv is excluded by !**/*.csv
- objects/210a.jpg is excluded by !**/*.jpg
- objects/210b.jpg is excluded by !**/*.jpg
- objects/abb21535.jpeg.jpg is excluded by !**/*.jpg
- objects/abb42444.jpeg.jpg is excluded by !**/*.jpg
- objects/demo_001.jpg is excluded by !**/*.jpg
- objects/demo_002.pdf is excluded by !**/*.pdf
- objects/demo_003.mp3 is excluded by !**/*.mp3
- objects/demo_004.jpg is excluded by !**/*.jpg
- objects/hells_half_theta.jpg is excluded by !**/*.jpg
- objects/hughes_article.png is excluded by !**/*.png
- objects/object_list.csv is excluded by !**/*.csv
- objects/small/210a_sm.jpg is excluded by !**/*.jpg
- objects/small/210b_sm.jpg is excluded by !**/*.jpg
- objects/small/abb13578_sm.jpg is excluded by !**/*.jpg
- objects/small/abb19619_sm.jpg is excluded by !**/*.jpg
- objects/small/abb21535.jpeg_sm.jpg is excluded by !**/*.jpg
- objects/small/abb40689_sm.jpg is excluded by !**/*.jpg
- objects/small/abb42444.jpeg_sm.jpg is excluded by !**/*.jpg
- objects/small/demo_001_sm.jpg is excluded by !**/*.jpg
- objects/small/demo_002_sm.jpg is excluded by !**/*.jpg
- objects/small/demo_003_sm.jpg is excluded by !**/*.jpg
- objects/small/demo_004_sm.jpg is excluded by !**/*.jpg
- objects/small/hells_half_theta_sm.jpg is excluded by !**/*.jpg
- objects/small/hughes_article_sm.jpg is excluded by !**/*.jpg
- objects/thumbs/210a_th.jpg is excluded by !**/*.jpg
- objects/thumbs/210b_th.jpg is excluded by !**/*.jpg
- objects/thumbs/abb13578_th.jpg is excluded by !**/*.jpg
- objects/thumbs/abb19619_th.jpg is excluded by !**/*.jpg
- objects/thumbs/abb21535.jpeg_th.jpg is excluded by !**/*.jpg
- objects/thumbs/abb40689_th.jpg is excluded by !**/*.jpg
- objects/thumbs/abb42444.jpeg_th.jpg is excluded by !**/*.jpg
- objects/thumbs/demo_001_th.jpg is excluded by !**/*.jpg
- objects/thumbs/demo_002_th.jpg is excluded by !**/*.jpg
- objects/thumbs/demo_003_th.jpg is excluded by !**/*.jpg
- objects/thumbs/demo_004_th.jpg is excluded by !**/*.jpg
- objects/thumbs/hells_half_theta_th.jpg is excluded by !**/*.jpg
- objects/thumbs/hughes_article_th.jpg is excluded by !**/*.jpg
Files selected for processing (1)
- .github/workflows/jekyll.yml (1 hunks)
Additional comments: 6
.github/workflows/jekyll.yml (6)
- 1-1: The workflow name has been updated to reflect the new functionality of data processing alongside Jekyll site deployment.
- 6-7: Branch triggers have been updated to include both `main` and `omeka-import`. This ensures that the workflow runs for changes made to either branch, facilitating continuous integration and testing for the Omeka S integration efforts.
- 19-23: Environment variables for the Omeka API have been added. These are crucial for authenticating and interacting with the Omeka S API. It's important to ensure that these secrets (`OMEKA_API_URL`, `KEY_IDENTITY`, `KEY_CREDENTIAL`, `ITEM_SET_ID`) are properly set in the repository's secrets before merging this PR.
- 26-110: The `data-processing` job has been added to handle data processing tasks using Python. This job fetches items from Omeka collections and performs the necessary column mapping. It's important to ensure that the Python script is correctly fetching and processing data from the Omeka API. The use of environment variables for API credentials and the item set ID is good practice. However, there are a few points to consider:
  - Error handling: the script should handle potential errors more gracefully. For example, if the response status code is not 200, it simply breaks the loop without logging an error or raising an exception. Logging an error message or raising an exception would make debugging easier in case of API failures.
  - Data mapping: the mapping function `map_columns` assumes the presence of certain keys in the data. It's important to verify that these keys always exist in the data returned by the Omeka API, or to handle cases where they might be missing.
  - Image URLs: the script sets `image_small` and `image_thumb` based on the `large` thumbnail URL. This might not be the intended behavior, as different image sizes are typically used for different purposes. Consider verifying this logic.

  ```diff
  - if response.status_code != 200:
  + if response.status_code != 200:
  +     raise Exception(f"Failed to fetch items from collection: HTTP {response.status_code}")
  ```

- 113-122: The `build` job has been updated to depend on the successful completion of the `data-processing` job. This ensures that the site build only proceeds if the data processing tasks have been executed successfully. The Ruby setup configuration has also been updated. These changes help maintain the integrity and reliability of the deployment process.
- 135-135: The deployment job has been updated to depend on the `build` job, refining the workflow's execution sequence and ensuring a coherent and efficient deployment process. This change aligns with best practices for CI/CD workflows.
@koilebeit Can you please take a look at the actions with https://github.com/nektos/act? My docker is on the fritz at the moment.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (1)
- .github/workflows/jekyll.yml (1 hunks)
Additional comments: 5
.github/workflows/jekyll.yml (5)
- 1-1: The workflow name change to "Deploy Jekyll site with Data Processing" clearly reflects the new functionality being added. This is a good practice for clarity and maintainability.
- 6-7: Adding `main` and `omeka-import` as branch triggers ensures that the workflow runs on relevant branches. This is a necessary update for integrating the new data processing functionality.
- 19-23: The addition of environment variables for Omeka API configuration is crucial for secure and flexible API interaction. Storing sensitive information in GitHub secrets is a security best practice.
- 118-132: The build job's dependency on the `data-processing` job is correctly set up to ensure data is processed before the site build begins. The Ruby setup and caching are appropriately configured, which should help with build performance.
- 145-145: The deployment job configuration looks correct, with a dependency on the build job to ensure deployment happens after the site is built. This ensures a smooth deployment process.
When I try it with act, I get the following error message:
```
[Deploy Jekyll site with Data Processing/data-processing] ⭐ Run Main Process Data
[Deploy Jekyll site with Data Processing/data-processing] 🐳 docker exec cmd=[python /var/run/act/workflow/3.py] user= workdir=
| Traceback (most recent call last):
|   File "/var/run/act/workflow/3.py", line 72, in <module>
|     item_set = get_items_from_collection(ITEM_SET_ID)
|                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/var/run/act/workflow/3.py", line 18, in get_items_from_collection
|     response = requests.get(url, params=params)
|                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/site-packages/requests/api.py", line 73, in get
|     return request("get", url, params=params, **kwargs)
|            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/site-packages/requests/api.py", line 59, in request
|     return session.request(method=method, url=url, **kwargs)
|            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/site-packages/requests/sessions.py", line 575, in request
|     prep = self.prepare_request(req)
|            ^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/site-packages/requests/sessions.py", line 486, in prepare_request
|     p.prepare(
|   File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/site-packages/requests/models.py", line 368, in prepare
|     self.prepare_url(url, params)
|   File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/site-packages/requests/models.py", line 439, in prepare_url
|     raise MissingSchema(
| requests.exceptions.MissingSchema: Invalid URL 'items': No scheme supplied. Perhaps you meant https://items?
[Deploy Jekyll site with Data Processing/data-processing] ❌ Failure - Main Process Data
[Deploy Jekyll site with Data Processing/data-processing] exitcode '1': failure
[Deploy Jekyll site with Data Processing/data-processing] 🏁 Job failed
Error: Job 'data-processing' failed
```
edit: probably I have not set the OS environment variables.
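For anyone reproducing this: `act` runs locally and does not see GitHub repository secrets, so they have to be supplied explicitly. A sketch of one way to do this (values are placeholders):

```bash
# Pass the workflow's secrets to act explicitly; values here are placeholders.
act push \
  -s OMEKA_API_URL="https://example.org/api/" \
  -s KEY_IDENTITY="..." \
  -s KEY_CREDENTIAL="..." \
  -s ITEM_SET_ID="1234"
```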
Apart from the GitHub action, the objectid must be lowercase (see https://collectionbuilder.github.io/cb-docs/docs/metadata/csv_metadata/#objectid).
I got it to work with an uppercase objectid (perhaps CB's sanitization does the trick). Leaving out object_location is by design (or rather by choice): as long as we cannot access these files directly from Omeka, we cannot integrate the links. PS: If it doesn't work, we should think about renaming the objects in Omeka.
I also added the secrets to my fork and it seems to work: https://github.com/maehr/collectionbuilder-csv/actions/runs/8173244290
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (1)
- .github/workflows/jekyll.yml (1 hunks)
Additional comments: 4
.github/workflows/jekyll.yml (4)
- 1-1: The workflow name has been updated to "Deploy Jekyll site with Data Processing" to reflect the new functionality of processing data alongside deploying the Jekyll site. This change accurately describes the enhanced capabilities of the workflow.
- 6-7: Branch triggers have been updated to include both `main` and `omeka-import`. This ensures that the workflow runs on pushes to these branches, facilitating continuous integration and deployment for both the primary branch and the feature branch related to Omeka S integration.
- 19-23: Environment variables for Omeka API configuration have been added. This is good practice, as it keeps sensitive information out of the codebase and allows for easy updates without modifying the workflow file. Ensure that these secrets (`OMEKA_API_URL`, `KEY_IDENTITY`, `KEY_CREDENTIAL`, `ITEM_SET_ID`) are properly set in the repository's settings.
- 118-145: The `build` and `deploy` jobs have been updated to depend on the `data-processing` and `build` jobs, respectively. This ensures that the workflow executes in the correct order, with data processing completed before the site build and the site build completed before deployment. The use of artifact upload and download steps to pass data between jobs is good practice for maintaining modularity and separation of concerns within the workflow.
@koilebeit the script turns all multiple/compound_items into compound_items: https://github.com/maehr/collectionbuilder-csv/actions/runs/8207996344
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files selected for processing (2)
- .github/workflows/jekyll.yml (1 hunks)
- .github/workflows/process_data.py (1 hunks)
Additional comments: 12
.github/workflows/jekyll.yml (7)
- 1-1: The workflow name change to "Deploy Jekyll site with Data Processing" clearly reflects the new functionality being added, which is a good practice for clarity and maintainability.
- 6-7: Adding the `omeka-import` branch to the workflow triggers ensures that the data processing and deployment steps are executed for changes made specifically for the Omeka S integration, which is a thoughtful addition.
- 19-23: The addition of environment variables for Omeka API configuration is a secure way to handle sensitive information. However, ensure that these secrets (`OMEKA_API_URL`, `KEY_IDENTITY`, `KEY_CREDENTIAL`, `ITEM_SET_ID`) are properly set in the repository's secrets before merging this PR.
- 43-43: The dependency of the `build` job on the `data-processing` job ensures that the site is not built until data processing is complete, which is a crucial step for ensuring data consistency.
- 48-52: Downloading the artifact produced by the `data-processing` job in the `build` job is good practice to ensure that the latest processed data is used for building the site.
- 56-57: The Ruby setup configuration with `bundler-cache: true` and a specified `cache-version` is good practice for speeding up the build process by caching dependencies.
- 70-70: Ensuring the `deploy` job runs on the latest Ubuntu version and depends on the `build` job is good practice for maintaining a streamlined and dependent workflow execution.

.github/workflows/process_data.py (5)

- 1-4: The imports are correctly organized and only necessary libraries are imported, which is good for maintainability and avoiding unnecessary dependencies.
- 6-9: Extracting the Omeka API configuration from environment variables is a secure practice. Ensure that these variables are correctly set in the GitHub Actions workflow to avoid runtime errors.
- 54-70: The `map_columns` function processes the item set to map columns for the CSV file. It handles compound objects well by creating separate entries for each media item. However, ensure that the logic for handling items with multiple media is consistent with the project's data requirements.
- 73-104: The `create_media_data_dict` function constructs a dictionary for each item/media pair. It is well implemented, but consider adding validation for the existence of keys in the data dictionary to avoid potential `KeyError` exceptions.
- 106-109: The script execution block correctly retrieves items from the collection, maps columns, and writes the data to a CSV file. This is a crucial step for integrating Omeka S data into the Jekyll site. Ensure that the CSV file path matches the expected location in the Jekyll site structure.
@koilebeit I added multiples as well. Now the only caveat is that PDF files are neither shown nor downloadable. See the build files: https://github.com/maehr/collectionbuilder-csv/actions/runs/8208244993
The PDF files cannot be downloaded because the object_location column is empty, and they are not shown because image_small and image_thumb are empty (see the CB display_template). I also noticed that the display_template is still filled out incorrectly. As I understand the documentation (CB Compound Objects), the parent object gets the display_template "compound_object" and the sub-objects each get their format (e.g. "pdf" or "image"), like this:
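For illustration, hypothetical metadata rows for one compound object under that reading of the docs (column subset only; `parentid` links the sub-objects to the parent per the CollectionBuilder compound-objects documentation, and the objectids are invented):

```csv
objectid,parentid,display_template,format
demo_002,,compound_object,
demo_002_01,demo_002,pdf,application/pdf
demo_002_02,demo_002,image,image/jpeg
```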
I'll check again and fix it. I guess we can add the object_location by first checking whether the media is private or not.
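A sketch of that check, assuming the Omeka S JSON-LD media fields `o:is_public` and `o:original_url` (to be verified against the instance's actual API output):

```python
def object_location_for(media):
    # Link directly to the file only when Omeka reports the media as public;
    # otherwise leave object_location blank so CB does not render a dead link.
    if media.get("o:is_public") and media.get("o:original_url"):
        return media["o:original_url"]
    return ""
```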
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- .github/workflows/process_data.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- .github/workflows/process_data.py
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files ignored due to path filters (1)
- _data/theme.yml is excluded by !**/*.yml
Files selected for processing (2)
- .github/workflows/jekyll.yml (1 hunks)
- _includes/item/image-gallery.html (1 hunks)
Additional comments: 6
.github/workflows/jekyll.yml (6)
- 1-1: The workflow name "Deploy Jekyll site with Data Processing" clearly reflects its purpose and the addition of data processing steps.
- 6-6: Adding `main` to the branch triggers is a good practice, ensuring the workflow runs for updates to the main branch.
- 18-22: The addition of environment variables for the Omeka API and related credentials is crucial for secure data processing. Ensure that the `process_data.py` script properly uses these secrets without exposing them in logs or outputs.

  Verification successful

  Given the first script's output, there was no direct evidence of secrets being exposed by the `process_data.py` script, but further verification was necessary: how the secrets are accessed and used within the script needed to be examined, beyond just checking for `print` or `logger` statements.

  The analysis of `process_data.py`, based on the second script's output, indicates that the secrets (`OMEKA_API_URL`, `KEY_IDENTITY`, `KEY_CREDENTIAL`, `ITEM_SET_ID`) are accessed securely via `os.environ.get` and are used appropriately for constructing URLs, parameters, and API request calls. There is no evidence of these secrets being exposed in logs or outputs, so the script appears to handle them securely, addressing the review comment's concern.

- 42-42: Ensuring the `build` job depends on the successful completion of `data-processing` is a logical and efficient workflow design.
- 55-56: The Ruby setup configuration pins the `setup-ruby` action to a specific commit hash. While this ensures stability, consider using a version tag for readability, unless stability concerns necessitate pinning to a specific commit.
- 69-69: The deployment job's dependency on the `build` job ensures that the site is built before deployment, which is good practice for maintaining a reliable deployment process.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Verify that secrets are used securely in the process_data.py script
rg --type py 'print|logger' .github/workflows/process_data.py
```

Length of output: 163

Script:

```bash
#!/bin/bash
# Verify how secrets are accessed and used in the process_data.py script
rg --type py 'OMEKA_API_URL|KEY_IDENTITY|KEY_CREDENTIAL|ITEM_SET_ID' .github/workflows/process_data.py
```

Length of output: 587