-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add media properties generation #3620
Conversation
@obulat, Would you be working on this PR in the following days? Or should it be closed? |
d7bbf69
to
dd317c9
Compare
bebff77
to
8fa0f3a
Compare
2e30a65
to
6222001
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent start!! I'm really excited to see us make progress on this and I'm loving the final output of the generation. I have a number of structure/code suggestions, as well as potential formatting improvements. I think we will also need tests for most of the utilities that are added here.
Additionally, the media properties themselves feel pretty squished together and difficult to delineate. I think adding a horizontal bar between each property can help visually differentiate the various properties, without affecting the table of contents in any way. Below is some code to accomplish that, and some screenshots of the difference:
diff --git a/catalog/utilities/media_props_gen/generate_media_properties.py b/catalog/utilities/media_props_gen/generate_media_properties.py
index 0505af318..106cb037d 100644
--- a/catalog/utilities/media_props_gen/generate_media_properties.py
+++ b/catalog/utilities/media_props_gen/generate_media_properties.py
@@ -98,7 +98,7 @@ def generate_long_form_doc(markdown_descriptions: dict, media_properties: dict)
prop_doc = "".join(
[f"{Md.heading(4, k)}{Md.line(v)}" for k, v in description.items()]
)
- media_docs += prop_heading + prop_doc
+ media_docs += prop_heading + prop_doc + Md.horizontal_line
return media_docs
diff --git a/catalog/utilities/media_props_gen/md.py b/catalog/utilities/media_props_gen/md.py
index 8010fb011..2fb3a7a09 100644
--- a/catalog/utilities/media_props_gen/md.py
+++ b/catalog/utilities/media_props_gen/md.py
@@ -2,6 +2,9 @@ from pathlib import Path
class Md:
+
+ horizontal_line = "\n---\n\n"
+
@staticmethod
def heading(level: int, text: str) -> str:
"""Add a heading to a markdown string."""
catalog/justfile
Outdated
# Generate the DAG documentation | ||
generate-media-props fail_on_diff="true": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the workflow is almost exactly the same with only the files differing, I think we should create a _generate_markdown
recipe with the more generic steps that then runs the shared logic for this and the above recipe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried moving this workflow into docker, but then realized that we don't mount the local .sql
files that we parse to generate the db docs into the container. Here's the error I got:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/docker/upstream_db/0006_openledger_audio_schema.sql'
Is it possible to open them from the host, @AetherUnbound ? Seems like this would make the setup unnecessarily complex.
If I keep the code running locally as it is here now, then I won't be able to import media type constants. Even if I do manipulate the path somehow to make the constant importable, those files contain imports from airflow
, which throw new errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohhhh dang, I didn't even think about having to mount those files 😮 We can actually mount those files relatively easily in the recipe at least, it would just require modifying the just ../run
command here:
Lines 127 to 129 in 78d799c
just ../run \ | |
{{ SERVICE }} \ | |
"bash -c 'python catalog/utilities/dag_doc_gen/dag_doc_generation.py && chmod 666 /opt/airflow/catalog/utilities/dag_doc_gen/DAGs.md'" |
Something similar to the way the tests are mounted (note the --volume
here):
Lines 101 to 108 in 78d799c
_mount-test command: up-deps | |
env DC_USER="airflow" just ../run \ | |
--rm \ | |
-e AIRFLOW_VAR_INGESTION_LIMIT=1000000 \ | |
-w /opt/airflow/catalog \ | |
--volume {{ justfile_directory() }}/../docker:/opt/airflow/docker/ \ | |
{{ SERVICE }} \ | |
{{ command }} |
Even though it's not needed for the DAG markdown generation, it can be done for all the generation recipes anyway!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I'll try that.
Not related to this PR, but is the process of keeping those files up-to-date automated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not! I believe that's the purview of #416 😄
|
||
PREAMBLE = open(Path(__file__).parent / "preamble.md").read() | ||
|
||
MEDIA_TYPES: list[MediaType] = ["audio", "image"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note about constants here:
MEDIA_TYPES = [AUDIO, IMAGE] |
6222001
to
8d685d2
Compare
Full-stack documentation: https://docs.openverse.org/_preview/3620 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. New files ➕: Changed files 🔄: |
Co-authored-by: Madison Swain-Bowden <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Co-authored-by: Madison Swain-Bowden <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Co-authored-by: Madison Swain-Bowden <[email protected]>
Co-authored-by: Dhruv Bhanushali <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
1fca405
to
91bb537
Compare
Fixes
Fixes #2186 by @obulat
Description
This PR adds the page with documentation on the media properties: a table with all of the audio and image properties with their database and Python characteristics.
The main part of this PR, though, is the script that parses the SQL definition files and the Python column definitions to create the media property tables, and the
catalog/utilities/media_props_gen/media_props.md
file that holds the more detailed descriptions of the media properties. This latter markdown file in this PR only contains the list of properties and the headings for descriptions. It will be filled in in the follow-up PR.A new
just catalog/generate-media-props
script runs the parsers and markdown generators. If the generated markdown file is different from the file that was saved previously, a warning is issued to run the script again and commit the updated markdown file.To parse the Python column definitions, this PR uses the
ast
module.Testing Instructions
Change some property in the SQL file, the
catalog/dags/common/storage/columns.py
file or thecatalog/utilities/media_props_gen/media_props.md
file, and runjust catalog/generate-media-props
. Thedocumentation/meta/media_properties.md
file should reflect the changes, and a warning should appear in the console.Also, review the media properties table and the full document from the comment below, and see if anything needs to be changed.
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin