Species archetypes 🎃 #65

Open
wants to merge 60 commits into
base: main

Collaborator

@kwentine kwentine commented Oct 31, 2024

Context

This PR started as a question: would it be possible to use the Hugo archetype system to replace our scripts/template directory? A new species would be created with (a wrapper around):

hugo new species/species_name

Experiment

All the experimental changes are found in the following sections:

  • content/halloween/jacobus_lanternibus for an example species
  • archetypes/halloween for the archetype
  • layouts/halloween and partials/halloween for templates

I have kept the file names to ease diffing with the originals to emphasize changes. For example:

diff layouts/{species,halloween}/species_intro.html

Points of interest

  • The page resources found under assets and data are now grouped in the species page bundle
    • See in particular how the assembly page (now a leaf bundle) can include a separate contrib.md
  • The taxonomy is rendered directly from ENA XML in partials/halloween/lineage.html. The idea is that a wrapper script can just curl the file. I don't know if it's a good move, but it was fun to practice Hugo-fu 🥷
  • The assembly metadata and stats are now packed in a single assembly.yaml
    • I moved the labels from the configuration to the templates (see partials/halloween/assembly_stats.html for example), for better data/display separation of concerns (so that labels can be changed/reordered in a single place, for example).
  • The front matter uses Hugo-reserved keys when possible (e.g. resources or lastmod) and groups the remaining ones under params for clarity
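
To make the wrapper-script idea more concrete, here is a minimal Python sketch of the kind of lineage extraction described above. It is not the code of this PR: the XML layout (a `taxon` element with nested `lineage/taxon` children carrying `rank` and `scientificName` attributes), the nearest-ancestor-first ordering, the function name `lineage_by_rank` and the sample lineage are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a taxonomy XML record (assumed layout).
# The lineage names are invented placeholders, not real taxa.
SAMPLE_XML = """\
<TAXON_SET>
  <taxon scientificName="Jacobus lanternibus" rank="species">
    <lineage>
      <taxon scientificName="Jacobus" rank="genus"/>
      <taxon scientificName="Lanternaceae" rank="family"/>
      <taxon scientificName="unranked placeholder clade"/>
    </lineage>
  </taxon>
</TAXON_SET>
"""


def lineage_by_rank(xml_text):
    """Return (rank, scientific name) pairs, from the root-most rank to the species."""
    root = ET.fromstring(xml_text)
    species = root.find("taxon")
    if species is None:
        raise ValueError("no <taxon> element found")
    ancestors = species.findall("./lineage/taxon")
    # Assumption: ancestors are listed nearest-first, so reverse them.
    # Entries without a rank attribute are skipped.
    ranked = [
        (taxon.get("rank"), taxon.get("scientificName"))
        for taxon in reversed(ancestors)
        if taxon.get("rank")
    ]
    ranked.append((species.get("rank"), species.get("scientificName")))
    return ranked


for rank, name in lineage_by_rank(SAMPLE_XML):
    print(f"{rank}: {name}")
```

A wrapper script could curl the XML to disk and feed its contents to a function like this, should the template-side parsing ever prove too brittle.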

I may have forgotten some points, but Halloween is almost over, so I'll move along :)

Next steps

If you want to create your own halloween species and push it to this branch, feel free ;)
But other than that, this can stay as an experiment. If we do identify bits and pieces to include, I will think further about how to integrate the changes.

Happy Halloween :)

@kwentine kwentine requested review from RMCrean and brinkdp October 31, 2024 13:44
@kwentine
Collaborator Author

kwentine commented Nov 20, 2024

For example: the peculiar edge-case with S. marinoi

Thanks for the detailed motivating use case.
I have implemented species-level taxonomy ranks in 34ad61f
To test it, in the hugo directory:

hugo new halloween/skeletonema_marinoi
# Edit the `taxonomy_ranks` key in `content/halloween/skeletonema_marinoi/_index.md`
hugo serve 

Note that the YAML array taxonomy_ranks will preserve ordering of items, sparing us a display_order field.
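
As a rough illustration of why an ordered list is enough, here is a small Python sketch. The function name `select_ranks` and the placeholder lineage are hypothetical, and the real selection happens in the Hugo templates, not in Python; the sketch only assumes that `taxonomy_ranks` is a list of rank names.

```python
def select_ranks(lineage, taxonomy_ranks):
    """Return (rank, name) pairs in the order listed in taxonomy_ranks.

    Ranks missing from the lineage are skipped rather than raising.
    """
    return [(rank, lineage[rank]) for rank in taxonomy_ranks if rank in lineage]


# Placeholder lineage: the names are invented, only the shape matters.
lineage = {
    "kingdom": "Kingdom A",
    "phylum": "Phylum B",
    "class": "Class C",
    "genus": "Genus D",
}

# The display order comes straight from the list, so no display_order
# field is needed. "order" is absent from the lineage and is skipped.
shown = select_ranks(lineage, ["genus", "class", "order", "kingdom"])
print(shown)
```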

@brinkdp
Collaborator

brinkdp commented Nov 20, 2024

Note that the YAML array taxonomy_ranks will preserve ordering of items, sparing us a display_order field.

Splendid! The override works well, and the preserved ordering is intuitive and useful.

@kwentine kwentine force-pushed the archetypal-halloween branch from 67ef16d to aeb558c Compare November 21, 2024 12:29
@kwentine kwentine marked this pull request as ready for review November 21, 2024 12:32
@kwentine
Collaborator Author

So to sum up: a species created with hugo new species/cucurbita_pepo will use the genome-portal-kit Hugo theme. Species thus added will be drafts by default, can be viewed with hugo serve -D, and integrate seamlessly. But I expect the theme to evolve a lot if we wish to pursue that line of work. Let us discuss this further at the point of our next species addition.

@RMCrean
Member

RMCrean commented Nov 21, 2024

Wow, this is really nice. I've just been playing around with it for a few minutes now. I have some catching up to do on all the changes you've made since I last looked! Really impressive work @kwentine!

Member

@RMCrean RMCrean left a comment

Sorry for the delay, awesome job with this! I have some minor comments/ideas but nothing blocking. :)

It is cool to see just how much can be done with Hugo. Before this PR, I had assumed some of these things would not be possible with Hugo alone, for example: hugo/themes/genome-portal-kit/layouts/partials/species_bundle/get-remote-data.html

Bonus questions:

  • If the plan is to use lastmod in place of the last_updated param in the markdown, I still think it would be better to have it flagged as [EDIT] by default, rather than set to the date of creation.

  • A more general question, perhaps just to understand better. Based on Daniel's flow chart about automations that can be added to the species-adding process, do you see the future process looking like this:


1. hugo new species_name
2. [receive form/sheet from researcher - double check]
3. run script to apply sheet/form to species files

Or:


1. [receive form/sheet from researcher - double check]
2. hugo new species_name [with flag containing path to filled in species form]

Or: something else?

</div>
</div>

{{ partial "last_updated.html" . }}
Member

Perhaps you're already aware, but the current partial does not work with the archetype-generated files, as they have no last_updated param. Is your thought here to add a last_updated, or to extend this partial to support both last_updated and lastmod?
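
For illustration only, the "support both" option could follow a fallback like the Python sketch below. The real change would live in the Hugo partial; the function name `resolve_last_updated` is hypothetical, and the places where `last_updated` may appear (top level or under `params`) are assumptions, not a description of the current front matter.

```python
def resolve_last_updated(front_matter):
    """Return the date to display, or None if the page defines neither key.

    Lookup order (assumed): legacy last_updated first, then Hugo's lastmod.
    """
    params = front_matter.get("params") or {}
    candidates = (
        front_matter.get("last_updated"),  # legacy key at the top level
        params.get("last_updated"),        # legacy key grouped under params
        front_matter.get("lastmod"),       # Hugo-reserved key
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None


legacy_page = {"title": "Genome assembly", "last_updated": "2024-10-01"}
bundle_page = {"title": "Genome assembly", "lastmod": "2024-11-21"}
print(resolve_last_updated(legacy_page))
print(resolve_last_updated(bundle_page))
```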

Collaborator Author

That is a tricky question, part of a more general one: isn't it unwieldy to have two versions of every layout, as is currently the case (and would be compounded by the addition of layouts/partials/species_bundle/last_updated.html)? This is where this refactoring shows its weakness and rudimentary state.

A possible solution to this would be not to define whole page layouts in the genome-portal-kit module, but only expose smaller partials that the main user site would include.

In this case, that would roughly mean removing {{ partial "last_updated.html" }}, and leaving it to the user to render the last modified date, providing the standard Hugo lastmod.

See issue EB-376 for a very rough sketch of this design.

Collaborator Author

Oh, and I would leave it broken for now (until I can allot more work to this topic), as a reminder if need be of the brittle state of things.

title: "Genome assembly"
type: species_bundle
layout: assembly
lastmod: {{ .Date }}
Member

Could lastmod be defined only in the species intro markdown page, so that this file and the download file (hugo/themes/genome-portal-kit/archetypes/species/download/index.md) read lastmod from the intro page instead?

As an example, I have that set up for the non-bundled approach in hugo/layouts/partials/last_updated.html

Collaborator Author

It could certainly suit our needs, but perhaps providing the lastmod field everywhere leaves more flexibility to the (very hypothetical) user (related to the discussion above). But I have no strong opinion on this.

@kwentine
Collaborator Author

Sorry for the delay

No need to be, lots of changes to take in at once!

I had assumed some of these things would not be possible with Hugo

Me neither, to be honest. I had hoped at best for a small shell wrapper around hugo new to preserve add_new_species.py features. But this is a bit borderline: the error handling is scant, and should things get more complicated (for our use case at least), I think investing in consolidating add_new_species.py would be a safer way forward.

* If the plan is to use `lastmod` in place of `last_updated` param in the markdown I still think it would be better to have it as [EDIT] flagged by default

I don't want EDIT placeholders 😭 Which is quite whimsical, so I will yield to the majority :)

* More general question perhaps just to understand better. Based on Daniel's flow chart about automations that can be implemented to the species adding process.

Great question. I'm not sure, but what would give us the greatest flexibility is an automation pipeline that yields a big species-content.json. That beast could then be accessed and interpolated by the archetype templates. Or equally well be passed to add_new_species.py as an argument.

So either:

# Perhaps not the best location but just to give an idea 
cp species-content.json /tmp/tiny_herb/
hugo new species/tiny_herb # Uses resources.Get magic to bake in all species content

Or

add_new_species.py --context=/tmp/tiny_herb/species-content.json
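
To give an idea of the second option, here is a minimal Python sketch of how a `--context` flag could be wired up. It is an assumption about a possible interface, not the current add_new_species.py: the function names `parse_args`, `load_context` and `main`, and the `species_name` key, are hypothetical.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; --context points at the species content JSON."""
    parser = argparse.ArgumentParser(description="Add a new species (sketch).")
    parser.add_argument(
        "--context",
        type=Path,
        required=True,
        help="path to a species-content.json file",
    )
    return parser.parse_args(argv)


def load_context(path):
    """Load the JSON file and check that its top level is an object."""
    with path.open(encoding="utf-8") as handle:
        context = json.load(handle)
    if not isinstance(context, dict):
        raise ValueError(f"{path} must contain a JSON object at the top level")
    return context


def main(argv=None):
    """Entry point: load the context and report which species it describes."""
    args = parse_args(argv)
    context = load_context(args.context)
    # 'species_name' is a hypothetical key, used here only for illustration.
    print(f"Loaded context for: {context.get('species_name', '<unnamed species>')}")
    return context
```

Calling `main(["--context=/tmp/tiny_herb/species-content.json"])` would then load the file and hand the resulting dict to whatever fills in the species templates.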

Another dimension to consider is that this is probably very specific to our portal implementation and workflows...

Not a lot of definite answers, but I hope it makes sense and elicits interesting design thinking :)
