Cultivating an accessible new- and ongoing-contributor experience to Openverse development #2155

sarayourfriend · 2023-05-22T02:13:49Z · Collaborator

Whilst I was on leave last week, I came across neomutt's contributor process page. It reminded me of a conversation I had with @AetherUnbound (and a bit with @dhruvkb) about how to document and cultivate an accessible new and ongoing contributor experience.

Before getting into specific proposed changes, I want to outline the guiding principles I have in mind at the moment. In order of importance, if the goal is to create an accessible contributor experience, then our documentation and processes:

… must go beyond the first contribution. It must include clear pathways for the second, third, etc. contributions.
… should include clear guidelines for how to ask for help, including where to ask and who to ask.
… must include documentation on required skills to attempt a given issue. In particular, this should cover the programming languages and tools involved, but also specify whether and what kinds of tests must be included with the changes.

Based on a combination of the discussion I had with Madison, the neomutt example linked above, our (mostly loosely defined) existing practices, and the Gutenberg project's practices (which are similarly broad, like our existing ones), I propose the following changes and additions to our practices.

Issue labels

Changes to our issue labelling practice will address the first and third points listed above.

Finding issues

We already use "good first issue" fairly consistently. Our documentation also includes a friendly note saying that folks are welcome to complete more than one "good first issue". However, this does not help people move beyond these "good first issues" to see where they can increase the impact of their contributions to the project.

GitHub includes a default "help wanted" label. Following neomutt's example (and a suggestion I made during the conversation with Madison), we should document a gradient of issue accessibility for non-maintainers:

"good first issue" indicates issues that will help contributors familiarise themselves with a specific aspect of the project. For example, a small change in the API that does not require deep Django or Openverse domain knowledge should be labelled "stack: api" and "good first issue". These issues often do not require brand new unit tests but may require updating existing ones; however, this is not a rule. These issues should have zero ambiguity and a clear implementation suggestion in the description. If an issue exclusively adds new unit tests, with no or minimal (a couple of lines) changes to the runtime code, it can still be a "good first issue", but it should include an explicit note saying that it is a good introduction to unit testing specifically. This will be slightly redundant with the "pytest", "jest" or "playwright" technology labels, but being explicit makes it easier to set expectations for contributors who may land on the issue from somewhere other than our documentation.
"help wanted" indicates issues that require more than surface-level knowledge of the Openverse domain-specific concepts or tools relevant to the issue. Adding a new serializer field, for example, would require adding new unit tests to the serializer and to downstream users of the field. This requires novice-level knowledge of Django REST Framework and pytest, and could require anywhere from novice to expert-level Openverse domain knowledge. These may be good first issues for someone contributing to a new part of the stack for the first time (e.g., a catalog contributor making their first API contribution), but almost certainly not for a general first contribution, primarily because they rely on Openverse domain-specific knowledge. If an issue requires more than a broad knowledge of the tools or the Openverse domain, it should not have "help wanted" and should be left unlabelled. These issues may have some ambiguity but should still include a concrete implementation suggestion (at least a starting point) in the issue description.
"staff only" indicates an issue that should not be attempted by non-maintainers, either because of its significant complexity with respect to Openverse domain knowledge or because it is intrinsically tied to infrastructure changes that cannot be made without elevated access.

These labels should be mutually exclusive. An issue with none of these three labels requires intermediate- to expert-level knowledge of our tools or domain.

The above can be condensed into the following table.

| Label | Openverse knowledge | Tool knowledge | Programming language knowledge | Ambiguity |
| --- | --- | --- | --- | --- |
| "good first issue" | None, or a tiny amount that is clearly explained in the issue | None | Novice in the specific language, or proficient in another | Unambiguous |
| "help wanted" | A general understanding is required | A general understanding is required, or the concepts are accessible in the tool's documentation and could be effectively learned and applied in ~15 minutes | Novice to intermediate | Slight |
| (no label) | Intermediate to expert | Intermediate to expert | Intermediate to expert | Slight to abstract |
| "staff only" | Any | Any | Any, but likely advanced to expert | Any, but likely extremely ambiguous |

Contributor documentation should include brief and accessible descriptions of these labels, geared towards helping contributors select an issue. The description in this section is primarily geared towards maintainers, to help us decide which label to add to a given issue.

"Help me pick!"

In addition to generally describing the issue labelling rationale so that contributors understand how to search through the issue list, the documentation should include links to specific queries. For example, the following would be good candidates for a pre-made query linked from the documentation:

Good introductions to automated testing
Good introductions to JavaScript or Python
Good introductions to Airflow
Good introductions to Django (REST Framework)
Advanced automated testing
Advanced JavaScript or Python (separate links)
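As a minimal sketch of how such pre-made links could be generated, the Python below builds GitHub issue-search URLs from label combinations. It assumes the repository is WordPress/openverse (as in the milestone link later in this thread) and that label names are spelled as they are in this discussion; `issue_query_url` and `PREMADE_QUERIES` are hypothetical names, and mapping "Advanced Python" to "help wanted" is an illustrative assumption. Repeating the `label:` qualifier in GitHub's issue search matches issues that carry all of the listed labels.

```python
from urllib.parse import urlencode

# Assumption: the repository from the milestone link in this thread.
REPO_ISSUES_URL = "https://github.com/WordPress/openverse/issues"


def issue_query_url(*labels: str) -> str:
    """Hypothetical helper: build a GitHub issue-search URL for open issues
    that carry ALL of the given labels."""
    terms = ["is:open", "is:issue"]
    # Quote each label so multi-word names like "good first issue" stay intact.
    terms += [f'label:"{label}"' for label in labels]
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return f"{REPO_ISSUES_URL}?{urlencode({'q': ' '.join(terms)})}"


# Illustrative (assumed) mapping from documentation link text to labels.
PREMADE_QUERIES = {
    "Good introductions to Airflow": ("good first issue", "airflow"),
    "Good introductions to Python": ("good first issue", "python"),
    "Advanced Python": ("help wanted", "python"),
}

if __name__ == "__main__":
    for title, labels in PREMADE_QUERIES.items():
        print(f"{title}: {issue_query_url(*labels)}")
```

Generating the links from one mapping keeps the documentation in sync if a label is ever renamed.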

Technology labels

We already have these labels but use them somewhat sporadically. They also slightly overlap with the stack labels. However, for "good first issue" and "help wanted" issues, we should still add them to reduce ambiguity. There are, after all, some changes to the catalog project that may not explicitly require Airflow knowledge (changes to DAG documentation, for example). To put a finer point on it, the catalog, ingestion server, and API may each interact, to varying degrees, with Postgres, Elasticsearch, the ingestion server, or important external APIs. It is essential to mark which technologies an issue requires knowledge of, or serves as a helpful introduction to.

For "good first issue", the technology labels will carry the implication that the issue is an accessible introduction to those technologies. If labelled "python", the issue should not require advanced Python knowledge but does involve making a change to Python code. The same applies to any other technology. Issues primarily geared towards introducing unit testing should be labelled with the specific unit testing library that will be used.

For "help wanted", the technology labels will imply that the issue may provide an opportunity to learn or exercise intermediate knowledge of these technologies.

Unlabelled and staff only issues do not require technology labels as they already carry the assumption of potential ambiguity. If the relevant technologies are known, however, then they can be added at the discretion of the issue author.

Changes to the existing labels

Here is the list of the existing labels:

javascript
python
php
vue
django
airflow
docker
postgres
css
bash
typescript

The JavaScript/TypeScript distinction is finicky. The distinction with Vue is also somewhat finicky, especially when it comes to clarifying that much of our Vue code inherently involves TypeScript. If an issue is labelled Vue, it can also be labelled TypeScript if special TS knowledge is required or emphasised in the changes. The JavaScript label should be reserved for changes to the automation scripts that are written in pure JS and have no relationship to the TypeScript project. In the future we could (and probably should) type-check those scripts using JSDoc, which would eventually let us remove the JavaScript label entirely.

While the php label isn't used much these days, it will start to be used again once we begin work on the various WordPress integrations we've been brainstorming, so we can keep it around.

We should also add the following tech labels:

pytest
jest
playwright
elasticsearch
redis

In the future we may need to add a label for Terraform but at the moment all infrastructure contributions are staff only due to repository restrictions.

All labels should have their descriptions updated to replace "Requires familiarity with" with "Involves". The "Requires familiarity with" wording implies that a "good first issue" cannot be an accessible introduction to the technology in question.

Issue maintenance

I want to avoid adding new responsibilities to the MSR role. However, in addition to checking "awaiting triage", I think it would be helpful to have a once-weekly check of any "good first issue" and "help wanted" issues created in the previous week, to confirm that they are appropriately labelled, sufficiently described, and have a level of ambiguity appropriate to their contributor label. We should also include a call to action during the weekly developer meeting for folks to consider whether any of the issues they've opened recently are good candidates for either label.
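Part of that weekly check could be scripted, leaving humans to judge description quality and ambiguity. The Python sketch below flags label sets that break the mutual-exclusivity rule or lack a technology label on a "good first issue"/"help wanted" issue. It assumes labels are compared as plain strings spelled as in this discussion (the real label names may carry prefixes), and `label_problems` is a hypothetical helper; fetching issues from GitHub is deliberately left out.

```python
# Assumption: label names as written in this discussion, not the repo's exact names.
CONTRIBUTOR_LABELS = {"good first issue", "help wanted", "staff only"}

# Existing technology labels plus the additions proposed above.
TECH_LABELS = {
    "javascript", "python", "php", "vue", "django", "airflow", "docker",
    "postgres", "css", "bash", "typescript",
    "pytest", "jest", "playwright", "elasticsearch", "redis",
}


def label_problems(labels: set[str]) -> list[str]:
    """Hypothetical helper: return human-readable labelling problems for one issue."""
    problems = []
    contributor = labels & CONTRIBUTOR_LABELS
    # The three contributor labels should be mutually exclusive.
    if len(contributor) > 1:
        problems.append(f"contributor labels are mutually exclusive: {sorted(contributor)}")
    # "good first issue" and "help wanted" issues should name their technologies.
    if contributor & {"good first issue", "help wanted"} and not labels & TECH_LABELS:
        problems.append("missing a technology label")
    return problems


if __name__ == "__main__":
    print(label_problems({"good first issue", "python"}))
    print(label_problems({"help wanted"}))
```

Unlabelled and "staff only" issues pass without a technology label, matching the rule that those labels are optional there.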

Getting help

These changes aim to address the second point.

Mentorship

Note
This would be a big change to our processes, but I think it would make certain levels of contribution run more smoothly. At the moment, it is common to request unit tests for a change from new contributors, only for them to share that they're not familiar with writing unit tests. While the labelling changes above would mitigate that (by limiting "good first issue" work to, at most, small changes to existing tests), mentorship would make issues of medium-light complexity much more accessible to less experienced folks who are nevertheless beyond the "good first issue" level.

Issues labelled "help wanted" must also include a link to a new documentation site page with instructions for requesting a mentor to help work through the issue. The goal of this option is to help someone with existing knowledge of a particular technology work through any ambiguity in the issue, with the express intention of increasing their ability to contribute to Openverse generally. Mentoring an issue would include helping the contributor research Openverse domain-specific concepts, giving special and early review of changes, and potentially even pair-programming on slightly more complex issues.

Mentorship can also be available for unlabelled issues, but those issues do not require a link; with few exceptions, contributors working at that level will already be familiar with the mentorship concept. Mentorship should be opt-in and should rotate throughout the team. No one should mentor more than one issue at a time, and the contributor documentation should make clear that mentorship is subject to availability. If an individual requires mentorship for a given issue and none is available, they should not attempt the issue and should choose something else in the meantime.

Links to contact documentation

Issues labelled "good first issue" must include a link to instructions for contacting maintainers for help. This can be the existing "keep in touch" README section, but it should include more specific instructions. For example: "if seeking help to complete an issue, please ping xyz alias on GitHub or ask for help in the Openverse Make Slack". The current text is not sufficiently explicit about the purpose of each communication channel.

Conclusion

These suggestions are meant as a starting point for discussion and to help make our existing processes slightly more concrete and specific. Please chime in if there are additional changes or if the suggestions I've made miss the mark with respect to the guiding principles I've listed above. Likewise, if additional guiding principles are needed or if the ones I've listed are bunk, please say so.

To reemphasise, however, these are meant as a starting point, both for improvements to the contributor documentation (to help find issues) and to the process of labelling issues. As with all our processes, none of these suggestions are meant as hard and fast rules: all should be understood to include inherent flexibility subject to contributor discretion, all should be documented more clearly on our documentation site, and every single detail is open to ongoing iteration.

AetherUnbound · 2023-05-23T23:21:11Z · Collaborator

I hope to look this over and plan on responding by the end of the week.

1 reply

AetherUnbound · May 25, 2023 · Collaborator

This is fantastic! Thanks for taking the time to think about and collate this. In general I'm totally aligned with this approach and the suggestions you've made. A few other thoughts:

> They also slightly overlap with the stack labels. However, for "good first issue" and "help wanted" issues, we should still add them to reduce ambiguity.

+1, requiring them for "help wanted"/"good first issue" issues is a great idea. I don't think they should always be required, for the reasons you mention, but those issues should ideally be specific enough for the labels to make sense.

Some other tech labels that I think should be added too:

GitHub Actions
Just

Many of our good first issues involve those two pieces of our stack.

obulat · 2023-05-24T18:06:36Z · Maintainer

Thank you for starting this discussion, @sarayourfriend! The suggestions are great and thorough!

My first suggestion for a good introduction to the project comes from BeeWare, where I made my first open source contributions. We can write a tutorial describing how to set up and run unit tests with coverage. Then, after contributors see which lines are not covered, they can create their first PR by writing tests for a couple of those lines.
To make sure that PRs are not duplicated and that contributors don't unknowingly pick code that is very difficult to test, we can create a tracking issue where contributors can claim uncovered lines.
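For the tracking issue, a small script could turn a coverage report into a checklist of uncovered line ranges that contributors can claim. This is a sketch assuming coverage.py's JSON report (produced by `coverage json`), whose `files` object maps each source path to data that includes a `missing_lines` list; the function names and the checklist format are hypothetical.

```python
import json
from typing import Iterable


def group_ranges(lines: Iterable[int]) -> list[tuple[int, int]]:
    """Collapse line numbers into inclusive (start, end) runs of consecutive lines."""
    ranges: list[tuple[int, int]] = []
    for n in sorted(lines):
        if ranges and n == ranges[-1][1] + 1:
            # Extend the current run.
            ranges[-1] = (ranges[-1][0], n)
        else:
            # Start a new run.
            ranges.append((n, n))
    return ranges


def claim_checklist(coverage_json: str) -> str:
    """Hypothetical helper: build a Markdown task list, one item per uncovered run."""
    report = json.loads(coverage_json)
    items = []
    for path, data in sorted(report["files"].items()):
        for start, end in group_ranges(data["missing_lines"]):
            span = f"L{start}" if start == end else f"L{start}-L{end}"
            items.append(f"- [ ] `{path}` {span}")
    return "\n".join(items)


if __name__ == "__main__":
    sample = {"files": {"api/views.py": {"missing_lines": [10, 11, 12, 20]}}}
    print(claim_checklist(json.dumps(sample)))
```

Grouping consecutive lines keeps each checklist item to a claimable, self-contained chunk rather than one item per line.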

Some other suggestions:

If we write very detailed descriptions in the issues, we can lower their difficulty and the required level of familiarity with the Openverse codebase.

We should probably create some templates with instructions describing what we expect from a PR for a "help wanted" issue in a specific technology. For example, if the issue is for the front end, we could add: "A successful PR should add unit tests in and Playwright visual regression tests in . If you are not sure how to write them, don't worry! The maintainers are happy to help!"

We could also create views in the Openverse project to make curating the issues easier.


sarayourfriend · 2023-06-06T10:26:45Z · Collaborator Author

Thanks, @AetherUnbound and @obulat for the feedback and suggestions. I want to ping @zackkrida for broad-view input here as well. With respect to the mentoring idea, I think it would be best for us to discuss it in a more concerted way as a team. I'll add it to our internal meeting agenda for next week (skipping this week due to WCEU absences 🙂). It might be something we pursue down the line rather than now, especially as our core group of maintainers is already about to undergo some big changes in availability and organisation.

For everything else, I'll create new issues (tomorrow) and a milestone to track them. I think issues with the following summaries would cover everything aside from the mentoring aspect:

Add the new technology labels (including the additional ones from Madison)
Add new documentation for maintainers outlining guidelines/expectations for "good first issue" and "help wanted" issues
- Include templates as suggested by Olga which include the "how to get help" information
Add new contributor-oriented documentation for how to interpret our labels and find issues to work on depending on their experience and interest
- Include the pre-made query links with different "I want to work on ..." or "I want an opportunity to learn about ..."

Incidentally, I think #2304 handles the issue maintenance aspect of this 🎉

1 reply

sarayourfriend · Jun 7, 2023 · Collaborator Author

Milestone for this is here: https://github.com/WordPress/openverse/milestone/14
