Triaging contributions

Triaging guide
- Basic review questions
- Subjective review questions
Triager responsibilities
- Labels
- Label workflow guide
Triaging schedule

Triaging guide

Before you begin, familiarize yourself with the basic concepts around skills and knowledge, as well as the file formats for compositional skills, grounded vs freeform skills, and knowledge.

InstructLab Community Learning Guide
Taxonomy README

Triaging is the practice of reviewing existing skill and knowledge pull requests (PRs) to make sure they're relevant, actionable, and have all the information needed to be fully evaluated by both the Taxonomy Triage team (Triagers, @taxonomy-triagers) and the Taxonomy Approvers (Approvers, @taxonomy-approvers).

Triagers review open pull requests and use labels to manage their state and any actions needed. Triagers are also encouraged to provide informative and helpful comments either back to the contributor, to other Triagers or to the Approvers. And remember to be nice.

Important

Triagers DO NOT MERGE skills pull requests. This action will be done after final approval by @taxonomy-approvers

Basic review questions

Does the PR have the pull request template information filled out?
- If skill has not been run through lmdk, assign unverified label
Did all the PR checks pass?
Does the skill have 5 or more examples?
- NOTE 2024-03-12: This has been increased from 3 in the most recent guidance from the approvers!
Make sure fields in YAML are correct
- configure tooling eventually: linting, formatting
No PII in content (may eventually be automated)
No toxic or hateful content (HAP - hate abuse and profanity) (may eventually be automated)
Was response clearly generated by LLM (not easy but if very obvious) (may eventually be automated)

Subjective review questions

Is this a skill or knowledge
Do we think that the model could actually be improved by the examples?
- Is this a skill that you can even teach an LLM
Is the skill appropriately placed within the taxonomy? (If outside of skill directory, address the issue)

Note

The skill taxonomy structure is used in several ways:

Selecting the right subset of the taxonomy to use for data generation.
Interpretability by human contributors and maintainers.
As part of the prompt to the model used to generate synthetic samples. Therefore: Make sure the names of directories match the intent of the taxonomy files, perhaps also see if there's a more logical place in the taxonomy structure for a person's contribution to live before signing off.

Potential automation: (at a later date)

HAP filtering
PII filtering
sanity check: is model response similar or complete different from provided by contributor?
sanity check: is this a skill that you can even teach an LLM?
generation check: do the teacher-model-generated instructions actually make sense with the skill being added?

Triager Responsibilities

Labels

There are a few labels that the triager is responsible for when reviewing PRs:

ci - the PR touches our CI system
enhancement New feature or request - the PR has a new feature or request
knowledge (Auto labeled) - the PR is a knowledge contribution
legal-hold We would like this at some point, but legal advice is needed. - the PR is a good suggestion but we need legal signoff, or review
github_actions Pull requests that update GitHub Actions code - the PR touches our GitHub Actions configuration
help wanted Extra attention is needed - extra attention is needed
question Further information is requested - further information is requested
precheck-generate-ready PR is ready for precheck or generate step - The PR has passed all the linting and "code" now is in the model enagegment loop
sdg-unsuccessful PR failed Synthetic Data Generation - PR failed Synthetic Data Generation
stale stale-bot has marked you as stale - the stale-bot has marked you as stale
skill (Auto labeled) - a skills contribution as opposed to documentation contribution or a knowledge contribution
topic-failure a topic that we are not accepting (leave comment on specifics) - a topic that we are not accepting (leave comment on specifics)
https://github.com/instructlab/taxonomy/labels/triage-approved - triage team has signed off
- re-assign to @taxonomy-approvers
- add comment and tag @taxonomy-approvers
https://github.com/instructlab/taxonomy/labels/triage-follow-up - triager needs to follow up after requested changes have been made
triage-needed (Auto labeled) skill is ready to be triaged - skill needs a triager to review it
- triager assigns to themself when you beginning review
triage-requested-changes skill has been reviewed; changes requested from contributor - skill has been reviewed; changes requested from contributor
- triager provides comment in PR asking for additional changes or information
- triager assigns to contributor
triage-rejected PR fails to meet criteria - skill fails criteria
- add informative comment while tagging @taxonomy-approvers
- re-assign to @taxonomy-approvers
triage-uncertain triager is uncertain which can be for a variety of reasons - triager is uncertain which can be for a variety of reasons
- triager stays assigned
- use comment to ask the rest of the triage team for input tagging @taxonomy-triagers
- if still uncertain
  - then re-assign to @taxonomy-approvers
  - triager tags @taxonomy-approvers in informative comment asking for further review from that team

Label Workflow guide

Helpful guidance for different determinations

Reasons for approval

Generation seeds (successfully creates more instructions in a .jsonl file)
Meets all criteria

Reasons for needing further review

Needs more extensive edits
General "I Don't Know"
Safety tasks and skills should always be escalated to @taxonomy-approvers
Super interesting, warrants further study

Reasons for rejection

Submitted knowledge not a skill. For example, troubleshooting on an uncommon IBM Storage Fusion error message.
Obvious LLM answer, blocklist.
- If you're not familiar with what ChatGPT / Bard / etc writing typically looks like, play with it a bit until you can recognize the tone and linguistic patterns.
Couldn’t verify that the model actually lacks the skills — i.e. model can already answer the submitted questions well enough.
Provide examples of model response is too short and neglected reasoning details. For example: A logical question requires multi-step reasoning to reach to the final answer. The submitted model response only gives the final answer.
Uninformative examples. For example, not all examples match the skill requested; Or the user didn’t put three independent question/answer pairs for the skill, but mistakenly submitted three chat turns for the three questions/answer pairs. Or overly repetitive examples which do not help to clear define of the requested skill.
Missing examples: didn’t provide desired model response for the skill.

Note

Skills triagers should try to include as much information as to why the contribution is rejected.

Scrubbing data from issues and pull requests

Title: edit title to remove information
Comment: simply edit or delete a comment; if info is very sensitive and needs to be fully deleted, after editing the comment, use the edit history dropdown menu in the comment to delete previous versions of the comment’s content
Description of issue or pull request cannot be simply deleted, so follow process above to edit and delete history revisions
Code (in pull request files):
- Do NOT close PR or delete source branch yet (important as this would disconnect the PR from the PR source branch and the PR's changed files view remain visible)
- Edit/delete the files on the forked branch (clone the fork, checkout the PR's branch, edit the file(s), git amend last commit or reset HEAD~n to revert last n commits, force push)
- Now close PR, delete source branch
- The original now orphaned commits can still be found, but it takes some effort and the changed files view no longer shows any of the sensitive information
- Edit any comments on the PR with sensitive info and delete the previous versions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

triaging-contributions.md

triaging-contributions.md

Triaging contributions

Triaging guide

Basic review questions

Subjective review questions

Potential automation: (at a later date)

Triager Responsibilities

Labels

Label Workflow guide

Helpful guidance for different determinations

Reasons for approval

Reasons for needing further review

Reasons for rejection

Scrubbing data from issues and pull requests

Triaging schedule

Files

triaging-contributions.md

Latest commit

History

triaging-contributions.md

File metadata and controls

Triaging contributions

Triaging guide

Basic review questions

Subjective review questions

Potential automation: (at a later date)

Triager Responsibilities

Labels

Label Workflow guide

Helpful guidance for different determinations

Reasons for approval

Reasons for needing further review

Reasons for rejection

Scrubbing data from issues and pull requests

Triaging schedule