Adding ground truth labeling module #252

Merged
merged 9 commits into from
Oct 25, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- added GitHub as code repository option along with AWS CodeCommit for sagemaker templates batch_inference, finetune_llm_evaluation, hf_import_models and xgboost_abalone
- added `ray-orchestrator` module
- added GitHub as alternate option for code repository support along with AWS CodeCommit for sagemaker-templates-service-catalog module
- added SageMaker ground truth labeling module

### **Changed**
- updated manifests to idf release 1.12.0
1 change: 1 addition & 0 deletions README.md
@@ -50,6 +50,7 @@ End-to-end example use-cases built using modules in this repository.
| [SageMaker Model Package Promote Pipeline Module](modules/sagemaker/sagemaker-model-package-promote-pipeline/README.md) | Deploy a Pipeline to promote SageMaker Model Packages in a multi-account setup. The pipeline can be triggered through an EventBridge rule in reaction of a SageMaker Model Package Group state event change (Approved/Rejected). Once the pipeline is triggered, it will promote the latest approved model package, if one is found. |
| [SageMaker Model Monitoring Module](modules/sagemaker/sagemaker-model-monitoring/README.md) | Deploy data quality, model quality, model bias, and model explainability monitoring jobs which run against a SageMaker Endpoint. |
| [SageMaker Model CICD Module](modules/sagemaker/sagemaker-model-cicd/README.md) | Creates a comprehensive CICD pipeline using AWS CodePipelines to build and deploy a ML model on SageMaker. |
| [SageMaker Ground Truth Labeling Module](modules/sagemaker/sagemaker-ground-truth-labeling/README.md) | Creates a state machine that labels images and text files uploaded to the upload bucket, using the built-in task types in SageMaker Ground Truth. |

### Mlflow Modules

40 changes: 40 additions & 0 deletions examples/sagemaker-ground-truth-labeling/README.md
@@ -0,0 +1,40 @@
# SageMaker Ground Truth labeling examples

### Description

This folder contains an example for each of the built-in task types supported by the SageMaker Ground Truth module. Each subfolder contains an example manifest and any templates it needs. Upload the templates to an S3 bucket and update the manifest with the correct locations.
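
Updating the manifest can be scripted. The following is a minimal sketch, assuming the manifest still contains the angle-bracket placeholders used in the example manifests (`<region>`, `<account>`, `<bucket_name>`, `<workteam_name>`). The function name `fill_placeholders` and the sample bucket name are hypothetical and not part of the module.

```python
import re

# Placeholders as they appear in the example manifests, e.g. <bucket_name>.
PLACEHOLDER = re.compile(r"<([a-z_]+)>")


def fill_placeholders(manifest_text: str, values: dict) -> str:
    """Replace <name> placeholders with concrete values.

    Raises KeyError if the text contains a placeholder with no supplied value,
    so that a half-filled manifest is never written out silently.
    """
    found = {match.group(1) for match in PLACEHOLDER.finditer(manifest_text)}
    missing = sorted(found - set(values))
    if missing:
        raise KeyError(f"no value supplied for placeholders: {missing}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], manifest_text)


if __name__ == "__main__":
    line = "value: 's3://<bucket_name>/image_bounding_box_labeling_template.html'"
    # "my-example-bucket" is an illustrative value only.
    print(fill_placeholders(line, {"bucket_name": "my-example-bucket"}))
```

Failing on unknown placeholders is a deliberate choice here: a manifest that still contains `<account>` would otherwise only fail later, at deployment time.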

### Additional workers

For tasks without a verification step (all except `image_bounding_box` and `image_semantic_segmentation`), we recommend increasing the number of human reviewers per object to improve accuracy. This only works if your workteam has at least that many reviewers, because the same reviewer cannot review the same item twice. To adjust the number of workers, add the following parameters to your manifest:

```yaml
  - name: labeling-human-task-config
    value:
      NumberOfHumanWorkersPerDataObject: 5
      TaskAvailabilityLifetimeInSeconds: 21600
      TaskTimeLimitInSeconds: 300
```
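
The workteam-size constraint above can be checked before deploying. This is a minimal sketch under stated assumptions: the key `NumberOfHumanWorkersPerDataObject` is taken from the manifest fragment above, while the function `check_human_task_config` and the `workteam_size` argument are hypothetical (you would supply the size of your own workteam).

```python
def check_human_task_config(config: dict, workteam_size: int) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    workers_per_object = config["NumberOfHumanWorkersPerDataObject"]
    # The same reviewer cannot review the same item twice, so the workteam
    # must have at least as many members as reviewers requested per object.
    if workers_per_object > workteam_size:
        problems.append(
            f"NumberOfHumanWorkersPerDataObject is {workers_per_object}, "
            f"but the workteam only has {workteam_size} member(s)"
        )
    return problems


if __name__ == "__main__":
    config = {
        "NumberOfHumanWorkersPerDataObject": 5,
        "TaskAvailabilityLifetimeInSeconds": 21600,
        "TaskTimeLimitInSeconds": 300,
    }
    for problem in check_human_task_config(config, workteam_size=3):
        print(problem)
```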

### Using public workforce

As mentioned in the README, you can use a public workforce for your task (at an additional cost). More information on using a public workforce such as Amazon Mechanical Turk is available [here](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management-public.html). Labeling and verification task prices are specified in USD; see [here](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_PublicWorkforceTaskPrice.html) for allowed values. [This page](https://aws.amazon.com/sagemaker/groundtruth/pricing/) provides suggested pricing based on task type. To use a public workforce, add or adjust the following parameters in your manifest:

```yaml
  - name: labeling-workteam-arn
    value: 'arn:aws:sagemaker:<region>:394669845002:workteam/public-crowd/default'
  - name: labeling-task-price
    value:
      AmountInUsd:
        Dollars: 0
        Cents: 3
        TenthFractionsOfACent: 6
  - name: verification-workteam-arn
    value: 'arn:aws:sagemaker:<region>:394669845002:workteam/public-crowd/default'
  - name: verification-task-price
    value:
      AmountInUsd:
        Dollars: 0
        Cents: 3
        TenthFractionsOfACent: 6
```
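
The three `AmountInUsd` fields combine into a single price: dollars, plus cents, plus tenths of a cent. The sketch below shows the arithmetic for the values in the example above; the helper name `price_to_usd` is hypothetical and not part of the module or the SageMaker API.

```python
from decimal import Decimal


def price_to_usd(amount_in_usd: dict) -> Decimal:
    """Convert Dollars / Cents / TenthFractionsOfACent into a single USD amount."""
    dollars = Decimal(amount_in_usd.get("Dollars", 0))
    cents = Decimal(amount_in_usd.get("Cents", 0))
    tenths_of_a_cent = Decimal(amount_in_usd.get("TenthFractionsOfACent", 0))
    # 1 cent = 1/100 USD, 1 tenth of a cent = 1/1000 USD.
    return dollars + cents / Decimal(100) + tenths_of_a_cent / Decimal(1000)


if __name__ == "__main__":
    price = price_to_usd({"Dollars": 0, "Cents": 3, "TenthFractionsOfACent": 6})
    print(price)  # 0.036
```

With the example values, each labeling or verification task is therefore priced at 0.036 USD per worker. `Decimal` is used rather than `float` so that the result is exact.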
@@ -0,0 +1 @@
{"labels": [{"label": "Plane"}, {"label": "Boat"}]}
@@ -0,0 +1,28 @@
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
<crowd-bounding-box
name="boundingBox"
src="{{ task.input.taskObject | grant_read_access }}"
header="Please draw a box around all planes and boats in the image."
labels="{{ task.input.labels | to_json | escape }}"
>
<full-instructions header="Bounding box instructions">
<ol>
<li><strong>Inspect</strong> the image</li>
<li><strong>Determine</strong> if the specified label is/are visible in the picture.</li>
<li><strong>Outline</strong> each instance of the specified label in the image using the provided “Box” tool.</li>
</ol>
<ul>
<li>Boxes should fit tight around each object</li>
<li>Do not include parts of the object that are overlapped or cannot be seen, even if you think you can interpolate the whole shape.</li>
<li>Avoid including shadows.</li>
<li>If the target is off screen, draw the box up to the edge of the image.</li>
</ul>
</full-instructions>

<short-instructions>
<strong>Outline</strong> each instance of the specified label in the image using the provided “Box” tool.
<!-- You may wish to include examples of correctly and incorrectly labeled images here -->
</short-instructions>
</crowd-bounding-box>
</crowd-form>
@@ -0,0 +1 @@
{"labels":[{"label":"Label(s) correct"},{"label":"Incorrect label - missed object"},{"label":"Incorrect label - bounding box not accurate enough"}]}
@@ -0,0 +1,39 @@
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
<crowd-image-classifier
name="annotatedResult"
src="{{ task.input.taskObject | grant_read_access }}"
header="Review the existing labels on the objects and choose the appropriate option."
categories="{{ task.input.labels | to_json | escape }}"
overlay="{
'boundingBox': {
labels: ['Plane','Boat'],
value: [
{% for box in task.input.manifestLine["label"].annotations %}
{% capture class_id %}{{ box.class_id }}{% endcapture %}
{% assign label = task.input.manifestLine["label-metadata"].class-map[class_id] %}
{
label: {{label | to_json}},
left: {{box.left}},
top: {{box.top}},
width: {{box.width}},
height: {{box.height}},
},
{% endfor %}
]
}
}"
>
<full-instructions header="Label verification - Bounding box instructions">
<ol>
<li><strong>Read</strong> the task carefully and inspect the image.</li>
<li><strong>Read</strong> the options and review the examples provided to understand more about the labels.</li>
<li><strong>Choose</strong> the appropriate label that best suits the image.</li>
</ol>
</full-instructions>
<short-instructions>
<strong>Choose</strong> the appropriate label that best suits the image.
<!-- You may wish to include examples of correctly and incorrectly labeled images here -->
</short-instructions>
</crowd-image-classifier>
</crowd-form>
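
To make the Liquid loop in the `overlay` attribute above easier to follow, here is a rough Python equivalent. It is a sketch, not the module's code: it assumes the labeling step wrote its output under the label attribute name `label` (as the template's `manifestLine["label"]` and `manifestLine["label-metadata"]` lookups imply), and the function name `build_bounding_box_overlay` and the sample values are hypothetical.

```python
def build_bounding_box_overlay(manifest_line: dict, label_attribute: str = "label") -> list:
    """Mirror the template's {% for box in ... %} loop as a list of overlay boxes."""
    annotations = manifest_line[label_attribute]["annotations"]
    class_map = manifest_line[f"{label_attribute}-metadata"]["class-map"]
    boxes = []
    for box in annotations:
        # Like the Liquid {% capture %}, turn the numeric class_id into a string,
        # because the class-map keys are strings.
        class_id = str(box["class_id"])
        boxes.append(
            {
                "label": class_map[class_id],
                "left": box["left"],
                "top": box["top"],
                "width": box["width"],
                "height": box["height"],
            }
        )
    return boxes


if __name__ == "__main__":
    # Illustrative manifest line; the coordinates are made up.
    manifest_line = {
        "source-ref": "s3://<bucket_name>/example.jpg",
        "label": {
            "annotations": [
                {"class_id": 0, "left": 10, "top": 20, "width": 100, "height": 50},
            ]
        },
        "label-metadata": {"class-map": {"0": "Plane"}},
    }
    print(build_bounding_box_overlay(manifest_line))
```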
@@ -0,0 +1,34 @@
name: ground-truth-labeling
path: modules/sagemaker/sagemaker-ground-truth-labeling
targetAccount: primary
parameters:
  - name: job_name
    value: 'plane-and-boat-bounding-box'
  - name: task_type
    value: 'image_bounding_box'

  - name: labeling-workteam-arn
    value: 'arn:aws:sagemaker:<region>:<account>:workteam/private-crowd/<workteam_name>'
  - name: labeling-instructions-template-s3-uri
    value: 's3://<bucket_name>/image_bounding_box_labeling_template.html'
  - name: labeling-categories-s3-uri
    value: 's3://<bucket_name>/image_bounding_box_labeling_categories.json'
  - name: labeling-task-title
    value: 'Labeling - Bounding boxes: Draw bounding boxes around all planes and boats in the image'
  - name: labeling-task-description
    value: 'Draw bounding boxes around all planes and boats in the image'
  - name: labeling-task-keywords
    value: ['image', 'object', 'detection']

  - name: verification-workteam-arn
    value: 'arn:aws:sagemaker:<region>:<account>:workteam/private-crowd/<workteam_name>'
  - name: verification-instructions-template-s3-uri
    value: 's3://<bucket_name>/image_bounding_box_verification_template.liquid'
  - name: verification-categories-s3-uri
    value: 's3://<bucket_name>/image_bounding_box_verification_categories.json'
  - name: verification-task-title
    value: 'Label verification - Bounding boxes: Review the existing labels on the objects and choose the appropriate option.'
  - name: verification-task-description
    value: 'Verify that all of the planes and boats in the image are correctly labeled'
  - name: verification-task-keywords
    value: ['image', 'object', 'detection', 'label verification', 'bounding boxes']
@@ -0,0 +1 @@
{"labels": [{"label": "Plane"}, {"label": "Boat"}]}
@@ -0,0 +1,21 @@
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
<crowd-image-classifier-multi-select
name="crowd-image-classifier-multi-select"
src="{{ task.input.taskObject | grant_read_access }}"
header="Please select the correct categories for this image"
categories="{{ task.input.labels | to_json | escape }}"
exclusion-category="{ text: 'None of the above' }"
>
<full-instructions header="Classification Instructions">
<p>If more than one label applies to the image, select multiple labels.</p>
<p>If no labels apply, select <b>None of the above</b></p>
</full-instructions>

<short-instructions>
<p>Read the task carefully and inspect the image.</p>
<p>Choose the appropriate label(s) that best suit the image.</p>
<!-- You may wish to include examples of correctly and incorrectly labeled images here -->
</short-instructions>
</crowd-image-classifier-multi-select>
</crowd-form>
@@ -0,0 +1,21 @@
name: ground-truth-labeling
path: modules/sagemaker/sagemaker-ground-truth-labeling
targetAccount: primary
parameters:
  - name: job_name
    value: 'vehicle-classification'
  - name: task_type
    value: 'image_multi_label_classification'

  - name: labeling-workteam-arn
    value: 'arn:aws:sagemaker:<region>:<account>:workteam/private-crowd/<workteam_name>'
  - name: labeling-instructions-template-s3-uri
    value: 's3://<bucket_name>/image_multi_label_labeling_template.html'
  - name: labeling-categories-s3-uri
    value: 's3://<bucket_name>/image_multi_label_labeling_categories.json'
  - name: labeling-task-title
    value: 'Labeling - Multi-Classification: Classify all images as containing a plane and/or a boat'
  - name: labeling-task-description
    value: 'Classify all images as containing a plane and/or a boat, selecting all of the appropriate labels'
  - name: labeling-task-keywords
    value: ['image', 'object', 'multi classification']
@@ -0,0 +1 @@
{"labels": [{"label": "Plane"}, {"label": "Boat"}]}
@@ -0,0 +1,22 @@
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
<crowd-semantic-segmentation
name="crowd-semantic-segmentation"
src="{{ task.input.taskObject | grant_read_access }}"
header="Please fill all planes and boats in the image"
labels="{{ task.input.labels | to_json | escape }}"
>
<full-instructions header="Segmentation Instructions">
<ol>
<li><strong>Read</strong> the task carefully and inspect the image.</li>
<li><strong>Read</strong> the options and review the examples provided to understand more about the labels.</li>
<li><strong>Choose</strong> the appropriate label that best suits the image.</li>
</ol>
</full-instructions>

<short-instructions>
Use the tools to label the requested items in the image
<!-- You may wish to include examples of correctly and incorrectly labeled images here -->
</short-instructions>
</crowd-semantic-segmentation>
</crowd-form>
@@ -0,0 +1 @@
{"labels":[{"label":"Label(s) correct"},{"label":"Incorrect label - missed object"},{"label":"Incorrect label - segmentation not accurate enough"}]}
@@ -0,0 +1,44 @@
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
<crowd-image-classifier
name="annotatedResult"
src="{{ task.input.taskObject | grant_read_access }}"
header="Review the existing labels on the objects and choose the appropriate option."
categories="{{ task.input.labels | to_json | escape }}"
overlay="{
'semanticSegmentation': {
'labels': [
{% for key_value in task.input.manifestLine.label-ref-metadata.internal-color-map %}
{% assign item = key_value[1] %}
{% if item['class-name'] != 'BACKGROUND' %}
'{{ item['class-name'] }}',
{% endif %}
{% endfor %}
],
labelMappings: {
{% for key_value in task.input.manifestLine.label-ref-metadata.internal-color-map %}
{% assign item = key_value[1] %}
{% if item['class-name'] != 'BACKGROUND' %}
{{ item['class-name'] }}: {
color: '{{ item['hex-color'] }}'
},
{% endif %}
{% endfor %}
},
src: '{{ task.input.manifestLine['label-ref'] | grant_read_access }}',
}
}"
>
<full-instructions header="Label verification instructions">
<ol>
<li><strong>Read</strong> the task carefully and inspect the image.</li>
<li><strong>Read</strong> the options and review the examples provided to understand more about the labels.</li>
<li><strong>Choose</strong> the appropriate label that best suits the image.</li>
</ol>
</full-instructions>
<short-instructions>
<strong>Choose</strong> the appropriate label that best suits the image.
<!-- You may wish to include examples of correctly and incorrectly labeled images here -->
</short-instructions>
</crowd-image-classifier>
</crowd-form>
@@ -0,0 +1,34 @@
name: ground-truth-labeling
path: modules/sagemaker/sagemaker-ground-truth-labeling
targetAccount: primary
parameters:
  - name: job_name
    value: 'plane-and-boat-sem-seg'
  - name: task_type
    value: 'image_semantic_segmentation'

  - name: labeling-workteam-arn
    value: 'arn:aws:sagemaker:<region>:<account>:workteam/private-crowd/<workteam_name>'
  - name: labeling-instructions-template-s3-uri
    value: 's3://<bucket_name>/image_semantic_segmentation_labeling_template.html'
  - name: labeling-categories-s3-uri
    value: 's3://<bucket_name>/image_semantic_segmentation_labeling_categories.json'
  - name: labeling-task-title
    value: 'Labeling - Semantic segmentation: Fill all planes and boats in the image'
  - name: labeling-task-description
    value: 'Fill all planes and boats in the image using the appropriate label'
  - name: labeling-task-keywords
    value: ['image', 'object', 'detection']

  - name: verification-workteam-arn
    value: 'arn:aws:sagemaker:<region>:<account>:workteam/private-crowd/<workteam_name>'
  - name: verification-instructions-template-s3-uri
    value: 's3://<bucket_name>/image_semantic_segmentation_verification_template.liquid'
  - name: verification-categories-s3-uri
    value: 's3://<bucket_name>/image_semantic_segmentation_verification_categories.json'
  - name: verification-task-title
    value: 'Label verification - Semantic segmentation: Review the existing labels on the objects and choose the appropriate option.'
  - name: verification-task-description
    value: 'Verify that all of the planes and boats in the image are correctly labeled'
  - name: verification-task-keywords
    value: ['image', 'object', 'detection', 'label verification', 'semantic segmentation']
@@ -0,0 +1 @@
{"labels": [{"label": "Plane"}, {"label": "Boat"}, {"label": "Neither"}]}
@@ -0,0 +1,19 @@
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
<crowd-image-classifier
name="crowd-image-classifier"
src="{{ task.input.taskObject | grant_read_access }}"
header="Please select the correct category for this image"
categories="{{ task.input.labels | to_json | escape }}"
>
<full-instructions header="Classification Instructions">
<p>Read the task carefully and inspect the image.</p>
<p>Choose the appropriate label that best suits the image.</p>
</full-instructions>

<short-instructions>
Choose the appropriate label that best suits the image.
<!-- You may wish to include examples of correctly and incorrectly labeled images here -->
</short-instructions>
</crowd-image-classifier>
</crowd-form>