Label Studio Setup & Usage

This is a guide to setting up and using Label Studio for the purpose of annotating data for the entity extraction model.

The finding fossils team setup a privately hosted version of LabelStudio using HuggingFace spaces. The main steps were:

Table of Contents

Label Studio Setup & Usage
- Label Studio Setup
- Label Studio Usage

Label Studio Setup

Create Azure Blob Storage

Create an Azure blob storage container to house both the data to be labelled as well as the output labelled files.
1. Use this guide: Azure Quickstart Upload/Download Blobs
Create a raw folder in the blob storage by specifying the folder name in the Upload to folder field under the advanced tab, and upload a blank TXT file to it (as a placeholder).
Repeat step 3 for labelled folder

To allow LabelStudio to access the objects stored in the bucket, update the Resource Sharing (CORS) settings of the blob storage following the LabelStudio Prerequisites section here.

Setup PostgreSQL Database

Create a Azure Database for PostgreSQL - Flexible Server to take advantage of 750 hours of free compute time, which should be able to run the database free each month.
MAKE SURE YOU REMEMBER THE ADMIN ACCOUNT NAME AND PASSWORD
Create a database once the postgres instance is up and running. e.g. labelstudiodev.
Install the Postgres Azure extensions required by LabelStudio.
Follow section How to use PostgreSQL extensions section from the Azure docs here
- pg_trgm - postgres trigram text extension
- btree_gin - support for indexing common datatypes in GIN

In Label Studio, follow the instructions under section Enable Configuration Persistance and set the required environment variables under settings in label studio.

DJANGO_DB: Set this to default.
POSTGRE_NAME: Set this to the name of the Postgres database.
POSTGRE_USER: Set this to the Postgres username.
POSTGRE_PASSWORD: Set this to the password for your Postgres user.
POSTGRE_HOST: Set this to the host that your Postgres database is running on.
- On Azure this is under Overview and is called Server name, e.g. findingfossilslabelstudio.postgres.database.azure.com
POSTGRE_PORT: Set this to the port that your Pogtgres database is running on. Default is 5432 (recommended).
STORAGE_PERSISTENCE: Set this to 1 to remove the warning about ephemeral storage.

Setup Label Studio External Storage

Inside the Label Studio instance

Storage title: Chosen name to be displayed in LS
Container name: the Azure blob container name
- finding-fossils-labelling-dev
Container prefix: folder path within the contatiner for which to load the data from
- labelling-raw
File filter Regex: pattern matching, .\* for all files
Accountname: name of storage account
- findingfossilsdev
Account key: from Azure storage page --> Access keys

Downloading Labelled Files for Training

To download all currently labelled files follow the following steps:

Navigate to the Azure blob storage account and locate the labelled folder setup above.
All files in labelled are numbered according to the LabelStudio task ID and are JSON files even though they don't have the extension.
Expand the window and scroll all the way to the bottom of all files to ensure they're all in view.
At the top select the radio button that selects all the files and in the menu bar at the top right select the download button, this will begin downloading all files one by one (yes, it's not perfect).
From your downloads folder select all the files and move into a folder data/entity-extraction/raw/<DATE>_label-export/
Now the NER model training can have this folder entered as the raw label input path.

Label Studio Usage

Account creation

Create a Hugging Face hub account: https://huggingface.co/join
Send your profile name to Ty Andrews to be added to the Finding Fossils organization on Hugging Face Email: ty.elgin.andrews@gmail.com, or create a new organization for a different project to work collaboratively with teammates.
Once in the organization, navigate to the organization page from your profile.
In the organization page, click the space LabelStudio.
Create a LabelStudio account and record your password in your password manager.

Navigation

Open the Green project named like Finding Fossils Labelling - Production or create a new one.
Navigate to the settings menu of the project. Here, several options are available to tweak the settings to be compatible for your task,

Review or create labelling instructions.
The instructions look like this:
Labeling configuration: After syncing the buckets, the final step is to define the different categories of entities that the named entity recognition model will be trained to predict. A configuration file is used to define the classes and to initialize the UI components to aid a user label entities. A sample config file has the following tags:

<View>
  <View style="display:flex;align-items:start;gap:8px;flex-direction:row-reverse">
    <Text name="text" valueType="text" value="$text" granularity="word"/>
    <Labels name="label" toName="text" showInline="false">
      <Label value="SITE" background="#336cf0"/>
      <Label value="GEOG" background="#D4380D"/>
      <Label value="AGE" background="#f0c528"/>
      <Label value="ALTI" background="#86d425"/>
      <Label value="TAXA" background="#925ff2"/>
      <Label value="EMAIL" background="#ff941a"/>
    <Label value="REGION" background="#ff9ee5"/></Labels>
  </View>
</View>

For more information about config files to setup a custom LabelStudio NER labeling task, refer the this documentation.

For general information, visit LabelStudios templates page..

Labeling

Select the task with global_index of 0, the global index indicates this is the start of the article and start labelling each task by moving onto the next global_index number.

Ensure pre-labelled entities are correct and/or fix: we have tried to auto-tag entities to make this faster but it’s not perfect and this is what we’re improving, so this commonly misses entities or gets them partially right.
Label any missed entities: these can be things with typos, words being smushed together, etc.

Using the labelling interface:
Correct a pre-labelled entity:

If it’s completely wrong, delete it
- Select the label
- On the right bar click the delete button
To fix a partial match
- Delete the entity using above
- Select the correct label and click/drag the correct span of text

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LabelStudio_README.md

LabelStudio_README.md

Label Studio Setup & Usage

Label Studio Setup

Create Azure Blob Storage

Setup PostgreSQL Database

Setup Label Studio External Storage

Downloading Labelled Files for Training

Label Studio Usage

Account creation

Navigation

Labeling

Files

LabelStudio_README.md

Latest commit

History

LabelStudio_README.md

File metadata and controls

Label Studio Setup & Usage

Label Studio Setup

Create Azure Blob Storage

Setup PostgreSQL Database

Setup Label Studio External Storage

Downloading Labelled Files for Training

Label Studio Usage

Account creation

Navigation

Labeling