Disclaimer: This is not an official Google product.
Organizing the issues in your GitHub repositories can be a different kind of animal; that's why you need LabelCat.
- Install Node.js >= 8.x
- git clone https://github.com/GoogleCloudPlatform/LabelCat
- cd LabelCat
- npm install
- npm link .
- cp defaultsettings.json settings.json (settings.json is where you customize the app)
- Modify settings.json as necessary.
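
As a quick pre-install check, the Node.js requirement above could be verified with a small shell function. This is a minimal sketch rather than part of LabelCat: the function name node_version_ok is hypothetical, and it assumes the version string has the form printed by node --version (for example v8.11.3).

```shell
# Hypothetical helper, not shipped with LabelCat.
# Succeeds when a Node.js version string (e.g. "v8.11.3") has a major version >= 8.
node_version_ok() {
  version="${1#v}"         # strip a leading "v", if present
  major="${version%%.*}"   # keep the text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric major version
  esac
  [ "$major" -ge 8 ]
}

# Possible use, assuming node is on the PATH:
#   node_version_ok "$(node --version)" || echo "Node.js 8 or newer is required" >&2
```

The function only inspects the string it is given, so it can be exercised without Node.js installed.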
- In the GCP Console, go to the Manage Resources page and select or create a project.
- Update settings.json to include your GCP Project ID and Compute Region.
- Make sure that billing is enabled for your project.
- Enable the AutoML Natural Language APIs.
- Follow the instructions to create a service account and download a key file.
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the Service Account key file that you downloaded when you created the Service Account. For example:

      export GOOGLE_APPLICATION_CREDENTIALS=key-file

- Give your new Service Account the AutoML Editor IAM role with the following commands:

      gcloud auth login
      gcloud config set project YOUR_PROJECT_ID
      gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
        --member=serviceAccount:SERVICE_ACCOUNT_NAME \
        --role='roles/automl.editor'

  replacing YOUR_PROJECT_ID with your GCP project ID and SERVICE_ACCOUNT_NAME with the name of your new Service Account, for example [email protected].
- Allow the AutoML Natural Language service accounts to access your Google Cloud project resources:

      gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
        --member="serviceAccount:[email protected]" \
        --role="roles/storage.admin"

  replacing YOUR_PROJECT_ID with your GCP project ID.
- Create a Google Cloud Storage bucket to store the documents that you will use to train your custom model. The bucket name must be in the format YOUR_PROJECT_ID-lcm. Run the following command to create a bucket in the us-central1 region:

      gsutil mb -p YOUR_PROJECT_ID -c regional -l us-central1 gs://YOUR_PROJECT_ID-lcm/

  replacing YOUR_PROJECT_ID with your GCP project ID.
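
Before moving on to the labelcat commands, the two local prerequisites from the setup steps above (the credentials variable and the bucket naming rule) could be checked with a couple of helper functions. This is a sketch under stated assumptions: the names bucket_name and credentials_ok are hypothetical, and the credentials check only confirms that the variable points at a readable file, not that the key itself is valid.

```shell
# Hypothetical helpers, not shipped with LabelCat.

# Print the bucket name the setup steps expect for a given project ID
# (the YOUR_PROJECT_ID-lcm format).
bucket_name() {
  printf '%s-lcm\n' "$1"
}

# Succeed only if GOOGLE_APPLICATION_CREDENTIALS is set and names a readable file.
credentials_ok() {
  [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] &&
    [ -r "$GOOGLE_APPLICATION_CREDENTIALS" ]
}

# Possible use:
#   credentials_ok || echo "GOOGLE_APPLICATION_CREDENTIALS is not set to a readable file" >&2
#   echo "Expected bucket: gs://$(bucket_name YOUR_PROJECT_ID)/"
```

Neither function contacts Google Cloud, so they are safe to run before any gcloud or gsutil command.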
Run labelcat --help for usage information.

    labelcat <command>

    Commands:
      labelcat retrieveIssues <repoDataFilePath>        Retrieves issues from a .txt file of gitHub
      <issuesDataFilePath> <label>                      repositories. Options: -a
      labelcat createDataset <datasetName>              Create a new Google AutoML NL dataset with the specified
                                                        name. Options: -m
      labelcat importData <issuesDataPath> <datasetId>  Import the GitHub issues data from Google Cloud Storage
                                                        bucket into the Google AutoML NL dataset by specifying
                                                        the file's path in the bucket and the dataset ID.

    Options:
      --version  Show version number  [boolean]
      --help     Show help            [boolean]

    Examples:
      labelcat retrieveIssues repoData.txt issuesData.csv 'type:  Retrieves issues with matching labels from list of repos
      bug' -a 'bug' -a 'bugger'                                   in repoData.txt and saves the resulting information to
                                                                  issuesData.csv.
      labelcat createDataset Data                                 Creates a new multilabel dataset with the specified
                                                                  name.
      labelcat importData gs://myproject/mytraindata.csv          Imports the GitHub issues data into the dataset by
      1248102981                                                  specifying the file of issues data and the dataset ID.

- Create a repos.txt file with a single-column list of GitHub repositories from which to collect issue data. The format should be :owner/:repository, with one repository per line. Example:

      GoogleCloudPlatform/google-cloud-node
      GoogleCloudPlatform/google-cloud-java
      GoogleCloudPlatform/google-cloud-python

- From the project folder, run the retrieveIssues command with the path of the repository list file, the path to a location to save the resulting .csv file, the desired issue label, and optional alternative issue labels. Example:

      labelcat retrieveIssues repos.txt issues.csv "type: bug" -a "bug"

- Upload the resulting .csv file to your Google Cloud Storage bucket. Example:

      gsutil cp issues.csv gs://YOUR_PROJECT_ID-lcm/

  replacing YOUR_PROJECT_ID with your GCP project ID.
- From the project folder, run the createDataset command with the name of the dataset to create. Example:

      labelcat createDataset TestData

- Run listDatasets to return a list of all AutoML NL datasets for the Google Cloud Platform project. Example:

      labelcat listDatasets

- Run importData using the Dataset ID returned by the createDataset command and the URI of the issue data .csv file. Example:

      labelcat importData gs://YOUR_PROJECT_ID-lcm/issues.csv 123ABCD456789

  replacing YOUR_PROJECT_ID with your GCP project ID.
- Run createModel using the Dataset ID and the name of the model to be created. Example:

      labelcat createModel 123ABCD456789 firstModel

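
Since the workflow above starts from the repository list, it may be worth confirming that repos.txt is well-formed before running retrieveIssues. The following is a minimal sketch, not part of LabelCat: the function name repos_file_ok is hypothetical, and the character set it accepts for owner and repository names (letters, digits, underscore, dot, hyphen) is an assumption rather than GitHub's exact naming rule.

```shell
# Hypothetical helper, not shipped with LabelCat.
# Succeeds only if the file is readable and every non-blank line looks like
# owner/repository. The allowed characters are an approximation.
repos_file_ok() {
  [ -r "$1" ] || return 1
  # NF is zero for blank lines, so those are skipped; any other line that
  # does not match the pattern marks the file as invalid.
  awk 'NF && $0 !~ /^[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+$/ { bad = 1 }
       END { exit bad }' "$1"
}

# Possible use:
#   repos_file_ok repos.txt || echo "repos.txt has lines that are not owner/repository" >&2
```

The check is deliberately strict: lines with embedded or trailing whitespace are rejected, which may flag files saved with Windows line endings.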
See CONTRIBUTING.
Copyright 2018, Google, Inc.
Licensed under the Apache License, Version 2.0
See LICENSE.