This repository has been archived by the owner on Sep 26, 2019. It is now read-only.

Project Subjects

Brian Foo edited this page Sep 28, 2015 · 5 revisions

A subject is a uniquely identifiable media document, usually containing an image, that users are asked to work on. There are up to three types of subjects in a typical project:

  • primary - The initial subject that users are asked to work on; by default belongs to the Mark workflow.
  • secondary - Immediate descendant of the primary subject; by default belongs to the Transcribe workflow.
  • tertiary - Immediate descendant of the secondary subject (grandchild of the primary subject); by default belongs to the Verify workflow.

Depending on the workflow, subjects may generate child subjects with their own workflow. A typical example is as follows: Users may annotate a primary subject by placing marks on regions that require subsequent transcription. Once a mark is submitted, it becomes a child subject with its own workflow. In this case, the secondary workflow could be a series of transcription tasks. In theory, the child subjects can continue to generate additional child subjects ad infinitum, but keep in mind that the initial goal can take exponentially longer to complete.
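
The parent/child relationship described above can be sketched as a small data structure. This is only an illustration, not the project's actual data model: the Subject class, its field names, and the spawn_child method are hypothetical, and the workflow names are taken from the defaults listed earlier.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Default workflow for each subject type, as listed on this page.
DEFAULT_WORKFLOWS = {"primary": "Mark", "secondary": "Transcribe", "tertiary": "Verify"}
# Hypothetical lookup used to derive a child's type from its parent's type.
NEXT_TYPE = {"primary": "secondary", "secondary": "tertiary"}


@dataclass
class Subject:
    """Hypothetical in-memory model of a subject and its descendants."""
    subject_id: str
    subject_type: str = "primary"
    parent: Optional["Subject"] = field(default=None, repr=False, compare=False)
    children: List["Subject"] = field(default_factory=list)

    @property
    def workflow(self) -> str:
        return DEFAULT_WORKFLOWS[self.subject_type]

    def spawn_child(self, subject_id: str) -> "Subject":
        """Create a child subject one level below this one (e.g. a submitted mark)."""
        if self.subject_type not in NEXT_TYPE:
            raise ValueError(f"{self.subject_type} subjects have no child type in this sketch")
        child = Subject(subject_id, NEXT_TYPE[self.subject_type], parent=self)
        self.children.append(child)
        return child


page = Subject("page-1")                 # primary subject, Mark workflow
mark = page.spawn_child("mark-1")        # secondary subject, Transcribe workflow
entry = mark.spawn_child("entry-1")      # tertiary subject, Verify workflow
print(page.workflow, mark.workflow, entry.workflow)
```

The sketch stops at three levels to match the typical project; a deeper hierarchy would need more entries in both lookup tables.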

Setting up your primary subjects

Primary subjects (usually images) are defined in .csv files in the subjects directory of your project folder:

my_project/
+-- subjects/
|   +-- group_dogs.csv
|   +-- group_cats.csv
|   +-- group_mice.csv
|   +-- groups.csv

SubjectSets

Each .csv file represents a set of subjects (we call them SubjectSets). A Subject always belongs to a single SubjectSet. A multi-page document is represented by multiple subjects grouped in a single SubjectSet. The SubjectSet file group_cats.csv may look like this:

order,file_path,thumbnail,width,height
1,http://placekitten.com/800/600/a,http://placekitten.com/200/150/a,800,600
2,http://placekitten.com/800/600/b,http://placekitten.com/200/150/b,800,600
3,http://placekitten.com/800/600/c,http://placekitten.com/200/150/c,800,600
...

The columns are as follows:

  • order - Integer - the position of the subject in the sequence
  • file_path - String - the URL to the full media file
  • thumbnail - String - the URL to the thumbnail image of the media file
  • width - Integer - width in pixels of media file
  • height - Integer - height in pixels of media file
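
Before importing a SubjectSet, it can help to check that the file has the expected columns and that the integer columns parse. The following is a minimal sketch using only the Python standard library; the read_subject_set function is hypothetical and is not part of the project's tooling.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = ["order", "file_path", "thumbnail", "width", "height"]
INTEGER_COLUMNS = ("order", "width", "height")


def read_subject_set(csv_file) -> List[Dict[str, object]]:
    """Parse SubjectSet rows from an open text file and sort them by `order`."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    subjects = []
    for row in reader:
        subject = dict(row)
        for column in INTEGER_COLUMNS:
            subject[column] = int(row[column])  # raises ValueError on non-integer values
        subjects.append(subject)
    return sorted(subjects, key=lambda s: s["order"])


# In-memory sample with the rows deliberately out of order.
sample = io.StringIO(
    "order,file_path,thumbnail,width,height\n"
    "2,http://placekitten.com/800/600/b,http://placekitten.com/200/150/b,800,600\n"
    "1,http://placekitten.com/800/600/a,http://placekitten.com/200/150/a,800,600\n"
)
subjects = read_subject_set(sample)
print([s["order"] for s in subjects])  # prints [1, 2]
```

To check a real file, pass an open file handle instead of the in-memory sample, e.g. `open("subjects/group_cats.csv", newline="")`.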

Groups

Groups organize SubjectSets into related collections. The groups.csv file defines your groups like so:

key,name,description,cover_image_url,external_url,meta_data_1,retire_count
dogs,The Dog Group,A Collection of Dog Images,http://placepuppy.com/400/300,http://en.wikipedia.org/wiki/Dog,Doggy Data,2
cats,The Cat Group,A Collection of Cat Images,http://placekitten.com/400/300,http://en.wikipedia.org/wiki/Cat,Kitty Data,3
mice,The Mouse Group,A Collection of Mouse Images,http://placemouse.com/400/300,http://en.wikipedia.org/wiki/Mouse,Mickey Data,2
...

The columns are as follows:

  • key - String - The key that appears in the filename of the group's SubjectSet, e.g. the key dogs refers to the SubjectSet file group_dogs.csv
  • name - String - The name of the group to be displayed
  • description - Text - The description of the group to be displayed
  • cover_image_url - String - The URL of the image that represents the group
  • external_url - String - The external URL to link to for more information about this group
  • meta_data_%integer% - String - Arbitrary known data about the group that may be useful to display; %integer% is a number, e.g. meta_data_1
  • retire_count - Integer - Number indicating threshold for retiring the subject operated on in a given workflow. Read more about this on the Project Workflows page.
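
The link between a group key and its SubjectSet file can be checked with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the group_<key>.csv naming follows the example in the key column description, and gathering the meta_data_ columns into one dictionary is a choice made for this illustration only.

```python
import csv
import io
from pathlib import Path


def subject_set_filename(key: str) -> str:
    """Map a group key to its SubjectSet filename, e.g. dogs -> group_dogs.csv."""
    return f"group_{key}.csv"


def read_groups(csv_file):
    """Parse groups.csv rows from an open text file."""
    groups = []
    for row in csv.DictReader(csv_file):
        group = dict(row)
        group["retire_count"] = int(row["retire_count"])
        # Gather every meta_data_<n> column into one dict (a choice made for this sketch).
        group["meta_data"] = {k: v for k, v in row.items() if k and k.startswith("meta_data_")}
        group["subject_set_file"] = subject_set_filename(row["key"])
        groups.append(group)
    return groups


def missing_subject_sets(groups, subjects_dir):
    """Return the SubjectSet filenames that do not exist in subjects_dir."""
    return [
        g["subject_set_file"]
        for g in groups
        if not (Path(subjects_dir) / g["subject_set_file"]).is_file()
    ]


sample = io.StringIO(
    "key,name,description,cover_image_url,external_url,meta_data_1,retire_count\n"
    "dogs,The Dog Group,A Collection of Dog Images,http://placepuppy.com/400/300,"
    "http://en.wikipedia.org/wiki/Dog,Doggy Data,2\n"
    "cats,The Cat Group,A Collection of Cat Images,http://placekitten.com/400/300,"
    "http://en.wikipedia.org/wiki/Cat,Kitty Data,3\n"
)
groups = read_groups(sample)
print([g["subject_set_file"] for g in groups])  # prints ['group_dogs.csv', 'group_cats.csv']
```

Calling `missing_subject_sets(groups, "my_project/subjects")` would then list any group whose SubjectSet file has not been created yet.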

Next step: Create Help Content