Anatomy of a Reduction

Will Granger edited this page Sep 8, 2020 · 1 revision

ALICE lets researchers view classifications made by volunteers. Before they reach ALICE, classifications pass through Caesar, where they are processed into extracts and compiled into reductions that contain aggregate data, and those reductions are sent to TOVE. TOVE takes the reductions from Caesar and turns them into Transcriptions.

TOVE populates each transcription.text key with the corresponding reduction from Caesar, and several other keys are created by either TOVE or ALICE and added to that object. This page is an overview of all keys in the transcription.text object.

The Transcription.text object

As mentioned in the anatomy of a transcription, the Transcription.text object contains keys that must start with frame_. Each of those keys then contains an array of Reductions. Reductions are defined in the code with the following keys:

clusters_x: types.array(types.number)
Taken from Caesar, this is a list with the start and end x positions of the line of text.

clusters_y: types.array(types.number)
Taken from Caesar, this is a list with the start and end y positions of the line of text.

clusters_text: types.array(types.array(types.string))
Taken from Caesar, this is a list-of-lists containing the aggregated text. There is one item for each word in the line of text. The inner lists contain one item showing what each user said for that word. An empty string indicates a user did not include that word.
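To illustrate the list-of-lists layout, here is a minimal sketch that rebuilds what a single transcriber typed by reading the same position out of every word list. The sample data and the helper name textForTranscriber are hypothetical and are not part of ALICE or Caesar.

```javascript
// Hypothetical clusters_text for one line: three words, three transcribers.
// Each inner list holds one entry per transcriber, in the same order.
const sampleClustersText = [
  ['Dear', 'Dear', 'Dear'],
  ['Sir', 'Sir', 'Sirs'],
  ['', 'John', 'John'], // the first transcriber did not include this word
];

// Hypothetical helper: collect one transcriber's words, skipping empty strings.
function textForTranscriber(clustersText, transcriberIndex) {
  return clustersText
    .map((wordList) => wordList[transcriberIndex])
    .filter((word) => word !== undefined && word !== '')
    .join(' ');
}

console.log(textForTranscriber(sampleClustersText, 0)); // "Dear Sir"
console.log(textForTranscriber(sampleClustersText, 2)); // "Dear Sirs John"
```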

consensus_text: types.optional(types.string, '')
Taken from Caesar, this is a string of the most commonly occurring (consensus) words provided by transcribers. These words form the algorithm-determined consensus transcription for that line.

consensus_score: types.optional(types.number, 0)
Taken from Caesar. If everyone agrees on the line of text, this will be equal to the number of views (see number_views).
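The relationship between clusters_text, consensus_text, and consensus_score can be sketched as follows. This is an illustration only, not Caesar's actual reducer: it assumes the consensus word is the most frequent non-empty entry at each position and that the score is the average number of transcribers who agree per word, which is consistent with the score equalling the number of views when everyone agrees. The function names are hypothetical.

```javascript
// Hypothetical helper: most frequent non-empty word in one word list.
function mostCommonWord(wordList) {
  const counts = new Map();
  for (const word of wordList) {
    if (word === '') continue; // an empty string means the word was omitted
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  let best = { word: '', count: 0 };
  for (const [word, count] of counts) {
    if (count > best.count) best = { word, count };
  }
  return best;
}

// Assumed scoring: average agreement count across the words of the line.
function sketchConsensus(clustersText) {
  const picks = clustersText.map(mostCommonWord);
  const text = picks.map((p) => p.word).filter((w) => w !== '').join(' ');
  const total = picks.reduce((sum, p) => sum + p.count, 0);
  const score = picks.length === 0 ? 0 : total / picks.length;
  return { text, score };
}

// Three transcribers in full agreement: the score equals the number of views.
const agreed = sketchConsensus([['Dear', 'Dear', 'Dear'], ['Sir', 'Sir', 'Sir']]);
console.log(agreed); // { text: 'Dear Sir', score: 3 }

// One transcriber disagrees on the second word, so the score drops.
const mixed = sketchConsensus([['Dear', 'Dear', 'Dear'], ['Sir', 'Sir', 'Sirs']]);
console.log(mixed); // { text: 'Dear Sir', score: 2.5 }
```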

edited_consensus_text: types.optional(types.string, '')
Added by ALICE, this key is created or changed if a reviewer does not accept the algorithm-created consensus_text. If a reviewer writes their own transcription or chooses a transcription from a different transcriber, that selection will appear here. If a reviewer reverts to the consensus_text, this key returns to an empty string.
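Because an empty edited_consensus_text means "no override", code that reads a line can fall back to consensus_text. A minimal sketch, assuming plain objects shaped like the keys described here; the helper name displayedText is hypothetical and is not an ALICE function.

```javascript
// Hypothetical helper: prefer the reviewer's edit, else the algorithm's consensus.
function displayedText(line) {
  return line.edited_consensus_text || line.consensus_text || '';
}

const untouchedLine = { consensus_text: 'Dear Sir', edited_consensus_text: '' };
const editedLine = { consensus_text: 'Dear Sir', edited_consensus_text: 'Dear Sirs' };

console.log(displayedText(untouchedLine)); // "Dear Sir"
console.log(displayedText(editedLine)); // "Dear Sirs"
```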

extract_index: types.array(types.integer)
Taken from Caesar, the extract_index is used to look up what extract created this line. The user_id can be used to pick out an extract, the frame# to pick out the frame, and this index to pull out the text, points.x, and points.y (these three are all lists within the extract). Indexing is 0 based.
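The lookup described above might look like the sketch below. The extract shape used here (a user_id plus a frame0 key holding text, points.x, and points.y lists) is an assumption made for illustration, and lookupSource is a hypothetical helper; check real Caesar extracts before relying on these names.

```javascript
// Hypothetical extracts, one per transcriber.
const sampleExtracts = [
  {
    user_id: 11,
    frame0: {
      text: ['First line', 'Dear Sir'],
      points: { x: [[10, 200], [12, 180]], y: [[50, 52], [90, 91]] },
    },
  },
  {
    user_id: 22,
    frame0: {
      text: ['Dear Sirs'],
      points: { x: [[11, 182]], y: [[89, 92]] },
    },
  },
];

// Hypothetical reduction line: user_ids and extract_index share the same order.
const sourceLine = { user_ids: [11, 22], extract_index: [1, 0] };

// Find the extract for one transcriber, then use the 0-based index to pull
// that transcriber's text and points for this line.
function lookupSource(line, position, frameKey, extracts) {
  const extract = extracts.find((e) => e.user_id === line.user_ids[position]);
  if (!extract || !extract[frameKey]) return null;
  const frame = extract[frameKey];
  const index = line.extract_index[position];
  return { text: frame.text[index], x: frame.points.x[index], y: frame.points.y[index] };
}

console.log(lookupSource(sourceLine, 0, 'frame0', sampleExtracts).text); // "Dear Sir"
console.log(lookupSource(sourceLine, 1, 'frame0', sampleExtracts).text); // "Dear Sirs"
```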

flagged: types.optional(types.boolean, false)
Added by ALICE, a reviewer can flag an individual transcription line. Setting this to true for a single line also marks the entire Transcription object as flagged, which happens in the checkForFlagUpdates function.
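The roll-up from line flags to a transcription-level flag can be sketched as below. This is not the actual checkForFlagUpdates implementation; it only shows the idea of scanning every frame_ key for a flagged line. The helper name anyLineFlagged and the sample data are hypothetical.

```javascript
// Hypothetical Transcription.text object: each frame_ key holds a list of lines.
const sampleText = {
  frame_0: [{ consensus_text: 'Dear Sir', flagged: false }],
  frame_1: [
    { consensus_text: 'Yours truly', flagged: true },
    { consensus_text: 'J. Smith', flagged: false },
  ],
};

// True when at least one line in any frame_ key is flagged.
// The startsWith filter is defensive, in case other keys are present.
function anyLineFlagged(text) {
  return Object.keys(text)
    .filter((key) => key.startsWith('frame_'))
    .some((key) => text[key].some((line) => line.flagged === true));
}

console.log(anyLineFlagged(sampleText)); // true
```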

gold_standard: types.array(types.boolean)
Taken from Caesar, a list of booleans, one per transcriber, indicating whether the transcription was made in “gold standard mode” (NOTE: this is the mode that project owners can turn on when they classify on their own project).

gutter_label: types.optional(types.integer, 0)
Taken from Caesar, an integer indicating which column block the line of text belongs to. All text with the same gutter_label was written in the same column of text.
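As a small illustration, lines in a frame can be grouped into columns by their gutter_label. The helper name groupByGutter and the sample lines are hypothetical.

```javascript
// Hypothetical helper: bucket lines by gutter_label (defaulting to 0, as in the model).
function groupByGutter(lines) {
  const columns = new Map();
  for (const line of lines) {
    const label = line.gutter_label ?? 0;
    if (!columns.has(label)) columns.set(label, []);
    columns.get(label).push(line);
  }
  return columns;
}

const sampleColumnLines = [
  { consensus_text: 'Left column, line 1', gutter_label: 0 },
  { consensus_text: 'Right column, line 1', gutter_label: 1 },
  { consensus_text: 'Left column, line 2', gutter_label: 0 },
];

const columns = groupByGutter(sampleColumnLines);
console.log(columns.get(0).length); // 2
console.log(columns.get(1).length); // 1
```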

line_editor: types.optional(types.string, '')
Added by ALICE. When a reviewer edits a line, either by accepting a different transcriber's text or by editing the line text directly, the edited_consensus_text key changes and line_editor is set to the username of the reviewer making the change. See the setConsensusText function for more.

line_slope: types.maybeNull(types.number)
Taken from Caesar, the average slope for this block of text (see slope_label).

low_consensus: types.optional(types.boolean, false)
Taken from Caesar, this is true when the consensus_score is less than the low_consensus_threshold set on the reducer parameters.
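The comparison is a strict less-than, so a score exactly at the threshold is not low consensus. A minimal sketch; the helper name and the threshold value of 3 are made up for illustration and do not reflect any project's reducer parameters.

```javascript
// Hypothetical helper mirroring the rule described above.
function isLowConsensus(consensusScore, lowConsensusThreshold) {
  return consensusScore < lowConsensusThreshold;
}

const exampleThreshold = 3; // assumed value for illustration only
console.log(isLowConsensus(2.5, exampleThreshold)); // true
console.log(isLowConsensus(3, exampleThreshold)); // false
```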

number_views: types.optional(types.integer, 0)
Taken from Caesar, the number of users who transcribed this line of text.

original_transcriber: types.optional(types.string, '')
Added by ALICE, if a reviewer selects another user's transcription over the algorithm-created consensus_text, this key becomes the name of the user who originally transcribed that line.

seen: types.optional(types.boolean, false)
Added by ALICE, this becomes true if a reviewer adds the green dot flag to a line.

slope_label: types.optional(types.integer, 0)
Taken from Caesar, an integer indicating which slope block this line of text belongs to. All text with the same slope_label was written at the same angle (i.e. horizontal vs vertical) and will have the same line_slope value.

user_ids: types.array(types.maybeNull(types.union(types.integer, types.string)))
Taken from Caesar, the user IDs for each user who transcribed the line of text.
- This is in the same order as each word list and gold_standard
- The Panoptes API can be used to turn these user IDs into usernames
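Since user_ids, gold_standard, and each word list share the same order, they can be combined into one record per transcriber. A minimal sketch with hypothetical data; perTranscriber is not an ALICE function, and the null entry only reflects that the type allows null.

```javascript
// Hypothetical reduction line with three transcribers.
const exampleLine = {
  user_ids: [11, null, 'abc'],
  gold_standard: [false, false, true],
  clusters_text: [
    ['Dear', 'Dear', 'Dear'],
    ['Sir', '', 'Sirs'],
  ],
};

// Hypothetical helper: pair each user ID with its gold_standard flag and words
// by reading the same position from every parallel list.
function perTranscriber(line) {
  return line.user_ids.map((userId, i) => ({
    userId,
    goldStandard: line.gold_standard[i],
    words: line.clusters_text.map((wordList) => wordList[i]),
  }));
}

const transcribers = perTranscriber(exampleLine);
console.log(transcribers[2]); // { userId: 'abc', goldStandard: true, words: [ 'Dear', 'Sirs' ] }
```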