
How are Extracts and Reductions Merged?

Will Granger edited this page Sep 8, 2020 · 2 revisions

I'd say two parts of the code are the most complex. The first is rearranging pages and accounting for the new (and often divided) index and slope values. The second is how extracts and reductions are merged together. There's a great Google doc outlining all values contained in a reduction, and I've also outlined in a wiki article what an extract returned from Caesar should look like (see rawExtracts in the link).

Looking at these two definitions (an extract and a reduction), it's easy to see that the two objects share similar values and how those values correspond to one another. Comparing the two, extracts need to retrieve some information from reductions, and the combined results are then recorded as parsedExtracts in the TranscriptionsStore.

The parseTranscriptionData function does the heavy lifting of reconciling these two objects. Here is a brief outline of the constant and helper functions it uses:

SLOPE_BUFFER
Used to round slope values to a human-readable number. Caesar is unlikely to return a slope of exactly 0 degrees; it's more likely a user will place a line at a slight slant (perhaps 2.139578 degrees or so). Because of this, a buffer is used to round these slopes to the nearest tenth.
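A minimal sketch of this rounding, assuming "nearest tenth" means one decimal place of a slope measured in degrees. The value `10` for `SLOPE_BUFFER` and the helper name `roundSlope` are hypothetical illustrations, not taken from the actual code.

```js
// Hypothetical value: multiply by 10, round, then divide by 10
// to round a slope to the nearest tenth of a degree.
const SLOPE_BUFFER = 10

// Hypothetical helper name.
function roundSlope(slope) {
  return Math.round(slope * SLOPE_BUFFER) / SLOPE_BUFFER
}

console.log(roundSlope(2.139578)) // 2.1
console.log(roundSlope(0.04)) // 0
```

Rounding before comparing lets slopes that differ only past the first decimal place be treated as the same value, although two values on either side of a rounding boundary (such as 0.04 and 0.06) will still round differently.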


function constructCoordinates(line)
Looking at the extracts definition from the link above, extracts have separate clusters_x and clusters_y arrays. It's easier to work with the two together, so this function maps them into a single object with x1, x2, y1, and y2 values. It only takes into consideration the first and last values in clusters_x and clusters_y.
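A minimal sketch of that mapping, assuming clusters_x and clusters_y are plain arrays of numbers on the line object. Returning `null` when either array is missing or empty is an assumption, not documented behavior.

```js
// Build { x1, x2, y1, y2 } from the first and last points of a line.
function constructCoordinates(line) {
  const xs = (line && line.clusters_x) || []
  const ys = (line && line.clusters_y) || []
  // Assumption: no usable coordinates means no result.
  if (xs.length === 0 || ys.length === 0) return null
  return {
    x1: xs[0],
    x2: xs[xs.length - 1],
    y1: ys[0],
    y2: ys[ys.length - 1]
  }
}

console.log(constructCoordinates({ clusters_x: [10, 50, 90], clusters_y: [5, 6, 7] }))
// { x1: 10, x2: 90, y1: 5, y2: 7 }
```

Any intermediate points (50 and 6 above) are ignored, matching the description that only the first and last values are used.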


function constructText(line)
This function constructs sentences from the reduction's clusters_text variable. The clusters_text array contains one sub-array per word in the line, holding the different transcriptions of that word. For example, given the following clusters_text value:

```js
[
  ['the', 'The', 'a'],
  ['test', 'Test', 'testing'],
  ['sentence', 'Sentence', 'sentence.']
]
```

constructText would return the following:

```js
[
  ['the test sentence'],
  ['The Test Sentence'],
  ['a testing sentence.']
]
```
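A minimal sketch that reproduces the example above by transposing clusters_text: the i-th sentence joins the i-th entry of every word array. Skipping missing or empty entries is an assumption; the source doesn't say how gaps are handled.

```js
// Transpose per-word arrays into per-transcription sentences.
function constructText(line) {
  const words = (line && line.clusters_text) || []
  if (words.length === 0) return []
  // Number of sentences = the most transcriptions any word position has.
  const count = Math.max(...words.map(w => w.length))
  const sentences = []
  for (let i = 0; i < count; i++) {
    const sentence = words
      .map(w => w[i])
      // Assumption: drop undefined or empty words instead of leaving gaps.
      .filter(word => word)
      .join(' ')
    sentences.push([sentence])
  }
  return sentences
}

const result = constructText({
  clusters_text: [
    ['the', 'The', 'a'],
    ['test', 'Test', 'testing'],
    ['sentence', 'Sentence', 'sentence.']
  ]
})
console.log(result)
```

Each sentence is wrapped in its own single-element array to match the output shape shown above.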

```js
function mapExtractsToReductions(
  extractsByUser = {},
  reduction = {},
  reductionIndex = 0,
  reductionText = [],
  subjectIndex = 0,
  extractUsers = {}
)
```

This function does a lot of the heavy lifting, and comments in the code explain what is happening. It also performs a few existence checks to guard against invalid parameters being passed in.

This function takes the cleaned-up extracts (grouped by user) and the reduction, and produces the data that becomes the parsedExtracts consumed by the TranscriptionsStore. The isMatch && similarSlope check is somewhat arbitrary: it determines whether an extract has the same index values and slope as the reduction. However, it would be very rare for an extract and a reduction to match on those values and still contain differing data.
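A rough sketch of what an isMatch && similarSlope check could look like. It assumes each extract line records the subject (frame) index and line index it belongs to, plus its slope in degrees; the field names `frame`, `lineIndex`, and `slope`, the helper names, and the rounding factor of 10 are all hypothetical stand-ins rather than the real data shapes.

```js
// Round both slopes to the nearest tenth and compare them.
// buffer = 10 is a hypothetical stand-in for SLOPE_BUFFER.
function slopesAreSimilar(a, b, buffer = 10) {
  return Math.round(a * buffer) / buffer === Math.round(b * buffer) / buffer
}

// Hypothetical shapes: extractLine = { frame, lineIndex, slope }, reduction = { slope }.
function extractMatchesReduction(extractLine, reduction, reductionIndex = 0, subjectIndex = 0) {
  // Existence check: treat missing input as "no match" rather than throwing.
  if (!extractLine || !reduction) return false
  // Same subject frame and same line position as the reduction.
  const isMatch =
    extractLine.frame === subjectIndex && extractLine.lineIndex === reductionIndex
  const similarSlope = slopesAreSimilar(extractLine.slope, reduction.slope)
  return isMatch && similarSlope
}

const reduction = { slope: 0.01 }
console.log(extractMatchesReduction({ frame: 0, lineIndex: 2, slope: 0.04 }, reduction, 2, 0)) // true
console.log(extractMatchesReduction({ frame: 0, lineIndex: 2, slope: 5.2 }, reduction, 2, 0)) // false
```

Requiring both conditions means an extract line that sits at the right index but was drawn at a clearly different angle is not merged with the reduction.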