How are Extracts and Reductions Merged?
I'd say two parts of the code are the most complex. The first is understanding how pages are rearranged and how the new (and often divided) index and slope values are accounted for. The second is understanding how extracts and reductions are merged together. There's a great Google doc outlining all values contained in a reduction. I've also outlined in a wiki article what an extract returned from Caesar should look like (see rawExtracts in the link).
Comparing these two definitions (an extract and a reduction), it's clear that the two objects share similar values that line up with one another. Extracts need to retrieve some information from reductions, and the result is recorded as parsedExtracts in the TranscriptionsStore.
The parseTranscriptionData function handles the heavy lifting of reconciling these two objects. I'll briefly outline the constant and the functions involved:
SLOPE_BUFFER
Used to round slope values to a human-readable number. It's unlikely Caesar will return a slope value of exactly 0 degrees. It's more likely a user will place a line at a slight slant (perhaps 2.139578 degrees or so). Because of this, a buffer is used to round these slopes to the nearest tenth.
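As a rough illustration of the rounding described above, the sketch below rounds a slope to the nearest tenth. The helper name roundSlope and the constant ROUNDING_FACTOR are assumptions made for this example. They are not taken from the code, and the actual value and use of SLOPE_BUFFER may differ.

```javascript
// Hypothetical helper, not the project's actual implementation.
// Assumption: "nearest tenth" means keeping one decimal place.
const ROUNDING_FACTOR = 10;

function roundSlope(slope) {
  return Math.round(slope * ROUNDING_FACTOR) / ROUNDING_FACTOR;
}

console.log(roundSlope(2.139578)); // 2.1
console.log(roundSlope(0.04));     // 0
```

With this kind of rounding, two lines drawn at 2.139578 and 2.08 degrees both come out as 2.1 and can be treated as having the same slope.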
function constructCoordinates(line)
Looking at the extracts definition from the link above, extracts have a separate clusters_x (array) and a clusters_y (array). It's easier to map these values together to return an object with an x1, x2, y1, and y2 value. This function only takes into consideration the first and last values in the clusters_x and clusters_y variables.
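A minimal sketch of that mapping, assuming the input object carries clusters_x and clusters_y arrays as described. The name constructCoordinatesSketch and the fallback to empty arrays are assumptions for illustration; the real function may handle missing data differently.

```javascript
// Sketch only: follows the description above, not the actual source.
function constructCoordinatesSketch(line) {
  const xs = (line && line.clusters_x) || [];
  const ys = (line && line.clusters_y) || [];
  // Only the first and last values of each array are used;
  // any points in between are ignored.
  return {
    x1: xs[0],
    x2: xs[xs.length - 1],
    y1: ys[0],
    y2: ys[ys.length - 1]
  };
}

const line = { clusters_x: [10, 55, 120], clusters_y: [40, 41, 43] };
console.log(constructCoordinatesSketch(line));
// { x1: 10, x2: 120, y1: 40, y2: 43 }
```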
function constructText(line)
This function constructs sentences from the reduction's clusters_text variable. The clusters_text array contains one sub-array per word in a reduction, and each sub-array holds the variants of that word. For example, given the following clusters_text value:
[
['the', 'The', 'a'],
['test', 'Test', 'testing'],
['sentence', 'Sentence', 'sentence.']
]
constructText would return the following:
[
['the test sentence'],
['The Test Sentence'],
['a testing sentence.']
]
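One way to get from the first array to the second is to regroup the words by their position within each sub-array and then join each group with spaces. The sketch below does that under a few assumptions: the name constructTextSketch is hypothetical, empty words are skipped, and each sentence is wrapped in its own array only to mirror the output shown above.

```javascript
// Sketch only: reproduces the example above, not the actual source.
function constructTextSketch(line) {
  const words = (line && line.clusters_text) || [];
  const sentences = [];
  words.forEach((variants) => {
    variants.forEach((word, i) => {
      // The i-th variant of every word belongs to the i-th sentence.
      if (!sentences[i]) sentences[i] = [];
      if (word) sentences[i].push(word);
    });
  });
  return sentences.map((parts) => [parts.join(' ')]);
}

const reduction = {
  clusters_text: [
    ['the', 'The', 'a'],
    ['test', 'Test', 'testing'],
    ['sentence', 'Sentence', 'sentence.']
  ]
};
console.log(constructTextSketch(reduction));
// [ [ 'the test sentence' ], [ 'The Test Sentence' ], [ 'a testing sentence.' ] ]
```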
function mapExtractsToReductions(
extractsByUser = {},
reduction = {},
reductionIndex = 0,
reductionText = [],
subjectIndex = 0,
extractUsers = {}
)
This function does a lot of the heavy lifting, and comments in the code explain what is happening. There are also a few existence checks to make sure invalid params cannot be sent into the function.
This function consumes the cleaned-up extracts by user and the reduction, and outputs the data that becomes the parsedExtracts consumed by the TranscriptionsStore. There is a somewhat arbitrary check at isMatch && similarSlope, which determines whether an extract matches the index values and slope of the reduction. However, it would be very rare for an extract and a reduction to match on those values and still have differing data.
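The check can be pictured roughly as follows. This is a sketch under stated assumptions: the field names (index, slope) and the helper names are hypothetical stand-ins, and "similar slope" is assumed to mean "equal after rounding to the nearest tenth". The real code may compare different fields or use a different tolerance.

```javascript
// Hypothetical sketch of the matching idea, not the actual source.
const roundToTenth = (value) => Math.round(value * 10) / 10;

function matchesReduction(extract, reduction) {
  // Assumed field names: `index` and `slope` on both objects.
  const isMatch = extract.index === reduction.index;
  const similarSlope =
    roundToTenth(extract.slope) === roundToTenth(reduction.slope);
  return isMatch && similarSlope;
}

const reduction = { index: 0, slope: 2.1 };
console.log(matchesReduction({ index: 0, slope: 2.139578 }, reduction)); // true
console.log(matchesReduction({ index: 1, slope: 2.139578 }, reduction)); // false
console.log(matchesReduction({ index: 0, slope: 7.5 }, reduction));      // false
```

Requiring both conditions is what keeps an extract from being attached to a reduction that merely shares its index or merely runs at a similar angle.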