Post Alignment Change Proposal

Christopher Klapp edited this page Oct 16, 2017 · 9 revisions

Alignment Data Changes

Previous Projects

  • Projects import as regular USFM, whether through tS or tC.
  • Projects store targetLanguage verses as strings.
  • Can be aligned and store alignment data separately.
  • Alignment data is to be exportable as USFM 3.

Aligned Projects

  • Projects import as USFM 3 with embedded alignment data in the words object.
  • Parsed alignment data is persisted via alignment reducer.
  • Parsed verse data could be persisted as a string to maintain backward compatibility.
  • Using tools to edit the verse can modify the string but would invalidate some alignments.
  • Alignment data is to be exportable as USFM 3.

Invalidated Alignments

  • Alignment data matches the BHP version used in the app.
  • Uses of the alignment data such as scripture pane and alignment tool need to gracefully handle changed BHP data.
  • Editing verses in tools like Autographa and translationWords invalidates alignment data.
  • App and Tools need to gracefully handle invalidated alignments due to verse edits.
  • Export needs to be able to validate alignments.

USFM.js

  • Current parsing doesn't maintain punctuation, only word objects.

Non-Word Markers

  • Bidirectional support for mixing word markers and non-word markers.
    • Needs to maintain data like footnote markers and footnote text on import/export.
    • Needs to maintain markers like quotes that are inline and have verse text in them.
    • Preserve punctuation through support of mixed word markers and text
    • Support for all USFM markers

Tests

  • Add as few test use cases as possible to have good coverage of issues.

Inline Markers such as Footnotes

  • Inline Markers need to be maintained in the target language verse text to ensure they are not lost.
  • Inline Markers need to be filtered dynamically when handled in Alignment tool.
    • Footnote text should be filtered out, other marker data should be left in but not the marker itself.
    • This allows the alignment to function on the verse text but bypasses the non-verse text.
    • There may be other markers to be treated as footnotes, but not sure which ones are used.
    • Handle footnotes in a way that can be extended in case others arise.
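
The dynamic filtering described above could be sketched roughly as follows. This is only an illustration: the `NON_VERSE_KEYS` list and the `filterForAlignment` name are hypothetical, and the object shapes are assumed to match the word/text/footnote objects shown under Preserving Punctuation. The stored verse is left untouched so the footnote is not lost.

```javascript
// Hypothetical list of object keys that hold non-verse text.
// Only "footnote" is named in this proposal; extend the list if others arise.
const NON_VERSE_KEYS = ['footnote'];

// Return a filtered copy for the Alignment tool; the input array is not modified.
function filterForAlignment(verseObjects) {
  return verseObjects.filter(
    (obj) => !NON_VERSE_KEYS.some((key) => key in obj)
  );
}

const verseObjects = [
  { word: 'Jesus' },
  { text: ', ' },
  { word: 'wept' },
  { text: '.' },
  { footnote: 'Here is a foot note.' },
];

const alignable = filterForAlignment(verseObjects);
console.log(alignable.length); // 4, the footnote object is dropped
console.log(verseObjects.length); // 5, the stored verse still has the footnote
```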

tC_Resources Importing

  • Create a script or function that parses data from USFM 3 to the desired bible resource format used in tC (Chapter, Verse, word Objects format)

Resource Bibles

  • Gateway Language Resource Bibles should now have alignment data, though not all of them will.
  • Importing Resource Bibles needs to maintain alignment data.
  • Store as an array of word objects, as is done for Primary Languages such as Greek/Hebrew.
    • See USFM.js changes

Scripture Pane

  • Primary Bible to be the BHP, the Primary Language.
  • Highlighting phrases from the tool needs to be based on the Primary Language.
  • Gracefully handle showing or not showing word details (lexicon information) for non-Greek Bible resources.
  • Add support for non-word marker objects that come from non-word markers.

Proposed workflow:

  • Render the ULB and BHP based on array of objects
  • Pass the highlighted/quote BHP word object (word, strongs, occurrence, occurrences) to both BHP and ULB.
  • Highlight the word object(s) in the BHP that matches the BHP quote.
  • Highlight the word object(s) in the ULB/others that are aligned to BHP quote.
  • Languages without alignment data or no aligned matches found will not highlight any words.
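
A rough sketch of the matching step in the workflow above. The function names are hypothetical, and the object shapes are assumptions: BHP words are taken to carry `word`, `occurrence` and `occurrences` (the `strongs` field is omitted here for brevity), and ULB words are taken to carry the `bhp-phrase` / `bhp-occurrence` keys from the import example on this page.

```javascript
// True when a BHP word object is the quoted word itself.
function isQuotedBhpWord(bhpWord, quote) {
  return bhpWord.word === quote.word &&
    bhpWord.occurrence === quote.occurrence;
}

// True when a Gateway Language word object is aligned to the quoted BHP word.
function isAlignedToQuote(glWord, quote) {
  return glWord['bhp-phrase'] === quote.word &&
    glWord['bhp-occurrence'] === `${quote.occurrence}/${quote.occurrences}`;
}

const quote = { word: 'υἱοῦ', occurrence: 2, occurrences: 2 };

const bhpVerse = [
  { word: 'υἱοῦ', occurrence: 1, occurrences: 2 },
  { word: 'δαυεὶδ', occurrence: 1, occurrences: 1 },
  { word: 'υἱοῦ', occurrence: 2, occurrences: 2 },
];

const ulbVerse = [
  { word: 'son of', 'bhp-phrase': 'υἱοῦ', 'bhp-occurrence': '1/2' },
  { word: 'David', 'bhp-phrase': 'δαυεὶδ', 'bhp-occurrence': '1/1' },
  { word: 'son of', 'bhp-phrase': 'υἱοῦ', 'bhp-occurrence': '2/2' },
  { word: 'Abraham' }, // no alignment data, so it is never highlighted
];

const bhpHighlights = bhpVerse.map((w) => isQuotedBhpWord(w, quote));
const ulbHighlights = ulbVerse.map((w) => isAlignedToQuote(w, quote));
console.log(bhpHighlights); // [ false, false, true ]
console.log(ulbHighlights); // [ false, false, true, false ]
```

A Bible without alignment data simply yields all `false`, which matches the last bullet above.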

Verse Check

  • Maintaining target language as strings ensures backward compatibility.
  • Verse edits may invalidate alignments but may not be a concern here.
  • Footnotes and inline markers may need to be filtered on view but not on edit to ensure they are not lost.
  • Selections made can populate the word alignment reducer.

Word Alignment Tool

  • Validate alignments as changes may have invalidated some.
    • The verse may have been edited since last aligned.
    • Primary Language version may have been updated.
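
One possible shape for the verse-edit half of this validation, as a sketch only. It assumes an alignment looks like `{ topWords, bottomWords }` with `occurrence` / `occurrences` on each bottom word, as in the Data Pivot example on this page; `countOccurrences` and `isAlignmentValid` are hypothetical names. A similar check against the BHP words would cover an updated Primary Language version.

```javascript
// Count how often `word` appears as a whole token in the verse string.
function countOccurrences(verseText, word) {
  const tokens = verseText.split(/[^\p{L}\p{N}']+/u).filter(Boolean);
  return tokens.filter((token) => token === word).length;
}

// An alignment is treated as valid only if every target word still occurs
// in the verse as many times as it did when the alignment was made.
function isAlignmentValid(alignment, verseText) {
  return alignment.bottomWords.every(
    ({ word, occurrence, occurrences }) =>
      countOccurrences(verseText, word) === occurrences &&
      occurrence <= occurrences
  );
}

const sonOfAlignment = {
  topWords: [{ word: 'υἱοῦ', occurrence: 1, occurrences: 2 }],
  bottomWords: [
    { word: 'son', occurrence: 1, occurrences: 2 },
    { word: 'of', occurrence: 3, occurrences: 4 },
  ],
};

const originalVerse =
  'The book of the genealogy of Jesus Christ, son of David, son of Abraham.';
const editedVerse =
  'The book of the genealogy of Jesus Christ, descendant of David, son of Abraham.';

console.log(isAlignmentValid(sonOfAlignment, originalVerse)); // true
console.log(isAlignmentValid(sonOfAlignment, editedVerse)); // false
```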

TranslationWords Tool

  • The tool needs an overhaul to base each check on the Strongs number associated with it.
  • Quote needs to be updated to be the Primary Language word, found via Strongs number, instead of the ULB word.
  • Check Card needs to display the Primary Language word and Strongs number along with the GL.
  • ContextIds in GroupData need to be based on the Strongs number and the Greek word in Quote.
  • Need to verify how to handle multiple occurrences of a Strongs number in the Greek.
  • False positives may occur wherever the Strongs number appears.
  • Scripture Pane Greek highlighting to be done via the Strongs number in the check, which should be present in the contextId, potentially with the Greek word as the quote as well.
  • Check Info Card needs Greek info passed into it, and how to display it still needs to be designed.
  • Show Greek lexical information in the Check Info card or anywhere else?

Other Changes

  • Remove tons of defunct code no longer in use.
    • Folders: filters/js/scripts/translation_words/utils
    • Files: loader.js/USFMParse.js
  • Fix linting errors

Tools GroupData Menu

  • The previous tool checking menu is showing English words from the article.
  • Is this still the expected behavior? It seems as though it should be.
  • Support for showing translated article titles in the menu, when source tool articles are translated.

Core Helpers

  • May need core helpers for common functionality.
  • Pivot a verse that is an array of word objects to a string and alignment data.
  • Pivot a verse that is a string and alignment data to an array of word objects.
  • Finding Gateway Language words to be highlighted with the provided Primary Language word.
  • Convert verses that are array of word objects with punctuation
  • CSV Export - to prepare alignment data for CSV export
  • Project Details Helper - Tool Details - Calculate Progress to handle alignment data for word alignment tool (maybe relocate?).
  • Project Validation - validate alignments?
  • Selection Helpers - occurrence/occurrences in a string... if used on ULB that is going to change.
    • May not be needed since word objects include occurrence/occurrences.

Core Actions

  • CSV Export - actions to export alignment data if needed
  • GroupDataActions - explore changes to completed verse alignment not from empty word bank
  • Import Local - handling aligned USFM 3 files and pivoting into verse strings and alignment data
    • word markers with alignment data populates alignments
    • text/word markers without alignment data populate the word bank.
  • Merge Conflict Actions - support aligned USFM 3 files
  • Resource Loading - always load/generate alignment reducer for uses with selections
  • Project Details - Do we want to show if alignment data is available?
  • Project Validation Actions - check/validate alignment data
  • Project Selection Actions - load/generate alignment data
  • Resources Actions - always load/generate alignment data for other uses.
  • Selections Actions - leverage selections as alignments and alignments as selections
  • Target Language Actions - update USFM import/export to handle alignment and USFM 3.
  • USFM Export Actions - support alignment data and USFM 3, also support USFM 2 w/o alignment.
  • Word Alignment Actions - Generate selections when alignments are created?
  • May need new actions for alignment prediction.
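
For the Import Local item above, the split between alignments and the word bank might look something like this. It is a sketch under assumptions: `partitionImportedWords` is a hypothetical name, aligned words are recognised by the presence of a `bhp-phrase` key, and objects without a `word` key (punctuation, markers) are skipped. How bare text should be tokenised into word bank entries is left open here.

```javascript
// Hypothetical helper: split imported verse objects into alignments and word bank.
function partitionImportedWords(verseObjects) {
  const alignments = [];
  const wordBank = [];
  for (const obj of verseObjects) {
    if (!('word' in obj)) continue; // punctuation and markers are skipped in this sketch
    if (obj['bhp-phrase']) {
      alignments.push(obj); // arrived with alignment data
    } else {
      wordBank.push(obj); // still waiting to be aligned
    }
  }
  return { alignments, wordBank };
}

const imported = [
  { word: 'Jesus', 'bhp-phrase': 'ἰησοῦ', 'bhp-occurrence': '1/1', occurrence: '1/1' },
  { text: ' ' },
  { word: 'Christ', occurrence: '1/1' }, // no alignment data on this word
  { text: ',' },
];

const { alignments, wordBank } = partitionImportedWords(imported);
console.log(alignments.length, wordBank.length); // 1 1
```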

Other Changes

  • Rename BodyUIActions to something more specific.
  • LoaderActions - see if sendProgressForKey is really being used. Do we want to finish this?
  • ModalActions - see if it is needed and remove, tools has its own system.
  • OnlineModeAction - see if it is needed and remove
  • Recent Projects Actions vs My Projects, do we need "recent" projects or just My Projects?
  • Remove all code and components related to old modal.
  • SideBar Actions - Update naming conventions for consistency to GroupMenu and fix it.

Core Reducers

  • No changes needed from what we can tell.

Preserving Punctuation

  • There are quite a few places in the code base that join the BHP words into a string.
  • Create a helper for this so the logic lives in a single place, and extend it to handle things other than words/punctuation.
  • resourcesReducer.bibles.targetLanguage[1][1] #=> "Jesus, wept."
[
  { word: 'Jesus', attributes: '...' },
  { text: ', ' },
  { word: 'wept', attributes: '...' },
  { text: '.' },
  { footnote: 'Here is a foot note that needs to be preserved but not words in the verse.' }
]
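
A minimal sketch of such a helper, assuming the object shapes shown above. `verseObjectsToString` is a hypothetical name, and the sketch assumes all spacing and punctuation are carried by `text` objects, so adjacent `word` objects with nothing between them would run together.

```javascript
// Join word and text objects into the verse string; anything else
// (footnotes, other markers) contributes no verse text.
function verseObjectsToString(verseObjects) {
  return verseObjects
    .map((obj) => {
      if ('word' in obj) return obj.word;
      if ('text' in obj) return obj.text;
      return '';
    })
    .join('');
}

const jesusWept = [
  { word: 'Jesus', attributes: '...' },
  { text: ', ' },
  { word: 'wept', attributes: '...' },
  { text: '.' },
  { footnote: 'Here is a foot note that needs to be preserved but not words in the verse.' },
];

console.log(verseObjectsToString(jesusWept)); // Jesus, wept.
```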

Alignment Prediction

  • A new tool may need to be created to keep the code maintainable.
  • May need a new reducer to handle alignment prediction look-ups efficiently.
  • Alignment prediction can be included in Verse Check tool to predict selections.

Exporting USFM 3

  • Should be able to export a USFM 3 project with aligned data information.
  • Create a helper to find word alignments for USFM3 export based on the word order of target language verse.
  • Pivot the data to match that of what USFM 3 library would provide when parsing USFM 3.
    • Example output:
\id MAT
\c 1
\v 1
\w The book of|x-bhp-phrase="βίβλος" x-bhp-occurrence="1/1" x-occurrence="1/1" \w*
\w the genealogy of|x-bhp-phrase="γενέσεως" x-bhp-occurrence="1/1" x-occurrence="1/1" \w*
\w Jesus|x-bhp-phrase="ἰησοῦ" x-bhp-occurrence="1/1" x-occurrence="1/1" \w*
\w Christ|x-bhp-phrase="χριστοῦ" x-bhp-occurrence="1/1" x-occurrence="1/1" \w*,
\w son of|x-bhp-phrase="υἱοῦ" x-bhp-occurrence="1/2" x-occurrence="1/2" \w*
\w David|x-bhp-phrase="δαυεὶδ" x-bhp-occurrence="1/1" x-occurrence="1/1" \w*,
\w son of|x-bhp-phrase="υἱοῦ" x-bhp-occurrence="2/2" x-occurrence="2/2" \w*
\w Abraham|x-bhp-phrase="ἀβραάμ" x-bhp-occurrence="1/1" x-occurrence="1/1" \w*.
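
Producing one such `\w` line from a word object could be sketched as below. `toUsfmWord` is a hypothetical helper, and the input keys are assumed to match the word objects shown under Importing USFM 3. Punctuation between markers is not handled in this sketch.

```javascript
// Build a single USFM 3 word marker with the x- alignment attributes.
function toUsfmWord(obj) {
  return `\\w ${obj.word}|x-bhp-phrase="${obj['bhp-phrase']}" ` +
    `x-bhp-occurrence="${obj['bhp-occurrence']}" ` +
    `x-occurrence="${obj.occurrence}" \\w*`;
}

const jesus = {
  word: 'Jesus',
  'bhp-phrase': 'ἰησοῦ',
  'bhp-occurrence': '1/1',
  occurrence: '1/1',
};

const usfmLine = toUsfmWord(jesus);
console.log(usfmLine);
```

Mapping `toUsfmWord` over a verse's word objects in target language word order would give the body of the verse.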

Importing USFM 3

  • Should be able to import a USFM 3 project with aligned data information, see above.
  • This project should be able to generate a target bible as strings and alignment data matching app expectations.
    • The data coming from USFM.js should look something like:
{
  "1": {
    "1": [
      {
        "word": "the book of",
        "bhp-phrase": "βίβλος",
        "bhp-occurrence": "1/1",
        "occurrence": "1/1"
      },
      {
        "word": "the genealogy of",
        "bhp-phrase": "γενέσεως",
        "bhp-occurrence": "1/1",
        "occurrence": "1/1"
      },
      {
        "word": "jesus",
        "bhp-phrase": "ἰησοῦ",
        "bhp-occurrence": "1/1",
        "occurrence": "1/1"
      },
      {
        "word": "christ",
        "bhp-phrase": "χριστοῦ",
        "bhp-occurrence": "1/1",
        "occurrence": "1/1"
      },
      {
        "word": "son of",
        "bhp-phrase": "υἱοῦ",
        "bhp-occurrence": "1/2",
        "occurrence": "1/2"
      },
      {
        "word": "david",
        "bhp-phrase": "δαυεὶδ",
        "bhp-occurrence": "1/1",
        "occurrence": "1/1"
      },
      {
        "word": "son of",
        "bhp-phrase": "υἱοῦ",
        "bhp-occurrence": "2/2",
        "occurrence": "2/2"
      },
      {
        "word": "abraham",
        "bhp-phrase": "ἀβραάμ",
        "bhp-occurrence": "1/1",
        "occurrence": "1/1"
      }
    ]
  }
}
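
To make the mapping from `\w` markers to these objects concrete, here is a small regex-based sketch. It is not the USFM.js API; `parseWordMarkers` is a hypothetical function that only understands the `\w word|x-key="value" \w*` form used in the export example and ignores everything else on the line.

```javascript
// Extract word objects from \w ... \w* markers, dropping the "x-" prefix from keys.
function parseWordMarkers(usfmLine) {
  const words = [];
  const markerPattern = /\\w\s+([^|\\]+)\|([^\\]*)\\w\*/g;
  for (const [, word, attributeText] of usfmLine.matchAll(markerPattern)) {
    const obj = { word: word.trim() };
    for (const [, key, value] of attributeText.matchAll(/x-([\w-]+)="([^"]*)"/g)) {
      obj[key] = value;
    }
    words.push(obj);
  }
  return words;
}

const sampleLine = String.raw`\w David|x-bhp-phrase="δαυεὶδ" x-bhp-occurrence="1/1" x-occurrence="1/1" \w*, \w son of|x-bhp-phrase="υἱοῦ" x-bhp-occurrence="2/2" x-occurrence="2/2" \w*`;

const parsed = parseWordMarkers(sampleLine);
console.log(parsed.map((w) => w.word)); // [ 'David', 'son of' ]
```

Note that this sketch keeps the original casing of each word and discards the comma between markers; preserving that punctuation is the subject of the Non-Word Markers section.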

Data Pivot

  • Create a helper if not already created that renders alignment data from the array of objects.
    • Alignment example for son of && υἱοῦ would be:
      • { topWords: [{ word: 'υἱοῦ', ... }], bottomWords: [{ word: 'son', ... }, { word: 'of', ... }] }
    • Splitting words would have to update occurrence(s) when pivoting.
      • Ex. son of occurs twice and so does son, but of occurs four times in the verse.
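
The occurrence update when splitting could be sketched as a two-pass count. `splitAndRecount` is a hypothetical name, and the input is assumed to be the imported word objects reduced to `word` and `bhp-phrase`.

```javascript
// Split multi-word entries into single words and recompute occurrence(s)
// across the whole verse.
function splitAndRecount(verseObjects) {
  const totals = new Map();
  const flat = [];
  // First pass: split and count how often each single word occurs in total.
  for (const obj of verseObjects) {
    for (const word of obj.word.split(' ')) {
      totals.set(word, (totals.get(word) || 0) + 1);
      flat.push({ word, 'bhp-phrase': obj['bhp-phrase'] });
    }
  }
  // Second pass: number each word in verse order.
  const seen = new Map();
  return flat.map((entry) => {
    const occurrence = (seen.get(entry.word) || 0) + 1;
    seen.set(entry.word, occurrence);
    return { ...entry, occurrence, occurrences: totals.get(entry.word) };
  });
}

const phrases = [
  { word: 'The book of', 'bhp-phrase': 'βίβλος' },
  { word: 'the genealogy of', 'bhp-phrase': 'γενέσεως' },
  { word: 'Jesus', 'bhp-phrase': 'ἰησοῦ' },
  { word: 'Christ', 'bhp-phrase': 'χριστοῦ' },
  { word: 'son of', 'bhp-phrase': 'υἱοῦ' },
  { word: 'David', 'bhp-phrase': 'δαυεὶδ' },
  { word: 'son of', 'bhp-phrase': 'υἱοῦ' },
  { word: 'Abraham', 'bhp-phrase': 'ἀβραάμ' },
];

const pivoted = splitAndRecount(phrases);
console.log(pivoted[8]); // "son" from the first "son of": occurrence 1 of 2
console.log(pivoted[9]); // "of" from the first "son of": occurrence 3 of 4
```

Grouping the resulting entries by their source phrase would then give the `bottomWords` for each alignment.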

Target Language Verse

  • Create a helper if not already created that renders a text string from the array of objects.
  • Target Language Verse would be:
    • The book of the genealogy of Jesus Christ, son of David, son of Abraham.
  • Something like verseArray.map((el) => el.word).join(" ") in the target language actions, where the target language bible is created from USFM.
  • Add support for added object types for punctuation and markers such as footnotes.