Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speech to text UI for media objects #1527

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Speech to text UI for media objects #1527

wants to merge 6 commits into from

Conversation

peetucket
Copy link
Member

@peetucket peetucket commented Oct 11, 2024

Why was this change made? 🤔

Fixes #1524 - allows the user to start speech to text for media object

HOLD for integration tests on stage

Also:

  • allows for OCR or speech to text or both to be enabled in settings by changing how the feature flag works from doing it in both the view on the server side and in the stimulus controller, to just doing it in JS on the client side in the stimulus controller (by passing in the settings value to the stimulus controller and letting it handle it)
  • sets the workflow context as needed for both cases
  • changes the setting of workflow context variables so that they are only sent went true. This is a change from current pre-assembly where it always sends workflow variables, even if the values are false. This sends a lot of unnecessary workflow variables (ie we don't need to send runOcr if it's false, that's the default). See https://github.com/sul-dlss/common-accessioning/blob/main/lib/dor/text_extraction/ocr.rb#L68-L71 for where we use the workflow context to decide if we need to run OCR (nil is that same as false)

Note: shared_configs turns this in just in QA for now for testing.

Question:
- since media objects always provide a file manifest, we may not need any of the changes in the fileset builder since it will never be used... and the user will be expected to provide sdrGenerated, corrected, etc. attributes for the files they provide

Screenshot 2024-10-11 at 3 38 20 PM

Screenshot 2024-10-15 at 12 47 57 PM

Screenshot 2024-10-11 at 3 38 31 PM

How was this change tested? 🤨

Spec
Integration test on stage

Comment on lines +33 to +39
ocrEnabled () {
return this.data.get('ocr-enabled') === 'true'
}

sttEnabled () {
return this.data.get('stt-enabled') === 'true'
}
Copy link
Member Author

@peetucket peetucket Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these enable the JS to see the Settings, which are passed into the stimulus controller

@@ -180,7 +180,7 @@ def verify_output_dir_no_exists
def verify_file_manifest_selected_for_media
return unless content_structure == 'media' && !using_file_manifest

errors.add(:content_structure, 'requires a file manifest. Please select the checkbox and ensure a file manifest is present.')
errors.add(:content_structure, 'requires a file manifest. Please indicate you have a file manifest and ensure a file manifest is present.')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixes an existing error message, this control is no longer a checkbox, it is a radio button

<div data-controller="globus <%= "caption" if Settings.ocr.enabled%>">
<div data-controller="globus caption"
data-caption-stt-enabled="<%=Settings.speech_to_text.enabled%>"
data-caption-ocr-enabled="<%=Settings.ocr.enabled%>">
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always enable the caption stimulus controller and always add the controls to the html (but hidden)... we will instead make the settings values visible to the stimulus controller so the JS can do the work of deciding which controls to show/hide as needed based on feature flag settings and user selections

this makes it more flexible and easier to reason with (since it all happens in one place now, in the stimulus controller)

@peetucket peetucket force-pushed the 1524-start-stt branch 2 times, most recently from c136a79 to de05e0f Compare October 15, 2024 18:36
@@ -10,6 +10,7 @@ class StructuralBuilder
# @param [String] reading_order
# @param [Boolean] all_files_public
# @param [Boolean] manually_corrected_ocr set by user when creating the job
# @param [Boolean] ocr_available set by user when creating the job
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noticed this missing param documentation

@peetucket peetucket marked this pull request as ready for review October 15, 2024 19:58
@peetucket peetucket changed the title Speech to text UI for media objects [HOLD] Speech to text UI for media objects Oct 15, 2024
@peetucket peetucket changed the title [HOLD] Speech to text UI for media objects Speech to text UI for media objects Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Start speech to text WF for media objects in pre-assembly
1 participant