Vocabulary for ProcessingStepDescriptions #36
Thank you for your feedback/request. The whole processingStepType will be reviewed in light of this, together with issues #35, #27 and #13. For the vocabulary, allowing embedded elements from PREMIS or similar established, standardised vocabularies could be considered, but this will also need a wider discussion.
Looking into this a little further, the scope is potentially enormous: processing steps can take almost any form, from image-related tasks to OCR corrections and all kinds of linguistic post-processing, semantic enrichment and so on. I am not sure it is really feasible to create a useful formal vocabulary of attribute values.
If any kind of processing step description is allowed, I would suggest following the Semantic Web/Linked Data approach and at least requiring a URI/IRI as a reference for what is actually meant (there could be countless variants of image enhancement, segmentation and other methods).
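As a rough illustration of the "require a URI/IRI" idea, a consuming tool could reject step references that are not absolute IRIs. This is only a minimal Python sketch under my own assumptions: the function name `is_absolute_iri` and the example references are hypothetical, and the check only looks for a scheme (plus a host for http/https) rather than implementing full RFC 3987 validation.

```python
from urllib.parse import urlparse


def is_absolute_iri(value: str) -> bool:
    """Rough check that a processing-step reference is an absolute IRI.

    Requires a scheme; for http/https it also requires a host. This is a
    heuristic, not a full RFC 3987 validator.
    """
    parsed = urlparse(value.strip())
    if not parsed.scheme:
        return False  # e.g. a bare label such as "binarization"
    if parsed.scheme in ("http", "https"):
        return bool(parsed.netloc)
    # Other schemes (such as urn:) just need something after the colon.
    return bool(parsed.netloc or parsed.path)


if __name__ == "__main__":
    for ref in ("http://example.org/methods/otsu-binarization",
                "urn:example:segmentation:v2",
                "binarization"):
        print(ref, is_absolute_iri(ref))
```

A bare label like "binarization" is rejected, while a resolvable URL or a URN pointing at a specific method definition is accepted, which is what makes the reference unambiguous across data sources.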
Sorry, closed this by accident (wrong button)...
I agree with the proposal that predefined vocabularies are beneficial. I think the way it was done in METS can also cover the "rare or edge cases" Clemens outlined: METS always offered "OTHER" as an additional option. In that case the description can express the real details, while the fixed value still helps to classify the processing history and lets machines cluster the information for analysis.
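The METS-style "fixed list plus OTHER" pattern could look roughly like this in Python. The category values (`imageProcessing`, `layoutAnalysis`, `ocr`, `proofreading`, `other`) and the names `StepType`, `ProcessingStep` and `classify` are hypothetical, loosely drawn from the areas mentioned in this thread rather than from any schema. An unrecognised label is kept verbatim in `other_type` so no detail is lost.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StepType(Enum):
    # Hypothetical top-level values; not taken from the ALTO or METS schemas.
    IMAGE_PROCESSING = "imageProcessing"
    LAYOUT_ANALYSIS = "layoutAnalysis"
    OCR = "ocr"
    PROOFREADING = "proofreading"
    OTHER = "other"


@dataclass
class ProcessingStep:
    step_type: StepType                # coarse class, usable for machine clustering
    description: str                   # free text with the real details
    other_type: Optional[str] = None   # original label when step_type is OTHER


def classify(raw_type: str, description: str) -> ProcessingStep:
    """Map a raw type label onto the fixed list, falling back to OTHER."""
    try:
        return ProcessingStep(StepType(raw_type), description)
    except ValueError:
        # Unknown label: keep it verbatim so the edge case is still documented.
        return ProcessingStep(StepType.OTHER, description, other_type=raw_type)
```

For example, `classify("semanticEnrichment", "Named-entity linking")` ends up as `OTHER`, yet the original label and description survive for human readers, while tools can still group everything by `step_type`.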
What is the point of the processing step history? Is it to serve as an audit trail (who did what, and when)? Or is it to show what changes have been made to the file, that is, how it differs now from before the processing step? (A subtle difference.)
Some kind of common vocabulary is needed at least for the main areas (image processing, OCR, proofreading, etc.), which is a simpler issue than the
Continued in #39. |
Following up on yesterday's call, here are my first thoughts on a possible value list. As mentioned on the call, I have in mind the METS agent solution: just a short list of top-level areas that can be used for filtering/analysis of the main parts, with all remaining special processing types recorded as "Other".
TextGeneration: this would cover the main ALTO areas "layout" and "text", specifically so that the processing information can be filtered to see where the layout and text come from. In my view, image operations in isolation are not relevant for ALTO. Only as part of text and layout actions might it be of interest to record the parameters used in the operation (e.g. image conversion).
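To show how a short top-level list supports filtering, here is a small Python sketch that pulls the "TextGeneration" steps out of a page's processing history and counts steps per area. Apart from "TextGeneration" and "Other", which come from the proposal above, the `Step` type, the helper names and the sample descriptions are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Step(NamedTuple):
    category: str     # top-level area, e.g. "TextGeneration" or "Other"
    description: str  # free-text details of the actual operation


def steps_in(history: Iterable[Step], category: str) -> List[Step]:
    """Return the steps recorded under one top-level area, in original order."""
    return [step for step in history if step.category == category]


def category_counts(history: Iterable[Step]) -> Counter:
    """Count how many steps fall into each top-level area."""
    return Counter(step.category for step in history)


if __name__ == "__main__":
    history = [
        Step("TextGeneration", "Automatic OCR of the page"),
        Step("Other", "Named-entity tagging"),
        Step("TextGeneration", "Manual transcription of marginalia"),
    ]
    for step in steps_in(history, "TextGeneration"):
        print(step.description)
    print(category_counts(history))
```

Because the top-level list is short and fixed, this kind of query stays trivial, while the specialised details remain in the free-text description.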
A naive comment: do we really want/need to repeat, in every ALTO file of a document, all the operations that were applied to this document? This kind of processing history is generally stored in the document manifest, at the highest possible level.
In today's tech call we briefly discussed this feedback from Jean-Philip again and concluded that recording the history inside ALTO is necessary whenever it is more granular than the document level.
The change to the processing history is included in the current draft schema version 4.0 for public review.
Basic vocabulary included in v4.0. |
One more from the wish list.
Common *ProcessingStep elements (layout analysis, any kind of post-correction) are only partially captured by MIX's change history and often seem to be out of scope for the MIX schema. It would therefore be beneficial to define an (optional?) vocabulary of possible processingStepDescription attribute values to improve interoperability between data sources.
Any comments?