Normalize each CG sub-reading separately, like phonemisation #58
Comments
Currently the transcriptor is set up to look up the nearest surface form. For sub-readings that have no surface-form tag or similar tag, it falls back to 15-jagágij, which is not in the transcriptor. Maybe using the lemma makes sense for transcription, though.
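The fallback order discussed here (tagged surface form, then lemma, then the whole wordform) could be sketched roughly as follows. This is only an illustration, not the transcriptor's actual code: `lookup_key` and the `SURF` label are hypothetical names, and the only assumption about the reading format is that labelled strings look like `"..."MIDTAPE`, as in the readings quoted in this thread.

```python
import re

# A quoted string optionally followed by a label, e.g. "15-#»jagág9>ij"MIDTAPE.
QUOTED = re.compile(r'"([^"]*)"(\w*)')

def lookup_key(reading, wordform, surface_labels=("SURF",)):
    """Pick the string to look up for one (sub-)reading.

    surface_labels is a hypothetical list of labels that mark a
    surface form; "SURF" is a placeholder, not a real tag name.
    """
    quoted = QUOTED.findall(reading)
    # 1. Prefer a surface form carried on the reading itself.
    for text, label in quoted:
        if label in surface_labels:
            return text
    # 2. Otherwise use the lemma (the first, unlabelled quoted string).
    if quoted and quoted[0][1] == "":
        return quoted[0][0]
    # 3. Last resort: the wordform of the whole cohort.
    return wordform

sub = '"15" Num Cmp/Hyph Cmp "15-#»jagág9>ij"MIDTAPE <W:0.0> #7->3'
print(lookup_key(sub, "15-jagágij"))
```

With the lemma step in place, the `"15"` sub-reading would be looked up as `15` rather than as the full `15-jagágij`.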
I see the other bug now. Yeah, it would possibly be much easier not to mess with more sub-readings here...
This is the current output after normalise.
Looks good to me. What do you think, @ilm024? What would be the full compound output?
With newest
"<15-jagágij>"
	"jahke" Ex/N Sem/Time Der/k A Pl Com "15-#»jagág9>ij"MIDTAPE <W:0.0> @<ADVL #7->3
		"15" Num Cmp/Hyph Cmp "15-#»jagág9>ij"MIDTAPE <W:0.0> #7->3
	"jahke" Ex/N Sem/Time Der/k A Sg Ill "15-#»jagág9>ij"MIDTAPE <W:0.0> @<ADVL #7->3
		"15" Num Cmp/Hyph Cmp "15-#»jagág9>ij"MIDTAPE <W:0.0> #7->3
What is missing to get what you get?
Probably version differences. The midtapes would confuse the normalise lookup, and I don't get them with my hfst as it is now, so the output of smj-normaliser6-cg.mode is just:
Ok. What is the input and the command you used to get the desired output?
e.g.
I commented the midtape reading out; I'm not sure whether it made sense in the normalising step or was a copy-paste from the phonemiser.
I am not sure either whether we need midtape in the normaliser process, but we definitely need to retain midtape strings for later IPA conversion. IIRC the idea was to have an option for "deep analysis" that would generate the midtape stuff for normalised input.
Well, midtape is kind of retained now if it gets used by phon:
ok, good 🙂
We probably have to use the deep analyser thing to get a full MIDTAPE representation, if we need that.
Cf #44 (comment)
See also the following example:
where in 15-jagágij, 15 is not transcribed.
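To normalise each sub-reading separately, the readings of a cohort first have to be split into chains of a main reading plus its sub-readings. In CG stream format a sub-reading is written one indentation level deeper than the reading it belongs to, so a minimal sketch could group by indentation depth. `group_subreadings` is a hypothetical helper, not part of the existing pipeline, and the sample cohort is shortened from the readings quoted in this thread.

```python
def group_subreadings(lines):
    """Group the reading lines of one cohort into [main, sub, ...] chains."""
    chains = []  # list of (depth of main reading, [reading, sub-reading, ...])
    for line in lines:
        stripped = line.lstrip("\t ")
        if not stripped or stripped.startswith('"<'):
            continue  # skip blank lines and the cohort's wordform line
        depth = len(line) - len(stripped)
        if chains and depth > chains[-1][0]:
            # Deeper than the current main reading: a sub-reading of it.
            chains[-1][1].append(stripped)
        else:
            # Same depth as the previous main reading: a new main reading.
            chains.append((depth, [stripped]))
    return [readings for _, readings in chains]

cohort = [
    '"<15-jagágij>"',
    '\t"jahke" Ex/N Sem/Time Der/k A Pl Com',
    '\t\t"15" Num Cmp/Hyph Cmp',
    '\t"jahke" Ex/N Sem/Time Der/k A Sg Ill',
    '\t\t"15" Num Cmp/Hyph Cmp',
]
for chain in group_subreadings(cohort):
    print(chain)
```

Each chain can then be handled part by part, so that `15` and `jahke` are looked up separately instead of as one string.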