MWE annotation #149
Could you be clearer about this issue? What exactly are you suggesting to improve the annotation of globs?
It probably does the trick, but I'm not quite sure. Let me try to write some pseudo-code to help make my point.
It is something that could be done manually, but "drilling" has 230 occurrences, "mud" has 95, and "drilling mud" shows up at least 16 times. Those counts are too high to rely on visual inspection. And that's just one MWE: with "drilling" alone we have "drilling bit", "drilling rig", "drilling column", etc. Ideally, I should be able to provide a list of MWEs and have sensetion automatically glob them within the corpus.
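The list-driven lookup proposed in the comment above could be sketched as follows. This is a minimal illustration, not sensetion's code: it assumes a hypothetical corpus format in which each sentence is a plain list of token strings, and the function name `find_mwe_occurrences` is made up for this example. It only locates contiguous, case-insensitive matches; actually creating the globs would still go through the tool.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical corpus format: each sentence is a list of token strings.
Sentence = List[str]


def find_mwe_occurrences(
    sentences: Sequence[Sentence], mwes: Sequence[str]
) -> Dict[str, List[Tuple[int, int]]]:
    """Map each MWE to the (sentence index, start token) of every contiguous match."""
    # Pre-split each MWE into lowercase tokens, e.g. "drilling mud" -> ("drilling", "mud").
    patterns = {mwe: tuple(mwe.lower().split()) for mwe in mwes}
    hits: Dict[str, List[Tuple[int, int]]] = {mwe: [] for mwe in mwes}
    for s_idx, tokens in enumerate(sentences):
        lowered = [t.lower() for t in tokens]
        for mwe, pattern in patterns.items():
            n = len(pattern)
            # Slide a window of n tokens over the sentence.
            for start in range(len(lowered) - n + 1):
                if tuple(lowered[start:start + n]) == pattern:
                    hits[mwe].append((s_idx, start))
    return hits


# Usage on a toy corpus (hypothetical sentences).
corpus = [
    ["The", "drilling", "mud", "was", "replaced", "."],
    ["The", "drilling", "rig", "uses", "drilling", "mud", "."],
    ["Mud", "covered", "the", "drilling", "bit", "."],
]
hits = find_mwe_occurrences(corpus, ["drilling mud", "drilling rig", "drilling bit"])
```

Here `hits["drilling mud"]` is `[(0, 1), (1, 4)]`: one list of MWEs yields every position to review, instead of paging through all 230 "drilling" hits by eye.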
This is a very specific use case; the tool needs a more general approach. Moreover, I am not sure whether this should be a feature of sensetion or implemented as separate scripts.
OK, maybe not working from a list. Still, when I create a glob that sensetion wasn't able to recognize in the first place, there should be a way for it to let me know, and let me check, the remaining occurrences of that glob in the text.
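The check described in the comment above (after a manual glob, list the other places where the same token sequence appears but is not yet globbed) might look roughly like this. It is a sketch under assumptions: the `Sentence` class and its `globs` field (half-open token spans) are hypothetical stand-ins, not sensetion's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Sentence:
    """Hypothetical stand-in for a sensetion sentence: tokens plus globbed spans."""
    tokens: List[str]
    # Each glob is a half-open token span (start, end); (1, 3) covers tokens 1 and 2.
    globs: Set[Tuple[int, int]] = field(default_factory=set)


def unglobbed_occurrences(
    sentences: List[Sentence], glob_tokens: List[str]
) -> List[Tuple[int, int]]:
    """Return (sentence index, start) for matches of glob_tokens not yet globbed."""
    target = [t.lower() for t in glob_tokens]
    n = len(target)
    remaining: List[Tuple[int, int]] = []
    for s_idx, sent in enumerate(sentences):
        lowered = [t.lower() for t in sent.tokens]
        for start in range(len(lowered) - n + 1):
            # Report the match only if exactly this span is not already a glob.
            if lowered[start:start + n] == target and (start, start + n) not in sent.globs:
                remaining.append((s_idx, start))
    return remaining


corpus = [
    Sentence(["The", "drilling", "mud", "leaked"], globs={(1, 3)}),  # already globbed
    Sentence(["Add", "drilling", "mud", "slowly"]),                   # not yet globbed
    Sentence(["The", "mud", "was", "thick"]),                         # no match
]
todo = unglobbed_occurrences(corpus, ["drilling", "mud"])
```

`todo` is `[(1, 1)]`, the one "drilling mud" still waiting for review; each hit would still be inspected manually before globbing.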
@hmuniz's suggestion of using Emacs macros is really cool. We can temporarily bind a single key to execute a sequence of commands. So, in a buffer with K occurrences of
as a single key command. So instead of K*(approx. 13+2+1+12+1) keys, we would press only K keys. The important point is that we keep visual and manual inspection for quality control.
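Spelling out the keystroke estimate from the comment above: the per-occurrence counts are the comment's own approximations, and K = 16 is taken from the "at least 16" occurrences of "drilling mud" mentioned earlier in the thread.

```math
\underbrace{K\,(13+2+1+12+1)}_{\text{without a macro}} \approx 29K
\quad\text{vs.}\quad
\underbrace{K}_{\text{with a macro}}
\qquad\Rightarrow\qquad
K = 16:\; 29 \cdot 16 = 464 \text{ keys vs. } 16 \text{ keys}.
```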
Originally, our idea was to have globbing be automatic, using the enrich.py script; manual input would be restricted to difficult cases not detected by the globbing mechanism, or to unglobbing wrong globs (which is much faster than globbing).
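For illustration, an automatic globbing pass of the kind described above could work like the sketch below. The algorithm used by enrich.py is not shown in this thread; this swaps in a plain greedy, left-to-right, longest-match strategy over a hypothetical MWE lexicon (tuples of lowercase tokens), and `auto_glob` is a made-up name.

```python
from typing import List, Sequence, Set, Tuple


def auto_glob(tokens: Sequence[str], lexicon: Set[Tuple[str, ...]]) -> List[Tuple[int, int]]:
    """Greedy left-to-right, longest-match globbing of a tokenized sentence.

    Returns half-open (start, end) spans; tokens not covered by an MWE stay unglobbed.
    Only lexicon entries of two or more tokens are considered.
    """
    lowered = [t.lower() for t in tokens]
    max_len = max((len(m) for m in lexicon), default=0)
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate first so a longer MWE wins over its prefix.
        for n in range(min(max_len, len(lowered) - i), 1, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                spans.append((i, i + n))
                i += n  # skip past the glob so spans never overlap
                break
        else:
            i += 1  # no MWE starts at this token
    return spans


# Hypothetical lexicon; "drilling mud pump" is only there to show longest-match priority.
lexicon = {("drilling", "mud"), ("drilling", "mud", "pump"), ("drilling", "rig")}
spans = auto_glob(["The", "drilling", "mud", "pump", "near", "the", "drilling", "rig"], lexicon)
```

`spans` is `[(1, 4), (6, 8)]`. Greedy longest-match is simple and fast but can pick a wrong segmentation, which fits the workflow above: bad globs are cheap to unglob by hand.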
Yep, that is the reason for my comment above.
There should be an easy way of targeting MWEs for globbing.
Currently, to glob, say, "drilling mud", one must target either "drilling" or "mud", then manually search for co-occurrences and glob them. However, "drilling" and "mud" are both very frequent words in the corpus, and they do not always occur together. Even so, there is a huge number of "drilling mud" occurrences to be globbed and annotated.
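The manual search described above can be partly automated by separating sentences where both words merely co-occur from sentences where they form the adjacent bigram. This is a sketch, assuming sentences are available as lists of token strings; `cooccurrence_report` is a hypothetical helper, not part of sensetion.

```python
from typing import List, Sequence, Tuple


def cooccurrence_report(
    sentences: Sequence[List[str]], first: str, second: str
) -> Tuple[List[int], List[int]]:
    """Return (sentences containing both words, sentences with `first second` adjacent)."""
    first, second = first.lower(), second.lower()
    both: List[int] = []
    adjacent: List[int] = []
    for s_idx, tokens in enumerate(sentences):
        lowered = [t.lower() for t in tokens]
        if first in lowered and second in lowered:
            both.append(s_idx)
            # Adjacent bigram check: `first` immediately followed by `second`.
            if any(a == first and b == second for a, b in zip(lowered, lowered[1:])):
                adjacent.append(s_idx)
    return both, adjacent


corpus = [
    ["The", "drilling", "mud", "was", "replaced"],  # adjacent bigram
    ["Mud", "from", "the", "drilling", "site"],     # both words, not adjacent
    ["The", "drilling", "rig", "stopped"],          # only "drilling"
]
both, adjacent = cooccurrence_report(corpus, "drilling", "mud")
to_review = [i for i in both if i not in adjacent]
```

Adjacent hits (`[0]`) are strong glob candidates, while `to_review` (`[1]`) holds the looser co-occurrences that still need a human decision.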