MWE annotation #149

Open
alexandretessarollo opened this issue Aug 10, 2019 · 7 comments

alexandretessarollo commented Aug 10, 2019

There should be an easy way to target MWEs (multiword expressions) for globbing.

Currently, to glob, say, "drilling mud", one must target either "drilling" or "mud", then manually search for co-occurrences and glob them. However, "drilling" and "mud" are both very frequent words in the corpus, and they do not always occur together. Even so, there are a great many occurrences of "drilling mud" to be globbed and annotated.

hmuniz (Collaborator) commented Aug 12, 2019

Could you be clearer about this issue? What exactly are you suggesting to improve the annotation of globs?
Maybe KeyboardMacros (Emacs keyboard macros) would help you.

alexandretessarollo (Author) commented

It probably does the trick, but I'm not quite sure. Let me write some pseudocode to make my point.

  1. Call sensetion in targeted mode, looking for "drilling", any PoS.
  2. For each "drilling", check whether the word to its right is "mud".
  3. If step 2 is true, glob both words with lemma "drilling mud"; otherwise do nothing.
  4. Repeat steps 2 and 3 until the last occurrence of "drilling".
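The four steps above can be sketched roughly as follows. This is a minimal illustration, not sensetion's actual code: it assumes (hypothetically) that the text is a plain list of word tokens and that a glob is a `(start, end, lemma)` tuple over inclusive token indices. The function name `glob_bigram` is made up for the example.

```python
# Minimal sketch of the pseudocode above; NOT sensetion's implementation.
# Hypothetical data model: tokens are plain strings, and a glob is a
# (first token index, last token index, lemma) tuple.
from typing import List, Tuple

Glob = Tuple[int, int, str]


def glob_bigram(tokens: List[str], head: str, right: str) -> List[Glob]:
    """Glob every `head` that is immediately followed by `right`."""
    globs: List[Glob] = []
    lemma = f"{head} {right}"
    # Step 1: visit every occurrence of `head`, regardless of PoS.
    for i, tok in enumerate(tokens):
        if tok.lower() != head:
            continue
        # Step 2: check the word to the right, guarding the last position.
        if i + 1 < len(tokens) and tokens[i + 1].lower() == right:
            # Step 3: glob both tokens under the MWE lemma.
            globs.append((i, i + 1, lemma))
    # Step 4: the loop stops after the last occurrence of `head`.
    return globs


tokens = "The drilling mud and the drilling rig ; mud from drilling mud".split()
print(glob_bigram(tokens, "drilling", "mud"))
# [(1, 2, 'drilling mud'), (10, 11, 'drilling mud')]
```

Note that "drilling rig" and the lone "mud" are left untouched, which matches step 3's "otherwise do nothing".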

It is something that could be done manually, but "drilling" has 230 occurrences, "mud" has 95, and "drilling mud" shows up at least 16 times. Those counts are too high to rely on visual inspection. And that's just one MWE: with "drilling" alone we have "drilling bit", "drilling rig", "drilling column", etc.

Ideally, I should be able to provide a list of MWEs and have Sensetion glob them automatically throughout the corpus.
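Working from a list could look roughly like the sketch below. It is hypothetical and not an existing sensetion feature. It assumes tokens are plain strings, MWEs are given as space-separated lemmas, and that when several MWEs start at the same token the longest one wins, with globs never overlapping. Those matching rules are illustrative choices, not something settled in this thread.

```python
# Hypothetical sketch: glob a user-supplied list of MWEs in a single pass.
# Assumptions: tokens are plain strings; MWEs are space-separated lemmas;
# the longest MWE starting at a token wins; globs never overlap.
from typing import Dict, List, Tuple

Glob = Tuple[int, int, str]  # (first token index, last token index, lemma)


def glob_mwe_list(tokens: List[str], mwes: List[str]) -> List[Glob]:
    # Index MWEs by their first word so each token only checks
    # candidates that could start there.
    by_first: Dict[str, List[List[str]]] = {}
    for mwe in mwes:
        words = mwe.lower().split()
        by_first.setdefault(words[0], []).append(words)
    for cands in by_first.values():
        cands.sort(key=len, reverse=True)  # try the longest MWE first

    lowered = [t.lower() for t in tokens]
    globs: List[Glob] = []
    i = 0
    while i < len(lowered):
        for words in by_first.get(lowered[i], []):
            if lowered[i:i + len(words)] == words:
                globs.append((i, i + len(words) - 1, " ".join(words)))
                i += len(words)  # jump past the glob so globs don't overlap
                break
        else:
            i += 1  # no MWE starts at this token
    return globs


tokens = "check the drilling mud and the drilling rig".split()
print(glob_mwe_list(tokens, ["drilling mud", "drilling rig", "drilling bit"]))
# [(2, 3, 'drilling mud'), (6, 7, 'drilling rig')]
```

Preferring the longest match means a list containing both "drilling mud" and a longer entry starting with "drilling mud" globs the longer one, which seems the safer default for compound terms. Any automatic pass like this would still need review, since not every adjacent pair is a true MWE occurrence.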

arademaker (Member) commented

This is a very specific description; we need a more general approach for the tool. Moreover, I am not sure whether this should be a feature of sensetion or implemented as scripts.

alexandretessarollo (Author) commented

OK, maybe not working from a list. Still, when I create a glob that sensetion wasn't able to recognize in the first place, it should have a way to let me know about, and check, the remaining occurrences of that glob in the text.
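One way to "let me know" about remaining occurrences is sketched below: given the glob just created by hand, list every other place where the same word sequence appears but is not yet covered by a glob. The data model (`(start, end, lemma)` tuples over inclusive token indices) and the function name `remaining_occurrences` are hypothetical, not sensetion's.

```python
# Hypothetical sketch: after a glob is created manually, report other
# occurrences of the same word sequence that are not yet globbed.
# Assumed data model (not sensetion's): tokens are strings; globs are
# (start, end, lemma) tuples over inclusive token indices.
from typing import List, Set, Tuple

Glob = Tuple[int, int, str]


def remaining_occurrences(tokens: List[str], globs: List[Glob], lemma: str) -> List[int]:
    """Return start indices of unglobbed occurrences of `lemma`'s words."""
    words = lemma.lower().split()
    n = len(words)
    lowered = [t.lower() for t in tokens]
    # Token positions already covered by some existing glob.
    covered: Set[int] = {k for start, end, _ in globs for k in range(start, end + 1)}
    hits: List[int] = []
    for i in range(len(lowered) - n + 1):
        # Report only exact matches that share no token with an existing glob.
        if lowered[i:i + n] == words and not any(k in covered for k in range(i, i + n)):
            hits.append(i)
    return hits


tokens = "drilling mud was mixed ; more drilling mud".split()
print(remaining_occurrences(tokens, [(0, 1, "drilling mud")], "drilling mud"))
# [6]
```

The annotator would then visit each reported position and decide whether to glob it, which keeps a human in the loop rather than globbing blindly.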

arademaker (Member) commented

@hmuniz's suggestion of using Emacs macros is really cool. We can temporarily bind a single key to execute a sequence of commands. So, in a buffer with K occurrences of "drilling mud", we could record a sequence of commands such as:

  1. search for drilling mud
  2. mark both (m, m)
  3. press g
  4. write 'drilling mud'
  5. choose n

as a single key command. So instead of approximately K*(13+2+1+12+1) keystrokes, we would press only K keys. The important point, though, is that we keep visual and manual inspection for quality control.
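Taking the approximate per-occurrence keystroke estimate above at face value, and using the "at least 16" occurrences of "drilling mud" mentioned earlier in the thread as K, the savings work out as:

```latex
\begin{aligned}
\text{keystrokes without macro} &= K\,(13 + 2 + 1 + 12 + 1) = 29K \\
\text{keystrokes with macro}    &= K \\
K = 16:\quad 29 \times 16 &= 464 \ \text{keystrokes vs. } 16
\end{aligned}
```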

odanoburu (Contributor) commented Aug 13, 2019 via email

arademaker (Member) commented

Yep, that is the reason for my #149 (comment) above.

4 participants