Tagging and features for "one" #373
All OK by me except 'Gender=Com', which is not otherwise used in English ATM. Unless we plan a broader campaign to include it in various places, I think adding it just for this item is confusing. I would just leave it genderless for now.
Are there other places where it would make sense? I thought gender in English was just for 3rd person singular pronouns (and possibly who(m)(ever) to contrast with which(ever)).
Not sure, but if there is no other relevant word, then I think not including it on 'one' is the way to go. There are many words with generic meaning/indeterminate gender, so having this unique feature value just on this word isn't worth it IMO; it's not really an agreement category.
It's a (semantic) agreement category as much as masculine and feminine for non-generic 3sg pronouns, right? I.e., you wouldn't use "one" to refer to a generic inanimate object, only a generic person.
Although there is an argument to be made that if we are putting in features for animacy, we should do so as well for indefinite pronouns: -one/-body (Gender=Com) vs. -thing (Gender=Neut). Maybe this level of detail will just confuse people.
Yeah, since this never 'agrees' with another verb, noun, etc., I would just leave it alone.
OK. Can we enforce a rule that "one" as NUM/CD cannot have
No det seems in keeping with PTB guidelines; I can't imagine an exception to it. GUM has only 2 exceptions to amod (and an error I just fixed), which are for postnominal amods, like:
What's weird about it is that according to PTB, a postnominal relative clause would not stop us from tagging it as CD. So I feel a bit weird tagging it as NN, even though it's pretty clear it's just a heavy extraposed amod (due to "good for your health" having a PP dependent on the adjective). What do you think, is it weird to change that to NN when the following stays CD:
Couldn't that be pluralized:
I think that qualifies it as NN per PTB.
This is a little tricky, because even if we decide 'if it has a relative clause it must be NN', many cases that don't have relative clauses COULD have one (hey, that's an example!), but PTB unambiguously tags those as CD. Actually, there is indeed a majority of NN for cases that have a relative clause, but the following kind is almost always CD (WSJ examples):
Compare made-up examples:
So my problem is that, with a little imagination, 'could be pluralized' applies to very many cases that are tagged CD 100% of the time in PTB. Making it depend on a literal, present article is at least implementable, but pluralization seems very murky to me (or at least not consistent with what is actually in PTB).
"One" can have a similar meaning whether it is CD or NN, which is why I find the PTB distinction tricky. I think the partitive construction is relevant: if you're willing to rephrase a partitive as a non-partitive or vice versa, it changes the judgments. But I'm not sure partitive always implies CD (what if it is "the last one of the lost jewels"?). Maybe partitive with no determiner implies CD? This is one of the places where the minimal nature of the PTB guidelines is frustrating. FTR, CGEL discusses two kinds of "one" starting on p. 1513, but I think they draw a somewhat different boundary than PTB.
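The determiner-based line discussed here can be stated as a fully decidable rule, unlike pluralizability. A minimal sketch, assuming a hypothetical Token class that records only the relations of a token's dependents (not part of any UD tool); it covers only the NN/CD choice and ignores pronominal uses of "one":

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical minimal token: its form plus the relations of its dependents."""
    form: str
    dependent_rels: list[str] = field(default_factory=list)


def ptb_tag_for_one(token: Token) -> str:
    """Tag 'one' as NN when an overt det dependent is present, otherwise CD.

    Only relations already in the tree are consulted, so the decision is
    fully determined by the annotation; whether the phrase *could* be
    pluralized is deliberately not part of the rule. An amod on its own
    (e.g. a postnominal 'good for your health') leaves the tag as CD here.
    Pronominal uses (generic 'one', 'one another') are out of scope.
    """
    if token.form.lower() != "one":
        raise ValueError(f"rule only covers 'one', got {token.form!r}")
    # Subtyped relations such as det:predet still count as det.
    rels = {rel.split(":")[0] for rel in token.dependent_rels}
    return "NN" if "det" in rels else "CD"


# "the last one of the lost jewels": det + amod + nmod on "one"
print(ptb_tag_for_one(Token("one", ["det", "amod", "nmod"])))  # NN
# "one of the lost jewels": partitive with no determiner
print(ptb_tag_for_one(Token("one", ["nmod"])))  # CD
```

Under this rule the partitive question above resolves mechanically: a partitive with an overt determiner is NN, one without is CD.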
Yeah, I'll be honest: this POS distinction in PTB is not very convincing. I just want to stay as consistent with it as possible, and drawing a line in the sand around a determiner actually being realized there seemed simplest. So let's say presence of
And another thing I'm noticing from the validator now:
This is triggered by those cases where the canonical XPOS is CD but the UPOS is now PRON, primarily "one another". Should "one" in "one another" not be tagged NumType=Card? And similarly, it triggers an error for NumForm=Word.
Let's say Num* features for
OK
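To find the tokens behind this kind of validator complaint, a small script can list tokens whose XPOS is CD but whose UPOS is PRON and that still carry a Num* feature. This is a sketch assuming plain 10-column CoNLL-U input; it does not reproduce the UD validator's actual logic, and the sample annotation of "one another" is invented for illustration, not copied from EWT or GUM. Note that Number also starts with "Num" but is not a Num* feature, so it is excluded explicitly.

```python
import io
from typing import Iterator, TextIO


def iter_tokens(conllu: TextIO) -> Iterator[list[str]]:
    """Yield the 10 columns of each basic token line in a CoNLL-U stream."""
    for line in conllu:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # sentence break or comment line
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword token range or empty node
        yield cols


def pron_cd_with_num_feats(conllu: TextIO) -> list[tuple[str, str, str]]:
    """Return (form, xpos, feats) for XPOS=CD, UPOS=PRON tokens with Num* features."""
    hits = []
    for cols in iter_tokens(conllu):
        form, upos, xpos, feats = cols[1], cols[3], cols[4], cols[5]
        if xpos != "CD" or upos != "PRON" or feats == "_":
            continue
        names = [pair.split("=", 1)[0] for pair in feats.split("|")]
        # NumType, NumForm, ... count; Number is a different feature.
        if any(n.startswith("Num") and n != "Number" for n in names):
            hits.append((form, xpos, feats))
    return hits


# Hypothetical annotation of "They help one another ." (illustrative only).
rows = [
    ("1", "They", "they", "PRON", "PRP", "Case=Nom|Number=Plur|Person=3|PronType=Prs", "2", "nsubj"),
    ("2", "help", "help", "VERB", "VBP", "Mood=Ind|Tense=Pres|VerbForm=Fin", "0", "root"),
    ("3", "one", "one", "PRON", "CD", "NumForm=Word|NumType=Card|PronType=Rcp", "2", "obj"),
    ("4", "another", "another", "DET", "DT", "_", "3", "fixed"),
    ("5", ".", ".", "PUNCT", ".", "_", "2", "punct"),
]
# Pad each row with empty DEPS and MISC columns to get 10 columns.
sample = "# text = They help one another.\n" + "".join(
    "\t".join(r + ("_", "_")) + "\n" for r in rows
) + "\n"

print(pron_cd_with_num_feats(io.StringIO(sample)))
```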
Would substitutability of the numeral "1" in casual (but not SMS-style) writing make a good heuristic for CD? I would never write something like
unless extremely pressed to save characters. But I might casually write
I wouldn't be confident that people are consistent about what they would and wouldn't substitute with numerals, and in spoken data we can't even tell how things are 'spelled', so I would stick to a fully decidable version of the criteria that most closely reproduces prior art (PTB XPOS), and map to UPOS based on the pretty rigid criteria we've discussed.
According to the PTB tagging guidelines, one should be
- det: EWT, GUM. I'm not sure NUM/CD is correct for any of these.
- amod but not det: GUM has some NUM/CD tokens (all are NN(S) in EWT).

Features:
- PRON
  - one: Gender=Com|Number=Sing|Person=3|PronType=Prs. Not sure there's a good feature to indicate genericity.
  - Number=Sing|PronType=Neg (Inconsistent UPOS for anyone, someone and everyone across English treebanks #372)
- NOUN one: just a Number feature, I assume
- NUM one: I assume these should all be NumType=Card
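The dependent counts in the first list could be gathered with a script along these lines, which classifies each NUM token with lemma "one" by whether it has a det dependent, an amod but no det, or neither. This is a sketch assuming plain 10-column CoNLL-U input; the three sample sentences are invented for illustration, and make_sentence is a hypothetical helper for building them.

```python
import io
from collections import Counter
from typing import TextIO


def read_sentences(conllu: TextIO) -> list[list[list[str]]]:
    """Split a CoNLL-U stream into sentences of basic token rows."""
    sentences: list[list[list[str]]] = []
    current: list[list[str]] = []
    for line in conllu:
        line = line.rstrip("\n")
        if not line:
            if current:  # blank line closes the current sentence
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword ranges and empty nodes
        current.append(cols)
    if current:
        sentences.append(current)
    return sentences


def survey_num_one(conllu: TextIO) -> Counter:
    """Count NUM tokens with lemma 'one' by the kind of dependents they have."""
    counts: Counter = Counter()
    for sent in read_sentences(conllu):
        for tok in sent:
            if tok[2].lower() != "one" or tok[3] != "NUM":
                continue
            # Universal relation of each dependent, ignoring subtypes like nmod:poss.
            rels = {dep[7].split(":")[0] for dep in sent if dep[6] == tok[0]}
            if "det" in rels:
                counts["det"] += 1
            elif "amod" in rels:
                counts["amod, no det"] += 1
            else:
                counts["neither"] += 1
    return counts


def make_sentence(*rows: str) -> str:
    """Hypothetical helper: build one sentence from space-separated 8-column rows."""
    return "".join("\t".join(r.split() + ["_", "_"]) + "\n" for r in rows) + "\n"


sample = (
    make_sentence("1 I I PRON PRP _ 2 nsubj", "2 want want VERB VBP _ 0 root",
                  "3 the the DET DT _ 5 det", "4 red red ADJ JJ _ 5 amod",
                  "5 one one NUM CD _ 2 obj", "6 . . PUNCT . _ 2 punct")
    + make_sentence("1 One one NUM CD _ 4 nsubj", "2 of of ADP IN _ 3 case",
                    "3 them they PRON PRP _ 1 nmod", "4 left leave VERB VBD _ 0 root",
                    "5 . . PUNCT . _ 4 punct")
    + make_sentence("1 He he PRON PRP _ 2 nsubj", "2 picked pick VERB VBD _ 0 root",
                    "3 one one NUM CD _ 2 obj", "4 good good ADJ JJ _ 3 amod",
                    "5 for for ADP IN _ 7 case", "6 your your PRON PRP$ _ 7 nmod:poss",
                    "7 health health NOUN NN _ 4 obl", "8 . . PUNCT . _ 2 punct")
)
print(survey_num_one(io.StringIO(sample)))
```

On a real treebank file, the same function can be called on an open handle, e.g. `with open(path, encoding="utf-8") as f: survey_num_one(f)`, where `path` is a placeholder.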
Related: #123, #183 ("one another")