Tagging and features for "one" #373

Closed
nschneid opened this issue Oct 16, 2022 · 16 comments

@nschneid
Contributor

nschneid commented Oct 16, 2022

According to the PTB tagging guidelines, one should be

  • CD by default, even when not a prenominal modifier, e.g. one of the best reasons, BUT
  • a pronoun if referring to a generic individual (roughly 'a person')
  • NN(S) "if it could be pluralized or modified by an adjective", and in another one
    • shouldn't this include this one, which one, a later one, etc.? Currently some of these are tagged as NUM/CD.
    • one as head of det: EWT, GUM. I'm not sure NUM/CD is correct for any of these.
    • one as head of amod but not det: GUM has some NUM/CD tokens (all are NN(S) in EWT)
    • remaining uses of one: EWT, GUM

Features:

Related: #123, #183 ("one another")

nschneid referenced this issue in UniversalDependencies/docs Oct 16, 2022
@amir-zeldes
Contributor

All OK by me except 'Gender=Com', which is not otherwise used in English ATM. Unless we plan a broader campaign to include it in various places, I think adding it just for this item is confusing. I would just leave it genderless for now.

@nschneid
Contributor Author

Are there other places where it would make sense? I thought gender in English was just for 3rd person singular pronouns (and possibly who(m)(ever) to contrast with which(ever)).

@amir-zeldes
Contributor

Not sure, but if there is no other relevant word, then I think not including it on 'one' is the way to go. There are many words with generic meaning/indeterminate gender, so having this unique feature value just on this word isn't worth it IMO - it's not really an agreement category.

@nschneid
Contributor Author

nschneid commented Oct 17, 2022

It's a (semantic) agreement category as much as masculine and feminine for non-generic 3sg pronouns, right? I.e. you wouldn't use "one" to refer to a generic inanimate object, only a generic person.

@nschneid
Contributor Author

Although there is an argument to be made that if we are putting in features for animacy, we should do so as well for indefinite pronouns: -one/-body (Gender=Com) vs. -thing (Gender=Neut). Maybe this level of detail will just confuse people.

@amir-zeldes
Contributor

Yeah, since this never 'agrees' with another verb, noun etc. I would just leave it alone.

@nschneid
Contributor Author

OK. Can we enforce a rule that "one" as NUM/CD cannot have det or amod dependents?
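
A minimal sketch of how such a rule might be checked over CoNLL-U data, assuming the standard ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The function names `parse_conllu_sentence` and `numeral_one_violations` and the two-token sample are hypothetical illustrations, not part of the UD validator or of either treebank's tooling.

```python
def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence block into a list of token dicts."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "upos": cols[3], "xpos": cols[4],
            "head": int(cols[6]), "deprel": cols[7],
        })
    return tokens


def numeral_one_violations(tokens, banned=("det", "amod")):
    """Return (one_id, dependent_id, deprel) for each banned dependent of a NUM/CD 'one'."""
    numeral_ones = {
        t["id"] for t in tokens
        if t["lemma"].lower() == "one" and t["upos"] == "NUM" and t["xpos"] == "CD"
    }
    return [
        (t["head"], t["id"], t["deprel"])
        for t in tokens
        # Compare only the universal relation, ignoring subtypes such as "det:predet".
        if t["head"] in numeral_ones and t["deprel"].split(":")[0] in banned
    ]


# Hypothetical sample: "this one" with "one" tagged NUM/CD, which the rule would flag.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "this", "this", "DET", "DT", "_", "2", "det", "_", "_"),
    ("2", "one", "one", "NUM", "CD", "_", "0", "root", "_", "_"),
])

print(numeral_one_violations(parse_conllu_sentence(SAMPLE)))
# [(2, 1, 'det')]
```

Retagging the flagged "one" as NOUN/NN makes the list come back empty. Relative-clause dependents are not checked here.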

@amir-zeldes
Contributor

No det seems in keeping with PTB guidelines, I can't imagine an exception to it. GUM has only 2 exceptions to amod (and an error I just fixed), which are for post nominal amods, like:

  • A popular option, and one/CD good for your health too, is by sea kayak

What's weird about it is that according to PTB, a postnominal relative clause would not stop us from tagging it as CD. So I feel a bit weird tagging it as NN, even though it's pretty clear it's just a heavy extraposed amod (due to "good for your health" having a PP dependent on the adjective). What do you think, is it weird to change that to NN when the following stays CD:

  • one/CD that was made with the American public in mind

@nschneid
Contributor Author

  • one/CD that was made with the American public in mind

Couldn't that be pluralized:

  • The most popular options, ones that were made with the American public in mind, ...

I think that qualifies it as NN per PTB.

@amir-zeldes
Contributor

This is a little tricky, because even if we decide 'if it has a relative clause it must be NN', many cases that don't have relative clauses COULD have one (hey, that's an example!), but PTB unambiguously tags those as CD. Actually there is indeed a majority for NN for cases that have a relative, but the following kind is almost always CD (wsj examples):

  • IBM adapted one/CD of these versions in 1981
  • a form asking them to check one/CD of three answers
  • a solicitor representing one/CD of the banks said

Compare made up examples:

  • Intel adapted these versions, but IBM adapted ones from 1981
  • answers ... a form asking them to check other ones
  • banks ... a solicitor representing other ones

So my problem is, with a little imagination, 'could be pluralized' is applicable to very many cases which are 100% tagged CD in PTB. Making it depend on a literal, present article is at least implementable, but pluralization seems very murky to me (or at least not consistent with what is actually in PTB).

@nschneid
Contributor Author

nschneid commented Oct 19, 2022

"One" can have a similar meaning whether it is CD or NN, which is why I find the PTB distinction tricky. I think the partitive construction is relevant—if you're willing to rephrase a partitive as a non-partitive or vice versa it changes the judgments. But I'm not sure partitive always implies CD (what if it is "the last one of the lost jewels"?). Maybe partitive with no determiner implies CD? One of the places where the minimal nature of PTB guidelines is frustrating.

FTR, CGEL discusses two kinds of "one" starting on p. 1513, but I think they draw a somewhat different boundary than PTB.

@amir-zeldes
Contributor

Yeah, I'll be honest this pos distinction in PTB is not very convincing, I just want to stay consistent with it as best as possible, and drawing a line in the sand around a determiner actually being realized there seemed simplest.

So let's say presence of det or amod mandates NN, but I'll leave relative alone. Sound OK?

And another thing I'm noticing from the validator now:

Value Card of feature NumType is not permitted with UPOS PRON

This is triggered by those cases where the canonical xpos is CD, but the UPOS is now PRON, primarily "one another". Should "one" in "one another" not be tagged Card? And similarly it triggers an error for NumForm=Word.

@nschneid
Contributor Author

Let's say Num* features for NUM tokens only.
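
As a rough sketch of that convention: the check below flags NumType and NumForm on any token whose UPOS is not NUM. Reading "Num*" as exactly those two features is an assumption based on the validator messages quoted above; the set deliberately excludes Number, which also starts with "Num" but is a different feature. The names `NUM_ONLY_FEATURES`, `parse_feats`, and `misplaced_num_features` are hypothetical, and the feature string in the example is made up for illustration.

```python
# Assumed reading of "Num*": the two features named in the validator errors.
# Number (e.g. Number=Sing) is intentionally not included.
NUM_ONLY_FEATURES = {"NumType", "NumForm"}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'A=x|B=y' into a dict ('_' means none)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def misplaced_num_features(upos, feats):
    """List the NUM-only features present on a token whose UPOS is not NUM."""
    if upos == "NUM":
        return []
    return sorted(name for name in parse_feats(feats) if name in NUM_ONLY_FEATURES)


# Hypothetical token: "one" in "one another", UPOS PRON, still carrying numeral features.
print(misplaced_num_features("PRON", "NumForm=Word|NumType=Card"))
# ['NumForm', 'NumType']
```

A NUM token with the same features returns an empty list, as does a PRON token carrying only Number=Sing.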

@amir-zeldes
Contributor

OK

@nschneid
Contributor Author

nschneid commented Oct 21, 2022

Would substitutability of the numeral "1" in casual (but not SMS-style) writing make a good heuristic for CD?

I would never write something like

  • A popular option, and 1 good for your health too, is by sea kayak
  • It is 1 that was made with the American public in mind

unless extremely pressed to save characters. But I might casually write

  • IBM adapted 1 of these versions in 1981
  • a form asking them to check 1 of 3 answers
  • a solicitor representing 1 of the banks said

@amir-zeldes
Contributor

I wouldn't be confident that people are consistent about what they would and wouldn't substitute by numerals, and in spoken data we can't even tell how things are 'spelled', so I would stick to a totally decidable version of the criteria which most closely reproduces prior art (PTB xpos), and mapping to UPOS as we've discussed based on pretty rigid criteria.
