
Missing PronType? #230

Open
nschneid opened this issue Sep 11, 2021 · 20 comments

Comments

@nschneid
Contributor

nschneid commented Sep 11, 2021

From the UD overview article:

Prominent examples [of features that cut across multiple UPOSes] are PronType and NumType. For example, the interrogative and indefinite pronominal types are recognized with pronouns (who vs. somebody), determiners (which vs. some), as well as with adverbs (where vs. somewhere).

However, some of the types mentioned do not consistently bear a PronType. Other indefinite and interrogative pronouns should be examined as well.

@amir-zeldes
Contributor

I'm willing to add them in GUM if you have a clear idea of what should get what!

@nschneid
Contributor Author

nschneid commented Sep 14, 2021

u/PronType

en/PronType

Some poking around EWT—we have:

  • With one exception, the WH-words according to XPOS appear to be taken care of.
  • all, both, each: I guess these should all be PronType=Tot
    • What about "every"? This can't stand alone or head partitives (*I bought some M&M's and was so hungry that I ate every.).
    • No PronType for "few", "many", "most", "another", "such/PDT", "quite/PDT", "(n)either", etc.?
      • I'm actually surprised it's "quite/DET/PDT a long trip"—would have guessed ADV (cf. "rather", "never", "just", "only" preceding articles)
  • no (PRON or DET)
    • excludes "no/ADV longer", "no/INTJ", etc.
  • other single-token indefinites, negatives, "every" totals without PronType
  • Should "(n)ever", "always", "forever" be included?
  • other PRON tokens missing PronType
    • What about expletive "there"? Seems weird to call it a personal pronoun, unlike expletive "it".
    • Consider also "no one"; reciprocals "each other", "one another" (Reciprocal pronouns #183); and goeswith combinations

P.S. There are a handful of articles and demonstrative DETs erroneously missing PronType.

@nschneid
Contributor Author

nschneid commented Sep 14, 2021

Digression: PTB guidelines for PDT

[Image: excerpt from the PTB tagging guidelines on PDT]

Why is "nary" on here but not "never"?

@nschneid
Contributor Author

nschneid commented Nov 4, 2021

The udapi checker expects all PRON and DET tokens to have a PronType:

https://github.com/udapi/udapi-python/blob/9528d7cf5d4927c64fba305a0ced8b32449fec4a/udapi/block/ud/markbugs.py#L32-L33
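For illustration, here is a rough standalone approximation of that check. It is not udapi's code: it parses CoNLL-U by hand, and the function name `missing_prontype` is hypothetical. It flags every PRON or DET word (skipping multiword-token range lines and empty nodes) whose FEATS column has no PronType.

```python
def missing_prontype(conllu_text):
    """Yield (sent_id, token_id, form, upos) for PRON/DET words lacking PronType."""
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        if not line.strip() or line.startswith("#"):
            continue  # sentence boundary or other comment line
        cols = line.split("\t")
        tok_id, form, upos, feats = cols[0], cols[1], cols[3], cols[5]
        if "-" in tok_id or "." in tok_id:
            continue  # multiword-token range ("9-10") or empty node ("8.1")
        names = set() if feats == "_" else {f.split("=", 1)[0] for f in feats.split("|")}
        if upos in ("PRON", "DET") and "PronType" not in names:
            yield sent_id, tok_id, form, upos


# A few rows modelled on EWT tokens, padded with "_" to the 10 CoNLL-U columns
rows = [
    ["5", "all", "all", "DET", "DT", "_"],
    ["9-10", "there's", "_", "_", "_", "_"],
    ["9", "there", "there", "PRON", "EX", "_"],
    ["39", "the", "the", "DET", "DT", "Definite=Def|PronType=Art"],
    ["42", "some", "some", "DET", "DT", "_"],
]
sample = "# sent_id = demo\n" + "\n".join("\t".join(r + ["_"] * 4) for r in rows)

for hit in missing_prontype(sample):
    print(hit)
```

On this sample, "all", "there" and "some" are reported; "the" already has PronType=Art, and the "9-10" range line is skipped.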

@amir-zeldes
Contributor

Thanks for raising this. I had a look at the corresponding cases in GUM, and I think this DepEdit script would handle them reasonably. It also covers a couple of GUM typos, and the rules are cascaded so that tokens which already have a PronType are skipped, so I think it should work for most of the EWT cases too:

morph!=/.*PronType.*/&lemma=/all|each|every/&upos=/PRON|DET/	none	#1:morph+=PronType=Tot
morph!=/.*PronType.*/&lemma=/some|any|half/&upos=/PRON|DET/	none	#1:morph+=PronType=Ind
morph!=/.*PronType.*/&lemma=/there|such/&upos=/PRON|DET/	none	#1:morph+=PronType=Dem
morph!=/.*PronType.*/&lemma=/no|another|both|either|an/&upos=/PRON|DET/	none	#1:morph+=PronType=Art
morph!=/.*PronType.*/&lemma=/and|to/&misc=/.*Typo.*/&upos=/DET/	none	#1:morph+=PronType=Art
morph!=/.*PronType.*/&xpos=/WDT/&upos=/PRON/	none	#1:morph+=PronType=Rel
lemma=/quite|.*self|.*selves/&upos=/PRON/&func=/det.*|obl:npmod|nmod:npmod/	none	#1:morph+=PronType=Emp
morph!=/.*PronType.*/&lemma=/.*self|.*selves/&upos=/PRON/	none	#1:morph+=PronType=Prs
morph!=/.*PronType.*/&xpos=/PRP.?/&upos=/PRON/	none	#1:morph+=PronType=Prs
morph!=/.*PronType.*/&func=/det/	none	#1:morph+=PronType=Art

Does this look reasonable?

@nschneid
Contributor Author

nschneid commented Nov 5, 2021

morph!=/.*PronType.*/&lemma=/no|another|both|either|an/&upos=/PRON|DET/	none	#1:morph+=PronType=Art

"a", not "an", for the lemma?

morph!=/.*PronType.*/&lemma=/and|to/&misc=/.*Typo.*/&upos=/DET/	none	#1:morph+=PronType=Art

I don't think this is necessary because the lemma should be corrected to "a" or "the". And Typo would be in morph, not misc, right?

EWT would also need a rule for a few demonstratives with typos.

@amir-zeldes
Contributor

Whoops, thanks for catching that! Yes, "a"; and in the other rule it should be morph, and the word form rather than the lemma. I kept that rule because the current GUM morphology was missing those cases even though the lemma was correct: it was cribbed from the corresponding CoreNLP code, which relied on word forms. So then we have:

morph!=/.*PronType.*/&lemma=/all|each|every/&upos=/PRON|DET/	none	#1:morph+=PronType=Tot
morph!=/.*PronType.*/&lemma=/some|any|half/&upos=/PRON|DET/	none	#1:morph+=PronType=Ind
morph!=/.*PronType.*/&lemma=/there|such/&upos=/PRON|DET/	none	#1:morph+=PronType=Dem
morph!=/.*PronType.*/&lemma=/no|another|both|either|a/&upos=/PRON|DET/	none	#1:morph+=PronType=Art
morph!=/.*PronType.*/&lemma=/and|to/&morph=/.*Typo.*/&upos=/DET/	none	#1:morph+=PronType=Art
morph!=/.*PronType.*/&xpos=/WDT/&upos=/PRON/	none	#1:morph+=PronType=Rel
lemma=/quite|.*self|.*selves/&upos=/PRON/&func=/det.*|obl:npmod|nmod:npmod/	none	#1:morph+=PronType=Emp
morph!=/.*PronType.*/&lemma=/.*self|.*selves/&upos=/PRON/	none	#1:morph+=PronType=Prs
morph!=/.*PronType.*/&xpos=/PRP.?/&upos=/PRON/	none	#1:morph+=PronType=Prs
morph!=/.*PronType.*/&func=/det/	none	#1:morph+=PronType=Art

I can add this to the GUM build bot (it will not overwrite any manually specified morph annotations).
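A minimal Python sketch of how the guarded rules cascade, assuming DepEdit regexes match whole field values and treating `func` as the DEPREL column. Because every modelled rule carries the `morph!=/.*PronType.*/` guard, running them in order is equivalent to "first matching rule wins", and a token that already has a PronType is never touched. Only the guarded rules are modelled (the Typo and Emp rules are left out), and `add_prontype` is a hypothetical name, not part of DepEdit.

```python
import re

# (field -> whole-value regex, PronType to add); order matters, first match wins.
RULES = [
    ({"lemma": r"all|each|every", "upos": r"PRON|DET"}, "Tot"),
    ({"lemma": r"some|any|half", "upos": r"PRON|DET"}, "Ind"),
    ({"lemma": r"there|such", "upos": r"PRON|DET"}, "Dem"),
    ({"lemma": r"no|another|both|either|a", "upos": r"PRON|DET"}, "Art"),
    ({"xpos": r"WDT", "upos": r"PRON"}, "Rel"),
    ({"lemma": r".*self|.*selves", "upos": r"PRON"}, "Prs"),
    ({"xpos": r"PRP.?", "upos": r"PRON"}, "Prs"),
    ({"deprel": r"det"}, "Art"),
]


def add_prontype(token):
    """Return the token's FEATS string with PronType added by the first matching rule.

    Tokens that already carry a PronType are returned unchanged, mirroring the guard.
    """
    feats = {} if token["feats"] == "_" else dict(
        f.split("=", 1) for f in token["feats"].split("|"))
    if "PronType" in feats:
        return token["feats"]
    for conds, value in RULES:
        if all(re.fullmatch(rx, token[field]) for field, rx in conds.items()):
            feats["PronType"] = value
            # UD orders features alphabetically, ignoring case
            ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
            return "|".join(f"{k}={v}" for k, v in ordered)
    return token["feats"]


tok = {"lemma": "there", "upos": "PRON", "xpos": "EX", "deprel": "expl", "feats": "_"}
print(add_prontype(tok))  # PronType=Dem
```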

@amir-zeldes
Contributor

Added Rcp type, resulting in:

# PronTypes 
morph!=/.*PronType.*/&lemma=/all|each|every/&upos=/PRON|DET/	none	#1:morph+=PronType=Tot
morph!=/.*PronType.*/&lemma=/some|any|half/&upos=/PRON|DET/	none	#1:morph+=PronType=Ind
morph!=/.*PronType.*/&lemma=/there|such/&upos=/PRON|DET/	none	#1:morph+=PronType=Dem
morph!=/.*PronType.*/&lemma=/no|another|both|either|a/&upos=/PRON|DET/	none	#1:morph+=PronType=Art
morph!=/.*PronType.*/&lemma=/and|to/&morph=/.*Typo.*/&upos=/DET/	none	#1:morph+=PronType=Art
morph!=/.*PronType.*/&xpos=/WDT/&upos=/PRON/	none	#1:morph+=PronType=Rel
lemma=/quite|.*self|.*selves/&upos=/PRON/&func=/det.*|nmod:npmod/	none	#1:morph+=PronType=Emp
morph!=/.*PronType.*/&lemma=/.*self|.*selves/&upos=/PRON/	none	#1:morph+=PronType=Prs
morph!=/.*PronType.*/&xpos=/PRP.?/&upos=/PRON/	none	#1:morph+=PronType=Prs
morph!=/.*PronType.*/&func=/det/	none	#1:morph+=PronType=Art
lemma=/each|one/;lemma=/(an)?other/&func=/fixed/	#1>#2	#1:morph+=PronType=Rcp
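The Rcp rule matches a head (`#1`, lemma "each" or "one") with a `fixed` dependent (`#2`, lemma "other" or "another") via `#1>#2`. A Python sketch of the same match on a simplified token list; the dict layout and `mark_reciprocals` are assumptions for illustration. Since the rule has no PronType guard, the sketch assumes it replaces any earlier value, such as the Tot that "each" would receive from the first rule.

```python
import re


def mark_reciprocals(tokens):
    """Set PronType=Rcp on 'each'/'one' heads that have a fixed 'other'/'another' child.

    tokens: list of dicts with int 'id' and 'head', plus 'lemma', 'deprel',
    and 'feats' as a dict (a simplified stand-in for CoNLL-U rows).
    """
    by_id = {t["id"]: t for t in tokens}
    for t in tokens:
        head = by_id.get(t["head"])  # None for the root (head 0)
        if (head is not None
                and t["deprel"] == "fixed"
                and re.fullmatch(r"(an)?other", t["lemma"])
                and re.fullmatch(r"each|one", head["lemma"])):
            # No PronType guard in the rule, so an earlier value (e.g. Tot) is replaced
            head["feats"]["PronType"] = "Rcp"
    return tokens


# "they help one another"
sent = [
    {"id": 1, "head": 2, "lemma": "they", "deprel": "nsubj", "feats": {"PronType": "Prs"}},
    {"id": 2, "head": 0, "lemma": "help", "deprel": "root", "feats": {}},
    {"id": 3, "head": 2, "lemma": "one", "deprel": "obj", "feats": {}},
    {"id": 4, "head": 3, "lemma": "another", "deprel": "fixed", "feats": {}},
]
mark_reciprocals(sent)
print(sent[2]["feats"])  # {'PronType': 'Rcp'}
```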

@nschneid
Contributor Author

nschneid commented Oct 15, 2022

Since we are listing DepEdit rules here:

; indefinite pronouns
lemma=/(some|any)(body|one|thing)/&upos=/PRON/	none	#1:morph+=PronType=Ind
lemma=/every(body|one|thing)/&upos=/PRON/	none	#1:morph+=PronType=Tot
lemma=/no(body|-one|thing)/&upos=/PRON/	none	#1:morph+=PronType=Neg
lemma=/no/&upos=/DET/;lemma=/one/&upos=/PRON/	#1.#2	#2:morph+=PronType=Neg

EDIT: Updated the PronTypes
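A Python sketch of the same four rules, assuming whole-lemma regex matching and reading DepEdit's `#1.#2` as "#1 immediately precedes #2". Tokens are simplified to (lemma, upos) pairs, and `compound_prontype` is a hypothetical helper.

```python
import re

# Whole-lemma regex -> PronType, mirroring the three single-token rules above
COMPOUND_RULES = [
    (r"(some|any)(body|one|thing)", "Ind"),
    (r"every(body|one|thing)", "Tot"),
    (r"no(body|-one|thing)", "Neg"),
]


def compound_prontype(tokens):
    """tokens: (lemma, upos) pairs in sentence order; returns a PronType or None per token."""
    out = [None] * len(tokens)
    for i, (lemma, upos) in enumerate(tokens):
        if upos != "PRON":
            continue
        for rx, value in COMPOUND_RULES:
            if re.fullmatch(rx, lemma):
                out[i] = value
                break
        # Two-token "no one": DET "no" immediately before PRON "one" -> Neg on "one"
        if lemma == "one" and i > 0 and tokens[i - 1] == ("no", "DET"):
            out[i] = "Neg"
    return out


print(compound_prontype([("no", "DET"), ("one", "PRON"), ("know", "VERB")]))
# [None, 'Neg', None]
```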

@amir-zeldes
Contributor

OK, but I think if "no-one" is spelled with a hyphen, we should tokenize it apart and analyze it the same as "no one". Also, I see you're using PronType=Neg for "nobody"; shouldn't "no one" get PronType=Neg as well, then?

@nschneid
Contributor Author

nschneid commented Oct 17, 2022

Yes, PronType=Neg for "no one", "nobody", etc.

As for the hyphenated one: EWT has just two tokens of "noone", a nonstandard spelling, so the lemma is "no-one". "No-one" might be tokenized in other corpora as 3 tokens, in which case the hyphen is irrelevant to the analysis.

@amir-zeldes
Contributor

I would probably tokenize those with SpaceAfter=No, CorrectSpaceAfter=Yes, but it's not crucial.
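A rough Python sketch of that retokenization, for basic dependencies only (no edeps, and multiword-token range lines are not handled). The row dicts, the XPOS values for the new tokens, the `det` relation for "no", and the `split_noone` name are all assumptions; the details taken from the thread are the DET "no" + PRON "one" analysis, PronType=Neg on "one", and SpaceAfter=No|CorrectSpaceAfter=Yes on "no". FEATS for "no" is left empty for the PronType rules to fill.

```python
def split_noone(rows, i):
    """Split rows[i] ("noone") into DET "no" + PRON "one"; basic dependencies only.

    rows: list of dicts with int 'id' and 'head' plus string 'form', 'lemma', 'upos',
    'xpos', 'feats', 'deprel', 'misc'. The other rows are renumbered in place.
    """
    old = rows[i]
    k = old["id"]

    def shift(h):
        if h == k:
            return k + 1  # dependents of "noone" now attach to "one"
        return h + 1 if h > k else h  # later tokens move right by one

    for r in rows:
        if r is not old:
            if r["id"] > k:
                r["id"] += 1
            r["head"] = shift(r["head"])

    no = dict(old, form="no", lemma="no", upos="DET", xpos="DT", feats="_",
              head=k + 1, deprel="det", misc="SpaceAfter=No|CorrectSpaceAfter=Yes")
    one = dict(old, id=k + 1, form="one", lemma="one", upos="PRON", xpos="NN",
               feats="Number=Sing|PronType=Neg", head=shift(old["head"]), misc="_")
    return rows[:i] + [no, one] + rows[i + 1:]


def row(i, form, upos, head, deprel, lemma=None):
    return {"id": i, "form": form, "lemma": lemma or form, "upos": upos, "xpos": "_",
            "feats": "_", "head": head, "deprel": deprel, "misc": "_"}


# Simplified "there 's noone to take ..."
sent = [row(1, "there", "PRON", 2, "expl"), row(2, "'s", "VERB", 0, "root", "be"),
        row(3, "noone", "PRON", 2, "nsubj", "no-one"), row(4, "to", "PART", 5, "mark"),
        row(5, "take", "VERB", 3, "acl")]
out = split_noone(sent, 2)
for r in out:
    print(r["id"], r["form"], r["head"], r["deprel"])
```

After the split, "one" takes over the old token's subject relation and its "acl" dependent, and "no" attaches to it.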

@nschneid
Contributor Author

I see the logic in that but I don't want to manually retokenize this very long sentence if I can help it :)

@amir-zeldes
Contributor

OK. Actually, I have a fork of the Arborator GUI that can do it; if you paste the CoNLL-U here, I can easily retokenize it.

@nschneid
Contributor Author

For edeps as well?

# sent_id = newsgroup-groups.google.com_alt.animals.badgers_172dcd8baf26948f_ENG_20040823_121900-0014
# text = When you blame it all on society, there's noone to to take responsibility and all of a sudden you have generation of fucked up kids who are likley smoking, drinking, doing drugs, fucking the neighbor or some internet perv just because you are too lazy to see waht they're doing.
1	When	when	SCONJ	WRB	PronType=Int	3	mark	3:mark	_
2	you	you	PRON	PRP	Case=Nom|Person=2|PronType=Prs	3	nsubj	3:nsubj	_
3	blame	blame	VERB	VBP	Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin	10	advcl	10:advcl:when	_
4	it	it	PRON	PRP	Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs	3	obj	3:obj	_
5	all	all	DET	DT	_	4	advmod	4:advmod	_
6	on	on	ADP	IN	_	7	case	7:case	_
7	society	society	NOUN	NN	Number=Sing	3	obl	3:obl:on	SpaceAfter=No
8	,	,	PUNCT	,	_	10	punct	10:punct	_
9-10	there's	_	_	_	_	_	_	_	_
9	there	there	PRON	EX	_	10	expl	10:expl	_
10	's	be	VERB	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	0	root	0:root	_
11	noone	no-one	PRON	NN	Number=Sing|PronType=Neg|Typo=Yes	10	nsubj	10:nsubj	CorrectForm=no-one
12	to	to	PART	TO	_	14	reparandum	14:reparandum	_
13	to	to	PART	TO	_	14	mark	14:mark	_
14	take	take	VERB	VB	VerbForm=Inf	11	acl	11:acl:to	_
15	responsibility	responsibility	NOUN	NN	Number=Sing	14	obj	14:obj	_
16	and	and	CCONJ	CC	_	22	cc	22:cc	_
17	all	all	ADV	RB	_	20	advmod	20:advmod	_
18	of	of	ADV	RB	_	20	advmod	20:advmod	_
19	a	a	ADV	RB	_	20	advmod	20:advmod	_
20	sudden	sudden	ADV	RB	_	22	advmod	22:advmod	_
21	you	you	PRON	PRP	Case=Nom|Person=2|PronType=Prs	22	nsubj	22:nsubj	_
22	have	have	VERB	VBP	Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin	10	conj	10:conj:and	_
23	generation	generation	NOUN	NN	Number=Sing	22	obj	22:obj	_
24	of	of	ADP	IN	_	27	case	27:case	_
25	fucked	fuck	VERB	VBN	Tense=Past|VerbForm=Part	27	amod	27:amod	_
26	up	up	ADP	RP	_	25	compound	25:compound	_
27	kids	kid	NOUN	NNS	Number=Plur	23	nmod	23:nmod:of|31:nsubj|33:nsubj|35:nsubj|38:nsubj	_
28	who	who	PRON	WP	PronType=Rel	31	nsubj	27:ref	_
29	are	be	AUX	VBP	Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin	31	aux	31:aux	_
30	likley	likely	ADV	RB	Typo=Yes	31	advmod	31:advmod	CorrectForm=likely
31	smoking	smoke	VERB	VBG	Tense=Pres|VerbForm=Part	27	acl:relcl	27:acl:relcl	SpaceAfter=No
32	,	,	PUNCT	,	_	33	punct	33:punct	_
33	drinking	drink	VERB	VBG	Tense=Pres|VerbForm=Part	31	conj	27:acl:relcl|31:conj	SpaceAfter=No
34	,	,	PUNCT	,	_	35	punct	35:punct	_
35	doing	do	VERB	VBG	Tense=Pres|VerbForm=Part	31	conj	27:acl:relcl|31:conj	_
36	drugs	drug	NOUN	NNS	Number=Plur	35	obj	35:obj	SpaceAfter=No
37	,	,	PUNCT	,	_	38	punct	38:punct	_
38	fucking	fuck	VERB	VBG	Tense=Pres|VerbForm=Part	31	conj	27:acl:relcl|31:conj	_
39	the	the	DET	DT	Definite=Def|PronType=Art	40	det	40:det	_
40	neighbor	neighbor	NOUN	NN	Number=Sing	38	obj	38:obj	_
41	or	or	CCONJ	CC	_	44	cc	44:cc	_
42	some	some	DET	DT	_	44	det	44:det	_
43	internet	internet	NOUN	NN	Number=Sing	44	compound	44:compound	_
44	perv	perv	NOUN	NN	Number=Sing	40	conj	38:obj|40:conj:or	_
45	just	just	ADV	RB	_	50	advmod	50:advmod	_
46	because	because	SCONJ	IN	_	50	mark	50:mark	_
47	you	you	PRON	PRP	Case=Nom|Person=2|PronType=Prs	50	nsubj	50:nsubj	_
48	are	be	AUX	VBP	Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin	50	cop	50:cop	_
49	too	too	ADV	RB	_	50	advmod	50:advmod	_
50	lazy	lazy	ADJ	JJ	Degree=Pos	31	advcl	31:advcl:because	_
51	to	to	PART	TO	_	52	mark	52:mark	_
52	see	see	VERB	VB	VerbForm=Inf	50	advcl	50:advcl:to	_
53	waht	what	PRON	WP	PronType=Int|Typo=Yes	56	obj	56:obj	CorrectForm=what
54-55	they're	_	_	_	_	_	_	_	_
54	they	they	PRON	PRP	Case=Nom|Number=Plur|Person=3|PronType=Prs	56	nsubj	56:nsubj	_
55	're	be	AUX	VBP	Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin	56	aux	56:aux	_
56	doing	do	VERB	VBG	Tense=Pres|VerbForm=Part	52	ccomp	52:ccomp	SpaceAfter=No
57	.	.	PUNCT	.	_	10	punct	10:punct	_

@amir-zeldes
Contributor

Hm, no, not for edeps... Maybe worth doing at some point; we just generate edeps on the fly for 99% of cases, so it hasn't come up :)

@nschneid
Contributor Author

nschneid commented Nov 6, 2022

Laura Michaelis (p.c.) mentioned that the -ever series of pro-forms (whoever, whatever, etc.) are indefinites. I think they should be given PronType=Ind,Rel or PronType=Ind,Int (depending on the use).

Details:

  • Should this exclude uses of however as a discourse connective (no PronType)?
  • Should adverbs introducing an adjunct clause be considered interrogative? "When(ever) somebody comes to the door, my dog barks."
  • The construction exemplified in "Whatever the reasons behind the duel (were), ...": CGEL says this is interrogative, not relative.

@amir-zeldes
Contributor

Currently in GUM these are Rel if dominated by a relcl parent, otherwise Int. Not saying that's correct though. I think there is an Int type in:

  • Whatever for?

The free relative kind could reasonably be Rel IMO. As for Ind, I guess if you have something like "eat a sandwich or whatever", that would be Ind. I don't think they should carry dual types, if that's what you mean by Ind,Int; I think it's either/or. (A regular "what" can be answered by an indefinite or a definite, and I wouldn't call it either, just Int.)

Finally, for the discourse-marker (DM) "however", I agree it should not have a PronType at all.
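A minimal sketch of the current GUM heuristic as described above, assuming "dominated by a relcl parent" means the word's head bears `acl:relcl`. The word list, the `is_discourse_marker` flag (how DM "however" would be detected is left open here), and `ever_prontype` are hypothetical.

```python
# Hypothetical list of -ever pro-forms
EVER_WORDS = {"whoever", "whomever", "whatever", "whichever",
              "whenever", "wherever", "however"}


def ever_prontype(lemma, head_deprel, is_discourse_marker=False):
    """Rel under an acl:relcl head, otherwise Int; DM "however" gets no PronType."""
    if lemma not in EVER_WORDS:
        return None
    if lemma == "however" and is_discourse_marker:
        return None
    return "Rel" if head_deprel == "acl:relcl" else "Int"


print(ever_prontype("whatever", "root"))  # Int
```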

@nschneid
Contributor Author

nschneid commented Nov 7, 2022

I think the point is that "whatever", as opposed to "what", is specifically indefinite, whether it functions as interrogative or relative.

@nschneid
Contributor Author

nschneid commented May 30, 2024

We should implement the PRON tag and PronType=Neg for "none" (and "naught" if it occurs). UniversalDependencies/docs#517 (comment)

I assume with no Number feature, because it is compatible with either singular or plural agreement? @amir-zeldes?

nschneid added a commit that referenced this issue Jun 22, 2024