Subjects of non-clausal deprels #1066
Comments
Thanks! The English-EWT results below: OK-looking
Fishy
Postponing changes until after the data freeze. P.S. Also glanced at English-GUM, and 2 of the 5 |
The following Grew request can be used to visualize the examples (with the clustering key
For instance: on UD_French-GSD (in this treebank, most of the cases are related to |
Thanks, Bruno. But this pattern does not include “acl:relcl” as a valid parent, which presumably it should (since this is a subtype of “acl”). So I think one either needs to use a regexp or add all subtypes of the clausal relations as well.
Joakim
From: Bruno Guillaume ***@***.***>
Reply to: UniversalDependencies/docs ***@***.***>
Date: Sunday, 10 November 2024 at 10:59
To: UniversalDependencies/docs ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [UniversalDependencies/docs] Subjects of non-clausal deprels (Issue #1066)
The following Grew request can be used to visualize the examples (with the clustering key e.label to cluster like in @martinpopel<https://github.com/martinpopel>'s command):
pattern {
e: X -[1<>csubj|ccomp|xcomp|advcl|acl|root|list|parataxis|conj|appos]-> Y;
Y -[1=nsubj|csubj]-> Z
}
For instance: on UD_French-GSD<https://universal.grew.fr/?custom=67308f7e9416e> (in this treebank, most of the cases<https://universal.grew.fr/?custom=673091450c9d1> are related to ExtPos=PROPN as @nschneid<https://github.com/nschneid> explained above).
|
I had a quick look at the examples in Swedish-PUD and the majority were indeed annotation errors, so thanks for finding those. The one valid annotation involved a song title, similar to some of Nathan’s examples. I will follow up with the larger Swedish-Talbanken when I have the time.
Joakim
From: Nathan Schneider ***@***.***>
Reply to: UniversalDependencies/docs ***@***.***>
Date: Sunday, 10 November 2024 at 01:03
To: UniversalDependencies/docs ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [UniversalDependencies/docs] Subjects of non-clausal deprels (Issue #1066)
Thanks! The English-EWT results below:
OK-looking
* compound: quotative compounds<https://universal.grew.fr/?custom=673000f0a09d3>, e.g. a "You've been yelped!" card. Maybe ExtPos would help here.
* discourse: idioms like "I mean" and "I guess"<https://universal.grew.fr/?custom=6730014b3801f>, e.g. Well, I mean, you don't have to be mad about it.
* reparandum for false starts (I think there are no constraints on the form of a reparandum)
* orphan for predicate ellipsis e.g. it took another 20 mins to get our orders and <took> a further 45 mins till our starters landed on our table.
* obj for an art title that is a clause coerced to a nominal (St. George Fighting the Dragon). This should be solved with the validator recognizing ExtPos=PROPN
* nmod for another art title (the quality of That's Amore), also warranting ExtPos=PROPN
Fishy
* amod for one instance: In January 1998 Fabio Tollis and Chiara Marino, both just 16, disappeared. The post-nominal age construction is a weird case. If it were just an age amod might be plausible, but I think this one has to be acl if read as both [of them] just 16. I'll change it.
* dislocated for one instance with an attachment error. I'll fix it.
* obl for one instance where the tree looks like a mix of a free relative analysis and an interrogative clause analysis: regardless of how bad their day may have been. I think the interrogative reading is better so I'll change to advcl(regardless, bad) (yuck because this is really a complement of "regardless", so the name "advcl" is misleading)
|
The pattern I gave in the previous message takes into account sub relations as well. I used the feature structure representation of the edge label (see Grew doc):
But I see that in treebanks with enhanced dependencies like UD_Swedish-PUD, the pattern should be updated like this to avoid reporting enhanced relations:
|
Thanks, Bruno. I realized I had misunderstood something. :) But I did note edeps, so thanks for fixing that.
Joakim
From: Bruno Guillaume ***@***.***>
Sent: Sunday, November 10, 2024 1:42:36 PM
To: UniversalDependencies/docs ***@***.***>
Cc: Joakim Nivre ***@***.***>; Comment ***@***.***>
Subject: Re: [UniversalDependencies/docs] Subjects of non-clausal deprels (Issue #1066)
But this pattern does not include “acl:relcl” as a valid parent, which presumably it should (since this is a subtype of “acl”). So I think one either needs to use a regexp or add all subtypes of the clausal relations as well.
The pattern I gave in the previous message takes into account sub relations as well. I used the feature structure representation of the edge label (see Grew doc<https://grew.fr/doc/graph/#edges>):
pattern { X -[1=acl]-> Y } means that the main relation is acl whether a subrelation is present or not.
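To make the point about subrelations concrete, here is a minimal Python sketch of the same idea applied to plain deprel strings. It is not how Grew works internally; the helper names `main_relation` and `is_clausal` are invented for this illustration, and the set of clausal relations is the one listed in the original post.

```python
# Illustrative only: mimics the effect of Grew's "1=acl" test on plain strings.
# main_relation and is_clausal are hypothetical helper names, not a Grew API.

CLAUSAL = {"csubj", "ccomp", "xcomp", "advcl", "acl"}


def main_relation(deprel: str) -> str:
    """Return the part before the first colon, e.g. 'acl:relcl' -> 'acl'."""
    return deprel.split(":", 1)[0]


def is_clausal(deprel: str) -> bool:
    """True if the main relation is clausal, with or without a subtype."""
    return main_relation(deprel) in CLAUSAL


print(is_clausal("acl:relcl"))  # True
print(is_clausal("acl"))        # True
print(is_clausal("amod"))       # False
```

Comparing only the main relation is what lets acl:relcl count as acl without enumerating every subtype.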
But I see that in treebanks with enhanced dependencies like UD_Swedish-PUD, the pattern should be updated like this<https://universal.grew.fr/?custom=6730b775dac0d> to avoid reporting enhanced relations:
pattern {
e: X -[1<>csubj|ccomp|xcomp|advcl|acl|root|list|parataxis|conj|appos, !enhanced]-> Y;
Y -[1=nsubj|csubj]-> Z
}
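For readers without access to Grew, the same check can be approximated in a few lines of plain Python over CoNLL-U text. This is only a sketch under stated assumptions: the function name `suspicious_subjects` is invented here, the allowed-parent list is copied from the pattern above, and the toy sentence is hand-made for the demonstration (it is loosely modelled on the "both just 16" case and is not the actual EWT annotation). Reading only the HEAD and DEPREL columns and ignoring DEPS plays roughly the role of `!enhanced`.

```python
# Sketch: flag nsubj/csubj dependents whose parent has a non-clausal deprel.
# Only basic dependencies (columns 7 and 8) are read; DEPS (column 9) is ignored.

ALLOWED_PARENTS = {"csubj", "ccomp", "xcomp", "advcl", "acl",
                   "root", "list", "parataxis", "conj", "appos"}
SUBJECTS = {"nsubj", "csubj"}


def main_rel(deprel):
    """Strip any subtype: 'nsubj:pass' -> 'nsubj'."""
    return deprel.split(":", 1)[0]


def suspicious_subjects(conllu_sentence):
    """Return (parent_form, parent_deprel, subject_form) triples for one sentence."""
    tokens = {}
    for line in conllu_sentence.splitlines():
        cols = line.split("\t")
        # Keep regular word lines only; this skips comments, blank lines,
        # multiword ranges such as "1-2" and empty nodes such as "1.1".
        if len(cols) == 10 and cols[0].isdigit():
            tokens[cols[0]] = cols
    hits = []
    for cols in tokens.values():
        if main_rel(cols[7]) not in SUBJECTS:
            continue
        parent = tokens.get(cols[6])
        if parent is not None and main_rel(parent[7]) not in ALLOWED_PARENTS:
            hits.append((parent[1], parent[7], cols[1]))
    return hits


# Hand-made toy fragment for illustration, NOT copied from any treebank.
rows = [
    ("1", "Marino", "Marino", "PROPN", "_", "_", "5", "nsubj", "_", "_"),
    ("2", "both", "both", "PRON", "_", "_", "4", "nsubj", "_", "_"),
    ("3", "just", "just", "ADV", "_", "_", "4", "advmod", "_", "_"),
    ("4", "16", "16", "NUM", "_", "_", "1", "amod", "_", "_"),
    ("5", "disappeared", "disappear", "VERB", "_", "_", "0", "root", "_", "_"),
]
toy = "\n".join("\t".join(r) for r in rows)
print(suspicious_subjects(toy))  # [('16', 'amod', 'both')]
```

Whether dislocated, orphan, discourse or reparandum should be added to `ALLOWED_PARENTS` is exactly the open question of this issue, so the set is kept as a single editable constant.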
|
Most of the cases flagged in Ancient Greek-PTNK are non-finite clauses with determiners. I'm not sure whether The rest are errors that I'll fix after the data freeze. |
I started by giving a look to these structures in Latin UDante. Most fall under the label of Then we have some genuine errors in two |
Updates for other Latin treebanks with which I have experience:
I also gave a look at
|
Just a couple of comments and cents.
It is similar to what happens in Latin, but I guess you are referring to the article before infinitives, especially. Then I would suggest going for
This is exactly the same case as the proverbs I mentioned above. I just notice that we use |
In English a "You've been yelped!" card would be considered a compound, as the quoted phrase premodifies a noun representing the main meaning of the nominal (it is a kind of card). the king Pelayo: not |
Premodification cannot be the only factor here, though. Apparently English has constraints on it, but in Latin it comes mostly after, though it could also come before. But it seems to me we are talking about exactly the same phenomenon.
Trying to reason about that: king cannot be Then why not Anyway, I think that with respect to @martinpopel 's request, |
appos guidelines: "appos is intended to be used between two nominals. In general, modulo punctuation, the two halves of an apposition can be switched." This works (at least in English) for "the king Pelayo" ~ "Pelayo, the king". Either part of the apposition could stand on its own as a full nominal, which distinguishes it from |
Then here king should be the head and the name depend on it as There are some issues with the notion of swapping, because if, as it seems to imply, this means that either could equally be the head with no change in meaning (or better, reference?), why does this not lead us to the headless This is why I am reticent to use a hierarchical relation such as In any case, the problem for the original issue ( So in one of our Latin examples iuxta illud (...) : «Omne regnum in se divisum desolabitur» 'beside that one [saying]: "Every divided kingdom will be forsaken"' where illud 'that one' and the head of the saying desolabitur 'will be forsaken' (which has regnum 'kingdom' as subject) are joined by |
I started going over the Dutch data, and indeed it is a good heuristic for finding conversion/annotation errors! (grew match query Wat vandaag gebeurde, was de ontlading van iets dat al dagen was opgekropt The underlying annotation marks these as wh-relatives (headless relatives) and the conversion treats them as regular nsubj subjects. One might think this is correct, as the phrase wat vandaag gebeurde can be paraphrased as the thing/event that happened today. Also, the distribution of headless relatives is identical to that of regular nominal phrases. Clausal subjects, on the other hand, are subordinate clauses like that this happened today or indirect questions like how this happened, and not all verbs can take these as subjects. However, while the documentation of csubj has a 'proper' subordinate clause and an indirect question as examples, the documentation of complex clauses contains examples for csubj that are headless relatives: what she said makes sense. Also, English EWT has free relatives as csubj (approximate query) Finally, there is a discussion page on relative clauses that addresses free relatives. Unfortunately, it has no examples where they are subjects, but the proposed analysis treats the relative pronoun as head, and assigns the external relation obj to them in cases such as you can eat what you want, which also suggests they are nominal and could be nsubj. So should headless relatives be csubj (as the documentation implicitly suggests) or nsubj (as their semantics and distribution, as well as the discussion page, would suggest)? _ |
I understand perfectly your concerns, since we also had quite some problems wrapping our head around these pesky "free relatives". Since, at least from the current formal point of view, they should be Documentation here, here and here (mostly for latin, currently; please note that here |
Yes, that is definitely the intended analysis in the English TBs. If there are exceptions to that in EWT then they must be conversion errors, and I see there are only 10 hits for your query there. You can also check in GUM which is manually annotated in UD, so it shouldn't have such cases - it looks like your query there produces non-free relative examples of various unusual constructions.
|
This page mostly written in 2022–2023 and discussed by the Core Group has the most complete documentation of how we (in principle) handle relative constructions in English. (The typologically based documentation of relative clauses is still a work in progress.) For free relatives with a WH-pronoun, the head is the relative pronoun, and the predicate of the clause attaches to the relative word as Some caveats:
|
Thanks for pointing this out. I updated the examples to ones that cannot be construed as free relatives. |
Yes, in the subj/obj position, our guidelines basically accept any case where the WH pronoun can be replaced by a split expression (e.g. "that which") as a free relative. We do accept the interrogative interpretation otherwise, for example with "wonder", compare:
For subject, "whether" clauses are csubj, while headless "what/who" follow the same guidelines:
And similarly for "what". |
Noting that this is marginal/archaic with "who" (I don't see any valid matches in EWT or GUM). "Whoever" is preferable here. For verbs like "know" with flexible complementation, I would treat the split expression test with some caution. "I know what horse you rode" (of all the horses, I can pinpoint the one you rode) probably doesn't correspond to "I know that horse which you rode" (meaning, I am personally acquainted with the horse). I'd go with |
Right, this is the "who steals my purse steals trash" type, but it works fine with "what", so I'd expect the same analysis:
That's a different construction, in which the WH pronoun is a determiner, rather than the head of the phrase, so it's not quite the same, and we can't directly apply the splitting test, though we can split it around "horse" with "I know that horse which you rode" as you pointed out, but also "I know the horse which you rode", which doesn't need to mean 'personally'. In any case, the "thing known" evaluates to an NP, not a clause, regardless of the sense of know, so I would go with a free relative analysis across the board. If there is some kind of difference here, I think it's subtle enough that I wouldn't expect annotators to make it reliably. This is different with "wonder", which has a simple test to rule out the relative analysis. |
Looks like the PTNK data also has some
"the one who can interpret, he is not (here)" I currently have |
@mr-martian I don't think there's dislocation here, and I'm not sure there's a relative clause (it's just a nominalized participle, no?). Either way I think the predicate should be ἔστιν, in the strong sense of 'exist', since we don't have an overt location expression or anything (if you want we can also think of it as a promoted copula with an unexpressed location). I'd do: root(ἔστιν) As for αὐτό, I don't think it means "he", since it's neuter - I think it's a back reference to Pharaoh's dream (this is Gen. 41, right?), which is neuter, so it's an object of the participle: "the one who can interpret it isn't (here)". |
@amir-zeldes you're right - I completely misread that pronoun. Nominal participle covers one of the other two cases, leaving only Genesis 15:4:
15 ἀλλ᾿ ἀλλά CCONJ _ _ 21 cc _ Gloss=but
16 ὃς ὅς PRON _ Case=Nom|Gender=Masc|Number=Sing|PronType=Rel 17 nsubj _ Gloss=who,which,that,what
17 ἐξελεύσεται ἐξέρχομαι VERB _ Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin|Voice=Mid 21 dislocated:relcl _ Gloss=to-come/go-out
18 ἐκ ἐκ ADP _ _ 19 case _ Gloss=out,out-of
19 σου σύ PRON _ Case=Gen|Number=Sing|Person=2|PronType=Prs 17 obl _ Gloss=you,your
20 οὗτος οὗτος PRON _ Case=Nom|Gender=Masc|Number=Sing|PronType=Dem 21 nsubj _ Gloss=this
21 κληρονομήσει κληρονομέω VERB _ Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin|Voice=Act 11 conj _ Gloss=to-inherit
22 σε σέ PRON _ Case=Acc|Number=Sing|Person=2|PronType=Prs 21 obj _ Gloss=you|SpaceAfter=No
23 . . PUNCT _ _ 22 punct _ _ |
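As a small worked check on the Genesis 15:4 rows above, the following Python sketch reads tokens 16 to 21 (copied from the fragment, whitespace-separated as they appear here) and reports which subject hangs under a non-clausal parent. The allowed-parent set is the one from the original post; the variable names are invented for this example.

```python
# Sketch: apply the "subject under a non-clausal deprel" test to tokens 16-21
# of the Genesis 15:4 fragment. Columns are whitespace-separated as in the post.

ALLOWED_PARENTS = {"csubj", "ccomp", "xcomp", "advcl", "acl",
                   "root", "list", "parataxis", "conj", "appos"}

FRAGMENT = """\
16 ὃς ὅς PRON _ Case=Nom|Gender=Masc|Number=Sing|PronType=Rel 17 nsubj _ Gloss=who,which,that,what
17 ἐξελεύσεται ἐξέρχομαι VERB _ Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin|Voice=Mid 21 dislocated:relcl _ Gloss=to-come/go-out
18 ἐκ ἐκ ADP _ _ 19 case _ Gloss=out,out-of
19 σου σύ PRON _ Case=Gen|Number=Sing|Person=2|PronType=Prs 17 obl _ Gloss=you,your
20 οὗτος οὗτος PRON _ Case=Nom|Gender=Masc|Number=Sing|PronType=Dem 21 nsubj _ Gloss=this
21 κληρονομήσει κληρονομέω VERB _ Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin|Voice=Act 11 conj _ Gloss=to-inherit
"""

# Index the rows by token ID; every row here has exactly ten fields.
rows = {f[0]: f for f in (line.split() for line in FRAGMENT.splitlines())}

flagged = []
for f in rows.values():
    if f[7].split(":", 1)[0] in {"nsubj", "csubj"}:
        parent = rows.get(f[6])  # None if the head lies outside the fragment
        if parent and parent[7].split(":", 1)[0] not in ALLOWED_PARENTS:
            flagged.append((f[0], parent[0], parent[7]))
            print(f"{f[1]} is {f[7]} of {parent[1]}, which is {parent[7]}")

print(flagged)  # [('16', '17', 'dislocated:relcl')]
```

Token 20 is not reported because its parent is conj, which is on the allowed list; token 16 is reported because the main relation of its parent is dislocated.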
Yeah, this example would be seen as a free relative in the English guidelines. If you want to use the free relative analysis, it would be: dislocated:nsubj(κληρονομήσει,ὃς) |
Let's remark that this is the way free relatives are done in English treebanks, not in general. And it is a rather questionable way. That said,
Independently of the reading of αὐτό, I would strongly suggest keeping ἔστιν as a functional dependent. But was this sentence part of the problems of the original post? |
The original annotation of this sentence had I'm not entirely sure what annotation you're suggesting as an alternative. |
If that συγκρίνων were fronted, to use dislocated. But if the sentence is "there is no interpret...", then have it head of copula ἔστιν. Not sure about the αὐτό. |
I don't really see another way of analyzing it except as the main verb - the relevant clause reads:
So word-for-word in English more or less:
Or a bit more freely to match English word order:
If there were a locative adverbial for ἔστιν (like "one interpreting it is not here"), I could see keeping the verb as a copula, with the adverbial as the head. But since there is no such adverbial, we are left with just the verb - whether you want to think of it as promotion, or simply acknowledge that "be" can have a strong sense meaning "exist", which is a VERB and not AUX, I don't see an alternative to making that word be the root of the clause.
Are you saying that ἔστιν should be a child of συγκρίνων? This doesn't make sense to me because συγκρίνων is the subject of ἔστιν, and not a predicate. |
I do not see why there should not be an "alternative". First, it is very strange to annotate a verb that is otherwise always a copula as a main verb, purely because a locative element happens to be absent in context. The simplest explanation is that, possibly contrary to English, Italian, and others, Greek (and Latin, by the way) does not need such an element for an existential statement (by the way, I suppose that in a sentence like there is no interpreter, there still is not the head despite being a locative element, right?). The verb ἔστιν is really the same in all its occurrences. Supposing it has a "stronger" meaning is just overinterpretation, especially because there do exist words with a real existential meaning, e.g. ὑπάρχω. But not ἔστιν: it is simply the minimal, somewhat underdefined "tool" available to the language in this case just to state that there is no interpreter. The "negative interpreter" is the centerpiece of this clause.
Yes. This is a predication of the interpreter being or not being there, not about him/her. Other languages would use slightly different constructions with some kind of expletive element, but it seems Greek is content with ἔστιν alone (word order might be a factor to be investigated, though). I don't think there is a real subject in such statements. |
No, I'm sorry but that's simply not correct. συγκρίνων is in fact the subject of ἔστιν here, for several reasons:
Those are the canonical UD guidelines for such situations, and in fact there are many situations in which no other choice exists. Consider an exchange like this:
In the second sentence, if we want to keep "I" as the subject (which again, it unambiguously is), then we have no choice but to promote what is otherwise definitely a copula to be the root - even if you don't believe in the distinction between a copular and existential sense of "be" in Greek. |
Admittedly there might still be something to investigate and define about these constructions, but I think it ultimately boils down to not forcing annotations on languages based on other languages requiring a slightly different set of elements. First to sort out
Of course here I is subject and am is the promoted head of an ellipsis. We have a context and this is a clear case of ellipsis. By the way, English is a language which can do that, while another like Italian cannot, it has to put a pronominal element: Lo sono lit. 'it I-am' (subject is not explicit). Saying sono io lit. 'am-I I' would be different, and interestingly would be expressed as it's me in English, which brings us closer to the Greek sentence we have at hand... where no elements whatsoever speak of an ellipsis.
This is an interesting fact. I see it as a case of "attraction": when there is just one element in the predication, the indexing on the verbal element tends to coincide. But on the other side, this is a third person, and as a possible "default" agreement it does not warrant too many conclusions (the number of the form of συγκρίνω is not a factor here). Cross-linguistically it is interesting to notice that existential constructions veer towards impersonality: in modern Greek we would always have a third-person singular έχει lit. '(it) has', in German es gibt lit. '(it) gives', in French il y a (also for plural elements), in Italian regionally and dialectally you observe a tendency to use c'è 'there is' even with plural referents, and so on. But Greek (and Latin) do not help themselves with expletive elements (it, es, y, ci, there) or a transitive construction, and make an agreement instead. I again stress that word order might be relevant to distinguish between existential and other constructions. So the famous expression hic sunt leones in Latin means here, there are lions and not '(the) lions are here', which would probably be leones hic sunt / sunt hic. This is to say that even if we have a locative adverbial, it might still not be the nonverbal predicate.
I do not think that any paraphrasable sense or translations can be used as an argument here. For example, one Italian version is ... e nessuno sa interpretarlo '... and no-one knows how to interpret it'. Hm. The predicate is totally different. So it just seems that that Greek version has chosen an existential construction, and this is what we have at hand. We are not interested in Biblical exegesis.
Not really. The guidelines do not give an ultimate choice for existential constructions. If we are specifically referring to this page, as I am assuming, nothing is really said about them. It actually seems to speak in favour of keeping a |
For other texts, sure, but in this case I disagree - the Septuagint translators tried pretty hard to stay faithful to the Hebrew text (at times creating very unnatural calques in Greek), and they certainly knew Biblical Hebrew and Greek better than any of us, so if something is ambiguous between a reading that mirrors the original and something that does not, I think assuming parity with the source material is a good tie breaker. Incidentally, your Italian example also has a phrase representing the interpreting person as the subject, though the distribution of the lexical material and negation is different of course ('interpretation' as a lexeme is realized in the verb, and negation is part of the subject NP, not a verbal negator). So just to be clear, am I right to say my position is 1 and yours is 2?
Is that right? If so, then I don't think I have any other arguments to add, and maybe other people who know Ancient Greek can tell us which one they think is right. |
But I do not see any ambiguity in this structure. I might say my point is point 2, but I do not agree on "predicating about an empty subject". Predications do not necessarily need subjects. In logical terms, I might try to express this as "NOT interpreter" (but my logic notation is all over the place). |
I would expect that nsubj and csubj can appear only as children of a predicate, i.e. a word with one of the "clausal deprels". According to the guidelines, clausal deprels are csubj, ccomp, xcomp, advcl and acl. Obviously root is also OK and in some cases also list, parataxis, conj and appos.
However, when I checked this hypothesis, I found many counterexamples. Many of those seem to be annotation errors, e.g. amod instead of acl.
I could add this as another test into ud.MarkBugs (and maybe later also to the validator), but it would be nice to first reach an agreement that these are actually errors (against the current UD guidelines). Maybe there are some more deprels that should be allowed as parents of subjects (dislocated, orphan, discourse, reparandum).
I attach results.txt which I've generated using the following command: