Criteria for compound in English nominals #757
Comments
Do we want non-pluralization of the modifier to be a firm criterion for
Consider also
which is a bit like saying "he nominated the defense and treasury secretaries", where we conclude semantically that there are 2 distinct entities in distributive coordination, whereas "they installed enter and exit signs" could mean 2 varieties of signs but more than 2 individual signs. |
As for all points under 7, they seem clear cases of my brother Sam is not really different from President Obama. The key here is that all elements are coreferential, whereas in phone book or dog tail they are not (i.e. there is not any kind of identity between phone and book, or dog and tail). On the contrary, the king himself sees a I do not see the third and fourth points of 4 as decisive. Something similar happens in (obsolete) German, where the genitive article of a modifier "blocks" the article of the head:
It can only be an As for *the dogs toy: compare Italian
The presence or absence of an article is telling us something different: generic vs. specific. That's why either modifier stays invariable independently of coda's ('tail') number or article. This is left more "implicit" in English, but it is the same structure. |
A couple of things that occur to me:
That would be bad for Semitic languages, in which compound modifiers can be pluralized (with corresponding difference in meaning), while multiple articles are prohibited (so only the head can take a determiner, unlike nmod/obl)
No, I don't think so. Historically the reason for the lack of plural modifiers in Germanic compounds is that the modifier is not a complete word, but just an uninflected stem, similar to the Greek modifiers in -o, Sanskrit in -a, etc. (cf. Greco-Roman, where "Greco" is not an inflectible, full adjective form). In languages that do allow compounding with plural modifiers, genericity is not necessarily implied by pluralization, and vice versa. Here are some examples from Hebrew:
These examples are annotated as compounds in UD Hebrew and the same is done in UD Arabic. In those languages, the main distinguishing property of compounding is the use of a single article for the entire nominal construction, which is placed between the head and modifier.
These are all indeed nmod, but in German the compound form would be spelled together: "das Frauengesicht", so in German UD almost never needs to use the compound relation for nominal compounds. |
Note the title of this issue mentions English. I think it is inevitable that |
Oh, yes, good point! So yes, I think ordinarily English noun compounds do not allow pluralization of the modifier, but there are occasional exceptions, usually where you would say the canonical form would have had singular (esp. for irregular plurals, e.g. "mice shit" - from GUM!), for plurale tantum nouns we conventionally tag as NNS (e.g. "data") or cases that are conventionally pluralized ("special ops mission"). But mostly there is a strong tendency for the modifier to appear as 'singular' without necessarily meaning something singular. |
I will try to explain myself better. In general, I think we could see the gradual loss of morphological markings (including some determiners like articles in a wider sense) as a way to express a shift towards a more generic reference: for example, in Greek, as you mention, we see a different behaviour between ξυλόφωνο 'xylophone', with ξυλ-ο instead of the genitive ξύλου, and Κονσταντινούπολη 'Constantinople', with the "regular" genitive of Κονσταντίνος 'Constantine'. It is clear that the former refers to generic wood or wooden sticks or similar, while the latter refers to a very specific Constantine.
It is a possible strategy that some languages have and it might be a general tendency, but, as it mixes morphological with semantic considerations, this might not mean all languages use it the same way or use it at all. The Hebrew sentences might be showing us this. In the end, it is exactly the goal to take into account this variety of approaches that I see as the major point for favouring a transversal
For example, I know there has been (is?) discussion about whether (German) compounds such as Frauengesicht should be split or not for syntactic annotation, and I am quite sure something in the way of splitting has been done for Sanskrit, but I don't know the exact details thereof (note: we could actually take these -o-, -en-, etc. segments as a kind of "compounding inflection", and if I am not mistaken something similar has been done for Sanskrit). But if Frauengesicht were indeed to be split, should it not use
I mean: does
Sorry for the long posts! And please pardon me if you think I am going overboard. |
I think the rationale for |
@nschneid again you are right, sorry for hijacking it! Let's start a new issue! 🙂 |
Just commenting here about the issue in UniversalDependencies/UD_English-EWT#133. What is the guideline for names of people and organizations with function words, like “Universidade Federal do Rio de Janeiro”? I feel like annotating the contraction “de+o” as flat rather than as case and det confuses parsers! Does that make sense? But I can understand the annotation of “Roberto da Silva Júnior” as flat(Silva, Roberto) with det(a,Silva) and case(de,Silva) |
@arademaker in English or in Portuguese? Per https://universaldependencies.org/u/dep/flat.html, foreign names should be If the text is in Portuguese, I take it “Universidade Federal do Rio de Janeiro” uses regular syntax for the name, so it wouldn't need to be |
Portuguese. But what about names like the one above, “Roberto da Silva Júnior”? For the organization name, I can see a complete syntactic analysis without flat. But for names like this, I need flat to connect Silva to Roberto. But I don’t like “de” and “a” as flat ... |
So I think flat(Roberto, Júnior), nmod(Roberto, Silva), case(Silva, da) would technically work, essentially saying the flat expression is "Roberto Júnior" and it has a modifier "da Silva". If a non-initial noun in the name had a PP modifier, that would be a problem. Does it arise in Portuguese? |
Oh, I just noticed my mistake in the syntax of the examples above. Thank you; so if I get it right, because “da” introduces “da Silva”, you make it nmod of “Roberto”. For this particular case, this avoids the problem of issue UniversalDependencies/UD_English-EWT#133, since Roberto is the head of the flat structure. But I can also have names where “da Silva” does not modify the first name: “Roberto Paulo da Silva Júnior”. Sorry, this example precisely answers your last question! Yes, we have these cases in Portuguese. |
So it should be Roberto [Paulo da Silva] Júnior? Yeah if the PP is analyzed I think this would break the idealized notion of a flat structure under the current guidelines. Maybe the rule should be relaxed to say that flat dependents usually do not have any dependents of their own, but this would be an exception mixing linear order flat (Roberto + Paulo + Júnior) with an internal PP modifier. |
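To make the two analyses above concrete, here is a minimal Python sketch that writes them as head/relation/dependent arcs and reports which flat dependents have dependents of their own. The names `Arc` and `flat_dependents_with_children` are hypothetical and not part of any UD tool. Using word forms as node identifiers is a simplification that only works because no token repeats within a name. The attachment of "da Silva" to Paulo is just the bracketing suggested in this comment, not a settled analysis.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """One dependency arc; hypothetical helper type for illustration."""
    head: str
    deprel: str
    dep: str


# "Roberto da Silva Júnior": flat expression "Roberto Júnior" plus the modifier "da Silva"
analysis_short = [
    Arc("Roberto", "flat", "Júnior"),
    Arc("Roberto", "nmod", "Silva"),
    Arc("Silva", "case", "da"),
]

# "Roberto [Paulo da Silva] Júnior": the PP modifies a non-initial part of the name
analysis_long = [
    Arc("Roberto", "flat", "Paulo"),
    Arc("Roberto", "flat", "Júnior"),
    Arc("Paulo", "nmod", "Silva"),
    Arc("Silva", "case", "da"),
]


def flat_dependents_with_children(arcs):
    """Return the flat dependents that themselves head at least one arc."""
    flat_deps = {arc.dep for arc in arcs if arc.deprel == "flat"}
    return sorted({arc.head for arc in arcs if arc.head in flat_deps})


print(flat_dependents_with_children(analysis_short))
print(flat_dependents_with_children(analysis_long))
```

Under these assumptions the first analysis leaves every flat dependent childless, while the second flags Paulo, which is the case that would need the relaxed rule.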
What are the phrases in “Roberto da Silva Júnior”? Clearly "da Silva" is a phrase, so case(Silva, da). But "da Silva" is a family name that works exactly as Rodriguez in "Roberto Rodriguez". If we have flat(Roberto, Rodriguez), we must have flat(Roberto, Silva). |
This is a fast-moving discussion. :) Apparently there was a consensus that this restriction on flat is too strict; it should allow internal modification. So we can use |
As for Júnior, I would prefer to treat it as some sort of |
Please see #543 for proposed new guidelines on |
"Ludwig van Beethoven" should be analyzed like "Roberto da Silva". Whatever the language where it appears, "van Beethoven" is analyzed as a family name, so we must have a flat relation between Ludwig and "van Beethoven".
This is justified because the internal structure is (Ludwig)(van Beethoven) and both relations are flat. |
@sylvainkahane Linguistically speaking I see the logic there—each subunit in the name would be separate—but since the |
Even setting aside the subtypes, we have to balance the extra expressivity that would afford a very precise analysis (such as making "van Beethoven" a subunit) vs. a simple and enforceable rule that will prevent errors like a chained |
The only problem that I can anticipate is that different annotators would have different intuitions about the internal structures of the names: “Roberto [Paulo da Silva] Júnior” vs “[Roberto Paulo] da Silva] Júnior” vs ... and the decision is outside any syntactic theory... |
Of course “Roberto Paulo da Silva Júnior” is syntactically ambiguous out of context, like many sentences ("I saw the man with a telescope" and so on). But the annotator must decide which analysis is most probable given the context. |
Many of the issues we've been wrestling with lately for English revolve around the somewhat vague definition of compound in noun phrases and how to delineate it from relations like amod (though this may be resolved in #756), nmod, appos, and flat.
I want to use this as a meta-thread to try to document the criteria that have been proposed and examples that are difficult to resolve. Let's debate specific details elsewhere.
By current guidelines, compound requires a distinct head, as opposed to flat and appos which are by definition left-to-right. However, whether an expression has a distinct head is not always obvious.
amod, even if the expression falls under the broader traditional definition of compound (e.g. "hot dog", pronounced with stress on the first word).
compound and nmod for English:
nmod usually has prepositional or possessive case marking: the dog's tail, the tail of the dog; compound does not: the dog tail
nmod:tmod may be an exception to this requirement (e.g. Syntax for "you guys" amir-zeldes/gum#71)
nmod can be pluralized: the dogs' toy; compound modifiers ordinarily cannot: *the dogs toy
nmod can have its own determiner: the tail of a dog; but possessive nmod and compound modifiers cannot: *the a dog's tail; *the a dog tail
appos applies for two nominals that are adjacent and reversible modulo punctuation:
flat, though the first part is arguably a modifier of the second (le président Macron and President Trump: flat? #503, compound/flat inconsistency UD_English-EWT#59, Syntax for "you guys" amir-zeldes/gum#71):
compound:nn
appos examples: my brother Sam's dog, *my brother, Sam's, dog; my brother Sam's here, *my brother, Sam's, here. Should this be ignored as a simple matter of punctuation?
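The three compound-versus-nmod criteria listed in this issue (case marking, pluralization, own determiner) can be read as a rough decision procedure. The Python sketch below is only one possible encoding, not a guideline: `Modifier` and `suggest_relation` are hypothetical names, the feature values are assumed to be supplied by the annotator, and the nmod:tmod exception mentioned above is not modeled. A plural modifier without case marking is reported as "unclear" rather than ruled out, since the thread notes occasional exceptions.

```python
from dataclasses import dataclass


@dataclass
class Modifier:
    """Hand-supplied features of a nominal modifier (hypothetical type)."""
    has_case_marking: bool     # preposition or possessive marker
    is_plural: bool
    has_own_determiner: bool


def suggest_relation(mod: Modifier) -> str:
    """Suggest a relation from the criteria listed in the issue."""
    # Case marking or an own determiner points to nmod.
    if mod.has_case_marking or mod.has_own_determiner:
        return "nmod"
    # Plural modifiers ordinarily are not compound, but exceptions are reported.
    if mod.is_plural:
        return "unclear"
    return "compound"


# "the tail of a dog"
tail_of_a_dog = Modifier(has_case_marking=True, is_plural=False, has_own_determiner=True)
# "the dogs' toy"
dogs_toy = Modifier(has_case_marking=True, is_plural=True, has_own_determiner=False)
# "the dog tail"
dog_tail = Modifier(has_case_marking=False, is_plural=False, has_own_determiner=False)

for example in (tail_of_a_dog, dogs_toy, dog_tail):
    print(suggest_relation(example))
```

Returning "unclear" instead of forcing a label keeps the sketch in line with the open question of whether non-pluralization should be a firm criterion.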