Coordination in Kazakh and Basque #189

Closed
ftyers opened this issue Jun 10, 2015 · 6 comments

@ftyers
Contributor

ftyers commented Jun 10, 2015

I'm sure this has come up in the documentation somewhere before, but I haven't been able to find it.

The docs say that for coordinated NPs "We take the first conjunct as the head of the coordination." With prepositions this works nicely, because you get something like:

1    in    2    case 
2    Spain    0    nmod
3    ,    2    punct
4    Argentina    2    conj
5    and    2    cc
6    Chile    2    conj

So the first conjunct carries the right function. But consider a language with postpositions, where the postposition can or must be omitted from all but the last conjunct:

Basque:

1    Txile    Txile    nom    0    nmod
2    ,    ,    _    1    punct
3    Argentina    Argentina    nom    1    conj
4    eta    eta    cc    1    cc
5    Espainatik    Espaina    abl    1    conj

Kazakh:

1    Мексика    Мексика    nom    0    nmod 
2    ,    ,    _    1    punct
3    Италия    Италия    nom    2    conj
4    және    және    cc    2    cc
5    Германиядан    Германия    abl    2    conj
6    кейін    кейін    post    5    case

Now the first conjunct, which heads the coordination, doesn't have any of the relevant morphology or dependents for its function. This is not ideal.

Would this be something that could be done on a language-dependent basis? For example, for Kazakh, make everything depend on the last conjunct instead of the first:

1    Мексика    Мексика    nom    5    nmod 
2    ,    ,    _    5    punct
3    Италия    Италия    nom    5    conj
4    және    және    cc    5    cc
5    Германиядан    Германия    abl    0    conj
6    кейін    кейін    post    5    case

Also, has this been written about somewhere before? I'm sure I'm not the first person to have come across this.

@dan-zeman
Member

This is an interesting point. I somewhat sympathize with making the morphologically relevant conjunct the head and saying that this is required by the specific nature of the language. The only potential issue I see is that it may make it more difficult to write transformations that convert coordination to a different style, which is something people may want to do for parsing experiments etc. If we keep it unified, i.e. all coordinations in all languages are headed by the first conjunct, you can still easily and deterministically access the relevant morphology by traversing one conj edge, right?
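
As a rough illustration of the "one conj edge" point, here is a minimal Python sketch. It assumes the first-conjunct-headed analysis of the Kazakh example from this thread, with every later conjunct attached directly to the first one. The `Token` type and the `last_conjunct` helper are hypothetical names made up for this sketch; they are not part of any UD tool.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    tag: str      # simplified case/POS column as used in this thread
    head: int     # 0 marks the root of the fragment
    deprel: str


def last_conjunct(tokens, head_id):
    """Return the rightmost conj dependent of head_id, or the head itself."""
    by_id = {t.id: t for t in tokens}
    conjuncts = [t for t in tokens if t.head == head_id and t.deprel == "conj"]
    return max(conjuncts, key=lambda t: t.id) if conjuncts else by_id[head_id]


# First-conjunct-headed analysis of the Kazakh example (assumed input).
kazakh = [
    Token(1, "Мексика", "nom", 0, "nmod"),
    Token(2, ",", "_", 1, "punct"),
    Token(3, "Италия", "nom", 1, "conj"),
    Token(4, "және", "_", 1, "cc"),
    Token(5, "Германиядан", "abl", 1, "conj"),
    Token(6, "кейін", "post", 5, "case"),
]

marked = last_conjunct(kazakh, head_id=1)
print(marked.form, marked.tag)  # the conjunct that carries the case marking
```

The lookup is deterministic only under the assumption stated above; in a chained analysis, where each conjunct attaches to the previous one, it would take more than one edge.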

@coltekin
Member

Not surprisingly, this issue is also relevant to Turkish. I just want to add another example to support the case.

Besides the postpositions mentioned in the original post, other coordinated constructions are affected as well. In Turkish, quite a few suffixes are added only to the final conjunct (I believe this is called suspended affixation). So marking the final conjunct as the head would be more natural.

Here are two related example sentences:

Süt  içer,      diş   fırçalar, bir hikaye dinler     ve   uyurLAR.
milk drink-AOR  teeth brush-AOR a   story  listen-AOR and  sleep-AOR-3PL.
`(they) drink milk, brush (their) teeth, listen to a story, and sleep'

Süt  içer,      diş   fırçalar, bir hikaye dinler     ve   uyurDUK.
milk drink-AOR  teeth brush-AOR a   story  listen-AOR and  sleep-AOR-PAST-1PL.
`we used to drink milk, brush (our) teeth, listen to a story, and sleep'

The point is that the tense and person information is present only on the last conjunct. So it makes more sense for it to be the head and to carry the dependency relations with the rest of the sentence. For example, if there were an overt subject, it would have to agree with the agreement marker on the last conjunct. The other coordinated clauses do not even have finite predicates in these examples.

The METU-Sabancı treebank also marks the last conjunct as the head.

I would also be very much in favor of allowing language-specific decisions on the head of the coordinated structures. After all, choice of head-direction in other constructions is language-specific, and I cannot think of a reason for coordination to be treated differently.

@makazhan

Just a minor clarification.
Can someone confirm that the Kazakh examples should read:

1 Мексика Мексика nom 0 nmod
2 , , _ 1 punct
3 Италия Италия nom 1 (not 2) conj
4 және және cc 1 (not 2) cc
5 Германиядан Германия abl 1 (not 2) conj
6 кейін кейін post 5 case

1 Мексика Мексика nom 5 conj (not nmod)
2 , , _ 5 punct
3 Италия Италия nom 5 conj
4 және және cc 5 cc
5 Германиядан Германия abl 0 nmod (not conj)
6 кейін кейін post 5 case

@ftyers
Contributor Author

ftyers commented Jun 18, 2015

The first example should read:

1 Мексика Мексика nom 0 nmod 
2 , , _ 1 punct
3 Италия Италия nom 1 conj
4 және және _ 1 cc
5 Германиядан Германия abl 1 conj
6 кейін кейін post 5 case

That is, everything aside from the postposition attaches to "Мексика", the first item in the list.

And the second example should read:

1 Мексика Мексика nom 5 conj 
2 , , _ 5 punct
3 Италия Италия nom 5 conj
4 және және _ 5 cc
5 Германиядан Германия abl 0 nmod
6 кейін кейін post 5 case

That is, everything attaches to "Германиядан", including the postposition.

Thanks for spotting those errors. Usually I draw ascii art :) [1]

  1. http://wiki.apertium.org/wiki/Dependency_parsing_for_Turkic
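
The relationship between the two corrected trees above can be sketched as a small transformation. This is only a minimal Python sketch under stated assumptions: all conjuncts, conjunctions and commas of the coordination attach directly to the first conjunct, and every `punct` dependent of the first conjunct belongs to the coordination (a simplification). The `Token` type and the function name `rehead_to_last_conjunct` are hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    head: int     # 0 marks the root of the fragment
    deprel: str


# Relations treated as part of the coordination (simplifying assumption).
COORD_RELS = {"conj", "cc", "punct"}


def rehead_to_last_conjunct(tokens, first_id):
    """Turn a first-conjunct-headed coordination into a last-conjunct-headed one."""
    by_id = {t.id: t for t in tokens}
    first = by_id[first_id]
    conj_ids = [t.id for t in tokens if t.head == first_id and t.deprel == "conj"]
    if not conj_ids:
        return list(tokens)  # no coordination under this head
    last_id = max(conj_ids)
    result = []
    for t in tokens:
        if t.id == last_id:
            # The last conjunct takes over the external attachment.
            t = t._replace(head=first.head, deprel=first.deprel)
        elif t.id == first_id:
            # The former head becomes an ordinary conjunct.
            t = t._replace(head=last_id, deprel="conj")
        elif t.head == first_id and t.deprel in COORD_RELS:
            # Other conjuncts, conjunctions and commas move to the new head.
            t = t._replace(head=last_id)
        result.append(t)
    return result


first_headed = [
    Token(1, "Мексика", 0, "nmod"),
    Token(2, ",", 1, "punct"),
    Token(3, "Италия", 1, "conj"),
    Token(4, "және", 1, "cc"),
    Token(5, "Германиядан", 1, "conj"),
    Token(6, "кейін", 5, "case"),
]

for t in rehead_to_last_conjunct(first_headed, first_id=1):
    print(t.id, t.form, t.head, t.deprel)
```

Applied to the first tree, this should reproduce the heads and relations of the second one; the postposition is untouched because it already attaches to the last conjunct.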

@jnivre
Contributor

jnivre commented Jul 1, 2015

This is an important issue that has also come up for Hungarian, I think, where there are cases in which morphological agreement is expressed only on the second (or last) conjunct, indicating that this is the head of the coordination. It is also related to the issue of what should be the head in complex names, where the official guidelines say the first element, but many languages have good arguments for saying that it should be the last (typically again based on inflection).

Even more generally, I think we will need a mechanism for handling this kind of variation across languages. Sort of like a small set of parameters where different languages can choose different values.
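
To make the idea concrete, such a parameter set might look roughly like the Python sketch below. Everything in it is hypothetical: the `HeadParams` type, the parameter names, and the per-language values only mirror the preferences argued for in this thread (Kazakh and Turkish preferring the last conjunct). It does not describe any existing UD mechanism or agreed guideline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadParams:
    """Hypothetical per-language choices; 'first' mirrors the current guideline."""
    coordination_head: str = "first"   # "first" or "last"
    name_head: str = "first"           # head of complex names


DEFAULTS = HeadParams()

# Illustrative overrides only, keyed by ISO 639-1 code (assumed values).
OVERRIDES = {
    "kk": HeadParams(coordination_head="last"),
    "tr": HeadParams(coordination_head="last"),
}


def params_for(lang):
    """Return the parameter values for a language, falling back to the defaults."""
    return OVERRIDES.get(lang, DEFAULTS)


print(params_for("kk").coordination_head)
print(params_for("en").coordination_head)
```

Keeping the defaults in one place would let a conversion tool read the chosen value and normalise every treebank to a single style when needed.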

@dan-zeman
Member

The Uppsala meeting / coordination discussion group decided that the current rule should not be changed and the first conjunct should always be the head of coordination. See here for details:

http://universaldependencies.github.io/docs/2015-08-23-uppsala/coordination.html
