Update documentation
bylinina committed Oct 15, 2023
1 parent 008945a commit a55730f
Showing 30 changed files with 1,536 additions and 40 deletions.
Binary file added _images/adjectives_verbs.jpg
Binary file added _images/cat_set.jpg
Binary file added _images/cat_tc.png
Binary file added _images/direct_indirect.jpg
Binary file added _images/entities.jpg
Binary file added _images/entities_mapping.jpg
Binary file added _images/function_application.jpg
Binary file added _images/language_examples.png
Binary file added _images/meaning_relations.jpg
Binary file added _images/multilingual_resources.png
Binary file added _images/possible_situations.jpg
Binary file added _images/possible_situations_mapping.jpg
Binary file added _images/possible_situations_mapping_indirect.jpg
Binary file added _images/predicate_conjunction.jpg
Binary file added _images/pronouns.jpg
Binary file added _images/proper_names.jpg
Binary file added _images/proposition.png
Binary file added _images/proposition2.png
Binary file added _images/proposition3.png
Binary file added _images/vodka_ambiguity.jpg
10 changes: 5 additions & 5 deletions _sources/ling_puzzles.md
@@ -1,13 +1,13 @@
# Linguistic puzzles
# Appendix: Linguistic puzzles

This page contains linguistic puzzles grouped by topic. I am not the author of these puzzles! I use them in my course course as a tool for practicing linguistic analysis: inferring underlying rules and structures from linguistic data. These puzzles are one of several activities offered during seminars / practical sessions.
This page contains three linguistic puzzles grouped by topic: phonetics/phonology, morphology, syntax. I am not the author of these puzzles! I use them in my course as a tool for practicing linguistic analysis: inferring underlying rules and structures from linguistic data. These puzzles are one of several activities offered during seminars / practical sessions. The puzzles appeared online in collections of linguistic puzzles (in Russian) and I translated them into English and leave them here as examples of student activity for different topics. If you are a teacher reading this and want to use these puzzles, make sure to attribute them correctly (not to me!); and if you think I should take them down, let me know and I will!


## Puzzle 1: Stress in Muscogee


`````{margin}
Puzzle by Aleksey Pegushev.
Puzzle by Aleksey Pegushev ([source](https://elementy.ru/problems/1885/Krik_dushi))
`````

This puzzle is linked to the topic of weeks 2 and 3 'Transmitting and capturing language'. Here, you will be given (limited) linguistic data, and your task is to uncover a rule that explains these data. Moreover, the data is unlocked in three portions, each new portion of data challenging your previous generalization. The language this puzzle focusses on is the [Muscogee language](https://en.wikipedia.org/wiki/Muscogee_language), a language that has roughly 5000 speakers and belongs to the Muskogean language family in North America. Presumably, this is a language you don't know much or anything about. That's the whole point! Existing Large Language Models also don't have any knowledge about Muscogee and actually weren't able to solve this puzzle last time I checked. Enjoy!
@@ -161,7 +161,7 @@ In the third group, long vowels appear. Syllables with long vowels behave exactl
This puzzle is related to the topic of Week 4: Morphology. It allows you to dive deeper into morphological analysis of a set of forms that combine a verbal stem and different verbal affixes -- and as a result, given the data, you come up with a rule behind these combinations. This is a direct extension of what we talked about in class.

`````{margin}
Puzzle by Anton Somin.
Puzzle by Anton Somin ([source](https://elementy.ru/problems/1324/Kolenopreklonennyy_verblyud))
`````

The table below contains verbs in Arabic in different forms, all of them 2nd person singular:
@@ -262,7 +262,7 @@ Now we can fill in the gaps in the tables.
This puzzle is meant to give hands-on experience in analyzing the structure of simple sentences, and in this it connects mostly to the topic of Week 6: Syntax. The puzzle is based on data from [Straits Salish](https://en.wikipedia.org/wiki/North_Straits_Salish_language) -- a language with a very different syntax from the familiar languages of the European standard.

`````{margin}
Puzzle by Peter Arkadiev.
Puzzle by Peter Arkadiev ([source](https://elementy.ru/problems/1404/Ya_vizhus_vorom))
`````
The table below contains sentences in Straits Salish and their English translations.

903 changes: 903 additions & 0 deletions _sources/week6.ipynb

Large diffs are not rendered by default.

10 changes: 0 additions & 10 deletions _sources/week6.md

This file was deleted.

48 changes: 47 additions & 1 deletion _sources/week7.md
@@ -1,10 +1,56 @@
# Week 7. Outro

<big>To be unlocked October 16th.</big>
This is the wrap-up of the course. This week's agenda is a seminar-like recap of the most difficult bits of content (according to students!), exam prep, the remaining student presentations and a bit of an outlook -- things I wish we had time to cover, but we unfortunately didn't.

First, let's take a bird's-eye view of the things that we **did** cover in the very short time of 6 weeks. Our main goal was to discuss the fundamentals of linguistics and to relate these fundamentals -- wherever possible -- to questions, tasks, resources and models in natural language processing and language technology. We talked about different linguistic modalities: spoken language, written language and signed language (but, unfortunately, did not talk in much detail about the latter); how words are built from morphemes, how sentences are built from words, and some ways to approach linguistic meaning. We discussed and sometimes tried out different ways to arrive at a linguistic analysis (experiment, introspection, analysis of text corpora etc.). We saw different forms such an analysis can take (a list of phonemes, a rule of morpheme attachment, a tree, a dependency structure, a function from a set to its subset etc.). I hope this makes a decent starting point for whatever language-related work comes after this course.

At this point, I want to mention things we did **not** cover systematically in this course but are extremely important -- both as separate linguistic sub-disciplines and as sources of insights when building and evaluating language technology of different types:

- **Cross-linguistic variation**. Throughout the course, the material has been selected in a way that emphasizes the many potential ways for language to organize its systems -- sounds, grammar -- and that more than one way is indeed usually attested across languages, while at the same time there are limits to this variability. I really can't stress this enough: languages are different. I wish we had a week or two to focus just on this: this seemingly obvious fact is surprisingly often overlooked when it comes to both linguistic theory and NLP systems. Scaling things in the dimension of multilinguality is very much an unsolved task. Here is a relevant overview from 2020, and what they say there still stands in 2023.

> Joshi, Santy et al. 2020. [The State and Fate of Linguistic Diversity and Inclusion in the NLP World](https://aclanthology.org/2020.acl-main.560). ACL 58.
```{margin}
I thank David Dalé for suggesting this particular paper to me.
```
The paper groups languages into 6 classes with respect to availability of different types of data for these languages and therefore how included they are predicted to be in the processes of NLP and language technology development:

```{margin}
Language Resource Distribution: The size of the gradient circle represents the number of languages in the class. The color spectrum VIBGYOR represents the total speaker population size from low to high. Figure from [Joshi et al. 2020](https://aclanthology.org/2020.acl-main.560).
```
```{image} ./images/multilingual_resources.png
:alt: multilingual resources
:class: bg-primary mb-1
:width: 350px
:align: center
```
Language groups range from **class 0** ('The Left-Behinds', languages with exceptionally limited resources that are basically ignored when it comes to language technology) to **class 5** ('The Winners', languages that have a dominant online presence and not only a massive proportion of language technology built for them but also the potential for even more, given the amount of resources). Strikingly, almost 90% of languages belong to class 0 and less than 1% belong to class 5. Definitely something to think about in terms of its consequences.

```{image} ./images/language_examples.png
:alt: examples of language classes
:class: bg-primary mb-1
:width: 650px
:align: center
```
<br>


- **Language acquisition**. We didn't talk at all about how people acquire language, and that's really unfortunate because this is a topic of a lot of recent debate in relation to NLP. Typically developing children acquire language surprisingly fast during the first several years of life. It is even more impressive if we think about the fact that the linguistic input they get during this time is quite limited, both quantitatively and in its coverage of particular types of information that are crucial for learning some properties of language. This observation (**the poverty of the stimulus**) has been a foundational argument in favour of linguistic knowledge being partly innate: maybe people are born with a predisposition for certain properties of communicative systems, so they don't need to learn all of it from input? Recent language models seem to be very good at language, and they are certainly not 'born' with any prior linguistic knowledge -- maybe the innateness idea is wrong? This debate is as hot as ever now. I am not taking sides! I, of course, have an opinion -- but this is not the place. Anyway, too bad we couldn't talk about this debate! For those who are interested -- read these two papers in this specific order:

> Piantadosi, S. 2023. [Modern language models refute Chomsky’s approach to language](https://lingbuzz.net/lingbuzz/007180). <br> Katzir, R. 2023. [Why large language models are poor theories of human linguistic cognition. A reply to Piantadosi (2023)](https://ling.auf.net/lingbuzz/007190).

- **Language production and processing**. The information in the course has been organized as an exploration of abstract rules of an abstract system: How can we formulate the laws of two morphemes combining together, or what's the structure of a sentence? But that might seem strange (and it probably did!) -- because language is not just something floating in an abstract vacuum, language is what people **do** things with: they process it, they produce it, they memorize bits of it, they give commands, make promises and so on. How do they do it? What's the relation between this abstract system we talked about and its use in practice? What happens in people's brains when they produce or process language? What's the relation between linguistic behaviour and other cognitive systems, such as memory or attention? That's particularly important in comparison to what artificial linguistic agents do: what should we expect from such systems, and how similar to or different from us are they expected to be, and in what ways?

- **Sociolinguistics**. Again, it's important to emphasize and study further dimensions in which language is not homogeneous. Practically all levels of language interact with our social personae, and this is a two-way interaction: not only is the choice of linguistic means affected by who we are, but the other way around as well -- we use language to construct our social identities. Knowing more about how this works helps when we go back to language technology and are then able to ask more informed questions about fairness in NLP systems: what groups of speakers they are tailored to and how do we find out? And after we do find out, what's the next step?

- **Language and thought**. During one of the student presentations, the Sapir-Whorf hypothesis (a.k.a. [linguistic relativity](https://en.wikipedia.org/wiki/Linguistic_relativity)) was mentioned, and I felt bad that we didn't have time to talk about it more. Does language shape the way we think? The answer, I think, really depends on what we mean by this. There is a dangerous path here that leads to exoticisation and alienation: sometimes pretty superficial differences in language systems lead to treating speakers of other languages as equipped with qualitatively different conceptual tools. Could some weaker version of linguistic relativity be justified? I don't know. There's been some revival of this topic recently, check out [the TED talk by Lera Boroditsky](https://www.youtube.com/watch?v=RKK7wGAYP6k), but approach it with caution!

- **Language change and language history**. A lot of properties of language do not fall under clear and logical rules. One of the reasons for this is that language is constantly undergoing change -- and what we see when we look at language at any particular time point is a snapshot of a dynamic system, which is often best described in terms of stages of different historic processes. Language change is not random, it obeys its own laws and shows paths and tendencies. An exciting topic that we had to completely leave out.

- **Beyond natural language**. We started the course with a definition of linguistics that limited our attention to natural human language. That's great, but it's also limiting in that we miss comparison with other types of systems -- artificial human languages, natural non-human languages, other structured systems that are not necessarily communicative systems in the same way as language is, but share a lot of the organizational principles with language. We had a brief moment of reflection on some of these topics -- recall the discussion of the Heptapod language during student presentations! But a systematic discussion was missing. If you are interested in these topics, I think you should look at this exciting recent book:

> Schlenker, P. 2022. [What It All Means: Semantics for (Almost) Everything](https://mitpress.mit.edu/9780262047432/what-it-all-means/).

I guess that's all I want to say! I hope it was useful.
16 changes: 8 additions & 8 deletions ling_puzzles.html
@@ -9,7 +9,7 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />

<title>Linguistic puzzles &#8212; Linguistics for Language Technology</title>
<title>Appendix: Linguistic puzzles &#8212; Linguistics for Language Technology</title>



@@ -315,7 +315,7 @@


<div id="jb-print-docs-body" class="onlyprint">
<h1>Linguistic puzzles</h1>
<h1>Appendix: Linguistic puzzles</h1>
<!-- Table of contents -->
<div id="print-main-content">
<div id="jb-print-toc">
@@ -352,14 +352,14 @@ <h2> Contents </h2>
<div id="searchbox"></div>
<article class="bd-article" role="main">

<section class="tex2jax_ignore mathjax_ignore" id="linguistic-puzzles">
<h1>Linguistic puzzles<a class="headerlink" href="#linguistic-puzzles" title="Permalink to this heading">#</a></h1>
<p>This page contains linguistic puzzles grouped by topic. I am not the author of these puzzles! I use them in my course course as a tool for practicing linguistic analysis: inferring underlying rules and structures from linguistic data. These puzzles are one of several activities offered during seminars / practical sessions.</p>
<section class="tex2jax_ignore mathjax_ignore" id="appendix-linguistic-puzzles">
<h1>Appendix: Linguistic puzzles<a class="headerlink" href="#appendix-linguistic-puzzles" title="Permalink to this heading">#</a></h1>
<p>This page contains three linguistic puzzles grouped by topic: phonetics/phonology, morphology, syntax. I am not the author of these puzzles! I use them in my course as a tool for practicing linguistic analysis: inferring underlying rules and structures from linguistic data. These puzzles are one of several activities offered during seminars / practical sessions. The puzzles appeared online in collections of linguistic puzzles (in Russian) and I translated them into English and leave them here as examples of student activity for different topics. If you are a teacher reading this and want to use these puzzles, make sure to attribute them correctly (not to me!); and if you think I should take them down, let me know and I will!</p>
<section id="puzzle-1-stress-in-muscogee">
<h2>Puzzle 1: Stress in Muscogee<a class="headerlink" href="#puzzle-1-stress-in-muscogee" title="Permalink to this heading">#</a></h2>
<aside class="margin sidebar">
<p class="sidebar-title"></p>
<p>Puzzle by Aleksey Pegushev.</p>
<p>Puzzle by Aleksey Pegushev (<a class="reference external" href="https://elementy.ru/problems/1885/Krik_dushi">source</a>)</p>
</aside>
<p>This puzzle is linked to the topic of weeks 2 and 3 ‘Transmitting and capturing language’. Here, you will be given (limited) linguistic data, and your task is to uncover a rule that explains these data. Moreover, the data is unlocked in three portions, each new portion of data challenging your previous generalization. The language this puzzle focusses on is the <a class="reference external" href="https://en.wikipedia.org/wiki/Muscogee_language">Muscogee language</a>, a language that has roughly 5000 speakers and belongs to the Muskogean language family in North America. Presumably, this is a language you don’t know much or anything about. That’s the whole point! Existing Large Language Models also don’t have any knowledge about Muscogee and actually weren’t able to solve this puzzle last time I checked. Enjoy!</p>
<section id="problem-1">
@@ -553,7 +553,7 @@ <h2>Puzzle 2: Imperatives in Arabic<a class="headerlink" href="#puzzle-2-imperat
<p>This puzzle is related to the topic of Week 4: Morphology. It allows you to dive deeper into morphological analysis of a set of forms that combine a verbal stem and different verbal affixes – and as a result, given the data, you come up with a rule behind these combinations. This is a direct extension of what we talked about in class.</p>
<aside class="margin sidebar">
<p class="sidebar-title"></p>
<p>Puzzle by Anton Somin.</p>
<p>Puzzle by Anton Somin (<a class="reference external" href="https://elementy.ru/problems/1324/Kolenopreklonennyy_verblyud">source</a>)</p>
</aside>
<p>The table below contains verbs in Arabic in different forms, all of them 2nd person singular:</p>
<ul class="simple">
@@ -717,7 +717,7 @@ <h2>Puzzle 3: Simple sentences in Straits Salish<a class="headerlink" href="#puz
<p>This puzzle is meant to give hands-on experience in analyzing the structure of simple sentences, and in this it connects mostly to the topic of Week 6: Syntax. The puzzle is based on data from <a class="reference external" href="https://en.wikipedia.org/wiki/North_Straits_Salish_language">Straits Salish</a> – a language with a very different syntax from the familiar languages of the European standard.</p>
<aside class="margin sidebar">
<p class="sidebar-title"></p>
<p>Puzzle by Peter Arkadiev.</p>
<p>Puzzle by Peter Arkadiev (<a class="reference external" href="https://elementy.ru/problems/1404/Ya_vizhus_vorom">source</a>)</p>
</aside>
<p>The table below contains sentences in Straits Salish and their English translations.</p>
<table class="table">
Binary file modified objects.inv
Binary file not shown.
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
