
Separate training and runtime attention #174

Merged: 12 commits from fix-attentions-in-time into master, Dec 12, 2016
Conversation

@cifkao (Member) commented Dec 2, 2016

Should fix the issue mentioned here:

The problem is that the same Attention object is used for constructing both the training and runtime parts of the graph, and therefore attentions_in_time contains tensors from both parts. Because I'm using attentions_in_time for visualization, I was getting images like this:

[TensorBoard screenshot of the attention visualization]

@jlibovicky (Contributor) left a comment:

I am sorry, but this is not a good solution to this issue. The reason is that if we wanted to use source sentence coverage (i.e., cumulative attention over time) to compute the next attention distribution (as in this paper), we would get into trouble. The correct way would be to instantiate two attention objects for each way of running the decoder and collect the attentions_in_time only from one of them.

I admit it will probably require a bigger intervention, but it is definitely worth it.

@cifkao (Member Author) commented Dec 3, 2016

The correct way would be to instantiate two attention objects for each way of running the decoder and collect the attentions_in_time only from one of them.

I'm not sure I understand this correctly. Did you mean "one attention object for each way of running the decoder"? That would probably be a cleaner solution, but it won't make the TF computation any different, will it? Both Attention objects will be doing the same thing, except that each one will have its own attentions_in_time.

@cifkao (Member Author) commented Dec 3, 2016

Is this what you had in mind? It still seems to me that with regard to CoverageAttention, there is really no difference from my first commit.

@jlibovicky force-pushed the fix-attentions-in-time branch 2 times, most recently from 7d017dc to c04b76e, on December 8, 2016 13:40
@jlibovicky dismissed their stale review on December 8, 2016 13:50

I made the last commits; they need a review.

@jlibovicky (Contributor) commented:

After the recent changes, this PR:

  • introduces an Attentive class, a common ancestor of all attentive encoders (note that Python 3 allows multiple inheritance); creating an attention object is centralized there
  • changes the decoders so that they no longer take a pre-made attention_object from an encoder; instead they call a get_attention_object method that instantiates one for them (a sketch of the pattern follows below)

The benefits of handling the attention object like this are:

  • it fixes the bug of collecting attentions from both the training and the runtime run of an RNN decoder into a single collection
  • it allows multiple decoders to attend over the same encoder.
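Purely for illustration, here is a minimal sketch of the pattern described above. The names (Attentive, get_attention_object, _attention_tensor, _attention_mask, runtime_mode) follow this PR, but the bodies, the constructor signature and the scope naming are simplified assumptions, not the actual repository code.

# Hedged sketch of the Attentive mixin; simplified, not the merged code.
class Attentive(object):
    """Common ancestor of all encoders a decoder can attend over."""

    def __init__(self, attention_type, **kwargs):
        self._attention_type = attention_type
        self._attention_kwargs = kwargs

    @property
    def _attention_tensor(self):
        """Tensor over which the attention is done."""
        raise NotImplementedError("The encoder must define _attention_tensor.")

    @property
    def _attention_mask(self):
        """Input weights masking out padded positions."""
        raise NotImplementedError("The encoder must define _attention_mask.")

    def get_attention_object(self, runtime=False):
        """Build a separate Attention object for the training or the runtime
        part of the graph; the two share variables (same scope, reuse turned
        on for the runtime copy) but keep separate attentions_in_time."""
        if not self._attention_type:
            return None
        return self._attention_type(
            self._attention_tensor,
            scope="attention_{}".format(type(self).__name__),
            input_weights=self._attention_mask,
            runtime_mode=runtime,
            **self._attention_kwargs)


def collect_attention_objects(encoders, runtime=False):
    # Decoder side (sketch): one Attention object per run mode, so a
    # visualization of attentions_in_time no longer mixes training and
    # runtime tensors.
    return [enc.get_attention_object(runtime=runtime)
            for enc in encoders if isinstance(enc, Attentive)]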

@jlibovicky self-assigned this on Dec 8, 2016
@jlibovicky (Contributor) commented:

@cifkao, I can't add you among the reviewers, but have a look at the code and tell me whether it is what you wanted to do.

@@ -24,7 +24,8 @@ def __init__(self, dimension, output_shape, data_id):
         self.encoded = tf.tanh(tf.matmul(self.flat, project_w) + project_b)

         self.attention_tensor = None
-        self.attention_object = None
+        self.attention_object_train = None
+        self.attention_object_runtime = None

Member Author:

I guess this should inherit from Attentive too.

Member:

I don't know whether we've implemented attention in this encoder. It was not used in our submission for the multimodal task, so it might be safe to ignore this for now.

@cifkao (Member Author) commented Dec 8, 2016

I think you forgot about image_encoder (or we should revert my changes to it). Otherwise, LGTM.

"""
        self.scope = scope
        self.attentions_in_time = []
        self.attention_states = attention_states
        self.input_weights = input_weights

-       with tf.variable_scope(scope):
+       with tf.variable_scope(scope, reuse=runtime_mode):
Member:

This could potentially be dangerous: if scope is not a string but a tf.VariableScope object that already has reuse set to not runtime_mode, I'm not sure how it will behave. To be safe, I would add some reasonable asserts before and after it.

I can imagine a situation where runtime_mode is False and for some reason we enter this function a second time with the same scope. Then in the better case it crashes, and in the worse case it creates a new set of variables, tells nobody, and doesn't train correctly.

Contributor:

Is this reuse even necessary here? When reuse is set one level up, it propagates to the inner scopes too, doesn't it?
Otherwise, it seems to me that it should never create new variables; it should always crash.
As for what the asserts should look like, I can't think of anything reasonable.
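For context, a minimal self-contained sketch of the behaviour being discussed, assuming the TF 1.x variable_scope API this code uses: reuse set on an outer scope propagates into inner scopes, and attempting to create an already existing variable without reuse raises a ValueError instead of silently making a second copy.

import tensorflow as tf

def make_attention_vars():
    # Creates or reuses "att_w" depending on the enclosing scope's reuse flag.
    return tf.get_variable("att_w", shape=[8, 8])

with tf.variable_scope("decoder") as dec_scope:
    train_w = make_attention_vars()        # variables created here

with tf.variable_scope(dec_scope, reuse=True):
    runtime_w = make_attention_vars()      # reuse=True: same variable as train_w
    with tf.variable_scope("attention"):
        pass                               # inner scopes inherit reuse=True

with tf.variable_scope(dec_scope):         # entered again, reuse not set
    try:
        make_attention_vars()              # second creation attempt
    except ValueError as err:
        print("TF refuses to silently re-create the variable:", err)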


        assert hasattr(self, "name")
        assert hasattr(self, "_padding")
        assert hasattr(self, "_attention_tensor")
Member:

Although it's probably OK, it doesn't seem like the best thing to dictate from the parent class where in the children's __init__ method super has to be called. Moreover, it seems to me that it should really be the other way around: super should be called as the first statement (à la Java). Can't the things it needs be passed in as constructor arguments?

Contributor:

The Java-like solution would be to make these abstract members here that would have to be overridden; then super could be the first call in the constructor. But it seemed to me that this would only needlessly lengthen the already long encoder code.

Contributor:

If you think that way is more correct, I'll gladly rewrite it like that, but I'm not sure myself.

Member Author:

Not that I understand multiple inheritance in Python very well, but this could be a problem if we wanted to have more classes like this that add functionality to encoders through inheritance.

Wouldn't it be better if Attentive did nothing in the constructor and got initialized only when some method is called?

There's also the option of not solving this with inheritance at all.

Member:

It should be in the constructor because of how the TensorFlow graph is built from the config file; there isn't much room for calling additional methods there.
But the decoder, which already receives finished encoders, could call it, that's true. That would also mean that if we don't use attention, this part of the graph wouldn't be needlessly built in the encoder.

Member:

Meaning that Decoder._collect_attention_objects would additionally call something like if encoder.supports_attention: encoder.create_attention_graph() on the encoders, which would return an Attentive or Attention object.

@jlibovicky, could it work like this? The hierarchy would change so that instead of encoder is_a Attentive it would be encoder has_a attention.

We can also leave this for later so that more and more code doesn't keep piling onto this pull request.

Member Author:

I meant it so that the encoder would call super, which would do nothing, and only later in the constructor, once everything is ready, it would call self._init_attention().

The Attention object is already created only when it is requested from the decoder, if I'm not mistaken.

Contributor:

And what would that self._init_attention() do? Right now the constructor only checks the hasattr conditions. If those are thrown out, super() can be called at the very beginning of the encoder's constructor. The reason I did it this way is so that it crashes already during encoder initialization if the right tensors weren't created there, and not only when the get_attention_object method is called, because that happens in the decoder and it could be confusing later when debugging config files.

    def get_attention_object(self, runtime: bool=False):
        # pylint: disable=no-member
        if self._attention_type and self._attention_tensor is None:
            raise Exception("Can't get attention: missing attention tensor.")
Member:

I would use a ValueError, since it's about a bad value. Or create some more specific error in this module.
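A tiny sketch of the second option, a module-specific error; the class name below is hypothetical and does not exist in the PR:

class MissingAttentionTensorError(ValueError):
    """Hypothetical error: the encoder was configured with an attention
    type but never built an attention tensor."""

# Inside Attentive.get_attention_object (sketch):
#     if self._attention_type and self._attention_tensor is None:
#         raise MissingAttentionTensorError(
#             "Can't get attention: missing attention tensor.")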

-                name),
-            dropout_placeholder=self.dropout_placeholder,
-            input_weights=att_in_weights)
+        super(CNNEncoder, self).__init__(attention_type)
Member:

This one bothers me here: first, for the reasons I described above, and second, because super().__init__(attention_type) would be enough.

-            dropout_placeholder=self.dropout_placeholder,
-            input_weights=weight_tensor,
-            max_fertility=attention_fertility)
+        super(FactoredEncoder, self).__init__(
Member:

ditto

    def feed_dict(self, dataset, train=False):
-       factors = {data_id: dataset.get_series(data_id) for data_id in self.data_ids}
+       factors = {data_id: dataset.get_series(
+           data_id) for data_id in self.data_ids}
Member:

I prefer to break the line right before the for, which then aligns under data_id, but to each their own... :-)

Contributor:

Me too; this one is autopep8's doing, I run it on files that have way too many long lines.

-            input_weights=self.padding,
-            max_fertility=attention_fertility) if attention_type else None
+        super(SentenceEncoder, self).__init__(
+            attention_type, attention_fertility=attention_fertility)
Member:

Use super().__init__; and think about whether this could be done better without it.

@jindrahelcl (Member) commented Dec 9, 2016 via email

@jindrahelcl (Member) commented Dec 9, 2016 via email

@jindrahelcl (Member) commented Dec 9, 2016 via email

@jindrahelcl (Member) commented Dec 9, 2016 via email

@jlibovicky (Contributor) commented:

It could, but it would probably 1. be less clear, and 2. not remove the problem that it would only fail during decoder construction. For now I like the Java-like idea: Attentive will have abstract properties and the encoders will override them.

@jlibovicky force-pushed the fix-attentions-in-time branch from 0d2a8d8 to b318c4f on December 12, 2016 09:49
@jlibovicky (Contributor) commented:

Now there is a variant of Attentive with abstract properties. Comments, suggestions?

    @property
    def _attention_tensor(self):
        """Tensor over which the attention is done."""
        raise NotImplementedError(
Member Author:

Why not use @abstractmethod?
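For comparison, a hedged sketch of what the @abstractmethod variant suggested here could look like, using abc from the standard library; this is not the code that was merged:

from abc import ABCMeta, abstractmethod

class Attentive(metaclass=ABCMeta):
    def __init__(self, attention_type, **kwargs):
        self._attention_type = attention_type
        self._attention_kwargs = kwargs

    @property
    @abstractmethod
    def _attention_tensor(self):
        """Tensor over which the attention is done."""

    @property
    @abstractmethod
    def _attention_mask(self):
        """Input weights masking out padded positions."""

With ABCMeta, an encoder subclass that does not override both properties cannot even be instantiated (it raises TypeError at construction time), so it fails as early as the hasattr checks discussed above rather than only when the property is first accessed.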


# pylint: disable=too-few-public-methods
class Attentive(object):
    def __init__(self, attention_type, **kwargs):
Member:

It would be good to add docstrings here, but I don't want to delay the merge any longer.

        encoder_state = tf.concat(
            1, [encoder_state, backward_encoder_state])

        self.encoded = encoder_state

-       self.attention_tensor = \
+       self.__attention_tensor = \
Member:

Two underscores? Isn't that too much?
Plus, the backslash can be refactored away by breaking the line right after reshape( instead.

Contributor:

Two underscores because it's private, not protected.

Member:

ok


    @property
    def _attention_mask(self):
        return self.__attention_weights
Member:

mask, weights, one underscore, two underscores... that's a bit of a mess.

Contributor:

With two underscores it's the private field; with one underscore it's the property inherited from Attentive.
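To illustrate the convention being discussed (hypothetical class name, simplified body): the double-underscore attribute is a name-mangled private field of the concrete encoder, while the single-underscore name is the property declared by Attentive that exposes it.

class SentenceEncoderSketch(object):      # hypothetical, for illustration only
    def __init__(self, states):
        # Stored as _SentenceEncoderSketch__attention_tensor (name mangling),
        # so neither subclasses nor the Attentive base can clash with it.
        self.__attention_tensor = states

    @property
    def _attention_tensor(self):
        # The single-underscore property that Attentive expects encoders to provide.
        return self.__attention_tensor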

Member:

ok

Contributor:

I would also consider using a cached_property; then the separate attributes wouldn't have to be there at all. In general, I have a feeling that introducing these cached properties could make the code considerably clearer and more modular. I'll open a discussion issue about it.
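A hedged sketch of what such a cached_property could look like as a simple non-data descriptor; functools.cached_property was not in the standard library at the time, and both the decorator below and the _build_attention_tensor helper are assumptions for illustration:

class cached_property(object):
    """Run the wrapped method once per instance and cache its result."""

    def __init__(self, func):
        self._func = func
        self.__doc__ = func.__doc__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        value = self._func(obj)
        # Cache under the method's name: later lookups hit the instance
        # dict directly and never reach this descriptor again.
        obj.__dict__[self._func.__name__] = value
        return value


class SomeEncoder(object):                # hypothetical usage
    @cached_property
    def _attention_tensor(self):
        # Built lazily on first access, then reused; no separate
        # __attention_tensor attribute is needed.
        return self._build_attention_tensor()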

Member:

I'm probably in favor.

-       self.attention_tensor = tf.concat(2, outputs_bidi_tup)
-       self.attention_tensor = self._dropout(self.attention_tensor)
+       self.__attention_tensor = tf.concat(2, outputs_bidi_tup)
+       self.__attention_tensor = self._dropout(self._attention_tensor)
Member:

ditto

@@ -112,16 +119,16 @@ def _create_input_placeholders(self):
         self.inputs = tf.placeholder(tf.int32, shape=[None, self.max_input_len],
                                      name="encoder_input")

-        self.padding = tf.placeholder(
+        self.__input_weights = tf.placeholder(
Member:

ditto

Contributor:

I'll rename weights to mask.

Member:

ok :-)

@@ -91,7 +91,7 @@ def __init__(self,
             dtype=tf.float32)

         self.__attention_tensor = tf.concat(2, outputs_bidi_tup)
-        self.__attention_tensor = self._dropout(self._attention_tensor)
+        self.__attention_tensor = self._dropout(self.__attention_tensor)
Member:

And the tests passed before this?

Contributor:

Why wouldn't they pass? It just took a detour through the property instead of accessing the attribute directly (I was lucky that it did the same thing, but semantically it was wrong).

Member:

Yeah, right... hm.

@jlibovicky force-pushed the fix-attentions-in-time branch from d4af86f to cb69c61 on December 12, 2016 14:23
@jlibovicky merged commit 52a851d into master on Dec 12, 2016
@jlibovicky deleted the fix-attentions-in-time branch on December 12, 2016 14:48