Attention visualization #121

Merged: 10 commits into ufal:bahdanau, Nov 8, 2016

Conversation

@cifkao (Member) commented Oct 30, 2016

Soft alignment visualization (#108) for the bahdanau branch.

@jlibovicky (Contributor) left a comment

Good job, thanks a lot! Please pay attention to the tests. One of the unit tests is failing because of the TensorBoard summaries (the issue may also be in the test itself). The integration test is failing because it cannot download the data it should work with; that should get fixed when you rebase onto master.

@@ -47,6 +48,12 @@ def attention_decoder(decoder_inputs, initial_state, attention_objects,
         outputs.append(output)
         states.append(state)
 
+    if summary_collections:
+        for i, a in enumerate(attention_objects):
+            alignments = tf.expand_dims(tf.transpose(tf.pack(a.attentions_in_time), perm=[1, 2, 0]), -1)
Contributor:

This line is too long. We check the style of the files using pylint. If you run tests/lint_run.sh, it will check all Python files and give you a detailed report.
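
One possible way to satisfy the limit is to split the expression into named intermediate steps. This is only a sketch with made-up intermediate names, not necessarily the fix that ended up in the branch:

    packed = tf.pack(a.attentions_in_time)
    transposed = tf.transpose(packed, perm=[1, 2, 0])
    alignments = tf.expand_dims(transposed, -1)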

Contributor:

I wonder if this is general enough. If you look, e.g., at the Recurrent Neural Machine Translation paper, they use a GRU network to do the attention instead of the softmax weighting, and they come up with a clever way of visualizing this attention. I think that reserving the attentions_in_time field for visualization purposes should work even for the recurrent attention, but please check whether that is the case.
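
For reference, a minimal NumPy sketch of the shape manipulation the summary code performs, assuming attentions_in_time holds one [batch, input_length] alignment matrix per decoder step (all dimension sizes here are made up):

    import numpy as np

    batch_size, decoder_steps, input_length = 16, 20, 30

    # One alignment distribution per decoder step (assumed shapes).
    attentions_in_time = [np.random.rand(batch_size, input_length)
                          for _ in range(decoder_steps)]

    # tf.pack stacks the list along a new leading axis:
    # [decoder_steps, batch_size, input_length].
    packed = np.stack(attentions_in_time)

    # perm=[1, 2, 0] reorders the axes to
    # [batch_size, input_length, decoder_steps].
    transposed = packed.transpose(1, 2, 0)

    # The trailing singleton axis yields an image-like tensor
    # [batch_size, input_length, decoder_steps, 1]: one greyscale
    # alignment image per sentence in the batch.
    alignments = transposed[..., np.newaxis]
    assert alignments.shape == (16, 30, 20, 1)

Any attention variant that appends one weight vector per decoder step to attentions_in_time would produce the same image layout, so the field looks reusable for the recurrent case as well.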

Member:

Just FYI, the line length limit I use is 80 characters. I will add it to the documentation for developers.

@@ -223,9 +223,10 @@ def training_loop(sess, saver,
 
         if step % validation_period == validation_period - 1:
             decoded_val_sentences, decoded_raw_val_sentences, \
-                val_evaluation = run_on_dataset(
+                val_evaluation, val_images = run_on_dataset(
Contributor:

Considering that the inputs of NM can also be images (we did image captioning with it previously), the name images may be confusing. What about plots, visualizations, ...? I don't know; maybe even images is OK.

Member (Author):

What about out_images?

Member:

I'd stick with val_plots.

@@ -29,7 +29,7 @@ name=translation
 output=tests/tmp-test-output
 overwrite_output_dir=True
 batch_size=16
-epochs=2
+epochs=5
Contributor:

Why 5?

@jlibovicky (Contributor):

@cifkao: Can you rebase the branch onto master and find out whether it fixes the failing unit tests?

@jindrahelcl (Member) commented Nov 5, 2016

@jlibovicky @cifkao I think the best solution is maybe to cherry-pick the commit that adds the testing data to the repository: ab0a0ba

@cifkao (Member, Author) commented Nov 7, 2016

@jindrahelcl I agree, rebasing onto master would be difficult. But maybe it's better to cherry-pick ab0a0ba into bahdanau directly?

Apart from that, I found that the visualization is incorrect because attentions_in_time accumulates attention vectors from both training and runtime decoding (is that intended?). Given that I'm visualizing the validation data, I think I want to use only the second half of the list. But what's the right way to separate it from the rest?

@jindrahelcl (Member):

The data should already be in the bahdanau branch (commit 5d88de3).

The graph contains the attention mechanism ops twice: one set for training and the other for runtime. During validation, the training ops are not executed, so there should not be a problem.

@jindrahelcl (Member):

@cifkao If you don't want to cherry-pick the commit (which is IMHO the best option), merge the bahdanau branch into this one. (Or rebase this onto bahdanau, but rebasing something that has already been pushed to GitHub should be avoided.)

@cifkao (Member, Author) commented Nov 8, 2016

The problem is that the same Attention object is used for constructing both the training and runtime parts of the graph, and therefore attentions_in_time contains tensors from both parts. Because I'm using attentions_in_time for visualization, I was getting images like this:

[image: tensorboard, https://cloud.githubusercontent.com/assets/8046580/20091403/dac7641a-a592-11e6-9777-ce513c1e0bb1.png]

This was, of course, easily fixed by taking only the last len(decoder_inputs) elements of the list (i.e. the ones that were just appended).
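
A minimal sketch of that fix applied to the summary code from the diff above; the actual committed change may differ in details:

    if summary_collections:
        for i, a in enumerate(attention_objects):
            # attentions_in_time has accumulated alignments from both
            # graph constructions (training first, then runtime), so
            # keep only the entries appended by this call.
            attentions = a.attentions_in_time[-len(decoder_inputs):]
            alignments = tf.expand_dims(
                tf.transpose(tf.pack(attentions), perm=[1, 2, 0]), -1)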

@jindrahelcl (Member):

Huh, this looks very much like some sort of bug. I'll look into it.


@jlibovicky (Contributor):

It's definitely a bug. Luckily, it does not influence the performance of most of the models, but it is probably the reason why the coverage model worked so poorly.

@jlibovicky merged commit bbca906 into ufal:bahdanau on Nov 8, 2016