OcuBot

The aim of this project is to develop an AI agent capable of having a conversation with users about images.

This project uses the VisDial 1.0 dataset

Modality Biases in some of the current Visual Dialog Models

In the experimentation phase, we observed that some Visual Dialog models tend to focus much more on the dialog history.
Thus,the answers generated for the current question of the user are not very relevant in all the cases.
In some cases, we observed that some models were more biased rowards prominent image features and thereby failed to answer questions about finer details like minor features in the image.

There have been approaches to reduce modality biases towards dialog history in the generated responses.
These approaches tend to reduce attention over dialog history.
Our project is aimed at reducing the dialog history bias without compromising the attention over dialog history.
Our approach provides more visual context to the model by generating dense object-level captions which describe the subtleties in the image.
We have used DenseCap proposed by Justin Johnson et al. to generate these dense captions.
By attending to these Dense Captions, our model generates answers that are more relevant to the current question because the model has additional visual context.
Also, the bias of existing Visual Dialog models towards prominent image-level features is also reduced by this approach as Dense Captions describe each entity in the image, thus improving the attention over Dense Captions enables the model to generate more relevant answers for questions about the subtleties and minor features present in the image.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
architecture		architecture
configs		configs
README.md		README.md
evaluate.py		evaluate.py
requirements.txt		requirements.txt
train.py		train.py