Image and Language

Image Captioning

  • UCLA / Baidu [Paper]
    • Explain Images with Multimodal Recurrent Neural Networks, arXiv:1410.1090.
  • Toronto [Paper]
    • Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, arXiv:1411.2539.
  • Berkeley [Paper]
    • Long-term Recurrent Convolutional Networks for Visual Recognition and Description, arXiv:1411.4389.
  • Google [Paper]
    • Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555.
  • Stanford [Web] [Paper]
    • Deep Visual-Semantic Alignments for Generating Image Descriptions, CVPR, 2015.
  • UML / UT [Paper]
    • Translating Videos to Natural Language Using Deep Recurrent Neural Networks, NAACL-HLT, 2015.
  • CMU / Microsoft [Paper-arXiv] [Paper-CVPR]
    • Learning a Recurrent Visual Representation for Image Caption Generation, arXiv:1411.5654.
    • Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation, CVPR 2015
  • Microsoft [Paper]
    • From Captions to Visual Concepts and Back, CVPR, 2015.
  • Univ. Montreal / Univ. Toronto [Web] [Paper]
    • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, arXiv:1502.03044 / ICML 2015
  • Idiap / EPFL / Facebook [Paper]
    • Phrase-based Image Captioning, arXiv:1502.03671 / ICML 2015
  • UCLA / Baidu [Paper]
    • Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images, arXiv:1504.06692
  • MS + Berkeley
    • Exploring Nearest Neighbor Approaches for Image Captioning, arXiv:1505.04467 [Paper]
    • Language Models for Image Captioning: The Quirks and What Works, arXiv:1505.01809 [Paper]
  • Adelaide [Paper]
    • Image Captioning with an Intermediate Attributes Layer, arXiv:1506.01144
  • Tilburg [Paper]
    • Learning language through pictures, arXiv:1506.03694
  • Univ. Montreal [Paper]
    • Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053
  • Cornell [Paper]
    • Image Representations and New Domains in Neural Image Captioning, arXiv:1508.02091
  • MS + City Univ. of HongKong [Paper]
    • Learning Query and Image Similarities with Ranking Canonical Correlation Analysis, ICCV, 2015

Video Captioning

  • Berkeley [Web] [Paper]
    • Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015.
  • UT / UML / Berkeley [Paper]
    • Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729.
  • Microsoft [Paper]
    • Jointly Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861.
  • UT / Berkeley / UML [Paper]
    • Sequence to Sequence--Video to Text, arXiv:1505.00487.
  • Univ. Montreal / Univ. Sherbrooke [Paper]
    • Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029
  • MPI / Berkeley [Paper]
    • The Long-Short Story of Movie Description, arXiv:1506.01698
  • Univ. Toronto / MIT [Paper]
    • Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724
  • Univ. Montreal [Paper]
    • Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053
  • TAU / USC [Paper]
    • Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950.

Question Answering

  • Virginia Tech / MSR [Web] [Paper]

    • VQA: Visual Question Answering, CVPR, 2015 SUNw: Scene Understanding Workshop.
  • MPI / Berkeley [Web] [Paper]

    • Ask Your Neurons: A Neural-based Approach to Answering Questions about Images, arXiv:1505.01121.
  • Toronto [Paper] [Dataset]

    • Image Question Answering: A Visual Semantic Embedding Model and a New Dataset, arXiv:1505.02074 / ICML 2015 deep learning workshop.
  • Baidu / UCLA [Paper] [Dataset]

    • Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering, arXiv:1505.05612.
  • POSTECH [Paper] [Project Page]

    • Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction, arXiv:1511.05765
  • CMU / Microsoft Research [Paper]

    • Stacked Attention Networks for Image Question Answering, arXiv:1511.02274.
  • MetaMind [Paper]

    • Dynamic Memory Networks for Visual and Textual Question Answering, arXiv:1603.01417.
  • SNU + NAVER [Paper]

    • Multimodal Residual Learning for Visual QA, arXiv:1606.01455
  • UC Berkeley + Sony [Paper]

    • Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, arXiv:1606.01847
  • POSTECH [Paper]

    • Training Recurrent Answering Units with Joint Loss Minimization for VQA, arXiv:1606.03647
  • SNU + NAVER [Paper]

    • Hadamard Product for Low-rank Bilinear Pooling, arXiv:1610.04325.