Papers-2020.md
December 2020

  • The Pile: An 800GB Dataset of Diverse Text for Language Modeling - [Arxiv] [QA]
  • Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation - [Arxiv] [QA]
  • Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human Demonstration - [Arxiv] [QA]
  • A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning - [Arxiv] [QA]
  • Evolution Is All You Need: Phylogenetic Augmentation for Contrastive Learning - [Arxiv] [QA]
  • ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language - [Arxiv] [QA]
  • Learning Dense Representations of Phrases at Scale - [Arxiv] [QA]
  • Towards Overcoming False Positives in Visual Relationship Detection - [Arxiv] [QA]
  • A Distributional Approach to Controlled Text Generation - [Arxiv] [QA]
  • OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning - [Arxiv] [QA]
  • Taming Transformers for High-Resolution Image Synthesis - [Arxiv] [QA]
  • Transformer Interpretability Beyond Attention Visualization - [Arxiv] [QA]
  • Neural Volume Rendering: NeRF And Beyond - [Arxiv] [QA]
  • Keyword-Guided Neural Conversational Model - [Arxiv] [QA]
  • CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts - [Arxiv] [QA]
  • Understanding the Behaviour of Contrastive Loss - [Arxiv] [QA]
  • Image Inpainting Guided by Coherence Priors of Semantics and Textures - [Arxiv] [QA]
  • Contrastive Learning with Adversarial Perturbations for Conditional Text Generation - [Arxiv] [QA]
  • A Comprehensive Study of Deep Video Action Recognition - [Arxiv] [QA]
  • Differential Evolution for Neural Architecture Search - [Arxiv] [QA]
  • Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need? - [Arxiv] [QA]
  • Spatially Conditioned Graphs for Detecting Human-Object Interactions - [Arxiv] [QA]
  • Equivalent Causal Models - [Arxiv] [QA]
  • Explainable Link Prediction for Privacy-Preserving Contact Tracing - [Arxiv] [QA]
  • The Counterfactual NESS Definition of Causation - [Arxiv] [QA]
  • Distilling Knowledge from Reader to Retriever for Question Answering - [Arxiv] [QA]
  • Active Learning: Problem Settings and Recent Developments - [Arxiv] [QA]
  • Sheaf Neural Networks - [Arxiv] [QA]
  • Challenging common interpretability assumptions in feature attribution explanations - [Arxiv] [QA]
  • Practical No-box Adversarial Attacks against DNNs - [Arxiv] [QA]
  • RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation - [Arxiv] [QA]
  • pixelNeRF: Neural Radiance Fields from One or Few Images - [Arxiv] [QA]
  • Learned Initializations for Optimizing Coordinate-Based Neural Representations - [Arxiv] [QA]
  • Neural Prototype Trees for Interpretable Fine-grained Image Recognition - [Arxiv] [QA]
  • Just Ask: Learning to Answer Questions from Millions of Narrated Videos - [Arxiv] [QA]
  • CPM: A Large-scale Generative Chinese Pre-trained Language Model - [Arxiv] [QA]

November 2020

  • Feature Learning in Infinite-Width Neural Networks - [Arxiv] [QA]
  • How Well Do Self-Supervised Models Transfer? - [Arxiv] [QA]
  • Can Temporal Information Help with Contrastive Self-Supervised Learning? - [Arxiv] [QA]
  • All You Need is a Good Functional Prior for Bayesian Deep Learning - [Arxiv] [QA]
  • DeRF: Decomposed Radiance Fields - [Arxiv] [QA]
  • GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields - [Arxiv] [QA]
  • Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning - [Arxiv] [QA]
  • ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation - [Arxiv] [QA]
  • Exploring Simple Siamese Representation Learning - [Arxiv] [QA]
  • A Reputation Mechanism Is All You Need: Collaborative Fairness and Adversarial Robustness in Federated Learning - [Arxiv] [QA]
  • Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning - [Arxiv] [QA]
  • MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing - [Arxiv] [QA]
  • Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? - [Arxiv] [QA]
  • Contextual Fusion For Adversarial Robustness - [Arxiv] [QA]
  • Functorial Manifold Learning - [Arxiv] [QA]
  • Unsupervised Video Representation Learning by Bidirectional Feature Prediction - [Arxiv] [QA]
  • Multimodal Pretraining for Dense Video Captioning - [Arxiv] [QA]
  • Topological properties of basins of attraction and expressiveness of width bounded neural networks - [Arxiv] [QA]
  • A Broad Dataset is All You Need for One-Shot Object Detection - [Arxiv] [QA]
  • Long Range Arena: A Benchmark for Efficient Transformers - [Arxiv] [QA]
  • Feature Removal Is a Unifying Principle for Model Explanation Methods - [Arxiv] [QA]
  • Language Model is All You Need: Natural Language Understanding as Question Answering - [Arxiv] [QA]
  • This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition - [Arxiv] [QA]
  • Fast Biconnectivity Restoration in Multi-Robot Systems for Robust Communication Maintenance - [Arxiv] [QA]
  • Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies - [Arxiv] [QA]

October 2020

  • A Survey on Contrastive Self-supervised Learning - [Arxiv] [QA]
  • HOI Analysis: Integrating and Decomposing Human-Object Interaction - [Arxiv] [QA]
  • Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning - [Arxiv] [QA]
  • Learning to Actively Learn: A Robust Approach - [Arxiv] [QA]
  • Class-incremental learning: survey and performance evaluation on image classification - [Arxiv] [QA]
  • Cycle-Contrast for Self-Supervised Video Representation Learning - [Arxiv] [QA]
  • How Does the Task Landscape Affect MAML Performance? - [Arxiv] [QA]
  • One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL - [Arxiv] [QA]
  • RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning - [Arxiv] [QA]
  • Interpretation of NLP models through input marginalization - [Arxiv] [QA]
  • Attention is All You Need in Speech Separation - [Arxiv] [QA]
  • Model Interpretability through the Lens of Computational Complexity - [Arxiv] [QA]
  • Towards falsifiable interpretability research - [Arxiv] [QA]
  • The Turking Test: Can Language Models Understand Instructions? - [Arxiv] [QA]
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - [Arxiv] [QA]
  • Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision - [Arxiv] [QA]
  • MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation - [Arxiv] [QA]
  • Distilling Dense Representations for Ranking using Tightly-Coupled Teachers - [Arxiv] [QA]
  • Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review - [Arxiv] [QA]
  • CR-Walker: Tree-Structured Graph Reasoning and Dialog Acts for Conversational Recommendation - [Arxiv] [QA]
  • PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval - [Arxiv] [QA]
  • Improving Dialog Systems for Negotiation with Personality Modeling - [Arxiv] [QA]
  • Self-supervised Co-training for Video Representation Learning - [Arxiv] [QA]
  • Solving relaxations of MAP-MRF problems: Combinatorial in-face Frank-Wolfe directions - [Arxiv] [QA]
  • For self-supervised learning, Rationality implies generalization, provably - [Arxiv] [QA]
  • RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering - [Arxiv] [QA]
  • What is More Likely to Happen Next? Video-and-Language Future Event Prediction - [Arxiv] [QA]
  • NeRF++: Analyzing and Improving Neural Radiance Fields - [Arxiv] [QA]
  • Representable Markov Categories and Comparison of Statistical Experiments in Categorical Probability - [Arxiv] [QA]
  • Pretrained Transformers for Text Ranking: BERT and Beyond - [Arxiv] [QA]
  • HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis - [Arxiv] [QA]
  • Fairness-aware Agnostic Federated Learning - [Arxiv] [QA]
  • Automated Concatenation of Embeddings for Structured Prediction - [Arxiv] [QA]
  • GRF: Learning a General Radiance Field for 3D Representation and Rendering - [Arxiv] [QA]
  • A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks - [Arxiv] [QA]
  • Automatic Backward Filtering Forward Guiding for Markov processes and graphical models - [Arxiv] [QA]
  • Unsupervised Representation Learning by Invariance Propagation - [Arxiv] [QA]
  • Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions - [Arxiv] [QA]
  • Beyond [CLS] through Ranking by Generation - [Arxiv] [QA]
  • A Transformer-based Framework for Multivariate Time Series Representation Learning - [Arxiv] [QA]
  • Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation - [Arxiv] [QA]
  • MIME: MIMicking Emotions for Empathetic Response Generation - [Arxiv] [QA]
  • Sharpness-Aware Minimization for Efficiently Improving Generalization - [Arxiv] [QA]
  • DecAug: Augmenting HOI Detection via Decomposition - [Arxiv] [QA]
  • DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection - [Arxiv] [QA]
  • All You Need Is CONSTRUCT - [Arxiv] [QA]
  • SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval - [Arxiv] [QA]
  • Understanding Self-supervised Learning with Dual Deep Networks - [Arxiv] [QA]

September 2020

  • Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval - [Arxiv] [QA]
  • Learning to Plan and Realize Separately for Open-Ended Dialogue Systems - [Arxiv] [QA]
  • From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation - [Arxiv] [QA]
  • Learned Low Precision Graph Neural Networks - [Arxiv] [QA]
  • Generation-Augmented Retrieval for Open-domain Question Answering - [Arxiv] [QA]
  • SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning - [Arxiv] [QA]
  • Simplified TinyBERT: Knowledge Distillation for Document Retrieval - [Arxiv] [QA]
  • BERT-QE: Contextualized Query Expansion for Document Re-ranking - [Arxiv] [QA]
  • Efficient Transformers: A Survey - [Arxiv] [QA]
  • Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion - [Arxiv] [QA]
  • Understanding the Role of Individual Units in a Deep Neural Network - [Arxiv] [QA]
  • KNN-DBSCAN: a DBSCAN in high dimensions - [Arxiv] [QA]
  • Generative Language Modeling for Automated Theorem Proving - [Arxiv] [QA]
  • Measuring Massive Multitask Language Understanding - [Arxiv] [QA]
  • Sensors, Safety Models and A System-Level Approach to Safe and Scalable Automated Vehicles - [Arxiv] [QA]
  • Sample-Efficient Automated Deep Reinforcement Learning - [Arxiv] [QA]
  • Learning to summarize from human feedback - [Arxiv] [QA]
  • WaveGrad: Estimating Gradients for Waveform Generation - [Arxiv] [QA]
  • Zero-Shot Human-Object Interaction Recognition via Affordance Graphs - [Arxiv] [QA]
  • Neural Architecture Search For Keyword Spotting - [Arxiv] [QA]

August 2020

  • Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics - [Arxiv] [QA]
  • A Survey of Deep Active Learning - [Arxiv] [QA]
  • Against Membership Inference Attack: Pruning is All You Need - [Arxiv] [QA]
  • A Survey of Evaluation Metrics Used for NLG Systems - [Arxiv] [QA]
  • Automated Search for Resource-Efficient Branched Multi-Task Networks - [Arxiv] [QA]
  • Contrastive learning, multi-view redundancy, and linear models - [Arxiv] [QA]
  • A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild - [Arxiv] [QA]
  • PARADE: Passage Representation Aggregation for Document Reranking - [Arxiv] [QA]
  • Monocular Expressive Body Regression through Body-Driven Attention - [Arxiv] [QA]
  • Automated Machine Learning -- a brief review at the end of the early years - [Arxiv] [QA]
  • HiPPO: Recurrent Memory with Optimal Polynomial Projections - [Arxiv] [QA]
  • A Survey of Active Learning for Text Classification using Deep Neural Networks - [Arxiv] [QA]
  • Context-aware Feature Generation for Zero-shot Semantic Segmentation - [Arxiv] [QA]
  • ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection - [Arxiv] [QA]
  • Adaptive Learning of Tensor Network Structures - [Arxiv] [QA]
  • SpeedySpeech: Efficient Neural Speech Synthesis - [Arxiv] [QA]
  • Spatiotemporal Contrastive Video Representation Learning - [Arxiv] [QA]
  • A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning - [Arxiv] [QA]
  • Polysemy Deciphering Network for Robust Human-Object Interaction Detection - [Arxiv] [QA]
  • Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework - [Arxiv] [QA]
  • Pose-based Modular Network for Human-Object Interaction Detection - [Arxiv] [QA]
  • Predicting What You Already Know Helps: Provable Self-Supervised Learning - [Arxiv] [QA]
  • Explainable Face Recognition - [Arxiv] [QA]

July 2020

  • Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases - [Arxiv] [QA]
  • Self-supervised Learning for Large-scale Item Recommendations - [Arxiv] [QA]
  • Visual Compositional Learning for Human-Object Interaction Detection - [Arxiv] [QA]
  • Self-Supervised Learning Across Domains - [Arxiv] [QA]
  • Understanding BERT Rankers Under Distillation - [Arxiv] [QA]
  • Video Representation Learning by Recognizing Temporal Transformations - [Arxiv] [QA]
  • Learning Joint Spatial-Temporal Transformations for Video Inpainting - [Arxiv] [QA]
  • Mixture Representation Learning with Coupled Autoencoders - [Arxiv] [QA]
  • Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning - [Arxiv] [QA]
  • Towards Deeper Graph Neural Networks - [Arxiv] [QA]
  • DVI: Depth Guided Video Inpainting for Autonomous Driving - [Arxiv] [QA]
  • Detecting Human-Object Interactions with Action Co-occurrence Priors - [Arxiv] [QA]
  • Hopfield Networks is All You Need - [Arxiv] [QA]
  • Natural Graph Networks - [Arxiv] [QA]
  • Few-shot Scene-adaptive Anomaly Detection - [Arxiv] [QA]
  • Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations - [Arxiv] [QA]
  • A Graph-based Interactive Reasoning for Human-Object Interaction Detection - [Arxiv] [QA]
  • TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech - [Arxiv] [QA]
  • Accuracy Prediction with Non-neural Model for Neural Architecture Search - [Arxiv] [QA]
  • GOLD-NAS: Gradual, One-Level, Differentiable - [Arxiv] [QA]
  • GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis - [Arxiv] [QA]
  • Confidence-Aware Learning for Deep Neural Networks - [Arxiv] [QA]
  • The Fyodorov-Hiary-Keating Conjecture. I - [Arxiv] [QA]
  • Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval - [Arxiv] [QA]
  • Interactive Path Reasoning on Graph for Conversational Recommendation - [Arxiv] [QA]

June 2020

  • Data Movement Is All You Need: A Case Study on Optimizing Transformers - [Arxiv] [QA]
  • ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph - [Arxiv] [QA]
  • PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning - [Arxiv] [QA]
  • Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning - [Arxiv] [QA]
  • RepBERT: Contextualized Text Embeddings for First-Stage Retrieval - [Arxiv] [QA]
  • Video Representation Learning with Visual Tempo Consistency - [Arxiv] [QA]
  • GPT-GNN: Generative Pre-Training of Graph Neural Networks - [Arxiv] [QA]
  • Space-Time Correspondence as a Contrastive Random Walk - [Arxiv] [QA]
  • Practical applications of metric space magnitude and weighting vectors - [Arxiv] [QA]
  • Generative causal explanations of black-box classifiers - [Arxiv] [QA]
  • Gaining Insight into SARS-CoV-2 Infection and COVID-19 Severity Using Self-supervised Edge Features and Graph Neural Networks - [Arxiv] [QA]
  • A Constructive, Type-Theoretic Approach to Regression via Global Optimisation - [Arxiv] [QA]
  • Unsupervised Evaluation of Interactive Dialog with DialoGPT - [Arxiv] [QA]
  • Efficient Hyperparameter Optimization in Deep Learning Using a Variable Length Genetic Algorithm - [Arxiv] [QA]
  • Logarithmic Pruning is All You Need - [Arxiv] [QA]
  • Towards Understanding Label Smoothing - [Arxiv] [QA]
  • wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations - [Arxiv] [QA]
  • Self-Supervised Prototypical Transfer Learning for Few-Shot Classification - [Arxiv] [QA]
  • Denoising Diffusion Probabilistic Models - [Arxiv] [QA]
  • Neural Parameter Allocation Search - [Arxiv] [QA]
  • Contrastive learning of global and local features for medical image segmentation with limited annotations - [Arxiv] [QA]
  • Stochastic Bandits with Linear Constraints - [Arxiv] [QA]
  • Self-supervised Learning on Graphs: Deep Insights and New Direction - [Arxiv] [QA]
  • Big Self-Supervised Models are Strong Semi-Supervised Learners - [Arxiv] [QA]
  • GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training - [Arxiv] [QA]
  • Unsupervised Learning of Visual Features by Contrasting Cluster Assignments - [Arxiv] [QA]
  • Cross-lingual Retrieval for Iterative Self-Supervised Training - [Arxiv] [QA]
  • When Does Self-Supervision Help Graph Convolutional Networks? - [Arxiv] [QA]
  • Augmented Sliced Wasserstein Distances - [Arxiv] [QA]
  • Self-supervised Learning: Generative or Contrastive - [Arxiv] [QA]
  • DeeperGCN: All You Need to Train Deeper GCNs - [Arxiv] [QA]
  • IsarStep: a Benchmark for High-level Mathematical Reasoning - [Arxiv] [QA]
  • Interpretable Neural Architecture Search via Bayesian Optimisation with Weisfeiler-Lehman Kernels - [Arxiv] [QA]
  • Rethinking the Value of Labels for Improving Class-Imbalanced Learning - [Arxiv] [QA]
  • Self-Supervised Relational Reasoning for Representation Learning - [Arxiv] [QA]
  • Diagnosing Rarity in Human-Object Interaction Detection - [Arxiv] [QA]
  • Contrastive Multi-View Representation Learning on Graphs - [Arxiv] [QA]
  • Self-supervised Learning from a Multi-view Perspective - [Arxiv] [QA]
  • FastSpeech 2: Fast and High-Quality End-to-End Text to Speech - [Arxiv] [QA]
  • Differentiable Neural Input Search for Recommender Systems - [Arxiv] [QA]
  • CoCon: A Self-Supervised Approach for Controlled Text Generation - [Arxiv] [QA]
  • M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training - [Arxiv] [QA]
  • Situated and Interactive Multimodal Conversations - [Arxiv] [QA]

May 2020

  • Bayesian Updates Compose Optically - [Arxiv] [QA]
  • Explainable Artificial Intelligence: a Systematic Review - [Arxiv] [QA]
  • Language Models are Few-Shot Learners - [Arxiv] [QA]
  • SCAN: Learning to Classify Images without Labels - [Arxiv] [QA]
  • High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling - [Arxiv] [QA]
  • Novel Human-Object Interaction Detection via Adversarial Domain Generalization - [Arxiv] [QA]
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - [Arxiv] [QA]
  • Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search - [Arxiv] [QA]
  • Novel Policy Seeking with Constrained Optimization - [Arxiv] [QA]
  • Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation - [Arxiv] [QA]
  • Mirror Descent Policy Optimization - [Arxiv] [QA]
  • Normalized Attention Without Probability Cage - [Arxiv] [QA]
  • Vector-Quantized Autoregressive Predictive Coding - [Arxiv] [QA]
  • Semantic Photo Manipulation with a Generative Image Prior - [Arxiv] [QA]
  • Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation - [Arxiv] [QA]
  • Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech - [Arxiv] [QA]
  • Local Self-Attention over Long Text for Efficient Document Retrieval - [Arxiv] [QA]
  • Categorical Stochastic Processes and Likelihood - [Arxiv] [QA]
  • Condensed Movies: Story Based Retrieval with Contextual Embeddings - [Arxiv] [QA]
  • DramaQA: Character-Centered Video Story Understanding with Hierarchical QA - [Arxiv] [QA]
  • The Cascade Transformer: an Application for Efficient Answer Sentence Selection - [Arxiv] [QA]
  • Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? - [Arxiv] [QA]
  • Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition? - [Arxiv] [QA]
  • Learning an Unreferenced Metric for Online Dialogue Evaluation - [Arxiv] [QA]
  • POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training - [Arxiv] [QA]
  • HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training - [Arxiv] [QA]
  • Sparse, Dense, and Attentional Representations for Text Retrieval - [Arxiv] [QA]

April 2020

  • Consistent Video Depth Estimation - [Arxiv] [QA]
  • Training Curricula for Open Domain Answer Re-Ranking - [Arxiv] [QA]
  • Efficient Document Re-Ranking for Transformers by Precomputing Term Representations - [Arxiv] [QA]
  • Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning - [Arxiv] [QA]
  • Complementing Lexical Retrieval with Semantic Residual Embedding - [Arxiv] [QA]
  • Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels - [Arxiv] [QA]
  • Recipes for building an open-domain chatbot - [Arxiv] [QA]
  • Modularized Transfomer-based Ranking Framework - [Arxiv] [QA]
  • ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT - [Arxiv] [QA]
  • All you need is a second look: Towards Tighter Arbitrary shape text detection - [Arxiv] [QA]
  • Multi-Domain Dialogue Acts and Response Co-Generation - [Arxiv] [QA]
  • Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching - [Arxiv] [QA]
  • A survey on domain adaptation theory: learning bounds and theoretical guarantees - [Arxiv] [QA]
  • Learning Term Discrimination - [Arxiv] [QA]
  • Supervised Contrastive Learning - [Arxiv] [QA]
  • Federated Stochastic Gradient Langevin Dynamics - [Arxiv] [QA]
  • Distilling Knowledge for Fast Retrieval-based Chat-bots - [Arxiv] [QA]
  • Considering Likelihood in NLP Classification Explanations with Occlusion and Language Modeling - [Arxiv] [QA]
  • Detailed 2D-3D Joint Representation for Human-Object Interaction - [Arxiv] [QA]
  • Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks - [Arxiv] [QA]
  • Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness - [Arxiv] [QA]
  • Spatially-Attentive Patch-Hierarchical Network for Adaptive Motion Deblurring - [Arxiv] [QA]
  • Dense Passage Retrieval for Open-Domain Question Answering - [Arxiv] [QA]
  • TextGAIL: Generative Adversarial Imitation Learning for Text Generation - [Arxiv] [QA]
  • There and Back Again: Revisiting Backpropagation Saliency Methods - [Arxiv] [QA]
  • PaStaNet: Toward Human Activity Knowledge Engine - [Arxiv] [QA]
  • A Survey on Conversational Recommender Systems - [Arxiv] [QA]

March 2020

  • How Useful is Self-Supervised Pretraining for Visual Tasks? - [Arxiv] [QA]
  • Learning Human-Object Interaction Detection using Interaction Points - [Arxiv] [QA]
  • InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining - [Arxiv] [QA]
  • VIOLIN: A Large-Scale Dataset for Video-and-Language Inference - [Arxiv] [QA]
  • Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? - [Arxiv] [QA]
  • Deformable Style Transfer - [Arxiv] [QA]
  • Distributional Reinforcement Learning with Ensembles - [Arxiv] [QA]
  • Model-based Asynchronous Hyperparameter and Neural Architecture Search - [Arxiv] [QA]
  • Pre-trained Models for Natural Language Processing: A Survey - [Arxiv] [QA]
  • Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification - [Arxiv] [QA]
  • XPersona: Evaluating Multilingual Personalized Chatbot - [Arxiv] [QA]
  • Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes - [Arxiv] [QA]
  • VCNet: A Robust Approach to Blind Image Inpainting - [Arxiv] [QA]
  • Document Ranking with a Pretrained Sequence-to-Sequence Model - [Arxiv] [QA]
  • VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions - [Arxiv] [QA]
  • Building and Interpreting Deep Similarity Models - [Arxiv] [QA]
  • xCos: An Explainable Cosine Metric for Face Verification Task - [Arxiv] [QA]
  • Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning - [Arxiv] [QA]
  • ReZero is All You Need: Fast Convergence at Large Depth - [Arxiv] [QA]
  • Improved Baselines with Momentum Contrastive Learning - [Arxiv] [QA]
  • How to Train Your Super-Net: An Analysis of Training Heuristics in Weight-Sharing NAS - [Arxiv] [QA]
  • Cascaded Human-Object Interaction Recognition - [Arxiv] [QA]
  • A Safety Framework for Critical Systems Utilising Deep Neural Networks - [Arxiv] [QA]
  • De Finetti's construction as a categorical limit - [Arxiv] [QA]
  • AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment - [Arxiv] [QA]
  • XGPT: Cross-modal Generative Pre-Training for Image Captioning - [Arxiv] [QA]
  • Benchmarking Graph Neural Networks - [Arxiv] [QA]

February 2020

  • DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding - [Arxiv] [QA]
  • Estimation-Action-Reflection: Towards Deep Interaction Between Conversational and Recommender Systems - [Arxiv] [QA]
  • Automatic Shortcut Removal for Self-Supervised Representation Learning - [Arxiv] [QA]
  • Disentangled Speech Embeddings using Cross-modal Self-supervision - [Arxiv] [QA]
  • Gradient Boosting Neural Networks: GrowNet - [Arxiv] [QA]
  • Information Condensing Active Learning - [Arxiv] [QA]
  • UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation - [Arxiv] [QA]
  • A Simple Framework for Contrastive Learning of Visual Representations - [Arxiv] [QA]
  • REALM: Retrieval-Augmented Language Model Pre-Training - [Arxiv] [QA]
  • Pre-training Tasks for Embedding-based Large-scale Retrieval - [Arxiv] [QA]
  • Unsupervised pretraining transfers well across languages - [Arxiv] [QA]
  • Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation - [Arxiv] [QA]
  • Proving the Lottery Ticket Hypothesis: Pruning is All You Need - [Arxiv] [QA]

January 2020

  • Learning Robust and Multilingual Speech Representations - [Arxiv] [QA]
  • Selective Weak Supervision for Neural Information Retrieval - [Arxiv] [QA]
  • Multi-task self-supervised learning for Robust Speech Recognition - [Arxiv] [QA]
  • TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval - [Arxiv] [QA]
  • Scaling Laws for Neural Language Models - [Arxiv] [QA]
  • Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks - [Arxiv] [QA]
  • Discriminator Soft Actor Critic without Extrinsic Rewards - [Arxiv] [QA]
  • Latency-Aware Differentiable Neural Architecture Search - [Arxiv] [QA]
  • MixPath: A Unified Approach for One-shot Neural Architecture Search - [Arxiv] [QA]
  • A Categorical Framework for Learning Generalised Tree Automata - [Arxiv] [QA]
  • Classifying All Interacting Pairs in a Single Shot - [Arxiv] [QA]
  • Visually Guided Self Supervised Learning of Speech Representations - [Arxiv] [QA]
  • ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training - [Arxiv] [QA]
  • Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection - [Arxiv] [QA]
  • Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing - [Arxiv] [QA]
  • Deeper Insights into Weight Sharing in Neural Architecture Search - [Arxiv] [QA]