- The Pile: An 800GB Dataset of Diverse Text for Language Modeling - [Arxiv] [QA]
- Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation - [Arxiv] [QA]
- Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human Demonstration - [Arxiv] [QA]
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning - [Arxiv] [QA]
- Evolution Is All You Need: Phylogenetic Augmentation for Contrastive Learning - [Arxiv] [QA]
- ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language - [Arxiv] [QA]
- Learning Dense Representations of Phrases at Scale - [Arxiv] [QA]
- Towards Overcoming False Positives in Visual Relationship Detection - [Arxiv] [QA]
- A Distributional Approach to Controlled Text Generation - [Arxiv] [QA]
- OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning - [Arxiv] [QA]
- Taming Transformers for High-Resolution Image Synthesis - [Arxiv] [QA]
- Transformer Interpretability Beyond Attention Visualization - [Arxiv] [QA]
- Neural Volume Rendering: NeRF And Beyond - [Arxiv] [QA]
- Keyword-Guided Neural Conversational Model - [Arxiv] [QA]
- CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts - [Arxiv] [QA]
- Understanding the Behaviour of Contrastive Loss - [Arxiv] [QA]
- Image Inpainting Guided by Coherence Priors of Semantics and Textures - [Arxiv] [QA]
- Contrastive Learning with Adversarial Perturbations for Conditional Text Generation - [Arxiv] [QA]
- A Comprehensive Study of Deep Video Action Recognition - [Arxiv] [QA]
- Differential Evolution for Neural Architecture Search - [Arxiv] [QA]
- Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need? - [Arxiv] [QA]
- Spatially Conditioned Graphs for Detecting Human-Object Interactions - [Arxiv] [QA]
- Equivalent Causal Models - [Arxiv] [QA]
- Explainable Link Prediction for Privacy-Preserving Contact Tracing - [Arxiv] [QA]
- The Counterfactual NESS Definition of Causation - [Arxiv] [QA]
- Distilling Knowledge from Reader to Retriever for Question Answering - [Arxiv] [QA]
- Active Learning: Problem Settings and Recent Developments - [Arxiv] [QA]
- Sheaf Neural Networks - [Arxiv] [QA]
- Challenging common interpretability assumptions in feature attribution explanations - [Arxiv] [QA]
- Practical No-box Adversarial Attacks against DNNs - [Arxiv] [QA]
- RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation - [Arxiv] [QA]
- pixelNeRF: Neural Radiance Fields from One or Few Images - [Arxiv] [QA]
- Learned Initializations for Optimizing Coordinate-Based Neural Representations - [Arxiv] [QA]
- Neural Prototype Trees for Interpretable Fine-grained Image Recognition - [Arxiv] [QA]
- Just Ask: Learning to Answer Questions from Millions of Narrated Videos - [Arxiv] [QA]
- CPM: A Large-scale Generative Chinese Pre-trained Language Model - [Arxiv] [QA]
- Feature Learning in Infinite-Width Neural Networks - [Arxiv] [QA]
- How Well Do Self-Supervised Models Transfer? - [Arxiv] [QA]
- Can Temporal Information Help with Contrastive Self-Supervised Learning? - [Arxiv] [QA]
- All You Need is a Good Functional Prior for Bayesian Deep Learning - [Arxiv] [QA]
- DeRF: Decomposed Radiance Fields - [Arxiv] [QA]
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields - [Arxiv] [QA]
- Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning - [Arxiv] [QA]
- ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation - [Arxiv] [QA]
- Exploring Simple Siamese Representation Learning - [Arxiv] [QA]
- A Reputation Mechanism Is All You Need: Collaborative Fairness and Adversarial Robustness in Federated Learning - [Arxiv] [QA]
- Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning - [Arxiv] [QA]
- MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing - [Arxiv] [QA]
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? - [Arxiv] [QA]
- Contextual Fusion For Adversarial Robustness - [Arxiv] [QA]
- Functorial Manifold Learning - [Arxiv] [QA]
- Unsupervised Video Representation Learning by Bidirectional Feature Prediction - [Arxiv] [QA]
- Multimodal Pretraining for Dense Video Captioning - [Arxiv] [QA]
- Topological properties of basins of attraction and expressiveness of width bounded neural networks - [Arxiv] [QA]
- A Broad Dataset is All You Need for One-Shot Object Detection - [Arxiv] [QA]
- Long Range Arena: A Benchmark for Efficient Transformers - [Arxiv] [QA]
- Feature Removal Is a Unifying Principle for Model Explanation Methods - [Arxiv] [QA]
- Language Model is All You Need: Natural Language Understanding as Question Answering - [Arxiv] [QA]
- This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition - [Arxiv] [QA]
- Fast Biconnectivity Restoration in Multi-Robot Systems for Robust Communication Maintenance - [Arxiv] [QA]
- Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies - [Arxiv] [QA]
- A Survey on Contrastive Self-supervised Learning - [Arxiv] [QA]
- HOI Analysis: Integrating and Decomposing Human-Object Interaction - [Arxiv] [QA]
- Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning - [Arxiv] [QA]
- Learning to Actively Learn: A Robust Approach - [Arxiv] [QA]
- Class-incremental learning: survey and performance evaluation on image classification - [Arxiv] [QA]
- Cycle-Contrast for Self-Supervised Video Representation Learning - [Arxiv] [QA]
- How Does the Task Landscape Affect MAML Performance? - [Arxiv] [QA]
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL - [Arxiv] [QA]
- RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning - [Arxiv] [QA]
- Interpretation of NLP models through input marginalization - [Arxiv] [QA]
- Attention is All You Need in Speech Separation - [Arxiv] [QA]
- Model Interpretability through the Lens of Computational Complexity - [Arxiv] [QA]
- Towards falsifiable interpretability research - [Arxiv] [QA]
- The Turking Test: Can Language Models Understand Instructions? - [Arxiv] [QA]
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - [Arxiv] [QA]
- Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision - [Arxiv] [QA]
- MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation - [Arxiv] [QA]
- Distilling Dense Representations for Ranking using Tightly-Coupled Teachers - [Arxiv] [QA]
- Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review - [Arxiv] [QA]
- CR-Walker: Tree-Structured Graph Reasoning and Dialog Acts for Conversational Recommendation - [Arxiv] [QA]
- PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval - [Arxiv] [QA]
- Improving Dialog Systems for Negotiation with Personality Modeling - [Arxiv] [QA]
- Self-supervised Co-training for Video Representation Learning - [Arxiv] [QA]
- Solving relaxations of MAP-MRF problems: Combinatorial in-face Frank-Wolfe directions - [Arxiv] [QA]
- For self-supervised learning, Rationality implies generalization, provably - [Arxiv] [QA]
- RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering - [Arxiv] [QA]
- What is More Likely to Happen Next? Video-and-Language Future Event Prediction - [Arxiv] [QA]
- NeRF++: Analyzing and Improving Neural Radiance Fields - [Arxiv] [QA]
- Representable Markov Categories and Comparison of Statistical Experiments in Categorical Probability - [Arxiv] [QA]
- Pretrained Transformers for Text Ranking: BERT and Beyond - [Arxiv] [QA]
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis - [Arxiv] [QA]
- Fairness-aware Agnostic Federated Learning - [Arxiv] [QA]
- Automated Concatenation of Embeddings for Structured Prediction - [Arxiv] [QA]
- GRF: Learning a General Radiance Field for 3D Representation and Rendering - [Arxiv] [QA]
- A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks - [Arxiv] [QA]
- Automatic Backward Filtering Forward Guiding for Markov processes and graphical models - [Arxiv] [QA]
- Unsupervised Representation Learning by Invariance Propagation - [Arxiv] [QA]
- Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions - [Arxiv] [QA]
- Beyond [CLS] through Ranking by Generation - [Arxiv] [QA]
- A Transformer-based Framework for Multivariate Time Series Representation Learning - [Arxiv] [QA]
- Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation - [Arxiv] [QA]
- MIME: MIMicking Emotions for Empathetic Response Generation - [Arxiv] [QA]
- Sharpness-Aware Minimization for Efficiently Improving Generalization - [Arxiv] [QA]
- DecAug: Augmenting HOI Detection via Decomposition - [Arxiv] [QA]
- DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection - [Arxiv] [QA]
- All You Need Is CONSTRUCT - [Arxiv] [QA]
- SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval - [Arxiv] [QA]
- Understanding Self-supervised Learning with Dual Deep Networks - [Arxiv] [QA]
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval - [Arxiv] [QA]
- Learning to Plan and Realize Separately for Open-Ended Dialogue Systems - [Arxiv] [QA]
- From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation - [Arxiv] [QA]
- Learned Low Precision Graph Neural Networks - [Arxiv] [QA]
- Generation-Augmented Retrieval for Open-domain Question Answering - [Arxiv] [QA]
- SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning - [Arxiv] [QA]
- Simplified TinyBERT: Knowledge Distillation for Document Retrieval - [Arxiv] [QA]
- BERT-QE: Contextualized Query Expansion for Document Re-ranking - [Arxiv] [QA]
- Efficient Transformers: A Survey - [Arxiv] [QA]
- Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion - [Arxiv] [QA]
- Understanding the Role of Individual Units in a Deep Neural Network - [Arxiv] [QA]
- KNN-DBSCAN: a DBSCAN in high dimensions - [Arxiv] [QA]
- Generative Language Modeling for Automated Theorem Proving - [Arxiv] [QA]
- Measuring Massive Multitask Language Understanding - [Arxiv] [QA]
- Sensors, Safety Models and A System-Level Approach to Safe and Scalable Automated Vehicles - [Arxiv] [QA]
- Sample-Efficient Automated Deep Reinforcement Learning - [Arxiv] [QA]
- Learning to summarize from human feedback - [Arxiv] [QA]
- WaveGrad: Estimating Gradients for Waveform Generation - [Arxiv] [QA]
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs - [Arxiv] [QA]
- Neural Architecture Search For Keyword Spotting - [Arxiv] [QA]
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics - [Arxiv] [QA]
- A Survey of Deep Active Learning - [Arxiv] [QA]
- Against Membership Inference Attack: Pruning is All You Need - [Arxiv] [QA]
- A Survey of Evaluation Metrics Used for NLG Systems - [Arxiv] [QA]
- Automated Search for Resource-Efficient Branched Multi-Task Networks - [Arxiv] [QA]
- Contrastive learning, multi-view redundancy, and linear models - [Arxiv] [QA]
- A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild - [Arxiv] [QA]
- PARADE: Passage Representation Aggregation for Document Reranking - [Arxiv] [QA]
- Monocular Expressive Body Regression through Body-Driven Attention - [Arxiv] [QA]
- Automated Machine Learning -- a brief review at the end of the early years - [Arxiv] [QA]
- HiPPO: Recurrent Memory with Optimal Polynomial Projections - [Arxiv] [QA]
- A Survey of Active Learning for Text Classification using Deep Neural Networks - [Arxiv] [QA]
- Context-aware Feature Generation for Zero-shot Semantic Segmentation - [Arxiv] [QA]
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection - [Arxiv] [QA]
- Adaptive Learning of Tensor Network Structures - [Arxiv] [QA]
- SpeedySpeech: Efficient Neural Speech Synthesis - [Arxiv] [QA]
- Spatiotemporal Contrastive Video Representation Learning - [Arxiv] [QA]
- A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning - [Arxiv] [QA]
- Polysemy Deciphering Network for Robust Human-Object Interaction Detection - [Arxiv] [QA]
- Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework - [Arxiv] [QA]
- Pose-based Modular Network for Human-Object Interaction Detection - [Arxiv] [QA]
- Predicting What You Already Know Helps: Provable Self-Supervised Learning - [Arxiv] [QA]
- Explainable Face Recognition - [Arxiv] [QA]
- Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases - [Arxiv] [QA]
- Self-supervised Learning for Large-scale Item Recommendations - [Arxiv] [QA]
- Visual Compositional Learning for Human-Object Interaction Detection - [Arxiv] [QA]
- Self-Supervised Learning Across Domains - [Arxiv] [QA]
- Understanding BERT Rankers Under Distillation - [Arxiv] [QA]
- Video Representation Learning by Recognizing Temporal Transformations - [Arxiv] [QA]
- Learning Joint Spatial-Temporal Transformations for Video Inpainting - [Arxiv] [QA]
- Mixture Representation Learning with Coupled Autoencoders - [Arxiv] [QA]
- Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning - [Arxiv] [QA]
- Towards Deeper Graph Neural Networks - [Arxiv] [QA]
- DVI: Depth Guided Video Inpainting for Autonomous Driving - [Arxiv] [QA]
- Detecting Human-Object Interactions with Action Co-occurrence Priors - [Arxiv] [QA]
- Hopfield Networks is All You Need - [Arxiv] [QA]
- Natural Graph Networks - [Arxiv] [QA]
- Few-shot Scene-adaptive Anomaly Detection - [Arxiv] [QA]
- Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations - [Arxiv] [QA]
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection - [Arxiv] [QA]
- TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech - [Arxiv] [QA]
- Accuracy Prediction with Non-neural Model for Neural Architecture Search - [Arxiv] [QA]
- GOLD-NAS: Gradual, One-Level, Differentiable - [Arxiv] [QA]
- GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis - [Arxiv] [QA]
- Confidence-Aware Learning for Deep Neural Networks - [Arxiv] [QA]
- The Fyodorov-Hiary-Keating Conjecture. I - [Arxiv] [QA]
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval - [Arxiv] [QA]
- Interactive Path Reasoning on Graph for Conversational Recommendation - [Arxiv] [QA]
- Data Movement Is All You Need: A Case Study on Optimizing Transformers - [Arxiv] [QA]
- ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph - [Arxiv] [QA]
- PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning - [Arxiv] [QA]
- Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning - [Arxiv] [QA]
- RepBERT: Contextualized Text Embeddings for First-Stage Retrieval - [Arxiv] [QA]
- Video Representation Learning with Visual Tempo Consistency - [Arxiv] [QA]
- GPT-GNN: Generative Pre-Training of Graph Neural Networks - [Arxiv] [QA]
- Space-Time Correspondence as a Contrastive Random Walk - [Arxiv] [QA]
- Practical applications of metric space magnitude and weighting vectors - [Arxiv] [QA]
- Generative causal explanations of black-box classifiers - [Arxiv] [QA]
- Gaining Insight into SARS-CoV-2 Infection and COVID-19 Severity Using Self-supervised Edge Features and Graph Neural Networks - [Arxiv] [QA]
- A Constructive, Type-Theoretic Approach to Regression via Global Optimisation - [Arxiv] [QA]
- Unsupervised Evaluation of Interactive Dialog with DialoGPT - [Arxiv] [QA]
- Efficient Hyperparameter Optimization in Deep Learning Using a Variable Length Genetic Algorithm - [Arxiv] [QA]
- Logarithmic Pruning is All You Need - [Arxiv] [QA]
- Towards Understanding Label Smoothing - [Arxiv] [QA]
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations - [Arxiv] [QA]
- Self-Supervised Prototypical Transfer Learning for Few-Shot Classification - [Arxiv] [QA]
- Denoising Diffusion Probabilistic Models - [Arxiv] [QA]
- Neural Parameter Allocation Search - [Arxiv] [QA]
- Contrastive learning of global and local features for medical image segmentation with limited annotations - [Arxiv] [QA]
- Stochastic Bandits with Linear Constraints - [Arxiv] [QA]
- Self-supervised Learning on Graphs: Deep Insights and New Direction - [Arxiv] [QA]
- Big Self-Supervised Models are Strong Semi-Supervised Learners - [Arxiv] [QA]
- GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training - [Arxiv] [QA]
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments - [Arxiv] [QA]
- Cross-lingual Retrieval for Iterative Self-Supervised Training - [Arxiv] [QA]
- When Does Self-Supervision Help Graph Convolutional Networks? - [Arxiv] [QA]
- Augmented Sliced Wasserstein Distances - [Arxiv] [QA]
- Self-supervised Learning: Generative or Contrastive - [Arxiv] [QA]
- DeeperGCN: All You Need to Train Deeper GCNs - [Arxiv] [QA]
- IsarStep: a Benchmark for High-level Mathematical Reasoning - [Arxiv] [QA]
- Interpretable Neural Architecture Search via Bayesian Optimisation with Weisfeiler-Lehman Kernels - [Arxiv] [QA]
- Rethinking the Value of Labels for Improving Class-Imbalanced Learning - [Arxiv] [QA]
- Self-Supervised Relational Reasoning for Representation Learning - [Arxiv] [QA]
- Diagnosing Rarity in Human-Object Interaction Detection - [Arxiv] [QA]
- Contrastive Multi-View Representation Learning on Graphs - [Arxiv] [QA]
- Self-supervised Learning from a Multi-view Perspective - [Arxiv] [QA]
- FastSpeech 2: Fast and High-Quality End-to-End Text to Speech - [Arxiv] [QA]
- Differentiable Neural Input Search for Recommender Systems - [Arxiv] [QA]
- CoCon: A Self-Supervised Approach for Controlled Text Generation - [Arxiv] [QA]
- M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training - [Arxiv] [QA]
- Situated and Interactive Multimodal Conversations - [Arxiv] [QA]
- Bayesian Updates Compose Optically - [Arxiv] [QA]
- Explainable Artificial Intelligence: a Systematic Review - [Arxiv] [QA]
- Language Models are Few-Shot Learners - [Arxiv] [QA]
- SCAN: Learning to Classify Images without Labels - [Arxiv] [QA]
- High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling - [Arxiv] [QA]
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization - [Arxiv] [QA]
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - [Arxiv] [QA]
- Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search - [Arxiv] [QA]
- Novel Policy Seeking with Constrained Optimization - [Arxiv] [QA]
- Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation - [Arxiv] [QA]
- Mirror Descent Policy Optimization - [Arxiv] [QA]
- Normalized Attention Without Probability Cage - [Arxiv] [QA]
- Vector-Quantized Autoregressive Predictive Coding - [Arxiv] [QA]
- Semantic Photo Manipulation with a Generative Image Prior - [Arxiv] [QA]
- Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation - [Arxiv] [QA]
- Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech - [Arxiv] [QA]
- Local Self-Attention over Long Text for Efficient Document Retrieval - [Arxiv] [QA]
- Categorical Stochastic Processes and Likelihood - [Arxiv] [QA]
- Condensed Movies: Story Based Retrieval with Contextual Embeddings - [Arxiv] [QA]
- DramaQA: Character-Centered Video Story Understanding with Hierarchical QA - [Arxiv] [QA]
- The Cascade Transformer: an Application for Efficient Answer Sentence Selection - [Arxiv] [QA]
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? - [Arxiv] [QA]
- Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition? - [Arxiv] [QA]
- Learning an Unreferenced Metric for Online Dialogue Evaluation - [Arxiv] [QA]
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training - [Arxiv] [QA]
- HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training - [Arxiv] [QA]
- Sparse, Dense, and Attentional Representations for Text Retrieval - [Arxiv] [QA]
- Consistent Video Depth Estimation - [Arxiv] [QA]
- Training Curricula for Open Domain Answer Re-Ranking - [Arxiv] [QA]
- Efficient Document Re-Ranking for Transformers by Precomputing Term Representations - [Arxiv] [QA]
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning - [Arxiv] [QA]
- Complementing Lexical Retrieval with Semantic Residual Embedding - [Arxiv] [QA]
- Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels - [Arxiv] [QA]
- Recipes for building an open-domain chatbot - [Arxiv] [QA]
- Modularized Transfomer-based Ranking Framework - [Arxiv] [QA]
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT - [Arxiv] [QA]
- All you need is a second look: Towards Tighter Arbitrary shape text detection - [Arxiv] [QA]
- Multi-Domain Dialogue Acts and Response Co-Generation - [Arxiv] [QA]
- Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching - [Arxiv] [QA]
- A survey on domain adaptation theory: learning bounds and theoretical guarantees - [Arxiv] [QA]
- Learning Term Discrimination - [Arxiv] [QA]
- Supervised Contrastive Learning - [Arxiv] [QA]
- Federated Stochastic Gradient Langevin Dynamics - [Arxiv] [QA]
- Distilling Knowledge for Fast Retrieval-based Chat-bots - [Arxiv] [QA]
- Considering Likelihood in NLP Classification Explanations with Occlusion and Language Modeling - [Arxiv] [QA]
- Detailed 2D-3D Joint Representation for Human-Object Interaction - [Arxiv] [QA]
- Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks - [Arxiv] [QA]
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness - [Arxiv] [QA]
- Spatially-Attentive Patch-Hierarchical Network for Adaptive Motion Deblurring - [Arxiv] [QA]
- Dense Passage Retrieval for Open-Domain Question Answering - [Arxiv] [QA]
- TextGAIL: Generative Adversarial Imitation Learning for Text Generation - [Arxiv] [QA]
- There and Back Again: Revisiting Backpropagation Saliency Methods - [Arxiv] [QA]
- PaStaNet: Toward Human Activity Knowledge Engine - [Arxiv] [QA]
- A Survey on Conversational Recommender Systems - [Arxiv] [QA]
- How Useful is Self-Supervised Pretraining for Visual Tasks? - [Arxiv] [QA]
- Learning Human-Object Interaction Detection using Interaction Points - [Arxiv] [QA]
- InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining - [Arxiv] [QA]
- VIOLIN: A Large-Scale Dataset for Video-and-Language Inference - [Arxiv] [QA]
- Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? - [Arxiv] [QA]
- Deformable Style Transfer - [Arxiv] [QA]
- Distributional Reinforcement Learning with Ensembles - [Arxiv] [QA]
- Model-based Asynchronous Hyperparameter and Neural Architecture Search - [Arxiv] [QA]
- Pre-trained Models for Natural Language Processing: A Survey - [Arxiv] [QA]
- Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification - [Arxiv] [QA]
- XPersona: Evaluating Multilingual Personalized Chatbot - [Arxiv] [QA]
- Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes - [Arxiv] [QA]
- VCNet: A Robust Approach to Blind Image Inpainting - [Arxiv] [QA]
- Document Ranking with a Pretrained Sequence-to-Sequence Model - [Arxiv] [QA]
- VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions - [Arxiv] [QA]
- Building and Interpreting Deep Similarity Models - [Arxiv] [QA]
- xCos: An Explainable Cosine Metric for Face Verification Task - [Arxiv] [QA]
- Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning - [Arxiv] [QA]
- ReZero is All You Need: Fast Convergence at Large Depth - [Arxiv] [QA]
- Improved Baselines with Momentum Contrastive Learning - [Arxiv] [QA]
- How to Train Your Super-Net: An Analysis of Training Heuristics in Weight-Sharing NAS - [Arxiv] [QA]
- Cascaded Human-Object Interaction Recognition - [Arxiv] [QA]
- A Safety Framework for Critical Systems Utilising Deep Neural Networks - [Arxiv] [QA]
- De Finetti's construction as a categorical limit - [Arxiv] [QA]
- AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment - [Arxiv] [QA]
- XGPT: Cross-modal Generative Pre-Training for Image Captioning - [Arxiv] [QA]
- Benchmarking Graph Neural Networks - [Arxiv] [QA]
- DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding - [Arxiv] [QA]
- Estimation-Action-Reflection: Towards Deep Interaction Between Conversational and Recommender Systems - [Arxiv] [QA]
- Automatic Shortcut Removal for Self-Supervised Representation Learning - [Arxiv] [QA]
- Disentangled Speech Embeddings using Cross-modal Self-supervision - [Arxiv] [QA]
- Gradient Boosting Neural Networks: GrowNet - [Arxiv] [QA]
- Information Condensing Active Learning - [Arxiv] [QA]
- UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation - [Arxiv] [QA]
- A Simple Framework for Contrastive Learning of Visual Representations - [Arxiv] [QA]
- REALM: Retrieval-Augmented Language Model Pre-Training - [Arxiv] [QA]
- Pre-training Tasks for Embedding-based Large-scale Retrieval - [Arxiv] [QA]
- Unsupervised pretraining transfers well across languages - [Arxiv] [QA]
- Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation - [Arxiv] [QA]
- Proving the Lottery Ticket Hypothesis: Pruning is All You Need - [Arxiv] [QA]
- Learning Robust and Multilingual Speech Representations - [Arxiv] [QA]
- Selective Weak Supervision for Neural Information Retrieval - [Arxiv] [QA]
- Multi-task self-supervised learning for Robust Speech Recognition - [Arxiv] [QA]
- TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval - [Arxiv] [QA]
- Scaling Laws for Neural Language Models - [Arxiv] [QA]
- Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks - [Arxiv] [QA]
- Discriminator Soft Actor Critic without Extrinsic Rewards - [Arxiv] [QA]
- Latency-Aware Differentiable Neural Architecture Search - [Arxiv] [QA]
- MixPath: A Unified Approach for One-shot Neural Architecture Search - [Arxiv] [QA]
- A Categorical Framework for Learning Generalised Tree Automata - [Arxiv] [QA]
- Classifying All Interacting Pairs in a Single Shot - [Arxiv] [QA]
- Visually Guided Self Supervised Learning of Speech Representations - [Arxiv] [QA]
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training - [Arxiv] [QA]
- Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection - [Arxiv] [QA]
- Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing - [Arxiv] [QA]
- Deeper Insights into Weight Sharing in Neural Architecture Search - [Arxiv] [QA]