diff --git a/DRL-ConferencePaper/.DS_Store b/DRL-ConferencePaper/.DS_Store
index 9013de6..fb14461 100644
Binary files a/DRL-ConferencePaper/.DS_Store and b/DRL-ConferencePaper/.DS_Store differ
diff --git a/DRL-ConferencePaper/ICLR/.DS_Store b/DRL-ConferencePaper/ICLR/.DS_Store
new file mode 100644
index 0000000..43f75d5
Binary files /dev/null and b/DRL-ConferencePaper/ICLR/.DS_Store differ
diff --git a/DRL-ConferencePaper/ICLR/2020/README.md b/DRL-ConferencePaper/ICLR/2020/README.md
new file mode 100644
index 0000000..c1a1703
--- /dev/null
+++ b/DRL-ConferencePaper/ICLR/2020/README.md
@@ -0,0 +1,865 @@
+# ICLR2020
+## 106 "Reinforcement Learning" Accepted Papers
+
+| Rank | Average Rating | Title | Ratings | Variance | Decision |
+| --- | --- | --- | --- | --- | --- |
+| 1 | 8.00 | Dynamics-aware Unsupervised Skill Discovery | 8 8 8 | 0.00 | Accept (Talk) |
+| 1 | 8.00 | Contrastive Learning Of Structured World Models | 8 8 8 | 0.00 | Accept (Talk) |
+| 1 | 8.00 | Implementation Matters In Deep Rl: A Case Study On Ppo And Trpo | 8 8 8 | 0.00 | Accept (Talk) |
+| 1 | 8.00 | Gendice: Generalized Offline Estimation Of Stationary Values | 8 8 8 | 0.00 | Accept (Talk) |
+| 1 | 8.00 | Causal Discovery With Reinforcement Learning | 8 8 8 | 0.00 | Accept (Talk) |
+| 2 | 7.33 | Is A Good Representation Sufficient For Sample Efficient Reinforcement Learning? | 8 8 6 | 0.89 | Accept (Spotlight) |
+| 2 | 7.33 | Harnessing Structures For Value-based Planning And Reinforcement Learning | 6 8 8 | 0.89 | Accept (Talk) |
+| 2 | 7.33 | Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency | 6 8 8 | 0.89 | Accept (Poster) |
+| 2 | 7.33 | Meta-q-learning | 8 8 6 | 0.89 | Accept (Talk) |
+| 2 | 7.33 | Discriminative Particle Filter Reinforcement Learning For Complex Partial Observations | 8 6 8 | 0.89 | Accept (Poster) |
+| 2 | 7.33 | Disagreement-regularized Imitation Learning | 6 8 8 | 0.89 | Accept (Spotlight) |
+| 2 | 7.33 | Doubly Robust Bias Reduction In Infinite Horizon Off-policy Estimation | 6 8 8 | 0.89 | Accept (Spotlight) |
+| 2 | 7.33 | Seed Rl: Scalable And Efficient Deep-rl With Accelerated Central Inference | 8 6 8 | 0.89 | Accept (Talk) |
+| 2 | 7.33 | The Ingredients Of Real World Robotic Reinforcement Learning | 6 8 8 | 0.89 | Accept (Spotlight) |
+| 2 | 7.33 | Watch The Unobserved: A Simple Approach To Parallelizing Monte Carlo Tree Search | 8 6 8 | 0.89 | Accept (Talk) |
+| 2 | 7.33 | Meta-learning Acquisition Functions For Transfer Learning In Bayesian Optimization | 8 6 8 | 0.89 | Accept (Spotlight) |
+| 2 | 7.33 | A Closer Look At Deep Policy Gradients | 8 6 8 | 0.89 | Accept (Talk) |
+| 2 | 7.33 | Fast Task Inference With Variational Intrinsic Successor Features | 8 6 8 | 0.89 | Accept (Talk) |
+| 2 | 7.33 | Learning To Plan In High Dimensions Via Neural Exploration-exploitation Trees | 8 8 6 | 0.89 | Accept (Spotlight) |
+| 3 | 7.00 | Dream To Control: Learning Behaviors By Latent Imagination | 8 6 6 8 | 1.00 | Accept (Spotlight) |
+| 4 | 6.67 | Making Efficient Use Of Demonstrations To Solve Hard Exploration Problems | 6 8 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Intrinsic Motivation For Encouraging Synergistic Behavior | 6 8 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Sqil: Imitation Learning Via Reinforcement Learning With Sparse Rewards | 8 6 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Reinforcement Learning With Competitive Ensembles Of Information-constrained Primitives | 8 6 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Multi-agent Interactions Modeling With Correlated Policies | 6 6 8 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Influence-based Multi-agent Exploration | 6 6 8 | 0.89 | Accept (Spotlight) |
+| 4 | 6.67 | Learning The Arrow Of Time For Problems In Reinforcement Learning | 6 6 8 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Amrl: Aggregated Memory For Reinforcement Learning | 6 6 8 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Model Based Reinforcement Learning For Atari | 6 8 6 | 0.89 | Accept (Spotlight) |
+| 4 | 6.67 | Variational Recurrent Models For Solving Partially Observable Control Tasks | 6 6 8 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Sample Efficient Policy Gradient Methods With Recursive Variance Reduction | 6 8 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Exploring Model-based Planning With Policy Networks | 6 8 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Reinforcement Learning Based Graph-to-sequence Model For Natural Question Generation | 6 6 8 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Ride: Rewarding Impact-driven Exploration For Procedurally-generated Environments | 6 6 8 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Learning Expensive Coordination: An Event-based Deep Rl Approach | 6 8 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Evolutionary Population Curriculum For Scaling Multi-agent Reinforcement Learning | 6 8 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Making Sense Of Reinforcement Learning And Probabilistic Inference | 6 6 8 | 0.89 | Accept (Spotlight) |
+| 4 | 6.67 | Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs | 8 6 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Never Give Up: Learning Directed Exploration Strategies | 6 6 8 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Robust Reinforcement Learning For Continuous Control With Model Misspecification | 6 6 8 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Synthesizing Programmatic Policies That Inductively Generalize | 6 8 6 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Adaptive Correlated Monte Carlo For Contextual Categorical Sequence Generation | 6 6 8 | 0.89 | Accept (Poster) |
+| 4 | 6.67 | Improving Generalization In Meta Reinforcement Learning Using Neural Objectives | 6 6 8 | 0.89 | Accept (Spotlight) |
+| 5 | 6.33 | Single Episode Transfer For Differing Environmental Dynamics In Reinforcement Learning | 3 8 8 | 5.56 | Accept (Poster) |
+| 5 | 6.33 | Decentralized Distributed Ppo: Mastering Pointgoal Navigation | 3 8 8 | 5.56 | Accept (Poster) |
+| 6 | 6.25 | Geometric Insights Into The Convergence Of Nonlinear Td Learning | 8 3 6 8 | 4.19 | Accept (Poster) |
+| 6 | 6.25 | Dynamics-aware Embeddings | 3 8 6 8 | 4.19 | Accept (Poster) |
+| 7 | 6.20 | Reanalysis Of Variance Reduced Temporal Difference Learning | 8 8 6 3 6 | 3.36 | Accept (Poster) |
+| 8 | 6.00 | Q-learning With Ucb Exploration Is Sample Efficient For Infinite-horizon Mdp | 6 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Automated Curriculum Generation Through Setter-solver Interactions | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Optimistic Exploration Even With A Pessimistic Initialisation | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Multi-agent Reinforcement Learning For Networked System Control | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | A Learning-based Iterative Method For Solving Vehicle Routing Problems | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Sharing Knowledge In Multi-task Deep Reinforcement Learning | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Rtfm: Generalising To New Environment Dynamics Via Reading | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Meta Reinforcement Learning With Autonomous Inference Of Subtask Dependencies | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Projection Based Constrained Policy Optimization | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Graph Constrained Reinforcement Learning For Natural Language Action Spaces | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | V-mpo: On-policy Maximum A Posteriori Policy Optimization For Discrete And Continuous Control | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Thinking While Moving: Deep Reinforcement Learning With Concurrent Control | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Keep Doing What Worked: Behavior Modelling Priors For Offline Reinforcement Learning | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Imitation Learning Via Off-policy Distribution Matching | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Adversarial Autoaugment | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Option Discovery Using Deep Skill Chaining | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | State-only Imitation With Transition Dynamics Mismatch | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | The Gambler’s Problem And Beyond | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Structured Object-aware Physics Prediction For Video Modeling And Planning | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Dynamical Distance Learning For Semi-supervised And Unsupervised Skill Discovery | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Exploration In Reinforcement Learning With Deep Covering Options | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Cm3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Learning To Coordinate Manipulation Skills Via Skill Behavior Diversification | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Composing Task-agnostic Policies With Deep Reinforcement Learning | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Frequency-based Search-control In Dyna | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Black-box Off-policy Estimation For Infinite-horizon Reinforcement Learning | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Action Semantics Network: Considering The Effects Of Actions In Multiagent Systems | 6 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Caql: Continuous Action Q-learning | 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Reinforced Active Learning For Image Segmentation | 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | The Variational Bandwidth Bottleneck: Stochastic Evaluation On An Information Budget | 6 6 | 0.00 | Accept (Poster) |
+| 8 | 6.00 | Hierarchical Foresight: Self-supervised Learning Of Long-horizon Tasks Via Visual Subgoal Generation | 6 6 | 0.00 | Accept (Poster) |
+| 9 | 5.75 | Maximum Likelihood Constraint Inference For Inverse Reinforcement Learning | 8 6 3 6 | 3.19 | Accept (Spotlight) |
+| 9 | 5.75 | Autoq: Automated Kernel-wise Neural Network Quantization | 6 6 8 3 | 3.19 | Accept (Poster) |
+| 9 | 5.75 | Varibad: A Very Good Method For Bayes-adaptive Deep Rl Via Meta-learning | 8 6 8 1 | 8.19 | Accept (Poster) |
+| 10 | 5.67 | Watch, Try, Learn: Meta-learning From Demonstrations And Rewards | 8 3 6 | 4.22 | Accept (Poster) |
+| 10 | 5.67 | Population-guided Parallel Policy Search For Reinforcement Learning | 6 8 3 | 4.22 | Accept (Poster) |
+| 10 | 5.67 | A Simple Randomization Technique For Generalization In Deep Reinforcement Learning | 8 3 6 | 4.22 | Accept (Poster) |
+| 10 | 5.67 | On The Weaknesses Of Reinforcement Learning For Neural Machine Translation | 8 6 3 | 4.22 | Accept (Poster) |
+| 10 | 5.67 | State Alignment-based Imitation Learning | 6 8 3 | 4.22 | Accept (Poster) |
+| 10 | 5.67 | Finding And Visualizing Weaknesses Of Deep Reinforcement Learning Agents | 8 6 3 | 4.22 | Accept (Poster) |
+| 10 | 5.67 | Model-augmented Actor-critic: Backpropagating Through Paths | 3 6 8 | 4.22 | Accept (Poster) |
+| 10 | 5.67 | Behaviour Suite For Reinforcement Learning | 8 3 6 | 4.22 | Accept (Spotlight) |
+| 10 | 5.67 | Learning Heuristics For Quantified Boolean Formulas Through Reinforcement Learning | 6 8 3 | 4.22 | Accept (Poster) |
+| 10 | 5.67 | Maxmin Q-learning: Controlling The Estimation Bias Of Q-learning | 8 6 3 | 4.22 | Accept (Poster) |
+| 10 | 5.67 | Hypermodels For Exploration | 8 3 6 | 4.22 | Accept (Poster) |
+| 11 | 5.50 | Sub-policy Adaptation For Hierarchical Reinforcement Learning | 3 8 | 6.25 | Accept (Poster) |
+| 11 | 5.50 | Svqn: Sequential Variational Soft Q-learning Networks | 3 8 | 6.25 | Accept (Poster) |
+| 12 | 5.25 | Impact: Importance Weighted Asynchronous Architectures With Clipped Target Networks | 6 3 6 6 | 1.69 | Accept (Poster) |
+| 13 | 5.00 | Ranking Policy Gradient | 6 3 6 | 2.00 | Accept (Poster) |
+| 13 | 5.00 | Model-based Reinforcement Learning For Biological Sequence Design | 6 3 6 | 2.00 | Accept (Poster) |
+| 13 | 5.00 | Learning Nearly Decomposable Value Functions Via Communication Minimization | 6 6 3 | 2.00 | Accept (Poster) |
+| 13 | 5.00 | Implementing Inductive Bias For Different Navigation Tasks Through Diverse Rnn Attractors | 3 6 6 | 2.00 | Accept (Poster) |
+| 13 | 5.00 | Toward Evaluating Robustness Of Deep Reinforcement Learning With Continuous Control | 6 3 6 | 2.00 | Accept (Poster) |
+| 13 | 5.00 | Learning Efficient Parameter Server Synchronization Policies For Distributed Sgd | 6 3 6 | 2.00 | Accept (Poster) |
+| 13 | 5.00 | Episodic Reinforcement Learning With Associative Memory | 6 3 6 | 2.00 | Accept (Poster) |
+| 14 | 4.67 | Logic And The 2-simplicial Transformer | 8 3 3 | 5.56 | Accept (Poster) |
+| 15 | 4.00 | Exploratory Not Explanatory: Counterfactual Analysis Of Saliency Maps For Deep Rl | 1 3 8 | 8.67 | Accept (Poster) |
+| 15 | 4.00 | Playing The Lottery With Rewards And Multiple Languages: Lottery Tickets In Rl And Nlp | 3 3 6 | 2.00 | Accept (Poster) |
\ No newline at end of file