Update index.html
Joe-Vincent authored May 10, 2024
1 parent a5af94d commit 05ac53d
Showing 1 changed file with 27 additions and 5 deletions.
32 changes: 27 additions & 5 deletions docs/index.html
@@ -115,7 +115,8 @@ <h1 class="title is-1 publication-title">How Generalizable Is My Behavior Clonin
type="video/mp4">
</video>
<h2 class="subtitle has-text-centered">
Aliquam vitae elit ullamcorper tellus egestas pellentesque. Ut lacus tellus, maximus vel lectus at, placerat pretium mi. Maecenas dignissim tincidunt vestibulum. Sed consequat hendrerit nisl ut maximus.
When using a small number of policy rollouts to evaluate robot performance, it is important to quantify our uncertainty in the performance estimate.
In our paper we show how to place worst-case confidence bounds on the distribution of robot performance while using the observed performance from policy rollouts as efficiently as possible.
</h2>
</div>
</div>
@@ -140,18 +141,39 @@ <h2 class="title is-3">Abstract</h2>
<!-- End paper abstract -->


<!-- Sim experiments -->
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Evaluation in Simulation</h2>
<div class="content has-text-justified">
<p>
We obtain upper confidence bounds on the cumulative distribution function (CDF) of the total reward obtained by diffusion policies in out-of-distribution robosuite environments.
An upper confidence bound on the CDF can be interpreted as the worst-case distribution of reward that is consistent with the observed policy rollouts.
Here we show representative policy rollouts for the Square environment, and plot the in-distribution CDF of reward and our upper confidence bound constructed from 40 out-of-distribution policy rollouts.
The confidence bounds we obtain quantify our uncertainty in the performance of the robot in a concrete and interpretable manner.
</p>
</div>
</div>
</div>
</div>
</section>
<!-- End sim experiments -->




<!-- Hardware experiments -->
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Hardware Evaluation</h2>
<h2 class="title is-3">Evaluation in Hardware</h2>
<div class="content has-text-justified">
<p>
We obtain lower confidence bounds on the success rate of a diffusion policy tested in two out-of-distribution environments.
The confidence bounds we obtain make the most efficient use of the 50 samples used to estimate the performance of the robot.
The confidence bounds we obtain make the most efficient use of the 50 policy rollouts used to estimate the performance of the robot.
The confidence bounds we obtain quantify our uncertainty in the performance of the robot in a concrete and interpretable manner.
</p>
</div>
@@ -177,7 +199,7 @@ <h2 class="title is-3">Comparing Policies</h2>
Here we apply our statistical bounds to the recent results from the <a href="https://arxiv.org/abs/2307.15818" target="_blank">RT-2 paper</a>, where the authors compare their RT-2 policy to a VC-1 policy in three settings designed to test emergent capabilities in symbol understanding, reasoning, and human recognition.
For each setting we find the 95% confidence intervals for policy success rate are disjoint, and we conclude with 95% confidence that RT-2 outperforms VC-1.
</p>
<img src="static/images/policy_comparison.png" alt="Confidence intervals for policy success rates">
<img src="static/images/policy_comparison.png" alt="Confidence intervals for policy success rates" width="75%">
</div>
</div>
</div>
@@ -193,7 +215,7 @@ <h2 class="title is-3">Comparing Policies</h2>


<!-- Youtube video -->
<section class="hero is-small is-light">
<section class="hero is-small">
<div class="hero-body">
<div class="container">
<!-- Paper video. -->