From 9d905b4a93065357e6b53993ec37862e6afc5852 Mon Sep 17 00:00:00 2001
From: Joe Vincent
Date: Fri, 10 May 2024 08:43:07 -0700
Subject: [PATCH] Update index.html

---
 docs/index.html | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/index.html b/docs/index.html
index d68208e0..fbaca988 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -199,7 +199,9 @@

Comparing Policies

Here we apply our statistical bounds to the recent results from the RT-2 paper, where the authors compare their RT-2 policy to a VC-1 policy in three settings designed to test emergent capabilities in symbol understanding, reasoning, and human recognition. In each setting, the 95% confidence intervals for the two policies' success rates are disjoint, so we conclude with 95% confidence that RT-2 outperforms VC-1.
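
As a rough illustration of the disjoint-interval comparison described above, the sketch below computes a two-sided confidence interval for each policy's success rate and checks whether the two intervals overlap. It uses the Wilson score interval as a stand-in, which is an assumption and not necessarily the bound used on this page. The function names and the success counts are hypothetical and are not taken from the RT-2 paper.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Two-sided Wilson score interval for a binomial success rate.

    Assumption: Wilson is used here only as a familiar stand-in; the
    bounds on this page may be computed differently.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    # Standard normal quantile for the requested two-sided confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


def intervals_disjoint(a, b):
    """True when the closed intervals a and b share no point."""
    return a[1] < b[0] or b[1] < a[0]


# Hypothetical evaluation counts, NOT results from the RT-2 paper.
policy_a = wilson_interval(successes=45, trials=50)
policy_b = wilson_interval(successes=10, trials=50)

print("policy A interval:", policy_a)
print("policy B interval:", policy_b)
print("disjoint:", intervals_disjoint(policy_a, policy_b))
```

If the intervals are disjoint, the policy with the higher interval can be declared better at the stated confidence level. If they overlap, this check alone is inconclusive, and more trials or a direct test on the difference in success rates would be needed.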

+Confidence intervals for policy success rates
+