snap-research · AliaksandrSiarohin · Jun 26, 2024 · Jun 5, 2024
diff --git a/README.md b/README.md
@@ -15,11 +15,11 @@ This is the offical Github repository of Panda-70M.
 [Ming-Hsuan Yang](https://faculty.ucmerced.edu/mhyang/),
 [Sergey Tulyakov](http://www.stulyakov.com/)
 </br>
-*Computer Vision and Pattern Recognition 2024*
+*Computer Vision and Pattern Recognition (CVPR) 2024*
 
-<!-- [Arxiv Report](https://arxiv.org/abs/2307.04725) | [Project Page](https://snap-research.github.io/Panda-70M) -->
 [![arXiv](https://img.shields.io/badge/arXiv-2402.19479-b31b1b.svg)](https://arxiv.org/abs/2402.19479)
 [![Project Page](https://img.shields.io/badge/Project-Website-green)](https://snap-research.github.io/Panda-70M)
+[![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/m2NQ5k1oTcs)
 
 ## Introduction
 Panda-70M is a large-scale dataset with 70M high-quality video-caption pairs.
@@ -86,7 +86,7 @@ More details can be found in [Dataset Dataloading](./dataset_dataloading) sectio
     </tr>
   </table>
 
-<sup>**We will remove the video samples from our dataset / Github / project webpage as long as you need it. Please contact tsaishienchen at gmail dot com for the request.</sup>
+<sup>**We will remove the video samples from our dataset / Github / project webpage / technical presentation as long as you need it. Please contact tsaishienchen at gmail dot com for the request.</sup>
 
 Please check [here](https://snap-research.github.io/Panda-70M/more_samples) for more samples.
 

diff --git a/captioning/README.md b/captioning/README.md
@@ -1,6 +1,6 @@
 # 🐼 Panda-70M: Video Captioning
 
-**[Note] To use our captioning code, please make sure you follow [this guideline](https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md#how-to-apply-delta-weights-only-needed-for-weights-v0) and correctly prepare vicuna-7b-v0 weight. Basically, you need to first download the original weights and then apply delta weights. Improper weights preparation will lead to meaningless outputs.**
+**[Note] To run the captioning code, please make sure you follow [this guideline](https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md#how-to-apply-delta-weights-only-needed-for-weights-v0) and correctly prepare vicuna-7b-v0 weight. You need to first download the original weights and then apply delta weights. Improper weights preparation will lead to meaningless outputs.**
 
 ## Introduction
 We propose a video captioning model to generate a caption for a short video clip.
@@ -61,7 +61,7 @@ Please look at the video and faithfully summarize it in one sentence.</sup></td>
     </tr>
 </table>
 
-<sup>**We will remove the video samples from our dataset / Github / project webpage as long as you need it. Please contact tsaishienchen at gmail dot com for the request.</sup>
+<sup>**We will remove the video samples from our dataset / Github / project webpage / technical presentation as long as you need it. Please contact tsaishienchen at gmail dot com for the request.</sup>
 
 - **[Note]** You might get different outputs due to the randomness of LLM's generation.
 

diff --git a/captioning/assets/architecture.png b/captioning/assets/architecture.png
diff --git a/dataset_dataloading/README.md b/dataset_dataloading/README.md
@@ -1,7 +1,7 @@
 # 🐼 Panda-70M: Dataset Dataloading
 The section includes the csv files listing the data samples in Panda-70M and the code to download the videos.
 
-**[Note] Please use the video2dataset tool from this repository to download the dataset, as the video2dataset from [the official repository](https://github.com/iejMac/video2dataset) cannot work with our csv format for now.**
+**[Note] Please use the video2dataset tool from this repository to download the dataset, as the video2dataset from [the official repository](https://github.com/iejMac/video2dataset) cannot work with our csv format.**
 
 ## Data Splitting and Download Link
   | Split           | Download | # Source Videos | # Samples | Video Duration | Storage Space |

diff --git a/docs/assets/wordcloud.svg b/docs/assets/wordcloud.svg
diff --git a/docs/assets/worldcloud.svg b/docs/assets/worldcloud.svg
diff --git a/docs/html_pages/resources/stylesheet.css b/docs/html_pages/resources/stylesheet.css
@@ -186,4 +186,19 @@ div.scroll-container {
  .table-container {
    width: 100%;
  }
+}
+
+.youtube-container {
+  position: relative;
+  width: 100%;
+  padding-bottom: 56.25%; /* 16:9 aspect ratio */
+  height: 0;
+  overflow: hidden;
+}
+.youtube-container iframe {
+  position: absolute;
+  top: 0;
+  left: 0;
+  width: 100%;
+  height: 100%;
 }
diff --git a/docs/index.html b/docs/index.html
@@ -27,6 +27,7 @@
 		</nav>
 		<nav>
 			<a href="#download">Download</a>
+			<a href="#presentation">Presentation</a>
 			<a href="#collection">Collection</a>
 			<a href="#demo">Demo</a>
 			<a href="#statistic">Statistic</a>
@@ -167,7 +168,7 @@ <h5 class="pt-1" style="font-size: 2rem; font-weight: normal">A Large-Scale Data
 			</div>
 		</div>
 		<div class="container text-center footnote">
-			We will remove the video samples from our dataset / Github / project webpage as long as you need it. Please contact tsaishienchen at gmail dot com for the request.
+			We will remove the video samples from our dataset / Github / project webpage / technical presentation as long as you need it. Please contact tsaishienchen at gmail dot com for the request.
 		</div>
 	</div>
 
@@ -208,6 +209,16 @@ <h1 class="jumbotron-heading">Download Panda-70M</h1>
 
 	<hr class="mt-5">
 
+	<section id="presentation">
+		<div class="container text-center" style="margin-top: 10px">
+			<div class="youtube-container">
+				<iframe src="https://www.youtube.com/embed/m2NQ5k1oTcs?si=jCc8gruNWA_oXNyP&autoplay=0&mute=0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
+			</div>
+		</div>
+	</section>
+
+	<hr class="mt-5">
+
 	<section id="collection">
 		<div class="container text-center">
 			<h1 class="jumbotron-heading">Collection Pipeline of Panda-70M</h1>
@@ -268,7 +279,7 @@ <h1 class="jumbotron-heading">Demo of Long Video Annotation</h1>
 				</video>
 			</div>
 			<div class="container text-center footnote">
-				We will remove the video samples from our dataset / Github / project webpage as long as you need it. Please contact tsaishienchen at gmail dot com for the request.
+				We will remove the video samples from our dataset / Github / project webpage / technical presentation as long as you need it. Please contact tsaishienchen at gmail dot com for the request.
 			</div>
 		</div>
 	</section>
@@ -283,7 +294,7 @@ <h1 class="jumbotron-heading">Statistic</h1>
 					<img src="./assets/statistic.svg" style="margin-top: -20px;">
 				</div>
 				<div class="image-item">
-					<img src="./assets/worldcloud.svg" style="margin-top: 10px; margin-bottom: 10px; width: 90%">
+					<img src="./assets/wordcloud.svg" style="margin-top: 10px; margin-bottom: 10px; width: 90%">
 				</div>
 			</div>
 		</div>
@@ -360,7 +371,7 @@ <h1 class="jumbotron-heading text-center">Acknowledgement</h1>
 				imageItems.forEach(item => {
 					const elementPosition = item.getBoundingClientRect().top;
 
-					if (elementPosition < window.innerHeight * 0.7) {
+					if (elementPosition < window.innerHeight * 0.85) {
 						item.style.opacity = '1';
 					} else {
 						item.style.opacity = '0';

diff --git a/docs/more_samples.html b/docs/more_samples.html
@@ -30,7 +30,7 @@
 			<a href="#Scenery">Scenery</a>
 			<a href="#Food">Food</a>
 			<a href="#Sports_Activity">Sports Activity</a>
-			<a href="#Vehicles">Vehicles</a>
+			<a href="#Vehicle">Vehicle</a>
 			<a href="#Tutorial_and_Narrative">Tutorial and Narrative</a>
 			<a href="#News_and_TV_Shows">News and TV Shows</a>
 			<a href="#Gaming_and_3D_Rendering">Gaming and 3D Rendering</a>
@@ -48,7 +48,7 @@ <h5 class="pt-1" style="font-size: 2rem; font-weight: normal">A Large-Scale Data
 				<a class="paper-btn" style="width: 130px" href="#Scenery">Scenery</a>
 				<a class="paper-btn" style="width: 130px" href="#Food">Food</a>
 				<a class="paper-btn" style="width: 130px" href="#Sports_Activity">Sports Activity</a> 
-				<a class="paper-btn" style="width: 130px" href="#Vehicles">Vehicles</a>
+				<a class="paper-btn" style="width: 130px" href="#Vehicle">Vehicle</a>
 			</div>
 			<div class="paper-btn-parent">
 				<a class="paper-btn" style="width: 228px" href="#Tutorial_and_Narrative">Tutorial and Narrative</a>
@@ -462,7 +462,7 @@ <h2 class="pt-4"><p class="text-center" id="Sports_Activity">Sports Activity</p>
 
 <hr class="mt-5">
 
-<h2 class="pt-4"><p class="text-center" id="Vehicles">Vehicles</p></h2>
+<h2 class="pt-4"><p class="text-center" id="Vehicle">Vehicle</p></h2>
 <th style="text-align: center; vertical-align: top; padding: 10px;">
 	<video playsinline autoplay loop muted src="./assets/samples/489O6JiJ8Qk.14.mp4" style="width: 100%" type="video/mp4"></video>
 	<p class="responsive-text" style="font-family: Chalkduster; font-size: 16px; color: white">"A remote control monster truck is driving on rough terrain."</p>

diff --git a/splitting/README.md b/splitting/README.md
@@ -61,7 +61,7 @@ The code will split the videos listed in the `video_list.txt` and output the vid
     </tr>
 </table>
 
-<sup>**We will remove the video samples from our dataset / Github / project webpage as long as you need it. Please contact tsaishienchen at gmail dot com for the request.</sup>
+<sup>**We will remove the video samples from our dataset / Github / project webpage / technical presentation as long as you need it. Please contact tsaishienchen at gmail dot com for the request.</sup>
 
 ## Acknowledgements
 The code for video splitting is built upon [PySceneDetect](https://github.com/Breakthrough/PySceneDetect) and [ImageBind](https://github.com/facebookresearch/ImageBind).