Merge pull request #38 from tsaishien-chen/main
Fix stuck and meta_data not assigned bugs
AliaksandrSiarohin authored Apr 1, 2024
2 parents 10dd549 + 8f70db4 commit 6ec1ca4
Showing 6 changed files with 19 additions and 6 deletions.
6 changes: 5 additions & 1 deletion dataset_dataloading/README.md
@@ -34,7 +34,7 @@ video2dataset --url_list="<csv_file>" \
  --clip_col="timestamp" \
  --output_folder="<output_folder>" \
  --save_additional_columns="[matching_score]" \
- --config="video2dataset/video2dataset/configs/panda_70M.yaml"
+ --config="video2dataset/video2dataset/configs/panda70m.yaml"
```
### Known Issues
<table class="center">
@@ -62,6 +62,10 @@ video2dataset --url_list="<csv_file>" \
  <td width=50% style="border: none; text-align: center">In the json file:<pre>"status": "failed_to_download" & "error_message":<br>"[Errno 2] No such file or directory: '/tmp/...'"</pre></td>
  <td width=50% style="border: none; text-align: center">The YouTube video has been set to private or removed. Please skip this sample.</td>
  </tr>
+ <tr style="line-height: 0">
+ <td width=50% style="border: none; text-align: center"><pre>YouTube: Skipping player responses from android clients<br>(got player responses for video ... instead of ...)</pre></td>
+ <td width=50% style="border: none; text-align: center">The latest version of yt-dlp will solve this issue. Please refer <a href="https://github.com/yt-dlp/yt-dlp/issues/9554">this issue</a> for more details.</td>
+ </tr>
  </table>

  ### Dataset Format
1 change: 1 addition & 0 deletions dataset_dataloading/video2dataset/requirements.txt
@@ -25,3 +25,4 @@ Pillow
  accelerate
  bitsandbytes
  scipy
+ portalocker
1 change: 1 addition & 0 deletions dataset_dataloading/video2dataset/setup.py
@@ -35,6 +35,7 @@ def _read_reqs(relpath):
                  "video2dataset/configs/default.yaml",
                  "video2dataset/configs/downsample_ml.yaml",
                  "video2dataset/configs/optical_flow.yaml",
+                 "video2dataset/configs/panda70m.yaml",
              ],
          )
      ],
13 changes: 10 additions & 3 deletions dataset_dataloading/video2dataset/video2dataset/data_reader.py
@@ -6,6 +6,7 @@
  import io
  import webvtt
  import ffmpeg
+ import portalocker


  def video2audio(video, audio_format, tmp_dir):
@@ -264,8 +265,14 @@ def __call__(self, row):

          streams = {}
          for modality, modality_path in modality_paths.items():
-             with open(modality_path, "rb") as modality_file:
-                 streams[modality] = modality_file.read()
-             os.remove(modality_path)
+             try:
+                 with portalocker.Lock(modality_path, 'rb', timeout=180) as locked_file:
+                     streams[modality] = locked_file.read()
+                 os.remove(modality_path)
+             except portalocker.exceptions.LockException:
+                 print(f"Timeout occurred trying to lock the file: {modality_path}")
+                 os.remove(modality_path)
+             except IOError as e:
+                 print(f"Failed to delete the file: {modality_path}. Error: {e}")

          return key, streams, meta_dict, error_message
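
The locking change in the hunk above can be illustrated without the `portalocker` dependency. What follows is a minimal sketch, not the repository's code: it swaps in the standard-library `fcntl.flock` (POSIX only) for `portalocker.Lock`, and the names `read_then_remove`, `timeout`, and `poll` are hypothetical, chosen for illustration. It shows the same idea of waiting a bounded time for the lock, reading the file, and removing it whether or not the lock was obtained.

```python
import fcntl
import os
import tempfile
import time


def read_then_remove(path, timeout=5.0, poll=0.1):
    """Read `path` under a shared lock, then delete it.

    Returns the file's bytes, or None if the lock could not be
    acquired within `timeout` seconds. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout
    data = None
    try:
        with open(path, "rb") as f:
            while True:
                try:
                    # Non-blocking shared lock: raises BlockingIOError while
                    # another open file description holds an exclusive lock.
                    fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
                    break
                except BlockingIOError:
                    if time.monotonic() >= deadline:
                        raise TimeoutError(f"Timeout trying to lock: {path}")
                    time.sleep(poll)
            data = f.read()
            # The lock is released when the file is closed on leaving `with`.
    except TimeoutError as exc:
        print(exc)
    finally:
        # Clean up the temporary file on success and on timeout alike.
        if os.path.exists(path):
            os.remove(path)
    return data


if __name__ == "__main__":
    fd, tmp_path = tempfile.mkstemp()
    os.write(fd, b"video bytes")
    os.close(fd)
    print(read_then_remove(tmp_path))
```

Bounding the wait is the point of the design: an unbounded blocking read on a file that another process never releases would stall the worker, which appears to be the "stuck" behaviour the commit message refers to.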
Original file line number Diff line number Diff line change
@@ -154,7 +154,7 @@ def __call__(self, streams, metadata):
              e_p = e

          segment_times = ",".join([str(spl) for spl in splits])
-         streams_clips = {}
+         streams_clips, metadata_clips = {}, []

          for k in streams.keys():
              stream_bytes = streams[k][0] # pre-broadcast so only one
@@ -234,4 +234,4 @@ def __call__(self, streams, metadata):

              streams_clips[k] = stream_clips

-         return streams_clips, metadata_clips, None
+         return streams_clips, metadata_clips, None
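
The up-front initialisation of `metadata_clips` in the hunk above guards against a common Python pitfall. The sketch below is a toy reproduction under an assumption the diff does not show: that `metadata_clips` was previously first assigned inside the `for` loop. The functions `clip_unsafe` and `clip_safe` are hypothetical stand-ins, not the subsampler's real logic.

```python
def clip_unsafe(streams):
    """Assigns metadata_clips only inside the loop (assumed pre-fix shape)."""
    streams_clips = {}
    for name, data in streams.items():
        metadata_clips = [{"stream": name}]
        streams_clips[name] = data
    # If `streams` is empty the loop body never runs, so this name is unbound.
    return streams_clips, metadata_clips, None


def clip_safe(streams):
    """Initialises both results before the loop, as the hunk does."""
    streams_clips, metadata_clips = {}, []
    for name, data in streams.items():
        metadata_clips.append({"stream": name})
        streams_clips[name] = data
    return streams_clips, metadata_clips, None


if __name__ == "__main__":
    try:
        clip_unsafe({})
    except UnboundLocalError as exc:
        print(f"unsafe version failed: {exc}")
    print(clip_safe({}))
```

With the initialisation moved ahead of the loop, the function returns an empty dict and an empty list for an input with no streams instead of raising.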
