Enhance the preview function to get more comprehensive summary #250

leeeizhang · 2024-10-16T13:26:54Z

csv preview ([MRG] update the preview_csv_data function #249)
json preview
yaml preview
directory preview ([MRG] enhance list_files function #252)


huangyz0918 · 2024-10-18T00:58:05Z

In my priority list, previewing the directory is the most important thing. Assume there is a dir containing a large number of pictures (or videos, audio files, etc.). We cannot simply return every picture's path, but we do need some of them. How do we present a clear file structure/pattern to LLMs while keeping the RAG information at an acceptable size (one that does not scale with the number of files)?

leeeizhang · 2024-10-18T16:32:13Z


Yes, let us make it easy:

Returning all file paths is not feasible: most dataset files are named according to a fixed pattern, so we do not need to list all of them. Instead, showing a few examples (3 to 5 files) per directory is enough.
How to handle directory recursion: we should limit the recursion depth for a dataset path. For directories deeper than that limit, we only show how many entries they contain.

An example:

dataset/ (2 directories, 3 files)
├── train/ (89 directories)
│   ├── idx-0001/ (1203 files)
│   │   ├── 0001-01.png
│   │   ├── 0001-02.png
│   │   ├── 0001-03.png
│   │   └── ...
│   ├── idx-0002/ (1499 files)
│   │   ├── 0002-01.png
│   │   ├── 0002-02.png
│   │   ├── 0002-03.png
│   │   └── ...
│   └── idx-0003/ (1098 files)
│       ├── 0003-01.png
│       ├── 0003-02.png
│       ├── 0003-03.png
│       └── ...
├── eval/ (89 directories)
│   ├── idx-0001/ (1203 files)
│   │   ├── 0001-01.png
│   │   ├── 0001-02.png
│   │   ├── 0001-03.png
│   │   └── ...
│   ├── idx-0002/ (1499 files)
│   │   ├── 0002-01.png
│   │   ├── 0002-02.png
│   │   ├── 0002-03.png
│   │   └── ...
│   └── idx-0003/ (1098 files)
│       ├── 0003-01.png
│       ├── 0003-02.png
│       ├── 0003-03.png
│       └── ...
├── raw.zip (32,029,129 bytes)
├── README.md (1203 lines)
└── checksum.json (1203 lines)
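
The two rules above (sample a few entries per directory, stop expanding past a depth limit) could be sketched roughly as follows. This is only an illustration, not the actual list_files implementation: the names preview_dir, max_depth and max_items are hypothetical, the same sample limit is applied to sub-directories as well as files (an assumption based on the example showing 3 of 89 directories), and the byte/line counts shown for files in the example are left out.

```python
import os


def _count(n, singular, plural):
    return f"{n} {singular if n == 1 else plural}"


def _summarize(path):
    """Return sorted names of the immediate sub-directories and files."""
    dirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            target = dirs if entry.is_dir(follow_symlinks=False) else files
            target.append(entry.name)
    return sorted(dirs), sorted(files)


def _label(dirs, files):
    parts = []
    if dirs:
        parts.append(_count(len(dirs), "directory", "directories"))
    if files:
        parts.append(_count(len(files), "file", "files"))
    return ", ".join(parts) or "empty"


def _walk(path, dirs, files, prefix, depth, max_depth, max_items, lines):
    # Show at most max_items sub-directories and max_items files.
    shown = [(d, True) for d in dirs[:max_items]]
    shown += [(f, False) for f in files[:max_items]]
    if len(dirs) > max_items or len(files) > max_items:
        shown.append(("...", False))  # marks entries that were left out
    for i, (name, is_dir) in enumerate(shown):
        last = i == len(shown) - 1
        branch = "└── " if last else "├── "
        if not is_dir:
            lines.append(prefix + branch + name)
            continue
        child = os.path.join(path, name)
        sub_dirs, sub_files = _summarize(child)
        lines.append(f"{prefix}{branch}{name}/ ({_label(sub_dirs, sub_files)})")
        # Past max_depth, only the counts above are shown.
        if depth < max_depth:
            child_prefix = prefix + ("    " if last else "│   ")
            _walk(child, sub_dirs, sub_files, child_prefix,
                  depth + 1, max_depth, max_items, lines)


def preview_dir(path, max_depth=3, max_items=3):
    """Render a truncated directory tree as a string."""
    dirs, files = _summarize(path)
    name = os.path.basename(os.path.normpath(path))
    lines = [f"{name}/ ({_label(dirs, files)})"]
    _walk(path, dirs, files, "", 1, max_depth, max_items, lines)
    return "\n".join(lines)
```

Calling preview_dir("dataset") on a layout like the one above would print the root line, up to three sub-directories and three files per level, and only the counts for anything deeper than max_depth, so the output size depends on the limits rather than on the number of files.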

leeeizhang changed the title ~~Enhance the directory preview~~ Enhance the preview function to get more comprehensive summary Oct 16, 2024

leeeizhang self-assigned this Oct 16, 2024

dosubot bot added the enhancement New feature or request label Oct 16, 2024

huangyz0918 linked a pull request Oct 21, 2024 that will close this issue

[MRG] Improve data operation functions #253

Merged

4 tasks

huangyz0918 closed this as completed in #253 Oct 21, 2024
