Enhance the preview function to get more comprehensive summary #250

Closed
2 of 4 tasks
leeeizhang opened this issue Oct 16, 2024 · 2 comments · Fixed by #253

@leeeizhang
Collaborator

leeeizhang commented Oct 16, 2024

@leeeizhang leeeizhang changed the title from "Enhance the directory preview" to "Enhance the preview function to get more comprehensive summary" Oct 16, 2024
@leeeizhang leeeizhang self-assigned this Oct 16, 2024
@dosubot dosubot bot added the enhancement New feature or request label Oct 16, 2024
@huangyz0918
Contributor

In my priority list, previewing the directory is the most important thing. Suppose a directory contains a large number of pictures (or videos, audio files, etc.). We cannot simply return every picture's path, although we do need some of them. How do we present a clear file structure/pattern to LLMs while keeping the RAG information at an acceptable size (one that does not scale with the number of files)?

@leeeizhang
Collaborator Author

> In my priority list, previewing the directory is the most important thing. Suppose a directory contains a large number of pictures (or videos, audio files, etc.). We cannot simply return every picture's path, although we do need some of them. How do we present a clear file structure/pattern to LLMs while keeping the RAG information at an acceptable size (one that does not scale with the number of files)?

Yes, let's keep it simple:

  1. Returning all file paths is not feasible: Dataset files are usually named according to a fixed pattern, so we do not need to list all of them. Instead, showing a few examples (3 to 5 files) per directory is enough.
  2. Handling directory recursion: We should limit the recursion depth for a dataset path. For directories deeper than the limit, we only show their count.

An example:

dataset/ (2 directories, 3 files)
├── train/ (89 directories)
│   ├── idx-0001/ (1203 files)
│   │   ├── 0001-01.png
│   │   ├── 0001-02.png
│   │   ├── 0001-03.png
│   │   └── ...
│   ├── idx-0002/ (1499 files)
│   │   ├── 0002-01.png
│   │   ├── 0002-02.png
│   │   ├── 0002-03.png
│   │   └── ...
│   └── idx-0003/ (1098 files)
│       ├── 0003-01.png
│       ├── 0003-02.png
│       ├── 0003-03.png
│       └── ...
├── eval/ (89 directories)
│   ├── idx-0001/ (1203 files)
│   │   ├── 0001-01.png
│   │   ├── 0001-02.png
│   │   ├── 0001-03.png
│   │   └── ...
│   ├── idx-0002/ (1499 files)
│   │   ├── 0002-01.png
│   │   ├── 0002-02.png
│   │   ├── 0002-03.png
│   │   └── ...
│   └── idx-0003/ (1098 files)
│       ├── 0003-01.png
│       ├── 0003-02.png
│       ├── 0003-03.png
│       └── ...
├── raw.zip (32,029,129 bytes)
├── README.md (1203 lines)
└── checksum.json (1203 lines)
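
The two rules and the layout above could be sketched roughly as follows. This is only an illustration, not the project's implementation: `preview_dir`, `max_depth`, and `max_entries` are hypothetical names, the defaults are arbitrary, and the per-file annotations in the example (byte and line counts) are left out. Unlike the example, it also appends a `...` entry when subdirectories are truncated, not only files.

```python
import os


def _list_dir(path):
    """Return sorted (subdirectory names, file names) for one directory."""
    dirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            target = dirs if entry.is_dir(follow_symlinks=False) else files
            target.append(entry.name)
    return sorted(dirs), sorted(files)


def _count_label(dirs, files):
    """Build a summary such as '2 directories, 3 files'."""
    parts = []
    if dirs:
        n = len(dirs)
        parts.append(f"{n} {'directory' if n == 1 else 'directories'}")
    if files:
        n = len(files)
        parts.append(f"{n} {'file' if n == 1 else 'files'}")
    return ", ".join(parts) if parts else "empty"


def _walk(path, dirs, files, prefix, depth, max_depth, max_entries, lines):
    # Rule 1: show only a few sample names per directory.
    shown = [(name, True) for name in dirs[:max_entries]]
    shown += [(name, False) for name in files[:max_entries]]
    if len(dirs) > max_entries or len(files) > max_entries:
        shown.append(("...", False))

    for i, (name, is_dir) in enumerate(shown):
        last = i == len(shown) - 1
        branch = "└── " if last else "├── "
        if not is_dir:
            lines.append(f"{prefix}{branch}{name}")
            continue
        sub = os.path.join(path, name)
        sub_dirs, sub_files = _list_dir(sub)
        lines.append(f"{prefix}{branch}{name}/ ({_count_label(sub_dirs, sub_files)})")
        # Rule 2: below max_depth, keep the counts but do not expand further.
        if depth < max_depth:
            child_prefix = prefix + ("    " if last else "│   ")
            _walk(sub, sub_dirs, sub_files, child_prefix,
                  depth + 1, max_depth, max_entries, lines)


def preview_dir(root, max_depth=2, max_entries=3):
    """Return a tree-style preview of `root` as a single string."""
    root = os.path.abspath(root)
    dirs, files = _list_dir(root)
    lines = [f"{os.path.basename(root)}/ ({_count_label(dirs, files)})"]
    _walk(root, dirs, files, "", 1, max_depth, max_entries, lines)
    return "\n".join(lines)
```

Calling `print(preview_dir("dataset", max_depth=3))` on a layout like the one above would expand `dataset/`, `train/`, and each `idx-*` directory, while anything nested deeper would appear only as a name with its counts. Because the output size is bounded by `max_depth` and `max_entries` rather than by the number of files, the preview stays small enough to pass to an LLM.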

@huangyz0918 huangyz0918 linked a pull request Oct 21, 2024 that will close this issue
4 tasks

2 participants