Scalable id storage #3292

Open
davidlehn opened this issue Apr 28, 2023 · 4 comments
Labels
project General project issues.

Comments

@davidlehn
Collaborator

The current number of top-level ids is ... 1000+? That many directories are difficult to navigate and manage in the GitHub UI and on the console, so we may need to explore solutions. At some point a database would be needed, but let's not go there yet. A short-term solution would be to shard the ids into subdirectories. A top-level .htaccess file could probably be written to internally redirect each request to a subdirectory based on the first letter of the path. That would likely work and would significantly cut down the maximum number of ids per directory, but it could confuse users.
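A minimal sketch of the first-letter sharding idea, assuming each id is currently a top-level directory name and would move under a subdirectory named after its first character. `shard_path` and `max_ids_per_dir` are hypothetical helpers, not anything in this repo. The first character is lowercased here so that `Foo` and `foo` share one directory, which is only one possible answer to the case question.

```python
from pathlib import PurePosixPath


def shard_path(perma_id: str) -> PurePosixPath:
    """Map a top-level id to a hypothetical sharded location, e.g. 'foo' -> 'f/foo'."""
    if not perma_id:
        raise ValueError("id must be non-empty")
    # Assumption: shard by the first character, folded to lowercase so that
    # 'Foo' and 'foo' land in the same directory on case-insensitive file systems.
    shard = perma_id[0].lower()
    return PurePosixPath(shard) / perma_id


def max_ids_per_dir(ids: list[str]) -> int:
    """Largest number of ids that would end up in any single shard directory."""
    counts: dict[str, int] = {}
    for perma_id in ids:
        shard = shard_path(perma_id).parts[0]
        counts[shard] = counts.get(shard, 0) + 1
    return max(counts.values(), default=0)
```

For example, `shard_path("schema")` gives `s/schema`. Running `max_ids_per_dir` over the real list of ids would show how much the largest directory would shrink under this scheme.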

@davidlehn davidlehn added the project General project issues. label Apr 28, 2023
@Ostrzyciel
Contributor

I'd go with splitting the root directory by the first letter of the path, as you suggested. It's a pretty common pattern in file-based storage systems (I remember that, e.g., MediaWiki uses it for image storage), so it shouldn't be too confusing to people...

The current layout does slow down the GH UI considerably. My VS Code still holds up, but it's hard to navigate such a long list of directories.

@TallTed
Contributor

TallTed commented Sep 27, 2024

I suggest also changing the title of this issue to "toward more scalable perma-id storage", as we're not yet discussing implementing a fully scalable solution (i.e., a database), just a stop-gap (i.e., manual directory sharding).

@davidlehn
Collaborator Author

We need to consider how some file systems handle case. How confusing would it be to put all A* and a* ids in a? Sharding into both A and a would cause trouble on case-insensitive file systems. But the A* people will no doubt be confused by having to use a. Not sure what a less confusing pattern would be.
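One way to check whether a set of shard names would clash on a case-insensitive file system (the default on Windows, for example) is to group the names by their case-folded form. This is a sketch; `case_collisions` is a hypothetical helper, and the names passed to it would be whatever shard directories a given scheme produces.

```python
from collections import defaultdict


def case_collisions(names: list[str]) -> dict[str, list[str]]:
    """Group directory names that differ only by case.

    On a case-insensitive file system each group would have to share a single
    directory, so any group with more than one member signals trouble.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        # casefold() is Python's Unicode-aware case-insensitive comparison key.
        groups[name.casefold()].append(name)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}
```

Sharding by the raw first character of `Alpha`, `apple` and `beta` gives the names `A`, `a` and `b`, and `case_collisions(["A", "a", "b"])` reports `{"a": ["A", "a"]}`, which is exactly the A/a conflict described above.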

I think it would be fun to use hex sharding! Would directories like 0x41 and 0x61 cause trouble? :-)
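A sketch of what hex sharding could look like, assuming the shard name is the first byte of the id's UTF-8 encoding written as lowercase hex. `hex_shard` is a hypothetical name. Because the digits are always emitted in lowercase, two shard names never differ only by case, so `0x41` and `0x61` can coexist even on a case-insensitive file system. The trade-off is that users would need to know character codes to find their id.

```python
def hex_shard(perma_id: str) -> str:
    """Hypothetical hex shard name: the first UTF-8 byte of the id, e.g. 'A' -> '0x41'."""
    if not perma_id:
        raise ValueError("id must be non-empty")
    first_byte = perma_id.encode("utf-8")[0]
    # Always emit lowercase hex digits so two shards never differ only by case.
    return f"0x{first_byte:02x}"
```

For example, `hex_shard("Apple")` is `0x41` and `hex_shard("apple")` is `0x61`.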

@Ostrzyciel
Contributor

A and a are the same on Windows, so I guess that's out of the question. You could name the directories like aA, bB, etc. It's all a matter of taste, I guess...
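A sketch of the `aA`, `bB` naming idea, assuming ids start with ASCII letters or with characters that have no case, such as digits. `paired_case_shard` is a hypothetical helper. Both `A*` and `a*` ids map to one directory whose name mentions both cases, so nobody has to guess which case was chosen.

```python
def paired_case_shard(perma_id: str) -> str:
    """Hypothetical shard name that pairs both cases of a letter, e.g. 'Apple' -> 'aA'."""
    if not perma_id:
        raise ValueError("id must be non-empty")
    first = perma_id[0]
    # ASCII letters get a two-character name so 'A*' and 'a*' ids share one
    # directory whose name shows both cases.
    if first.isascii() and first.isalpha():
        return first.lower() + first.upper()
    # Digits, '-', etc. have no case and are used as-is.
    return first
```

For example, `Apple` and `apple` both go to `aA`, while `3d` goes to `3`.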
