Add support for parsing files under extracted/
#46
bccda70 to 7e79e21
@BurntSushi Sorry for the nag, but can I ask for an update on this? If this is too large, I'd be happy to separate out the portion that we actually need,
Thanks! LGTM.
Thanks so much for merging this. I really appreciate it, especially as I know things have been so busy for you. I'm glad rust-lang/rust#84056 is finally unblocked! 🎉
No problem and sorry it took so long! Incidentally, I didn't realize this was blocking work for std (although I now see you did link it in your initial comment, whoops). Is
Yep, the unicode-table-generator tool used to make the standard library's Unicode tables uses I also mentioned the std element a few times in our Zulip conversations. It's not a huge deal though, especially as it seems like the work this was blocking may not be a good idea anyway.
@inquisitivecrystal Ah interesting. I bet some of those space saving tricks would be useful for
Ug sorry. Yeah, my brain has been a pile of mush for the past couple of years. I've only recently just started coming back up for air and getting more time to devote to projects.
To elaborate a bit more on
So what I'm trying to say is that if
FWIW, the smallest tables I know of are in libunicode from QuickJS (https://bellard.org/quickjs/), which manages to fit all boolean properties, general categories, scripts, and script extensions in around 40 KB. Many of them require an unpacking step, but several can be modified to have an index (and the code for that is already in the repo). It's worth taking a look; the basic idea is just to use a chunked RLE on most of the tables. It's considerably smaller than the tables that libstd uses, but also slower, even with the index.
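To make the "chunked RLE with an index" idea concrete, here is a minimal Rust sketch. It is not libunicode's actual encoding (which packs run lengths into bytes), and every name in it (`RleSet`, `CHUNK`, `from_ranges`, `contains`) is hypothetical. It assumes a boolean property stored as alternating false/true run lengths starting at code point 0, plus a side index recording the first code point of every `CHUNK`-th run, so a lookup is one binary search followed by a scan of at most `CHUNK` runs.

```rust
/// Number of runs per index entry (an arbitrary choice for this sketch).
/// It is kept even so that every chunk starts on a "false" run.
const CHUNK: usize = 4;

/// Hypothetical run-length encoded set of code points.
pub struct RleSet {
    /// Alternating run lengths: false, true, false, true, ...
    runs: Vec<u32>,
    /// index[i] is the first code point covered by runs[i * CHUNK].
    index: Vec<u32>,
}

impl RleSet {
    /// Build from sorted, non-overlapping, non-adjacent inclusive ranges.
    pub fn from_ranges(ranges: &[(u32, u32)]) -> RleSet {
        let mut runs = Vec::new();
        let mut next = 0u32; // first code point not yet covered by a run
        for &(start, end) in ranges {
            runs.push(start - next); // gap before the range (false run)
            runs.push(end - start + 1); // the range itself (true run)
            next = end + 1;
        }
        let mut index = Vec::new();
        let mut pos = 0u32;
        for (i, &len) in runs.iter().enumerate() {
            if i % CHUNK == 0 {
                index.push(pos);
            }
            pos += len;
        }
        RleSet { runs, index }
    }

    /// Returns true if `cp` falls inside a "true" run.
    pub fn contains(&self, cp: u32) -> bool {
        // Find the last chunk whose first code point is <= cp.
        let chunk = match self.index.binary_search(&cp) {
            Ok(i) => i,
            Err(0) => return false, // only reachable for an empty table
            Err(i) => i - 1,
        };
        let mut pos = self.index[chunk];
        let first = chunk * CHUNK;
        let last = (first + CHUNK).min(self.runs.len());
        for i in first..last {
            let len = self.runs[i];
            if cp < pos + len {
                return i % 2 == 1; // odd positions are "true" runs
            }
            pos += len;
        }
        false // cp lies past the final run
    }
}
```

Dropping the index and scanning from the first run gives the smallest table and the slowest lookup; the index trades a few bytes per chunk for a bounded scan, which is the size/speed trade-off described above.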
Rust needs the ability to parse extracted/DerivedNumericValues.txt as part of rust-lang/rust#84056. This adds parsing support for that file and all the other files under extracted/.
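For readers unfamiliar with the file, the following is a rough, standalone Rust sketch of what parsing one data line of extracted/DerivedNumericValues.txt involves. It is not the code added by this PR and does not use this repository's API; the type `NumericValueLine` and the function `parse_numeric_value_line` are hypothetical names. It assumes the layout documented in the file's header: field 0 is a code point or `start..end` range in hex, field 1 is the value as a decimal, field 2 is empty, and field 3 is the value as an integer or rational, with `#` starting a trailing comment.

```rust
/// Hypothetical representation of one data line.
#[derive(Debug, PartialEq)]
pub struct NumericValueLine {
    pub start: u32,
    pub end: u32,
    pub numerator: i64,
    pub denominator: i64,
}

/// Parse a single line. Returns None for blank lines, comment-only
/// lines, and lines that do not match the assumed layout.
pub fn parse_numeric_value_line(line: &str) -> Option<NumericValueLine> {
    // Drop the trailing comment, if any.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let fields: Vec<&str> = data.split(';').map(str::trim).collect();
    if fields.len() < 4 {
        return None;
    }
    // Field 0: a single code point or an inclusive range, in hex.
    let (start, end) = match fields[0].split_once("..") {
        Some((a, b)) => (
            u32::from_str_radix(a, 16).ok()?,
            u32::from_str_radix(b, 16).ok()?,
        ),
        None => {
            let cp = u32::from_str_radix(fields[0], 16).ok()?;
            (cp, cp)
        }
    };
    // Field 3: the exact value, either "n" or "n/d".
    let (numerator, denominator) = match fields[3].split_once('/') {
        Some((n, d)) => (n.parse::<i64>().ok()?, d.parse::<i64>().ok()?),
        None => (fields[3].parse::<i64>().ok()?, 1),
    };
    Some(NumericValueLine { start, end, numerator, denominator })
}
```

The sketch reads the rational field rather than the decimal one because it round-trips exactly; a real parser would also want to report malformed lines as errors instead of silently returning None.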