Index is apparently limited to 4 GB #351
Oh yikes, I'm sorry you're running into that. It does look like this would involve supporting / moving to a 64-bit-based index file. That's not work we have slated, but I think a PR would be appreciated. We're actively running it on what I thought was a large repository, but it looks like the repo itself is only about 8 gigs.
This seems to be an important issue.
@rfan-debug it's likely something we'll have to do at some point. Are you interested in tackling it?
I think fixing it is not difficult. However, I am not sure how to test it reliably if I change any code. It seems that we don't have sufficient integration tests.
I think that's part of what makes this issue tricky. If you're willing to write unit or integration tests, I'd definitely welcome that as well.
I think unit tests are sufficient for the current change. I skimmed over the codesearch code, and I found that the root cause of the 4GB limit is the data type used for index offsets. Now I think a good way to build up the integration test set is:
I gave it a shot because I also thought that it would be straightforward, but it's more difficult than expected. The biggest hurdle is that the index size is tightly bound to the maximum size of an array/slice, so a 64-bit-sized index couldn't directly be mapped to a slice. I think a better approach would be to support different backend implementations for the index type. E.g. I could imagine that an implementation with an SQLite or bbolt backend would be quite easy and would automatically support very large index files.
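The pluggable-backend idea above could be sketched roughly like this (all names here are hypothetical, not Hound's API): reads go through an interface keyed by a 64-bit offset, so an in-memory or mmap'd slice is just one implementation among others, and a SQLite/bbolt-backed one could serve indexes larger than a slice can hold.

```go
package main

import "fmt"

// IndexBackend abstracts index storage behind 64-bit offsets,
// instead of indexing directly into a single []byte.
type IndexBackend interface {
	// ReadAt returns n bytes starting at a 64-bit offset.
	ReadAt(off int64, n int) ([]byte, error)
	Size() int64
}

// memBackend keeps the whole index in one slice. This is the
// implementation that hits the array/slice size limit; a SQLite or
// bbolt backend would satisfy the same interface without it.
type memBackend struct{ data []byte }

func (m *memBackend) ReadAt(off int64, n int) ([]byte, error) {
	if off < 0 || off+int64(n) > int64(len(m.data)) {
		return nil, fmt.Errorf("read out of range: off=%d n=%d", off, n)
	}
	return m.data[off : off+int64(n)], nil
}

func (m *memBackend) Size() int64 { return int64(len(m.data)) }

func main() {
	var b IndexBackend = &memBackend{data: []byte("hello index")}
	chunk, err := b.ReadAt(6, 5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", chunk) // prints "index"
}
```

The point of the interface is that callers never see a slice, so swapping in a backend with no 4 GiB (or 2 GiB `int`) ceiling requires no changes at the call sites.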
👋 Hound developers!
I am trying to index a pretty large repo (144GB - all current sources of openSUSE), and unsurprisingly the index turns out to be larger than 4GB, thus I hit this fatal message:
hound/codesearch/index/write.go, line 561 in e3b1b43
Would it be possible / how hard would it be to support larger indexes?
I only had a brief look at read.go, and it seems to me that 32 bit offsets are part of the index file format, so changing that would require re-indexing/converting/supporting two file formats, is that correct?

Thanks for all your efforts on Hound!