Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IndexInput isLoaded #13998

Merged
merged 11 commits into from
Nov 29, 2024
Merged

Add IndexInput isLoaded #13998

merged 11 commits into from
Nov 29, 2024

Conversation

ChrisHegarty
Copy link
Contributor

This commit adds IndexInput::isLoaded to help determine if the contents of an input is resident in physical memory.

The intent of this new method is to help build inspection and diagnostic infrastructure on top. The initial requirement is to help understand if vector data and more specifically the HNSW graph are in memory. For search use cases, performance drops off significantly if, at least, the graph is not resident. This is not a perfect API, more of a hint, but along with read advice like MADV_WILLNEED may be used to determine perf issues searching vectors

@ChrisHegarty ChrisHegarty requested a review from jpountz November 15, 2024 15:07
@navneet1v
Copy link
Contributor

@ChrisHegarty this will be a very useful thing. Can we also figure out how much data is loaded with this API? So lets say an IndexInput is 30GB and only 10GB is loaded/mapped in memory can return that too?

@ChrisHegarty
Copy link
Contributor Author

@ChrisHegarty this will be a very useful thing.

Indeed.

Can we also figure out how much data is loaded with this API? So lets say an IndexInput is 30GB and only 10GB is loaded/mapped in memory can return that too?

While possible, it's not straightforward and would require some native access. For now, let's go with the basic loaded / not-loaded, since this is useful as is.

@rmuir
Copy link
Member

rmuir commented Nov 18, 2024

You would need to call mincore or something yourself. I can't remember, but the native access may already be plumbed.

for non-mmapped i/o you can do similar with syscalls such as cachestat but you need modern linux kernel for that.

@ChrisHegarty
Copy link
Contributor Author

ChrisHegarty commented Nov 18, 2024

Yeah, we can look at how to call mincore, and it might not be that much of a lift with the existing plumbing. Maybe something can look at as a follow up? I'm really trying to get to a situation where we can load (MADV_WILLNEED), and check even the HNSW graph. Maybe even mlock, as a potential follow up. Since not having the graph in memory results in horrible perf (need to get some numbers).

@rmuir
Copy link
Member

rmuir commented Nov 18, 2024

Yeah, we can look at how to call mincore, and it might not be that much of a lift with the existing plumbing. Maybe something can look at as a follow up? I'm really trying to get to a situation where we can load (MADV_WILLNEED), and check even the HNSW graph. Maybe even mlock, as a potential follow up. Since not having the graph in memory results in horrible perf (need to get some numbers).

yes, agreed about mincore as a followup. Let's use existing JDK plumbing as a start as done here.

i'm very much against using mlock, there are so many problems with this. With an out of box linux system my ulimit for this is set to 8MB. I really don't think we should be mlocking gigabytes of vectors because the access is inefficient. It would be better to improve documentation, so that users avoid the typical mistakes such as setting too-big java heap (leaving no room for buffers/cache), configure swappiness if needed, etc. mlock will just make problems worse.

@rmuir
Copy link
Member

rmuir commented Nov 18, 2024

Also for debugging these issues, you can get this information at non-java level using fincore from util-linux, which is probably on any machine:

myindexdir$ fincore --output-all *
PAGES  SIZE FILE               RES DIRTY_PAGES DIRTY WRITEBACK_PAGES WRITEBACK EVICTED_PAGES EVICTED RECENTLY_EVICTED_PAGES RECENTLY_EVICTED
...

@ChrisHegarty
Copy link
Contributor Author

yes, agreed about mincore as a followup. Let's use existing JDK plumbing as a start as done here.

++

i'm very much against using mlock, there are so many problems with this. With an out of box linux system my ulimit for this is set to 8MB. I really don't think we should be mlocking gigabytes of vectors because the access is inefficient. It would be better to improve documentation, so that users avoid the typical mistakes such as setting too-big java heap (leaving no room for buffers/cache), configure swappiness if needed, etc. mlock will just make problems worse.

Yeah, optionally being able to mlock might just cause more problems than it solves, I'll need to play a bit more with it.

For now, I mostly wanna be able to:

  1. operationally indicate to load something into memory (MADV_WILLNEED), and
  2. verify that something is loaded (this PR).

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for me. Maybe implement this API on our in-memory index inputs to return true, e.g. ByteBuffersIndexInput?

* hint because the operating system may have paged out some of the data by the time this method
* returns. If the optional is true, then it's likely that the contents of this input are resident
* in physical memory. A value of false does not imply that the contents are not resident in
* physical memory. An empty optional is returned if it is not possible to determine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like an empty optional and false mostly mean the same thing, which makes me wonder if this should return a boolean directly?

It may also be worth pointing out that this method runs in linear time with the amount of data that this IndexInput exposes (as opposed to constant-time). So it makes little sense to use it to do something like "if (isLoaded() == false) { prefetch(); }"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note about the time complexity. I'd like to keep the tri-state of the return type, at least for now. Since I think will be useful to know that the isLoaded-ness or not, is determinable or not.

@ChrisHegarty
Copy link
Contributor Author

This works for me. Maybe implement this API on our in-memory index inputs to return true, e.g. ByteBuffersIndexInput?

yeah, I think that this prob makes sense. Lemme satisfy myself that it will always be true.

@rmuir
Copy link
Member

rmuir commented Nov 22, 2024

yeah, I think that this prob makes sense. Lemme satisfy myself that it will always be true.

it won't be in core if currently swapped out, no? I don't think a hardcoded true works.

@ChrisHegarty
Copy link
Contributor Author

yeah, I think that this prob makes sense. Lemme satisfy myself that it will always be true.

it won't be in core if currently swapped out, no? I don't think a hardcoded true works.

Yeah. I was taking a little time to consider if it might be worth casting to MappedByteBuffer (for off-heap) to check for residency in physical memory. But I'm not really sure it's worth the effort, given the primary type of introspection that this API will be used - to determine if the random access nature of searching may falloff a performance cliff in production like environments!

@jpountz
Copy link
Contributor

jpountz commented Nov 22, 2024

Sorry for derailing the PR, let's not implement it on ByteBuffersIndexInput then. We can look into it in a separate PR if we want.

@ChrisHegarty ChrisHegarty merged commit 7dbbd0d into apache:main Nov 29, 2024
3 checks passed
@ChrisHegarty ChrisHegarty deleted the input_isLoaded branch November 29, 2024 10:28
ChrisHegarty added a commit that referenced this pull request Nov 29, 2024
This commit adds IndexInput::isLoaded to help determine if the contents of an input is resident in physical memory.

The intent of this new method is to help build inspection and diagnostic infrastructure on top.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants