-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Backport] Integrate the Incubating Panama Vector API #12311 #12327
Conversation
Leverage accelerated vector hardware instructions in Vector Search. Lucene already has a mechanism that enables the use of non-final JDK APIs, currently used for the Previewing Pamana Foreign API. This change expands this mechanism to include the Incubating Pamana Vector API. When the jdk.incubator.vector module is present at run time the Panamaized version of the low-level primitives used by Vector Search is enabled. If not present, the default scalar version of these low-level primitives is used (as it was previously). Currently, we're only targeting support for JDK 20. A subsequent PR should evaluate JDK 21. --------- Co-authored-by: Uwe Schindler <[email protected]> Co-authored-by: Robert Muir <[email protected]>
There are a couple of warnings showing up in the precommit checks. Not sure why I only see these here, but they should be relatively straightforward to fix (and will be required in main too)
|
Well there are two different types of warnings:
|
I wonder why you got this error in MemorySegmentIndexInput? This is a problem which was already solved log ago. After Java 11 a new method for longs was added which is not available in Java 11. |
Hmm.... I used 9x_branch. Did I use the wrong branch or something silly ? OR maybe I need to re-sync this 9x_branch. Looking ... |
I know the problem with MMaap. The Objects.checkIndex(long, long) was added in Java 16, so not available to Java 11 compiler. The API extract also needs to extract |
Here is the problem: https://github.com/apache/lucene/pull/12327/files#diff-1ff94b709f930416054d6a370251ac7376f9a4ec170b662b9ba12983f35ff1f1L51 The Glob pattern must be extended in the new extractor, then it should work. |
I don't know the problem with jdk.internal.VectorSupport. Why is that needed for compilation? Maybe we need another hack for the Java 11 compiler? |
There is definitely some internals "leaking" into the jdk20 api jar, because of this:
|
I will check this, you can easily do this by adding a System.out() in the loop that expands the "references" and adds them to classesToLoad. I am on it. It looks like some public signature or return value is indirectly referencing the internal class (maybe through superclass). In JDK 17 it exists already, but not in 11. So the extraction was incomplete. I will feedback soo. |
Here we see the issue with a println, it looks like VectorSupport is a superclass/interface of some of the Vector classes:
|
I think we should forward port this internal class special case to main branch, too. It is just working because JDK17 has this internal class ready. In total: It is not a good API design to make public classes refer to internal classes (yes it works, but it is a bit strange). |
Looks like we have to add |
It is in java.base, I will try something out. |
OK, I think I will open a PR on the main branch to fix the Extractor to:
So it needs a bit of refactoring, but to make porting easier, I will do it in separate PR on main branch. Let's keep this one open and later merge the main branch changes in. |
Another simpler approach is: As methods/constants/... from those module-private APIs are not visible at runtime, I tend to simply replace the superclass by "java/lang/Object" and remove all jdk/internal interfaces. Looks much better to me and cleans API JAR small. |
++ I was just thinking/trying same. |
This seems to work: .../generation/extract-jdk-apis/ExtractJdkApis.java | 11 +++++++++++
lucene/core/src/generated/jdk/jdk20.apijar | Bin 52851 -> 52874 bytes
2 files changed, 11 insertions(+)
diff --git a/gradle/generation/extract-jdk-apis/ExtractJdkApis.java b/gradle/generation/extract-jdk-apis/ExtractJdkApis.java
index be8741db366..9df612e89e4 100644
--- a/gradle/generation/extract-jdk-apis/ExtractJdkApis.java
+++ b/gradle/generation/extract-jdk-apis/ExtractJdkApis.java
@@ -101,6 +101,7 @@ public final class ExtractJdkApis {
// recursively add all superclasses / interfaces of visible classes to classesToInclude:
for (Set<String> a = classesToInclude; !a.isEmpty();) {
a = a.stream().map(references::get).filter(Objects::nonNull).flatMap(Arrays::stream).collect(Collectors.toSet());
+ System.out.println(a);
classesToInclude.addAll(a);
}
// remove all non-visible or not referenced classes:
@@ -122,6 +123,9 @@ public final class ExtractJdkApis {
static class Cleaner extends ClassVisitor {
private static final String PREVIEW_ANN = "jdk/internal/javac/PreviewFeature";
private static final String PREVIEW_ANN_DESCR = Type.getObjectType(PREVIEW_ANN).getDescriptor();
+
+ private static final String REPLACE_INTERNAL_PREFIX = "jdk/internal/";
+ private static final String OBJECT_CLASS = "java/lang/Object";
private final Set<String> classesToInclude;
private final Map<String, String[]> references;
@@ -134,7 +138,14 @@ public final class ExtractJdkApis {
@Override
public void visit(int version, int access, String name, String signature, String superName, String[] interfaces) {
+ // remove references to internal JDK classes:
+ if (superName.startsWith(REPLACE_INTERNAL_PREFIX)) {
+ superName = OBJECT_CLASS;
+ }
+ interfaces = Arrays.stream(interfaces).filter(c -> !c.startsWith(REPLACE_INTERNAL_PREFIX)).toArray(String[]::new);
+ // call classwriter
super.visit(Opcodes.V11, access, name, signature, superName, interfaces);
+ // update references to other classes (so we can figure out what is needed to be packaged in JAR):
if (isVisible(access)) {
classesToInclude.add(name);
}
diff --git a/lucene/core/src/generated/jdk/jdk20.apijar b/lucene/core/src/generated/jdk/jdk20.apijar
index de737b7eb75..45bde0c5ca2 100644
Binary files a/lucene/core/src/generated/jdk/jdk20.apijar and b/lucene/core/src/generated/jdk/jdk20.apijar differ |
OK, see #12329. I just check how to report the remaining "seen" classes not packaged. In ideal world it should only contain public JDK classes, but we should report them to make debugging easier in future. |
Just for reference, this is the problematic dependency: |
Yes, I looked at the source code, too. I have no idea why it was done like that, looks strange to me. It only works because the public members of that class are not visible to the outside, although inherited (which is another crazy thing in the module system). |
Thanks, please wait with merging as I don't want to have all the useless classes in the JAR (@rmuir is very picky about that). And we need to fix it for main, too, because there could be incompatible signatures in later java versions. |
Don't worry, I'll not merge without your explicit LGTM. Take your time, I just wanted to get the tests running, etc. |
…ces during extraction (apache#12329)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine. I also tested it locally:
- Regenerating the apijars worked without problem
- Compilation worked
…dc8ca633e8bcf`) (#20) * Add next minor version 9.7.0 * Fix SynonymQuery equals implementation (apache#12260) The term member of TermAndBoost used to be a Term instance and became a BytesRef with apache#11941, which means its equals impl won't take the field name into account. The SynonymQuery equals impl needs to be updated accordingly to take the field into account as well, otherwise synonym queries with same term and boost across different fields are equal which is a bug. * Fix MMapDirectory documentation for Java 20 (apache#12265) * Don't generate stacktrace in CollectionTerminatedException (apache#12270) CollectionTerminatedException is always caught and never exposed to users so there's no point in filling in a stack-trace for it. * add missing changelog entry for apache#12260 * Add missing author to changelog entry for apache#12220 * Make query timeout members final in ExitableDirectoryReader (apache#12274) There's a couple of places in the Exitable wrapper classes where queryTimeout is set within the constructor and never modified. This commit makes such members final. * Update javadocs for QueryTimeout (apache#12272) QueryTimeout was introduced together with ExitableDirectoryReader but is now also optionally set to the IndexSearcher to wrap the bulk scorer with a TimeLimitingBulkScorer. Its javadocs needs updating. * Make TimeExceededException members final (apache#12271) TimeExceededException has three members that are set within its constructor and never modified. They can be made final. * DOAP changes for release 9.6.0 * Add back-compat indices for 9.6.0 * `ToParentBlockJoinQuery` Explain Support Score Mode (apache#12245) (apache#12283) * `ToParentBlockJoinQuery` Explain Support Score Mode --------- Co-authored-by: Marcus <[email protected]> * Simplify SliceExecutor and QueueSizeBasedExecutor (apache#12285) The only behaviour that QueueSizeBasedExecutor overrides from SliceExecutor is when to execute on the caller thread. There is no need to override the whole invokeAll method for that. Instead, this commit introduces a shouldExecuteOnCallerThread method that can be overridden. * [Backport] GITHUB-11838 Add api to allow concurrent query rewrite (apache#12197) * GITHUB-11838 Change API to allow concurrent query rewrite (apache#11840) Replace Query#rewrite(IndexReader) with Query#rewrite(IndexSearcher) Co-authored-by: Patrick Zhai <[email protected]> Co-authored-by: Adrien Grand <[email protected]> Backport of apache#11840 Changes from original: - Query keeps `rewrite(IndexReader)`, but it is now deprecated - VirtualMethod is used to correct delegate to the overridden methods - The changes to `RewriteMethod` type classes are reverted, this increased the backwards compatibility impact. ------------------------------ ### Description Issue: apache#11838 #### Updated Proposal * Change signature of rewrite to `rewrite(IndexSearcher)` * How did I migrate the usage: * Use Intellij to do preliminary refactoring for me * For test usage, use searcher whenever is available, otherwise create one using `newSearcher(reader)` * For very few non-test classes which doesn't have IndexSearcher available but called rewrite, create a searcher using `new IndexSearcher(reader)`, tried my best to avoid creating it recurrently (Especially in `FieldQuery`) * For queries who have implemented the rewrite and uses some part of reader's functionality, use shortcut method when possible, otherwise pull out the reader from indexSearcher. * Backport: Concurrent rewrite for KnnVectorQuery (apache#12160) (apache#12288) * Concurrent rewrite for KnnVectorQuery (apache#12160) - Reduce overhead of non-concurrent search by preserving original execution - Improve readability by factoring into separate functions --------- Co-authored-by: Kaival Parikh <[email protected]> * adjusting for backport --------- Co-authored-by: Kaival Parikh <[email protected]> Co-authored-by: Kaival Parikh <[email protected]> * toposort use iterator to avoid stackoverflow (apache#12286) Co-authored-by: tangdonghai <[email protected]> # Conflicts: # lucene/CHANGES.txt * Fix test to compile with Java 11 after backport of apache#12286 * Update Javadoc for topoSortStates method after apache#12286 (apache#12292) * Optimize HNSW diversity calculation (apache#12235) * Minor cleanup and improvements to DaciukMihovAutomatonBuilder (apache#12305) * GITHUB-12291: Skip blank lines from stopwords list. (apache#12299) * Wrap Query rewrite backwards layer with AccessController (apache#12308) * Make sure APIJAR reproduces with different timezone (unfortunately java encodes the date using local timezone) (apache#12315) * Add multi-thread searchability to OnHeapHnswGraph (apache#12257) * Fix backport error * [MINOR] Update javadoc in Query class (apache#12233) - add a few missing full stops - update wording in the description of Query#equals method * [Backport] Integrate the Incubating Panama Vector API apache#12311 (apache#12327) Leverage accelerated vector hardware instructions in Vector Search. Lucene already has a mechanism that enables the use of non-final JDK APIs, currently used for the Previewing Pamana Foreign API. This change expands this mechanism to include the Incubating Pamana Vector API. When the jdk.incubator.vector module is present at run time the Panamaized version of the low-level primitives used by Vector Search is enabled. If not present, the default scalar version of these low-level primitives is used (as it was previously). Currently, we're only targeting support for JDK 20. A subsequent PR should evaluate JDK 21. --------- Co-authored-by: Uwe Schindler <[email protected]> Co-authored-by: Robert Muir <[email protected]> * Parallelize knn query rewrite across slices rather than segments (apache#12325) The concurrent query rewrite for knn vectory query introduced with apache#12160 requests one thread per segment to the executor. To align this with the IndexSearcher parallel behaviour, we should rather parallelize across slices. Also, we can reuse the same slice executor instance that the index searcher already holds, in that way we are using a QueueSizeBasedExecutor when a thread pool executor is provided. * Optimize ConjunctionDISI.createConjunction (apache#12328) This method is showing up as a little hot when profiling some queries. Almost all the time spent in this method is just burnt on ceremony around stream indirections that don't inline. Moving this to iterators, simplifying the check for same doc id and also saving one iteration (for the min cost) makes this method far cheaper and easier to read. * Update changes to be correct with ARM (it is called NEON there) * GH#12321: Marked DaciukMihovAutomatonBuilder as deprecated (apache#12332) Preparing to reduce visibility of this class in a future release * add BitSet.clear() (apache#12268) # Conflicts: # lucene/CHANGES.txt * Clenaup and update changes and synchronize with 9.x * Update TestVectorUtilProviders.java (apache#12338) * Don't generate stacktrace for TimeExceededException (apache#12335) The exception is package private and never rethrown, we can avoid generating a stacktrace for it. * Introduced the Word2VecSynonymFilter (apache#12169) Co-authored-by: Alessandro Benedetti <[email protected]> * Word2VecSynonymFilter constructor null check (apache#12169) * Use thread-safe search version of HnswGraphSearcher (apache#12246) Addressing comment received in the PR apache#12246 * Word2VecSynonymProvider to use standard Integer max value for hnsw searches (apache#12235) We observed this change was not ported previously from main in an old cherry-pick * Fix searchafter high latency when after value is out of range for segment (apache#12334) * Make memory fence in `ByteBufferGuard` explicit (apache#12290) * Add "direct to binary" option for DaciukMihovAutomatonBuilder and use it in TermInSetQuery#visit (apache#12320) * Add updateDocuments API which accept a query (reopen) (apache#12346) * GITHUB#11350: Handle backward compatibility when merging segments with different FieldInfo This commits restores Lucene 9's ability to handle indices created with Lucene 8 where there are discrepancies in FieldInfos, such as different IndexOptions * [Tessellator] Improve the checks that validate the diagonal between two polygon nodes (apache#12353) # Conflicts: # lucene/CHANGES.txt * feat: soft delete optimize (apache#12339) * Better paging when random reads go backwards (apache#12357) When reading data from outside the buffer, BufferedIndexInput always resets its buffer to start at the new read position. If we are reading backwards (for example, using an OffHeapFSTStore for a terms dictionary) then this can have the effect of re-reading the same data over and over again. This commit changes BufferedIndexInput to use paging when reading backwards, so that if we ask for a byte that is before the current buffer, we read a block of data of bufferSize that ends at the previous buffer start. Fixes apache#12356 * Work around SecurityManager issues during initialization of vector api (JDK-8309727) (apache#12362) * Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recursion depth (apache#12249) * Implement MMapDirectory with Java 21 Project Panama Preview API (apache#12294) Backport incl JDK21 apijar file with java.util.Objects regenerated * remove relic in apijar folder caused by vector additions * Speed up IndexedDISI Sparse #AdvanceExactWithinBlock for tiny step advance (apache#12324) * Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors (apache#12281) --------- Co-authored-by: Uwe Schindler <[email protected]> * Implement VectorUtilProvider with Java 21 Project Panama Vector API (apache#12363) (apache#12365) This commit enables the Panama Vector API for Java 21. The version of VectorUtilPanamaProvider for Java 21 is identical to that of Java 20. As such, there is no specific 21 version - the Java 20 version will be loaded from the MRJAR. * Add CHANGES.txt for apache#12334 Honor after value for skipping documents even if queue is not full for PagingFieldCollector (apache#12368) Signed-off-by: gashutos <[email protected]> * Move TermAndBoost back to its original location. (apache#12366) PR apache#12169 accidentally moved the `TermAndBoost` class to a different location, which would break custom sub-classes of `QueryBuilder`. This commit moves it back to its original location. * GITHUB-12252: Add function queries for computing similarity scores between knn vectors (apache#12253) Co-authored-by: Alessandro Benedetti <[email protected]> * hunspell (minor): reduce allocations when processing compound rules (apache#12316) (cherry picked from commit a454388) * hunspell (minor): reduce allocations when reading the dictionary's morphological data (apache#12323) there can be many entries with morph data, so we'd better avoid compiling and matching regexes and even stream allocation (cherry picked from commit 4bf1b94) * TestHunspell: reduce the flakiness probability (apache#12351) * TestHunspell: reduce the flakiness probability We need to check how the timeout interacts with custom exception-throwing checkCanceled. The default timeout seems not enough for some CI agents, so let's increase it. Co-authored-by: Dawid Weiss <[email protected]> (cherry picked from commit 5b63a18) * This allows VectorUtilProvider tests to be executed although hardware may not fully support vectorization or if C2 is not enabled (apache#12376) --------- Signed-off-by: gashutos <[email protected]> Co-authored-by: Alan Woodward <[email protected]> Co-authored-by: Luca Cavanna <[email protected]> Co-authored-by: Uwe Schindler <[email protected]> Co-authored-by: Armin Braun <[email protected]> Co-authored-by: Mikhail Khludnev <[email protected]> Co-authored-by: Marcus <[email protected]> Co-authored-by: Benjamin Trent <[email protected]> Co-authored-by: Kaival Parikh <[email protected]> Co-authored-by: Kaival Parikh <[email protected]> Co-authored-by: tang donghai <[email protected]> Co-authored-by: Patrick Zhai <[email protected]> Co-authored-by: Greg Miller <[email protected]> Co-authored-by: Jerry Chin <[email protected]> Co-authored-by: Patrick Zhai <[email protected]> Co-authored-by: Andrey Bozhko <[email protected]> Co-authored-by: Chris Hegarty <[email protected]> Co-authored-by: Robert Muir <[email protected]> Co-authored-by: Jonathan Ellis <[email protected]> Co-authored-by: Daniele Antuzi <[email protected]> Co-authored-by: Alessandro Benedetti <[email protected]> Co-authored-by: Chaitanya Gohel <[email protected]> Co-authored-by: Petr Portnov | PROgrm_JARvis <[email protected]> Co-authored-by: Tomas Eduardo Fernandez Lobbe <[email protected]> Co-authored-by: Ignacio Vera <[email protected]> Co-authored-by: fudongying <[email protected]> Co-authored-by: Chris Fournier <[email protected]> Co-authored-by: gf2121 <[email protected]> Co-authored-by: Adrien Grand <[email protected]> Co-authored-by: Elia Porciani <[email protected]> Co-authored-by: Peter Gromov <[email protected]>
Backport of #12311
Leverage accelerated vector hardware instructions in Vector Search.
Lucene already has a mechanism that enables the use of non-final JDK APIs, currently used for the Previewing Pamana Foreign API. This change expands this mechanism to include the Incubating Pamana Vector API. When the jdk.incubator.vector module is present at run time the Panamaized version of the low-level primitives used by Vector Search is enabled. If not present, the default scalar version of these low-level primitives is used (as it was previously).
Currently, we're only targeting support for JDK 20. A subsequent PR should evaluate JDK 21, which is still in development.