Refactored Command Batch Cursors. #1198

rozza · 2023-09-13T12:57:07Z

The construction and resource management of cursors has been simplified by reducing the number of constructors used when creating the cursor. The aim of this work was to simplify future refactorings, e.g. the CSOT work.

Unfortunately, it expanded into a bigger piece of work than initially thought, as I took the opportunity to unify the general approaches between sync and async cursors (with the caveat that async cursors are self-closing).

The core top level changes are:

Added CommandCursorResult which replaces QueryResult and is used by both async and sync.

Synchronous changes:

CommandBatchCursor - Replaces QueryBatchCursor and is used for any commands that return a cursor object. This class contains all the logic to manage resources and issue get more calls.
SingleBatchCursor - Used for any commands that return a single list of results but not an actual cursor. An AsyncSingleBatchCursor already existed, so this brings the sync and async abstractions in line.
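
To illustrate the single-batch idea, here is a minimal sketch. It is not the driver's SingleBatchCursor: the class name `SingleBatchSketch` and the use of `java.util.Iterator` are assumptions made for the example. It only shows the behaviour described above, where one list of results is handed out once and an empty list means there is nothing to iterate.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch, not the driver's SingleBatchCursor.
final class SingleBatchSketch<T> implements Iterator<List<T>> {
    private final List<T> batch;
    private boolean hasNext;

    SingleBatchSketch(List<T> batch) {
        this.batch = batch;
        // An empty batch means there is nothing to return at all.
        this.hasNext = !batch.isEmpty();
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public List<T> next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        // The single batch is handed out exactly once.
        hasNext = false;
        return batch;
    }
}
```

Because there is no server-side cursor behind it, a class like this has no getMore calls to issue and no cursor to kill.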

Asynchronous changes:

AsyncCommandBatchCursor - Replaces AsyncQueryBatchCursor and is used for any commands that return a cursor object. This class contains all the logic to manage resources and issue get more calls. This was a bigger refactor, and the class now closely follows the logic of the synchronous CommandBatchCursor. The core differences are that it is self-closing and that it uses a callback loop rather than a while loop when calling getMore.
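
The difference between a while loop and a callback loop can be sketched as follows. This is an illustration only: `ResultCallback`, `FakeServerCursor` and `GetMoreLoops` are hypothetical names, and the fake cursor simply replays queued batches (some of them empty) instead of talking to a server.

```java
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// All names here are hypothetical; this is not the driver's code.
interface ResultCallback<T> {
    void onResult(T result, Throwable t);
}

// Stand-in for a server-side cursor that replays queued batches, some possibly empty.
final class FakeServerCursor {
    private final Deque<List<Integer>> batches;

    FakeServerCursor(Deque<List<Integer>> batches) {
        this.batches = batches;
    }

    boolean isExhausted() {
        return batches.isEmpty();
    }

    List<Integer> getMore() {
        List<Integer> next = batches.poll();
        return next == null ? Collections.<Integer>emptyList() : next;
    }

    void getMoreAsync(ResultCallback<List<Integer>> callback) {
        callback.onResult(getMore(), null);
    }
}

final class GetMoreLoops {
    // Sync: keep issuing getMore in a while loop until a non-empty batch arrives.
    static List<Integer> syncNext(FakeServerCursor cursor) {
        while (!cursor.isExhausted()) {
            List<Integer> batch = cursor.getMore();
            if (!batch.isEmpty()) {
                return batch;
            }
        }
        return Collections.emptyList();
    }

    // Async: the same loop, expressed by re-invoking the method from the callback.
    static void asyncNext(FakeServerCursor cursor, ResultCallback<List<Integer>> callback) {
        if (cursor.isExhausted()) {
            callback.onResult(Collections.<Integer>emptyList(), null);
            return;
        }
        cursor.getMoreAsync((batch, t) -> {
            if (t != null) {
                callback.onResult(null, t);
                return;
            }
            if (!batch.isEmpty()) {
                callback.onResult(batch, null);
                return;
            }
            asyncNext(cursor, callback); // "loop" again
        });
    }
}
```

The early returns inside the callback keep the async version readable next to the sync one, at the cost of expressing the loop as recursion.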

The tests were updated so that both async and sync cursors are tested equally (previously async had much more comprehensive testing). This process discovered some subtle sync bugs where resources wouldn't be released during the error scenarios. So the PR includes some potential memory leak fixes.
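
As an illustration of the kind of leak described here, the sketch below shows a reference-counted resource that is released on both the success and the error path. The names `CountedResourceSketch` and `ErrorPathRelease` are hypothetical, and the simulated failure stands in for a getMore that throws; the actual fixes in the PR may look different.

```java
// Hypothetical sketch of releasing a reference-counted resource on every path.
final class CountedResourceSketch {
    private int count = 1;

    void retain() {
        count++;
    }

    void release() {
        count--;
    }

    int getCount() {
        return count;
    }
}

final class ErrorPathRelease {
    static String fetch(CountedResourceSketch connection, boolean fail) {
        connection.retain();
        try {
            if (fail) {
                // Stands in for a getMore call that throws.
                throw new IllegalStateException("simulated getMore failure");
            }
            return "batch";
        } finally {
            // Runs on success and on failure, so the count cannot leak.
            connection.release();
        }
    }
}
```

Without the `finally` block, the failing call would leave the count one higher than it started, which is the shape of bug the equalised tests are described as catching.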

JAVA-5159

Previously there was a QueryResult / QueryBatchCursor abstraction. This abstraction is no longer required, as only commands are used to create cursors.

Two new classes have been added:

1. SingleBatchCursor: used when commands return a single list of results but not an actual cursor.
2. CommandBatchCursor: used for commands that return a cursor; it contains all the logic to manage resources and issue get more calls.

The construction and resource management have been simplified by reducing the number of constructors used when creating the cursor. This will simplify future refactorings.

The asynchronous cursor abstractions have been refactored to follow their synchronous counterparts more closely, reducing the cognitive cost of working on both cursor types.

JAVA-5159

rozza · 2023-09-18T09:27:58Z

driver-core/src/main/com/mongodb/internal/async/AsyncBatchCursor.java

-     * to this method will execute the callback with a null result to indicate that there are no more batches available and the cursor
-     * has been closed.
+     * Returns the next batch of results. A tailable cursor will block until another batch exists.
+     * Unlike the {@link BatchCursor} this method will automatically mark the cursor as closed when there are no more expected results.


The cursor no longer calls callback.onResult(null, null). As this class is internal and consumed internally, there is no public API breakage or behaviour change.

rozza · 2023-09-18T09:29:28Z

driver-core/src/main/com/mongodb/internal/operation/AsyncCommandBatchCursor.java

+                NO_OP_FIELD_NAME_VALIDATOR, ReadPreference.primary(),
+                CommandResultDocumentCodec.create(decoder, NEXT_BATCH), assertNotNull(resourceManager.connectionSource),
+                (commandResult, t) -> {
+                    if (t != null) {


I used multiple returns to make the code more readable and mimic the sync returns, even though we are using callbacks.

rozza · 2023-09-18T09:32:56Z

driver-core/src/main/com/mongodb/internal/operation/AsyncCommandBatchCursor.java

+import static java.util.Collections.emptyList;
+import static java.util.Collections.singletonList;
+
+class AsyncCommandBatchCursor<T> implements AsyncAggregateResponseBatchCursor<T> {


This follows the sync CommandBatchCursor approach and utilizes a ResourceManager. As such, it's very different from the previous AsyncQueryBatchCursor, which contained more branching logic.

I hope that now working on sync or async cursors in the future will be much easier on the developer.

katcharov

Most of the PR is a large but fairly straightforward refactor, and looks good. Still in the process of reviewing the CommandBatchCursor classes, but wanted to see what you thought about consolidating the code further.

katcharov · 2023-09-25T19:44:22Z

driver-core/src/test/unit/com/mongodb/internal/operation/CommandBatchCursorSpecification.groovy

+
+class CommandBatchCursorSpecification extends Specification {
+
+    def 'should generate expected command with batchSize and maxTimeMS'() {


Convert tests to Java?

Looking into it.

katcharov · 2023-09-25T21:21:43Z

driver-core/src/main/com/mongodb/internal/operation/AsyncCommandBatchCursor.java

+    private final ResourceManager resourceManager;
+    private int batchSize;
+
+    AsyncCommandBatchCursor(


Could the implementation be arranged to be the same as the sync CommandBatchCursor? I am opening the two classes side-by-side in my IDE, and the order differs (for example, maxWireVersion is set earlier here).

It seems that many of the sections in these classes are effectively identical, and might be extracted (into a superclass, possibly using generics?). For example, the fields are almost all identical, and the State enums are the same.

(This may apply to other sync-async class and method pairs.)

It could be made the same; however, there is no async Iterator API (which is why Publishers / Flows were invented and subsequently added to the JDK).

For async the API would be very heavy / require nested callbacks for use:

hasNext(SingleResultCallback<Boolean>)
next(SingleResultCallback<List<T>>)

Which can be replaced by just:

next(SingleResultCallback<List<T>>)
isClosed()

MongoDB cursors are self-describing, so they indicate when they have reached the end of their results.

The other main difference is that the while loop used for getMores works for sync, but in async it becomes a nested callback loop.

That said, there is no reason hasNext() could not be added instead of isClosed(). I'll take a look and try to increase the code reuse.

I opted to leave it as is, as the only consumer is internal (BatchCursorFlux), and it has a check in recurse cursor:

if (batchCursor.isClosed()) { sink.complete(); } else { // Fetches more results.
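
A consumer built on just next(callback) and isClosed() can be sketched roughly as below. This is not BatchCursorFlux: `BatchCallback`, `SelfClosingCursor` and `DrainSketch` are hypothetical names, the cursor replays an in-memory list of batches, and collecting into a plain list stands in for emitting to a sink.

```java
import java.util.Iterator;
import java.util.List;

// All names here are hypothetical; this is not the driver's or BatchCursorFlux's code.
interface BatchCallback<T> {
    void onResult(List<T> batch, Throwable t);
}

// A cursor that marks itself closed once it has handed out its last batch.
final class SelfClosingCursor<T> {
    private final Iterator<List<T>> batches;
    private boolean closed;

    SelfClosingCursor(List<List<T>> batches) {
        this.batches = batches.iterator();
        this.closed = !this.batches.hasNext();
    }

    void next(BatchCallback<T> callback) {
        if (closed) {
            callback.onResult(null, new IllegalStateException("Cursor has been closed"));
            return;
        }
        List<T> batch = batches.next();
        if (!batches.hasNext()) {
            closed = true; // no more expected results: close automatically
        }
        callback.onResult(batch, null);
    }

    boolean isClosed() {
        return closed;
    }
}

final class DrainSketch {
    // Recurse until the cursor reports closed, collecting every batch.
    static <T> void drain(SelfClosingCursor<T> cursor, List<T> sink) {
        if (cursor.isClosed()) {
            return; // the point where a reactive sink would be completed
        }
        cursor.next((batch, t) -> {
            if (t != null) {
                throw new RuntimeException(t);
            }
            sink.addAll(batch);
            drain(cursor, sink); // fetch more results
        });
    }
}
```

With this shape the consumer never needs a separate hasNext callback; a single isClosed() check before each next call is enough.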

rozza · 2023-09-29T15:03:37Z

Changing to draft while I investigate the build errors.

ConnectionSource and Connection also need to be retained pre kill cursors and released in the kill cursors callback
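
The retain-then-release pattern mentioned here can be sketched as follows. The names `RefCountedSketch` and `KillCursorSketch` are hypothetical, and the `Consumer<Runnable>` parameter is an assumption standing in for whatever issues the asynchronous kill cursors command and later invokes its completion callback.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical reference-counted resource, standing in for a connection or connection source.
final class RefCountedSketch {
    private final AtomicInteger count = new AtomicInteger(1);

    void retain() {
        count.incrementAndGet();
    }

    void release() {
        count.decrementAndGet();
    }

    int getCount() {
        return count.get();
    }
}

final class KillCursorSketch {
    // asyncKill is assumed to run the kill command and call the Runnable when it completes.
    static void killCursor(RefCountedSketch source, RefCountedSketch connection,
                           Consumer<Runnable> asyncKill) {
        // Retain before starting, so neither resource can be closed while the command is in flight.
        source.retain();
        connection.retain();
        asyncKill.accept(() -> {
            // Release only once the kill command has completed.
            connection.release();
            source.release();
        });
    }
}
```

If the caller releases its own references immediately after calling killCursor, the extra retain keeps both resources alive until the callback runs.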

rozza · 2023-10-04T10:20:27Z

Ready for review again @katcharov

jyemin

I tried to review but got stuck with the difficulty of comparing CommandBatchCursor to QueryBatchCursor without the benefit of a diff.

Is this refactoring going to make CSOT significantly simpler? It's not clear to me that it will, but if so, we should proceed. If not I would prefer to defer it.

jyemin · 2023-10-06T13:15:30Z

driver-core/src/main/com/mongodb/internal/operation/SingleBatchCursor.java

+        this.hasNext = !batch.isEmpty();
+    }
+
+    public List<T> getBatch() {


This method is unused, even in tests. Let's remove it.

jyemin · 2023-10-06T13:16:46Z

driver-core/src/main/com/mongodb/internal/operation/ListIndexesOperation.java

-                    return rethrowIfNotNamespaceError(e, createEmptyBatchCursor(namespace, decoder,
-                            source.getServerDescription().getAddress(), batchSize));
+                    return rethrowIfNotNamespaceError(e,
+                            SingleBatchCursor.createEmptyBatchCursor(source.getServerDescription().getAddress(), batchSize));


We generally use static imports for spots like this.

Renamed the methods and done.

jyemin · 2023-10-06T13:17:25Z

driver-core/src/main/com/mongodb/internal/operation/SingleBatchCursor.java

+
+import static java.util.Collections.emptyList;
+
+class SingleBatchCursor<T> implements BatchCursor<T> {


Let's add a unit test for this class.

jyemin · 2023-10-06T13:24:48Z

driver-core/src/main/com/mongodb/internal/operation/CommandBatchCursor.java

+import static com.mongodb.internal.operation.OperationHelper.LOGGER;
+import static java.lang.String.format;
+
+class CommandBatchCursor<T> implements AggregateResponseBatchCursor<T> {


I'm concerned that, as this appears as a new class in the diff, it's difficult to compare with QueryBatchCursor. Can you share some details about how this code was created? Was it mostly an auto-refactoring? Are our existing tests good enough to ensure that no regressions were introduced?

Unfortunately, Git's diff algorithm doesn't always get it right.

This was done in multiple parts:

A simple rename.

Updated tests, so that both async and sync have the same tests. This led to some fixes for cases where resources weren't released in sync.

A refactor to make the ResourceManager abstract and reusable across sync and async, as per previous PR feedback.

rozza · 2023-10-09T10:42:19Z

This will make CSOT simpler because it's now clear where the timeout context is needed.

It also brings the async and sync command cursors much more into line, with lots of code reuse.
It also adds a single batch cursor to sync, again bringing async / sync into line.

rozza marked this pull request as draft September 13, 2023 13:35

rozza force-pushed the JAVA-5159 branch 12 times, most recently from d3ec40e to 0da6c39 Compare September 18, 2023 09:04

rozza requested review from jyemin and a team September 18, 2023 09:25

rozza force-pushed the JAVA-5159 branch from 0da6c39 to 83065a5 Compare September 18, 2023 09:38

rozza force-pushed the JAVA-5159 branch from 83065a5 to 831bef1 Compare September 18, 2023 09:42

rozza commented Sep 18, 2023

rozza marked this pull request as ready for review September 18, 2023 09:43

katcharov requested changes Sep 25, 2023

rozza added 2 commits September 28, 2023 11:24

Use a volatile variable for CommandCursorResult instead of an atomic ref

aed0c33

Shared a CursorResourceManager between async and sync code

e19f946

JAVA-5159

rozza requested a review from katcharov September 29, 2023 13:29

Add Apache license to new file

66245ee

rozza marked this pull request as draft September 29, 2023 15:03

JAVAify the test

5ff0c8f

rozza force-pushed the JAVA-5159 branch from 604a017 to 5ff0c8f Compare October 3, 2023 10:19

Fix race condition

ade133c

rozza force-pushed the JAVA-5159 branch from 01d3f51 to ade133c Compare October 4, 2023 10:19

rozza marked this pull request as ready for review October 4, 2023 10:20

jyemin reviewed Oct 6, 2023

rozza added 2 commits October 9, 2023 11:34

Added SingleBatchCursor unit tests

e1331af

Use static method imports

dd440d1

Checkstyle fix

fd734ed

rozza closed this Oct 12, 2023

rozza deleted the JAVA-5159 branch October 12, 2023 08:26

rozza mentioned this pull request Oct 17, 2023

AsyncCommandBatchCursor now uses a ResourceManager rozza/mongo-java-driver#399

Closed

stIncMale mentioned this pull request Jul 8, 2024

Check resourceManager state first in getMoreLoop #1439

Closed
