[improve][pip] PIP-302 Introduce refreshAsync API for TableView #21271
When using the TableView component, we sometimes need to retrieve the most up-to-date value for a given key. To do this, we can provide an API that waits until all existing data has been read before the value for that key is accessed.
### Modification
Introduce a new API method called `readAllExistingMessages()`.
This reverts commit 2f280d2.
@liangyepianzhou Please add the following content to your PR description and select a checkbox:
IIUC, users have to run view.refreshAsync().thenApply(__ -> view.get(key)) each time. If so, why not call readAllExistingMessages each time before calling methods like get? Then users only need to run view.get(key) instead.
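The reviewer's alternative, refreshing implicitly before every read, can be sketched as below. This is a minimal, self-contained sketch, not Pulsar code: `InMemoryTableView` and `RefreshingReads` are hypothetical names, the "topic" is an in-memory list, and unlike a real TableView this stand-in has no background tail reader, so its values change only when `refreshAsync()` runs. `refreshAsync()` records the log size at invocation as a checkpoint and applies every message up to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a TableView backed by an in-memory log.
// No background reader: values change only when refreshAsync() runs.
class InMemoryTableView {
    record Msg(String key, String value) {}

    private final List<Msg> log = new ArrayList<>();           // stands in for the topic
    private final Map<String, String> view = new ConcurrentHashMap<>();
    private int applied = 0;                                    // log entries applied so far

    synchronized void publish(String key, String value) {
        log.add(new Msg(key, value));
    }

    // Apply every message up to the log size seen at invocation (the checkpoint).
    synchronized CompletableFuture<Void> refreshAsync() {
        int checkpoint = log.size();
        for (; applied < checkpoint; applied++) {
            Msg m = log.get(applied);
            view.put(m.key(), m.value());
        }
        return CompletableFuture.completedFuture(null);
    }

    String get(String key) {
        return view.get(key);
    }
}

// The reviewer's alternative: refresh implicitly before every read.
class RefreshingReads {
    static CompletableFuture<String> getLatestAsync(InMemoryTableView view, String key) {
        return view.refreshAsync().thenApply(__ -> view.get(key));
    }
}
```

The trade-off the comment raises: wrapping every `get` this way costs one refresh per read, whereas an explicit `refreshAsync()` lets a caller refresh once and then read many keys.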
I see @merlimat's suggestion here and your response: #21166 (comment)
With this proposal, we still cannot guarantee that the "latest" value is retrieved. As you mentioned, this proposal only guarantees that the value is the latest as of the checkpoint when
I disagree. Assume we added two options:
This pattern is similar to the POSIX
I think the main concern is that
However, I think the

view.refreshAsync().thenAccept(__ -> {
    // Now, the TableView is the latest snapshot of the compacted topic
    if (view.containsKey(key)) { // T1
        process(view.get(key)); // T2
    }
});

Assume at
The root cause is that

/**
 * Get the latest snapshot for a set of keys at the current time point.
 *
 * @param keys the keys to look up
 * @return a future of the map that represents the snapshot of the table view. The keys of the map are a subset of
 * the `keys` parameter, and each value is guaranteed to be the latest value before the future is completed.
 */
CompletableFuture<Map<String, T>> getLatestSnapshotAsync(Set<String> keys);

/**
 * Get the latest snapshot at the current time point.
 *
 * @return a future of the map that represents the snapshot of the table view.
 */
CompletableFuture<Map<String, T>> getLatestSnapshotAsync();
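The T1/T2 example above is a check-then-act pattern: `containsKey` and `get` are two separate reads of a map that the background reader keeps updating. The hypothetical sketch below (plain Java, not Pulsar code; `CheckThenAct` and the `between` callback are illustrative names) makes the interleaving deterministic by running a `between` callback, standing in for a background update, after the check. Reading the value once avoids the problem for a single key; it does not give a consistent view across several keys, which is what the proposed `getLatestSnapshotAsync` targets.

```java
import java.util.Map;

class CheckThenAct {
    // Two separate reads (T1, T2); `between` simulates the background
    // reader applying an update after the check and before the read.
    static String checkThenGet(Map<String, String> view, String key, Runnable between) {
        if (view.containsKey(key)) {  // T1: key is present
            between.run();             // e.g. the key is removed here
            return view.get(key);      // T2: may now return null
        }
        return null;
    }

    // One read, then act on the local copy: later updates cannot change it.
    static String singleGet(Map<String, String> view, String key, Runnable between) {
        String value = view.get(key); // single read
        between.run();                 // later updates do not affect `value`
        return value;
    }
}
```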
Here's a point to note: because messages may be written continuously, we can't guarantee getting the latest value. If we provide users with the
The current use case is that, when reads and writes for a key do not occur at the same time, we can refresh the TableView before the next read to get the latest value of this key. As for the
It provides a consistent view. For example, let's assume there are two keys, x and y, that satisfy

With refreshAsync:

view.refreshAsync().thenAccept(__ -> { // now, x = 3 and y = 6
    int x = view.get("x"); // x = 3
    int y = view.get("y"); // y = 14; actually, at this moment, view.get("x") is 7
    // If users want to process x and y here, they might find the difference unexpectedly large
});

With getLatestSnapshotAsync:

view.getLatestSnapshotAsync().thenAccept(snapshot -> {
    int x = snapshot.get("x"); // x = 3, view.get("x") = 3, view.get("y") = 6
    int y = snapshot.get("y"); // y = 6, even if view.get("y") is now 14
});
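One way the proposed `getLatestSnapshotAsync` could be layered on a refresh is sketched below, under stated assumptions: `SnapshotSketch` is a hypothetical class, the refresh is passed in as a `Supplier<CompletableFuture<Void>>`, and `apply` stands in for the background reader applying one message. The copy is taken under the same lock that `apply` uses, so x and y come from one point in time; copying key by key from a map that is being updated concurrently would not give that guarantee.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class SnapshotSketch {
    private final Map<String, Integer> live = new HashMap<>();
    private final Supplier<CompletableFuture<Void>> refresh; // assumed refresh operation

    SnapshotSketch(Supplier<CompletableFuture<Void>> refresh) {
        this.refresh = refresh;
    }

    // Stand-in for the background reader applying one message (null = delete).
    synchronized void apply(String key, Integer value) {
        if (value == null) {
            live.remove(key);
        } else {
            live.put(key, value);
        }
    }

    synchronized Integer get(String key) {
        return live.get(key);
    }

    // Copy the requested keys atomically with respect to apply().
    private synchronized Map<String, Integer> copyKeys(Set<String> keys) {
        Map<String, Integer> copy = new HashMap<>();
        for (String k : keys) {
            Integer v = live.get(k);
            if (v != null) {
                copy.put(k, v); // keys absent from the view are omitted
            }
        }
        return Map.copyOf(copy); // immutable: later apply() calls cannot change it
    }

    private synchronized Map<String, Integer> copyAll() {
        return Map.copyOf(live);
    }

    CompletableFuture<Map<String, Integer>> getLatestSnapshotAsync(Set<String> keys) {
        return refresh.get().thenApply(__ -> copyKeys(keys));
    }

    CompletableFuture<Map<String, Integer>> getLatestSnapshotAsync() {
        return refresh.get().thenApply(__ -> copyAll());
    }
}
```

With x=3 and y=6 applied, a snapshot keeps those values even after the live view moves on to x=7 and y=14, mirroring the example in the comment.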
This is because a system topic is shared by all topics in the namespace. If we used a Reader, every topic might need to read the system topic from beginning to end when it is loaded.
pip/pip-302.md
```java
@Override
public CompletableFuture<Void> refreshAsync() {
    return reader.thenCompose(this::readAllExistingMessages);
}
```
There is a readTailMessages loop in the background. readAllExistingMessages could also call readNextAsync. Is there any possible race?
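One way to sidestep the race raised here is to keep the background tail loop as the only reader: `refreshAsync()` records the end of the topic at invocation as a checkpoint and returns a future that the tail loop completes once it has applied everything up to that checkpoint. This follows the checkpoint semantics described in the PR, but the code is a hypothetical single-process sketch, not the PR's implementation: `CheckpointRefresh` is an illustrative name, the topic is an in-memory list, `tailStep()` is one iteration of an imagined tail loop, and list offsets stand in for message IDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class CheckpointRefresh {
    record Msg(String key, String value) {}
    private record Waiter(long checkpoint, CompletableFuture<Void> future) {}

    private final List<Msg> log = new ArrayList<>();        // stands in for the topic
    private final Map<String, String> view = new HashMap<>();
    private final List<Waiter> waiters = new ArrayList<>();
    private long applied = 0;                               // messages applied so far

    synchronized void publish(String key, String value) {
        log.add(new Msg(key, value));
    }

    // Record the end of the log as the checkpoint; never read here.
    synchronized CompletableFuture<Void> refreshAsync() {
        long checkpoint = log.size();
        if (applied >= checkpoint) {
            return CompletableFuture.completedFuture(null);
        }
        CompletableFuture<Void> future = new CompletableFuture<>();
        waiters.add(new Waiter(checkpoint, future));
        return future;
    }

    // One iteration of the tail loop, the only place messages are read.
    // Returns false when there is nothing left to apply.
    boolean tailStep() {
        List<CompletableFuture<Void>> ready = new ArrayList<>();
        synchronized (this) {
            if (applied >= log.size()) {
                return false;
            }
            Msg m = log.get((int) applied++);
            view.put(m.key(), m.value());
            waiters.removeIf(w -> {
                if (w.checkpoint() <= applied) {
                    ready.add(w.future());
                    return true;
                }
                return false;
            });
        }
        ready.forEach(f -> f.complete(null)); // complete outside the lock
        return true;
    }

    synchronized String get(String key) {
        return view.get(key);
    }
}
```

Because `refreshAsync()` never reads messages itself, the two paths cannot consume the same message; the cost is that refresh latency depends on how far behind the tail loop is.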
This is a matter of specific implementation. We can optimize this during the implementation process.
I think we only need to call hasMessageAvailableAsync until the future returns false.
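The suggested loop, calling `hasMessageAvailableAsync` until it completes with `false` and applying each message read in between, can be sketched as follows. `SimpleReader` is a hypothetical, simplified stand-in whose method names mirror Pulsar's `Reader#hasMessageAvailableAsync()` and `Reader#readNextAsync()`; the `Msg` type, the in-memory reader, and `ReadToEnd` are assumptions made so the sketch runs on its own.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class ReadToEnd {
    record Msg(String key, String value) {}

    // Simplified stand-in for a reader; method names mirror Pulsar's Reader.
    interface SimpleReader {
        CompletableFuture<Boolean> hasMessageAvailableAsync();
        CompletableFuture<Msg> readNextAsync();
    }

    // In-memory reader over a fixed list of messages.
    static SimpleReader over(List<Msg> msgs) {
        Deque<Msg> pending = new ArrayDeque<>(msgs);
        return new SimpleReader() {
            public CompletableFuture<Boolean> hasMessageAvailableAsync() {
                return CompletableFuture.completedFuture(!pending.isEmpty());
            }

            public CompletableFuture<Msg> readNextAsync() {
                return CompletableFuture.completedFuture(pending.poll());
            }
        };
    }

    // Keep reading until hasMessageAvailableAsync() completes with false.
    static CompletableFuture<Void> readUntilCaughtUp(SimpleReader reader, Map<String, String> view) {
        return reader.hasMessageAvailableAsync().thenCompose(available -> {
            if (!available) {
                return CompletableFuture.<Void>completedFuture(null);
            }
            return reader.readNextAsync().thenCompose(msg -> {
                view.put(msg.key(), msg.value()); // later values overwrite earlier ones
                return readUntilCaughtUp(reader, view);
            });
        });
    }
}
```

If producers keep writing faster than this loop reads, `hasMessageAvailableAsync` may keep returning `true`, so the loop is only guaranteed to finish once writes pause; bounding it by a checkpoint taken at invocation avoids that.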
We should not go into very concrete implementation details in the proposal, but we should illustrate "how" to implement it. You should remove the readAllExistingMessages call here, because nobody knows the method's semantics without reading the source code. Instead, describe the implementation, for example, as "waiting until hasMessageAvailable reports that there are no more messages" or "the returned future is completed when there are no more messages".
Yeah, I also just thought of this point, so I deleted my earlier comment.
If
Yes, it will be clarified in the API comments. Note that this is actually the strongest guarantee we can provide right now: messages may be sent continuously, possibly faster than they are received, so we can't guarantee getting the latest message. All we can do is refresh once and return the latest value as of the current point in time.
pip/pip-302.md
/**
 * Triggers the reading of all existing messages from the topics, updates the TableView, and waits for the read operation to complete.
 * This method fetches the last message of the topics at the point of invocation and updates the TableView with all messages up to and including this last message.
 * After the update is complete, users can use the TableView to obtain the latest value for any key.
 * Note that the 'latest' value refers to the value at the point of calling refresh, not necessarily the current latest value if more messages have been produced in the meantime.
 *
 * @return a CompletableFuture that completes when all existing messages up to the point of invocation have been read and the TableView has been updated.
 *
 * Example usage:
 * table.refreshAsync().thenApply(__ -> table.get(key));
 */
/**
 * Refresh the table view with the latest data in the topic, ensuring that all subsequent reads are based on the refreshed data.
 *
 * Example usage:
 *
 * table.refreshAsync().thenApply(__ -> table.get(key));
 *
 * This function retrieves the last written message in the topic and refreshes the table view accordingly.
 * Once the refresh is complete, all subsequent reads will be performed on the refreshed data or a combination of the refreshed
 * data and newly published data. The table view remains synchronized with any newly published data after the refresh.
 *
 * |x:0|->|y:0|->|z:0|->|x:1|->|z:1|->|x:2|->|y:1|->|y:2|
 *
 * If a read occurs after the refresh (at the last published message |y:2|), it is guaranteed that outdated data like x=1 is not obtained.
 * However, it is not guaranteed that the values will always be x=2, y=2, z=1, as the table view may receive updates with newly
 * published data.
 *
 * |x:0|->|y:0|->|z:0|->|x:1|->|z:1|->|x:2|->|y:1|->|y:2| ---> |y:3|
 *
 * Either y=2 or y=3 is possible. Therefore, different readers may receive different values, but all values will be equal to or newer
 * than the data refreshed by the last call to the refresh method.
 */
I would like to provide another description of the method to make clear what is guaranteed and what is not.
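The log example in the suggested Javadoc can be checked by replaying it into a map, with later entries for a key overwriting earlier ones. This is a plain-Java illustration of that example, not TableView code; `LogReplay.replay` and the "key:value" string encoding are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LogReplay {
    // Apply "key:value" entries in order; later entries overwrite earlier ones.
    static Map<String, Integer> replay(List<String> log) {
        Map<String, Integer> view = new TreeMap<>(); // sorted keys for stable output
        for (String entry : log) {
            String[] kv = entry.split(":");
            view.put(kv[0], Integer.parseInt(kv[1]));
        }
        return view;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>(
                List.of("x:0", "y:0", "z:0", "x:1", "z:1", "x:2", "y:1", "y:2"));
        System.out.println(replay(log)); // {x=2, y=2, z=1}
        log.add("y:3");                  // published after the refresh point
        System.out.println(replay(log)); // {x=2, y=3, z=1}
    }
}
```

A refresh that lands at |y:2| therefore yields x=2, y=2, z=1, while a read after |y:3| has been applied sees y=3, which is the "equal to or newer" guarantee the suggested comment describes.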
Master #21271
### Motivation
The proposal will introduce a new API to refresh the table view with the latest written data on the topic, ensuring that all subsequent reads are based on the refreshed data.
Reopen #21166
Motivation
Prerequisite: Because messages are constantly being written to the topic and there is no read-write lock, we cannot guarantee retrieval of the most up-to-date value.
Implementation Goal: Record a checkpoint before reading and ensure that the latest value of the key up to this checkpoint is retrieved.
Use Case: When read and write operations for a certain key do not occur simultaneously, we can refresh the TableView before reading the key to obtain the latest value for this key.
Modification
Introduce a new API, `refreshAsync`.
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: