-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend relative hash support to whole API v2 #7308
Conversation
AbstractThrowableAssert<?, ? extends Throwable> deleteConflict = | ||
isV2() | ||
? soft.assertThatThrownBy(() -> api().deleteBranch().branch(branch).getAndDelete()) | ||
: soft.assertThatThrownBy(() -> api().deleteBranch().branch(branch).delete()); | ||
deleteConflict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only test that failed after the changes. It is testing an edge case: the branch is at c1, but the provided expected hash is c2.
Previously there was no hash resolution, so the endpoint would throw a CONFLICT error since c2 is not the expected hash.
With the new hash resolution in place, the endpoint now throws NOT_FOUND because c2 is not reachable from c1 at all.
I added another test case below that generates a CONFLICT as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the new behavour here. In this case, the hash is valid, but it is not in the current commit chain of the reference. This is exactly the case when one client assigns the reference, while another client attempts to delete it using the old HEAD. I tend to think that a "conflict" is more appropriate from the high-level perspective.
We seems to have two general approaches. Read operations throw "not found" when the hash is not found on the branch. Write operations throw "conflict" when the "expected" hash does not match server-side expectations. The relative hash case kind of mixes both cases.
Since write operations now require a concrete base hash, could we make the resolution more lenient and defer the (final) exception throwing to the highest-level method that knows the operation's end-user semantics?
On the other hand, all non-trivial unambiguous relative hash references resolve to something other than HEAD. Perhaps, we could simplify the parsing of the expected hash parameter for assign/deleteReference requests and respond with a "bad request" if a relative spec is used. If the hash is absolute we could avoid checking it against the history, but do "conflict" by comparing it to current HEAD right away (also requires less backend work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your comments are 100% spot on. I hesitated a long time myself before picking one solution.
To summarize, we'd have 3 ways to move forward with this issue:
- Accept the behavior change: if c2 is not in main, then
main@c2
should legitimately throw not-found. - Use
DETACHED
to resolve relative hashes.
a. UsingDETACHED
,VersionStore.hashOnReference()
will simply lookup the commit but won't validate that it belongs to any branch. - respond with a "bad request" if a relative spec is used in
assignReference
ordeleteReference
a. This would work too, but it might not solve the issue completely because instead of conflict, we would throw a bad-request error.
Regarding option 2: this was actually my first idea. It solves the issue at hand, and doesn't seem to have any concerning downsides. I gave up on it, thinking that conceptually, option 1 was better than 2... but now I am not sure anymore :-)
Let's try option 2 first and see how it goes. If it doesn't look good, then I will go with your suggestion (option 3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimas-b Seems like CI is happy with option 2!
...ts/src/main/java/org/projectnessie/spark/extensions/AbstractNessieSparkSqlExtensionTest.java
Show resolved
Hide resolved
AbstractThrowableAssert<?, ? extends Throwable> deleteConflict = | ||
isV2() | ||
? soft.assertThatThrownBy(() -> api().deleteBranch().branch(branch).getAndDelete()) | ||
: soft.assertThatThrownBy(() -> api().deleteBranch().branch(branch).delete()); | ||
deleteConflict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the new behavour here. In this case, the hash is valid, but it is not in the current commit chain of the reference. This is exactly the case when one client assigns the reference, while another client attempts to delete it using the old HEAD. I tend to think that a "conflict" is more appropriate from the high-level perspective.
We seems to have two general approaches. Read operations throw "not found" when the hash is not found on the branch. Write operations throw "conflict" when the "expected" hash does not match server-side expectations. The relative hash case kind of mixes both cases.
Since write operations now require a concrete base hash, could we make the resolution more lenient and defer the (final) exception throwing to the highest-level method that knows the operation's end-user semantics?
On the other hand, all non-trivial unambiguous relative hash references resolve to something other than HEAD. Perhaps, we could simplify the parsing of the expected hash parameter for assign/deleteReference requests and respond with a "bad request" if a relative spec is used. If the hash is absolute we could avoid checking it against the history, but do "conflict" by comparing it to current HEAD right away (also requires less backend work).
@@ -82,7 +82,7 @@ public Namespace createNamespace(String refName, Namespace namespace) | |||
throws NessieReferenceNotFoundException { | |||
Preconditions.checkArgument(!namespace.isEmpty(), "Namespace name must not be empty"); | |||
|
|||
WithHash<NamedRef> refWithHash = namedRefWithHashOrThrow(refName, null); | |||
ResolvedHash refWithHash = getHashResolver().resolveToHead(refName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Namespaces endpoint is not supported in API v2, so relative hashes normally cannot make their way here. It's probably a moot point, but if there is an easier way to get HEAD, it might be worth using it in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change here is purely cosmetic. getHashResolver().resolveToHead(refName)
does exactly the same thing as namedRefWithHashOrThrow(refName, null);
. But the new name is more meaningful imho. Do you see any other easier way to resolve the HEAD of a ref?
b22da40
to
f1392f7
Compare
.hasMessageContaining( | ||
String.format( | ||
"Could not find commit '%s' in reference '%s'.", invalidHash, branch.getName())); | ||
.hasMessageContaining(String.format("Commit '%s' not found", invalidHash)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimas-b FYI this is the only downside so far of resolving hashes against DETACHED: the error returned is still not-found but the message changes slightly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
...ts/src/main/java/org/projectnessie/spark/extensions/AbstractNessieSparkSqlExtensionTest.java
Show resolved
Hide resolved
servers/services/src/test/java/org/projectnessie/services/hash/TestParsedHash.java
Outdated
Show resolved
Hide resolved
servers/services/src/test/java/org/projectnessie/services/hash/TestParsedHash.java
Outdated
Show resolved
Hide resolved
...rs/quarkus-server/src/test/java/org/projectnessie/server/TestQuarkusRestInMemoryPersist.java
Show resolved
Hide resolved
servers/services/src/main/java/org/projectnessie/services/hash/HashResolver.java
Outdated
Show resolved
Hide resolved
// we should only allow named references when no paging is defined | ||
WithHash<NamedRef> endRef = | ||
namedRefWithHashOrThrow(namedRef, null == pageToken ? youngestHash : pageToken); | ||
// FIXME we should only allow named references when no paging is defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH; I'm not sure that comment is still valid (aka: I think, it can just go away)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that too and came to the conclusion that it could still be relevant. If the named reference is reassigned between two pages to a completely different hash, paging could break I guess. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paging for commit-log uses a commit-hash passed around as the offset for the next page. It could be an issue when the reference is deleted, but then it's deleted, so fine to fail IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's remove the comment then.
servers/services/src/main/java/org/projectnessie/services/hash/HashValidators.java
Outdated
Show resolved
Hide resolved
servers/services/src/main/java/org/projectnessie/services/hash/HashValidators.java
Outdated
Show resolved
Hide resolved
servers/services/src/main/java/org/projectnessie/services/hash/HashValidators.java
Outdated
Show resolved
Hide resolved
@NotNull | ||
@jakarta.validation.constraints.NotNull | ||
@Size | ||
@jakarta.validation.constraints.Size(min = 1) | ||
List<String> getHashesToTransplant(); | ||
List< | ||
@Pattern( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since HV is doing the job, I'm okay removing the check-method
import javax.annotation.Nullable; | ||
|
||
@FunctionalInterface | ||
public interface HashValidator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks much better!
@Override | ||
Hash getHash(); | ||
|
||
@Value.Derived |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Value.Derived | |
@Value.NonAttribute |
otherwise the value would be stored twice (in two fields): in WithHash
and in this one.
// we should only allow named references when no paging is defined | ||
WithHash<NamedRef> endRef = | ||
namedRefWithHashOrThrow(namedRef, null == pageToken ? youngestHash : pageToken); | ||
// FIXME we should only allow named references when no paging is defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paging for commit-log uses a commit-hash passed around as the offset for the next page. It could be an issue when the reference is deleted, but then it's deleted, so fine to fail IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
This commit removes the workaround introduced in #7308. The workaround is not required anymore, now that this issue has been fixed: quarkusio/quarkus#35104
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
The assertion was modified in projectnessie#7308 with a workaround to avoid compilation failures. This commit reintroduces the right assertion, which became possible when Iceberg updated Nessie to 0.61+.
Fixes #7268.
Changes are dispatched in different commits for easier reviews.
Highlights:
org.projectnessie.services.hash.HashResolver
is responsible for resolving all hashes. It replaces thenamedRefWithHashOrThrow
method previously inBaseApiImpl
.TreeApiImpl
. The most difficult endpoints to adapt wereassignReference
anddeleteReference
because of their strict checking of the expected hash.TreeApiImpl
is shared between v1 and v2, some changes may affect v1's behavior (but not its API). Since v1 is deprecated at this point, I think this is acceptable.org.projectnessie.jaxrs.tests.AbstractRelativeReferences
.Note on hash ambiguity: it is my understanding that, in order for a hash to be unambiguous (i.e. it will be resolved deterministically), it is enough for it to have an absolute part (aka start commit ID). This is valid also for hashes with timestamps, including those in the future, because no relative spec can be resolved to a child of the start commit, only its parents. Thus the following are all unambiguous:
1234
1234~1
1234^1~2
1234*1970-01-01T00:00:00.000Z
1234*2050-01-01T00:00:00.000Z
=> will always resolve to 1234, assuming 1234 was committed before 2050.The following however are considered ambiguous and thus rejected for all writing operations:
~1
^1~2
*1970-01-01T00:00:00.000Z
*2050-01-01T00:00:00.000Z
Given the above conclusions, this PR makes no special treatment for clock drifts.