Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend relative hash support to whole API v2 #7308

Merged
merged 12 commits into from
Aug 1, 2023

Conversation

adutra
Copy link
Contributor

@adutra adutra commented Jul 31, 2023

Fixes #7268.

Changes are dispatched in different commits for easier reviews.

Highlights:

  • A new component, org.projectnessie.services.hash.HashResolver is responsible for resolving all hashes. It replaces the namedRefWithHashOrThrow method previously in BaseApiImpl.
  • Most changes are in TreeApiImpl. The most difficult endpoints to adapt were assignReference and deleteReference because of their strict checking of the expected hash.
    • Note that because TreeApiImpl is shared between v1 and v2, some changes may affect v1's behavior (but not its API). Since v1 is deprecated at this point, I think this is acceptable.
  • The REST layer is designed in such a way that relative hashes are accepted even if they are not particularly useful, or even desired, in certain cases. This is by design to avoid returning cryptic validation errors to clients.
  • As a general rule, ambiguous hases are only accepted in reading operations (see below for examples).
  • A (hopefully) thorough set of tests can be found in org.projectnessie.jaxrs.tests.AbstractRelativeReferences.

Note on hash ambiguity: it is my understanding that, in order for a hash to be unambiguous (i.e. it will be resolved deterministically), it is enough for it to have an absolute part (aka start commit ID). This is valid also for hashes with timestamps, including those in the future, because no relative spec can be resolved to a child of the start commit, only its parents. Thus the following are all unambiguous:

  • 1234
  • 1234~1
  • 1234^1~2
  • 1234*1970-01-01T00:00:00.000Z
  • 1234*2050-01-01T00:00:00.000Z => will always resolve to 1234, assuming 1234 was committed before 2050.

The following however are considered ambiguous and thus rejected for all writing operations:

  • ~1
  • ^1~2
  • *1970-01-01T00:00:00.000Z
  • *2050-01-01T00:00:00.000Z

Given the above conclusions, this PR makes no special treatment for clock drifts.

AbstractThrowableAssert<?, ? extends Throwable> deleteConflict =
isV2()
? soft.assertThatThrownBy(() -> api().deleteBranch().branch(branch).getAndDelete())
: soft.assertThatThrownBy(() -> api().deleteBranch().branch(branch).delete());
deleteConflict
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only test that failed after the changes. It is testing an edge case: the branch is at c1, but the provided expected hash is c2.

Previously there was no hash resolution, so the endpoint would throw a CONFLICT error since c2 is not the expected hash.

With the new hash resolution in place, the endpoint now throws NOT_FOUND because c2 is not reachable from c1 at all.

I added another test case below that generates a CONFLICT as before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the new behavour here. In this case, the hash is valid, but it is not in the current commit chain of the reference. This is exactly the case when one client assigns the reference, while another client attempts to delete it using the old HEAD. I tend to think that a "conflict" is more appropriate from the high-level perspective.

We seems to have two general approaches. Read operations throw "not found" when the hash is not found on the branch. Write operations throw "conflict" when the "expected" hash does not match server-side expectations. The relative hash case kind of mixes both cases.

Since write operations now require a concrete base hash, could we make the resolution more lenient and defer the (final) exception throwing to the highest-level method that knows the operation's end-user semantics?

On the other hand, all non-trivial unambiguous relative hash references resolve to something other than HEAD. Perhaps, we could simplify the parsing of the expected hash parameter for assign/deleteReference requests and respond with a "bad request" if a relative spec is used. If the hash is absolute we could avoid checking it against the history, but do "conflict" by comparing it to current HEAD right away (also requires less backend work).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comments are 100% spot on. I hesitated a long time myself before picking one solution.

To summarize, we'd have 3 ways to move forward with this issue:

  1. Accept the behavior change: if c2 is not in main, then main@c2 should legitimately throw not-found.
  2. Use DETACHED to resolve relative hashes.
    a. Using DETACHED, VersionStore.hashOnReference() will simply lookup the commit but won't validate that it belongs to any branch.
  3. respond with a "bad request" if a relative spec is used in assignReference or deleteReference
    a. This would work too, but it might not solve the issue completely because instead of conflict, we would throw a bad-request error.

Regarding option 2: this was actually my first idea. It solves the issue at hand, and doesn't seem to have any concerning downsides. I gave up on it, thinking that conceptually, option 1 was better than 2... but now I am not sure anymore :-)

Let's try option 2 first and see how it goes. If it doesn't look good, then I will go with your suggestion (option 3).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimas-b Seems like CI is happy with option 2!

AbstractThrowableAssert<?, ? extends Throwable> deleteConflict =
isV2()
? soft.assertThatThrownBy(() -> api().deleteBranch().branch(branch).getAndDelete())
: soft.assertThatThrownBy(() -> api().deleteBranch().branch(branch).delete());
deleteConflict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the new behavour here. In this case, the hash is valid, but it is not in the current commit chain of the reference. This is exactly the case when one client assigns the reference, while another client attempts to delete it using the old HEAD. I tend to think that a "conflict" is more appropriate from the high-level perspective.

We seems to have two general approaches. Read operations throw "not found" when the hash is not found on the branch. Write operations throw "conflict" when the "expected" hash does not match server-side expectations. The relative hash case kind of mixes both cases.

Since write operations now require a concrete base hash, could we make the resolution more lenient and defer the (final) exception throwing to the highest-level method that knows the operation's end-user semantics?

On the other hand, all non-trivial unambiguous relative hash references resolve to something other than HEAD. Perhaps, we could simplify the parsing of the expected hash parameter for assign/deleteReference requests and respond with a "bad request" if a relative spec is used. If the hash is absolute we could avoid checking it against the history, but do "conflict" by comparing it to current HEAD right away (also requires less backend work).

@@ -82,7 +82,7 @@ public Namespace createNamespace(String refName, Namespace namespace)
throws NessieReferenceNotFoundException {
Preconditions.checkArgument(!namespace.isEmpty(), "Namespace name must not be empty");

WithHash<NamedRef> refWithHash = namedRefWithHashOrThrow(refName, null);
ResolvedHash refWithHash = getHashResolver().resolveToHead(refName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Namespaces endpoint is not supported in API v2, so relative hashes normally cannot make their way here. It's probably a moot point, but if there is an easier way to get HEAD, it might be worth using it in this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here is purely cosmetic. getHashResolver().resolveToHead(refName) does exactly the same thing as namedRefWithHashOrThrow(refName, null);. But the new name is more meaningful imho. Do you see any other easier way to resolve the HEAD of a ref?

@adutra adutra force-pushed the relative-hash-support branch from b22da40 to f1392f7 Compare August 1, 2023 08:45
.hasMessageContaining(
String.format(
"Could not find commit '%s' in reference '%s'.", invalidHash, branch.getName()));
.hasMessageContaining(String.format("Commit '%s' not found", invalidHash));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimas-b FYI this is the only downside so far of resolving hashes against DETACHED: the error returned is still not-found but the message changes slightly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

CHANGELOG.md Outdated Show resolved Hide resolved
// we should only allow named references when no paging is defined
WithHash<NamedRef> endRef =
namedRefWithHashOrThrow(namedRef, null == pageToken ? youngestHash : pageToken);
// FIXME we should only allow named references when no paging is defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH; I'm not sure that comment is still valid (aka: I think, it can just go away)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that too and came to the conclusion that it could still be relevant. If the named reference is reassigned between two pages to a completely different hash, paging could break I guess. Wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paging for commit-log uses a commit-hash passed around as the offset for the next page. It could be an issue when the reference is deleted, but then it's deleted, so fine to fail IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's remove the comment then.

@NotNull
@jakarta.validation.constraints.NotNull
@Size
@jakarta.validation.constraints.Size(min = 1)
List<String> getHashesToTransplant();
List<
@Pattern(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since HV is doing the job, I'm okay removing the check-method

import javax.annotation.Nullable;

@FunctionalInterface
public interface HashValidator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks much better!

@Override
Hash getHash();

@Value.Derived
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Value.Derived
@Value.NonAttribute

otherwise the value would be stored twice (in two fields): in WithHash and in this one.

// we should only allow named references when no paging is defined
WithHash<NamedRef> endRef =
namedRefWithHashOrThrow(namedRef, null == pageToken ? youngestHash : pageToken);
// FIXME we should only allow named references when no paging is defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paging for commit-log uses a commit-hash passed around as the offset for the next page. It could be an issue when the reference is deleted, but then it's deleted, so fine to fail IMO.

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@adutra adutra merged commit fac4ad0 into projectnessie:main Aug 1, 2023
@adutra adutra deleted the relative-hash-support branch August 1, 2023 16:43
adutra added a commit to adutra/nessie that referenced this pull request Aug 1, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
adutra added a commit that referenced this pull request Aug 4, 2023
This commit removes the workaround introduced in
#7308.

The workaround is not required anymore, now that
this issue has been fixed:

quarkusio/quarkus#35104
snazy pushed a commit to snazy/nessie that referenced this pull request Sep 22, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Sep 22, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Sep 22, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Sep 22, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Sep 25, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Sep 25, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Sep 27, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Sep 28, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Oct 9, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Oct 9, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
snazy pushed a commit to snazy/nessie that referenced this pull request Oct 9, 2023
The assertion was modified in projectnessie#7308 with a workaround to
avoid compilation failures.

This commit reintroduces the right assertion, which became
possible when Iceberg updated Nessie to 0.61+.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support relative hashes across the entire API v2 surface
3 participants