-
Notifications
You must be signed in to change notification settings - Fork 15
[PDS-350703 and Others | i like trains] Part 1: Conjure Undertow, and Flattening Subresource Locators #6518
Conversation
Generate changelog in
|
timelock-impl/src/main/java/com/palantir/atlasdb/timelock/TimestampResource.java
Outdated
Show resolved
Hide resolved
import javax.ws.rs.QueryParam; | ||
import javax.ws.rs.core.MediaType; | ||
|
||
@Path("/{namespace: (?!(tl|lw)/)[a-zA-Z0-9_-]+}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignored by undertow, used for the jersey side only
@CheckReturnValue(when = When.NEVER) | ||
@Handle( | ||
method = HttpMethod.POST, | ||
path = "/{namespace}/timestamp-management/fast-forward", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seem right?? Should be ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and should be get, too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh. 🤡
Yeah this is my bad - unfortunately this is only exercised by the undertow branch, and this endpoint is only used in the timelock migration... (we have some internal tests, but not the migration!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly seems fine, couple comments.
I've gone through and checked all the annotations and endpoints, and attempted to verify that there's no difference between the old services and the new resources (including exceptions). Where they differ, I've left a comment.
I still need to verify what happens on ambiguous routing (although yeah I see that timelock namespaces bans the specific names)
timelock-impl/src/main/java/com/palantir/atlasdb/timelock/LockV1Resource.java
Outdated
Show resolved
Hide resolved
timelock-impl/src/main/java/com/palantir/atlasdb/timelock/LockV1Resource.java
Outdated
Show resolved
Hide resolved
timelock-impl/src/main/java/com/palantir/atlasdb/timelock/LockV1Resource.java
Outdated
Show resolved
Hide resolved
timelock-impl/src/main/java/com/palantir/atlasdb/timelock/LockV1Resource.java
Show resolved
Hide resolved
timelock-impl/src/main/java/com/palantir/atlasdb/timelock/TimestampManagementResource.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
NOTE: Ready for review, but I want to do some manual testing first! Also, I'll only have confidence that the internal product that had issues for a while will handle load better after the second part of this PR chain.
General
Before this PR:
We use Jersey sub-resource locators, which are not readily compatible with the conjure undertow generator. These incur a significant cost in resolution under certain versions of internal libraries.
After this PR:
==COMMIT_MSG==
We no longer use Jersey sub-resource locators for user-facing timestamp, timestamp management and lock v1 endpoints (except for the asynchronous TimeLock service).
==COMMIT_MSG==
Priority: Dev P0
Concerns / possible downsides (what feedback would you like?):
XService
andXResource
. I decided to go with this approach as new endpoints should not be defined here anyway (one should use conjure instead), so the damage should be contained.[a-zA-Z0-9_-]+
excepttl
orlw
. Now we don't: this could in theory introduce risks that the routing of something like/tl/timestamp/fresh-timestamp
is ambiguous. However, if I'm not mistaken our internal server platform validates that there are no ambiguous path matches, and also we do actually still check that clients follow the rules (seeTimelockNamespaces
)./foo/{client: .*}
to match both/foo
and/foo/bar
, which I don't think is really supported by the handler either. Since it was zero or one I decided to just write two methods for these cases, though an argument could be made to use the globbing functionality of the conjure undertow handler (though I didn't like that since it is zero or one).Is documentation needed?: No
Compatibility
Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?: It shouldn't!
Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?: No
The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.): Yes
Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?: No
Does this PR need a schema migration?: No
Testing and Correctness
What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?: Nothing significant
What was existing testing like? What have you done to improve it?: I'm leveraging existing integration tests for Jersey. There is admittedly little testing for Undertow: some exists in the internal timelock repository, and I intend to execute manual validation once the second part of this PR chain (which attacks the asynchronous timelock service) is ready.
If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.: No
If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?: No
Execution
How would I tell this PR works in production? (Metrics, logs, etc.): Nothing breaks.
Has the safety of all log arguments been decided correctly?: Main thing is marking Namespaces as safe, which we already exfiltrate in other Conjure requests.
Will this change significantly affect our spending on metrics or logs?: No
How would I tell that this PR does not work in production? (monitors, etc.): 404s increasing. It's unsafe to have an all hours paging on this because of false positive potential. I can monitor by hand at first, or if we decide necessary we can add a low priority notification.
If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?: Rollback
If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):
Scale
Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.: No
Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?: No
Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?: Well, eventually we remove Jersey altogether.
Development Process
Where should we start reviewing?: Lock V1
If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?: It is possible to split the lock resource into its own PR, but what needs to be checked is mechanically similar to the other resources (there is a lot of boilerplate).
Please tag any other people who should be aware of this PR:
@jeremyk-91
@sverma30
@raiju