Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

[PDS-350703 and Others | i like trains] Part 1: Conjure Undertow, and Flattening Subresource Locators #6518

Merged
merged 15 commits into from
Apr 17, 2023

Conversation

jeremyk-91
Copy link
Contributor

@jeremyk-91 jeremyk-91 commented Apr 11, 2023

NOTE: Ready for review, but I want to do some manual testing first! Also, I'll only have confidence that the internal product that had issues for a while will handle load better after the second part of this PR chain.

General

Before this PR:
We use Jersey sub-resource locators, which are not readily compatible with the conjure undertow generator. These incur a significant cost in resolution under certain versions of internal libraries.

After this PR:

==COMMIT_MSG==
We no longer use Jersey sub-resource locators for user-facing timestamp, timestamp management and lock v1 endpoints (except for the asynchronous TimeLock service).
==COMMIT_MSG==

Priority: Dev P0

Concerns / possible downsides (what feedback would you like?):

  • There is a lot of code, and some duplication between XService and XResource. I decided to go with this approach as new endpoints should not be defined here anyway (one should use conjure instead), so the damage should be contained.
  • The undertow services have slightly different semantics from the Jersey resource, because regexes are not supported by the conjure undertow handler. In particular, we used to only match things with [a-zA-Z0-9_-]+ except tl or lw. Now we don't: this could in theory introduce risks that the routing of something like /tl/timestamp/fresh-timestamp is ambiguous. However, if I'm not mistaken our internal server platform validates that there are no ambiguous path matches, and also we do actually still check that clients follow the rules (see TimelockNamespaces).
  • Some endpoints relied on the ability to provide a path parameter of the form /foo/{client: .*} to match both /foo and /foo/bar, which I don't think is really supported by the handler either. Since it was zero or one I decided to just write two methods for these cases, though an argument could be made to use the globbing functionality of the conjure undertow handler (though I didn't like that since it is zero or one).
  • I'm pretty sure this works given tests and also from the definition of the Jersey spec (in particular, where multiple resources are matched, path resolution is done against all of their methods at the same time), though it's not something I know super well so it's probably worth a check.

Is documentation needed?: No

Compatibility

Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?: It shouldn't!

Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?: No

The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.): Yes

Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?: No

Does this PR need a schema migration?: No

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?: Nothing significant

What was existing testing like? What have you done to improve it?: I'm leveraging existing integration tests for Jersey. There is admittedly little testing for Undertow: some exists in the internal timelock repository, and I intend to execute manual validation once the second part of this PR chain (which attacks the asynchronous timelock service) is ready.

If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.: No

If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?: No

Execution

How would I tell this PR works in production? (Metrics, logs, etc.): Nothing breaks.

Has the safety of all log arguments been decided correctly?: Main thing is marking Namespaces as safe, which we already exfiltrate in other Conjure requests.

Will this change significantly affect our spending on metrics or logs?: No

How would I tell that this PR does not work in production? (monitors, etc.): 404s increasing. It's unsafe to have an all hours paging on this because of false positive potential. I can monitor by hand at first, or if we decide necessary we can add a low priority notification.

If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?: Rollback

If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):

Scale

Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.: No

Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?: No

Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?: Well, eventually we remove Jersey altogether.

Development Process

Where should we start reviewing?: Lock V1

If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?: It is possible to split the lock resource into its own PR, but what needs to be checked is mechanically similar to the other resources (there is a lot of boilerplate).

Please tag any other people who should be aware of this PR:
@jeremyk-91
@sverma30
@raiju

@changelog-app
Copy link

changelog-app bot commented Apr 11, 2023

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

We no longer use Jersey sub-resource locators for user-facing timestamp, timestamp management and lock v1 endpoints (except for the asynchronous TimeLock service).

Check the box to generate changelog(s)

  • Generate changelog entry

import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

@Path("/{namespace: (?!(tl|lw)/)[a-zA-Z0-9_-]+}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignored by undertow, used for the jersey side only

@CheckReturnValue(when = When.NEVER)
@Handle(
method = HttpMethod.POST,
path = "/{namespace}/timestamp-management/fast-forward",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seem right?? Should be ping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and should be get, too

Copy link
Contributor Author

@jeremyk-91 jeremyk-91 Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh. 🤡

Yeah this is my bad - unfortunately this is only exercised by the undertow branch, and this endpoint is only used in the timelock migration... (we have some internal tests, but not the migration!)

Copy link
Contributor

@mdaudali mdaudali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly seems fine, couple comments.
I've gone through and checked all the annotations and endpoints, and attempted to verify that there's no difference between the old services and the new resources (including exceptions). Where they differ, I've left a comment.

I still need to verify what happens on ambiguous routing (although yeah I see that timelock namespaces bans the specific names)

Copy link
Contributor

@mdaudali mdaudali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants