-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strip '://' suffix from remote_write scheme #439
Conversation
@@ -802,7 +802,7 @@ def __init__( | |||
self._charm = charm | |||
self.tool = CosTool(self._charm) | |||
self._relation_name = relation_name | |||
self._endpoint_schema = endpoint_schema | |||
self._endpoint_scheme = endpoint_schema.rstrip("://") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still leaves a lot of "unsanitized input" ways in which a user could break it. Can we be more flexible, like here?
self._endpoint_scheme = endpoint_schema.rstrip("://") | |
if re.match(r'^\w+\W+', endpoint_schema): | |
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema) | |
endpoint_schema = re.sub(r'^(\w+)\W+)', r'\1', endpoint_schema) | |
self._endpoint_scheme = endpoint_schema.rstrip("://") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still leaves a lot of "unsanitized input" ways in which a user could break it. Can we be more flexible, like here?
Suggested change
self._endpoint_scheme = endpoint_schema.rstrip("://") if re.match(r'^\w+\W+', endpoint_schema): logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema) endpoint_schema = re.sub(r'^(\w+)\W+)', r'\1', endpoint_schema) self._endpoint_scheme = endpoint_schema.rstrip("://")
This will return:
error: unbalanced parenthesis at position 9
Seems the code should be:
self._endpoint_scheme = endpoint_schema.rstrip("://") | |
if re.match(r'^\w+\W+', endpoint_schema): | |
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema) | |
endpoint_schema = ```suggestion | |
if re.match(r'^\w+\W+', endpoint_schema): | |
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema) | |
endpoint_schema = re.sub(r'^((\w+)\W+)', r'\1', endpoint_schema) | |
self._endpoint_scheme = endpoint_schema.rstrip("://") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right but the extra )
was just trailling. ^((\w+)\W+)
won't do as expected (the outermost group will be \1
, and the alphanumeric would be \2
).
Should be:
re.sub(r'^(\w+)\W+.*', r'\1', endpoint_schema)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is remote-write. The only two possible options are http and https, no?
self._endpoint_scheme = endpoint_schema.strip().rstrip("://")
if self._endpoint_scheme not in ("http", "https"):
logger.warning("...")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably logger.error
, because it's not going to work.
Honestly, we should probably throw an event back and put it into BlockedStatus
, or straight-up throw an exception to block the charm. Users/authors can miss logger.warning
, but "whoops, I put the wrong string in and now it doesn't work" is a bad user experience.
You could say "the only possible options are http and https", but you also clearly managed to pass in "http://"
, so we need to sanitize at least a little. From the other side of it, Prometheus itself doesn't actually enforce that it's https(s). Only that it can be unmarshalled (deserialized) into a url.URL
, which s not nearly that prescriptive.
Do we want to support only http(s)? Maybe. But I'd personally say that it's better to be permissive at the moment because we just... don't know. Mimir will exercise remote write a lot more than we currently are. Thanos already has a gRPC endpoint, and very well may have one also. That also means that, in theory, proxyless gRPC meshing via xds://grpc-endpoint:1234
maybe be valid, depending on what the consumer is.
If we leave it as permissive as the actual Prometheus code, and say "as long as you gave me a schema, I'll try it, and watch the logs." We could exclude cases later if they're problematic or common. If we make it extremely permissive (https/https only), any user who wants to use gRPC meshing may find that it "just works". Including, maybe Kubeflow with Istio.
I don't think we know the problem domain enough to be that prescriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
However I do not like the \w\W
one because it trims presumptuously:
>>> re.sub(r'^(\w+)\W+.*', r'\1', "one://two://")
'one'
>>> re.sub(r'^(\w+)\W+.*', r'\1', "one#two")
'one'
Changing to the following. Let's tackle the rest in a separate issue.
>>> sanitized = "one#two".strip().rstrip("://")
>>> re.match("^\w+$", sanitized)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not presumptuous. #
cannot be in a URI schema. If you want to be explicit, it's [A-Za-z0-9+-.]
(+
and .
would need to be escaped in a regexp), but +
and .
and -
are incredibly uncommon and more or less isolated to handlers for MIME types in internal applications (like MS Office embedding IE and using them for internal links).
one#two
is not valid, one://two
is not a valid scheme either (as the change would assert that it is), and so on. If we're going to sanitize, we can either match the exact spec (re.sub(r'^(([A-Za-z0-9]|-|\.|\+)+)\W+', r'\1', ...)
), which is still going to allow something like ms-office-word
as a scheme, have a whitelist for strings (and again, since Golang's uri.URI
isn't otherwise filtered by Prom, we shouldn't), or simply abandon this patch before we get too far down this rabbit hole.
Notable, \w
also allows _
, which cannot be in a scheme, and "really" validating it is an even uglier regexp than normal. That sort of thing is, honestly, terrible in a real-life codebase which isn't an RFC, won't be obvious unless we put something like "please see https://www.rfc-editor.org/rfc/rfc3986#section-3.1" as a comment, and so on.
At this point, I would vote for simply abandoning this patch until when and if we ever see a user-reported bug about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch, but also probably too naive. If we're going to sanitize it, let's sanitize it.
@@ -802,7 +802,7 @@ def __init__( | |||
self._charm = charm | |||
self.tool = CosTool(self._charm) | |||
self._relation_name = relation_name | |||
self._endpoint_schema = endpoint_schema | |||
self._endpoint_scheme = endpoint_schema.rstrip("://") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still leaves a lot of "unsanitized input" ways in which a user could break it. Can we be more flexible, like here?
Suggested change
self._endpoint_scheme = endpoint_schema.rstrip("://") if re.match(r'^\w+\W+', endpoint_schema): logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema) endpoint_schema = re.sub(r'^(\w+)\W+)', r'\1', endpoint_schema) self._endpoint_scheme = endpoint_schema.rstrip("://")
This will return:
error: unbalanced parenthesis at position 9
Seems the code should be:
self._endpoint_scheme = endpoint_schema.rstrip("://") | |
if re.match(r'^\w+\W+', endpoint_schema): | |
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema) | |
endpoint_schema = ```suggestion | |
if re.match(r'^\w+\W+', endpoint_schema): | |
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema) | |
endpoint_schema = re.sub(r'^((\w+)\W+)', r'\1', endpoint_schema) | |
self._endpoint_scheme = endpoint_schema.rstrip("://") |
Closing due to mixed views. |
Issue
Users could mistakenly pass "http://" with a "://" suffix,
and the generate reldata will be invalid:
Solution
Eliminate the problem by always stripping the
://
suffix.Cannot use
Literal["http", "https"]
because it will reject arbitrary strings such as:Context
NTA.
Testing Instructions
Relate mimir to avalanche and
juju show-unit avalanche/0
.Release Notes
Strip '://' suffix from remote_write scheme.