Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip '://' suffix from remote_write scheme #439

Closed
wants to merge 2 commits into from

Conversation

sed-i
Copy link
Contributor

@sed-i sed-i commented Jan 25, 2023

Issue

Users could mistakenly pass "http://" with a "://" suffix,

            endpoint_schema="http://",  # should have been just "http"

and the generate reldata will be invalid:

    related-units:
      mimir/0:
        in-scope: true
        data:
          remote_write: '{"url": "http://://mimir-0.mimir-endpoints.test-alert-propagation-nxl2.svc.cluster.local:9009/api/v1/push"}'

Solution

Eliminate the problem by always stripping the :// suffix.

Cannot use Literal["http", "https"] because it will reject arbitrary strings such as:

src/charm.py:130: error: Argument "endpoint_schema" to "PrometheusRemoteWriteProvider" has incompatible type "str"; expected "Literal['http', 'https']"  [arg-type]
                endpoint_schema=external_url.scheme,

Context

NTA.

Testing Instructions

Relate mimir to avalanche and juju show-unit avalanche/0.

Release Notes

Strip '://' suffix from remote_write scheme.

@@ -802,7 +802,7 @@ def __init__(
self._charm = charm
self.tool = CosTool(self._charm)
self._relation_name = relation_name
self._endpoint_schema = endpoint_schema
self._endpoint_scheme = endpoint_schema.rstrip("://")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still leaves a lot of "unsanitized input" ways in which a user could break it. Can we be more flexible, like here?

Suggested change
self._endpoint_scheme = endpoint_schema.rstrip("://")
if re.match(r'^\w+\W+', endpoint_schema):
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema)
endpoint_schema = re.sub(r'^(\w+)\W+)', r'\1', endpoint_schema)
self._endpoint_scheme = endpoint_schema.rstrip("://")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still leaves a lot of "unsanitized input" ways in which a user could break it. Can we be more flexible, like here?

Suggested change
self._endpoint_scheme = endpoint_schema.rstrip("://")
if re.match(r'^\w+\W+', endpoint_schema):
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema)
endpoint_schema = re.sub(r'^(\w+)\W+)', r'\1', endpoint_schema)
self._endpoint_scheme = endpoint_schema.rstrip("://")

This will return:

error: unbalanced parenthesis at position 9

Seems the code should be:

Suggested change
self._endpoint_scheme = endpoint_schema.rstrip("://")
if re.match(r'^\w+\W+', endpoint_schema):
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema)
endpoint_schema = ```suggestion
if re.match(r'^\w+\W+', endpoint_schema):
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema)
endpoint_schema = re.sub(r'^((\w+)\W+)', r'\1', endpoint_schema)
self._endpoint_scheme = endpoint_schema.rstrip("://")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right but the extra ) was just trailling. ^((\w+)\W+) won't do as expected (the outermost group will be \1, and the alphanumeric would be \2).

Should be:

re.sub(r'^(\w+)\W+.*', r'\1', endpoint_schema)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is remote-write. The only two possible options are http and https, no?

self._endpoint_scheme = endpoint_schema.strip().rstrip("://")
if self._endpoint_scheme not in ("http", "https"):
    logger.warning("...")

Copy link
Contributor

@rbarry82 rbarry82 Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably logger.error, because it's not going to work.

Honestly, we should probably throw an event back and put it into BlockedStatus, or straight-up throw an exception to block the charm. Users/authors can miss logger.warning, but "whoops, I put the wrong string in and now it doesn't work" is a bad user experience.

You could say "the only possible options are http and https", but you also clearly managed to pass in "http://", so we need to sanitize at least a little. From the other side of it, Prometheus itself doesn't actually enforce that it's https(s). Only that it can be unmarshalled (deserialized) into a url.URL, which s not nearly that prescriptive.

Do we want to support only http(s)? Maybe. But I'd personally say that it's better to be permissive at the moment because we just... don't know. Mimir will exercise remote write a lot more than we currently are. Thanos already has a gRPC endpoint, and very well may have one also. That also means that, in theory, proxyless gRPC meshing via xds://grpc-endpoint:1234 maybe be valid, depending on what the consumer is.

If we leave it as permissive as the actual Prometheus code, and say "as long as you gave me a schema, I'll try it, and watch the logs." We could exclude cases later if they're problematic or common. If we make it extremely permissive (https/https only), any user who wants to use gRPC meshing may find that it "just works". Including, maybe Kubeflow with Istio.

I don't think we know the problem domain enough to be that prescriptive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.
However I do not like the \w\W one because it trims presumptuously:

>>> re.sub(r'^(\w+)\W+.*', r'\1', "one://two://")
'one'
>>> re.sub(r'^(\w+)\W+.*', r'\1', "one#two")
'one'

Changing to the following. Let's tackle the rest in a separate issue.

>>> sanitized = "one#two".strip().rstrip("://")
>>> re.match("^\w+$", sanitized)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not presumptuous. # cannot be in a URI schema. If you want to be explicit, it's [A-Za-z0-9+-.] (+ and . would need to be escaped in a regexp), but + and . and - are incredibly uncommon and more or less isolated to handlers for MIME types in internal applications (like MS Office embedding IE and using them for internal links).

one#two is not valid, one://two is not a valid scheme either (as the change would assert that it is), and so on. If we're going to sanitize, we can either match the exact spec (re.sub(r'^(([A-Za-z0-9]|-|\.|\+)+)\W+', r'\1', ...)), which is still going to allow something like ms-office-word as a scheme, have a whitelist for strings (and again, since Golang's uri.URI isn't otherwise filtered by Prom, we shouldn't), or simply abandon this patch before we get too far down this rabbit hole.

Notable, \w also allows _, which cannot be in a scheme, and "really" validating it is an even uglier regexp than normal. That sort of thing is, honestly, terrible in a real-life codebase which isn't an RFC, won't be obvious unless we put something like "please see https://www.rfc-editor.org/rfc/rfc3986#section-3.1" as a comment, and so on.

At this point, I would vote for simply abandoning this patch until when and if we ever see a user-reported bug about it.

Copy link
Contributor

@rbarry82 rbarry82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, but also probably too naive. If we're going to sanitize it, let's sanitize it.

@@ -802,7 +802,7 @@ def __init__(
self._charm = charm
self.tool = CosTool(self._charm)
self._relation_name = relation_name
self._endpoint_schema = endpoint_schema
self._endpoint_scheme = endpoint_schema.rstrip("://")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still leaves a lot of "unsanitized input" ways in which a user could break it. Can we be more flexible, like here?

Suggested change
self._endpoint_scheme = endpoint_schema.rstrip("://")
if re.match(r'^\w+\W+', endpoint_schema):
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema)
endpoint_schema = re.sub(r'^(\w+)\W+)', r'\1', endpoint_schema)
self._endpoint_scheme = endpoint_schema.rstrip("://")

This will return:

error: unbalanced parenthesis at position 9

Seems the code should be:

Suggested change
self._endpoint_scheme = endpoint_schema.rstrip("://")
if re.match(r'^\w+\W+', endpoint_schema):
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema)
endpoint_schema = ```suggestion
if re.match(r'^\w+\W+', endpoint_schema):
logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema)
endpoint_schema = re.sub(r'^((\w+)\W+)', r'\1', endpoint_schema)
self._endpoint_scheme = endpoint_schema.rstrip("://")

@sed-i sed-i requested a review from rbarry82 January 26, 2023 17:16
@sed-i
Copy link
Contributor Author

sed-i commented Jan 26, 2023

Closing due to mixed views.

@sed-i sed-i closed this Jan 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants