Strip '://' suffix from remote_write scheme #439

Closed
wants to merge 2 commits into from
8 changes: 4 additions & 4 deletions lib/charms/prometheus_k8s/v0/prometheus_remote_write.py
@@ -45,7 +45,7 @@

# Increment this PATCH version before using `charmcraft publish-lib` or reset
# to 0 if you are raising the major API version
-LIBPATCH = 10
+LIBPATCH = 11


logger = logging.getLogger(__name__)
@@ -764,7 +764,7 @@ def __init__(
self,
charm: CharmBase,
relation_name: str = DEFAULT_RELATION_NAME,
-endpoint_schema: str = "http",
+endpoint_schema: str = "http", # TODO: in v1, rename to 'scheme'
endpoint_address: str = "",
endpoint_port: Union[str, int] = 9090,
endpoint_path: str = "/api/v1/write",
@@ -802,7 +802,7 @@ def __init__(
self._charm = charm
self.tool = CosTool(self._charm)
self._relation_name = relation_name
-self._endpoint_schema = endpoint_schema
+self._endpoint_scheme = endpoint_schema.rstrip("://")
Contributor

This still leaves a lot of "unsanitized input" ways in which a user could break it. Can we be more flexible, like here?

Suggested change
self._endpoint_scheme = endpoint_schema.rstrip("://")
if re.match(r'^\w+\W+', endpoint_schema):
    logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema)
    endpoint_schema = re.sub(r'^(\w+)\W+)', r'\1', endpoint_schema)
self._endpoint_scheme = endpoint_schema.rstrip("://")

Contributor

That re.sub pattern does not compile; it raises:

error: unbalanced parenthesis at position 9

It seems the code should be:

Suggested change
self._endpoint_scheme = endpoint_schema.rstrip("://")
if re.match(r'^\w+\W+', endpoint_schema):
    logger.warning("The provided endpoint schema should be a plain URI scheme with no trailing characters: %s", endpoint_schema)
    endpoint_schema = re.sub(r'^((\w+)\W+)', r'\1', endpoint_schema)
self._endpoint_scheme = endpoint_schema.rstrip("://")

Contributor

Ah, you're right, but the extra ) was just a stray trailing character. ^((\w+)\W+) won't behave as expected (the outermost group will be \1, and the alphanumeric part would be \2).

Should be:

re.sub(r'^(\w+)\W+.*', r'\1', endpoint_schema)
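
The group-numbering point is easy to check. Below is a minimal sketch, with "http://" as an assumed sample input. Groups are numbered by the position of their opening parenthesis, so in the nested pattern \1 is the whole match and the substitution changes nothing:

```python
import re

sample = "http://"  # assumed example input, the case that prompted this PR

# Nested groups: \1 is the outer group (the whole match), \2 is the word part.
outer = re.sub(r'^((\w+)\W+)', r'\1', sample)
inner = re.sub(r'^((\w+)\W+)', r'\2', sample)

# Single capture group followed by a discarded tail.
single = re.sub(r'^(\w+)\W+.*', r'\1', sample)

# The pattern with the stray ')' never compiles, so re.sub would raise too.
try:
    re.compile(r'^(\w+)\W+)')
    compiled = True
except re.error:
    compiled = False

print(repr(outer), repr(inner), repr(single), compiled)
```

Only the nested pattern with \1 leaves the input untouched; the other two substitutions reduce it to the bare word.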

Contributor Author

This is remote-write. The only two possible options are http and https, no?

self._endpoint_scheme = endpoint_schema.strip().rstrip("://")
if self._endpoint_scheme not in ("http", "https"):
    logger.warning("...")
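
One detail worth keeping in mind with the snippet above: str.rstrip takes a set of characters, not a suffix, so rstrip("://") removes any trailing run of ':' and '/'. A minimal sketch of the difference follows; strip_scheme_separator is a hypothetical helper, not part of the library. str.removesuffix would do the same job but only exists from Python 3.9, and the lib mypy target in this PR is 3.8.

```python
def strip_scheme_separator(value: str) -> str:
    """Remove surrounding whitespace and at most one trailing '://'.

    Hypothetical helper for illustration; it is not part of the charm library.
    """
    value = value.strip()
    if value.endswith("://"):
        value = value[: -len("://")]
    return value


# rstrip removes any trailing mix of ':' and '/', so malformed input is
# silently reduced to something that looks like a scheme.
loose = "http:/:/:".rstrip("://")

# The suffix-based helper leaves the malformed input untouched.
strict = strip_scheme_separator("http:/:/:")
clean = strip_scheme_separator(" https:// ")

print(repr(loose), repr(strict), repr(clean))
```

For a scheme made only of letters the two approaches agree; they differ only in how much malformed input they silently accept.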

Contributor

@rbarry82 Jan 26, 2023

We should probably logger.error, because it's not going to work.

Honestly, we should probably throw an event back and put it into BlockedStatus, or straight-up throw an exception to block the charm. Users/authors can miss logger.warning, but "whoops, I put the wrong string in and now it doesn't work" is a bad user experience.

You could say "the only possible options are http and https", but you also clearly managed to pass in "http://", so we need to sanitize at least a little. From the other side of it, Prometheus itself doesn't actually enforce that it's http(s), only that it can be unmarshalled (deserialized) into a url.URL, which is not nearly that prescriptive.

Do we want to support only http(s)? Maybe. But I'd personally say that it's better to be permissive at the moment because we just... don't know. Mimir will exercise remote write a lot more than we currently are. Thanos already has a gRPC endpoint, and Mimir very well may have one also. That also means that, in theory, proxyless gRPC meshing via xds://grpc-endpoint:1234 may be valid, depending on what the consumer is.

We could leave it as permissive as the actual Prometheus code and say "as long as you gave me a schema, I'll try it, and watch the logs." We could exclude cases later if they're problematic or common. If we keep it permissive rather than restricting it to http/https only, any user who wants to use gRPC meshing may find that it "just works", maybe including Kubeflow with Istio.

I don't think we know the problem domain enough to be that prescriptive.
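
As a rough illustration of the permissive stance, here is a sketch that accepts any scheme as long as the assembled URL parses into a scheme and a host. It assumes Python's urllib.parse as a stand-in for Go's url.URL (the two parsers are not equivalent), and looks_like_endpoint and build_url are hypothetical names:

```python
from urllib.parse import urlsplit


def looks_like_endpoint(url: str) -> bool:
    """Permissive check: any scheme is accepted if a scheme and a host parse out.

    Hypothetical helper; urllib.parse only approximates Go's url.URL.
    """
    parts = urlsplit(url)
    return bool(parts.scheme) and bool(parts.hostname)


def build_url(scheme: str, address: str, port: int, path: str) -> str:
    # Same format string as _set_endpoint_on_relation in the diff.
    return "{}://{}:{}{}".format(scheme, address, str(port), path)


good = looks_like_endpoint(build_url("http", "10.1.2.3", 9090, "/api/v1/write"))
grpc = looks_like_endpoint("xds://grpc-endpoint:1234")
# Passing "http://" as the scheme yields "http://://10.1.2.3:9090/api/v1/write".
bad = looks_like_endpoint(build_url("http://", "10.1.2.3", 9090, "/api/v1/write"))

print(good, grpc, bad)
```

A check like this would let xds:// through while still catching the doubled separator, and the result could feed logger.error or a BlockedStatus rather than a warning.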

Contributor Author

Agreed.
However I do not like the \w\W one because it trims presumptuously:

>>> re.sub(r'^(\w+)\W+.*', r'\1', "one://two://")
'one'
>>> re.sub(r'^(\w+)\W+.*', r'\1', "one#two")
'one'

Changing to the following. Let's tackle the rest in a separate issue.

>>> sanitized = "one#two".strip().rstrip("://")
>>> re.match(r"^\w+$", sanitized)  # no match, so this returns None
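
The two approaches side by side, as a small sketch (trim_scheme and validate_scheme are hypothetical names used only for comparison). Trimming always returns something; validating either accepts the cleaned value or rejects it:

```python
import re
from typing import Optional


def trim_scheme(raw: str) -> str:
    # Trimming approach: keep the leading word, drop everything after it.
    return re.sub(r'^(\w+)\W+.*', r'\1', raw)


def validate_scheme(raw: str) -> Optional[str]:
    # Validation approach: undo the common '://' slip, then accept or reject.
    sanitized = raw.strip().rstrip("://")
    return sanitized if re.match(r"^\w+$", sanitized) else None


for raw in ("http://", "one#two", "one://two://"):
    print(repr(raw), repr(trim_scheme(raw)), repr(validate_scheme(raw)))
```

Both agree on "http://", but only the validating version refuses "one#two" and "one://two://" instead of guessing what was meant.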

Contributor

This is not presumptuous. # cannot be in a URI scheme. If you want to be explicit, it's [A-Za-z0-9+-.] (+ and . would need to be escaped in a regexp), but + and . and - are incredibly uncommon and more or less isolated to handlers for MIME types in internal applications (like MS Office embedding IE and using them for internal links).

one#two is not valid, one://two is not a valid scheme either (as the change would assert that it is), and so on. If we're going to sanitize, we can either match the exact spec (re.sub(r'^(([A-Za-z0-9]|-|\.|\+)+)\W+', r'\1', ...)), which is still going to allow something like ms-office-word as a scheme, have a whitelist for strings (and again, since Golang's url.URL isn't otherwise filtered by Prom, we shouldn't), or simply abandon this patch before we get too far down this rabbit hole.

Notably, \w also allows _, which cannot be in a scheme, and "really" validating it is an even uglier regexp than normal. That sort of thing is, honestly, terrible in a real-life codebase which isn't an RFC, won't be obvious unless we put something like "please see https://www.rfc-editor.org/rfc/rfc3986#section-3.1" as a comment, and so on.

At this point, I would vote for simply abandoning this patch until when and if we ever see a user-reported bug about it.
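
For reference, a minimal sketch of what matching the RFC 3986 section 3.1 grammar could look like, should the idea be picked up again. SCHEME_RE and is_valid_scheme are hypothetical names. Inside a character class + and . are literal, while a - between two characters denotes a range, so the sketch puts - last:

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
# '-' is placed last in the class so it is literal rather than a range operator.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(value: str) -> bool:
    """Hypothetical helper: True if value is exactly one RFC 3986 scheme."""
    return SCHEME_RE.fullmatch(value) is not None


for candidate in ("http", "xds", "ms-office-word", "one_two", "one#two", "one://two"):
    print(candidate, is_valid_scheme(candidate))
```

This validates rather than trims, and it rejects _ where \w would accept it.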

self._endpoint_address = endpoint_address
self._endpoint_port = int(endpoint_port)
self._endpoint_path = endpoint_path
@@ -852,7 +852,7 @@ def _set_endpoint_on_relation(self, relation: Relation) -> None:
path = "/{}".format(path)

endpoint_url = "{}://{}:{}{}".format(
-self._endpoint_schema, address, str(self._endpoint_port), path
+self._endpoint_scheme, address, str(self._endpoint_port), path
)

relation.data[self._charm.unit]["remote_write"] = json.dumps(
2 changes: 1 addition & 1 deletion tox.ini
@@ -73,7 +73,7 @@ deps =
integration: pytest-operator==1.0.0b1
commands =
charm: mypy {[vars]src_path} {posargs}
-lib: mypy --python-version 3.5 {[vars]lib_path} {posargs}
+lib: mypy --python-version 3.8 {[vars]lib_path} {posargs}
unit: mypy {[vars]tst_path}/unit {posargs}
integration: mypy {[vars]tst_path}/integration {posargs}
