Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-5637, DPE-5276] Implement expose-external config option with values false (clusterip), nodeport and loadbalancer #328

Merged
merged 25 commits into from
Nov 27, 2024
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
5931e15
WIP: Implement expose-external config option with values false (clust…
shayancanonical Oct 10, 2024
7cac8af
Manually tested expose-external config + some code cleanup
shayancanonical Oct 14, 2024
c1459f2
Fix some issues + implement integration tests for expose-external
shayancanonical Oct 15, 2024
f471813
Add missing config file
shayancanonical Oct 15, 2024
902509f
Remove redundant action return code check
shayancanonical Oct 15, 2024
d115742
Add workaround for juju client errors when patching svc to loadbalanc…
shayancanonical Oct 16, 2024
681a5c1
Merge branch 'main' into feature/expose-external
shayancanonical Oct 16, 2024
b60e5e3
Standardize abstract_charm.py with the VM charm
shayancanonical Oct 16, 2024
0c048db
Update outdated tracing charm libs
shayancanonical Oct 16, 2024
ffc728a
Address PR feedback
shayancanonical Oct 17, 2024
d76b325
Merge branch 'main' into feature/expose-external
shayancanonical Oct 17, 2024
ee7bccb
Avoid waiting for service and let further events update endpoints onc…
shayancanonical Oct 18, 2024
ab49776
Appropriately set app status when invalid expose-external value
shayancanonical Oct 18, 2024
f938351
Address PR feedback + update outdate charm tracing lib
shayancanonical Oct 21, 2024
7254856
Add check for None when declaring variable
shayancanonical Oct 21, 2024
4d33cbb
Update tests to retry connections to endpoints after it fails
shayancanonical Oct 22, 2024
c6f6869
Test against EKS + make updates to be complaint with DA122
shayancanonical Nov 22, 2024
e7a9bf1
Merge branch 'main' into feature/expose-external
shayancanonical Nov 22, 2024
4006177
Adjust scenario to include new router peer relation
shayancanonical Nov 22, 2024
887317d
Fix lint warnings
shayancanonical Nov 22, 2024
b9575c3
Revert charm tracing lib to libpatch 2
shayancanonical Nov 22, 2024
c5df314
Update TLS SANs to include DNS for reaching the unit directly
shayancanonical Nov 25, 2024
369f8b6
Add K8s service SANs (the one created by juju)
shayancanonical Nov 26, 2024
fb65030
Address PR feedback
shayancanonical Nov 26, 2024
66138e3
Update data platform workflows to v23.1.0
shayancanonical Nov 27, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,7 @@ jobs:
- lint
- unit-test
- build
uses: canonical/data-platform-workflows/.github/workflows/integration_test_charm.yaml@v23.0.2
uses: canonical/data-platform-workflows/.github/workflows/integration_test_charm.yaml@feature/metallb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert before merging.

with:
artifact-prefix: ${{ needs.build.outputs.artifact-prefix }}
architecture: ${{ matrix.architecture }}
Expand All @@ -106,5 +106,6 @@ jobs:
juju-snap-channel: ${{ matrix.juju.snap_channel }}
libjuju-version-constraint: ${{ matrix.juju.libjuju }}
_beta_allure_report: ${{ matrix.juju.allure_on_amd64 && matrix.architecture == 'amd64' }}
metallb-addon: true
permissions:
contents: write # Needed for Allure Report beta
10 changes: 10 additions & 0 deletions config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# Copyright 2024 Canonical Ltd.
# See LICENSE file for licensing details.

options:
expose-external:
description: |
String to determine how to expose the MySQLRouter externally from the Kubernetes cluster.
Possible values: 'false', 'nodeport', 'loadbalancer'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: default false may imply true is valid value. Change to something else? no none

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulomach any ideas for alternatives? expose-external: none? expose-external: clusterip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additionally, the possible values are false and nodeport in kafka-k8s. it may not be a good idea to introduce inconsistencies

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh bummer, ok maybe for another time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, it seems it was a Marc nitpick also that Mykola did not responded to in the original kafka spec

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default false may imply true is valid value. Change to something else? no none

👋🏻 I subscribe Paulo's comment here.

I understand we are "compromising" in order to have a similar interface across DP charms (i.e. kafka-k8s), but it is confusing and we all should make an effort to reduce these occurrences in the future.

type: string
default: "false"
18 changes: 11 additions & 7 deletions lib/charms/tempo_coordinator_k8s/v0/charm_tracing.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,9 @@ def my_tracing_endpoint(self) -> Optional[str]:
- every event as a span (including custom events)
- every charm method call (except dunders) as a span

We recommend that you scale up your tracing provider and relate it to an ingress so that your tracing requests
go through the ingress and get load balanced across all units. Otherwise, if the provider's leader goes down, your tracing goes down.


## TLS support
If your charm integrates with a TLS provider which is also trusted by the tracing provider (the Tempo charm),
Expand Down Expand Up @@ -269,7 +272,7 @@ def _remove_stale_otel_sdk_packages():
# Increment this PATCH version before using `charmcraft publish-lib` or reset
# to 0 if you are raising the major API version

LIBPATCH = 1
LIBPATCH = 3

PYDEPS = ["opentelemetry-exporter-otlp-proto-http==1.21.0"]

Expand Down Expand Up @@ -371,10 +374,6 @@ class UntraceableObjectError(TracingError):
"""Raised when an object you're attempting to instrument cannot be autoinstrumented."""


class TLSError(TracingError):
"""Raised when the tracing endpoint is https but we don't have a cert yet."""


def _get_tracing_endpoint(
tracing_endpoint_attr: str,
charm_instance: object,
Expand Down Expand Up @@ -484,10 +483,15 @@ def wrap_init(self: CharmBase, framework: Framework, *args, **kwargs):
)

if tracing_endpoint.startswith("https://") and not server_cert:
raise TLSError(
logger.error(
"Tracing endpoint is https, but no server_cert has been passed."
"Please point @trace_charm to a `server_cert` attr."
"Please point @trace_charm to a `server_cert` attr. "
"This might also mean that the tracing provider is related to a "
"certificates provider, but this application is not (yet). "
"In that case, you might just have to wait a bit for the certificates "
"integration to settle. "
)
return

exporter = OTLPSpanExporter(
endpoint=tracing_endpoint,
Expand Down
24 changes: 16 additions & 8 deletions lib/charms/tempo_coordinator_k8s/v0/tracing.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ def __init__(self, *args):
`TracingEndpointRequirer.request_protocols(*protocol:str, relation:Optional[Relation])` method.
Using this method also allows you to use per-relation protocols.

Units of provider charms obtain the tempo endpoint to which they will push their traces by calling
Units of requirer charms obtain the tempo endpoint to which they will push their traces by calling
`TracingEndpointRequirer.get_endpoint(protocol: str)`, where `protocol` is, for example:
- `otlp_grpc`
- `otlp_http`
Expand All @@ -44,7 +44,10 @@ def __init__(self, *args):
If the `protocol` is not in the list of protocols that the charm requested at endpoint set-up time,
the library will raise an error.

## Requirer Library Usage
We recommend that you scale up your tracing provider and relate it to an ingress so that your tracing requests
go through the ingress and get load balanced across all units. Otherwise, if the provider's leader goes down, your tracing goes down.

## Provider Library Usage

The `TracingEndpointProvider` object may be used by charms to manage relations with their
trace sources. For this purposes a Tempo-like charm needs to do two things
Expand Down Expand Up @@ -107,7 +110,7 @@ def __init__(self, *args):

# Increment this PATCH version before using `charmcraft publish-lib` or reset
# to 0 if you are raising the major API version
LIBPATCH = 1
LIBPATCH = 3

PYDEPS = ["pydantic"]

Expand Down Expand Up @@ -985,11 +988,16 @@ def charm_tracing_config(
is_https = endpoint.startswith("https://")

if is_https:
if cert_path is None:
raise TracingError("Cannot send traces to an https endpoint without a certificate.")
elif not Path(cert_path).exists():
# if endpoint is https BUT we don't have a server_cert yet:
# disable charm tracing until we do to prevent tls errors
if cert_path is None or not Path(cert_path).exists():
# disable charm tracing until we obtain a cert to prevent tls errors
logger.error(
"Tracing endpoint is https, but no server_cert has been passed."
"Please point @trace_charm to a `server_cert` attr. "
"This might also mean that the tracing provider is related to a "
"certificates provider, but this application is not (yet). "
"In that case, you might just have to wait a bit for the certificates "
"integration to settle. "
)
return None, None
return endpoint, str(cert_path)
else:
Expand Down
3 changes: 1 addition & 2 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 0 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,6 @@ tenacity = "^8.5.0"
allure-pytest = "^2.13.5"
allure-pytest-collection-report = {git = "https://github.com/canonical/data-platform-workflows", tag = "v23.0.2", subdirectory = "python/pytest_plugins/allure_pytest_collection_report"}


[tool.coverage.run]
branch = true

Expand Down
63 changes: 52 additions & 11 deletions src/abstract_charm.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ def __init__(self, *args) -> None:
self._database_requires = relations.database_requires.RelationEndpoint(self)
self._database_provides = relations.database_provides.RelationEndpoint(self)
self._cos_relation = relations.cos.COSRelation(self, self._container)
self._ha_cluster = None
self.framework.observe(self.on.update_status, self.reconcile)
self.framework.observe(
self.on[upgrade.PEER_RELATION_ENDPOINT_NAME].relation_changed, self.reconcile
Expand Down Expand Up @@ -99,23 +100,29 @@ def _logrotate(self) -> logrotate.LogRotate:

@property
@abc.abstractmethod
def _read_write_endpoint(self) -> str:
def _read_write_endpoints(self) -> str:
"""MySQL Router read-write endpoint"""

@property
@abc.abstractmethod
def _read_only_endpoint(self) -> str:
def _read_only_endpoints(self) -> str:
"""MySQL Router read-only endpoint"""

@property
@abc.abstractmethod
def _exposed_read_write_endpoint(self) -> str:
"""The exposed read-write endpoint"""
def _exposed_read_write_endpoint(self) -> typing.Optional[str]:
"""The exposed read-write endpoint.

Only defined in vm charm.
"""

@property
@abc.abstractmethod
def _exposed_read_only_endpoint(self) -> str:
"""The exposed read-only endpoint"""
def _exposed_read_only_endpoint(self) -> typing.Optional[str]:
"""The exposed read-only endpoint.

Only defined in vm charm.
"""

@abc.abstractmethod
def is_externally_accessible(self, *, event) -> typing.Optional[bool]:
Expand Down Expand Up @@ -212,6 +219,9 @@ def _determine_unit_status(self, *, event) -> ops.StatusBase:
workload_status = self.get_workload(event=event).status
if self._upgrade:
statuses.append(self._upgrade.get_unit_juju_status(workload_status=workload_status))
# only in machine charms
if self._ha_cluster:
statuses.append(self._ha_cluster.get_unit_juju_status())
statuses.append(workload_status)
return self._prioritize_statuses(statuses)

Expand All @@ -232,12 +242,16 @@ def wait_until_mysql_router_ready(self, *, event) -> None:
"""

@abc.abstractmethod
def _reconcile_node_port(self, *, event) -> None:
"""Reconcile node port.
def _reconcile_service(self) -> None:
"""Reconcile service.

Only applies to Kubernetes charm
"""

@abc.abstractmethod
def _wait_until_service_reconciled(self) -> None:
"""Waits until the service is reconciled (connectable with a socket)"""

@abc.abstractmethod
def _reconcile_ports(self, *, event) -> None:
"""Reconcile exposed ports.
Expand Down Expand Up @@ -311,6 +325,10 @@ def reconcile(self, event=None) -> None: # noqa: C901
f"{self._cos_relation.is_relation_breaking(event)=}"
)

# only in machine charms
if self._ha_cluster:
self._ha_cluster.set_vip(self.config.get("vip"))

try:
if self._unit_lifecycle.authorized_leader:
if self._database_requires.is_relation_breaking(event):
Expand All @@ -324,15 +342,29 @@ def reconcile(self, event=None) -> None: # noqa: C901
and isinstance(workload_, workload.AuthenticatedWorkload)
and workload_.container_ready
):
self._reconcile_node_port(event=event)
self._reconcile_service()
shayancanonical marked this conversation as resolved.
Show resolved Hide resolved
self._database_provides.reconcile_users(
event=event,
router_read_write_endpoint=self._read_write_endpoint,
router_read_only_endpoint=self._read_only_endpoint,
router_read_write_endpoints=self._read_write_endpoints,
router_read_only_endpoints=self._read_only_endpoints,
exposed_read_write_endpoint=self._exposed_read_write_endpoint,
exposed_read_only_endpoint=self._exposed_read_only_endpoint,
shell=workload_.shell,
)
# _ha_cluster only assigned a value in machine charms
if self._ha_cluster:
self._database_provides.update_endpoints(
router_read_write_endpoints=self._read_write_endpoints,
router_read_only_endpoints=self._read_only_endpoints,
exposed_read_write_endpoint=self._exposed_read_write_endpoint,
exposed_read_only_endpoint=self._exposed_read_only_endpoint,
)
else:
self._database_provides.update_endpoints(
router_read_write_endpoints=self._read_write_endpoints,
router_read_only_endpoints=self._read_only_endpoints,
)

if workload_.container_ready:
workload_.reconcile(
event=event,
Expand All @@ -348,6 +380,15 @@ def reconcile(self, event=None) -> None: # noqa: C901
):
self._reconcile_ports(event=event)

if (
self._unit_lifecycle.authorized_leader
and not self._upgrade.in_progress
and isinstance(workload_, workload.AuthenticatedWorkload)
and workload_.container_ready
and isinstance(event, ops.ConfigChangedEvent)
):
self._wait_until_service_reconciled()

# Empty waiting status means we're waiting for database requires relation before
# starting workload
if not workload_.status or workload_.status == ops.WaitingStatus():
Expand Down
Loading