-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mypy: Add type hints to pglookout [BF-1560] #106
base: main
Are you sure you want to change the base?
Conversation
Also in this commit: - removed `python2` leftovers (from pglookout 2.0.0 (2018-10-12))
Also in this commit: - More unit tests; - Removed redundant `()` in regex.
New file to type hint the configuration. Also: - Alphabetically sort configuration keys in the `README`. This has the effect of grouping related keys closer. - Add missing key: `replication_catchup_timeout`
9d90d78
to
9b4e13f
Compare
This refactor is made to simplify the branching of a part of this method. This helps `mypy` and makes it clearer about what the different preconditions.
d0304b6
to
6ce993e
Compare
Rename `test_lookout.py` -> `test_pglookout.py` to follow conventions.
Change `remote_conns` type hint. The temporary `str` type is no longer accepted. When loading the config, a conversion is done to only have the dict-style type in the config's `remote_conns`.
Extraction of the 3 read functions: from file, git, and Makefile.
Package https://pypi.org/project/futures/ is not needed in Python3.
Also includes some cleanup to better leverage pytest features.
7ba4841
to
7434873
Compare
9dfaebe
to
7306a03
Compare
7306a03
to
e4e771a
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #106 +/- ##
==========================================
+ Coverage 68.20% 71.32% +3.11%
==========================================
Files 10 13 +3
Lines 975 1203 +228
Branches 171 182 +11
==========================================
+ Hits 665 858 +193
- Misses 253 280 +27
- Partials 57 65 +8
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
This fixes some behaviour discrepancy between local and CI.
@@ -201,6 +201,11 @@ description of the current state of the cluster which is under monitoring. | |||
Configuration keys | |||
================== | |||
|
|||
``alert_file_dir`` (default ``os.getcwd()``) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I alphabetically sorted configuration keys in the README
. This has the effect of grouping related keys closer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a different PR? This one is already big enough:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sort could have been in a different PR or a different commit (just the sort operation, no change). My bad 🙇.
Having them in 016d14d might not be the best. That's why I added the GitHub comments to guide reviewers (there's a sort and there's one key that was missing). If that's not good enough, sure it would be possible to split, rebase interactively (pray), force push, modify the test in aiven-core
to target the new hash. Tell me if you want this.
``replication_catchup_timeout`` (default ``300.0``) | ||
|
||
How long should pglookout wait for a node to catch up with the primary | ||
before it decides to promote itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added missing key: replication_catchup_timeout
@@ -18,9 +17,6 @@ | |||
"requests >= 1.2.0", | |||
] | |||
|
|||
if sys.version_info[0] == 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Package https://pypi.org/project/futures/ is not needed in Python3.
def db(): | ||
tmpdir_obj = py_path.local(tempfile.mkdtemp(prefix="pglookout_dbtest_")) | ||
tmpdir = str(tmpdir_obj) | ||
def db(tmp_path_factory: pytest.TempPathFactory) -> Generator[TestPG, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, we can use tmp_path_factory
for session scoped fixtures. So I simplified this fixture.
if "remote_conns" in config: | ||
config["remote_conns"] = {name: get_connection_info(info) for name, info in config["remote_conns"].items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New method to normalize config. When loading the config, a conversion is done to only have the dict-style type in the config's remote_conns
. This simplifies the preconditions.
We could extend this method further to validate more of the config, in the future. Or use better libraries to delegate that part (pydantic).
MISSING_MASTER_FROM_CONFIG_TIMEOUT: Final[float] = 15.0 | ||
MAINTENANCE_MODE_FILE: Final[str] = "/tmp/pglookout_maintenance_mode_file" | ||
PG_DATA_DIRECTORY: Final[str] = "/var/lib/pgsql/data" | ||
JSON_STATE_FILE_PATH: Final[str] = "/tmp/pglookout_state.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These values are extracted from the implementation, in order to guarantee that there is no change in behaviour.
More work could be done, there's still plenty of magic constants.
if isinstance(facility, str): | ||
facility_id: int = SysLogHandler.facility_names.get(facility, SysLogHandler.LOG_LOCAL2) | ||
else: | ||
facility_id = facility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A change that is not really a change. Our README
mentions:
``syslog_facility`` (default ``"local2"``)
Determines syslog log facility. (requires syslog to be true as well)
signal.signal(signal.SIGHUP, self.load_config) | ||
signal.signal(signal.SIGINT, self.quit) | ||
signal.signal(signal.SIGTERM, self.quit) | ||
signal(SIGHUP, self.load_config_from_signal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New method, see below.
if not self._is_initial_state_valid(): | ||
self.log.error("Initial state is invalid, exiting.") | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this check to fail early. Ideally, we wouldn't need that. But there are some tricky states that we want to avoid. Read the method for more info (a few lines below).
def load_config_from_signal(self, _signal: int, _frame: FrameType | None = None) -> None: | ||
self.log.debug( | ||
"Loading JSON config from: %r, signal: %r, frame: %r", | ||
self.config_path, | ||
_signal, | ||
_frame, | ||
) | ||
self.load_config() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new method is the only one expected to have access to signal and frame. This type of info should not reach the load_config
. The behaviour is left unchanged.
However, now, that means that load_config
can be called as a normal method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR contains many different changes.
It's true that, sometimes,when adding types you must change some code; but my impression is that this sneaks many small changes along with typing. The combined effect of all those changes isn't really clear to me.
Since this is a critical component, we should make each change more isolated, and not introduce semantic changes in a commit with "mypy" in the message, unless they're absolutely needed to make the code work (sometimes when using mypy we realize there's a "dynamic typing" magic going on) or to properly parse validated data structures, if that's the case (but, if today it works even when not validated because of some type coercion or magic string conversion, changing that could still be an issue).
I still suggest you first try this in aiven-core btw. If it works and passes all tests, that would be a great indicator that things should work properly, and we can think about how to restructure/split this pr.
class ThreadedWebServer(ThreadingMixIn, HTTPServer): | ||
cluster_state = None | ||
log = None | ||
cluster_monitor_check_queue = None | ||
allow_reuse_address = True | ||
allow_reuse_address: bool = True | ||
|
||
def __init__( | ||
self, | ||
address: str, | ||
port: int, | ||
RequestHandlerClass: type[BaseHTTPRequestHandler], | ||
cluster_state: dict[str, MemberState], | ||
log: Logger, | ||
cluster_monitor_check_queue: Queue[str], | ||
) -> None: | ||
super().__init__((address, port), RequestHandlerClass) | ||
self.cluster_state: dict[str, MemberState] = cluster_state | ||
self.log: Logger = log | ||
self.cluster_monitor_check_queue: Queue[str] = cluster_monitor_check_queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd not mix this change with typing changes. And, this comment should be in the commit message anyway.
self.server = ThreadedWebServer( | ||
address=self.address, | ||
port=self.port, | ||
RequestHandlerClass=RequestHandler, | ||
cluster_state=self.cluster_state, | ||
log=self.log, | ||
cluster_monitor_check_queue=self.cluster_monitor_check_queue, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevertheless, I'd keep this in a separate commit with its own message.
if not self.own_db: | ||
self._log_state_when_own_db_is_not_in_cluster( | ||
observer_state=observer_state, | ||
standby_nodes=standby_nodes, | ||
) | ||
else: | ||
assert own_state is not None | ||
consider_failover = self._is_failover_consideration_needed( | ||
own_state=own_state, | ||
observer_state=observer_state, | ||
standby_nodes=standby_nodes, | ||
master_node=master_node, | ||
) | ||
|
||
if self.own_db: | ||
if self.own_db == self.current_master: | ||
# We are the master of this cluster, nothing to do | ||
self.log.debug( | ||
"We %r: %r are still the master node: %r of this cluster, nothing to do.", | ||
self.own_db, | ||
own_state, | ||
master_node, | ||
) | ||
return | ||
if not standby_nodes: | ||
self.log.warning("No standby nodes set, master node: %r", master_node) | ||
return | ||
self.consider_failover(own_state, master_node, standby_nodes) | ||
if consider_failover and own_state: | ||
self.consider_failover( | ||
own_state=own_state, | ||
master_node=master_node, | ||
standby_nodes=standby_nodes, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't review this part.
56864e7
to
432800b
Compare
This makes sure that the tests can run in their own environment, without affecting the system deps, while still using the system deps.
432800b
to
9a0abae
Compare
…type_hints # Conflicts: # pglookout/pglookout.py # test/test_pglookout.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished yet. Some of my comments can be ignore but others will be nice to revisit. Really appreciate that everything was split into different commits, but from a reviewer perspective it is a bit overwhelming and some of the changes might be behavioral and non-related to the PR (I only noticed this by navigating into each commit and actually focusing, not ideal in my opinion). I guess in those situations where that dependency cannot be avoided, is better to open a different PR. (Maybe is just me 🤷♀️ )
(More for me, I still need to continue from 6ce993e and further)
StatsdMetricType = Literal[ | ||
b"g", # gauge | ||
b"c", # counter | ||
b"s", # set | ||
b"ms", # timing | ||
b"h", # histogram | ||
b"d", # distribution | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question Why not using enumerations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is checked at static type checking time (when running mypy
) and is ignored at runtime.
An enumeration is an actual runtime "thing" (type+value). I don't want to change anything in these PRs (or as little as possible).
@pytest.mark.parametrize( | ||
"timestamp", | ||
[ | ||
datetime(2021, 1, 1, 23, 42, 11, 123456), | ||
datetime(2021, 1, 1, 23, 42, 11), | ||
datetime(2021, 1, 1, 23, 42), | ||
datetime(2021, 1, 1, 23), | ||
datetime(2021, 1, 1), | ||
], | ||
) | ||
def test_roundtrip(timestamp: datetime) -> None: | ||
ts2 = parse_iso_datetime(get_iso_timestamp(timestamp)) | ||
|
||
assert ts2 == timestamp | ||
|
||
|
||
@pytest.mark.parametrize( | ||
("value", "normalized_value"), | ||
# fmt: off | ||
[ | ||
# Extended format | ||
("2021-01-01T00:00:00.000000Z", "2021-01-01T00:00:00Z"), # noqa: E241 | ||
("2021-01-01T23:42:11.123456Z", "2021-01-01T23:42:11.123456Z"), # noqa: E241 | ||
("2021-01-01T00:00:00Z", "2021-01-01T00:00:00Z"), # noqa: E241 | ||
("2021-01-01T23:42:11Z", "2021-01-01T23:42:11Z"), # noqa: E241 | ||
("2021-01-01T00:00Z", "2021-01-01T00:00:00Z"), # noqa: E241 | ||
("2021-01-01T23:42Z", "2021-01-01T23:42:00Z"), # noqa: E241 | ||
("2021-01-01", "2021-01-01T00:00:00Z"), # noqa: E241 | ||
# Basic format | ||
("20210101T000000Z", "2021-01-01T00:00:00Z"), # noqa: E241 | ||
("20210101T234211123456Z", "2021-01-01T23:42:11.123456Z"), # noqa: E241 | ||
("20210101T000000Z", "2021-01-01T00:00:00Z"), # noqa: E241 | ||
("20210101T234211Z", "2021-01-01T23:42:11Z"), # noqa: E241 | ||
("20210101T0000Z", "2021-01-01T00:00:00Z"), # noqa: E241 | ||
("20210101T2342Z", "2021-01-01T23:42:00Z"), # noqa: E241 | ||
("20210101", "2021-01-01T00:00:00Z"), # noqa: E241 | ||
], | ||
# fmt: on | ||
) | ||
def test_reverse_roundtrip(value: str, normalized_value: str) -> None: | ||
v2 = get_iso_timestamp(parse_iso_datetime(value)) | ||
|
||
assert v2 == normalized_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion Maybe a different PR? its sort of non-related to type hints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because there were some minute changes in parse_iso_datetime
and get_iso_timestamp
. And when having function + invert function, we should have roundtrip tests. This was missing and helped to make sure I didn't break anything. Also, I think it helps reviewers by giving higher guarantees that my changes didn't break anything.
But true, it's not a test-focused PR. So, I didn't add much.
@@ -201,6 +201,11 @@ description of the current state of the cluster which is under monitoring. | |||
Configuration keys | |||
================== | |||
|
|||
``alert_file_dir`` (default ``os.getcwd()``) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a different PR? This one is already big enough:)
from typing import Literal, TypedDict | ||
|
||
|
||
class Statsd(TypedDict, total=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question why not total=True
? isn't it important for the typed dicts to have all these keys? feels like this might lead to bugs if we are not strict enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I remember, it might be because of this:
stats: Statsd = self.config.get("statsd", {})
self.stats = StatsClient(**stats)
The default has to also be a valid Statsd
, and an empty dict is only valid when total=False
.
Also we have this:
class StatsClient:
def __init__(
self,
host: str | None = "127.0.0.1",
port: int = 8125,
tags: dict[str, str] | None = None,
) -> None: ...
where they all have a default. Therefore, they're optional. These defaults only make sense if we allow for users to provide an "incomplete" Statsd
.
http_port: int | ||
json_state_file_path: str | ||
known_gone_nodes: list[str] | ||
log_level: Literal["NOTSET", "DEBUG", "INFO", "WARNING", "WARN", "ERROR", "FATAL", "CRITICAL"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion Maybe an enum?:) ig I like enums
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reason as the other Literal vs. Enum comment. :-)
failover_decision_queue, | ||
is_replication_lag_over_warning_limit, | ||
stats, | ||
config: Config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not doing the same for StatsClient.__init__
? https://github.com/aiven/pglookout/blob/2ce13ab9002303040ea44cbc4f11fa6bb65722bf/pglookout/statsd.py#L32-L34
There is already Statsd
https://github.com/aiven/pglookout/blob/016d14db3722dd1002a765112672afb0ba3f2d09/pglookout/config.py#L7
state_data: str | ||
|
||
|
||
class MemberState(TypedDict, total=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the naming is not clear enough, I had to jump to a different commit and file just to read the docstring in order to know what was going on in 9b4e13f
class MemberState(TypedDict, total=False): | |
class ClusterMemberState(TypeDict, total=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm... even with the docstring? I personally think that your name suggestion is also good and I think it makes sense. I'm just wondering.
pglookout/common_types.py
Outdated
# Note for future improvements: | ||
# If we want ObserverState to accept arbitrary keys, we have three choices: | ||
# - Use a different type (pydantic, dataclasses, etc.) | ||
# - Use a TypedDict for static keys (connection, fetch_time) and a sub-dict for | ||
# dynamic keys (received from state.json). | ||
# - Wait for something like `allow_extra` to be implemented into TypedDict (unlikely) | ||
# https://github.com/python/mypy/issues/4617 | ||
ObserverState = Dict[str, Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually confusing... ClusterMonitor.observer_state
is actually dict[str, ObserverState]
, while ClusterMonitor._fetch_observer_state
return type is Optional[ObserverState]
... so its quite misleading in my opinion. I rather see a more specific typing and later work on improvements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I pulled a lot of hair on that one. The names "observer state" were loosely used for different things. Both for an "observed state" and for an "observer state".
For now, I'd say to ignore this, as it gets improved in a later commit :-). Ideally, I could also change the name instance variable ClusterMonitor.observer_state
and such, but I didn't (I think), as I didn't want to change any of the public APIs (symbols) in this PR.
standby_nodes: dict[str, MemberState], | ||
master_node: MemberState | None, | ||
) -> None: | ||
assert self.own_db is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this assert. Seems like we are considering the case where self.own_db
might be None or empty (missing typing?). This quite changes current behavior, a separate PR will be better for reviewing in such case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for mypy
. We should always remember that our Python code could skip assert
s with -O
/ -OO
( https://docs.python.org/3/tutorial/modules.html#compiled-python-files ) and logic cannot rely on them.
The reason why this was added here is because mypy
highlighted some preconditions. And this one here is one of them. If self.own_db is None
, we would crash a few ops later (with the code currently in main
). The only difference here is that
- we fail with an
AssertionError
instead of a different type of exception (which would be much more obscure). mypy
helps- Optimized code will still behave the same as before. Unoptimized code will be cleaner.
If I don't change that, there are so many variables in this scope that rely and depend on the state of other variables outside of their scope and it's so dirty that mypy
is unable to do any infer. That's why the stuff here was split into several well-scoped methods and don't depend of anything outside of their scope anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a comment to it, smth like:
assert self.own_db is not None | |
assert self.own_db is not None # help mypy |
if not self.own_db: | ||
self._log_state_when_own_db_is_not_in_cluster( | ||
observer_state=observer_state, | ||
standby_nodes=standby_nodes, | ||
) | ||
else: | ||
assert own_state is not None | ||
consider_failover = self._is_failover_consideration_needed( | ||
own_state=own_state, | ||
observer_state=observer_state, | ||
standby_nodes=standby_nodes, | ||
master_node=master_node, | ||
) | ||
|
||
if self.own_db: | ||
if self.own_db == self.current_master: | ||
# We are the master of this cluster, nothing to do | ||
self.log.debug( | ||
"We %r: %r are still the master node: %r of this cluster, nothing to do.", | ||
self.own_db, | ||
own_state, | ||
master_node, | ||
) | ||
return | ||
if not standby_nodes: | ||
self.log.warning("No standby nodes set, master node: %r", master_node) | ||
return | ||
self.consider_failover(own_state, master_node, standby_nodes) | ||
if consider_failover and own_state: | ||
self.consider_failover( | ||
own_state=own_state, | ||
master_node=master_node, | ||
standby_nodes=standby_nodes, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned on my previous comment, I think we should just focus on typing and later on behavioral/cosmetic changes. It currently feels this is out of scope with this PR.
I'm not going to review this further; consider all my comments as resolved, and handle this within team BF :-) |
Thanks @alanfranz and good luck in your new team! I'll remove you from the reviewers :-) |
Important Note
DO NOT merge until we test this on
aiven-core
pipelineThis can be tracked there (private repo):
python2
leftovers (from pglookout 2.0.0 (2018-10-12))pglokout/debian/rules
;setup.py
.os
and string manipulation to replace them withpathlib.Path
.test/test_common.py
;test/conftest.py
to better leverage pytest features.pytest
features (i.e.tmpdir
).README.rst
:replication_catchup_timeout
pglookout/pglookout.py
:check_cluster_state
to simplify the branching of a part of this method.This helps
mypy
and makes it clearer about what the different preconditions are.test/test_lookout.py
->test/test_pglookout.py
to follow conventions.dict-style type in the config's
remote_conns
. This simplifies the preconditions.pglookout/config.py
: