CA-388587: Fix filtering the `xapi-clusterd` `db` and log any errors #67

bernhardkaindl · 2024-02-09T06:13:05Z

Fix filtering the XAPI Clustder DB as found by CA-388587 (found using testing by @ashwin9390)

Commit 1 and 1 setup unit testing:
- Commit 1 enables enforced version of:
  1. Create a test and make it fail
  - Use Strict XFAIL only: An unexpected PASS is fail.
  1. Then fix the code.
  2. And remove the XFAIL marker at the same time (the strict XFAIL setting forces you to remove it)
The Fix 1 and Fix 2 commits reuse original patches by @edwintorok for this issue.
- They fix the bugs and remove their respective XFAIL marker.
Fix 3 applies the lesson learnt:
- to log errors during processing to the bug-tool logfile for analysis.

The tests that are created and fixed as described in the module docstring of the test case module.

Signed-off-by: Bernhard Kaindl <[email protected]>

github-actions · 2024-02-09T06:13:42Z

Pytest Code coverage comment for Python 2.7

File	Stmts	Miss	Cover	Missing
xen-bugtool	1577	278	82%	77–79, 84, 95, 100–101, 414, 523, 529, 552, 687, 705, 770–771, 785, 789, 804–805, 811–812, 817, 827, 836, 839–842, 848, 853–855, 858–859, 862–863, 866, 907–909, 1044–1058, 1060–1069, 1130, 1138, 1263, 1265, 1269–1270, 1293, 1311, 1314–1316, 1319–1323, 1326–1327, 1331–1333, 1339–1345, 1348–1352, 1354–1370, 1373–1381, 1384–1385, 1387, 1389–1390, 1392–1400, 1403, 1405–1408, 1410, 1412–1413, 1447, 1475, 1490, 1498–1499, 1562–1564, 1566–1571, 1574–1578, 1582–1583, 1585, 1602–1603, 1605, 1607–1608, 1610–1612, 1614–1621, 1623–1628, 1632–1633, 1635, 1637, 1639, 1675–1676, 1678, 1680–1682, 1685–1686, 1688, 1690–1692, 1695–1696, 1698, 1700–1703, 1706–1716, 1719–1723, 1732, 1745, 1755, 1759, 1794–1796, 1871, 1906, 1945, 2005, 2048, 2051–2052, 2090, 2097, 2103, 2105–2108, 2111–2121, 2212–2214, 2217–2219, 2250–2252, 2292–2296, 2308, 2326
integration
__init__.py	0	0	100%
conftest.py	18	0	100%
namespace_container.py	35	1	97%	85
test_system_load.py	18	0	100%
test_xenserver_config.py	15	0	100%
utils.py	71	10	85%	18, 31, 54–57, 61, 81, 84, 122
mocks/xen
__init__.py	0	0	100%
mocks/xen/lowlevel
__init__.py	0	0	100%
mocks/xen/lowlevel/xc
__init__.py	5	0	100%
unit
__init__.py	0	0	100%
conftest.py	50	10	80%	22, 51, 53–57, 59–61
test_dir_list.py	11	0	100%
test_filter_xapi_clusterd_db.py	78	0	100%
test_fs_funcs.py	10	0	100%
test_load_plugins.py	9	0	100%
test_main.py	119	8	93%	189, 271–273, 275–278
test_output.py	65	0	100%
test_process_output.py	11	0	100%
test_snmp.py	15	0	100%
test_xapidb_filter.py	21	0	100%
TOTAL	2128	307	85%

Tests	Skipped	Failures	Errors	Time
33	1 💤	0 ❌	0 🔥	2.421s ⏱️

github-actions · 2024-02-09T06:14:18Z

Python3 coverage comment from https://github.com/marketplace/actions/pytest-coverage-comment

File	Stmts	Miss	Cover	Missing
__init__.py	0	0	100%
conftest.py	58	0	100%
test_dir_list.py	11	0	100%
test_filter_xapi_clusterd_db.py	78	0	100%
test_fs_funcs.py	10	0	100%
test_load_plugins.py	9	0	100%
test_main.py	121	0	100%
test_output.py	65	0	100%
test_process_output.py	11	0	100%
test_snmp.py	15	0	100%
test_xapidb_filter.py	21	0	100%
xen-bugtool	1577	271	82%	95, 100–101, 414, 523, 529, 552, 687, 705, 770–771, 785, 789, 804–805, 811–812, 817, 827, 836, 848, 853–855, 858–859, 862–863, 866, 907–909, 928, 1044–1058, 1060–1069, 1130, 1138, 1263, 1265, 1269–1270, 1293, 1311, 1314–1316, 1319–1323, 1326–1327, 1331–1333, 1339–1345, 1348–1352, 1354–1370, 1373–1381, 1384–1385, 1387, 1389–1390, 1392–1400, 1403, 1405–1408, 1410, 1412–1413, 1447, 1475, 1490, 1498–1499, 1562–1564, 1566–1571, 1574–1578, 1582–1583, 1585, 1602–1603, 1605, 1607–1608, 1610–1612, 1614–1621, 1623–1628, 1632–1633, 1635, 1637, 1639, 1675–1676, 1678, 1680–1682, 1685–1686, 1688, 1690–1692, 1695–1696, 1698, 1700–1703, 1706–1716, 1719–1723, 1733, 1745, 1755, 1759, 1794–1796, 1906, 1930, 1945, 2005, 2048, 2051–2052, 2090, 2097, 2103, 2105–2108, 2111–2121, 2212–2214, 2217–2219, 2250–2252, 2292–2296, 2308, 2326
TOTAL	1976	271	86%

Tests	Skipped	Failures	Errors	Time
29	0 💤	0 ❌	0 🔥	1.209s ⏱️

tests/unit/conftest.py

tests/unit/test_filter_xapi_clusterd_db.py

ashwin9390

This PR has refactoring of the code ,descriptive documentation and additional test case .Overall code looks sufficient and clean.

Signed-off-by: Bernhard Kaindl <[email protected]>

The 'token' is an API token that is used to authenticate all API requests to the xapi-clusterd daemon. - Apply the fix originally developed and tested by Edwin. - Update the tests: - Ensure that the fix works and keeps working across future changes. Co-authored-by: Bernhard Kaindl <[email protected]> Signed-off-by: Edwin Török <[email protected]>

… instead Handle the situations where configs doesn't have "pems" keys: Yangtze does not have REQ-403 and the REQ-403 API changed later to pass the PEMs in files, so the clusterd DB only has their path names: Ignore the pems keys if not there and sanitize the rest of the config. - Applies the fix originally developed and tested by Edwin. - Update the tests: - Ensure that the fix works and keeps working across future changes. Co-authored-by: Bernhard Kaindl <[email protected]> Signed-off-by: Edwin Török <[email protected]>

Signed-off-by: Bernhard Kaindl <[email protected]>

bernhardkaindl · 2024-02-12T16:29:43Z

@lindig, @ashwin9390 : I have only done clean-ups, streamlined the test fixtures and their use.

Maybe review (and approve) again?

xen-bugtool

edwintorok · 2024-02-12T16:59:34Z

xen-bugtool

    try:
        with open(XAPI_CLUSTERD, 'r') as f:
            clusterd_data = json.load(f)
-
+    except Exception as e:  # pylint: disable=broad-exception-caught


Why split the try/with here? AFAICT the end result is the same: we log the exception and return the empty string. And the traceback should tell us exactly which line raised the error, but maybe I'm missing something here, is there a reason to keep try/with split like this?

Hi, the reason was rather long-winded with at least two aspects:

The original reason for the try-split split was:
I had set .pylintrc max-try-statements=11 and this was exceeded by the added lines.

I've set this limit with an intention:

If it is exceeded, I want developers to think:

Should any error in all of this big try-statement really result in the same error-handling?

The max-size of the try-statements was the lever that I found for this, to make the code at least not regress further.

So I want to give devs (including myself) a heads-up, when their try-statements grow beyond a certain size, aka "out of leaps and bounds".

So I do not want to raise that limit, and I also like to avoid t just add a comment to disable the warning:

Firstly, it gives code a bad look when disabling warnings.
Secondly, I think that pylint implements this limit for a good reason (or I found):

max-try-statements=11 is what works for the status-report tool, and adding the additional code inside would have made this the largest try-statement in the status-report script.

Because these try-statements are combined with broad-except, (except Exception) they catch everything in the try-statement.

These large try-statements with broad-except have been a major pain point for me for finding Python3 bugs - the actual cause of Python3 bugs.

Errors in them have not been straight-forward to find because nothing tells you that there was even an error:

Usually they are even combined with output_data = "", and an empty file (in the happy case) is the only indication that something is wrong.

And this is also what was the case in this function, the broad Exception handler
threw away the KeyError that the expected pems was no longer in the JSON data.

So when testing the xapi-clusterdb, the only error indication was that the output file was empty.

So what I'd like to go for is that when such an Exception happens, I want a more targeted handling of the exception.

But it's just preference that I'd like to have a more specific Exception handling when the sources are completely different causes.

And the size was my only handle, I had. I set it so that it would trigger when the existing try-statements grow even larger.

Here is where I wanted to aim for:

try: with open(XAPI_CLUSTERD, 'r') as f: clusterd_data = json.load(f) except Exception as e: # pylint: disable=broad-exception-caught return log_internal_error(e)

Any Exception raised here means either invalid JSON - so something that should be an alert that something is bad.

This Exception should actually go into a separate bugball file like "system-errors.out" or so.

In the other case any error its definitely a bug in bugtool itself that must be fixed.

Since the data has been red, just some fields are not there, it's a different kind of error for me.

that should automatically be directed to a developer (since that looks like an unexpected change in the JSON).

OTOH, a non-existing clusterd-db file is a normal case in the non-clustered case.
For the usual case where the file does not exist, we can just check if it is there.

But if the file is missing in the clusterd-db case, it would a problem that would prevent clustering from working.

So I did this with a look towards enabling more diagnostics later,
make it possible to direct the exception causes towards different channels.

As it has 3 approvals now, I'd merge and move forward. Smaller tweaks are of course always possible.

If the real reason is to avoid broad try-excepts, can we do that here? There are only a limited number of exceptions that 'open' can raise, and KeyError is certainly not one we should catch, and I understand the desire to not mask these errors.

I think the reason for catching broad errors in status-report may be a good one though: if one of the status report gathering functions fail you don't want to give up on the entire script, there may still be valuable information you can gather from other files.

Perhaps a better approach (that could be addressed in a future PR) would be:

have some kind of top-level try/catch handler whenever we invoke a function to collect something for capability "X", and if that fails we log the exception somewhere (general xen-bugtool.out or the file that was supposed to contain our output, etc.), but keep going and capture other things. Then the broad try/except would only wrap a very small number of lines (i.e. 1 or 2, just invoke the gathering function for that capability)

you've already added a check on whether the file exists which avoids logging errors unnecessarily when they are not errors

status-report could keep a count of how many of these broad try/except it has caught and logged, and exit non-zero at the end, which should allow XenRT to detect this and fail (we should whether it'd still store the already created bugtool in that case, as that is still very useful in debugging what went wrong)

Very good proposals! Yes, they should be implemented!

I created #72 to implement them and added my answers!

I only added you as an additional assignee to show that we should do this, if it adds an entry into your GitHub notifications that you like to avoid, of course just remove yourself.

lindig

It all looks reasonable but I'm not a good person to review this.

CA-358870: Backport fixes from #67 to filter the Xapi `clusterd/db` for XS8/Yangtze

pytest.ini: Make @pytest.mark.xfail(strict=True) the default

f9eb22c

Signed-off-by: Bernhard Kaindl <[email protected]>

bernhardkaindl force-pushed the CA-388587/fix/clusterd-db branch 2 times, most recently from bad4e07 to f9eb22c Compare February 9, 2024 06:26

bernhardkaindl requested review from lindig, GeraldEV and ashwin9390 February 9, 2024 09:09

bernhardkaindl marked this pull request as ready for review February 9, 2024 09:09

bernhardkaindl mentioned this pull request Feb 9, 2024

CA-388587: Fix filter_xapi_clusterd_db() to capture the db and log any errors that occured. #58

Closed

lindig previously approved these changes Feb 9, 2024

View reviewed changes

tests/unit/conftest.py Outdated Show resolved Hide resolved

bernhardkaindl requested a review from edwintorok February 9, 2024 09:39

bernhardkaindl dismissed lindig’s stale review via f9eb22c February 9, 2024 19:30

bernhardkaindl force-pushed the CA-388587/fix/clusterd-db branch from 7a082a9 to f9eb22c Compare February 9, 2024 19:30

bernhardkaindl requested a review from lindig February 9, 2024 21:01

bernhardkaindl force-pushed the CA-388587/fix/clusterd-db branch from cfd0cdb to f9eb22c Compare February 12, 2024 10:52

bernhardkaindl changed the title ~~CA-388587: Fix capture of xapi-clusterd DB and log any errors~~ CA-388587: Fix filtering the xapi-clusterd db and log any errors Feb 12, 2024

ashwin9390 reviewed Feb 12, 2024

View reviewed changes

tests/unit/conftest.py Show resolved Hide resolved

ashwin9390 reviewed Feb 12, 2024

View reviewed changes

tests/unit/test_filter_xapi_clusterd_db.py Outdated Show resolved Hide resolved

ashwin9390 previously approved these changes Feb 12, 2024

View reviewed changes

bernhardkaindl and others added 4 commits February 12, 2024 15:12

CA-388587: Add fixtures and tests for testing filter_xapi_clusterd_db()

2abde59

Signed-off-by: Bernhard Kaindl <[email protected]>

CA-388587: Fix 3: Log any errors in filter_xapi_clusterd_db()

5fd22f8

Signed-off-by: Bernhard Kaindl <[email protected]>

bernhardkaindl dismissed ashwin9390’s stale review via f9eb22c February 12, 2024 15:37

bernhardkaindl force-pushed the CA-388587/fix/clusterd-db branch from 3ba1e16 to f9eb22c Compare February 12, 2024 15:37

bernhardkaindl requested a review from ashwin9390 February 12, 2024 16:27

bernhardkaindl requested a review from rosslagerwall February 12, 2024 16:30

edwintorok reviewed Feb 12, 2024

View reviewed changes

xen-bugtool Show resolved Hide resolved

edwintorok reviewed Feb 12, 2024

View reviewed changes

edwintorok approved these changes Feb 12, 2024

View reviewed changes

lindig approved these changes Feb 12, 2024

View reviewed changes

ashwin9390 approved these changes Feb 12, 2024

View reviewed changes

bernhardkaindl merged commit 6a92d55 into master Feb 12, 2024
12 checks passed

bernhardkaindl deleted the CA-388587/fix/clusterd-db branch February 12, 2024 18:24

bernhardkaindl added a commit that referenced this pull request Feb 14, 2024

Merge pull request #78 from xenserver-next/xs8-fix-clusterdb-filter

b93cd6e

CA-358870: Backport fixes from #67 to filter the Xapi `clusterd/db` for XS8/Yangtze

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CA-388587: Fix filtering the `xapi-clusterd` `db` and log any errors #67

CA-388587: Fix filtering the `xapi-clusterd` `db` and log any errors #67

bernhardkaindl commented Feb 9, 2024 •

edited

Loading

github-actions bot commented Feb 9, 2024 •

edited

Loading

github-actions bot commented Feb 9, 2024 •

edited

Loading

ashwin9390 left a comment

bernhardkaindl commented Feb 12, 2024

edwintorok Feb 12, 2024

bernhardkaindl Feb 12, 2024

edwintorok Feb 13, 2024 •

edited

Loading

bernhardkaindl Feb 13, 2024

lindig left a comment

CA-388587: Fix filtering the xapi-clusterd db and log any errors #67

CA-388587: Fix filtering the xapi-clusterd db and log any errors #67

Conversation

bernhardkaindl commented Feb 9, 2024 • edited Loading

github-actions bot commented Feb 9, 2024 • edited Loading

github-actions bot commented Feb 9, 2024 • edited Loading

ashwin9390 left a comment

Choose a reason for hiding this comment

bernhardkaindl commented Feb 12, 2024

edwintorok Feb 12, 2024

Choose a reason for hiding this comment

bernhardkaindl Feb 12, 2024

Choose a reason for hiding this comment

edwintorok Feb 13, 2024 • edited Loading

Choose a reason for hiding this comment

bernhardkaindl Feb 13, 2024

Choose a reason for hiding this comment

lindig left a comment

Choose a reason for hiding this comment

CA-388587: Fix filtering the `xapi-clusterd` `db` and log any errors #67

CA-388587: Fix filtering the `xapi-clusterd` `db` and log any errors #67

bernhardkaindl commented Feb 9, 2024 •

edited

Loading

github-actions bot commented Feb 9, 2024 •

edited

Loading

github-actions bot commented Feb 9, 2024 •

edited

Loading

edwintorok Feb 13, 2024 •

edited

Loading