Observability mode - output jsonlines
Zac-HD committed Dec 10, 2023
1 parent 6941cd2 commit 239c836
Showing 21 changed files with 645 additions and 42 deletions.
4 changes: 4 additions & 0 deletions hypothesis-python/RELEASE.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
RELEASE_TYPE: minor

This release adds an experimental :wikipedia:`observability <Observability_(software)>`
mode. :doc:`You can read the docs about it here <observability>`.
Empty file.
15 changes: 15 additions & 0 deletions hypothesis-python/docs/_static/wrap-in-tables.css
@@ -0,0 +1,15 @@
/* override table width restrictions */
/* thanks to https://github.com/readthedocs/sphinx_rtd_theme/issues/117#issuecomment-153083280 */
@media screen and (min-width: 767px) {

.wy-table-responsive table td {
/* !important prevents the common CSS stylesheets from
overriding this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}

.wy-table-responsive {
overflow: visible !important;
}

}
2 changes: 1 addition & 1 deletion hypothesis-python/docs/changes.rst
@@ -2362,7 +2362,7 @@ Did you know that of the 2\ :superscript:`64` possible floating-point numbers,

While nans *usually* have all zeros in the sign bit and mantissa, this
`isn't always true <https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations>`__,
and :wikipedia:`'signaling' nans might trap or error <https://en.wikipedia.org/wiki/NaN#Signaling_NaN>`.
and :wikipedia:`'signaling' nans might trap or error <NaN#Signaling_NaN>`.
To help distinguish such errors in e.g. CI logs, Hypothesis now prints ``-nan`` for
negative nans, and adds a comment like ``# Saw 3 signaling NaNs`` if applicable.

3 changes: 3 additions & 0 deletions hypothesis-python/docs/conf.py
@@ -30,6 +30,7 @@
"hoverxref.extension",
"sphinx_codeautolink",
"sphinx_selective_exclude.eager_only",
"sphinx-jsonschema",
]

templates_path = ["_templates"]
@@ -147,6 +148,8 @@ def setup(app):

html_static_path = ["_static"]

html_css_files = ["wrap-in-tables.css"]

htmlhelp_basename = "Hypothesisdoc"

html_favicon = "../../brand/favicon.ico"
1 change: 1 addition & 0 deletions hypothesis-python/docs/index.rst
@@ -80,3 +80,4 @@ check out some of the
support
packaging
reproducing
observability
76 changes: 76 additions & 0 deletions hypothesis-python/docs/observability.rst
@@ -0,0 +1,76 @@
===================
Observability tools
===================

.. warning::

This feature is experimental, and could have breaking changes or even be removed
without notice. Try it out, let us know what you think, but don't rely on it
just yet!


Motivation
==========

Understanding what your code is doing - for example, why your test failed - is often
a frustrating exercise in adding some more instrumentation or logging (or ``print()`` calls)
and running it again. The idea of :wikipedia:`observability <Observability_(software)>`
is to let you answer questions you didn't think of in advance. In slogan form,

*Debugging should be a data analysis problem.*

By default, Hypothesis only reports the minimal failing example... but sometimes you might
want to know something about *all* the examples. Printing them to the terminal with
:ref:`verbose output <verbose-output>` might be nice, but isn't always enough.
This feature gives you an analysis-ready dataframe with one row per test case,
and columns ranging from arguments to code coverage to pass/fail status.

This is deliberately a much lighter-weight and task-specific system than e.g.
`OpenTelemetry <https://opentelemetry.io/>`__. It's also less detailed than time-travel
debuggers such as `rr <https://rr-project.org/>`__ or `pytrace <https://pytrace.com/>`__,
because there's no good way to compare multiple traces from these tools and their
Python support is relatively immature.


Configuration
=============

If you set the ``HYPOTHESIS_EXPERIMENTAL_OBSERVABILITY`` environment variable,
Hypothesis will log various observations to jsonlines files in the
``.hypothesis/observed/`` directory. You can load and explore these with e.g.
:func:`pd.read_json(".hypothesis/observed/*_testcases.jsonl", lines=True) <pandas.read_json>`,
or by using the :pypi:`sqlite-utils` and :pypi:`datasette` libraries::

sqlite-utils insert testcases.db testcases .hypothesis/observed/*_testcases.jsonl --nl --flatten
datasette serve testcases.db
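
If you would rather not depend on a particular reader expanding the wildcard in the path, the files can also be collected by hand. The following is a minimal sketch using only the standard library; the function name ``load_testcases`` is hypothetical, and the only things it assumes are the ``*_testcases.jsonl`` naming and the ``type`` tag described in this document.

```python
import glob
import json
import os


def load_testcases(observed_dir=".hypothesis/observed"):
    """Return every test-case observation under observed_dir as a list of dicts.

    Hypothetical helper for illustration; not part of Hypothesis itself.
    """
    rows = []
    pattern = os.path.join(observed_dir, "*_testcases.jsonl")
    # Expand the wildcard ourselves, in a stable order.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                record = json.loads(line)
                # Keep test cases only; skip info/alert/error messages.
                if record.get("type") == "test_case":
                    rows.append(record)
    return rows
```

The resulting list of dicts can then be passed to a dataframe constructor of your choice, for example ``pandas.json_normalize(load_testcases())`` if pandas is installed.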


Collecting more information
---------------------------

If you want to record more information about your test cases than the arguments and
outcome - for example, was ``x`` a binary tree? what was the difference between the
expected and the actual value? how many queries did it take to find a solution? -
Hypothesis makes this easy.

:func:`~hypothesis.event` accepts a string label, and optionally a string, int, or
float observation associated with it. All events are collected and summarized in
:ref:`statistics`, as well as included on a per-test-case basis in our observations.

:func:`~hypothesis.target` is a special case of numeric-valued events: as well as
recording them in observations, Hypothesis will try to maximize the targeted value.
Knowing that, you can use this to guide the search for failing inputs.


Data Format
===========

We dump observations in `json lines format <https://jsonlines.org/>`__, with each line
describing either a test case or an information message. The tables below are derived
from :download:`this machine-readable JSON schema <schema_observations.json>`, to
provide both readable and verifiable specifications.

.. jsonschema:: ./schema_observations.json#/oneOf/0
:hide_key: /additionalProperties, /type
.. jsonschema:: ./schema_observations.json#/oneOf/1
:hide_key: /additionalProperties, /type
93 changes: 93 additions & 0 deletions hypothesis-python/docs/schema_observations.json
@@ -0,0 +1,93 @@
{
"title": "PBT Observations",
"description": "PBT Observations define a standard way to communicate what happened when property-based tests were run. They describe test cases, or general notifications classified as info, alert, or error messages.",
"oneOf": [
{
"title": "Test case",
"description": "Describes the inputs to and result of running some test function on a particular input. The test might have passed, failed, or been abandoned part way through (e.g. because we failed a ``.filter()`` condition).",
"type": "object",
"properties": {
"type": {
"const": "test_case",
"description": "A tag which labels this observation as data about a specific test case."
},
"status": {
"enum": ["passed", "failed", "gave_up"],
"description": "Whether the test passed, failed, or was aborted before completion (e.g. due to use of ``.filter()``). Note that if we gave_up partway, values such as arguments and features may be incomplete."
},
"status_reason": {
"type": "string",
"description": "If non-empty, the reason for which the test failed or was abandoned. For Hypothesis, this is usually the exception type and location."
},
"representation": {
"type": "string",
"description": "The string representation of the input."
},
"arguments": {
"type": "object",
"description": "A structured json-encoded representation of the input. Hypothesis always provides a dictionary of argument names to json-ified values, including interactive draws from the :func:`~hypothesis.strategies.data` strategy. In other libraries this can be any object."
},
"how_generated": {
"type": ["string", "null"],
"description": "How the input was generated, if known. In Hypothesis this might be an explicit example, generated during a particular phase with some backend, or by replaying the minimal failing example."
},
"features": {
"type": "object",
"description": "Runtime observations which might help explain what this test case did. Hypothesis includes target() scores, tags from event(), time spent generating data and running user code, and so on."
},
"coverage": {
"type": ["object", "null"],
"description": "Mapping of filename to list of covered line numbers, if coverage information is available, or null if not. Hypothesis deliberately omits stdlib and site-packages code.",
"additionalProperties": {
"type": "array",
"items": {"type": "integer", "minimum": 1},
"uniqueItems": true
}
},
"metadata": {
"type": "object",
"description": "Arbitrary metadata which might be of interest, but does not semantically fit in 'features'. For example, Hypothesis includes the traceback for failing tests here."
},
"property": {
"type": "string",
"description": "The name or representation of the test function we're running."
},
"run_start": {
"type": "number",
"description": "unix timestamp at which we started running this test function, so that later analysis can group test cases by run."
}
},
"required": ["type", "status", "status_reason", "representation", "arguments", "how_generated", "features", "coverage", "metadata", "property", "run_start"],
"additionalProperties": false
},
{
"title": "Information message",
"description": "Info, alert, and error messages correspond to a group of test cases or the overall run, and are intended for humans rather than machine analysis.",
"type": "object",
"properties": {
"type": {
"enum": [ "info", "alert", "error"],
"description": "A tag which labels this observation as general information to show the user. Hypothesis uses info messages to report statistics; alert or error messages can be provided by plugins."
},
"title": {
"type": "string",
"description": "The title of this message"
},
"content": {
"type": "string",
"description": "The body of the message. May use markdown."
},
"property": {
"type": "string",
"description": "The name or representation of the test function we're running. For Hypothesis, usually the Pytest nodeid."
},
"run_start": {
"type": "number",
"description": "unix timestamp at which we started running this test function, so that later analysis can group test cases by run."
}
},
"required": [ "type", "title", "content", "property", "run_start"],
"additionalProperties": false
}
]
}
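
To make the schema above concrete, here is a minimal hand-rolled check of a single jsonlines record. It is only a sketch: it covers the ``required`` lists, ``additionalProperties: false`` and the ``status`` enum, not the full JSON Schema semantics, and the name ``check_observation`` is hypothetical. A real validator library would be the better choice for anything serious.

```python
import json

# Key sets copied from the two "required" lists in the schema above.
TEST_CASE_KEYS = frozenset({
    "type", "status", "status_reason", "representation", "arguments",
    "how_generated", "features", "coverage", "metadata", "property",
    "run_start",
})
MESSAGE_KEYS = frozenset({"type", "title", "content", "property", "run_start"})
STATUSES = ("passed", "failed", "gave_up")


def check_observation(line):
    """Parse one line and raise ValueError if its keys do not match the schema."""
    obs = json.loads(line)
    if not isinstance(obs, dict):
        raise ValueError("observation must be a JSON object")
    kind = obs.get("type")
    if kind == "test_case":
        expected = TEST_CASE_KEYS
        if obs.get("status") not in STATUSES:
            raise ValueError(f"unknown status: {obs.get('status')!r}")
    elif kind in ("info", "alert", "error"):
        expected = MESSAGE_KEYS
    else:
        raise ValueError(f"unknown observation type: {kind!r}")
    keys = set(obs)
    if keys != expected:
        # Every key is required and no extras are allowed.
        missing = sorted(expected - keys)
        extra = sorted(keys - expected)
        raise ValueError(f"missing keys {missing}, unexpected keys {extra}")
    return obs
```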
10 changes: 9 additions & 1 deletion hypothesis-python/src/_hypothesis_pytestplugin.py
@@ -373,6 +373,13 @@ def pytest_terminal_summary(terminalreporter):
if fex:
failing_examples.append(json.loads(fex))

from hypothesis.internal.observability import _WROTE_TO

if _WROTE_TO:
terminalreporter.section("Hypothesis")
for fname in sorted(_WROTE_TO):
terminalreporter.write_line(f"observations written to {fname}")

if failing_examples:
# This must have been imported already to write the failing examples
from hypothesis.extra._patching import gc_patches, make_patch, save_patch
@@ -384,7 +391,8 @@ def pytest_terminal_summary(terminalreporter):
except Exception:
# fail gracefully if we hit any filesystem or permissions problems
return
terminalreporter.section("Hypothesis")
if not _WROTE_TO:
terminalreporter.section("Hypothesis")
terminalreporter.write_line(
f"`git apply {fname}` to add failing examples to your code."
)