Add memory debug and profile #130

Merged 9 commits on Dec 22, 2020
36 changes: 36 additions & 0 deletions docs/advanced_usage/pipelines.rst
@@ -423,6 +423,42 @@ An example of this situation, where we want to reuse the output of the first blo
predictions = pipeline.predict(X_test)
score = compute_score(y_test, predictions)

Pipeline debugging
------------------

Sometimes we might be interested in debugging a pipeline execution and obtaining information
about the time, the memory usage, and the inputs and outputs of each step. This is possible
by passing the ``debug`` argument to the ``fit`` and ``predict`` methods. This argument allows us
to retrieve the following information from the pipeline execution:

* ``Time``: Elapsed time for the primitive and the given stage (fit or predict).
* ``Memory``: Change (increase or decrease) in memory usage for the given primitive within the pipeline.
* ``Input``: The input values that the primitive takes for that specific step.
* ``Output``: The output produced by the primitive.
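
As a rough illustration of what the ``Time`` and ``Memory`` entries describe, the following
minimal sketch measures the elapsed time and the net memory change around a single step using
only the standard library. The ``profile_step`` helper and the ``double`` step are hypothetical
names introduced for this example; they are not part of the pipeline API, and the library may
measure these values differently.

```python
import time
import tracemalloc


def profile_step(func, *args, **kwargs):
    """Run ``func`` and return its output plus elapsed time and net traced memory change.

    Hypothetical helper for illustration only; not part of the pipeline API.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    start = time.perf_counter()

    output = func(*args, **kwargs)

    elapsed = time.perf_counter() - start
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # 'memory' can be negative if the step released more memory than it allocated.
    return output, {'time': elapsed, 'memory': after - before}


def double(values):
    """Stand-in for a primitive: doubles every value."""
    return [value * 2 for value in values]


output, info = profile_step(double, list(range(1000)))
print(info)
```

The measurement wraps only the call itself, so whatever the step keeps alive in its output is
counted as a memory increase.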


If the ``debug`` argument is set to ``True``, a dictionary containing all the elements listed
previously is returned along with the result::

    result, debug_info = pipeline.fit(X_train, y_train, debug=True)

If you want to retrieve only some of the elements listed above and skip the rest, you can
pass a ``str`` to the ``debug`` argument with any combination of the following characters:

* ``i``: To include inputs.
* ``o``: To include outputs.
* ``m``: To include used memory.
* ``t``: To include elapsed time.
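
One way to picture how such a flag string maps to the captured elements is the sketch below.
The ``parse_debug`` function and ``FLAG_NAMES`` mapping are hypothetical and written only for
this example; in particular, rejecting unknown characters is an assumption, not documented
behavior of the library.

```python
# Hypothetical mapping from flag characters to the elements they select.
FLAG_NAMES = {'i': 'input', 'o': 'output', 'm': 'memory', 't': 'time'}


def parse_debug(debug):
    """Translate a ``debug`` value into the set of elements to capture.

    ``True`` selects everything, a falsy value selects nothing, and a string
    selects the elements named by its characters, in any order.
    """
    if debug is True:
        flags = 'tmio'
    elif not debug:
        flags = ''
    else:
        flags = debug.lower()

    unknown = set(flags) - set(FLAG_NAMES)
    if unknown:
        # Assumption for this sketch: unknown characters are treated as an error.
        raise ValueError('Unknown debug flags: {}'.format(sorted(unknown)))

    return {FLAG_NAMES[flag] for flag in flags}


print(sorted(parse_debug('tm')))
```
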

For example, if we are only interested in capturing the elapsed time and used memory during the
``fit`` process, we can call the method as follows::

    result, debug_info = pipeline.fit(X_train, y_train, debug='tm')

.. warning:: Bear in mind that using ``debug=True``, or capturing the ``Input`` and ``Output``,
             consumes extra RAM because copies of the input data and the output data are
             created for each primitive. For profiling, we recommend using the ``tm`` option
             as shown in the previous example.
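
To get a feel for the cost of keeping copies, the following sketch measures how much memory a
deep copy of a nested list allocates. It uses only the standard library and made-up data; it
does not reproduce how the pipeline stores inputs and outputs, which is an assumption left
open here.

```python
import copy
import tracemalloc

# Made-up data standing in for the input of one primitive.
data = [[float(i)] * 100 for i in range(1000)]

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

# Keeping a snapshot of the data means a second copy lives in memory.
snapshot = copy.deepcopy(data)

after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

copied_bytes = after - before
print('Extra memory held by the copy: {} bytes'.format(copied_bytes))
```

A copy like this is kept per captured element and per step, so the overhead grows with both
the size of the data and the number of primitives in the pipeline.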

.. _API Reference: ../api_reference.html
.. _primitives: ../primitives.html