
Win Vector LLC Python data science teaching tools (graphs and data manipulation)


WinVector/wvu


wvu is a simple set of utilities for doing and teaching data science and machine learning. The utilities are not replacements for the standard methods in sklearn.

import numpy.random
import pandas
import wvu.util

wvu.__version__
'0.3.6'

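Illustration of a cross-method plan. Here the indices 0 through 9 are split into 2 train/test groups; the test sets do not overlap and together cover every index.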

wvu.util.mk_cross_plan(10, 2)
[{'train': [2, 3, 7, 8, 9], 'test': [0, 1, 4, 5, 6]},
 {'train': [0, 1, 4, 5, 6], 'test': [2, 3, 7, 8, 9]}]
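
The structure of the plan above can be reproduced with a few lines of standard-library Python. This is only a sketch of the idea, not the wvu implementation: the function name `make_cross_plan_sketch` and its `seed` argument are hypothetical, and the actual random assignment used by `wvu.util.mk_cross_plan` may differ.

```python
import random


def make_cross_plan_sketch(n, k, seed=None):
    """Hypothetical stand-in for a cross plan: split range(n) into k
    train/test groups whose test sets partition range(n)."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin into k test groups.
    test_groups = [sorted(indices[i::k]) for i in range(k)]
    plan = []
    for test in test_groups:
        held_out = set(test)
        # Everything not held out for testing is used for training.
        train = [i for i in range(n) if i not in held_out]
        plan.append({'train': train, 'test': test})
    return plan


plan = make_cross_plan_sketch(10, 2, seed=0)
for group in plan:
    print(group)
```

Each row index lands in exactly one test set, so a model fit on a group's `train` rows never sees the rows it is evaluated on in that group's `test` rows.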

Plotting example

help(wvu.util.plot_roc)
Help on function plot_roc in module wvu.util:

plot_roc(prediction, istrue, title='Receiver operating characteristic plot', *, truth_target=True, ideal_line_color=None, extra_points=None, show=True)
    Plot a ROC curve of numeric prediction against boolean istrue.
    
    :param prediction: column of numeric predictions
    :param istrue: column of items to predict
    :param title: plot title
    :param truth_target: value to consider target or true.
    :param ideal_line_color: if not None, color of ideal line
    :param extra_points: data frame of additional point to annotate graph, columns fpr, tpr, label
    :param show: logical, if True call matplotlib.pyplot.show()
    :return: calculated area under the curve, plot produced by call.
    
    Example:
    
    import pandas
    import wvpy.util
    
    d = pandas.DataFrame({
        'x': [1, 2, 3, 4, 5],
        'y': [False, False, True, True, False]
    })
    
    wvpy.util.plot_roc(
        prediction=d['x'],
        istrue=d['y'],
        ideal_line_color='lightgrey'
    )
    
    wvpy.util.plot_roc(
        prediction=d['x'],
        istrue=d['y'],
        ideal_line_color='lightgrey',
        extra_points=pandas.DataFrame({
            'tpr': [0, 1],
            'fpr': [0, 1],
            'label': ['AAA', 'BBB']
        })
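    )

The next example builds synthetic data: 1000 rows with standard normal scores where about 2% of outcomes are True, plus 200 rows with scores shifted up by 5 where outcomes are True about half the time.

d = pandas.concat([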
    pandas.DataFrame({
        'x': numpy.random.normal(size=1000),
        'y': numpy.random.choice([True, False], 
                                 p=(0.02, 0.98), 
                                 size=1000, 
                                 replace=True)}),
    pandas.DataFrame({
        'x': numpy.random.normal(size=200) + 5,
        'y': numpy.random.choice([True, False], 
                                 size=200, 
                                 replace=True)}),
])
wvu.util.plot_roc(
    prediction=d.x,
    istrue=d.y,
    ideal_line_color="DarkGrey",
    title='Example ROC plot')
<Figure size 432x288 with 0 Axes>

[Figure: ROC plot titled "Example ROC plot"]

0.861085556577737
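
The number returned above is the area under the ROC curve. One way to build intuition for it: the area equals the probability that a randomly chosen positive example gets a higher prediction than a randomly chosen negative one, with ties counted as half. The sketch below computes that quantity directly on the small data frame from the `plot_roc` docstring. The function `auc_by_pairs` is a hypothetical illustration, not part of wvu, and is not how `plot_roc` computes its result.

```python
def auc_by_pairs(prediction, istrue, truth_target=True):
    """Hypothetical helper: AUC as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half a win."""
    positives = [p for p, t in zip(prediction, istrue) if t == truth_target]
    negatives = [p for p, t in zip(prediction, istrue) if t != truth_target]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Same toy data as the plot_roc docstring example.
x = [1, 2, 3, 4, 5]
y = [False, False, True, True, False]
auc = auc_by_pairs(x, y)
print(auc)  # 4 of the 6 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the data size, so it suits small teaching examples rather than large data sets.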
help(wvu.util.threshold_plot)
Help on function threshold_plot in module wvu.util:

threshold_plot(d: pandas.core.frame.DataFrame, pred_var: str, truth_var: str, truth_target: bool = True, threshold_range: Iterable[float] = (-inf, inf), plotvars: Iterable[str] = ('precision', 'recall'), title: str = 'Measures as a function of threshold', *, show: bool = True) -> None
    Produce multiple facet plot relating the performance of using a threshold greater than or equal to
    different values at predicting a truth target.
    
    :param d: pandas.DataFrame to plot
    :param pred_var: name of column of numeric predictions
    :param truth_var: name of column with reference truth
    :param truth_target: value considered true
    :param threshold_range: x-axis range to plot
    :param plotvars: list of metrics to plot, must come from ['threshold', 'count', 'fraction',
        'true_positive_rate', 'false_positive_rate', 'true_negative_rate', 'false_negative_rate',
        'precision', 'recall', 'sensitivity', 'specificity', 'accuracy']
    :param title: title for plot
    :param show: logical, if True call matplotlib.pyplot.show()
    :return: None, plot produced as a side effect
    
    Example:
    
    import pandas
    import wvpy.util
    
    d = pandas.DataFrame({
        'x': [1, 2, 3, 4, 5],
        'y': [False, False, True, True, False]
    })
    
    wvpy.util.threshold_plot(
        d,
        pred_var='x',
        truth_var='y',
        plotvars=("sensitivity", "specificity"),
    )

wvu.util.threshold_plot(
    d,
    pred_var='x',
    truth_var='y',
    plotvars=("sensitivity", "specificity"),
    title="example plot"
)

[Figure: threshold plot of sensitivity and specificity, titled "example plot"]

wvu.util.threshold_plot(
    d,
    pred_var='x',
    truth_var='y',
    plotvars=("precision", "recall"),
    title="example plot"
)

[Figure: threshold plot of precision and recall, titled "example plot"]
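
To make the plotted quantities concrete, here is a minimal sketch of the metrics at a single threshold, using the "greater than or equal to" rule described in the `threshold_plot` help text. The function `threshold_metrics` is a hypothetical illustration and not part of wvu; the plot can be thought of as evaluating measures like these across many thresholds.

```python
def threshold_metrics(prediction, truth, threshold, truth_target=True):
    """Hypothetical helper: classify as positive when prediction >= threshold,
    then summarize the resulting confusion counts."""
    tp = fp = tn = fn = 0
    for p, t in zip(prediction, truth):
        predicted_positive = p >= threshold
        actual_positive = (t == truth_target)
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
        else:
            tn += 1
    nan = float('nan')  # used when a ratio has an empty denominator
    recall = tp / (tp + fn) if (tp + fn) > 0 else nan
    return {
        'precision': tp / (tp + fp) if (tp + fp) > 0 else nan,
        'recall': recall,
        'sensitivity': recall,  # sensitivity is another name for recall
        'specificity': tn / (tn + fp) if (tn + fp) > 0 else nan,
        'accuracy': (tp + tn) / (tp + fp + tn + fn),
    }


# Same toy data as the threshold_plot docstring example.
x = [1, 2, 3, 4, 5]
y = [False, False, True, True, False]
m = threshold_metrics(x, y, threshold=3)
print(m)
```

At threshold 3 the rows with x of 3, 4, and 5 are called positive: two are truly positive and one is not, so recall is 1 while precision and specificity are both 2/3.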

help(wvu.util.gain_curve_plot)
Help on function gain_curve_plot in module wvu.util:

gain_curve_plot(prediction, outcome, title='Gain curve plot', *, show=True)
    plot cumulative outcome as a function of prediction order (descending)
    
    :param prediction: vector of numeric predictions
    :param outcome: vector of actual values
    :param title: plot title
    :param show: logical, if True call matplotlib.pyplot.show()
    :return: None
    
    Example:
    
    d = pandas.DataFrame({
        'x': [.1, .2, .3, .4, .5],
        'y': [0, 0, 1, 1, 0]
    })
    
    wvpy.util.gain_curve_plot(
        prediction=d['x'],
        outcome=d['y'],
    )

wvu.util.gain_curve_plot(
    prediction=d['x'],
    outcome=d['y'],
    title="gain curve plot"
)

[Figure: gain curve plot titled "gain curve plot"]

wvu.util.lift_curve_plot(
    prediction=d['x'],
    outcome=d['y'],
    title="lift curve plot"
)

[Figure: lift curve plot titled "lift curve plot"]
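
The gain curve plots cumulative outcome against prediction order (descending), as the help text says. The sketch below tabulates that on the docstring's toy data, and also shows one common definition of lift: the cumulative outcome fraction divided by the fraction of rows examined so far. The function `gain_and_lift` is hypothetical and not part of wvu, and the lift definition used here is an assumption; the exact scaling in `lift_curve_plot` may differ.

```python
def gain_and_lift(prediction, outcome):
    """Hypothetical helper: walk the rows from highest to lowest prediction
    and track the cumulative share of outcome captured so far."""
    n = len(prediction)
    total = sum(outcome)
    if n == 0 or total == 0:
        raise ValueError("need at least one row and a non-zero total outcome")
    # Row positions sorted by prediction, largest first.
    order = sorted(range(n), key=lambda i: prediction[i], reverse=True)
    rows = []
    running = 0
    for rank, i in enumerate(order, start=1):
        running += outcome[i]
        fraction_items = rank / n
        gain = running / total
        rows.append({
            'fraction_items': fraction_items,
            'gain': gain,
            # Assumed definition: lift of 1 matches a random ordering.
            'lift': gain / fraction_items,
        })
    return rows


# Same toy data as the gain_curve_plot docstring example.
x = [.1, .2, .3, .4, .5]
y = [0, 0, 1, 1, 0]
curve = gain_and_lift(x, y)
for row in curve:
    print(row)
```

On this data the highest-scored row has outcome 0, so the gain starts at 0, then reaches 1 after the top three rows have been examined.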
