Enable Python 3.8 support #210

Merged: 8 commits, Aug 11, 2020
azure-pipelines-steps.yml (1 addition, 1 deletion)

@@ -24,7 +24,7 @@ steps:
condition: and(succeeded(), eq(variables['Agent.OS'], 'Linux'))

# Install the package
- script: 'python -m pip install --upgrade pip && pip install --upgrade setuptools && pip install ${{ parameters.package }}'
- script: 'python -m pip install --upgrade pip && pip install --upgrade setuptools wheel && pip install ${{ parameters.package }}'
displayName: 'Install dependencies'

- ${{ parameters.body }}
azure-pipelines.yml (43 additions, 34 deletions)

@@ -103,40 +103,40 @@ jobs:
testRunTitle: 'Notebooks'
condition: succeededOrFailed()

- job: 'AutoML'
dependsOn: 'EvalChanges'
condition: eq(dependencies.EvalChanges.outputs['output.testCode'], 'True')
variables:
python.version: '3.6'
pool:
vmImage: 'ubuntu-16.04'
steps:
- template: azure-pipelines-steps.yml
parameters:
body:
- task: AzureCLI@2
displayName: 'AutoML tests'
inputs:
azureSubscription: 'automl'
scriptLocation: 'inlineScript'
scriptType: 'pscore'
powerShellIgnoreLASTEXITCODE: '' # string for now due to https://github.com/microsoft/azure-pipelines-tasks/issues/12266
inlineScript: |
$env:SUBSCRIPTION_ID = az account show --query id -o tsv
python setup.py pytest
env:
WORKSPACE_NAME: 'testWorkspace'
RESOURCE_GROUP: 'testingAutoMLEconML'
PYTEST_ADDOPTS: '-m "automl" -n 0'
COVERAGE_PROCESS_START: 'setup.cfg'

- task: PublishTestResults@2
displayName: 'Publish Test Results **/test-results.xml'
inputs:
testResultsFiles: '**/test-results.xml'
testRunTitle: 'AutoML'
condition: succeededOrFailed()
package: '.[automl]'
# - job: 'AutoML'
# dependsOn: 'EvalChanges'
# condition: eq(dependencies.EvalChanges.outputs['output.testCode'], 'True')
# variables:
# python.version: '3.6'
# pool:
# vmImage: 'ubuntu-16.04'
# steps:
# - template: azure-pipelines-steps.yml
# parameters:
# body:
# - task: AzureCLI@2
# displayName: 'AutoML tests'
# inputs:
# azureSubscription: 'automl'
# scriptLocation: 'inlineScript'
# scriptType: 'pscore'
# powerShellIgnoreLASTEXITCODE: '' # string for now due to https://github.com/microsoft/azure-pipelines-tasks/issues/12266
# inlineScript: |
# $env:SUBSCRIPTION_ID = az account show --query id -o tsv
# python setup.py pytest
# env:
# WORKSPACE_NAME: 'testWorkspace'
# RESOURCE_GROUP: 'testingAutoMLEconML'
# PYTEST_ADDOPTS: '-m "automl" -n 0'
# COVERAGE_PROCESS_START: 'setup.cfg'

# - task: PublishTestResults@2
# displayName: 'Publish Test Results **/test-results.xml'
# inputs:
# testResultsFiles: '**/test-results.xml'
# testRunTitle: 'AutoML'
# condition: succeededOrFailed()
# package: '.[automl]'
Comment on lines +106 to +139 (Collaborator, PR author):

Had to disable the AutoML tests because they seem to have already been broken by something changing server-side for AutoML, and I couldn't figure out the necessary changes after a small amount of debugging. I've opened a new issue (#270) to fix this.


- job: 'Linting'
dependsOn: 'EvalChanges'
@@ -185,6 +185,15 @@ jobs:
Windows, Python 3.7:
imageName: 'vs2017-win2016'
python.version: '3.7'
Linux, Python 3.8:
imageName: 'ubuntu-16.04'
python.version: '3.8'
macOS, Python 3.8:
imageName: 'macOS-10.15'
python.version: '3.8'
Windows, Python 3.8:
imageName: 'vs2017-win2016'
python.version: '3.8'

pool:
vmImage: $(imageName)
doc/spec/estimation/dml.rst (1 addition, 8 deletions)

@@ -696,9 +696,7 @@ the case where this matrix has low rank: all the products can be embedded in som
space and the cross-price elasticities is a linear function of these low dimensional embeddings. This corresponds
to well-studied latent factor models in pricing. Our framework can easily handle this by using
a nuclear norm regularized multi-task regression in the final stage. For instance the
lightning package implements such a class:

.. testcode::
lightning package implements such a class::
Comment (Collaborator, PR author):

Released versions of lightning are currently not compatible with sklearn 0.23 (though this problem has been fixed in the lightning GitHub repo; see scikit-learn-contrib/lightning#142). We can revert this small change if there is a new PyPI release.


from econml.dml import DMLCateEstimator
from sklearn.preprocessing import PolynomialFeatures
@@ -714,8 +712,3 @@ lightning package implements such a class:
te_pred = est.const_marginal_effect(np.median(X, axis=0, keepdims=True))
print(te_pred)
print(np.linalg.svd(te_pred[0]))

.. testoutput::
:hide:

...
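The docs section above motivates a nuclear-norm (trace-norm) regularized multi-task regression for the final stage, so that the estimated cross-price elasticity matrix comes out low rank. As a library-independent illustration of that idea (the data, dimensions, and penalty strength below are made up, and this is a plain proximal-gradient sketch rather than the econml/lightning code from the docs):

    import numpy as np

    rng = np.random.RandomState(0)
    n, d, k = 500, 10, 8                                        # samples, features, tasks
    B_true = rng.normal(size=(d, 2)) @ rng.normal(size=(2, k))  # rank-2 coefficient matrix
    X = rng.normal(size=(n, d))
    Y = X @ B_true + 0.1 * rng.normal(size=(n, k))

    def prox_nuclear(B, tau):
        """Soft-threshold the singular values of B (prox of tau * ||B||_*)."""
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    lam = 50.0                                  # nuclear-norm penalty strength
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    B = np.zeros((d, k))
    for _ in range(500):
        grad = X.T @ (X @ B - Y)                # gradient of the squared loss
        B = prox_nuclear(B - step * grad, step * lam)

    print("true rank:", np.linalg.matrix_rank(B_true),
          "estimated rank:", np.linalg.matrix_rank(B))

The lightning package referenced above ships FISTA-based solvers, an accelerated variant of this kind of proximal scheme; the sketch is only meant to show why a trace-norm penalty on the final-stage coefficients favors low-rank solutions.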
doc/spec/estimation/forest.rst (1 addition, 1 deletion)

@@ -376,7 +376,7 @@ Similarly, we can call :class:`.DiscreteTreatmentOrthoForest`:
>>> est.fit(Y, T, W, W)
<econml.ortho_forest.DiscreteTreatmentOrthoForest object at 0x...>
>>> print(est.effect(W[:2]))
[1.01... 1.25...]
[0.99... 1.35...]

Let's now look at a more involved example with a high-dimensional set of confounders :math:`W`
and with more realistic noisy data. In this case we can just use the default parameters
econml/_ortho_learner.py (6 additions, 7 deletions)

@@ -335,8 +335,7 @@ def score(self, Y, T, W=None, nuisances=None):
>>> est.score(y, X[:, 0], W=X[:, 1:])
0.00727995...
>>> est.model_final.model
LinearRegression(copy_X=True, fit_intercept=False, n_jobs=None,
normalize=False)
LinearRegression(fit_intercept=False)
>>> est.model_final.model.coef_
array([1.023649...])
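A quick aside on why the expected reprs in these doctests got shorter: scikit-learn 0.23 changed estimator reprs to show only parameters that differ from their defaults (print_changed_only is now True), so the long parameter dumps collapse. A minimal check, assuming scikit-learn >= 0.23 is installed:

    from sklearn.linear_model import LinearRegression

    # Prints "LinearRegression(fit_intercept=False)" on scikit-learn >= 0.23;
    # older releases would also list copy_X, n_jobs and normalize.
    print(LinearRegression(fit_intercept=False))

The same repr change accounts for the shortened expected outputs in econml/_rlearner.py and econml/sklearn_extensions/ensemble.py further down.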

@@ -388,15 +387,15 @@ def score(self, Y, T, W=None, nuisances=None):
est.fit(y, T, W=W)

>>> est.score_
0.00316040...
0.00673015...
>>> est.const_marginal_effect()
array([[1.001231...]])
array([[1.008401...]])
>>> est.effect()
array([1.001231...])
array([1.008401...])
>>> est.score(y, T, W=W)
0.00256958...
0.00310431...
>>> est.model_final.model.coef_[0]
1.00123158...
1.00840170...

Attributes
----------
econml/_rlearner.py (3 additions, 10 deletions)

@@ -156,22 +156,15 @@ def predict(self, X):
>>> est.score(y, X[:, 0], X=np.ones((X.shape[0], 1)), W=X[:, 1:])
9.73638006...e-05
>>> est.model_final.model
LinearRegression(copy_X=True, fit_intercept=False, n_jobs=None,
normalize=False)
LinearRegression(fit_intercept=False)
>>> est.model_final.model.coef_
array([0.999631...])
>>> est.score_
9.82623204...e-05
>>> [mdl._model for mdl in est.models_y]
[LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False),
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)]
[LinearRegression(), LinearRegression()]
>>> [mdl._model for mdl in est.models_t]
[LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False),
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)]
[LinearRegression(), LinearRegression()]

Attributes
----------
econml/drlearner.py (34 additions, 37 deletions)

@@ -168,30 +168,30 @@ class takes as input the parameter ``model_regressor``, which is an arbitrary sc
est.fit(y, T, X=X, W=None)

>>> est.const_marginal_effect(X[:2])
array([[0.527611..., 1.043938...],
[0.345923..., 0.422289...]])
array([[0.511640..., 1.144004...],
[0.378140..., 0.613143...]])
>>> est.effect(X[:2], T0=0, T1=1)
array([0.527611..., 0.345923...])
array([0.511640..., 0.378140...])
>>> est.score_
6.48100436...
5.11238581...
>>> est.score(y, T, X=X)
4.58598642...
5.78673506...
>>> est.model_cate(T=1).coef_
array([0.413288..., 0.02370... , 0.021575...])
array([0.434910..., 0.010226..., 0.047913...])
>>> est.model_cate(T=2).coef_
array([ 0.920586..., 0.0963652..., -0.060305...])
array([ 0.863723..., 0.086946..., -0.022288...])
>>> est.cate_feature_names()
<BLANKLINE>
>>> [mdl.coef_ for mdl in est.models_regression]
[array([ 1.435973...e+00, 3.342106...e-04, -7.102984...e-03, 6.707922...e-01,
1.984256...e+00]), array([ 1.494633...e+00, -2.463273...e-03, 2.009746...e-03, 6.828204...e-01,
2.034977...e+00])]
[array([ 1.472104...e+00, 1.984419...e-03, -1.103451...e-02, 6.984376...e-01,
2.049695...e+00]), array([ 1.455654..., -0.002110..., 0.005488..., 0.677090..., 1.998648...])]
>>> [mdl.coef_ for mdl in est.models_propensity]
[array([[-1.005830..., 0.087684..., 0.110012... ],
[ 0.087689..., 0.034947..., -0.088753...],
[ 0.918140..., -0.122632..., -0.021259...]]), array([[-0.742430..., 0.067423..., -0.080428...],
[ 0.046120..., -0.030004..., -0.076622...],
[ 0.696310..., -0.037418..., 0.157051...]])]
[array([[-0.747137..., 0.153419..., -0.018412...],
[ 0.083807..., -0.110360..., -0.076003...],
[ 0.663330..., -0.043058... , 0.094416...]]),
array([[-1.048348...e+00, 2.248997...e-04, 3.228087...e-02],
[ 1.911900...e-02, 1.241337...e-01, -8.196211...e-02],
[ 1.029229...e+00, -1.243586...e-01, 4.968123...e-02]])]

Beyond default models:

@@ -215,19 +215,19 @@ class takes as input the parameter ``model_regressor``, which is an arbitrary sc
est.fit(y, T, X=X, W=None)

>>> est.score_
1.9...
1.7...
>>> est.const_marginal_effect(X[:3])
array([[0.66..., 1.16...],
[0.56..., 0.86...],
[0.34..., 0.20...]])
array([[0.68..., 1.10...],
[0.56..., 0.79...],
[0.34..., 0.10...]])
>>> est.model_cate(T=2).coef_
array([ 0.71..., -0. , -0. ])
array([0.74..., 0. , 0. ])
>>> est.model_cate(T=2).intercept_
1.9...
>>> est.model_cate(T=1).coef_
array([0.23..., 0. , 0. ])
array([0.24..., 0.00..., 0. ])
>>> est.model_cate(T=1).intercept_
0.92...
0.94...

Attributes
----------
@@ -605,18 +605,17 @@ class LinearDRLearner(StatsModelsCateEstimatorDiscreteMixin, DRLearner):
est.fit(y, T, X=X, W=None, inference='statsmodels')

>>> est.effect(X[:3])
array([ 0.454507..., 0.324469..., -0.070401...])
array([ 0.409743..., 0.312604..., -0.127394...])
>>> est.effect_interval(X[:3])
(array([ 0.186553..., -0.117521..., -0.589221...]),
array([0.722462..., 0.766459..., 0.448419...]))
(array([ 0.120682..., -0.102543..., -0.663246...]), array([0.698803..., 0.727753..., 0.408458...]))
>>> est.coef_(T=1)
array([0.409764... , 0.019722..., 0.053648...])
array([ 0.450779..., -0.003214... , 0.063884... ])
>>> est.coef__interval(T=1)
(array([ 0.188595..., -0.168478..., -0.139291...]), array([0.630934..., 0.207922..., 0.246588...]))
(array([ 0.202646..., -0.207195..., -0.104558...]), array([0.698911..., 0.200767..., 0.232326...]))
>>> est.intercept_(T=1)
0.86450983...
0.88425066...
>>> est.intercept__interval(T=1)
(0.67765526..., 1.05136440...)
(0.68655813..., 1.08194320...)

Attributes
----------
@@ -801,19 +800,17 @@ class SparseLinearDRLearner(DebiasedLassoCateEstimatorDiscreteMixin, DRLearner):
est.fit(y, T, X=X, W=None, inference='debiasedlasso')

>>> est.effect(X[:3])
array([ 0.461389..., 0.319324..., -0.074323...])
array([ 0.418400..., 0.306400..., -0.130733...])
>>> est.effect_interval(X[:3])
(array([ 0.119569..., -0.165439..., -0.649570...]),
array([0.803210..., 0.804087..., 0.500923...]))
(array([ 0.056783..., -0.206438..., -0.739296...]), array([0.780017..., 0.819239..., 0.477828...]))
>>> est.coef_(T=1)
array([0.409848..., 0.026783..., 0.053017...])
array([0.449779..., 0.004807..., 0.061954...])
>>> est.coef__interval(T=1)
(array([ 0.213627..., -0.158139..., -0.137547...]),
array([0.606069..., 0.211706..., 0.243582...]))
(array([ 0.242194... , -0.190825..., -0.139646...]), array([0.657365..., 0.200440..., 0.263556...]))
>>> est.intercept_(T=1)
0.86461883...
0.88436847...
>>> est.intercept__interval(T=1)
(0.67790198..., 1.05133569...)
(0.68683788..., 1.08189907...)

Attributes
----------
econml/sklearn_extensions/ensemble.py (1 addition, 6 deletions)

@@ -312,12 +312,7 @@ class SubsampledHonestForest(ForestRegressor, RegressorMixin):
n_estimators=1000)

>>> regr.fit(X_train, y_train)
SubsampledHonestForest(criterion='mse', honest=True, max_depth=None,
max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=1000, n_jobs=None, random_state=0,
subsample_fr='auto', verbose=0, warm_start=False)
SubsampledHonestForest(n_estimators=1000, random_state=0)
>>> regr.feature_importances_
array([0.40..., 0.35..., 0.11..., 0.11...])
>>> regr.predict(np.ones((1, 4)))
econml/tests/test_dml.py (11 additions, 6 deletions)

@@ -34,18 +34,22 @@ class TestDML(unittest.TestCase):

def test_cate_api(self):
"""Test that we correctly implement the CATE API."""
n = 20
n_c = 20 # number of rows for continuous models
n_d = 30 # number of rows for discrete models

def make_random(is_discrete, d):
def make_random(n, is_discrete, d):
if d is None:
return None
sz = (n, d) if d >= 0 else (n,)
if is_discrete:
while True:
arr = np.random.choice(['a', 'b', 'c'], size=sz)
# ensure that we've got at least two of every element
# ensure that we've got at least 6 of every element
# 2 outer splits, 3 inner splits when model_t is 'auto' and treatment is discrete
# NOTE: this number may need to change if the default number of folds in
# WeightedStratifiedKFold changes
_, counts = np.unique(arr, return_counts=True)
if len(counts) == 3 and counts.min() > 1:
if len(counts) == 3 and counts.min() > 5:
return arr
else:
return np.random.normal(size=sz)
@@ -55,7 +59,8 @@ def make_random(is_discrete, d):
for d_y in [3, 1, -1]:
for d_x in [2, None]:
for d_w in [2, None]:
W, X, Y, T = [make_random(is_discrete, d)
n = n_d if is_discrete else n_c
W, X, Y, T = [make_random(n, is_discrete, d)
for is_discrete, d in [(False, d_w),
(False, d_x),
(False, d_y),
@@ -699,7 +704,7 @@ def test_can_custom_splitter(self):
def test_can_use_featurizer(self):
"Test that we can use a featurizer, and that fit is only called during training"
dml = LinearDMLCateEstimator(LinearRegression(), LinearRegression(),
fit_cate_intercept=False, featurizer=OneHotEncoder(n_values='auto', sparse=False))
fit_cate_intercept=False, featurizer=OneHotEncoder(sparse=False))

T = np.tile([1, 2, 3], 6)
Y = np.array([1, 2, 3, 1, 2, 3])
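The new comment in make_random above encodes a small counting argument: with 2 outer cross-fitting splits and 3 inner stratified splits when model_t is 'auto' and the treatment is discrete, every treatment level needs at least 6 rows so that each outer training half still has 3 of each level. A standalone sketch of that arithmetic, using sklearn's StratifiedKFold as a stand-in for econml's WeightedStratifiedKFold (assumed to behave the same way for this counting purpose):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    # Six samples of each of the three treatment levels -- the minimum the test now guarantees
    T = np.repeat(['a', 'b', 'c'], 6)
    X = np.zeros((len(T), 1))

    outer = StratifiedKFold(n_splits=2)
    for fold, (train, _) in enumerate(outer.split(X, T)):
        # Each outer training half keeps 3 of every level ...
        inner = StratifiedKFold(n_splits=3)
        # ... which is exactly enough for a 3-fold stratified split (1 of each level per fold).
        n_inner = sum(1 for _ in inner.split(X[train], T[train]))
        print(f"outer fold {fold}: {len(train)} training rows, {n_inner} inner folds built")

With only 2 of each level (the old check), an outer training half would hold a single member of each class, so the inner 3-fold stratified split could not be built; that is why the threshold moved from counts.min() > 1 to counts.min() > 5.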
econml/tests/test_drlearner.py (2 additions, 2 deletions)

@@ -678,7 +678,7 @@ def test_sparse(self):
n_x = 50
n_nonzero = 1
n_w = 5
n = 1000
n = 2000
# Treatment effect coef
a = np.zeros(n_x)
nonzero_idx = np.random.choice(n_x, size=n_nonzero, replace=False)
Expand Down Expand Up @@ -713,7 +713,7 @@ def test_sparse(self):
y_lower, y_upper = sparse_dml.effect_interval(x_test, T0=0, T1=1)
in_CI = ((y_lower < true_eff) & (true_eff < y_upper))
# Check that a majority of true effects lie in the 5-95% CI
self.assertTrue(in_CI.mean() > 0.8)
self.assertGreater(in_CI.mean(), 0.8)

def _test_te(self, learner_instance, tol, te_type="const"):
if te_type not in ["const", "heterogeneous"]: