run_model_on_task: make avoid_duplicate_runs=False the default #1143

Open
joaquinvanschoren opened this issue Jun 24, 2022 · 5 comments

Comments

@joaquinvanschoren
Contributor

Description

run_model_on_task has an option, avoid_duplicate_runs, to avoid re-running experiments that already exist on OpenML. Checking for duplicates, however, requires an API key, and the option is enabled by default, so people cannot try out this function without first setting their API key.
This creates an unnecessary obstacle, especially for beginners who don't know that avoid_duplicate_runs can be switched off.
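
For completeness, the existing workaround is to pass avoid_duplicate_runs=False explicitly (this is the current keyword of run_model_on_task, nothing new):

from sklearn import ensemble
from openml import tasks, runs

clf = ensemble.RandomForestClassifier()
task = tasks.get_task(3954)
# Skips the duplicate-run lookup, so no API key is required for a local run.
run = runs.run_model_on_task(clf, task, avoid_duplicate_runs=False)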

Steps/Code to Reproduce

from sklearn import ensemble
from openml import tasks, runs

clf = ensemble.RandomForestClassifier()
task = tasks.get_task(3954)
# Fails here: the default avoid_duplicate_runs=True triggers a server check that needs an API key.
run = runs.run_model_on_task(clf, task)

Expected Results

The model should just run. The user may have no intention to upload the run to OpenML later.

Actual Results

An API key error is thrown.

Versions

Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
Python 3.7.13 (default, Apr 24 2022, 01:04:09)
[GCC 7.5.0]
NumPy 1.21.6
SciPy 1.4.1
Scikit-Learn 1.0.2
OpenML 0.12.2

@joaquinvanschoren joaquinvanschoren added the Good First Issue Issues suitable for people new to contributing to openml-python! label Jun 24, 2022
@PGijsbers
Collaborator

I had a closer look, and the problem is actually a POST request that should have been a GET request:

"flow/exists", "post", data={"name": name, "external_version": external_version},

With that fixed, you can run the above code without an API key configured while still keeping the avoid_duplicate_runs functionality (at the cost of some additional server round trips to check whether the run already exists).
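
For reference, the check behind that call is exposed publicly as openml.flows.flow_exists; a rough sketch (the name and external_version values here are just illustrative placeholders):

import openml

# Illustrative placeholders; run_model_on_task derives these from the sklearn
# extension that converts the estimator into an OpenML flow.
flow_id = openml.flows.flow_exists(
    name="sklearn.ensemble._forest.RandomForestClassifier",
    external_version="sklearn==1.0.2",
)
# Returns the flow id if the flow is already on the server, otherwise False.
print(flow_id)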

So now the question becomes, should we still prefer to have it turned off by default regardless?

@PGijsbers PGijsbers added Requires Feedback and removed Good First Issue Issues suitable for people new to contributing to openml-python! labels Jun 27, 2022
@PGijsbers
Collaborator

Also @mfeurer

@mfeurer
Collaborator

mfeurer commented Jun 27, 2022

That's an interesting question. Maybe we could move this flag to the upload/publish function instead? It would serve the same purpose but slightly improve the user experience, as users could still run things without having to worry about duplicate runs.
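
Roughly, the user experience I have in mind (a sketch only; the avoid_duplicate_runs argument on publish does not exist today and is purely hypothetical):

from sklearn import ensemble
from openml import tasks, runs

clf = ensemble.RandomForestClassifier()
task = tasks.get_task(3954)

# Under this proposal: always runs locally, no duplicate check, no API key needed.
run = runs.run_model_on_task(clf, task)

# The duplicate check would only happen at upload time (hypothetical keyword).
run.publish(avoid_duplicate_runs=True)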

@PGijsbers
Collaborator

The idea of having it here is that the user can avoid unnecessary computation: a duplicate can be identified before the experiment is run, and the existing results downloaded instead (so it's not so much about avoiding duplicates on the server, or so I thought).
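
Roughly the pattern I have in mind (a sketch with placeholder ids; list_runs and get_run are the public listing/download helpers):

import openml

task_id, flow_id = 3954, 12345  # placeholder ids for illustration

# Check for existing runs of this flow on this task before computing anything.
existing = openml.runs.list_runs(task=[task_id], flow=[flow_id])
if existing:
    # Reuse the first matching run instead of re-running the experiment.
    run = openml.runs.get_run(next(iter(existing)))
else:
    pass  # run the model locally instead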

@mfeurer
Collaborator

mfeurer commented Jul 5, 2022

Interesting, I thought it was to avoid duplicates on the server. @joaquinvanschoren, would you still like to remove that flag now that @PGijsbers has found a workaround?

@PGijsbers PGijsbers added this to the 0.13 milestone Nov 17, 2022
PGijsbers added a commit that referenced this issue Nov 28, 2022
Fixes #1143. This change means that run results will, by default, not be
fetched from the server but computed locally. The benefit is that the
operation no longer requires an API key or an internet connection by default.
@mfeurer mfeurer modified the milestones: 0.13.1, 0.14.0 Jun 12, 2023