How does ROCAUC work in score_array()? #137

Open
janosh opened this issue May 12, 2022 · 1 comment

janosh (Member) commented May 12, 2022

Seems like there's something wrong with score_array() in the classification case.

def score_array(true_array, pred_array, task_type):
    """
    Score an array according to multiple metrics.
    Args:
        true_array (list or np.array): The ground truth array
        pred_array (list or np.array): The predicted (test) array
        task_type (str): Either regression or classification.
    Returns:
        (dict): dictionary of the scores, according to all defined
            metrics.
    """
    computed = {}
    if task_type == REG_KEY:
        metrics = REG_METRICS
    elif task_type == CLF_KEY:
        metrics = CLF_METRICS
    else:
        raise ValueError(
            f"'task_type' must be on of {[REG_KEY, CLF_KEY]}, not '{task_type}'"
        )
    for metric in metrics:
        mfunc = METRIC_MAP[metric]
        if metric == "rocauc":
            # Both arrays must be in probability form
            # if pred. array is given in probabilities
            if isinstance(pred_array[0], float):
                true_array = homogenize_clf_array(true_array, to_probs=True)
        # Other clf metrics always be converted to labels
        elif metric in CLF_METRICS:
            if isinstance(pred_array[0], float):
                pred_array = homogenize_clf_array(pred_array, to_labels=True)
        computed[metric] = mfunc(true_array, pred_array)
    return computed

accuracy comes before rocauc in CLF_METRICS:

CLF_METRICS = ["accuracy", "balanced_accuracy", "f1", "rocauc"]

That means this code will convert the predictions to labels:

# Other clf metrics always be converted to labels
elif metric in CLF_METRICS:
    if isinstance(pred_array[0], float):
        pred_array = homogenize_clf_array(pred_array, to_labels=True)

in which case afterwards

if metric == "rocauc":
    # Both arrays must be in probability form
    # if pred. array is given in probabilities
    if isinstance(pred_array[0], float):
        true_array = homogenize_clf_array(true_array, to_probs=True)

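will never be true, because pred_array was already overwritten with labels on an earlier loop iteration. So you'd be computing ROCAUC from true labels vs. predicted labels instead of predicted probabilities? Maybe I'm missing something?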
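
To make the concern concrete, here is a minimal, dependency-free sketch of the suspected behavior. It is not matbench's code: `roc_auc`, `accuracy`, `to_labels` (with an assumed 0.5 threshold), `score_buggy` and `score_fixed` are hypothetical stand-ins for `METRIC_MAP`, `homogenize_clf_array` and `score_array`, and the toy data is made up. It only illustrates how reassigning `pred_array` inside the loop can change the ROCAUC, and one possible way to avoid it.

```python
# Hypothetical stand-ins; none of these names come from matbench itself.

def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half a win (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, y_score) if t]
    neg = [s for t, s in zip(y_true, y_score) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_labels(probs, threshold=0.5):
    # Assumption: probabilities are thresholded at 0.5 to obtain labels.
    return [p > threshold for p in probs]


METRICS = {"accuracy": accuracy, "rocauc": roc_auc}
ORDER = ["accuracy", "rocauc"]  # same relative order as in CLF_METRICS


def score_buggy(true, pred):
    computed = {}
    for metric in ORDER:
        if metric != "rocauc" and isinstance(pred[0], float):
            # Overwrites pred, so later iterations only see labels.
            pred = to_labels(pred)
        computed[metric] = METRICS[metric](true, pred)
    return computed


def score_fixed(true, pred):
    computed = {}
    # Convert once into a separate variable; the inputs stay untouched.
    pred_labels = to_labels(pred) if isinstance(pred[0], float) else pred
    for metric in ORDER:
        y_pred = pred if metric == "rocauc" else pred_labels
        computed[metric] = METRICS[metric](true, y_pred)
    return computed


y_true = [True, True, False, False]
y_prob = [0.9, 0.4, 0.6, 0.1]

buggy = score_buggy(y_true, y_prob)
fixed = score_fixed(y_true, y_prob)
print(buggy)  # rocauc computed from thresholded labels
print(fixed)  # rocauc computed from the original probabilities
```

On this toy data the two variants agree on accuracy but disagree on ROCAUC, since thresholding discards the ranking information that ROCAUC relies on.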

ardunn (Collaborator) commented May 20, 2022

@janosh I think you are correct. I will fix this ASAP

ardunn added the "high priority" and "code" (Anything having to do with matbench python package code) labels Jul 27, 2022
ardunn self-assigned this Jul 27, 2022
robinruff added a commit to robinruff/matbench that referenced this issue Mar 14, 2023