tf.metrics.accuracy maintains a running accuracy? #9498
Comments
@sguada Do you have opinions here? The current API is designed around an evaluation process that restarts from scratch each time; it does seem hard to use in other contexts.
For what it's worth, I agree with @gongzhitaao here, though we'll be stuck with these for a while given their use in TF 1.0. I think the metric names prefaced with streaming_ made this behavior clearer.
Yeah, when we created them they were streaming_metrics, so it was streaming_accuracy. I think streaming metrics should have a simple way to reset them. FYI: implementing non-streaming accuracy is simple, ex:
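The example code from that comment is not preserved on this page; a minimal sketch of a non-streaming, single-batch accuracy in TF 1.x might look like the following (illustrative only, with hypothetical logits and labels):

```python
import tensorflow as tf

# Hypothetical batch of logits and integer labels, for illustration only.
logits = tf.constant([[2.0, 1.0], [0.3, 4.0], [5.0, 0.1]])
labels = tf.constant([0, 1, 1], dtype=tf.int64)

# Single-batch (non-streaming) accuracy: no internal state, so nothing to reset.
predictions = tf.argmax(logits, axis=1)
batch_accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, labels), tf.float32))

with tf.Session() as sess:
    print(sess.run(batch_accuracy))  # 0.6666667 (2 of 3 predictions correct)
```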
@sguada Thanks. That's what I'm using in my code.
+1 for a simple way to reset these metrics.
Maybe a new named argument could be added, which causes the metrics to return an additional (third) result, which would be the reset op. I could also imagine another named argument which would cause the metrics to return just one value -- a one-time metrics call (i.e., non-streaming).
Note that this is tightly coupled to #4814.
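Until something like that is added, a reset op can be built by hand. A minimal sketch of one way to do it today (an assumption, not an official API: the metric is created inside a dedicated variable scope so its total and count locals can be located):

```python
import tensorflow as tf

labels = tf.placeholder(tf.int64, [None])
predictions = tf.placeholder(tf.int64, [None])

# Create the metric inside a named scope so its local variables can be found.
with tf.variable_scope("eval_accuracy"):
    acc, acc_update = tf.metrics.accuracy(labels=labels, predictions=predictions)

# The "third return value" proposed above, built manually: re-initializing the
# metric's local variables (total and count) resets the running state.
acc_vars = tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, scope="eval_accuracy")
acc_reset = tf.variables_initializer(acc_vars)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(acc_update, {labels: [1, 0, 1], predictions: [1, 1, 1]})
    print(sess.run(acc))   # 0.6666667
    sess.run(acc_reset)    # start a fresh evaluation round
    sess.run(acc_update, {labels: [0, 0], predictions: [0, 0]})
    print(sess.run(acc))   # 1.0 -- no leftover state from the first round
```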
Nice suggestion! This has puzzled me for weeks. Suppose we set every_n_steps = 100, so the monitor is called every 100 steps. Also suppose the input_fn for the validation monitor yields a stream of data, say 10 batches.

Case 1: the AUC state is reset every time the validation monitor is called; therefore the streaming is done over the 10 batches within each validation step.

Case 2: the AUC state is NOT reset, so the streaming AUC accumulates from the first call of the validation monitor onward. Namely, the first output (at 100 steps) is computed from 10 batches; the second output (at 200 steps) is computed from the streaming AUC state after the first call plus the 10 batches fed in; the third output (at 300 steps) is computed from the streaming AUC state after the second call plus the 10 batches fed in.

Question 1: which of these two scenarios is implemented?

Question 2: if we use tf.metrics.auc, what is the difference? The doc says: "For estimation of the metric over a stream of data, the function creates an update_op operation that updates these variables and returns the auc." So is the streaming from the very beginning, or within each call of the validation monitor?
Also, from the tensorflow output: INFO:tensorflow:Saving dict for global step 39807: accuracy = 0.85421, accuracy/baseline_label_mean = 0.14821, accuracy/threshold_0.500000_mean = 0.85421, auc = 0.686321, global_step = 39807, labels/actual_label_mean = 0.14821, labels/prediction_mean = 0.146081, loss = 0.39175, precision/positive_threshold_0.500000_mean = 0.580026, recall/positive_threshold_0.500000_mean = 0.0591728, validate_confusion_matrix = [[84052 699]

Why do these two AUCs differ so much? When I use model.evaluate on the train, validation, and test data sets, all the output AUCs are very close to the first AUC shown above.
All the metrics are streaming, which is why we removed the name. The non-streaming use case is typically trivial to implement (see @sguada's example). It is also almost entirely useless if you use these in evaluation, since you're almost never interested in a single-batch result.

@lancerts, the ValidationMonitor will start a completely new evaluation each time, which should reset the state.

I will close this issue -- this is working as intended. If you have a specific feature request, please file a new issue. Thanks!
When running @gongzhitaao's code snippet at the top multiple times, I sometimes get different outputs for:

```python
v = sess.run([acc, acc_op], feed_dict={x: [1, 0, 0, 0, 0],
                                       y: [1, 0, 0, 0, 1]})
print(v)
# [0.0, 0.8]  # shouldn't the first be 0.8 as well?
# [0.8, 0.8]  # running this several times will sometimes produce this output, but less frequently
```

I have this code running in a standalone script; each run is completely independent from the previous run. Any ideas, anyone?
This fuzzy function behavior sucks!
@kashefy If you execute both acc and acc_op in a single sess.run call, there is no guarantee about the order in which they are evaluated, so acc may be read either before or after the update has been applied. If you want a deterministic value, run the update op first and then fetch acc in a separate call.

This is turning into more of a StackOverflow discussion; I suggest we take additional questions there.
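For example, a minimal sketch that assumes the x, y, acc, and acc_op from the snippet discussed above; splitting the fetches removes the ordering ambiguity:

```python
# Run the update op first, then read the accuracy in a separate call,
# so the fetched value always reflects the update.
sess.run(acc_op, feed_dict={x: [1, 0, 0, 0, 0],
                            y: [1, 0, 0, 0, 1]})
print(sess.run(acc))  # 0.8, deterministically
```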
For people who are still looking to reset and manage the tensorflow metric variables, see "Avoiding headaches with tf.metrics" (Sept. 11, 2017): http://ronny.rest/blog/post_2017_09_11_tf_metrics/ I find he also provides a nice intuition on the difference between the returned value and the update op.
It looks like the article linked by @Knight-H has now moved to https://steemit.com/machine-learning/@ronny.rest/avoiding-headaches-with-tf-metrics To add to the article, TensorFlow also automatically adds the local variables of tf.metrics to the collection tf.GraphKeys.METRIC_VARIABLES. Sadly, this fact is not mentioned in the current documentation.
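For instance (a sketch, assuming metric ops have already been created in the default graph), every metric in the graph can be reset through that collection rather than by filtering tf.local_variables() by scope:

```python
# All variables created by tf.metrics.* end up in this collection,
# so one initializer resets every streaming metric in the graph.
metric_vars = tf.get_collection(tf.GraphKeys.METRIC_VARIABLES)
reset_all_metrics = tf.variables_initializer(metric_vars)

# Later, between evaluation rounds:
# sess.run(reset_all_metrics)
```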
For simple, one-batch accuracy: @sguada's answer above uses deprecated functions; see the sketch below for a version with current ops.
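A sketch (illustrative, not the original comment's code), using only non-deprecated TF 1.x ops:

```python
import tensorflow as tf

# Hypothetical predictions and labels for a single batch.
labels = tf.constant([1, 0, 0, 0, 1])
predictions = tf.constant([1, 0, 0, 0, 0])

# One-batch accuracy: fraction of positions where prediction equals label.
one_batch_accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, labels), tf.float32))

with tf.Session() as sess:
    print(sess.run(one_batch_accuracy))  # 0.8
```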
Sorry, but this does not make sense to me. The acc and update_op are both part of the computational graph, and no matter where we put them in the run call, i.e. session.run([acc, update_op]) vs session.run([update_op, acc]), the order of execution will be the same; I'd be surprised if it were otherwise. In my view this is a bug.
I use tf.metrics.accuracy, however it is a bit counter-intuitive in that it maintains a running accuracy (the doc agrees with this). The following simple script illustrates the situation. My concerns are about resetting the internal variables of tf.metrics.accuracy, i.e., the count and total.
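The script itself is not preserved on this page. A sketch of the behavior it presumably demonstrated (assuming placeholders x and y as in the comments above):

```python
import tensorflow as tf

x = tf.placeholder(tf.int64, [5])
y = tf.placeholder(tf.int64, [5])

# acc reads the current running total/count; acc_op updates them and
# returns the accuracy computed from the updated totals.
acc, acc_op = tf.metrics.accuracy(labels=y, predictions=x)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())

    print(sess.run(acc_op, {x: [1, 0, 0, 0, 0], y: [1, 0, 0, 0, 1]}))  # 0.8 (4/5)
    print(sess.run(acc))                                               # 0.8

    print(sess.run(acc_op, {x: [1, 1, 1, 1, 1], y: [1, 0, 0, 0, 1]}))  # 0.6 (6/10, running)
    print(sess.run(acc))                                               # 0.6, not 0.4
```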