After updating to ProtoBuf 1.0.0 (#124) I found that summaries are not logged correctly to TensorBoard.
Some of them do get logged, but some don't. I suspect that some summaries are fine, but once one of them is logged incorrectly, the file or TensorBoard stops registering the ones that follow.
I prepared a minimal reproducing example by revising the Flux example to the new Flux API (the existing example uses a deprecated API): nomadbl@fc9ba3e
During logging I observe an error message. After some googling, I can only speculate that it has something to do with multiprocessing, with the file being written by multiple instances of the logger in different threads.
So far I have tried (without success) to fix it under that assumption by specifying that the logger should lock the file: src/TBLogger.jl, line 119: file = open(fpath, "w"; lock=true)
Any other ideas or insights are welcome. I'll try to isolate the issue using the above-mentioned reproducing example.
I succeeded in altering the Flux example so that the bug does not occur: nomadbl@e0f2245
The trick was to change lines like @info "train" loss=loss_fn(pred, y) acc=accuracy(pred, y)
into @info "train/vals" loss=loss_fn(pred, y) acc=accuracy(pred, y)
That is, the bug is somehow related to tag names.
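For anyone who wants to try the workaround outside the Flux example, here is a minimal sketch. It assumes the usual TBLogger(logdir, tb_overwrite) constructor and with_logger from the Logging standard library. The temporary log directory, the "test/vals" name, and the stand-in loss/acc numbers are placeholders rather than values from the linked commit. Why the slash matters is not established here; the sketch only shows the naming pattern that worked.

```julia
using Logging
using TensorBoardLogger

# Stand-in metrics (assumption); in the Flux example these come from
# loss_fn(pred, y) and accuracy(pred, y).
loss, acc = 0.42, 0.9

# Hypothetical log location: a fresh temporary directory.
logdir = mktempdir()
lg = TBLogger(logdir, tb_overwrite)

with_logger(lg) do
    for epoch in 1:3
        # The message has the form "name/tag" instead of a bare "train".
        @info "train/vals" loss=loss/epoch acc=acc
        @info "test/vals" loss=loss/epoch acc=acc
    end
end
```

Pointing TensorBoard at logdir afterwards is a quick way to check whether every scalar shows up.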
nomadbl changed the title from "Summaries missing when logging from Flux training loop" to "Summaries require names of format name/tag" on Jul 2, 2023
Since this seems to work with the workaround above I'm leaving this for now.
I suspect that this has to be fixed by setting node_name or tag correctly in Summary_Values (i.e. var"Summary.Value")
I wasn't able to determine how to do this from the TensorBoard/TensorFlow documentation. It looks like a pretty in-depth understanding is required there.