update expt name comment and folder parsing for training #978
Conversation
just thinking that it would be easier if, when a comment is passed and there is no such folder yet, the experiment would be named just with the comment...
@Borda thanks for the PR! I always like to try to understand use cases better. We made the runs/exp change because the default autogenerated tensorboard directories were quite verbose, including timestamps and device identifiers. The strategy I've been using is to supply a --log-dir and --name when training, which combine into a unique incremented directory. Would the change only apply a unique increment if an existing directory with the same name was encountered (or if no --name was supplied)?
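A minimal sketch of how a unique incremented run directory could be derived from a log dir and name (a hypothetical helper for illustration, not the actual yolov5 implementation):

```python
import re
from pathlib import Path

def increment_dir(log_dir, name=''):
    """Return a new run directory like log_dir/exp{N}[_name], where N is one
    greater than the highest existing experiment index in log_dir."""
    log_dir = Path(log_dir)
    n = 0
    for d in log_dir.glob('exp*'):
        m = re.match(r'exp(\d+)', d.name)  # skip folders that don't match the pattern
        if m:
            n = max(n, int(m.group(1)) + 1)
    return log_dir / (f'exp{n}_{name}' if name else f'exp{n}')

print(increment_dir('runs', 'yolov5s'))  # e.g. runs/exp0_yolov5s on a fresh runs/ dir
```

With this scheme, re-running the same command never overwrites a previous run; each invocation lands in the next free exp{N} slot.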
well, in this PR I didn't do everything I mentioned; I just tried to fix some edge cases with dir parsing and update the argument description...
Hi @Borda, I actually find the current naming quite nice.
Incidentally, if you use Tensorboard, you can "Filter runs" and see the visualization for a certain group of experiments. I also find having the exp number first more useful, as I can tell chronologically which experiment I performed first across many without checking timestamps. It could also just be me building my workflow around the current repo. Feel free to give me your opinion.
@NanoCode012 I see, I would make it just optional; anyway it was just a proposal and it is not part of this PR
@Borda ah, I see. So the scope of the changes is to increase robustness to edge cases without modifying the functionality; I misunderstood before. I see the argparser explanation fix also, thanks!
@Borda was there a specific edge case you had in mind that would cause the current code to fail? |
Yes, the case was that I had another custom folder which doesn't have exp{n} at the beginning
@glenn-jocher can we merge this one? :]
train.py (Outdated)

    if opt.bucket:
        os.system('gsutil cp gs://%s/evolve.txt .' % opt.bucket)  # download evolve.txt if exists

-   for _ in range(300):  # generations to evolve
+   for _ in tqdm(range(300), desc='perform evolve >>'):  # generations to evolve
We can't use tqdm here because there are so many print/logging statements within each training that it does not handle them correctly. I tested evolution on the branch right now to verify the issue. Will remove.
well, we can, and I would recommend writing somewhere which index is actually running; moreover, tqdm gives you an estimate of how much longer the evolve needs to finish...
otherwise you have to do it the hard way: compute a single training's duration from an estimate of each epoch and extrapolate to all queued experiments, or worst case, open the terminal after some time and somehow count how many trainings have finished, because there is no such count... :]
Yes, I understand; unfortunately tqdm is badly behaved when printing within a tqdm loop. The best way to monitor evolution progress is to look at the yolov5/evolve.txt file, which will show 1 row per generation (sorted by fitness, top row best). hyp.evolve.yaml will also show the best generation index; I should probably update this line to show the best generation / total generations:
Lines 6 to 9 in 0ada058

    # Hyperparameter Evolution Results
    # Generations: 306
    #                  P        R    mAP.5  mAP.5:.95      box      obj      cls
    # Metrics:       0.6    0.936    0.896      0.684   0.0115  0.00805  0.00146
This method also helps you keep track of distributed evolution progress using the example in #607, where multiple single-GPU processes can evolve to the same central evolve.txt and hyp file.
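For reference, the usual way to keep a progress bar intact while printing from inside the loop is tqdm.write, which prints the message on its own line and redraws the bar below it (a generic sketch of the library feature, not code from this PR):

```python
import time
from tqdm import tqdm

for gen in tqdm(range(5), desc='evolve'):
    # a plain print() here would corrupt the bar rendering;
    # tqdm.write() cooperates with the bar instead
    tqdm.write(f'generation {gen}: training finished')
    time.sleep(0.01)
```

This only helps for output the caller controls; logging emitted deep inside each training run would still need its handlers redirected through tqdm to avoid garbling the bar.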
anchor evolution is working correctly now
prefer the single line readout for concise logging, which helps simplify notebook and tutorials etc.
@Borda was there a specific edge case you had in mind that would cause the current code to fail?
Yes, the case was that I had another custom folder which doesn't have exp{n} at the beginning
Ok I think I found this note where you described the problem, but I don't quite understand. Currently we have
{log_dir}/exp{N}_{name}/
{log_dir}/exp{N}/
Can you provide steps to reproduce the problem so I can understand better? Thanks!
EDIT: All other remaining changes look good.
the case, as I remember, was that you have some other subfolders in the dir which do not match the experiment pattern
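The failure mode described above can be sketched as follows: a naive parser that assumes every subfolder is named exp{N} breaks on a stray folder, while skipping non-matching names is robust (a hypothetical illustration, not the actual train.py code):

```python
import re

def next_exp_index(folders):
    """Return highest exp{N} index + 1, ignoring folders that don't match."""
    indices = []
    for name in folders:
        m = re.match(r'exp(\d+)', name)
        if m:  # a naive int(name[3:]) would raise ValueError on e.g. 'tensorboard'
            indices.append(int(m.group(1)))
    return max(indices) + 1 if indices else 0

print(next_exp_index(['exp0', 'exp1_yolov5s', 'tensorboard']))  # -> 2
```

The regex match doubles as a filter, so any custom subfolder sitting in the log dir is simply ignored when choosing the next experiment number.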
Ok looks good! Will merge.
* comment
* fix parsing
* fix evolve
* folder
* tqdm
* Update train.py
* Update train.py
* reinstate anchors into meta dict (anchor evolution is working correctly now)
* reinstate logger (prefer the single line readout for concise logging, which helps simplify notebooks and tutorials etc.)

Co-authored-by: Glenn Jocher <[email protected]>
updating the argument description and cleaning weak parsing of indexes from listed folders
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced logging details, clarified experiment naming, and refined hyperparameter evolution persistence.
📊 Key Changes
The --name argument now renames the experiment folder rather than just the results file.

🎯 Purpose & Impact
These changes aim to provide a smoother user experience, with intuitive navigation and better organization of training runs, which is key for users managing multiple experiments and models.