update expt name comment and folder parsing for training #978
Conversation
just thinking that it would be easier if, when a comment is passed and there is no such folder yet, the experiment would be named just with the comment...
@Borda thanks for the PR! I always like to try to understand use cases better. We made the runs/exp change because the default autogenerated tensorboard directories were quite verbose, including timestamps and device identifiers. The strategy I've been using is to supply a --log-dir and --name when training, which combine into a unique incremented directory. Would the change only apply a unique increment if an existing directory with the same name was encountered (or if no --name was supplied)?
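A minimal sketch of how a unique incremented run directory could be derived from a log dir and name (a hypothetical helper for illustration, not the actual yolov5 implementation):

```python
import re
from pathlib import Path

def increment_dir(log_dir, name=''):
    """Return a new run directory like log_dir/exp{N}[_name], where N is one
    greater than the highest existing experiment index in log_dir."""
    log_dir = Path(log_dir)
    n = 0
    for d in log_dir.glob('exp*'):
        m = re.match(r'exp(\d+)', d.name)  # skip folders that don't match the pattern
        if m:
            n = max(n, int(m.group(1)) + 1)
    return log_dir / (f'exp{n}_{name}' if name else f'exp{n}')

print(increment_dir('runs', 'yolov5s'))  # e.g. runs/exp0_yolov5s on a fresh runs/ dir
```

With this scheme, re-running the same command never overwrites a previous run; each invocation lands in the next free exp{N} slot.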
well, in this PR I didn't do everything I mentioned; I just tried to fix some edge cases with dir parsing and update the argument description...
Hi @Borda, I actually find the current naming quite nice.
Incidentally, if you use Tensorboard, you can "Filter runs" and see the visualization for a certain group of experiments. I also find having the exp number first more useful, as I can tell chronologically which experiment I performed first across many without checking timestamps. It could also just be me building my workflow around the current repo. Feel free to give me your opinion.
@NanoCode012 I see, I would make it just optional; anyway it was just a proposal and it is not part of this PR
@Borda ah, I see. So the scope of the changes is to increase robustness to edge cases without modifying the functionality; I misunderstood before. I see the argparser explanation fix also, thanks!
@Borda was there a specific edge case you had in mind that would cause the current code to fail? |
Yes, the case was that I had another custom folder which doesn't have exp{n} at the beginning
@glenn-jocher can we merge this one? :]
train.py (Outdated)

    if opt.bucket:
        os.system('gsutil cp gs://%s/evolve.txt .' % opt.bucket)  # download evolve.txt if exists

-   for _ in range(300):  # generations to evolve
+   for _ in tqdm(range(300), desc='perform evolve >>'):  # generations to evolve
We can't use tqdm here because there are so many print/logging statements within each training that it does not handle them correctly. I tested evolution on the branch right now to verify the issue. Will remove.
well, we can, and I would recommend writing somewhere which index is actually running; moreover, tqdm gives you an estimate of how much longer the evolve needs to finish...
otherwise you have to do it the hard way: compute a single training's duration from an estimate of each epoch and extrapolate to all queued experiments, or worst case, open the terminal after some time and somehow count how many trainings have finished, because there is no such count... :]
Yes, I understand; unfortunately tqdm is badly behaved when printing within a tqdm loop. The best way to monitor evolution progress is to look at the yolov5/evolve.txt file, which will show 1 row per generation (sorted by fitness, top row best). hyp.evolve.yaml will also show the best generation index; I should probably update this line to show the best generation / total generations:
Lines 6 to 9 in 0ada058

    # Hyperparameter Evolution Results
    # Generations: 306
    #                  P        R    mAP.5  mAP.5:.95      box      obj      cls
    # Metrics:       0.6    0.936    0.896      0.684   0.0115  0.00805  0.00146
This method also helps you keep track of distributed evolution progress using the example in #607, where multiple single-GPU processes can evolve to the same central evolve.txt and hyp file.
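For reference, the usual way to keep a progress bar intact while printing from inside the loop is tqdm.write, which prints the message on its own line and redraws the bar below it (a generic sketch of the library feature, not code from this PR):

```python
import time
from tqdm import tqdm

for gen in tqdm(range(5), desc='evolve'):
    # a plain print() here would corrupt the bar rendering;
    # tqdm.write() cooperates with the bar instead
    tqdm.write(f'generation {gen}: training finished')
    time.sleep(0.01)
```

This only helps for output the caller controls; logging emitted deep inside each training run would still need its handlers redirected through tqdm to avoid garbling the bar.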
anchor evolution is working correctly now
prefer the single line readout for concise logging, which helps simplify notebook and tutorials etc.
@Borda was there a specific edge case you had in mind that would cause the current code to fail?
Yes, the case was that I had another custom folder which doesn't have exp{n} at the beginning
Ok I think I found this note where you described the problem, but I don't quite understand. Currently we have
{log_dir}/exp{N}_{name}/
{log_dir}/exp{N}/
Can you provide steps to reproduce the problem so I can understand better? Thanks!
EDIT: All other remaining changes look good.
the case, as I remember, was that you have some other subfolders in the dir which do not match the experiment pattern
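The failure mode described above can be sketched as follows: a naive parser that assumes every subfolder is named exp{N} breaks on a stray folder, while skipping non-matching names is robust (a hypothetical illustration, not the actual train.py code):

```python
import re

def next_exp_index(folders):
    """Return highest exp{N} index + 1, ignoring folders that don't match."""
    indices = []
    for name in folders:
        m = re.match(r'exp(\d+)', name)
        if m:  # a naive int(name[3:]) would raise ValueError on e.g. 'tensorboard'
            indices.append(int(m.group(1)))
    return max(indices) + 1 if indices else 0

print(next_exp_index(['exp0', 'exp1_yolov5s', 'tensorboard']))  # -> 2
```

The regex match doubles as a filter, so any custom subfolder sitting in the log dir is simply ignored when choosing the next experiment number.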
Ok looks good! Will merge.
* comment
* fix parsing
* fix evolve
* folder
* tqdm
* Update train.py
* Update train.py
* reinstate anchors into meta dict (anchor evolution is working correctly now)
* reinstate logger (prefer the single line readout for concise logging, which helps simplify notebooks and tutorials etc.)

Co-authored-by: Glenn Jocher <[email protected]>
updating the argument description and cleaning weak parsing of indexes from listed folders
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced logging details, clarified experiment naming, and refined hyperparameter evolution persistence.
📊 Key Changes
The --name argument now renames the experiment folder rather than just the results file.

🎯 Purpose & Impact
These changes aim to provide a smoother user experience, with intuitive navigation and better organization of training runs, which is key for users managing multiple experiments and models.