Outline & Motivation

The usage of log_dir and save_dir is confusing. Imagine we are implementing a new logger; let's see how we can define the save_dir property.
import os

from typing_extensions import override
from lightning.pytorch.loggers import Logger


class CustomLogger(Logger):
    # (name, version, and the other abstract members are omitted for brevity)
    def __init__(self, all_experiment_path):
        self._root_dir = all_experiment_path

    @property
    @override
    def save_dir(self) -> str:
        option1 = os.path.join(self._root_dir, self.name, str(self.version))
        # option2 = self._root_dir
        # option3 = os.path.join(self._root_dir, self.name)
        return option1  # or option2 or option3
The first option includes the experiment name and version as subdirectories; the second is the root directory itself, and the third adds only the name.
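For concreteness, here is what the three options evaluate to with placeholder values (the path, name, and version below are made up for illustration):

import os

# Placeholder values: all_experiment_path="/experiments", self.name="my_model", self.version=0
root, name, version = "/experiments", "my_model", 0

print(os.path.join(root, name, str(version)))  # option1 -> /experiments/my_model/0
print(root)                                    # option2 -> /experiments
print(os.path.join(root, name))                # option3 -> /experiments/my_model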
Because CustomLogger is not a subclass of TensorBoardLogger or CSVLogger, Trainer.log_dir is defined to be logger.save_dir when the first logger is a CustomLogger, according to here.
The config.yaml file will be saved to logger.save_dir/config.yaml. The checkpoint files will be saved to logger.save_dir/logger.name/logger.version, which expands to logger._root_dir/logger.name/logger.version/logger.name/logger.version, i.e. a duplicated directory hierarchy, according to the __resolve_ckpt_dir() function.
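To make the duplication concrete, here is a small path-arithmetic sketch that only mirrors the resolution described above; the values are placeholders and this is not Lightning's actual implementation:

import os

root = "/experiments"             # all_experiment_path
name, version = "my_model", "0"   # placeholder run name/version

# Option 1: save_dir already embeds the name and version.
save_dir = os.path.join(root, name, version)          # /experiments/my_model/0

# Trainer.log_dir falls back to logger.save_dir for a generic Logger,
# so config.yaml lands directly under it.
config_path = os.path.join(save_dir, "config.yaml")   # /experiments/my_model/0/config.yaml

# The checkpoint directory is resolved as save_dir/name/version, which
# repeats the name/version pair.
ckpt_dir = os.path.join(save_dir, name, version)      # /experiments/my_model/0/my_model/0

print(config_path)
print(ckpt_dir)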
I DO NOT like it. The following tree view shows this clearly.
However, there is no way to remove the duplicate hierarchy for now. If the second or third option is adopted, the config.yaml file will be saved in a directory that does not depend on the experiment version and will be overwritten by the next experiment. This is not acceptable.

Pitch

I'm expecting a resulting hierarchy like this:

I tried to fix this problem and was willing to submit a PR, but fixing this issue turned out to be more complicated than I imagined.
Additional context
Because I'm new to Lightning, I'm not sure I fully understand the Logger design. From the current code, several attributes such as root_dir, log_dir, and save_dir are not clearly defined. For example, in the definition of the lightning.fabric.loggers.tensorboard.TensorBoardLogger class, the experiment name is not included in the root_dir path, but for lightning.pytorch.loggers.tensorboard.TensorBoardLogger, it is included.
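If my reading of the two classes is right, their paths are assembled roughly like this (a sketch of my understanding with placeholder values, not code copied from Lightning):

import os

root = "/experiments"
name, version = "my_model", "version_0"  # placeholder values

# lightning.fabric.loggers.tensorboard.TensorBoardLogger, as I read it:
fabric_root_dir = root                               # experiment name NOT part of root_dir
fabric_log_dir = os.path.join(root, name, version)   # name/version only appear in log_dir

# lightning.pytorch.loggers.tensorboard.TensorBoardLogger, as I read it:
pl_root_dir = os.path.join(root, name)               # experiment name IS part of root_dir
pl_log_dir = os.path.join(pl_root_dir, version)

print(fabric_root_dir, fabric_log_dir)
print(pl_root_dir, pl_log_dir)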
The fact that the root_dir and log_dir properties can be None introduces unnecessary complexity. The documentation of log_dir says:
Return the root directory where all versions of an experiment get saved, or `None` if the logger does not save data locally.
If I understand it correctly, the properties root_dir and log_dir should never be None for any concrete logger. This is because config.yaml should always be saved somewhere like root_dir/name/version. Even if a logger does not save config.yaml locally, we can still initialize root_dir to the current directory ('.') and simply not use it.
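To sketch the convention I have in mind (purely illustrative; the class name ProposedLoggerBase and its defaults are my own, not Lightning's current API):

import os


class ProposedLoggerBase:
    """Hypothetical logger base where root_dir and log_dir are never None."""

    def __init__(self, root_dir: str = ".", name: str = "default", version: str = "0"):
        # Default to the current directory instead of None.
        self._root_dir = root_dir
        self._name = name
        self._version = version

    @property
    def root_dir(self) -> str:
        return self._root_dir

    @property
    def log_dir(self) -> str:
        # Always <root>/<name>/<version>, so config.yaml and checkpoints for
        # one run never collide with those of another run.
        return os.path.join(self._root_dir, self._name, self._version)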
I have two questions:
Why do we need the save_dir property at all? Why is it not sufficient to always set root_dir to <root>/name and log_dir to <root>/name/version?
What makes TensorBoardLogger and CSVLogger so special that Trainer.log_dir becomes logger.log_dir instead of logger.save_dir?

cc @justusschock @awaelchli