
Issues & feedback #32

Open
ad48hp opened this issue Apr 15, 2024 · 10 comments

ad48hp commented Apr 15, 2024

Hello,
I used `pip install torch===1.6.0 torchvision===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html`
instead of
`pip install torch===1.6.0+cpu torchvision===0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html`

I've read that the program can be installed without CUDA, so I guess both options are valid?

In #21 I read that a GPU is needed, but the installation guide shows that CUDA is optional:
https://github.com/ProGamerGov/dream-creator/blob/master/INSTALL.md
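For what it's worth, a quick way to check which build actually got installed and whether a GPU is visible:

```python
import torch

print(torch.__version__)          # e.g. "1.6.0+cpu" for the CPU-only build
print(torch.cuda.is_available())  # False on the CPU-only build or without a working GPU
```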

ad48hp changed the title from "Calculate the mean and standard deviation of your dataset" to "Issues & feedback" on Apr 15, 2024

ad48hp commented Apr 15, 2024

I tried the version with CUDA, and so far it works.
However, when I create images, they resemble the original GoogleNet more than my dataset.
How large should the dataset be?
I tried 200 and 2,000 images (2 classes), and the above happens.
[Images: out_new_10epochs, out_new_20epochs, out_old_50epochs]

```
2,000 image dataset; 2 classes - Test
Running optimization with ADAM
Iteration 25, Loss -4057.275146484375
Iteration 50, Loss -6080.12939453125
Iteration 75, Loss -7767.14990234375
Iteration 100, Loss -8851.4541015625

200 image dataset; 2 classes
Epoch 50/120
train Loss: 0.0021 Acc: 1.0000
Time Elapsed 15m 12s
val Loss: 0.0000 Acc: 1.0000
Time Elapsed 15m 12s

2,000 image dataset; 2 classes
Epoch 20/120
train Loss: 0.0103 Acc: 0.9988
Time Elapsed 13m 22s
val Loss: 0.0002 Acc: 1.0000
Time Elapsed 13m 28s
```


ad48hp commented Apr 15, 2024

I tried changing the layer freezing so that conv1 is frozen, and the result is not very different.
[Image: out]

I've tried a 10-class dataset comprising 1,000 images in total (100 per class), and it looks similar.

```
Epoch 10/120
train Loss: 0.6038 Acc: 0.8925
Time Elapsed 2m 26s
val Loss: 0.1755 Acc: 0.9500
Time Elapsed 2m 28s
```
[Image: out]


ad48hp commented Apr 17, 2024

Is the model trained starting from a pretrained GoogleNet?
If so, can you add an option to train the model from scratch?

I know this is an old project, but it has amazing potential.


ProGamerGov commented Apr 20, 2024

@ad48hp The base models were trained on millions of images, so I'm not sure how well training from scratch would work. All you would have to do is set all the layers to be trainable and zero out all the model values (or use some other initialization).

Also, as I discovered when I made this project, some neurons/channels are going to stay the same while finetuning, whereas others will change to match the new content.
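A minimal sketch of that idea, assuming `model` is the loaded network (`make_trainable_and_zero` is a hypothetical helper name; literal zeroing is shown here, though a proper random init is discussed below):

```python
import torch
import torch.nn as nn

def make_trainable_and_zero(model: nn.Module):
    # Make every parameter trainable again
    for param in model.parameters():
        param.requires_grad = True
    # Zero out all learned values (crude; a random init usually works better)
    with torch.no_grad():
        for param in model.parameters():
            param.zero_()
```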


ad48hp commented Apr 22, 2024

Can you upload a PyTorch script that sets all the layers to be trainable and zeros all the model values?
I'm not sure how to do this myself.


ProGamerGov commented Apr 29, 2024

@ad48hp Apologies for the late reply, but zeroing models is actually a bit of a bad idea. You generally want a range of values to initialize a brand-new model for training.

There are multiple options for initialization, and I am unsure which ones are best for training Inception v1 models: https://pytorch.org/docs/stable/nn.init.html

Here's a quick function that ChatGPT assisted me with, which should initialize the model so that you can train from scratch. Model layers generally have weight and bias components that you need to initialize, and you might need to do a bit of testing/research to see which initialization options are best.

```python
import torch.nn as nn

def initialize_model_weights(model):
    # Walk every submodule and re-initialize the learnable layers
    for module in model.modules():
        if isinstance(module, (nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.ConvTranspose2d)):
            # Xavier/Glorot uniform init for convolution weights, zero biases
            nn.init.xavier_uniform_(module.weight)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
        elif isinstance(module, nn.Linear):
            # Kaiming/He uniform init for fully connected weights, zero biases
            nn.init.kaiming_uniform_(module.weight)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
```

This code should remove all learned knowledge from the model.
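For reference, a hypothetical usage sketch: call it once on the constructed model, before the training loop starts (`model` stands for however the network is built, e.g. the project's InceptionV1_Caffe):

```python
import torch.nn as nn

# Re-initialize the already-built model so training starts from scratch
initialize_model_weights(model)

# Sanity check: conv biases should now be all zeros
first_conv = next(m for m in model.modules() if isinstance(m, nn.Conv2d))
if first_conv.bias is not None:
    print(first_conv.bias.abs().sum())  # tensor(0.) after initialization
```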

ProGamerGov commented

Also, if you aren't using a GPU then it's going to be painfully slow.
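A hypothetical fallback for machines without a GPU, matching the script's -use_device flag:

```python
import torch

# Pick the GPU when present, otherwise fall back to the CPU
use_device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
```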


ad48hp commented May 19, 2024

Can you write complete code for the init?
I don't really know where to put this code.


ad48hp commented Jul 25, 2024

I tried to add the code like the following (sorry for the incorrect formatting):

```python
import argparse
import torch
import torch.optim as optim

from utils.training_utils import save_model, load_dataset, reset_weights, set_seed, load_checkpoint, setup_model
from utils.inceptionv1_caffe import InceptionV1_Caffe
from utils.train_model import train_model

import torch.nn as nn

def initialize_model_weights(model):
    for module in model.modules():
        if isinstance(module, (nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.ConvTranspose2d)):
            nn.init.xavier_uniform_(module.weight)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
        elif isinstance(module, nn.Linear):
            nn.init.kaiming_uniform_(module.weight)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)

def main():
    parser = argparse.ArgumentParser()
    # Input options
    parser.add_argument("-data_path", help="Path to your dataset", type=str, default='')
    parser.add_argument("-model_file", type=str, default='models/pt_bvlc.pth')
    parser.add_argument("-data_mean", type=str, default='')
    parser.add_argument("-data_sd", type=str, default='')
    parser.add_argument("-base_model", choices=['bvlc', 'p365', '5h'], default='bvlc')

    # Training options
    parser.add_argument("-num_epochs", type=int, default=120)
    parser.add_argument("-batch_size", type=int, default=32)
    parser.add_argument("-lr", "-learning_rate", type=float, default=1e-2)
    parser.add_argument("-optimizer", choices=['sgd', 'adam'], default='sgd')
    parser.add_argument("-train_workers", type=int, default=0)
    parser.add_argument("-val_workers", type=int, default=0)
    parser.add_argument("-balance_classes", action='store_true')

    # Output options
    parser.add_argument("-save_epoch", type=int, default=5)
    parser.add_argument("-output_name", type=str, default='bvlc_out.pth')
    parser.add_argument("-individual_acc", action='store_true')
    parser.add_argument("-save_csv", action='store_true')
    parser.add_argument("-csv_dir", type=str, default='')

    # Other options
    parser.add_argument("-not_caffe", action='store_true')
    parser.add_argument("-use_device", type=str, default='cuda:0')
    parser.add_argument("-seed", type=int, default=-1)

    # Dataset options
    parser.add_argument("-val_percent", type=float, default=0.2)

    # Model options
    parser.add_argument("-reset_weights", action='store_true')
    parser.add_argument("-delete_branches", action='store_true')
    parser.add_argument("-freeze_aux1_to", choices=['none', 'loss_conv', 'loss_fc', 'loss_classifier'], default='none')
    parser.add_argument("-freeze_aux2_to", choices=['none', 'loss_conv', 'loss_fc', 'loss_classifier'], default='none')
    parser.add_argument("-freeze_to", choices=['none', 'conv1', 'conv2', 'conv3', 'mixed3a', 'mixed3b', 'mixed4a', 'mixed4b', 'mixed4c', 'mixed4d', 'mixed4e', 'mixed5a', 'mixed5b'], default='mixed3b')
    parser.add_argument("-toggle_layers", type=str, default='none')
    params = parser.parse_args()
    main_func(params)

def main_func(params):
    assert params.data_mean != '', "-data_mean is required"
    assert params.data_sd != '', "-data_sd is required"
    params.data_mean = [float(m) for m in params.data_mean.split(',')]
    params.data_sd = [float(s) for s in params.data_sd.split(',')]

    if params.seed > -1:
        set_seed(params.seed)
    rnd_generator = torch.Generator(device='cpu') if params.seed > -1 else None

    # Setup image training data
    training_data, num_classes, class_weights = load_dataset(data_path=params.data_path, val_percent=params.val_percent, batch_size=params.batch_size, \
                                                             input_mean=params.data_mean, input_sd=params.data_sd, use_caffe=not params.not_caffe, \
                                                             train_workers=params.train_workers, val_workers=params.val_workers, balance_weights=params.balance_classes, \
                                                             rnd_generator=rnd_generator)

    # Setup model definition
    cnn, is_start_model, base_model = setup_model(params.model_file, num_classes=num_classes, base_model=params.base_model, pretrained=not params.reset_weights)

    if params.optimizer == 'sgd':
        optimizer = optim.SGD(cnn.parameters(), lr=params.lr, momentum=0.9)
    elif params.optimizer == 'adam':
        optimizer = optim.Adam(cnn.parameters(), lr=params.lr)

    lrscheduler = optim.lr_scheduler.StepLR(optimizer, step_size=8, gamma=0.96)

    if params.balance_classes:
        criterion = torch.nn.CrossEntropyLoss(weight=class_weights.to(params.use_device))
    else:
        criterion = torch.nn.CrossEntropyLoss()

    # Maybe delete branches
    if params.delete_branches and not is_start_model:
        try:
            cnn.remove_branches()
            has_branches = False
        except:
            has_branches = True
    else:
        has_branches = True

    # Load pretrained model weights
    start_epoch = 1
    if not params.reset_weights:
        cnn, optimizer, lrscheduler, start_epoch = load_checkpoint(cnn, params.model_file, optimizer, lrscheduler, num_classes, is_start_model=is_start_model)

    if params.delete_branches and is_start_model:
        try:
            cnn.remove_branches()
            has_branches = False
        except:
            has_branches = True
    else:
        has_branches = True

    # Maybe freeze some model layers
    main_layer_list = ['conv1', 'conv2', 'conv3', 'mixed3a', 'mixed3b', 'mixed4a', 'mixed4b', 'mixed4c', 'mixed4d', 'mixed4e', 'mixed5a', 'mixed5b']
    if params.freeze_to != 'none':
        for layer in main_layer_list:
            if params.freeze_to == layer:
                break
            for param in getattr(cnn, layer).parameters():
                param.requires_grad = False
    branch_layer_list = ['loss_conv', 'loss_fc', 'loss_classifier']
    if params.freeze_aux1_to != 'none' and has_branches:
        for layer in branch_layer_list:
            if params.freeze_aux1_to == layer:
                break
            for param in getattr(getattr(cnn, 'aux1'), layer).parameters():
                param.requires_grad = False
    if params.freeze_aux2_to != 'none' and has_branches:
        for layer in branch_layer_list:
            if params.freeze_aux2_to == layer:
                break
            for param in getattr(getattr(cnn, 'aux2'), layer).parameters():
                param.requires_grad = False

    # Optionally freeze/unfreeze specific layers and sub layers
    if params.toggle_layers != 'none':
        toggle_layers = [l.replace('\\', '/').replace('.', '/').split('/') for l in params.toggle_layers.split(',')]
        for layer in toggle_layers:
            if len(layer) == 2:
                for param in getattr(getattr(cnn, layer[0]), layer[1]).parameters():
                    param.requires_grad = not param.requires_grad  # toggle (the pasted version always set False)
            else:
                for param in getattr(cnn, layer[0]).parameters():
                    param.requires_grad = not param.requires_grad  # toggle (the pasted version always set False)

    n_learnable_params = sum(param.numel() for param in cnn.parameters() if param.requires_grad)
    print('Model has ' + "{:,}".format(n_learnable_params) + ' learnable parameters\n')

    cnn = cnn.to(params.use_device)
    if 'cuda' in params.use_device:
        if params.seed > -1:
            torch.backends.cudnn.benchmark = True
        torch.backends.cudnn.enabled = True

    # Re-initialize all weights, wiping anything loaded above
    initialize_model_weights(cnn)
    save_info = [[params.data_mean, params.data_sd, 'BGR'], num_classes, has_branches, base_model]

    # Train model
    train_model(model=cnn, dataloaders=training_data, criterion=criterion, optimizer=optimizer, lrscheduler=lrscheduler, \
                num_epochs=params.num_epochs, start_epoch=start_epoch, save_epoch=params.save_epoch, output_name=params.output_name, \
                device=params.use_device, has_branches=has_branches, fc_only=False, num_classes=num_classes, individual_acc=params.individual_acc, \
                should_save_csv=params.save_csv, csv_path=params.csv_dir, save_info=save_info)

if __name__ == "__main__":
    main()
```

But the images look noisy.

[Images: fc_c0000_e005, fc_c0009_e005]

Any clue why?

ProGamerGov commented

@ad48hp Not sure if there's an issue, but that's often what it looks like at first when training from scratch. How many images are you using, and how many steps have you trained for?

The model architecture itself is also over 10 years old at this point (https://arxiv.org/abs/1409.4842), so there could be unforeseen problems.
