Testing dictionary learning techniques against a perceptron and an MLP, using the BACH and PatchCamelyon datasets.
- Create a virtual environment (suggested) using virtualenv or virtualenvwrapper
- Activate your virtual environment
- Install the dependencies:
  pip install -r requirements.txt --use-feature=2020-resolver --no-cache-dir
- Install the right PyTorch version for your CUDA version. To see which CUDA version you have, run:
  nvcc -V
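If you want to pick the matching PyTorch build programmatically, the release number can be parsed out of the `nvcc -V` text; a minimal sketch (the sample output below is illustrative, yours may differ):

```python
import re

# Sample `nvcc -V` output (illustrative; your output may differ slightly)
nvcc_output = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152"""

# Grab the "release X.Y" token to choose the matching PyTorch wheel
match = re.search(r'release (\d+\.\d+)', nvcc_output)
cuda_version = match.group(1) if match else None
print(cuda_version)  # -> 11.2
```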
- Download your datasets
- Copy settings.py.template into settings.py and set the general configuration settings properly:
  $ cp settings.py.template settings.py
from utils.datasets.bach import RescaleResize
# BACH only
RescaleResize(.0625)() # rescales using .0625 scaling factor
RescaleResize((100, 100, 3))() # resizes to (100, 100, 3)
Note: See class definition to pass the correct parameters
Don't forget to update the path of settings.TRAIN_PHOTOS_DATASET. E.g., if you rescaled by .0625, then you have to update your settings like this:
TRAIN_PHOTOS_DATASET = os.path.join(BASE_DATASET_LINK, 'ICIAR2018_BACH_Challenge', 'Photos_0.0625')
from utils.datasets.bach import TrainValTestSplit
# BACH only
TrainValTestSplit()()
Note: See class definition to pass the correct parameters
from utils.datasets.bach import WholeImage
# BACH only
WholeImage()()
Note: See class definition to pass the correct parameters
from utils.datasets.bach import MiniPatch
# BACH only
MiniPatch()()
Note: See class definition to pass the correct parameters
Use it right after executing MiniPatch()().
from utils.datasets.bach import SelectNRandomPatches
# BACH only
SelectNRandomPatches(100)()
import os
import settings
from utils.datasets.bach import plot_n_first_json_images
plot_n_first_json_images(5, os.path.join(settings.OUTPUT_FOLDER, settings.TRAIN_FOLDER_NAME),
(9, 9), carousel=True, remove_axes=False, dpi=100)
Note: See function definition to pass the correct parameters
from dl_models.fine_tuned_resnet_18.models import TransferLearningResnet18
# BACH only
# example 1: Train a resnet18 using fine tuning
model = TransferLearningResnet18(fine_tune=True)
model.training_data_plot_grid()
model.train(num_epochs=25)
model.save('fine_tuned_resnet18.pt')
model.visualize_model()
model.test()
# example 2: Load a resnet18 as a fixed feature extractor
# download and save the pre-trained resnet 18
model = TransferLearningResnet18(fine_tune=False)
model.save('resnet18_feature_extractor.pt')
# Load the fixed feature extractor
model2 = TransferLearningResnet18(fine_tune=False)
model2.load('resnet18_feature_extractor.pt')
model2.visualize_model()
model2.test()
If your images are big, consider using the RescaleResize and/or MiniPatch classes to reduce their dimensionality and avoid memory issues.
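For intuition, the mini-patch idea boils down to tiling each image into fixed-size crops; a self-contained numpy sketch (illustration only; the repo's MiniPatch class works on image files and has its own parameters):

```python
import numpy as np

def extract_patches(img, patch_size):
    """Cut an HxWxC array into non-overlapping patch_size x patch_size tiles.

    Illustration of the mini-patch idea only; see utils.datasets.bach.MiniPatch
    for the real implementation."""
    h, w = img.shape[:2]
    return [
        img[r:r + patch_size, c:c + patch_size]
        for r in range(0, h - patch_size + 1, patch_size)
        for c in range(0, w - patch_size + 1, patch_size)
    ]

img = np.zeros((512, 512, 3))          # one 512x512 RGB image
patches = extract_patches(img, 64)     # 8 x 8 grid of 64x64 tiles
print(len(patches))                    # -> 64
print(patches[0].shape)                # -> (64, 64, 3)
```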
from gtorch_utils.constants import DB
from utils.datasets.bach import RawImages
from constants.constants import ProcessImageOption, Label, PCamLabel, PCamSubDataset
# for Bach
ri = RawImages(process_method=ProcessImageOption.GRAYSCALE, label_class=Label, sub_datasets=DB)
# for PatchCamelyon
ri = RawImages(process_method=ProcessImageOption.GRAYSCALE, label_class=PCamLabel, sub_datasets=PCamSubDataset)
ri.create_datasets_for_LC_KSVD('my_raw_dataset.json')
Note: See function definition to pass the correct parameters
from gtorch_utils.constants import DB
from utils.datasets.bach import RandomFaces
from constants.constants import ProcessImageOption, Label, PCamLabel, PCamSubDataset
# BACH
# Requires all images to have the same width & height so without applying RescaleResize execute the
# TrainValTestSplit()(), then in the settings make sure CUT_SIZE = 512; finally, create minipatches
# MiniPatch()(). Now you'll be able to apply the RandomFaces feature extractor. (of course you can
# change the 512 value, preferably choose a multiple of 32)
randfaces = RandomFaces(img_height=512, img_width=512, process_method=ProcessImageOption.GRAYSCALE, label_class=Label, sub_datasets=DB)
# PatchCamelyon
# if you ran HDF5_2_PNG with only_center=True then the images are 32x32, otherwise they will be 96x96
randfaces = RandomFaces(img_height=32, img_width=32, process_method=ProcessImageOption.GRAYSCALE, label_class=PCamLabel, sub_datasets=PCamSubDataset)
randfaces.create_datasets_for_LC_KSVD('my_randface_dataset.json')
Note: See function definition to pass the correct parameters
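For reference, random-faces features are commonly computed as a fixed random projection of the flattened grayscale image; a hedged numpy sketch of that interpretation (the matrix shape and row normalisation are assumptions, not the repo's exact implementation):

```python
import numpy as np

# Random-projection sketch of the RandomFaces idea (assumed interpretation)
rng = np.random.default_rng(42)
img_height, img_width, n_features = 32, 32, 256

# Fixed random projection matrix; each row is one normalised "random face"
R = rng.standard_normal((n_features, img_height * img_width))
R /= np.linalg.norm(R, axis=1, keepdims=True)

image = rng.random((img_height, img_width))  # stand-in grayscale image
features = R @ image.ravel()                 # project the flattened image
print(features.shape)                        # -> (256,)
```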
This feature extractor requires a learned dictionary D. Thus, you should first train your dictionary learning algorithm (e.g. LC-KSVD1 or LC-KSVD2), save the learned dictionary as a NumPy file (np.save('D.npy', D, False)), and finally use the learned dictionary D to create the sparse codes.
import numpy as np
from sklearn.metrics import accuracy_score
from gtorch_utils.constants import DB
from lc_ksvd.dksvd import DKSVD
from constants.constants import ProcessImageOption, Label, PCamLabel, \
    PCamSubDataset, CodeType
from utils.datasets.bach import SparseCodes
from utils.utils import load_codes
# GETTING LEARNED DICTIONARY ##############################################
test = load_codes('my_raw_dataset_test.json', type_=CodeType.RAW)
train = load_codes('my_raw_dataset_train.json', type_=CodeType.RAW)
val = load_codes('my_raw_dataset_val.json', type_=CodeType.RAW)
SPARSITYTHRES = 15
lcksvd = DKSVD(
sparsitythres=SPARSITYTHRES, dictsize=train['labels'].shape[0]*SPARSITYTHRES, timeit=True,
sqrt_alpha=.0012, sqrt_beta=.0012, tol=1e-6, iterations=50, iterations4ini=20
)
Dinit, Tinit_T, Winit_T, Q = lcksvd.initialization4LCKSVD(*train.values())
np.save('Dinit.npy', Dinit, False)
np.save('Tinit_T.npy', Tinit_T, False)
np.save('Winit_T.npy', Winit_T, False)
np.save('Q.npy', Q, False)
D, X, T, W = lcksvd.labelconsistentksvd2(train['codes'], Dinit, train['labels'], Q, Tinit_T, Winit_T)
np.save('D.npy', D, False)
np.save('X.npy', X, False)
np.save('T.npy', T, False)
np.save('W.npy', W, False)
predictions, gamma = lcksvd.classification(D, W, test['codes'])
print('\nFinal recognition rate for LC-KSVD2 is : {0:.4f}'.format(
accuracy_score(np.argmax(test['labels'], axis=0), predictions)))
# CREATING SPARSE CODES ###################################################
# for BACH
ri = SparseCodes(
process_method=ProcessImageOption.GRAYSCALE, label_class=Label, sub_datasets=DB,
sparse_coding=DKSVD.get_sparse_representations,
sparse_coding_kwargs=dict(D=np.load('D.npy'), sparsitythres=15)
)
# for PatchCamelyon
ri = SparseCodes(
process_method=ProcessImageOption.GRAYSCALE, label_class=PCamLabel, sub_datasets=PCamSubDataset,
sparse_coding=DKSVD.get_sparse_representations,
sparse_coding_kwargs=dict(D=np.load('D.npy'), sparsitythres=15)
)
ri.create_datasets_for_LC_KSVD('sparse_codes_dataset.json')
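The dictionaries returned by load_codes appear to store codes as (features, samples) and labels as one-hot (classes, samples) arrays, given the np.argmax(..., axis=0) calls above; a tiny numpy illustration of that layout and of the accuracy computation (without sklearn):

```python
import numpy as np

# One-hot labels laid out as (classes, samples), matching np.argmax(labels, axis=0)
labels = np.array([
    [1, 0, 0, 1],   # class 0
    [0, 1, 0, 0],   # class 1
    [0, 0, 1, 0],   # class 2
])
true_classes = np.argmax(labels, axis=0)
print(true_classes)  # -> [0 1 2 0]

# Recognition rate against some predictions, computed by hand
predictions = np.array([0, 1, 2, 2])
accuracy = np.mean(predictions == true_classes)
print('{0:.4f}'.format(accuracy))  # -> 0.7500
```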
from dl_models.fine_tuned_resnet_18.models import TransferLearningResnet18
# BACH only
model = TransferLearningResnet18(fine_tune=True)
model.load('fine_tuned_resnet18.pt')
model.create_datasets_for_LC_KSVD('my_cnn_dataset.json')
Note: See function definition to pass the correct parameters
from utils.utils import load_codes
from constants.constants import CodeType
# Choose the right code type based on constants.constants.CodeType
test = load_codes('my_cnn_dataset_test.json', type_=CodeType.CNN)
print(test['codes'].shape)
print(test['labels'].shape)
train = load_codes('my_cnn_dataset_train.json', type_=CodeType.CNN)
print(train['codes'].shape)
print(train['labels'].shape)
val = load_codes('my_cnn_dataset_val.json', type_=CodeType.CNN)
print(val['codes'].shape)
print(val['labels'].shape)
Note: See function definition to pass the correct parameters
import numpy as np
from lc_ksvd.dksvd import DKSVD
from sklearn.metrics import accuracy_score
from constants.constants import CodeType
from utils.utils import load_codes
train = load_codes('my_cnn_dataset_train.json', type_=CodeType.CNN)
val = load_codes('my_cnn_dataset_val.json', type_=CodeType.CNN)
test = load_codes('my_cnn_dataset_test.json', type_=CodeType.CNN)
lcksvd = DKSVD(dictsize=570, timeit=True)
Dinit, Tinit_T, Winit_T, Q = lcksvd.initialization4LCKSVD(*train.values())
D, X, T, W = lcksvd.labelconsistentksvd1(train['codes'], Dinit, train['labels'], Q, Tinit_T)
predictions, gamma = lcksvd.classification(D, W, test['codes'])
print('\nFinal recognition rate for LC-KSVD1 is : {0:.4f}'.format(
accuracy_score(np.argmax(test['labels'], axis=0), predictions)))
Note: See function definition to pass the correct parameters
import numpy as np
from lc_ksvd.dksvd import DKSVD
from sklearn.metrics import accuracy_score
from constants.constants import CodeType
from utils.utils import load_codes
train = load_codes('my_cnn_dataset_train.json', type_=CodeType.CNN)
val = load_codes('my_cnn_dataset_val.json', type_=CodeType.CNN)
test = load_codes('my_cnn_dataset_test.json', type_=CodeType.CNN)
lcksvd = DKSVD(dictsize=570, timeit=True)
Dinit, Tinit_T, Winit_T, Q = lcksvd.initialization4LCKSVD(*train.values())
D, X, T, W = lcksvd.labelconsistentksvd2(train['codes'], Dinit, train['labels'], Q, Tinit_T, Winit_T)
predictions, gamma = lcksvd.classification(D, W, test['codes'])
print('\nFinal recognition rate for LC-KSVD2 is : {0:.4f}'.format(
accuracy_score(np.argmax(test['labels'], axis=0), predictions)))
Note: See function definition to pass the correct parameters
import numpy as np
from lc_ksvd.dksvd import DKSVD
from sklearn.metrics import accuracy_score
from constants.constants import CodeType
from utils.utils import load_codes
train = load_codes('my_cnn_dataset_train.json', type_=CodeType.CNN)
val = load_codes('my_cnn_dataset_val.json', type_=CodeType.CNN)
test = load_codes('my_cnn_dataset_test.json', type_=CodeType.CNN)
lcksvd = DKSVD(dictsize=570, timeit=True)
Dinit, Winit = lcksvd.initialization4DKSVD(*train.values())
predictions, gamma = lcksvd.classification(Dinit, Winit, train['codes'])
print('\nFinal recognition rate for D-KSVD is : {0:.4f}'.format(
accuracy_score(np.argmax(train['labels'], axis=0), predictions)))
Note: See function definition to pass the correct parameters
from constants.constants import CodeType
from dl_models.fine_tuned_resnet_18.models import TransferLearningResnet18
from utils.datasets.bach import BACHDataset, BachTorchNetDataset
# BACH only
# Train by reading images from disk
model = TransferLearningResnet18(fine_tune=True)
# or train by reading extracted codes (e.g. using raw codes from 64x64 mini-patches)
model = TransferLearningResnet18(
fine_tune=True, dataset_handler=BachTorchNetDataset,
dataset_kwargs=dict(
code_type=CodeType.RAW,
filename_pattern='my_raw_dataset.json',
original_shape=(64, 64)
)
)
#
model.training_data_plot_grid()
model.train(num_epochs=25)
model.save('fine_tuned_resnet18.pt')
model.visualize_model()
model.test()
Note: See function definition to pass the correct parameters
from collections import OrderedDict
from gtorch_utils.constants import DB
from gtorch_utils.models.managers import ModelMGR
from gtorch_utils.models.perceptrons import Perceptron
from torch import optim
from constants.constants import CodeType
from utils.datasets.bach import BachTorchDataset
from utils.utils import load_codes
test = load_codes('my_raw_dataset_test.json', type_=CodeType.RAW)
ModelMGR(
cuda=True,
model=Perceptron(test['codes'].shape[0], test['labels'].shape[0]),
sub_datasets=DB, # PCamSubDataset
dataset=BachTorchDataset, # PCamTorchDataset
dataset_kwargs=dict(filename_pattern='my_raw_dataset.json', code_type=CodeType.RAW),
batch_size=6,
shuffe=False,
num_workers=16,
optimizer=optim.SGD,
optimizer_kwargs=dict(lr=1e-3, momentum=.9),
lr_scheduler=None,
lr_scheduler_kwargs={},
epochs=600,
earlystopping_kwargs=dict(min_delta=1e-5, patience=15),
checkpoints=False,
checkpoint_interval=5,
checkpoint_path=OrderedDict(directory_path='tmp', filename=''),
saving_details=OrderedDict(directory_path='tmp', filename='best_model.pth'),
tensorboard=True
)()
from collections import OrderedDict
from gtorch_utils.constants import DB
from gtorch_utils.models.managers import ModelMGR
from gtorch_utils.models.perceptrons import MLP
from torch import optim
from constants.constants import CodeType
from utils.datasets.bach import BachTorchDataset
from utils.utils import load_codes
test = load_codes('my_raw_dataset_test.json', type_=CodeType.RAW)
ModelMGR(
cuda=True,
model=MLP(
test['codes'].shape[0], test['codes'].shape[0],
test['labels'].shape[0], dropout=.25, sigma=.1
),
sub_datasets=DB, # PCamSubDataset
dataset=BachTorchDataset, # PCamTorchDataset
dataset_kwargs=dict(filename_pattern='my_raw_dataset.json', code_type=CodeType.RAW),
batch_size=6,
shuffe=False,
num_workers=16,
optimizer=optim.SGD,
optimizer_kwargs=dict(lr=1e-4, momentum=.9),
lr_scheduler=None,
lr_scheduler_kwargs={},
epochs=200,
earlystopping_kwargs=dict(min_delta=1e-6, patience=15),
checkpoints=False,
checkpoint_interval=5,
checkpoint_path=OrderedDict(directory_path='tmp'),
saving_details=OrderedDict(directory_path='tmp', filename='best_model.pth'),
tensorboard=True
)()
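For intuition, a rough numpy sketch of the forward pass of an MLP like the one above (architecture assumed from the constructor arguments: input layer, one hidden layer of the same size, ReLU in between; the real gtorch_utils.models.perceptrons.MLP may differ):

```python
import numpy as np

# Toy forward pass: input -> hidden (same size) -> output logits
rng = np.random.default_rng(0)
n_features, n_classes, batch = 64, 4, 6

W1 = rng.standard_normal((n_features, n_features)) * 0.1  # hidden layer weights
W2 = rng.standard_normal((n_features, n_classes)) * 0.1   # output layer weights

x = rng.random((batch, n_features))   # a batch of feature codes
hidden = np.maximum(x @ W1, 0)        # ReLU activation
logits = hidden @ W2                  # class scores
print(logits.shape)                   # -> (6, 4)
```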
import numpy as np
from lc_ksvd.constants import PlotFilter
from lc_ksvd.dksvd import DKSVD
from lc_ksvd.utils.plot_tools import LearnedRepresentationPlotter
from constants.constants import Label, COLOURS, CodeType
from utils.utils import load_codes
train = load_codes('my_cnn_dataset_train.json', type_=CodeType.CNN)
val = load_codes('my_cnn_dataset_val.json', type_=CodeType.CNN)
test = load_codes('my_cnn_dataset_test.json', type_=CodeType.CNN)
lcksvd = DKSVD(dictsize=570, timeit=True)
Dinit, Tinit_T, Winit_T, Q = lcksvd.initialization4LCKSVD(*train.values())
D, X, T, W = lcksvd.labelconsistentksvd2(train['codes'], Dinit, train['labels'], Q, Tinit_T, Winit_T)
predictions, gamma = lcksvd.classification(D, W, test['codes'])
LearnedRepresentationPlotter(predictions=predictions, gamma=gamma, label_index=Label.INDEX, custom_colours=COLOURS)(simple='')
LearnedRepresentationPlotter(predictions=predictions, gamma=gamma, label_index=Label.INDEX, custom_colours=COLOURS)(file_saving_name='myimage')
LearnedRepresentationPlotter(predictions=predictions, gamma=gamma, label_index=Label.INDEX, custom_colours=COLOURS)( filter_by=PlotFilter.UNIQUE, marker='.')
Note: See class definition to pass the correct parameters
from lc_ksvd.dksvd import DKSVD
from lc_ksvd.utils.plot_tools import AtomsPlotter
from constants.constants import Label, COLOURS, CodeType
from utils.utils import load_codes
train = load_codes('my_cnn_dataset_train.json', type_=CodeType.CNN)
val = load_codes('my_cnn_dataset_val.json', type_=CodeType.CNN)
test = load_codes('my_cnn_dataset_test.json', type_=CodeType.CNN)
lcksvd = DKSVD(dictsize=570, timeit=True)
Dinit, Tinit_T, Winit_T, Q = lcksvd.initialization4LCKSVD(*train.values())
D, X, T, W = lcksvd.labelconsistentksvd2(train['codes'], Dinit, train['labels'], Q, Tinit_T, Winit_T)
predictions, gamma = lcksvd.classification(D, W, test['codes'])
AtomsPlotter(dictionary=D, img_width=128, img_height=96, n_rows=10, n_cols=16)()
Note: See class definition to pass the correct parameters
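Plotting atoms amounts to reshaping each dictionary column back into an image; a minimal numpy illustration (shapes match the img_width/img_height arguments above; the atom count is arbitrary):

```python
import numpy as np

# Each dictionary column is one atom: a flattened img_height x img_width image
img_width, img_height = 128, 96
rng = np.random.default_rng(0)
D = rng.random((img_width * img_height, 160))  # stand-in dictionary, 160 atoms

atom0 = D[:, 0].reshape(img_height, img_width)  # back to image shape for plotting
print(atom0.shape)  # -> (96, 128)
```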
If you ran the Perceptron or MLP with tensorboard=True, then you can run TensorBoard to see a nice plot of the training process:
sudo chmod +x run_tensorboard.sh
./run_tensorboard.sh
Once you have downloaded the dataset and updated your settings file properly, you have to adapt/format the PCam dataset. Then, you can use any of the tools defined from the BACH sub-section Plot/Save images from json images onwards (inclusive).
- HDF5 to PNG
Update the path of settings.BASE_DATASET_LINK and set
settings.TRAIN_PHOTOS_DATASET = os.path.join(BASE_DATASET_LINK, 'images')
before running it.
from utils.datasets.pcam import HDF5_2_PNG
HDF5_2_PNG(only_center=True)()
- Format split dataset provided by PatchCamelyon
Set
settings.TRAIN_PHOTOS_DATASET = os.path.join(BASE_DATASET_LINK, 'images')
before running it.
from utils.datasets.pcam import FormatProvidedDatasetSplits
FormatProvidedDatasetSplits()()
- Create ROI files
  Using whole images:
from utils.datasets.pcam import WholeImage
WholeImage()()
Once in a blue moon the .pyc files do not get updated properly, or there are very weird errors. When running out of ideas, you can sometimes fix it by removing all the .pyc files:
sudo chmod +x delete_pyc.sh
./delete_pyc.sh
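If the shell script is not available (e.g. on Windows), a minimal Python equivalent could look like this (assuming delete_pyc.sh just removes *.pyc files recursively):

```python
from pathlib import Path

def delete_pyc(root='.'):
    """Recursively delete all compiled .pyc files under root.

    A Python stand-in for delete_pyc.sh (assumed behaviour); returns the
    number of files removed."""
    removed = 0
    for pyc in Path(root).rglob('*.pyc'):
        pyc.unlink()
        removed += 1
    return removed

# Usage: delete_pyc('.')  # clean the current project tree
```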