Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tensorflow ? #41

Closed
NicoKiaru opened this issue Jul 14, 2023 · 19 comments
Closed

Update tensorflow ? #41

NicoKiaru opened this issue Jul 14, 2023 · 19 comments

Comments

@NicoKiaru
Copy link

Hello !

I'm trying to see if I can update some dependencies of abba_python (pyimagej, python version, etc.) but DeepSlice relies on tensorflow 1.15 and this seems to be the bottleneck here.

I know that such an update is complicated. Do you plan to update tensorflow at some point, or is it such a pain that this won't happen ?

If this will not happen, I'll probably try (one day) to go for a client-(local) server architecture.

Cheers,

Nico

@NicoKiaru
Copy link
Author

Tensorflow 1.15 -> max python version = 3.7 (tensorflow/tensorflow#39768 (comment))

@PolarBean
Copy link
Owner

I have tried to do this, as has a user seperately. There is a bug in keras which prevents the model from being loaded properly in more recent versions of keras and the keras team has stated they dont have the resources to patch it.

keras-team/tf-keras#75

The plan now is to train future versions of DeepSlice in tf 2.0, and then phase out the current models. But theres no timeline on that at present.

@NicoKiaru
Copy link
Author

Ok, good to know! I'll see what I can do.

@NicoKiaru
Copy link
Author

What do you think if the idea of building a small command line interface around DeepSlice ? I do not know how complex that is, but this could help having DeepSlice available while using more recent python versions in difference environments. There's also the fancy client-server architecture with a local server, (that you also have implement on the deepslice website), but at first sight it looks more complex...

@PolarBean
Copy link
Owner

Would this be preferable to an API? I know the API has been planned for some time but I want to get started soon.

@NicoKiaru
Copy link
Author

Would this be preferable to an API?

You mean an API to the DeepSlice server ? Or an API to the current DeepSlice code (in which case you already have one, no?)

Just in case that's useful this tool by @ksugar shows a nice example of client server architecture (https://github.com/ksugar/qupath-extension-sam) that's enabling to use SAM from another application (QuPath).

To me the server way looks a bit more complex, that's why I suggested a CLI, but I do not have a strong preference. If anything, I like if things are simple to install...

@PolarBean
Copy link
Owner

API to the DeepSlice server :) i think thats actually the easiest as it already has a backend that accepts images from the web page, i just need to modify that somewhat.

@paulbrodersen
Copy link

paulbrodersen commented Oct 3, 2023

Hey, me again. I also would appreciate the ability to update to a newer python version in my DeepSlice conda environment, as all other image processing in the same environment is extremely slow compared to running the same code with python 3.10 or above. I would like to avoid having separate environments for separate tasks in the pipeline, as then my users need to install multiple environments. As they are biologists, even making them install one environment is a bit of a chore.

I have looked into the issue a little bit, trying to see if I could come up with a workaround. The core issue seems to me a mismatch between the outputs in of the xception model between the two tensorflow versions / and the two corresponding versions of your code. However, looking at the git blame history of the source code of the xception model, I can't see any differences that would result in this issue, as most of the code seems to have been stable since it was first committed to the repo. This to me strongly suggests that the changes you made in your code result in the issue.

Specifically, I think that in your old code, the model._layers.pop() calls never actually worked as intended. As you noted yourself in one of your comments on the issue, the output shape of the xception model in your old code was identical to the output shape of the original model. And as one of the developers mentioned, model._layers.pop was never implemented for functional models. Hence I assume that your pop calls removed the layers from the list of layers (and hence changed the model summary), but didn't actually remove the references that handle the I/O flow.

Have you tried loading the weights with tf2 and with the xception model unaltered?

I would make a bet that it will work.

@PolarBean
Copy link
Owner

i have tried this in the past but just tried it again.
if you are interested here is what I changed:
image

And here is the resulting error trace:
image

Its an issue with tensorflow. The long term solution is moving this project to PyTorch as there hopefully wont be any TF1 to TF2 shenanigans. But thats a big investment.

@paulbrodersen
Copy link

Shame, thanks for trying though.

@NicoKiaru
Copy link
Author

Just FYI, I gave a try to the CLI and it's very easy. I wrote a small script that I put in the python lib:

https://github.com/BIOP/ijl-utilities-wrappers/blob/master/src/main/resources/deepslice_cli_v1.1.5.py

In the end, because DeepSlice does not require to transfer gigantic images, I'm pretty happy with it, data transfer is not a bottleneck. The conda env living separately prevents issues with the combination of tools, and (as far as I'm concerned), this simplifies a lot the matching with Java.

(side note: I've finally made the transition from xml parsing to json, but I've not released the new version yet. Anyway I'm ready to drop xml support, which was a pain)

@paulmthompson
Copy link

First: This package is awesome. Thank you so much for putting it together.

I appear to have a working version of DeepSlice with Tensorflow 2.13 and Python 3.10. I found a couple of bugs along the way that I will try to detail here. As you pointed out above, the main problem is that the interface you used to remove the xception top layers in Tensorflow 1.X is no longer available. Specifically these lines:

base_model._layers.pop()
base_model._layers.pop()

As you pointed out here #41 (comment) , this has some weird behavior. There appear to be two main issues here.

First, while this command does indeed remove layers from the _layers list in the model, it really is kind of hiding those layers and most likely taking them out of the training interface. However, by inspecting

base_model.__dict__

You will see that references to these layers still exists, and this is actually creating some kind of separate dead-end node path in parallel.

Second, I believe this double-layer-pop was intended to remove 1) the softmax layer, and then 2) the 1000 element dense prediction element underneath. Because the softmax is actually rolled into the dense prediction layer, this is actually removing (but not actually removing) the prediction and global average pooling layer beneath it. If we try to do this the normal interface in modern TF, we will error eventually during prediction

base_model = Xception(include_top=True, weights=xception_weights)
base_model = Model(inputs=base_model.input, outputs=base_model.layers[-3].output,name="xception")

Because the top feature tensor from this is 10 x 10 x 2048, not 2048 as intended.

As you pointed out in your keras issue, we can get weights loaded by removing only the top layer, and then using some combination of loading weight options so that they are loaded by name. But some layers aren't going to load, and the results are not going to be accurate. The issue here is that while the layer-popping method successfully removed the layer from our save file, the 1000 element prediction and softmax layer from Xception very much still exists. Indeed, if you look at your saved weights file for the first Dense layer you add, it is 1000 x 256, not 2048 x 256. So with the weight files as they exist, Keras will either not load the first Dense Layer you added, or not loaded the exception weights (because it wants the missing prediction layer). I fixed this by recreating the model and adding the prediction layer into the weight files like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras import Model
import tensorflow as tf
from tensorflow.keras.applications.xception import Xception

import h5py

allen_weight_path = '../../DeepSlice/metadata/weights/Allen_Mixed_Best.h5'
new_allen_weight_path = '../../DeepSlice/metadata/weights/Allen_Mixed_Best_2.h5'

with h5py.File(new_allen_weight_path,'w') as f_dest:
    with h5py.File(allen_weight_path,'r') as f_src:
        for x in f_src['xception'].items():
            name = x[0]
            f_src.copy(f_src['xception'][name],f_dest,name)

base_model = Xception(include_top=True, weights=xception_weights)

base_model = Model(inputs=base_model.input, outputs=base_model.output,name="xception")

model = Sequential()
model.add(base_model)
model.add(Dense(256, activation="relu"))
model.add(Dense(256, activation="relu"))
model.add(Dense(9, activation="linear"))

base_model.load_weights(new_allen_weight_path,by_name=True)
model.load_weights(allen_weight_path,by_name=True,skip_mismatch=True)

model.save_weights(new_allen_weight_path)

After doing this for allen and the synthetic weights, now my weight file matches the model created with the following in Tensorflow 2.x compliant format:

base_model = Xception(include_top=True, weights=xception_weights)

base_model = Model(inputs=base_model.input, outputs=base_model.output,name="xception")

if species == "rat":
        inputs = Input(shape=(299, 299, 3))
        base_model_layer = base_model(inputs, training=True)
        dense1_layer = Dense(256, activation="relu")(base_model_layer)
        dense2_layer = Dense(256, activation="relu")(dense1_layer)
        output_layer = Dense(9, activation="linear")(dense2_layer)
        model = Model(inputs=inputs, outputs=output_layer)
    else:
        model = Sequential()
        model.add(base_model)
        model.add(Dense(256, activation="relu"))
        model.add(Dense(256, activation="relu"))
        model.add(Dense(9, activation="linear"))

    if weights != None:
        model.load_weights(weights,by_name=True,skip_mismatch=True)

    return model

And if I run the example notebook, my results seem quite good (although not quite identical).

Interestingly, I am pretty confident that the prediction layer in the working model that gives such good results is actually the one loaded by default and was not fine tuned. The rest of your model kind of trained around it I guess? That also means there is a softmax wedged in the middle there too, which is probably subtracting a few percent from your possible performance.

I am happy to send the corrected H5 files of weights / make a pull request, etc. Let me know how I can help! We are very excited to start using this package.

Paul

@PolarBean
Copy link
Owner

Thats amazing, thank you. Insane that it is training with a softmax layer inbetween... this is really motivating to train a new model, as you mentioned theres a lot of performance being left on the table.

The only hesitation I have regarding updating is that it would be worsening the current models results until we can retrain. How possible do you think it is to replicate the current model in TF2 (softmax and all). The aim would be to replicate the current performance (within some reasonable margin).

This is a really great contribution, I cant thank you enough!

@paulmthompson
Copy link

The TF2 version of the model "as is" (with the softmax) has slightly different numbers on your example brain output, but "by eye" looks the same to me. I will send you an email with the TF2 compatible weights so that you can test on some of your validation data.

@rdaggs
Copy link

rdaggs commented Feb 28, 2024

Just wanted to note for anyone implementing this solution, there is one change I was hung up on.
On line 161 within neural_network.py
model.load_weights(secondary_weights)
becomes
model.load_weights(secondary_weights,by_name=True,skip_mismatch=True)
This finalizes the accessibility of the new weights.
Thought I would pass this along.
Thank you again Paul for your patch for this error.``

@anna-teruel
Copy link

Hello :)
Sorry to open this issue again. I am trying to install this package on my macos m1 and of course I'm having issues with tensorflow library. So far I think I managed to install it, but because the tensorflow library I use is different I also get issues with the weights. I found this issue and thank you @paulmthompson for taking time to write it so detailed. It helped me a lot. However, I have one question regarding your code. Where do you define the xception_weights in your first part of the code? I don't know if I'm explaining myself.

The way it's written it doesn't work and I both defined

xception_weights = 'imagenet'
xception_weights = None

Just to make it work, but when I try to run the predictions (based on the provided demo notebook) I get the following error

ValueError: Weight count mismatch for layer #0 (named xception in the current model, xception in the save file). Layer expects 236 weight(s). Received 234 saved weight(s)

I assume it's because of setting xception_weights to imagenet or None, but if anyone thinks it's not related, I would like to know, because perhaps there's something I am missing because I don't understand. Any help is more than welcome :)

Big thanks!

@paulmthompson
Copy link

I had to make some modifications to the weight file and then use that new weight file to be TF2 compatible. That is probably why there is a layer name that you don't recognize.

The link to downloading those new weights may not have been merged (I just got back from a long paternity leave and I'm not sure quite sure where this was left off).

Could you email me and I can send you a download link to the new TF2 weights file in the meantime?

@wjguan
Copy link
Contributor

wjguan commented May 23, 2024

This should be addressed by PR #52

@PolarBean
Copy link
Owner

Solved by @wjguan :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants