Can not reproduce BraTS 2020 results. #21
Please check your BraTS2020 data; you can also plot some input samples during training (in the training_step function) to confirm that the training data is correct. |
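One lightweight way to do such a check is to print basic statistics for a few samples from inside the training step. The sketch below is only illustrative; the tensor shapes and the function name are assumptions, not the repo's actual code:

```python
import numpy as np

def check_training_sample(image, label, step):
    """Print basic statistics for one training sample so obvious data
    problems (all-zero volumes, unexpected label values) show up early.
    Assumes `image` has shape (C, D, H, W) and `label` has shape
    (D, H, W) with BraTS labels {0, 1, 2, 4} (or {0, 1, 2, 3} after
    remapping)."""
    image = np.asarray(image)
    label = np.asarray(label)
    stats = {
        "step": step,
        "image_min": float(image.min()),
        "image_max": float(image.max()),
        "label_values": sorted(float(v) for v in np.unique(label)),
    }
    print(stats)
    return stats
```

Calling this for the first few batches in training_step (and optionally saving a middle slice with matplotlib) quickly reveals whether two copies of the dataset were preprocessed differently.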
Thank you for your experiments! |
Hi, thanks for sharing your results @Devil-Ideal . Could you please share your environment settings? For example, your Python version and torch version. Also, did you train the model on 4x V100 GPUs and keep the hyper-parameters at their defaults? Here is my test result: |
Of course, the environment is as follows. I only use 3 RTX 3090 GPUs and changed the validation frequency to validate every 30 epochs. Other hyper-parameters are at their defaults. I suspect there is a problem with your dataset. I downloaded the dataset from Kaggle instead of the official website (here is the link: https://www.kaggle.com/datasets/awsaf49/brats20-dataset-training-validation), since my application hasn't been approved yet. |
Hi Devil @Devil-Ideal , many thanks for your quick reply. I will try the dataset that you downloaded from Kaggle to see whether I can achieve the same results. |
Hello, have you successfully reproduced the results of the paper?
Switching to the Kaggle version does not seem to help in my case...
Me too, did you finally solve this problem?
|
I am encountering the same problem with reproducing the benchmark. So far I haven't switched to the Kaggle data version. I'm not training on 4 V100s, but I doubt that is the problem. |
How many GPUs did you use and what was the batch size? In fact, in the code, batch_size refers to the batch size on each GPU. Therefore, it is best to match the original paper: the total batch size is 2 x 4 = 8. |
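To make the arithmetic explicit: under DDP the configured value is per process, so the effective batch size multiplies across GPUs. A minimal sketch of that calculation (the variable names are illustrative, not from the repo):

```python
# Under DDP, the configured batch size is per process, so the
# effective batch size is per-GPU batch size times GPU count.
per_gpu_batch_size = 2   # the `batch_size` setting in the config
num_gpus = 4             # the paper's 4x V100 setup
effective_batch_size = per_gpu_batch_size * num_gpus
print(effective_batch_size)  # 8
```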
Thank you for the suggestion. Regarding the epochs, training happens with 300 epochs right? |
Yep, 300 epochs are enough. And if you don't have enough GPUs, I think even with a total batch size of 4 or 6 the performance will be much better. |
Perfect! Thank you for taking the time. I will update you once I've run the suggested trials. |
I double-checked and saw that I had already trained with batch size = 2 and one GPU. It seems a bit odd that I don't get the same results. |
If the total batch size = 2, that's expected, since the original setting is 8, so just use a bigger total batch size (e.g. 4 GPUs with batch size = 2). |
Increasing the batch size didn't improve the results... I was wondering if you have a requirements.txt file or something similar, so I can see what configurations of Python, PyTorch, etc. you are using. |
That's weird, since I have run it several times and the results are not bad. Here are my configurations: |
You can add my WeChat (18340097191) or email me ([email protected]) to discuss these problems further. I will also open-source the v2 version of Diff-UNet soon. Welcome to try it. |
Recently, I have also re-run my code on the BraTS2020 dataset; this is the training process: If you still have questions about it, please contact me by email: [email protected] |
Hi, thanks for sharing the code base. I tried to reproduce the results on the BraTS 2020 dataset, but the results I got are much worse than in the paper. Here are the details:
For model training:
wt is 0.8498, tc is 0.4873, et is 0.4150, mean_dice is 0.5840
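As a quick consistency check, the reported mean_dice is just the arithmetic mean of the three BraTS region scores (WT, TC, ET):

```python
# Mean Dice is the average of the three region scores.
wt, tc, et = 0.8498, 0.4873, 0.4150
mean_dice = (wt + tc + et) / 3
print(f"{mean_dice:.4f}")  # 0.5840
```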
The tensorboard files are:
The final model files are:
My settings are default settings:
env = "DDP"
max_epoch = 300
batch_size = 2
num_gpus = 4
GPU type: A100
Then I used the best model (best_model_0.5975.pt) to evaluate on the test set, and I got:
My python environment is:
Python 3.8.10
monai 1.1.0
numpy 1.22.2
SimpleITK 2.2.1
torch 1.13.0a0+936e930
The strangest thing is that the segmentation performance on TC and ET is quite bad. Do you have any idea why the performance is so weird, and could you give me some advice on model training? By the way, could you please share the conda env file and your model weights for the BraTS 2020 dataset? If you could create and share a Docker image, that would be perfect! Thanks.