Integrate EvanBeal's contrast agnostic registration model into SCT's registration scripts #3760
**Registration model conversion to ONNX format**

Converting the trained registration model from the Keras `.h5` format to ONNX is possible, and the resulting ONNX model can then be used to perform the registration.

**Limit appearing when using the ONNX format**

The drawback of the conversion is that the input size of the ONNX model is fixed. In the multimodal registration project, images of any size could be registered: the approach was to load the weights of the model trained for a specific size, create a new model for the size of interest, and then set the weights of the trained model on this new model, allowing images of any size to be registered. This is no longer possible after the conversion to ONNX.

**Potential solutions**

Two possibilities exist to enable registration of images of any size with this new constraint.
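One way to realise the set-of-models option would be a small helper that picks the exported model whose fixed input size is closest to the image, and falls back when nothing is close enough. Everything below (the sizes, the threshold, the function name) is hypothetical, just to illustrate the idea:

```python
# Hypothetical set of fixed input sizes for which an ONNX model was exported
AVAILABLE_SIZES = [(128, 128, 128), (160, 160, 160), (192, 224, 192)]

def pick_model_size(image_shape, max_rel_gap=0.25):
    """Return the exported size closest to `image_shape`, or None if even
    the best match deviates too much (fall back to the other strategy)."""
    def gap(size):
        # Sum of per-axis relative differences between model and image size
        return sum(abs(s - i) / i for s, i in zip(size, image_shape))
    best = min(AVAILABLE_SIZES, key=gap)
    return best if gap(best) / len(image_shape) <= max_rel_gap else None

print(pick_model_size((150, 160, 170)))  # (160, 160, 160)
print(pick_model_size((512, 512, 512)))  # None: no exported size is close enough
```
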
These two possibilities could also be combined by prioritising the second option and falling back to the first one if no model in the set is close enough to the input image size.

**Code for model conversion and application of ONNX registration model**

Conversion of the H5 model to ONNX using a different input size than the one used during the model training (e.g. 128 x 128 x 128):
Use of an ONNX registration model to produce a dense deformation field compatible with `sct_apply_transfo`:
Ah! I didn't think of that when we switched to ONNX-only support with SCT. That is quite a significant problem, and I foresee it being a problem for other models as well (for segmentation). Thank you for proposing the potential solutions, @EvanBeal, but they are both quite limiting/heavy, as you also pointed out (1: requires extra code; 2: requires an unnecessarily high number of models to be stored, each with a fixed resolution, so there is always a risk that a user will input a volume that is not well covered by our set of resolutions). @joshuacwnewton how 'problematic' would it be to support H5 models again? IIRC the issue was mostly at SCT installation?
I agree with @jcohenadad that the workarounds for fixed-input ONNX models probably wouldn't be sufficient, and that this represents a significant problem.
It depends on which version of Tensorflow @EvanBeal used in his development. (This is relevant for #3367.) @EvanBeal, could you do a quick `pip freeze` and share which versions you are using? Regardless of the specific version of TF, the main issue I have is that this would place an extra burden on SCT devs in the long term, because we would be the ones maintaining a dependency on Tensorflow just to support this feature. It's definitely possible to depend on Tensorflow (we've been doing it for years), but I would personally prefer to work towards being a PyTorch-only project, and explore other options first, if we can? (Admittedly, I am a bit biased, because I had set my hopes on #3738 for making SCT devs' lives easier in the future.)

Aside: I think this situation is a good indicator that there could be better communication in the future between SCT devs and NP research students. Say, for example, there was a quick meeting early on (to consult about the dev side of the project). We might have been able to catch that this would be a Tensorflow-based project much earlier, before @EvanBeal put in the work that he has.
I have some ideas off the top of my head that I want to quickly look into.
I am using TensorFlow and Keras 2.7.0.
It may be possible for certain models, and I might have missed how to do it. However, I don't think it is possible for the model of this registration project: the model involves some specific transformations, with interpolation computed on grids that are used inside the model to rescale the deformation field produced, as well as some custom layers that require a fixed input size.
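To illustrate why grid-based layers pin the input size: rescaling a dense deformation field depends on the spatial dimensions it was built for, since both the grid and the displacement magnitudes must be scaled together. This is a schematic numpy/scipy sketch of such a rescaling step (not the model's actual layers; the function name is hypothetical):

```python
import numpy as np
from scipy.ndimage import zoom

def rescale_flow(flow, factor):
    """Rescale a dense displacement field of shape (3, D, H, W) by `factor`:
    the grid is resized AND the displacement values are multiplied by the
    same factor, since they are expressed in voxels."""
    resized = np.stack([zoom(component, factor, order=1) for component in flow])
    return resized * factor

flow = np.ones((3, 8, 8, 8))           # constant one-voxel displacement
print(rescale_flow(flow, 2).shape)     # (3, 16, 16, 16)
```
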
This could definitely be a good idea and I think it can be viable (the parts of SynthMorph that are not implemented in PyTorch only concern the training of the model)! If it is not problematic to have a PyTorch dependency, then it may be possible to convert the trained h5 model to PyTorch and then use the PyTorch implementation of the model used for registration (VxmDense) to enable data of any size as input. However, I have never tested the PyTorch implementations, and from what I see (PyTorch vs Tensorflow), it seems that the two implementations of the VxmDense model are not exactly the same (though it may concern only optional parts), so I would need to investigate whether it's possible to use the weights obtained on the model implemented with TF to build a model implemented with PyTorch.
This is good to hear! This means that (worst case scenario) if we had to go back to Tensorflow, we could at least use an up-to-date version of Tensorflow, which won't block Python 3.8/3.9 upgrades.
Just to clarify this one part: SCT already depends on PyTorch through Ivadomed, via our package requirements.
Just to check: would it be possible to retrain the model using the Torch training scripts? (I'm guessing not, because of the Tensorflow-specific parts of the training.) So, if I'm understanding correctly, the only option is to build the same model in both Tensorflow and PyTorch, then assign the weights layer by layer manually? (e.g. https://www.adrian.idv.hk/2020-12-31-torch2tf/, https://datascience.stackexchange.com/a/40511) In that case, it might not be worth the time/effort... and it may just be better for SCT to depend on Tensorflow again.
Yes, perfect!

So, no, this is not possible. Your guess is correct: the training of the SynthMorph method involves some parts that are only built with TF and not PyTorch (generation of synthetic label maps and gray-scale images).

Indeed, it seems from these sources that the weights need to be assigned manually. However, I think there might be other possibilities (like this one), but I don't know if they are applicable in our case. Even if the weights need to be set for each layer, I think it could be interesting to try, as it would need to be done only once to then have a PyTorch model that could be really useful.

I will try to have a look as soon as possible at how feasible it is to go with PyTorch, and I will document it in this issue so we can determine the best option ;)
**Conversion of the registration models from Keras .h5 to PyTorch models**

Using the strategy of assigning the weights layer by layer, and using the PyTorch architecture of the registration model available in VoxelMorph (VxmDense), I was able to convert the TensorFlow models to PyTorch. This was done with the following code:
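As an illustration of the layer-by-layer strategy, the central detail of such a transfer is reordering each Conv3D kernel from the Keras layout `(D, H, W, C_in, C_out)` to the PyTorch layout `(C_out, C_in, D, H, W)`. The helper below is a hypothetical plain-numpy sketch, not the actual conversion script:

```python
import numpy as np

def keras_conv3d_to_torch(kernel, bias):
    """Reorder a Keras Conv3D kernel (D, H, W, C_in, C_out) into the
    PyTorch Conv3d layout (C_out, C_in, D, H, W); the bias is unchanged."""
    return np.ascontiguousarray(kernel.transpose(4, 3, 0, 1, 2)), bias

# Example: a 3x3x3 kernel with 16 input and 32 output channels
k_tf = np.random.rand(3, 3, 3, 16, 32).astype(np.float32)
b_tf = np.random.rand(32).astype(np.float32)
k_pt, b_pt = keras_conv3d_to_torch(k_tf, b_tf)
print(k_pt.shape)  # (32, 16, 3, 3, 3)
```
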
**Input data of any size**

Once this was done, I faced the problem of how to use data of any size as input now that we have these PyTorch models. It appears that this is possible by building the architecture of the model for the input size of interest and then setting the weights layer by layer from the trained model (somewhat similar to what I was doing with the TensorFlow models). This can be done with code similar to this:
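A minimal sketch of that mechanism, using a toy stand-in for VxmDense (the class name and sizes are hypothetical): the convolution weights do not depend on the spatial input size, so a state dict trained at one size can be loaded into a model instance built for another size.

```python
import torch
import torch.nn as nn

class TinyRegNet(nn.Module):
    """Toy stand-in for a registration network: its conv weights are
    independent of the spatial input size."""
    def __init__(self, inshape):
        super().__init__()
        self.inshape = inshape                      # only relevant at inference time
        self.conv = nn.Conv3d(2, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)                         # dense 3-channel deformation field

trained = TinyRegNet(inshape=(64, 64, 64))          # size used during training
model = TinyRegNet(inshape=(96, 112, 96))           # size of the images to register
model.load_state_dict(trained.state_dict())         # weights transfer unchanged

x = torch.zeros(1, 2, 96, 112, 96)                  # moving + fixed image pair
print(model(x).shape)  # torch.Size([1, 3, 96, 112, 96])
```
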
**Results**

Once the models were converted and I had found how to use data of any size as input, I tested these new models to see if I could obtain the same results as with the TF models. Unfortunately, this is not the case. I don't know what is wrong in the process, but when I register images using the PyTorch models instead of the TF models, the results are not the same and lead to poor registration. Therefore, either I manage to identify the part of the process that might be problematic and fix it (e.g. something not linked to the PyTorch model implementation done in VoxelMorph), or we have to consider another solution than using PyTorch models. Here is an example of the results obtained, and the code that I used (cascaded registration):
Note: I did not use the input image size but reduced it to 192 x 192 x 192, because otherwise my computer's RAM was overloaded and the process was stopped for that reason. I did not encounter this problem with the TF models.
Thank you very much for doing these thorough and well-documented investigations, @EvanBeal! 🙏 It is quite frustrating that the registration results are not comparable between PyTorch and TensorFlow. I cross my fingers that the 'bug' (if we can call it a bug) is easy to spot 🤞
Hopefully this is related to the 'bug'. If RAM is more of a limitation with PyTorch, then this is another argument for keeping TF, which would be annoying.
After spending several hours trying to find an error in the code and looking for potential alternatives for converting from TensorFlow to PyTorch, I could not find anything relevant that would explain the observed differences and the poor registration results obtained with PyTorch. Therefore, as a last attempt to use PyTorch, I opened an issue (issue 425) on the VoxelMorph repository to explain the situation and see if they have any idea what the reason might be, or if they have ever tried to do such a conversion.
Thank you for the thorough digging, @EvanBeal. Let's hope that the VoxelMorph folks will be able to pinpoint what the problem could be 🤞.
**Conversion of the registration models from Keras .h5 to PyTorch models (Corrected)**

Good news! The VoxelMorph folks were able to find the issue, and their suggestion solved the problem.
Therefore, using the following code makes it possible to obtain PyTorch models that perform exactly like the TensorFlow ones!
**Availability of PyTorch models**

As a next step, I created a release on the multimodal registration project (r20220512), so the PyTorch models are now publicly available.

**Integration in SCT**

We should now be able to work on the integration of the code in SCT. To perform the registration on data of any input size with the cascaded registration models, the following code needs to be used.

Code for cascaded registration using PyTorch models:
Therefore, (an adapted version of) this code will need to be integrated into a function in SCT to be used via the registration scripts. Now that the TensorFlow dependency has been removed, the two warping fields resulting from the two registration steps are no longer composed inside this code. I don't think this is a problem, because we could give both deformation fields to `sct_apply_transfo`.
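Applying the two fields in sequence rather than composing them can be pictured with a schematic numpy/scipy sketch (this is not SCT or VoxelMorph code; `apply_warp` is a hypothetical helper, and the displacements are expressed in voxels):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def apply_warp(image, disp):
    """Warp `image` with a dense displacement field `disp` of shape
    (3, *image.shape), using linear interpolation."""
    grid = np.meshgrid(*[np.arange(s) for s in image.shape], indexing="ij")
    coords = [g + d for g, d in zip(grid, disp)]
    return map_coordinates(image, coords, order=1, mode="nearest")

# Two-step (cascaded) application: warp with the first field, then warp
# the result with the second field, instead of composing them into one.
rng = np.random.default_rng(0)
img = rng.random((16, 16, 16))
disp1 = np.zeros((3, 16, 16, 16))
disp1[0] += 1.0                      # first step: shift by one voxel
disp2 = np.zeros((3, 16, 16, 16))    # second step: identity, for simplicity
out = apply_warp(apply_warp(img, disp1), disp2)
```
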
As a negative point, this problem remains: I am not able to run the whole process on my own computer for data of size 192x256x320 because the RAM is overloaded. Only the first registration step is performed, and then the process is killed.
Oh my goodness, this is wonderful news! Bravo! Thank you for taking the time and effort to investigate, and kudos for being able to navigate such a tricky investigation. (Truthfully, I had been feeling a bit unsure whether this would even be possible to do! So, I am very very relieved to hear that you were able to identify the issue! 🎉)
I'm very curious about this point. I would like to try on my computer as well, to see if I encounter the same issue. But this may also be worth bringing up in a new issue on the VoxelMorph repo. Maybe they have encountered these resource-usage issues too, and could have some insight into the differences between their TF and PyTorch model architectures.
OMG! I'm so happy to read this. Thank you for having persisted, @EvanBeal, it did pay off in the end!
Yup! That seems to be the right way to do it, as we currently do for multi-step approaches with other algorithms (ANTs, centermassrot, etc.).
So, I tried following along with @EvanBeal's test script from the "Code for cascaded registration using PyTorch models" dropdown in #3760 (comment). However, I ran into a new issue:
I have reported this upstream in the hopes that the VoxelMorph team can release a new version on PyPI. But right now, this poses a problem for including `voxelmorph` as a dependency of SCT.
Actually... given the way we install our dependencies, this may not block us after all.
I'm still going to ask them to publish to PyPI (voxelmorph/voxelmorph#432), because we'll need that for #1526, but we're not stuck in the meantime. And in the long run, when we run into situations like that, we can always fork and publish dependencies ourselves as 'neuropoly-$PKG'.
To follow up on the RAM issues, @EvanBeal has created voxelmorph/voxelmorph#434, and I adapted Evan's cascaded registration script into a reproducible example in voxelmorph/voxelmorph#434 (comment). (This work is also currently located in a branch.)
This is done: https://pypi.org/project/voxelmorph/0.2/
Given that both the PyTorch and VoxelMorph issues have been (more or less) resolved, I imagine we can start looking at integrating the VoxelMorph approach into SCT. Here are some rough, cursory TODOs, along with some pointers to relevant areas of SCT's codebase:
Aside: some observations about the organization of SCT's registration scripts and APIs.
You're absolutely right. As you probably figured, this is a legacy issue. Many years ago we didn't have APIs; all the algos were in the scripts. Then, progressively, things were moved around. I remember having worked on these scripts/APIs quite extensively, and there are still a lot of 'todo's to make them proper.
To follow up on @joshuacwnewton's comment, I have worked on integrating the deep learning multimodal registration method into SCT, following what is mentioned in the comment. This work can be found on this branch. From the tests that I have done today, it seems to be quite functional now, with a usage similar to the other registration algorithms. (commit)

**What has been done**

The relevant files were adapted so that the method can be selected like the other registration algorithms; this is done by adding an if condition in the registration code.

**How it will be used in the future**

There is no need for prior preprocessing steps, as these are directly included in the method.
The deep learning registration models are currently not present in SCT, so they will need to be added somewhere, to be able to directly perform the registration without having to ask the user to download the models somewhere else. My guess is that they should be added in the same place as the other models used by SCT.

**How to test it now**

To test the method now, you need to download the deep learning models and put them in a dedicated folder. First, download and extract the models. Then, choose some data to do your test. I highly recommend cropping the data before using it; otherwise it is very likely that the process will be killed because it will use too much memory. For example, you can do the following:
Eventually, you can perform the registration.
N.B. I didn’t perform a lot of tests for now, so you are likely to encounter bugs, but these are some first results to share with you and to let you know the current state of the integration.
Wonderful!! Thank you so much for promptly following through on integrating your model into SCT. One thing to mention: I realize now that my original branch is no longer needed, so it can be deleted. Apart from that, could you please open a pull request, then request my review? I'd love to start going over your work so that we can get it merged. 😄
One thing that I ended up doing when debugging the memory issues was to use the spinal cord segmentation to crop around the cord:

```sh
# Cropping around the spinal cord (T1w)
sct_deepseg_sc -i T1w.nii.gz -c t1 -centerline cnn
sct_create_mask -i T1w.nii.gz -p centerline,T1w_seg.nii.gz -size 35mm -f cylinder -o mask_T1w.nii.gz
sct_crop_image -i T1w.nii.gz -m mask_T1w.nii.gz

# Cropping around the spinal cord (T2w)
sct_deepseg_sc -i T2w.nii.gz -c t2 -centerline cnn
sct_create_mask -i T2w.nii.gz -p centerline,T2w_seg.nii.gz -size 35mm -f cylinder -o mask_T2w.nii.gz
sct_crop_image -i T2w.nii.gz -m mask_T2w.nii.gz
```

I think this might be a little more friendly if we end up testing multiple different images, since we won't have to rely on hardcoded coordinate values that are specific to a single image.
Yes, alright! I have deleted it.
Yep, done with PR #3807
Yes, totally agree! Here it was just to provide a specific example, but to be more general and easily test the method on multiple different images, your solution is waaay better ;)
This issue is for planning the integration of @EvanBeal's contrast agnostic registration model project into SCT's registration scripts.
**Proposed changes** (copied from 2022-04-05 meeting minutes)
- `sct_register_to_template`
- `sct_register_multimodal`
- ? (TODO: Needs clarification)

**Remaining work**
- `onnxruntime` and ONNX models #3735