
Conversion fails on model loaded via torch.load or torch.jit.load #221

Open
saseptim opened this issue Sep 13, 2024 · 12 comments

@saseptim

Description of the bug:

I have a PyTorch model which was saved with torch.jit.save(). I tried both a traced model and a scripted model. The error is:

```
File /orcam/ear/scratch/usr/avis/VENV_AI_EDGE/lib/python3.10/site-packages/torch/export/_trace.py:1449, in _export(mod, args, kwargs, dynamic_shapes, strict, preserve_module_call_signature, pre_dispatch, _allow_complex_guards_as_runtime_asserts, _disable_forced_specializations, _is_torch_jit_trace)
   1447 original_state_dict = mod.state_dict(keep_vars=True)
   1448 if not _is_torch_jit_trace:
-> 1449     forward_arg_names = _get_forward_arg_names(mod, args, kwargs)
   1450 else:
   1451     forward_arg_names = None

File /orcam/ear/scratch/usr/avis/VENV_AI_EDGE/lib/python3.10/site-packages/torch/export/_trace.py:753, in _get_forward_arg_names(mod, args, kwargs)
    739 def _get_forward_arg_names(
    740     mod: torch.nn.Module,
    741     args: Tuple[Any, ...],
    742     kwargs: Optional[Dict[str, Any]] = None,
    743 ) -> List[str]:
    744     """
    745     Gets the argument names to forward that are used, for restoring the
    746     original signature when unlifting the exported program module.
    (...)
    751     export lifted modules.
    752     """
--> 753     sig = inspect.signature(mod.forward)
    754     _args = sig.bind_partial(*args).arguments
    756     names: List[str] = []

File /usr/lib/python3.10/inspect.py:3254, in signature(obj, follow_wrapped, globals, locals, eval_str)
   3252 def signature(obj, *, follow_wrapped=True, globals=None, locals=None, eval_str=False):
   3253     """Get a signature object for the passed callable."""
-> 3254     return Signature.from_callable(obj, follow_wrapped=follow_wrapped,
   3255                                    globals=globals, locals=locals, eval_str=eval_str)

File /usr/lib/python3.10/inspect.py:3002, in Signature.from_callable(cls, obj, follow_wrapped, globals, locals, eval_str)
   2998 @classmethod
   2999 def from_callable(cls, obj, *,
   3000                   follow_wrapped=True, globals=None, locals=None, eval_str=False):
   3001     """Constructs Signature for the given callable object."""
-> 3002     return _signature_from_callable(obj, sigcls=cls,
   3003                                     follow_wrapper_chains=follow_wrapped,
   3004                                     globals=globals, locals=locals, eval_str=eval_str)

File /usr/lib/python3.10/inspect.py:2550, in _signature_from_callable(obj, follow_wrapper_chains, skip_bound_arg, globals, locals, eval_str, sigcls)
   2548 except ValueError as ex:
   2549     msg = 'no signature found for {!r}'.format(obj)
-> 2550     raise ValueError(msg) from ex
   2552 if sig is not None:
   2553     # For classes and objects we skip the first parameter of their
   2554     # call, new, or init methods
   2555     if skip_bound_arg:

ValueError: no signature found for <torch.ScriptMethod object at 0x7f942662ffb0>
```
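
For reference, a minimal sketch of the flow that triggers this (the module and file name are illustrative, not my actual model):

```python
import torch
import ai_edge_torch

# A trivial module standing in for my actual model.
class MyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

# Save with torch.jit (scripted here; traced behaves the same for me),
# then reload without the Python class definition.
torch.jit.save(torch.jit.script(MyModel()), "my_model.pt")
loaded = torch.jit.load("my_model.pt")

sample_inputs = (torch.randn(1, 8),)

# Fails with the ValueError above: the loaded ScriptModule's forward
# has no inspectable Python signature.
edge_model = ai_edge_torch.convert(loaded.eval(), sample_inputs)
```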

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

@saseptim saseptim added the type:bug Bug label Sep 13, 2024
@pkgoogle (Contributor)

Hi @saseptim, I don't believe ai-edge-torch can handle that file format; for this repo, the PyTorch model needs to be torch.export-compliant. You can find more details here: https://github.com/google-ai-edge/ai-edge-torch/blob/main/docs/pytorch_converter/README.md#conversion
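
For reference, the conversion flow from that README looks roughly like this (torchvision's resnet18 is just a stand-in for any torch.export-compliant nn.Module):

```python
import torch
import torchvision
import ai_edge_torch

# Any torch.export-compliant nn.Module works; resnet18 is an example.
model = torchvision.models.resnet18().eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert and serialize to a TFLite flatbuffer.
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("resnet18.tflite")
```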

Do you have an example script showing what you are doing? Thanks.

@pkgoogle pkgoogle self-assigned this Sep 16, 2024
@pkgoogle pkgoogle added type:support For use-related issues status:awaiting user response When awaiting user response and removed type:bug Bug labels Sep 16, 2024
github-actions bot

Marking this issue as stale since it has been open for 7 days with no activity. This issue will be closed if no further activity occurs.

@jchwenger

Hi @pkgoogle, I just came across this as I was trying to convert a pix2pix model (here is the training code). I save using torch.jit.script, which lets me reload the model easily without having to redefine it in Python. I see that you recommend reading about torch.export, but it's still unclear to me whether it's a matter of saving the model using this new approach, or something else...? The current example is nice, but it uses an off-the-shelf network rather than something users would have defined themselves. Any ideas? It would be nice to have a workable pipeline from PyTorch to mediapipe. Thanks in advance!

@pkgoogle (Contributor)

Hi @jchwenger, we don't support models that are not torch-exportable (plenty of custom models are torch-exportable, although not every one is). Fundamentally, the root issue is with torch.export, so we cannot fix that; once/if it is fixed, this library should be able to convert the model, and if it still can't, the root cause may be a bug on our end. To test for torch exportability, follow the steps here: https://pytorch.org/docs/stable/export.html — i.e. load the model and see if you can export it with the PyTorch APIs. If you don't run into an exception, it is probably torch-exportable.

torch.export exports it to StableHLO, an MLIR dialect which is more interoperable with the ecosystem of libraries that support MLIR, including this one. You can think of it as a different saving format that is more interoperable with other libraries. This is important for getting models working on heterogeneous hardware such as edge devices, mobile, TPUs, etc.
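
A quick way to run that check (a sketch, assuming you still have the original nn.Module class around; `MyModel` and the input shape are placeholders):

```python
import torch

# Use the original nn.Module, not a torch.jit.load() result.
model = MyModel().eval()
sample_args = (torch.randn(1, 3, 256, 256),)

# If this raises, the model is not torch-exportable yet; the exception
# usually points at the offending op.
exported_program = torch.export.export(model, sample_args)
print(exported_program)
```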

@jchwenger

Hi @pkgoogle, thanks for this! I was confused by the phrasing in the docs; it's as simple as that: when you say "must be compliant with torch.export", it just means the model must be saved using that format/API. I got this to work, yay! However, out of three tests only a simple dense net works, and I'm not entirely sure why.

I have a very simple and runnable Colab here, maybe you will see something super obvious I missed?

Strangely, I get an error around frozen tensors in the Pix2Pix for the in-place nn.ReLU(True), but not in the DCGAN generator; I'll report this on PyTorch...

@jchwenger commented Oct 1, 2024

Side note: the in-place nn.ReLU(True) mystery is now solved in the issue above: it was caused by the presence of dropout before it (as of now, no mutation, and therefore no in-place operation, is allowed after a dropout).
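
For anyone landing here, the pattern in question looks like this (an illustrative sketch, not the actual pix2pix code; switching to the out-of-place version should presumably avoid the error):

```python
import torch.nn as nn

# Export chokes on the in-place ReLU when a dropout precedes it:
failing = nn.Sequential(
    nn.Dropout(0.5),
    nn.ReLU(inplace=True),  # in-place mutation after dropout
)

# Out-of-place version of the same block:
working = nn.Sequential(
    nn.Dropout(0.5),
    nn.ReLU(inplace=False),
)
```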


github-actions bot commented Oct 9, 2024

Marking this issue as stale since it has been open for 7 days with no activity. This issue will be closed if no further activity occurs.

@jchwenger

Hi again, bumping this up, @pkgoogle. Just wondering: do you believe there is reasonable hope of fixing this discrepancy (off-the-shelf ResNet and a small dense net convert OK, but DCGAN and pix2pix do not)? Or should I try to post this issue on the PyTorch repo? Any thoughts welcome, thanks!

@pkgoogle (Contributor)

Hi @jchwenger, I'm having trouble figuring out which discrepancies you are referring to. Are you saying that we can convert ResNet and a small dense net, but not DCGAN and pix2pix? The answer will depend on what is causing the issue. If it's due to PyTorch export, then the root cause is with PT Export (in which case you should create an issue there); if it's something else... well, I will have to investigate. For DCGAN and pix2pix, if we haven't confirmed it's PT Export, can you provide me with a reproducible script that shows the error? (Sometimes people make small changes/adjustments in their code that actually affect the investigation.)

@jchwenger

Thanks @pkgoogle for the answer!

It's quite simple: with the custom dense net and ResNet, the test described in the original docs passes with "Inference result with Pytorch and TfLite was within tolerance", whereas with the DCGAN and pix2pix models it fails with "Something wrong with Pytorch --> TfLite".
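
(That test is essentially the validation comparison from the converter README, along these lines — `model`, `edge_model`, and `sample_inputs` as in the conversion step:)

```python
import numpy as np

# Run the same inputs through the original and the converted model.
torch_output = model(*sample_inputs)
edge_output = edge_model(*sample_inputs)

# Compare the outputs within a small numerical tolerance.
if np.allclose(
    torch_output.detach().numpy(),
    edge_output,
    atol=1e-5,
    rtol=1e-5,
):
    print("Inference result with Pytorch and TfLite was within tolerance")
else:
    print("Something wrong with Pytorch --> TfLite")
```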

As you say, I don't know if it's the PT export or the conversion...

I have all four examples in this Colab, which should be runnable out of the box, with only the session restart needed after installing the dependencies. Thanks in advance!

@pkgoogle (Contributor)

Hi @jchwenger, I'm looking into it, but are you associated with the OP? The reason I ask is that it feels like we are hijacking this thread, as the original problem seems different. If you are not, we would much prefer you create a new issue to track progress on your issues. In this case, it looks like an accuracy issue post-conversion for DCGAN & pix2pix.

@jchwenger

Fair point, all done, see here!
