Make decode.onnx loadable with onnxruntime-web's WebGL backend #4

Hiroshiba · 2022-06-08T18:14:58Z

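Overview

I would like to try building a smartphone version of VOICEVOX, and the biggest challenge is running model inference on the phone.
There are two ways to get there: put in the work to run inference natively on the phone with something like CoreML, or use onnxruntime with WebGL.
This issue examines the latter.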

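Method

The only model that needs to run on WebGL is decode.onnx, the heaviest one.
Out of the box, this model did not run on WebGL, for reasons such as int64 not being supported.
Even after working around int64, some layers such as ConstantOfShape are missing, so it still cannot run.

So the unsupported ONNX layers need to be bypassed.

There are roughly three ways to do this.

1. Create an ONNX model with an OPSET that has no ConstantOfShape (== 8)
   - "Make decode.onnx loadable with onnxruntime-web's WebGL backend #4 (comment)" showed that this approach is best avoided.
2. Change the PyTorch model so that it bypasses the unsupported layers
3. After converting to an ONNX model, change the model so that it bypasses the unsupported layers

Since 1 did not work out, that leaves 2 or 3. Working in the PyTorch domain looks easier than acquiring ONNX expertise, so I am leaning toward 2.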

Execution

The conversion to an ONNX model with the bypasses applied is being worked on in to-onnxruntime-web-webgl.

Whether the converted model runs on onnxruntime-web can be checked with vv_check_web.

Hiroshiba · 2022-06-08T18:18:45Z

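For method 1, I tried using an old PyTorch and torch.onnx.export to write out an ONNX model with OPSET=8.
In short, rather than using an old PyTorch, it looks better to modify the model with the current PyTorch.

Loading the current .pth with the old PyTorch was possible.
The corresponding PyTorch functions also seemed to be almost all there.
(Only SiLU for yukarin_s seemed to be missing, but yukarin_s does not need converting in the first place, so that can be ignored.)

At least with torch==1.1.0, only OPSET=9 could be exported.
That version had no dynamic_axes, so variable-length inputs and outputs may not be possible.
When I tried exporting with OPSET=9, I got the error RuntimeError: ONNX export failed: Couldn't export operator aten::flip.
Some of the PyTorch functions the model uses are not available in the old version, so the model structure would probably have to change.

In that case, changing the model structure with the current PyTorch should be easier.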

Yosshi999 · 2022-06-11T20:14:03Z

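Even just an ones_like-style operation makes ConstantOfShape appear, so tinkering on the PyTorch side may be difficult.
I wrote a script that replaces ConstantOfShape inside an ONNX graph, which might be usable for method 3.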

https://gist.github.com/Yosshi999/3ddab2e017d8be26f3c07534718b1929

Hiroshiba · 2022-06-12T16:28:11Z

Oh!! That is really helpful!!
By the way, have you tried whether applying this rewrite to decode.onnx makes it run on WebGL...? 👀

Yosshi999 · 2022-06-12T21:11:01Z

A different error comes up...

Uncaught (in promise) Error: required attribute not found: axes
at a.get (attribute.ts:93:1)
at a.getInts (attribute.ts:75:1)
at t.parseSqueezeAttributes [as opInit] (squeeze.ts:14:1)
at t.WebGLSessionHandler.resolve (session-handler.ts:82:1)
at t.Session.initializeOps (session.ts:242:1)
at session.ts:93:1
at t.Profiler.event (instrument.ts:337:1)
at t.Session.initialize (session.ts:89:1)
at session.ts:71:1

The axes parameter of Squeeze is supposed to be optional, but its absence seems to be what triggers the error??
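To narrow down which nodes trigger this, one option is to scan the graph for Squeeze nodes that carry no `axes` attribute. The following is a minimal sketch, assuming the model was exported at an opset below 13 (where `axes` is still an attribute rather than an input). The function names and the `decode.onnx` path in the usage note are illustrative only. `onnx` is imported lazily, so the core helper works on any objects exposing `op_type`, `name`, and `attribute`.

```python
from typing import Iterable, List


def squeeze_nodes_without_axes(nodes: Iterable) -> List[str]:
    """Return the names of Squeeze nodes that have no `axes` attribute."""
    missing = []
    for node in nodes:
        if node.op_type != "Squeeze":
            continue
        attr_names = {attr.name for attr in node.attribute}
        if "axes" not in attr_names:
            missing.append(node.name)
    return missing


def check_model(path: str) -> List[str]:
    """Load an ONNX file and report Squeeze nodes lacking `axes`."""
    import onnx  # lazy import: only needed when reading a real model file

    model = onnx.load(path)
    # Only the top-level graph is inspected; subgraphs are not visited.
    return squeeze_nodes_without_axes(model.graph.node)
```

Calling `check_model("decode.onnx")` would then list the candidate nodes, which can be looked up by name in Netron.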

Yosshi999 · 2022-06-12T21:31:14Z

https://github.com/Hiroshiba/yukarin_sosoa/blob/9bf897ad4fbf84a6b93570cfbe870124260c1285/yukarin_sosoa/network/predictor.py#L83

Around here, maybe. Pure guess, though.

Yosshi999 · 2022-06-12T21:44:41Z

https://github.com/espnet/espnet/blob/b008ac7d58e9ced1a9f8c89cc85ee69d9e9461ab/espnet/nets/chainer_backend/transformer/attention.py#L91

What Netron shows as Where_486 is here, so it is indeed related to the f0 mask.

Yosshi999 · 2022-06-12T21:47:22Z

As long as the batch size is currently fixed to 1 (it is fixed, right?), masking of f0 etc. serves no purpose, so removing the mask-related code might fix it.
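The reasoning here can be illustrated with a small pure-Python sketch. It is not the actual yukarin_sosoa or ESPnet code, only the usual attention-masking pattern under the assumption that the mask marks padded positions. With a batch size of 1 there is no padding, the mask is all True, and the masked result is identical to the unmasked one.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax that ignores positions where mask is False (padding)."""
    neg = -1e30  # stand-in for "minus infinity"
    # This element-wise select is the kind of step that exports as Where.
    filled = [s if keep else neg for s, keep in zip(scores, mask)]
    probs = softmax(filled)
    return [p if keep else 0.0 for p, keep in zip(probs, mask)]


scores = [0.5, 1.5, -0.3]
all_valid = [True] * len(scores)
# With no padding, masking changes nothing.
print(masked_softmax(scores, all_valid) == softmax(scores))
```

If the real mask behaves like `all_valid` here, dropping the masking branch should leave the outputs unchanged while removing the select operations from the exported graph.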

Hiroshiba · 2022-06-13T15:16:18Z

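Ohh, I see!!

Fixing the batch size to 1 should be fine, so the mask looks removable!!
The weights probably do not change either, so simply dropping the function should work, I think!

I traced the code a bit, and the relevant part seems to come after leaving vv_core_inference and entering yukarin_sosoa's Predictor...
Rewriting the code looks like it will take a bit of effort.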

Hiroshiba · 2022-06-16T01:39:26Z

About the above: passing None where the mask is given might make it ignore the mask altogether.
Something like passing None here.

vv_core_inference/vv_core_inference/make_yukarin_sosoa_forwarder.py

Line 129 in 14b85de

h, _ = self.encoder(h, mask)

I wanted to try it, but then remembered that I do not know how to get rid of ConstantOfShape...
I would like to copy the Fallback for ConstantOfShape code. You tried several things there, so which one should I run? 👀

Yosshi999 · 2022-06-17T00:16:19Z

I think Convert method 1 worked.
If you change the add1 part of the file name to decode, it will load decode.onnx.

Hiroshiba · 2022-06-17T15:57:18Z

Thank you!! For now I have confirmed that it turns into a different error...!
Next, it says Where is missing.

Uncaught (in promise) TypeError: cannot resolve operator 'Where' with opsets: ai.onnx v12

I found it odd that there is a Where even without a mask, but it seems to be used for computing dimensions(?) when expanding. (Why would merely computing dimensions need a Where 🤔)

vv_core_inference/vv_core_inference/make_yukarin_sosoa_forwarder.py

Lines 121 to 123 in 14b85de

        speaker_feature = speaker_id.expand(
            speaker_id.shape[0], h.shape[1], speaker_id.shape[2]
        )  # (batch_size, length, ?)

(There are several Where nodes, and all of them came from this. expand may be used elsewhere too, but at least there were no other uses within vv_core_inference.)

Hiroshiba · 2022-06-17T15:57:28Z

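I found a repository called onnx-simplifier that looks like it can simplify ONNX models.
Using it should sweep away this kind of processing, but I am stuck because it cannot be given multiple inputs with variable-length shapes.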

[咨询] 请问下onnxsim支持多输入 dynamic shape吗 daquexian/onnx-simplifier#189

Yosshi999 · 2022-06-18T09:06:30Z

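It seems to be there to implement expand's "a dimension given as -1 is left unchanged" behavior > the reason Where is needed

I think this is the actual implementation:
https://github.com/pytorch/pytorch/blob/v1.11.0/torch/onnx/symbolic_opset9.py#L453-L466
↑ By overriding this function right before export and removing the -1 handling at that spot, it might be possible to push the ONNX conversion through, accepting undefined behavior for -1.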
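As far as the linked symbolic function can be read, the exporter rewrites every -1 in the requested sizes to 1 with an Equal plus Where pair, then relies on the broadcasting behavior of ONNX Expand to keep the input dimension. The pure-Python sketch below mimics only that shape arithmetic. The function names are hypothetical and nothing here calls PyTorch or ONNX.

```python
from typing import List, Sequence


def onnx_expand_sizes(sizes: Sequence[int]) -> List[int]:
    """Mimic Where(Equal(sizes, -1), ones, sizes): turn every -1 into 1."""
    return [1 if s == -1 else s for s in sizes]


def expand_output_shape(input_shape: Sequence[int],
                        sizes: Sequence[int]) -> List[int]:
    """Shape produced by a broadcasting Expand after the -1 rewrite."""
    target = onnx_expand_sizes(sizes)
    # Right-align the two shapes, padding the shorter one with 1s.
    rank = max(len(input_shape), len(target))
    a = [1] * (rank - len(input_shape)) + list(input_shape)
    b = [1] * (rank - len(target)) + target
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {x} with {y}")
        out.append(max(x, y))
    return out


# A (1, 1, 4) tensor expanded with sizes (1, 100, -1) keeps its last dim.
print(expand_output_shape([1, 1, 4], [1, 100, -1]))
```

The expand call quoted earlier in the thread passes explicit sizes and never uses -1, so in that case the select always picks the original sizes. That is why removing the -1 handling looks safe for this model in particular.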

Yosshi999 · 2022-06-18T22:28:30Z

TypeError: cannot resolve operator 'Expand' with opsets: ai.onnx v12

¯\_(ツ)_/¯

Hiroshiba · 2022-06-20T17:40:00Z

Wow... so there is no Expand either...

Yosshi999 · 2022-06-21T11:26:59Z

The operators used in decode.onnx are as follows:

{'Concat', 'Div', 'Equal', 'Cos', 'Sub', 'Slice', 'LeakyRelu', 'Squeeze', 'Sigmoid', 'Range', 'Cast', 'Conv', 'Expand', 'ConstantOfShape', 'Tanh', 'Shape', 'ConvTranspose', 'Sqrt', 'Where', 'Add', 'Transpose', 'MatMul', 'Reshape', 'Pow', 'Sin', 'ScatterND', 'Gather', 'Softmax', 'Unsqueeze', 'Relu', 'ReduceMean', 'Split', 'Constant', 'Mul'}

Of these, according to https://github.com/microsoft/onnxruntime/blob/master/js/web/docs/operators.md , the unsupported operators are:

Range
Expand
ConstantOfShape
ConvTranspose
Where
ScatterND
Constant
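A comparison like this can be automated with a few lines of Python. The sketch below is a minimal version under two assumptions: the set of supported operators has to be transcribed by hand from operators.md (it is not listed here), and only the top-level graph is scanned, so operators inside subgraphs would be missed. The function names are illustrative.

```python
from typing import Iterable, Set


def unsupported_ops(used: Iterable[str], supported: Iterable[str]) -> Set[str]:
    """Operators that the model uses but the backend does not list."""
    return set(used) - set(supported)


def collect_op_types(path: str) -> Set[str]:
    """Collect the distinct operator types of an ONNX file's main graph."""
    import onnx  # lazy import: only needed when reading a real model file

    model = onnx.load(path)
    return {node.op_type for node in model.graph.node}


# Tiny illustrative sets, not the real support table.
used = {"Conv", "Add", "Where"}
supported = {"Conv", "Add", "Relu"}
print(unsupported_ops(used, supported))
```

With a real model, `unsupported_ops(collect_op_types("decode.onnx"), supported)` would yield the set to work through, where `supported` is the hand-copied WebGL column of the operator table.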

Yosshi999 · 2022-06-21T11:30:16Z

As for Constant, it apparently gets eliminated by some processing step, so there seems to be no need to worry about it.

Yosshi999 · 2022-06-21T15:02:59Z

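Incidentally, looking only at hifigan, the operators that appear are just

{'Tanh', 'Div', 'Constant', 'Transpose', 'ConvTranspose', 'Gather', 'Unsqueeze', 'LeakyRelu', 'Conv', 'Add'}

and the only unsupported operators are

Constant
ConvTranspose

This is a proposal: how about giving up on running yukarin_sosoa on WebGL, aiming to run only hifigan on WebGL, and concentrating on resolving ConvTranspose? The execution times of sosoa and hifigan differ by roughly 1:10, so a substantial speedup can still be expected.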

Hiroshiba · 2022-06-21T15:13:47Z

I see, that is a realistic and interesting proposal!!
It is not a direct comparison, but the model sizes of sosoa and hifigan were about 1:50 apart.

-rw-r--r-- 1 hihok  55K  6月  7 23:20 model/yukarin_s/model.pth
-rw-r--r-- 1 hihok  36K  6月  7 23:20 model/yukarin_sa/model.pth
-rw-r--r-- 1 hihok 1.4M  6月  7 23:20 model/yukarin_sosoa/model.pth
-rw-r--r-- 1 hihok  54M  6月  7 23:20 model/hifigan/model.pth

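Besides, come to think of it, a single hifigan should be shareable across various acoustic models (sosoa), so keeping the two split would be somewhat convenient.
This is starting to feel quite realistic! I think it looks good...!!!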

Yosshi999 · 2022-06-25T08:02:32Z

microsoft/onnxruntime#10873 (comment)

usually, max texture size is 4k

That is a rather ominous thing to read.
Is there no choice but to split it up somehow?

Yosshi999 · 2022-06-25T08:07:49Z

https://github.com/webonnx/wonnx#in-the-browser-using-webgpu--webassembly
Trying this one might also be an option.

Hiroshiba · 2022-06-25T13:23:02Z

usually, max texture size is 4k

I see......?
A theory: reshaping a (tens of thousands) × 1 matrix into something like (hundreds) × (hundreds) would make it run stably...?
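The arithmetic behind that idea can be sketched as follows. This only computes a near-square 2D shape for a length-n vector and how much padding it needs. Whether such a reshape actually avoids the WebGL texture limit is untested here, and the 4096 default is just the "4k" from the quoted comment. The function name is hypothetical.

```python
import math
from typing import Tuple


def near_square_shape(n: int, max_side: int = 4096) -> Tuple[int, int]:
    """Return a near-square (rows, cols) with rows * cols >= n."""
    if n <= 0:
        raise ValueError("n must be positive")
    root = math.isqrt(n)
    cols = root if root * root == n else root + 1  # ceil(sqrt(n))
    rows = -(-n // cols)  # ceil(n / cols)
    if cols > max_side:
        raise ValueError(f"{n} elements do not fit in {max_side} x {max_side}")
    return rows, cols


n = 50000
rows, cols = near_square_shape(n)
padding = rows * cols - n  # elements to append before reshaping
print(rows, cols, padding)  # 224 224 176
```

A vector of 50000 samples would thus be padded by 176 elements and reshaped to 224 × 224, well under a 4096 limit per side.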

wonnx

There are all sorts of options out there, huh.

Hiroshiba mentioned this issue Jun 8, 2022

Development of a smartphone version of VOICEVOX VOICEVOX/voicevox_project#10

Open

Yosshi999 mentioned this issue Jun 21, 2022

Webgl support #5

Open
