-
Hello, I'm using define_extern in my generator, and because of the bounds queries the generated code is not efficient at all. What does the number of calls to the external function depend on? How can I minimize it? This is my call, maybe I missed something:
Thanks
Replies: 2 comments
-
The main way to control the number of calls to an extern stage is with its compute_at location. If you compute_root it, there will be only three calls in total: two for bounds queries and one for the actual compute. If you schedule it compute_at inside some loop of extent n, then I believe you'll get n*3 calls, plus a few more if Halide needs to do bounds queries in the outer loops too. You can also schedule the loops of the extern stage in Halide (e.g. compute_root it, but tile and parallelize it on the Halide side), which I think will increase the number of compute calls, but not the number of bounds queries.
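To make the arithmetic above concrete, here is a minimal sketch of a call-count model in plain C++. It does not use Halide and is not measured from it; it only encodes the estimates given in this reply (two bounds queries plus one compute call per realization). The function names and constants are hypothetical, and the "few more" bounds queries from outer loops are deliberately ignored.

```cpp
#include <cstdint>

// Rough call-count model based on the estimates in the reply above.
// ASSUMPTION: each realization of the extern stage costs 2 bounds-query
// calls plus 1 compute call. Extra bounds queries that Halide may issue
// for outer loops are not modeled.
constexpr std::int64_t kBoundsQueriesPerRealization = 2;
constexpr std::int64_t kComputeCallsPerRealization = 1;

// compute_root: the extern stage is realized once.
std::int64_t calls_compute_root() {
    return kBoundsQueriesPerRealization + kComputeCallsPerRealization;
}

// compute_at inside a loop of extent n: one realization per iteration.
std::int64_t calls_compute_at(std::int64_t loop_extent) {
    return loop_extent *
           (kBoundsQueriesPerRealization + kComputeCallsPerRealization);
}

// compute_root with the extern stage's own loops split into `tiles` pieces
// on the Halide side. ASSUMPTION (hedged in the reply too): only the
// compute calls multiply, while the bounds queries stay constant.
std::int64_t calls_compute_root_tiled(std::int64_t tiles) {
    return kBoundsQueriesPerRealization + tiles * kComputeCallsPerRealization;
}
```

Under this model, moving the extern stage from inside a loop of extent 100 to compute_root drops the estimate from about 300 calls to 3, which is why the compute_at location is the first thing to check.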
-
I should add: we do have a mechanism to give the compiler a proxy expression that Halide should use for bounds relationships (i.e. the bounds of this extern stage are as if it accessed these expressions in its inputs), but we don't have that plumbed through to the front-end. I'm not entirely sure the following works, but you can try using the form of define_extern that takes a list of Vars instead of a numeric dimensionality, and then hackily set it like so:
If that's a useful feature for you, we should plumb it through to the front-end, probably via another overload of define_extern.