Simple adjoint method #41
Agree this would be a great feature! Here I can dump some references and notes from past projects:

As you say, in principle autograd can take care of most of the input-output relationship, especially since the scikit-fem FEM assembly is purely Python-based with simple operations. They have work on integrating autograd. They use it to find Jacobians for nonlinear problems, but it could just as easily be used for optimization.

The issue is when we leave Python to do the main compute, for instance the linear solve (Coulomb, heat equation) or eigensolve (mode solving). It would be good for this library to provide the most generic solution, i.e. a primitive for the solver, leaving everything around it to autograd, so that a user can define an arbitrary problem around it and just treat the linear or mode solve like a regular Python operation.

Linear solvers and dense eigensolvers have many primitives in popular autograd-enabled packages, e.g. PyTorch and JAX. Dense eigensolvers in particular have a simple expression using the entire spectrum:

Mike B. Giles. "Collected Matrix Derivative Results for Forward and Reverse Mode Algorithmic Differentiation." Springer, pp. 35-44, 2008.

So these just work. For nonlinear problems where you apply these iteratively you might still prefer to write your own adjoint. For generic sparse eigensolvers this is also still required; it's an open problem on JAX.

For a specific problem we can derive our own adjoint and implement our own PyTorch or JAX primitive for embedding into a backpropagation pipeline. As you say, this looks like a linear solve reusing the results of the forward eigensolve, combined with the reverse-mode derivatives of its outputs w.r.t. upstream computation.

I have a simple implementation of a JAX sparse eigensolver primitive for the finite-difference 1D Schrödinger equation lying around. It handles both the eigenvalues and eigenvectors. I will see how difficult it is to port over to FEM (started #42 for the forward solve). Hopefully this can be readily extended to the Maxwell wave equation.
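To make the "primitive for the solver" idea concrete, here is a minimal sketch of the reverse-mode (VJP) rules for a linear solve, written in plain NumPy so the rules are explicit rather than hidden behind a framework. In a real pipeline these two functions would back a JAX `custom_vjp` or an autograd `defvjp`; all names here are illustrative, and the rules are the standard ones from Giles (2008).

```python
import numpy as np

def solve_fwd(A, b):
    """Forward pass: solve A x = b, keeping what the backward pass needs."""
    x = np.linalg.solve(A, b)
    return x, (A, x)

def solve_vjp(residuals, x_bar):
    """Backward pass: given x_bar = dL/dx, return (A_bar, b_bar).

    Standard adjoint rules for a linear solve:
        b_bar = A^{-T} x_bar
        A_bar = -b_bar x^T
    """
    A, x = residuals
    b_bar = np.linalg.solve(A.T, x_bar)
    A_bar = -np.outer(b_bar, x)
    return A_bar, b_bar

# Finite-difference check on a toy loss L = c . x
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned
b = rng.standard_normal(n)
c = rng.standard_normal(n)

x, res = solve_fwd(A, b)
A_bar, b_bar = solve_vjp(res, c)  # seed the backward pass with dL/dx = c

eps = 1e-6
# Directional derivatives along the computed adjoints
db_fd = (c @ np.linalg.solve(A, b + eps * b_bar) - c @ x) / eps
dA_fd = (c @ np.linalg.solve(A + eps * A_bar, b) - c @ x) / eps
assert np.isclose(db_fd, b_bar @ b_bar, rtol=1e-4)
assert np.isclose(dA_fd, np.sum(A_bar * A_bar), rtol=1e-3)
```

Note that the backward pass is itself just another linear solve with the transposed operator, which is why wrapping an external solver as a primitive is cheap: the same factorization or solver routine can be reused.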
Thanks for chiming in!
You typically don't want to differentiate the solver routine itself: you end up differentiating the error of the solver too, which means you must needlessly track a large amount of useless data (often making problems intractable). The trick is to implement the VJP (or pullback) ourselves by differentiating the operator (it can be a discrete operator) using AD and plugging that into the recombination step of the adjoint method.

Again, the main benefit here is that we effectively get the gradient for free, since it's just a function of the eigenfields. (The group index/velocity, which is just the derivative of the eigenvalue, leverages this fact. This is one way to derive the integral often used to compute these quantities.)
I may be misunderstanding you, but you shouldn't be limited to a specific problem. Once you can AD the operator, you can automate any problem (like we do with meep).
We're saying the same thing, although your vocabulary is probably more exact than mine :) Yes, we don't want to backpropagate through an iterative solver, so we need to create the AD object with closed forms for the forward evaluation (matrix entry inputs --> evals and evectors) and backward evaluation (adjoints of evals and evectors --> adjoints of matrix entry inputs).

I'm wondering if we can instantiate this as the AD of a generic "sparse eigensolver" operator that would work for all sorts of wave problems, instead of an E&M-, Schrödinger-, or mechanical-modes-specific eigensolver operator.
As long as each sparse (or dense) matrix has a routine that populates the matrix based on a set of input variables that are somehow related to your DOF, then all you need to do is differentiate that routine using AD (for whatever problem you care about). Then you just "recombine" the eigenvectors carefully. Typically, this "brute force" approach has some sort of inefficiency, so I like to go through the math of each operator (not as hard as it sounds) and I usually find some interesting simplification that allows one to scale the problem much better.
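The "differentiate the assembly routine, then recombine the eigenvectors" recipe can be sketched in a few lines. This is an illustrative toy, not skfem: the "assembly" is a stand-in sum of symmetric element matrices A(p) = sum_i p_i K_i, and the recombination uses the Hellmann-Feynman identity dlambda/dp_i = x^T (dA/dp_i) x for a simple symmetric eigenproblem.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3
# "Element" matrices: symmetric building blocks of the toy assembly
Ks = [(lambda M: M + M.T)(rng.standard_normal((n, n))) for _ in range(m)]

def assemble(p):
    """Stand-in for an FEM assembly loop: A(p) = sum_i p_i * K_i."""
    return sum(pi * Ki for pi, Ki in zip(p, Ks))

def lowest_mode(p):
    """Forward solve: lowest eigenpair of the assembled symmetric matrix."""
    w, V = np.linalg.eigh(assemble(p))
    return w[0], V[:, 0]

p = np.array([1.0, 2.0, 3.0])
lam, x = lowest_mode(p)

# Recombination step: here dA/dp_i = K_i, so the full gradient of the
# eigenvalue needs no extra solve, just quadratic forms in the eigenvector.
grad = np.array([x @ K @ x for K in Ks])

# Finite-difference check
eps = 1e-6
fd = np.array([(lowest_mode(p + eps * e)[0] - lam) / eps
               for e in np.eye(m)])
assert np.allclose(grad, fd, rtol=1e-3, atol=1e-4)
```

In a real FEM setting dA/dp_i would come from applying AD to the assembly routine rather than being available in closed form, but the recombination step is the same.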
If I may chime in for a second - I've been using femwell (very cool btw!) for a bit these past days and have been thinking about pretty much the same thing.

I think that differentiating through the solver and the finite element assembly are both relatively straightforward technical problems, with the assembly part being a bit more tricky, mostly because most of it is done in an external package (skfem). In the case of eigensolvers, there might be some issues depending on whether or not the eigenvectors are needed and on the properties of the system matrix, but for thermal/optical simulations there shouldn't be any showstoppers.

The real issue I see lies in the meshing. In these problems you are typically interested in shape derivatives (e.g., effective index w.r.t. the width of a waveguide), i.e. not the derivatives w.r.t. (just) the material parameters (as in topology optimization). This makes the problem quite tricky, I think. First, you need to be able to get the derivatives of the FE matrix w.r.t. the positions of the mesh nodes. Then there is the problem that moving nodes is only possible up to a certain point, at which point you have to remesh, which makes the problem discrete because you are changing the structure of the matrix. So a kind of hands-off "let autograd take care of everything" approach is not going to work.

If you are not going for very large changes in the geometry of your structure, one way around this could be to finely mesh some region of interest and then find some smart way of hopping around your nodes (and somehow interpolating in between?). If the expected changes are large, this is not feasible, as you would need to finely mesh everything, defeating the point of FEM.

There is also the more "hacky" way of ignoring the fact that your mesh actually defines your geometry and parametrizing the material in a density-based topopt fashion such that you get "quasi-boundaries". But then these boundaries do not necessarily follow your mesh anymore, which is not great. It also means that you are not simulating "proper" geometric objects anymore (with neatly defined material boundaries), which requires interpolation, and it still requires fine meshing of the entire optimization region (or remeshing between iterations if you want to be smart about it...). I believe this is how a lot of topology optimization is done in FEM; many of the typical examples in mechanics actually just use quadrilateral elements and work on what is essentially a regular grid.

In any case, I just wanted to mention this because I think it's something to consider.
I agree, @MRBaozi, dealing with the parameterization is tricky. But luckily there are some established solutions here. If you are interested in shape optimization, as opposed to TO, then you need a level-set parameterization (as I'm sure you know), regardless of the underlying discretization scheme. Luckily this can be very straightforward. With implicit level sets, you optimize the level-set evolution, and you can remesh each iteration. Typically, however, you can't leverage "mathematical" optimization algorithms like CCSA/MMA, as you need to evolve the level set itself. With explicit level sets, on the other hand, you can use traditional optimization algorithms (I think in the FEM world they call this an xFEM approach...). The FEM basis is a continuous interpolator, so there are many ways to do this. Moving node positions themselves, however, is quite tricky, and typically requires some SGD-like algorithms.
This isn't necessarily true (but I see where you're coming from). It depends on how you parameterize things (see below).
Even if you have to remesh, you can still use AD. You can still get continuous, well-defined gradients, for example by using subpixel smoothing. In other words, you have a parameterization that abstracts away the underlying mesh (you just need a mapping from your level set to your new mesh each iteration). There are other ways to do this too (other than subpixel smoothing). Now, the AD may be hard to implement (this is where a Julia implementation would be really nice, as there are several end-to-end AD packages like Enzyme.jl), but it's totally fine to approximate. Ch. 4 of my dissertation describes these nuances in a bit more detail.

A lot of these solutions require a different degree of work. My approach is to look at a particular problem first, and iteratively apply the right solution for that problem (rather than try to generalize a package up front with all conceivable use-cases). So the first step is to find a useful problem (e.g. modulator RF/photonics co-design) and go from there.
Level sets are one way to parametrize the boundaries of arbitrary shapes; you only need them (in the sense that this is the established method) for generalized shape optimization, which is already pretty close to topology optimization anyway, and they have their own quirks of course (e.g., the level set is usually solved on a different mesh than the simulation). That being said, I'm sure we can agree that whatever parametrization one chooses, this is likely where most of the implementation effort lies. As I said previously, I just wanted to raise this point because it was already on my mind and I believe it is more involved than the "autodiff (eigen)solvers" part of the problem. And as you said, if the mesh assembly is fast, then doing finite differences there is also an option.

And I absolutely agree that the most productive way forward in any case is to tackle a (preferably basic) problem and see where it goes from there. I will be playing around with this a bit myself (mostly out of curiosity) if I can find the time, but no promises 😅
Hey @smartalecH, @simbilod, @MRBaozi, you guys seem quite versed in applying the adjoint method to all kinds of problems (I am more of a rookie...). I would like to implement gradients on FDE simulations (both the eigenvalues and eigenvectors). Or, if it's already out there, just use it :)
To the eigenvalue formulation behind the FDE, e.g. as in:
But that seems to me like overcomplicating the issue and disregarding some useful properties of EM eigenmodes!?
Hey @Jan-David-Black, I am not particularly familiar with FDE, but I assume this means that you have some index profile on a regular grid, build a system matrix from it, and then solve the eigenproblem for that matrix.
Exactly, fundamentally it is not even related to some particular physics. The main difference here is how you build up the system matrix, not how you solve the actual eigenproblem.

I guess you have an eigenvalue problem somewhere in your code, and the operation of retrieving the eigenvalues and eigenvectors is what you want to differentiate through. Some useful references that come to mind are this extended collection of matrix derivative results (section 3.1, "Eigenvalues and eigenvectors") and Steven Johnson's notes on adjoint methods (section 4, "Eigenproblems"). Also check out Appendix 4 of this technical report. The math tells you what terms you need to calculate, and then you look for efficient ways to implement them. In the simplest case, you can essentially just replace your call to the eigensolver with an autodiff-aware version.

Probably you are using sparse matrices though, which makes it a bit more tricky, because sparse array support is extremely lacking in all autodiff frameworks that I know of. You can check out e.g. legume from S. Fan's group at Stanford; they need a differentiable eigensolver for the guided mode expansion. In particular, you can check out how they implemented the autograd primitive here. I haven't checked whether something similar exists for jax, but I wouldn't be surprised. Hope that helps!
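To illustrate the pattern (a sparse eigensolver wrapped as a primitive with a hand-written backward rule), here is a small sketch in the same spirit, but not legume's actual code: it uses SciPy directly instead of an autograd `@primitive`/`defvjp` pair so the backward rule is visible, and it covers the eigenvalue adjoints only (the eigenvector adjoint needs an extra linear solve). All function names are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def sparse_eigsh_fwd(A, k=2):
    """Forward: k smallest eigenpairs of a sparse symmetric A."""
    w, V = spla.eigsh(A, k=k, which="SA")
    return (w, V), (A, w, V)

def sparse_eigsh_vjp_eigvals(residuals, w_bar):
    """Backward for eigenvalues only: given w_bar = dL/dw, return
    A_bar = sum_k w_bar[k] * x_k x_k^T (Hellmann-Feynman).
    Returned dense here for clarity; in practice you would only
    evaluate it on the sparsity pattern of A."""
    A, w, V = residuals
    return sum(wb * np.outer(v, v) for wb, v in zip(w_bar, V.T))

# Check on a 1D finite-difference Laplacian (a toy "wave operator")
n = 40
L = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
(w, V), res = sparse_eigsh_fwd(L, k=2)

w_bar = np.array([1.0, 0.0])          # seed: dL/dw for the lowest eigenvalue
A_bar = sparse_eigsh_vjp_eigvals(res, w_bar)

# Directional finite-difference check along a random symmetric perturbation
rng = np.random.default_rng(2)
M = rng.standard_normal((n, n))
dV = (M + M.T) / 2
eps = 1e-6
w2, _ = spla.eigsh(sp.csc_matrix(L.toarray() + eps * dV), k=2, which="SA")
fd = (w2[0] - w[0]) / eps
assert np.isclose(fd, np.sum(A_bar * dV), rtol=1e-3)
```

The forward solve stays sparse and iterative (ARPACK under the hood); only the recombination in the backward pass touches the eigenvectors, which is exactly why the VJP is written by hand instead of differentiating through the Lanczos iterations.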
Jep, that is how it works. It relies on a staggered Yee grid, like FDFD and FDTD do, which lends itself to the Maxwell curl equations.
I have been skimming through legume's implementation. To understand what's going on I'll have to put in a lot more attention though... Their Readme states that:
To my knowledge Floris only has an FDTD package. And the underlying math seems relatively different at first glance, as FDTD/FDFD is not an eigenvalue problem?! At least from Alec's thesis I am not able to build the bridge towards eigenmode calculation. Thanks again for providing the references!!
Hey @smartalecH, @MRBaozi, @simbilod, @Jan-David-Black, I don't yet know enough about adjoint optimization, but from what I understand it would be really great to have these capabilities, and I'm down to help where I can :) In #70 I've added a Julia/Gridap implementation of the mode solver and also a 3D electrostatic and a 3D thermal solver to simulate heat-based phase shifters. Hope that helps :)
@HelgeGehring wow, nice work! (I'm a big Julia user, and can't wait to leverage its features in this context!) I think the best place to start is to identify a practical problem that needs adjoint optimization. Then we add the functionality as needed. (As discussed earlier, a lot of this will be problem dependent.) Any ideas? I mentioned matching the group indices of an optical and an RF waveguide mode for a modulator, but I wonder if there is something more impactful that's also lower-hanging fruit?
@smartalecH what about matching the modes of a spot-size converter and a fiber? That would however require the additional linear solve, as it depends on the eigenvector. There are only a few design parameters, so gradients are probably not super helpful here. And only optimizing for maximum performance isn't really valuable for anything that is supposed to be fabricated.
@smartalecH thanks! Matching the mode group indices sounds great! I'd guess for starting it would maybe be nice to pick something as simple as possible? Just to get it running once?
Luckily, adjoint methods for (most) eigenvalue problems tend to be rather trivial if you are optimizing the eigenvalue directly. So, for example, if you want to match the effective index of two modes by optimizing the dimensions of the waveguide, you can use classical gradient-descent methods to do so.
Since there are only a few parameters (e.g. width, height), it's reasonable to just use a derivative-free method. However, in this case it's still easy to compute the gradient using an adjoint method. Specifically, you can leverage the Hellmann-Feynman theorem, which only requires the mode fields themselves (i.e. there's no adjoint solve needed).
If you wanted to also optimize a quantity that depends on the eigenvectors (like matching the group index of two modes) then there's an extra perturbation term involving an additional linear solve (though since you are assembling an FEM matrix, you can probably perform this linear solve really easily). This sort of optimization is especially useful for modulator design, for example.
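For reference, the two cases above can be written out for the simple symmetric standard eigenproblem $A(p)\,x = \lambda x$ with $x^\top x = 1$ (a sketch only; the generalized FEM problem $Ax = \lambda Bx$ adds corresponding $B_p$ terms):

```latex
% Eigenvalue-only objective: no adjoint solve (Hellmann--Feynman)
\frac{\partial \lambda}{\partial p} = x^\top A_p \, x

% Eigenvector-dependent objective g(x): one extra (singular but consistent)
% linear solve for the adjoint vector w, constrained orthogonal to x
(A - \lambda I)\, w = (I - x x^\top)\, \nabla_x g,
\qquad w^\top x = 0,
\qquad \frac{\mathrm{d} g}{\mathrm{d} p} = -\, w^\top A_p \, x
```

The extra solve reuses the already-assembled (and shifted) system matrix from the forward eigensolve, which is why it is cheap in an FEM setting.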
The hardest part of all of this is computing the $A_u$ term, which is the derivative of your system matrix w.r.t. the DOF. But that can be automated with autodiff if done right.
But you can imagine some interesting dispersion engineering, especially when wanting to engineer RF and optical modes simultaneously. Especially if you can start to introduce some interesting DOF.
(cc @joelslaby, @joamatab)