The backends PyTorch, TensorFlow and Jax have built-in automatic differentiation functionality.
However, the respective APIs vary widely in how the gradients are computed.
Φ_{Flow} seeks to unify optimization and gradient computation so that code written against the Φ_{Flow} API will work with all backends.
Nevertheless, we recommend using backend-specific optimization for certain tasks like neural network training.

The following overview table shows which Φ_{Flow} optimization functions are supported by which backends.

Backend | solve_linear | minimize | functional_gradient | NN Training |
---|---|---|---|---|

PyTorch | ✓ | ✓ | ✓ | ✓ |

TensorFlow | ✓ | ✓ | ✓ | ✓ |

Jax | ✓ | ✓ | ✓ | |

NumPy | ✓ |

Φ_{Flow} provides functions for solving linear equations as well as finding optimal parameters for non-linear problems.
These functions work with all backends but run much slower with NumPy due to the lack of analytic gradients.

The function `math.minimize()`

and `solve_nonlinear()`

solve unconstrained nonlinear optimization problems.
The following example uses L-BFGS-B to find a solution to a nonlinear optimization problem using the `phi.field`

API:

```
def loss(x1: Grid, x2: Grid) -> math.Tensor:
return field.l2_loss(physics(x1, x2) - target)
x0 = x0_1, x0_2
solution = math.minimize(math.jit_compile(loss), math.Solve('L-BFGS-B', 0, 1e-3, x0=x0))
```

For solving linear systems of equations, Φ_{Flow} provides the function `math.solve_linear()`

.
The following example uses the conjugate gradient algorithm to solve_linear `f(x) = y`

:

```
@math.jit_compile_linear
def f(x: Grid) -> Grid:
return field.where(mask, 2 * field.laplace(x), x)
x = math.solve_linear(f, y, math.Solve('CG', 1e-3, 0, x0=x0))
```

Which solver implementation is used, depends on the backend.
However, all backends support conjugate gradient (`'CG'`

) and conjugate gradient with adaptive step size (`'CG-adaptive'`

).
Specify `'auto'`

to let Φ_{Flow} chose an appropriate solver.

Overview: Implementations

Method | PyTorch | TensorFlow | Jax | NumPy |
---|---|---|---|---|

‘CG’ | Φ-Flow | Φ-Flow | `jax.scipy.sparse.linalg.cg` (only in jit mode, no trajectory) |
`scipy.sparse.linalg.cg` (no trajectory) |

‘CG-adaptive’ | Φ-Flow | Φ-Flow | Φ-Flow (only function-based) | Φ-Flow |

‘auto’ | Φ-Flow | Φ-Flow | `jax.scipy.sparse.linalg.cg` (only in jit mode, no trajectory) |
`scipy.sparse.linalg.spsolve` (only for sparse matrices) |

Both `solve_linear`

and `minimize`

return only the solution.
To access further information about the optimization, run the optimization within a `SolveTape`

context.

When a solve does not find a solution, a subclass of `ConvergenceException`

is thrown.

```
solve = math.Solve('CG', 1e-3, 0, max_iterations=300)
try:
solution = field.solve_linear(..., solve=solve)
solution = field.minimize(..., solve=solve)
converged = True
except NotConverged as not_converged:
print(not_converged)
last_estimate = not_converged.x
except Diverged as diverged:
print(diverged)
iterations_performed = solve.result.iterations # available in any case
```

Currently, backpropagation is supported for `solve_linear()`

.
The following code shows how to specify different settings for the backpropagation solve:

```
from phi.flow import *
gradient_solve = math.Solve('auto', 1e-5, 0, max_iterations=100, x0=x0)
solve = math.Solve('CG', 1e-5, 1e-5, gradient_solve=gradient_solve, x0=None)
```

Solves during backpropagation raise the same exceptions as in the forward pass.
The only exception is TensorFlow where a warning message is printed instead since exceptions are not supported during backpropagation.
Information about backpropagation solves can be obtained the same ways with the forward solves,
using a `SolveTape`

context around the gradient computation.

There are two ways to evaluate gradients using Φ_{Flow}:
computing gradients of functions and computing gradients of tensors.
The former is the preferred way and works will all backends except for NumPy.

The functions
`math.functional_gradient()`

and
`field.functional_gradient()`

compute the gradient of a Python function with respect to one or multiple arguments.
The following example evaluates the gradient of `physics`

w.r.t. two of its inputs.

```
def physics(x: Grid, y: Grid, target: Grid) -> math.Tensor:
x, y = step(x, y)
return field.l2_loss(y - target), x
gradient = field.functional_gradient(physics, wrt=[0, 1], get_output=True)
loss, x, dx, dy = gradient(...)
```

In the above example, `wrt=[0, 1]`

specifies that we are interested in the gradient with respect to the first and second argument of `physics`

.
With `get_output=True`

, evaluating the gradient also returns all regular outputs of `physics`

.
Otherwise, only `dx, dy`

would be returned.

Note that the first output of `physics`

must be a scalar `Tensor`

object.
All other outputs are not part of the gradient computation.

Note that higher-order derivatives might not work as expected with some backends.

This method is only supported by PyTorch and TensorFlow.
Operations within a `math.record_gradients`

context record gradient information.
Calling `math.gradients`

then computes the gradients with respect to a previously marked tensor.

*Warning*: You have to manually detach PyTorch tensors that will be used outside the context.
Higher-order derivatives are not supported for PyTorch.

*Recommended for neural network training.*

Unfortunately all supported backends have a different approach to computing gradients:

- PyTorch: gradients are recorded per flagged tensor
- TensorFlow: gradients are recorded per flagged operation
- Jax: gradients are computed for functions

TensorFlow and PyTorch include various optimizers for neural network (NN) training. Additionally, NN variables created through the respective layer functions are typically marked as variables by default, meaning the computational graph for derived tensors is created automatically.

The following script shows how a PyTorch neural network can be trained. See the demo network_training_pytorch.py for a full example.

```
net = u_net(2, 2)
optimizer = optim.Adam(net.parameters(), lr=1e-3)
for training_step in range(100):
data: Grid = load_training_data(training_step)
optimizer.zero_grad()
prediction: Grid = field.native_call(net, data)
simulation_output: Grid = simulate(prediction)
loss = field.l2_loss(simulation_output)
loss.native().backward()
optimizer.step()
```

In the above example, `field.native_call()`

extracts the field values as PyTorch tensors with shape `(batch_size, channels, spatial...)`

,
then calls the network and returs the result again as a `Field`

.

Since `loss`

is a `phi.math.Tensor`

, we need to invoke `native()`

to call PyTorch’s `backward()`

function.

For TensorFlow, a `GradientTape`

context is required around network evaluation, physics and loss definition.