What if instead of building hierarchical models out of mathematical functions, one used a hierarchical system of physical devices to perform computation? Hardware accelerators for deep learning provide optimized computation of tensor products and activation functions. But one need not model nor even mimic the precise mathematics used in current deep networks. All that is truly needed is an efficient way of training the parameters of a system with enough expressive power.

In [Wri22D], the authors propose training the controllable
parameters of deep networks of physical devices by gradient methods, while using
the actual outputs of the physical system during training. The gradients are
computed by backpropagating the true inference error through a differentiable
digital twin of the device, in contrast to approaches that rely entirely on the
digital twin for the forward passes as well. The approach is thus a hybrid of
in-situ and in-silico training, which the authors dub *physics-aware training*
(PAT). Even though their simulations of the analyzed physical systems are very
accurate, the authors found that purely in-silico training on digital twins
leads to significantly worse final results, presumably due to the accumulation
of simulation errors.
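The core loop can be sketched in a few lines. The following toy example is purely illustrative and not from the paper: a hypothetical one-parameter "device" whose true response deviates slightly from its digital twin. The error is measured on the *physical* forward pass, while the gradient is taken from the twin, in the straight-through spirit of PAT.

```python
import math
import random

def physical_forward(theta, x):
    # Stand-in for the real device: the twin's map plus a systematic
    # mismatch that the twin does not capture (hypothetical).
    return theta * x + 0.05 * math.sin(3.0 * theta)

def twin_forward(theta, x):
    # Differentiable digital twin: an idealized linear map.
    return theta * x

def twin_grad(theta, x):
    # d(twin)/d(theta), used only in the backward pass.
    return x

def pat_step(theta, x, target, lr=0.1):
    # Physics-aware training step: the inference error comes from the
    # physical system, the gradient from the digital twin.
    y_phys = physical_forward(theta, x)
    err = y_phys - target              # true inference error
    grad = 2.0 * err * twin_grad(theta, x)
    return theta - lr * grad

random.seed(0)
theta = 0.0
for _ in range(200):
    x = random.uniform(0.5, 1.5)
    target = 2.0 * x                   # task: learn a gain of 2
    theta = pat_step(theta, x, target)
```

Because the loss is evaluated on the physical output, the learned parameter automatically compensates for the device's mismatch, which is exactly what purely in-silico training on the twin would miss.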

Hardware trained in this way has the potential to dramatically decrease the energy cost of inference by using specialized, fast physical systems to perform the computation task at hand. Beyond the applications shown in the paper on standard machine learning benchmarks like MNIST, one could imagine applying this method to the tuning of physical stacks of controllers for which suitable simulations are available. The method is particularly appealing when the input signal to be processed is not digital to begin with, e.g. sound waves entering a smart microphone.