In Side-Channel Analysis, masking is known to be a reliable and robust
counter-measure. Recently, several papers have focused on applying Deep Learning
(DL) techniques to improve the efficiency of side-channel attacks against
implementations protected with this approach. Even if these seminal works have
demonstrated the practical interest of DL in the side-channel context, they neither
established the theoretical soundness of these attacks nor quantified their efficiency,
especially with respect to the optimality bounds published so far in the literature. This paper aims
at addressing this question of optimality, in particular when masking is applied.
We argue that minimizing the Negative Log Likelihood (NLL) during the training of Deep
Learning models is actually asymptotically equivalent to maximizing a lower bound
of the mutual information between the observations and the target secret chunk or,
equivalently, to minimizing an upper bound on the efficiency of the underlying
side-channel attack. We also argue that training a Deep Neural Network amounts to
finding the parameters that maximize the Perceived Information introduced by Renauld
et al. at EUROCRYPT 2011. These theoretical results allowed us to formally study the impact of masking
counter-measures against Deep Learning-based Side-Channel attacks. In particular,
and as expected, we verified, both on simulations and on experimental traces, that
Boolean masking is sound against this class of Side-Channel attacks.
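
To make the stated link concrete, the following sketch restates the standard relation between the NLL loss and the Perceived Information; the notation is assumed here for illustration and is not necessarily the paper's own: $Z$ denotes the target secret chunk, $L$ the leakage observation, $\hat{p}_\theta(z \mid l)$ the conditional p.m.f. output by a model with parameters $\theta$, and $(l_i, z_i)_{i \le N}$ a set of $N$ i.i.d. profiling traces.
\begin{align*}
\mathrm{NLL}_N(\theta) &= -\frac{1}{N}\sum_{i=1}^{N} \log_2 \hat{p}_\theta(z_i \mid l_i)
  \;\xrightarrow[N\to\infty]{\text{a.s.}}\;
  -\,\mathbb{E}_{(Z,L)}\!\left[\log_2 \hat{p}_\theta(Z \mid L)\right],\\
\mathrm{PI}_\theta(Z;L) &= H(Z) + \mathbb{E}_{(Z,L)}\!\left[\log_2 \hat{p}_\theta(Z \mid L)\right]
  \;=\; I(Z;L) - \mathbb{E}_{L}\!\left[D_{\mathrm{KL}}\!\bigl(p(\cdot \mid L)\,\big\|\,\hat{p}_\theta(\cdot \mid L)\bigr)\right]
  \;\le\; I(Z;L).
\end{align*}
Since $H(Z)$ does not depend on $\theta$, minimizing the NLL asymptotically maximizes $\mathrm{PI}_\theta(Z;L)$, which lower-bounds the mutual information $I(Z;L)$ because the Kullback-Leibler divergence is non-negative.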