Deep learning side-channel attacks are an emerging threat to the security of implementations of cryptographic algorithms. The attacker first trains a model on a large set of side-channel traces captured from a chip with a known key. The trained model is then used to recover the unknown key from a few traces captured from a victim chip. The first successful attacks have been demonstrated recently. However, they typically train and test on power traces captured from the same device. In this paper, we show that it is important to train and test on traces captured from different boards and using diverse implementations of the cryptographic algorithm under attack. Otherwise, it is easy to overestimate the classification accuracy. For example, if we train and test an MLP model on power traces captured from the same board, we can recover all key byte values with 96% accuracy from a single trace. However, the single-trace attack accuracy drops to 2.45% if we test on traces captured from a board different from the one we used for training, even if both boards carry identical chips.