Efficient CNN-Based Profiled Side Channel Attacks

Profiled side-channel attacks are now considered a powerful form of side-channel attack for
breaking the security of cryptographic devices. A recent line of research has investigated
profiled attacks based on deep learning, and many of these works use a convolutional neural
network (CNN) as the deep learning architecture. The effectiveness of the attack is greatly
influenced by the CNN architecture. However, the CNN architectures used in current profiled
attacks have often been borrowed from the image recognition field, and choosing the right CNN
architecture and parameters for adaptation to profiled attacks remains challenging. In this
paper, we propose an efficient profiled attack for unprotected and masking-protected
cryptographic devices based on two CNN architectures, called CNNn and CNNd, respectively.
The parameters of both CNN architectures are based on the properties of the points of
interest on the power trace and are further determined by the Grey Wolf Optimization (GWO)
algorithm. To verify the proposed attacks, experiments were performed on a trace set
collected from an Atmega8515 smart card performing AES-128 encryption, a DPA contest v4
dataset, and the ASCAD public dataset.


Summary of document content: Efficient CNN-Based Profiled Side Channel Attacks

currently considered state-of-the-art results
described by Zaid et al. The results of using 4,000 traces in the optimization phase to find
the parameters for CNNn are shown in Table 3. The CNNn with the parameters given in
Table 3 is trained with the same 4,000-trace dataset and then used in the attack phase
to find the correct key. The estimated key probabilities in Figure 12 show that, for the
first byte of the key used in AES-128, the correct value 130 has the highest estimated
probability. The large gap between the estimated probability of the correct key and those
of the other keys reflects that this dataset is easy to attack. This result is consistent
with the claims made in [22]. Table 4 compares the effectiveness of the proposed method
with that of Zaid. The GE values obtained by attacks using both methods are shown in
Figure 13. Our CNNn architecture is more effective in terms of the number of traces
required for GE to reach 0: our method requires only 2 traces, while Zaid's method requires
7. This result indicates that our CNNn architecture can learn POIs from power traces more
precisely than the CNN architecture proposed by Zaid. However, CNNn has more trainable
parameters and a longer training time than Zaid's method (see Table 4). Neither of these
CNN architectures is very complicated, yet both achieve good attack results. Therefore, for
unprotected devices, the CNN architecture does not need to be very complicated; a single
convolution layer with a small number of kernels and a small kernel size, and one FC layer
containing relatively few neurons, is sufficient.
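The "number of traces for GE to reach 0" metric used in this comparison can be illustrated with a short sketch. It assumes that guessing entropy (GE) is estimated as the average rank of the correct key over several attack experiments, with rank 0 meaning the correct key scores highest. The function names and the toy scores (four key hypotheses instead of 256) are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def key_rank(scores: Sequence[float], correct_key: int) -> int:
    """Number of key hypotheses scoring strictly higher than the correct key."""
    target = scores[correct_key]
    return sum(1 for s in scores if s > target)


def guessing_entropy(experiments: Sequence[Sequence[Sequence[float]]],
                     correct_key: int) -> List[float]:
    """Average rank of the correct key after 1, 2, ... attack traces.

    experiments[e][n] holds the accumulated scores of all key hypotheses
    in experiment e after n + 1 traces.
    """
    n_steps = min(len(e) for e in experiments)
    return [
        sum(key_rank(e[n], correct_key) for e in experiments) / len(experiments)
        for n in range(n_steps)
    ]


def traces_to_zero(ge: Sequence[float]) -> int:
    """Smallest trace count from which GE stays at 0, or -1 if it never does."""
    for n in range(len(ge)):
        if all(v == 0 for v in ge[n:]):
            return n + 1
    return -1


# Toy data (hypothetical): 2 experiments, 3 traces each, 4 key hypotheses.
exp_a = [[0.1, 0.5, 0.3, 0.1], [0.2, 0.3, 0.9, 0.1], [0.1, 0.2, 1.5, 0.1]]
exp_b = [[0.4, 0.1, 0.2, 0.3], [0.5, 0.1, 0.6, 0.3], [0.5, 0.1, 1.0, 0.3]]
ge = guessing_entropy([exp_a, exp_b], correct_key=2)
print(ge, traces_to_zero(ge))  # [1.5, 0.0, 0.0] 2
```

In this toy run the correct key is outranked after the first trace in both experiments and ranks first from the second trace onward, so GE reaches 0 at 2 traces.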
In this section, we present two experiments of profiled attacks on unprotected devices
using our dataset and the DPAContestV4 dataset. Both use the same CNN architecture.
Although, according to the "No Free Lunch" theorem [21], no architecture is optimal for
all problems, the analysis and the experimental results on the two datasets suggest that
the CNNn architecture with the parameters given in Table 5 should be used for profiled
attacks on unprotected devices.
16 NGOC QUY TRAN, HONG QUANG NGUYEN
Figure 12. Estimation probability of all hypothetical keys with Dataset2 (x-axis: all hypothetical keys; y-axis: probability (log); correct key: 130)
Figure 13. Guessing entropy results for Dataset2 (x-axis: number of traces; y-axis: GE; curves: our proposed CNNn, Zaid proposed CNN)
Table 3. CNNn parameters selected by GWO with Dataset2
Parameter                    | Input values of GWO | Value after GWO
Number of kernels: γ1        | 1:10                | 4
Kernel size: γ2              | 1:10                | 3
Number of neurons in FC (n)  | 1:50                | 10
Table 4. Comparison of performance on Dataset2
                           | Template Attack [22] | Zaid et al. method [13] | Our proposal
Trainable parameters       | -                    | 8,782                   | 8,858
Number of traces for GE=0  | 3                    | 7                       | 2
Training time (s)          | -                    | 103                     | 158
Table 5. Optimal parameters of CNNn for unprotected devices
Layer                | Parameter
CONV                 | One layer, activation function SeLU; number of kernels γ1 = 4; kernel size γ2 = 3
Batch Normalization  | -
POOL                 | Pooling size = 2; stride = 2
Flatten              | -
Fully Connected      | Number of layers = 1; number of neurons δ1 = 10
Output               | 256 neurons, softmax activation
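To show how the layer settings in Table 5 translate into model size, the sketch below counts trainable parameters for a one-convolution-layer network of this shape. Several things are assumptions not stated in this excerpt: the trace length (`trace_len`), "same" padding in the convolution (so only pooling shortens the trace), a single input channel, and batch normalization contributing two trainable values per channel. The result is therefore not expected to reproduce the counts reported in Table 4.

```python
def conv1d_params(in_channels: int, n_kernels: int, kernel_size: int) -> int:
    """Weights plus one bias per kernel."""
    return (kernel_size * in_channels + 1) * n_kernels


def batchnorm_params(channels: int) -> int:
    """Trainable scale and shift per channel."""
    return 2 * channels


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus one bias per output neuron."""
    return (n_in + 1) * n_out


def cnnn_trainable_params(trace_len: int, n_kernels: int = 4, kernel_size: int = 3,
                          pool: int = 2, fc_neurons: int = 10,
                          n_classes: int = 256) -> int:
    """Parameter count for CONV -> BN -> POOL -> Flatten -> FC -> Output."""
    conv = conv1d_params(1, n_kernels, kernel_size)
    bn = batchnorm_params(n_kernels)
    # Assumption: 'same' padding, so pooling alone divides the length.
    flat = (trace_len // pool) * n_kernels
    fc = dense_params(flat, fc_neurons)
    out = dense_params(fc_neurons, n_classes)
    return conv + bn + fc + out


# Hypothetical trace length of 100 samples, chosen only for illustration.
print(cnnn_trainable_params(100))  # 4850
```

Under these assumptions the first fully connected layer dominates the count once traces are more than a few hundred samples long, while the 256-way output layer adds a fixed 2,816 parameters.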
4.5. Results on masking-protected device
The experiment in this section uses Dataset3, which is divided into three parts:
45,000 traces for training, 5,000 traces for validation, and 10,000 traces for the attack.
The training and validation data are used by the GWO algorithm to find optimal
parameters for the CNNd architecture. The basic CNN architecture used in the experiments
in this section is the CNNd proposed in Section 3.4.
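A minimal sketch of a Grey Wolf Optimizer loop over integer hyperparameter ranges follows, based on the standard formulation by Mirjalili et al. [16]. It is not the authors' implementation. The rounding and clipping of positions to integers, the population size, and the iteration count are assumptions. The fitness function is a hypothetical stand-in with a known minimum; in a setting like the one described above, it would instead train the CNN on the training traces and return a validation metric.

```python
import random
from typing import Callable, List, Sequence, Tuple


def grey_wolf_optimize(
    fitness: Callable[[List[int]], float],
    bounds: Sequence[Tuple[int, int]],
    n_wolves: int = 8,
    n_iter: int = 30,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Minimise `fitness` over an integer box with a basic GWO loop."""
    rng = random.Random(seed)
    wolves = [[rng.randint(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos = min(wolves, key=fitness)
    best_fit = fitness(best_pos)

    for it in range(n_iter):
        # The three fittest wolves (alpha, beta, delta) lead the pack.
        leaders = sorted(wolves, key=fitness)[:3]
        a = 2.0 * (1.0 - it / n_iter)  # shrinks linearly from 2 towards 0
        new_wolves = []
        for wolf in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    dist = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * dist
                # Average the three estimates, then round and clip to the range.
                pos.append(max(lo, min(hi, round(estimate / 3.0))))
            new_wolves.append(pos)
        wolves = new_wolves
        candidate = min(wolves, key=fitness)
        if fitness(candidate) < best_fit:
            best_pos, best_fit = list(candidate), fitness(candidate)
    return list(best_pos), best_fit


# Three integer search ranges for illustration; the fitness is a hypothetical stand-in.
BOUNDS = [(1, 10), (1, 10), (1, 50)]


def toy_fitness(p: List[int]) -> float:
    return float((p[0] - 4) ** 2 + (p[1] - 3) ** 2 + (p[2] - 10) ** 2)


best, score = grey_wolf_optimize(toy_fitness, BOUNDS)
print(best, score)
```

Because each fitness evaluation would mean training a network, a practical version would cache results for positions that have already been evaluated. This sketch recomputes them for brevity.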
The optimized parameters for CNNd, given in Table 6, are generated by the GWO algorithm.
Next, CNNd is trained to build a model of the traces. The probabilities of the 256
hypothetical keys estimated in the attack phase are presented in Figure 14; the maximum
probability corresponds to key 224, which is the actual AES-128 key used. Although the
probability difference between the correct and wrong keys is not substantial, Figure 15
shows that as the number of attack traces increases, the probabilities of the wrong keys
remain the same while that of the correct key increases significantly. Table 7 compares
the efficiency of the proposed method with those of Zaid [13] and Prouff [12]. The GE
values obtained by the attacks are shown in Figure 16. Our proposed CNNd is more effective
in terms of the number of traces required to achieve a GE of 0: it requires about 183
traces, while Zaid's method requires 195 traces, a reduction of approximately 6%.
This result indicates that our CNNd architecture can learn POIs from the power traces
of masked devices more precisely than the CNN architectures proposed by either Zaid or
Prouff. However, the number of trainable parameters and the training time of the proposed
CNN are larger than those of Zaid's method, yet much smaller than those of Prouff's
method. This can be explained by the architecture of CNNd being more complex than that
used by Zaid yet much simpler than that used by Prouff.
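The way the correct key separates from the wrong keys as attack traces accumulate can be sketched as a sum of log-probabilities per key hypothesis, a common way to combine per-trace classifier outputs in profiled attacks. This excerpt does not spell out the authors' exact scoring rule, so this is an assumption. The `label_of` function, which maps a plaintext byte and a key guess to the predicted class, is hypothetical; the toy uses XOR over four classes as a stand-in, whereas an AES attack would typically target an S-box output over 256 classes.

```python
import math
from typing import Callable, List, Sequence


def accumulate_key_scores(
    probs: Sequence[Sequence[float]],
    plaintexts: Sequence[int],
    label_of: Callable[[int, int], int],
    n_keys: int = 256,
) -> List[float]:
    """Sum log-probabilities per key hypothesis over all attack traces."""
    eps = 1e-40  # avoids log(0) when the model outputs an exact zero
    scores = [0.0] * n_keys
    for p_trace, pt in zip(probs, plaintexts):
        for k in range(n_keys):
            scores[k] += math.log(p_trace[label_of(pt, k)] + eps)
    return scores


# Toy setup (hypothetical): 4 classes, true key 3, XOR as the leakage label.
TRUE_KEY = 3
plaintexts = [0, 1, 2]


def xor_label(pt: int, k: int) -> int:
    return pt ^ k


# A weak "model": 0.4 on the true label, 0.2 on each of the other three.
probs = [
    [0.4 if c == xor_label(pt, TRUE_KEY) else 0.2 for c in range(4)]
    for pt in plaintexts
]
scores = accumulate_key_scores(probs, plaintexts, xor_label, n_keys=4)
best_key = max(range(4), key=scores.__getitem__)
print(best_key)  # 3
```

Each trace here favors the true label only slightly, yet the summed score of the correct key pulls ahead of the others as traces are added, which matches the behavior described for Figure 15.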
Figure 14. Estimation probability of all hypothetical keys with Dataset3
Table 6. CNNd parameters selected by GWO with Dataset3
Layer                | Parameter                     | Input values of GWO | Value after GWO
CONV 1               | Number of kernels: γ1         | 1-10                | 4
                     | Kernel size: γ2               | 1-10                | 3
Batch Normalization  | -                             | -                   | -
POOL 1               | Pooling size                  | 2                   | 2
                     | Stride                        | 2                   | 2
CONV 2               | Number of kernels: γ3         | 1-20                | 8
                     | Kernel size: γ4               | 20-100              | 51
Batch Normalization  | -                             | -                   | -
POOL 2               | Pooling size                  | 2                   | 2
                     | Stride                        | 2                   | 2
Flatten              | -                             | -                   | -
Fully Connected      | Number of FCs: δ1             | 1-3                 | 2
                     | Number of neurons/FC: δ2      | 1-50                | 10
Output               | 256 neurons; activation function: Softmax
Figure 15. Estimation probability of all hypothetical keys
against number of traces with Dataset3
Figure 16. Guessing entropy results for Dataset3
Table 7. Comparison of performance on Dataset3
                           | Template Attack [12] | CNN Profiled attack [12] | CNN proposed by Zaid [13] | CNN proposed by us
Trainable parameters       | -                    | 66,652,444               | 16,960                    | 26,334
Number of traces for GE=0  | 450                  | 1,146                    | 195                       | 183
Training time (s)          | -                    | 5417                     | 253                       | 790
5. CONCLUSION
In this paper, we have demonstrated that deep learning can be successfully applied to
profiled attacks on cryptographic devices. By analyzing the characteristics of the POIs
of power traces and of convolution operations, we have proposed two basic CNN
architectures, CNNn and CNNd, for unprotected and masking-protected devices, respectively.
The parameters of the proposed basic CNN architectures are optimized by the GWO algorithm.
Our CNNn architecture has minimal complexity and requires only 2 to 4 traces to reveal the
correct key of unprotected devices. After experimenting successfully on both trace
datasets, we claim that CNNn should be the first choice when conducting profiled attacks
on unprotected devices. Regarding attacks on masking-protected devices, although the
architecture of CNNd has one more convolution layer than the CNN architecture of Zaid, it
gives better results, specifically a decrease of roughly 6% in the number of traces
required for GE to equal 0. Therefore, both CNN architectures should be considered for the
protected device. As a final note, CNNs can be used to conduct profiled attacks
efficiently, provided that the architecture and parameters have been carefully selected.
REFERENCES
[1] P. Kocher, J. Jaffe, B. Jun, “Differential power analysis,” Advances in Cryptology - CRYPTO '99. CRYPTO 1999. Lecture Notes in Computer Science, vol 1666. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48405-1_25
[2] P. Kocher, “Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems,” in Proceedings of the 16th Annual International Cryptology Conference on Advances in Cryptology, Santa Barbara (USA), 1996.
[3] K. Gandolfi, C. Mourtel, F. Olivier, “Electromagnetic analysis: Concrete results,” in Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems, Paris, 2001.
[4] F. Standaert, C. Archambeau, “Using subspace-based template attacks to compare and combine power and electromagnetic information leakages,” in Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems - CHES 2008, Washington, D.C. (USA), 2008.
[5] S. Chari, J.R. Rao, P. Rohatgi, “Template attacks,” Cryptographic Hardware and Embedded Systems - CHES 2002. CHES 2002. Lecture Notes in Computer Science, vol 2523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36400-5_3
[6] E. Brier, C. Clavier, F. Olivier, “Correlation power analysis with a leakage model,” in International Workshop on Cryptographic Hardware and Embedded Systems, Springer, 2004, pp. 16–29. https://doi.org/10.1007/978-3-540-28632-5_2
[7] B. Gierlichs, et al., “Mutual information analysis,” Cryptographic Hardware and Embedded Systems - CHES 2008. CHES 2008. Lecture Notes in Computer Science, vol 5154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85053-3_27
[8] W. Schindler, K. Lemke, C. Paar, “A stochastic model for differential side channel cryptanalysis,” Cryptographic Hardware and Embedded Systems - CHES 2005. CHES 2005. Lecture Notes in Computer Science, vol 3659. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11545262_3
[9] B. Hettwer, S. Gehrer, T. Güneysu, “Applications of machine learning techniques in side-channel attacks: a survey,” J Cryptogr Eng, vol. 10, pp. 135–162, 2020.
[10] H. Maghrebi, T. Portigliatti, E. Prouff, “Breaking cryptographic implementations using deep learning techniques,” Security, Privacy, and Applied Cryptography Engineering. SPACE 2016. Lecture Notes in Computer Science, vol 10076. Springer, Cham. https://doi.org/10.1007/978-3-319-49445-6_1
[11] E. Cagli, C. Dumas, E. Prouff, “Convolutional neural networks with data augmentation against jitter-based countermeasures,” in Wieland Fischer and Naofumi Homma, editors, Cryptographic Hardware and Embedded Systems - CHES 2017, Cham, Springer International Publishing, 2017, pp. 45–68.
[12] E. Prouff, R. Strullu, R. Benadjila, E. Cagli, C. Dumas, “Study of deep learning techniques for side-channel analysis and introduction to ASCAD database,” Cryptology ePrint Archive, Report 2018/053, 2018. https://eprint.iacr.org/2018/053
[13] G. Zaid, L. Bossuet, A. Habrard, A. Venelli, “Methodology for efficient CNN architectures in profiling attacks,” IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2020, no. 1, pp. 1–36, 2020. https://doi.org/10.13154/tches.v2020.i1.1-36
[14] J. Heaton, I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[15] S. Ioffe, C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, 2015.
[16] S. Mirjalili, S. M. Mirjalili, and A. Lewis, “Grey wolf optimizer,” Advances in Engineering Software, vol. 69, pp. 46–61, 2014.
[17] F. Chollet et al., “Keras,” https://keras.io, 2015.
[18] L.N. Smith, N. Topin, “Super-convergence: Very fast training of residual networks using large learning rates,” in ICLR 2018 Conference, CoRR, 2018.
[19] G. Klambauer, T. Unterthiner, A. Mayr, S. Hochreiter, “Self-normalizing neural networks,” arXiv:1706.02515.
[20] K. He, X. Zhang, S. Ren, J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV '15, Washington, DC, USA, 2015, pp. 1026–1034.
[21] D.H. Wolpert, W.G. Macready, “No free lunch theorems for optimization,” IEEE Trans. Evolut. Comput., vol. 1, no. 1, pp. 67–82, 1997.
[22] J. Kim, S. Picek, A. Heuser, S. Bhasin, A. Hanjalic, “Make some noise. Unleashing the power of convolutional neural networks for profiled side-channel analysis,” IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2019, no. 3, pp. 148–179. https://doi.org/10.13154/tches.v2019.i3.148-179
[23] S. Mangard, E. Oswald, T. Popp, Power Analysis Attacks: Revealing the Secrets of Smart Cards, New York, USA: Springer, 2010.
[24] A. Heuser and M. Zohner, “Intelligent machine homicide: breaking cryptographic devices using support vector machines,” in COSADE 2012, Heidelberg, 2012.
[25] G. Hospodar, B. Gierlichs, E. De Mulder, I. Verbauwhede, J. Vandewalle, “Machine learning in side-channel analysis: A first study,” J Cryptogr Eng, vol. 1, article number 293, 2011. https://doi.org/10.1007/s13389-011-0023-x
[26] G. Hospodar, E. De Mulder, B. Gierlichs, J. Vandewalle, and I. Verbauwhede, “Least squares support vector machines for side-channel analysis,” in COSADE 2011, Darmstadt, 2011.
[27] L. Lerman, S. F. Medeiros, G. Bontempi, and O. Markowitch, “A machine learning approach against a masked AES,” J Cryptogr Eng, vol. 5, pp. 123–139, 2015. https://doi.org/10.1007/s13389-014-0089-3
[28] S. Picek, A. Heuser, A. Jovic, S.A. Ludwig, S. Guilley, D. Jakobovic, N. Mentens, “Side-channel analysis and machine learning: A practical perspective,” 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 4095–4102. https://doi.org/10.1109/IJCNN.2017.7966373
[29] S. Picek, A. Heuser, A. Jovic, L. Batina, and A. Legay, “The secrets of profiling for side-channel analysis: feature selection matters,” IACR Cryptology ePrint Archive, 2017.
[30] Y. Zheng, Y. Zhou, Z. Yu, C. Hu, H. Zhang, “How to compare selections of points of interest for side-channel distinguishers in practice?,” in Information and Communications Security. ICICS 2014. Lecture Notes in Computer Science, vol 8958. Springer, Cham, 2014. https://doi.org/10.1007/978-3-319-21966-0_15
[31] Y. Kong, E. Saeedi, “The investigation of neural networks performance in side channel attacks,” Artif Intell Rev, vol. 52, pp. 607–623, 2019. https://doi.org/10.1007/s10462-018-9640-4
[32] F. Standaert, T. Malkin, M. Yung, “A unified framework for the analysis of side-channel key recovery attacks,” Advances in Cryptology - EUROCRYPT 2009. EUROCRYPT 2009. Lecture Notes in Computer Science, vol 5479. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01001-9_26
Received 25 August 2020
Accepted 14 January 2021
