Thermal distribution and reliability prediction for 3D networks-on-chip

As one of the most promising technologies for reducing footprint, power consumption, and wire latency, Three-Dimensional Integrated Circuits (3D-ICs) are considered the near future of VLSI systems. Combining them with the Network-on-Chip infrastructure yields 3D Networks-on-Chip (3D-NoCs), a new on-chip communication paradigm that brings several advantages. However, thermal dissipation is one of the most critical challenges for 3D-ICs, because heat cannot easily transfer through several layers of silicon. Consequently, high-temperature areas also face a reliability threat, since the Mean Time to Failure (MTTF) decreases exponentially with the operating temperature, as in Black's model. 3D-NoCs and 3D-ICs must tackle this fundamental problem in order to be widely used. However, thermal analysis usually requires complicated simulation and can take an enormous execution time. Because the design flow is closed-loop, designers may iterate several times to optimize their designs, which significantly increases the thermal analysis time. Furthermore, reliability prediction requires both a completed design and a thermal prediction, and designers can use the result as feedback for their optimization. These are two big gaps in the design flow: it is difficult to obtain both predictions, which leaves 3D-NoCs exposed to thermal throttling and reliability threats. Therefore, in this work, we investigate the thermal distribution and reliability prediction of 3D-NoCs. We first propose a new method to simulate the temperature (both steady-state and transient) using traffic values from realistic and synthetic benchmarks and the power consumption from a standard VLSI design flow. Then, based on the proposed method, we predict the relative reliability of different parts of the network. Experimental results show that the method has an extremely fast execution time in comparison to the accelerated lifetime test. Furthermore, we compare the thermal behavior and reliability of Monolithic and TSV (Through-Silicon-Via) based designs. We also explore the use of a thermal via mechanism to help reduce the operating temperature.

also adopt the method in [32], where we remove the bonding layers between silicon layers. We keep the thickness of the silicon layer as it is for a fair comparison. Obviously, if we thin the layer, heat transfers much faster.

Figure 9 shows the router temperature under the PARSEC benchmark. Here, we also compare with the monolithic technology, where no TSV is needed [32]. As we can observe in Figure 9, the TSV-based system has a lower operating temperature thanks to the ability of copper TSVs to transfer heat. The difference in temperature is around 1 K at the bottom layer and even reaches 3.5 K in the canneal benchmark.

Figure 10 shows the operating temperature of our 3D-NoC under synthetic benchmarks. The operating temperature of the Monolithic systems is much higher than that of the TSV-based ones, since we stress the system at its saturation points. The highest temperature of the Monolithic 3D-NoC even reaches 351.64 K (78.49°C). The hottest layer of the TSV-based system has a temperature similar to that of the coolest layer of the Monolithic 3D-NoC.

Figure 10. Temperature of our 3D-NoC under synthetic benchmarks.

4.2. 3D-NoC Reliability Estimation

In this section, we use Black's model to evaluate the MTTF of the 3D-NoC. Figure 11 and Figure 12 show the MTTF of each layer, normalized to the MTTF at 323.15 K (50°C), under PARSEC and synthetic benchmarks. Here, we can observe that the TSV-based 3D-NoC dominates the Monolithic one in the PARSEC benchmark. With synthetic benchmarks, the TSV-based 3D-NoC is slightly better than the Monolithic ones.

Figure 11. Normalized MTTF of our 3D-NoC under PARSEC benchmarks.

Figure 12. Normalized MTTF of our 3D-NoC under synthetic benchmarks.

4.4. Exploring Different Layout and Thermal Dissipation Method

In this section, we explore different layouts and their thermal dissipation behaviors for our 3D-NoC. First, we perform thermal and reliability prediction for our layout in Figure 2(b). Then, we insert four thermal TSVs of size 15 × 15 in the four corners of the router floorplan in Figure 2(c). This TSV size is still feasible in the existing manufacturing process [7]. We also add a Keep-out-Zone distance of 10 around each thermal TSV to avoid mechanical stress. The thermal TSVs go through all layers of TSVs but do not contact the heatsink. The heatsink and the thermal TSVs are separated by a layer of thermal interface material.
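
To illustrate why layers far from the heatsink run hotter and why a better vertical heat path helps, the following is a minimal one-dimensional steady-state sketch in Python. It is not the simulation method used in this work. The function name `layer_temperatures`, the resistance and power values, and the assumption that all heat flows upward through identical layer interfaces to a single heatsink are hypothetical simplifications chosen only for illustration.

```python
def layer_temperatures(powers_w, r_layer_k_per_w, r_sink_k_per_w, ambient_k=300.0):
    """Steady-state temperature of each layer in a 1-D thermal resistance stack.

    Assumptions (illustrative only, not taken from the paper):
      - index 0 is the bottom layer, the last index is the layer next to the heatsink;
      - heat only flows upward, so the interface above layer i carries the
        power of layers 0..i;
      - every layer-to-layer interface has the same thermal resistance.
    """
    n = len(powers_w)
    temps = [0.0] * n
    # The top layer sees the total power flowing through the heatsink path.
    temps[-1] = ambient_k + r_sink_k_per_w * sum(powers_w)
    # Walk downward: each interface adds a temperature rise R * (heat crossing it).
    for i in range(n - 2, -1, -1):
        heat_crossing_w = sum(powers_w[: i + 1])
        temps[i] = temps[i + 1] + r_layer_k_per_w * heat_crossing_w
    return temps


if __name__ == "__main__":
    powers = [1.0, 1.0, 1.0, 1.0]  # hypothetical per-layer power in watts
    baseline = layer_temperatures(powers, r_layer_k_per_w=2.0, r_sink_k_per_w=5.0)
    better_path = layer_temperatures(powers, r_layer_k_per_w=1.0, r_sink_k_per_w=5.0)
    for idx, (a, b) in enumerate(zip(baseline, better_path)):
        print(f"layer {idx}: {a:.1f} K -> {b:.1f} K")
```

In this simplified picture, lowering `r_layer_k_per_w` (roughly what additional copper in the vertical path does) cools the bottom layer, while the top-layer temperature depends only on the heatsink path and stays the same.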
74 K.N. Dang et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 36, No. 1 (2020) 65-77 
Figure 13 and Figure 14 show the thermal behaviors under PARSEC and synthetic benchmarks for different layouts and cooling methods. The layout in Figure 2(b) has the worst thermal behavior among the TSV designs. On the other hand, adding thermal TSVs can help reduce the operating temperature significantly. By adding four TSVs, we can reduce the temperature by nearly 1 K at the bottom layer in the uniform benchmark, which is the most stressed benchmark. The results of the other benchmarks also show a slight improvement in thermal behavior.

One thing we can easily notice is that the top layer's temperatures do not change. This is because the top layer is already cooled by the heatsink, so adding TSVs cannot reduce its temperature further. Also, the heatsink temperature rises close to the top-layer temperature, which reduces its ability to transfer heat. If the thermal TSVs could contact the heatsink, they could significantly cool down the bottom layer. Liquid cooling could also be extremely helpful in this situation.

In comparison to traditional 2D-ICs, we observe that the TSV-based ICs have higher operating temperatures. The 2D-based 3D-NoCs operate under 319 K and 322 K with PARSEC and synthetic benchmarks, respectively. On the other hand, the TSV-based system increases the maximum temperature by at most 10 K with the layout in Figure 2(b).

In summary, different layouts lead to different thermal behaviors. The layout in Figure 2(b) does not surround the routers with TSV areas; therefore, the routers can heat each other up and reach a higher temperature. On the other hand, adding thermal TSVs to cool down the bottom layer is helpful, since it can reduce the temperature by nearly 1 Kelvin in the worst case. By mapping this to reliability, we can easily obtain a 2×~3× improvement of MTTF.

Figure 13. Thermal behavior of different layouts and cooling methods under the PARSEC benchmark.

Figure 14. Thermal behavior of different layouts and cooling methods under the synthetic benchmarks.
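
The mapping from operating temperature to relative MTTF uses the temperature term of Black's model [9]. A minimal Python sketch of that normalization follows, assuming the current density is the same at both temperatures so that only the Arrhenius term remains. The function name `normalized_mttf` and the activation energy of 0.7 eV are assumptions made for illustration; the activation energy is not stated in this excerpt, and the result is sensitive to it. The reference temperature of 323.15 K is the one used for Figures 11 and 12.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def normalized_mttf(temp_k, ref_temp_k=323.15, activation_energy_ev=0.7):
    """MTTF at temp_k divided by MTTF at ref_temp_k.

    Uses only the temperature term of Black's model, exp(Ea / (k * T)),
    and assumes identical current density at both temperatures.
    The default activation energy is an assumed value, not from the paper.
    """
    ea_over_k = activation_energy_ev / BOLTZMANN_EV_PER_K
    return math.exp(ea_over_k * (1.0 / temp_k - 1.0 / ref_temp_k))


if __name__ == "__main__":
    # 351.64 K is the highest Monolithic temperature reported in the text;
    # the other values are arbitrary sample points.
    for temp in (323.15, 330.0, 340.0, 351.64):
        print(f"{temp:7.2f} K -> normalized MTTF {normalized_mttf(temp):.3f}")
```

A value below 1 means the layer is expected to fail sooner than it would at the 50°C reference; comparing two layers only requires the ratio of their normalized values.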
4.5. Execution Time

In this work, we evaluate the proposed method using a system with a Xeon E5-2620 (8 cores, 2.1 GHz), 16 GB RAM, and the Linux Subsystem and PowerShell under Windows 10. The platform is written in C++, Python, and Bash. The execution time is measured using the time command under Linux and Measure-Command under Windows PowerShell. Here, the simulation time of the PARSEC and synthetic benchmarks is not considered because they are separate from our flow. As shown in Table 4, all steps in our flow perform under two seconds. In terms of execution time, our method easily outperforms the fabrication-based methods, which usually take hours even without counting design, fabrication, and assembly time [10-12].

Table 4. Execution time of the proposed flow

Work   Step                                      Time
Ours   Power extraction (one benchmark)          1.22 s
       Floorplan generation                      0.095 s
       Temperature estimation (one benchmark)    81 s
       Reliability estimation (12 benchmarks)    1.12 s
[10]   Reliability test                          96 h
[11]   The longest step in reliability test      1000 h
[12]   Lifetime acceleration test                100-5000 h

Although our approach is faster than real-chip testing [10-12], it cannot be as accurate as the baking tests because of deviations during simulation and potential manufacturing variation. However, in a closed-loop design flow, having an understanding of the potential reliability threat is helpful for designers.

4.6. Discussion

In this section, we discuss some technical details of our method, together with its advantages and drawbacks.

In our evaluation, we point out that Monolithic has a higher temperature than TSV-based 3D-NoCs for two major reasons: i) TSVs act as thermal conduction devices and ii) Monolithic 3D-ICs have a higher density than TSV-based systems. However, we would like to note that Monolithic 3D-ICs have a lower area cost than TSV-based systems.

Fluid cooling [7] is one of the most advanced methods to reduce the operating temperature of the system. Although we have not explored this method, it has shown promising efficiency for 3D-ICs [7]. With a high fluid velocity, we expect that the system can be cooled down significantly. However, we would like to note that fluid cooling has unknown reliability, which needs to be carefully investigated before it can be widely used.

5. Conclusion

In this work, we proposed a platform to quickly estimate the power, thermal behavior, and reliability of 3D-NoC systems. The method has shown an extremely short execution time. We also analyze and simulate the reliability of TSV and Monolithic 3D-ICs. Furthermore, we explore and compare different layout strategies and cooling methods.

From our experiments with 3D-NoC, we find that lower-index layers have higher operating temperatures and are more critical in terms of reliability. Although this conclusion cannot cover all possible cases, it holds across the tested benchmarks. Based on these experiments, designers can decide on their fault-tolerance or thermal dissipation approach depending on their required specification.

In the future, advanced cooling techniques such as liquid cooling could be investigated. The impact of DVFS and fault tolerance on performance and thermal behavior could also be studied.

Acknowledgments

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2018.312.
References

[1] Khanh N. Dang, Akram Ben Ahmed, Xuan Tu Tran, Yuichi Okuyama, Abderazek Ben Abdallah, "A Comprehensive Reliability Assessment of Fault-Resilient Network-on-Chip Using Analytical Model", IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25(11) (2017) 3099-3112. https://doi.org/10.1109/TVLSI.2017.2736004.

[2] K. Banerjee, S.J. Souri, P. Kapur and K.C. Saraswat, "3-D ICs: A novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration", Proc. IEEE 89(5) (2001) 602-633. https://doi.org/10.1109/5.929647.

[3] Khanh N. Dang, Akram Ben Ahmed, Yuichi Okuyama, Abderazek Ben Abdallah, "Scalable design methodology and online algorithm for TSV-cluster defects recovery in highly reliable 3D-NoC systems", IEEE Transactions on Emerging Topics in Computing, 2017, pp. 1-14 (in-press). https://doi.org/10.1109/TETC.2017.2762407.

[4] Wong, Simon, et al., "Monolithic 3D integrated circuits", International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA), IEEE, 2007.

[5] Y.J. Park et al., "Thermal Analysis for 3D Multi-core Processors with Dynamic Frequency Scaling", in IEEE/ACIS 9th Int. Conf. on Computer and Information Science, Aug 2010, pp. 69-74.

[6] Van der Plas, Geert, et al., "Design issues and considerations for low-cost 3-D TSV IC technology", IEEE Journal of Solid-State Circuits 46(1) (2010) 293-307.

[7] D. Cuesta et al., "Thermal-aware floorplanner for 3D IC, including TSVs, liquid microchannels and thermal domains optimization", Applied Soft Computing 34 (2015) 164-177. https://doi.org/10.1016/j.asoc.2015.04.052.

[8] Park, Changyok, "Dummy TSV to improve process uniformity and heat dissipation", U.S. Patent 10,181,454, 15 Jan. 2019. https://patents.google.com/patent/US20110215457A1/en (accessed 16 March 2020).

[9] J.R. Black, "Mass transport of aluminum by momentum exchange with conducting electrons", in 6th Annual Reliability Physics Symposium (IEEE), IEEE, 1967, pp. 148-159.

[10] Hamada, M. Dorothy June, J. William, Roesch, "Evaluating device reliability using wafer-level methodology", CS Mantech Conference, 2008.

[11] Renesas's Semiconductor Reliability Handbook, https://www.renesas.com/us/en/doc/products/others/r51zz0001ej0250.pdf/, 2017 (accessed 17 March 2020).

[12] Toshiba's Reliability Handbook, https://toshiba.semicon-storage.com/content/dam/toshiba-ss/shared/docs/design-support/reliability/reliability-handbook-tdsc-en.pdf/, 2018 (accessed 17 March 2020).

[13] Zhang, Runjie, Mircea R. Stan, Kevin Skadron, "Hotspot 6.0: Validation, acceleration and extension", University of Virginia, Tech. Rep., 2015.

[14] Sridhar, Arvind, et al., "3D-ICE: Fast compact transient thermal modeling for 3D ICs with inter-tier liquid cooling", 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), IEEE, 2010.

[15] Scott Ladenheim, Yi-Chung Chen, Milan Mihajlović, Vasilis F. Pavlidis, "The MTA: An Advanced and Versatile Thermal Simulator for Integrated Systems", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37(12) (2018) 3123-3136. https://doi.org/10.1109/TCAD.2018.2789729.

[16] Erdmann, Christophe, et al., "A heterogeneous 3D-IC consisting of two 28 nm FPGA die and 32 reconfigurable high-performance data converters", IEEE Journal of Solid-State Circuits 50(1) (2014) 258-269. https://doi.org/10.1109/JSSC.2014.2357432.

[17] Kahng, B. Andrew, et al., "ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration", Design, Automation & Test in Europe Conference & Exhibition, IEEE, 2009.

[18] Lee, Seung Eun, and Nader Bagherzadeh, "A high level power model for Network-on-Chip (NoC) router", Computers & Electrical Engineering 35(6) (2009) 837-845. https://doi.org/10.1016/j.compeleceng.2008.11.023.

[19] Lee, Seung Eun, Nader Bagherzadeh, "A variable frequency link for a power-aware network-on-chip (NoC)", Integration 42(4) (2009) 479-485. https://doi.org/10.1016/j.vlsi.2009.01.002.

[20] Lebreton, Hugo, Pascal Vivet, "Power modeling in SystemC at transaction level, application to a DVFS architecture", 2008 IEEE Computer Society Annual Symposium on VLSI, IEEE, 2008.

[21] Khanh N. Dang, Akram Ben Ahmed, Abderazek Ben Abdallah, Xuan-Tu Tran, "TSV-OCT: A Scalable Online Multiple-TSV Defects Localization for Real-Time 3-D-IC systems", IEEE Transactions on Very Large Scale Integration Systems 28(3) (2020) 672-685. https://doi.org/10.1109/TVLSI.2019.2948878.

[22] United States of America: Department of Defense, Military Handbook: Reliability Prediction of Electronic Equipment: MIL-HDBK-217F, 1991.

[23] J.B. Bowles, "A survey of reliability-prediction procedures for microelectronic devices", IEEE Trans. Rel. 41(1) (1992) 2-12. https://doi.org/10.1109/24.126662.

[24] J. Srinivasan et al., "Lifetime reliability: Toward an architectural solution", IEEE Micro 25(3) (2005) 70-80. https://doi.org/10.1109/MM.2005.54.

[25] NanGate Inc., "Nangate Open Cell Library 45nm", 2016 (accessed 16 June 2016).

[26] NCSU Electronic Design Automation, "FreePDK3D45 3D-IC process design kit", tents/, 2016 (accessed 16 June 2016).

[27] Binkert, Nathan, et al., "The gem5 simulator", ACM SIGARCH Computer Architecture News 39(2) (2011) 1-7.

[28] Bienia, Christian, et al., "The PARSEC benchmark suite: Characterization and architectural implications", Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008.

[29] Li, Sheng, et al., "McPAT: an integrated power, area and timing modeling framework for multicore and manycore architectures", Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009.

[30] J. Meng, K. Kawakami, A.K. Coskun, "Optimizing energy efficiency of 3-d multicore systems with stacked dram under power and thermal constraints", in DAC Design Automation Conference 2012, IEEE, 2012, pp. 648-655.

[31] Khanh N. Dang, Akram Ben Ahmed, Abderazek Ben Abdallah, Michael Corad Meyer, Xuan-Tu Tran, "2D Parity Product Code for TSV online fault correction and detection", REV Journal on Electronics and Communications (in-press).

[32] Samal, Sandeep Kumar, et al., "Fast and accurate thermal modeling and optimization for monolithic 3D ICs", 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, 2014.