A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs
Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs are proposed for 2D CNNs, while very few are for 3D CNNs. 3D CNNs are far more computationally intensive and the design...

Detailed description

Author(s): Zhiqiang Liu; Paul Chow; Jinwei Xu; Jingfei Jiang; Yong Dou; Jie Zhou
Format: E-article
Language: English
Published: 2019
Subject headings: 2D CNN; 3D CNN; accelerator; uniform architecture; FPGA; HLS; matrix multiplication; 2D MAC array
Published in: Electronics, MDPI AG, 2013; 8(2019), 1, p 65 (volume:8; year:2019; number:1, p 65)
DOI: 10.3390/electronics8010065
Catalog ID: DOAJ085167320
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | DOAJ085167320 | ||
003 | DE-627 | ||
005 | 20230311033754.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230311s2019 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.3390/electronics8010065 |2 doi | |
035 | |a (DE-627)DOAJ085167320 | ||
035 | |a (DE-599)DOAJ9f7657db1096408782bb4963b7b4ff78 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
050 | 0 | |a TK7800-8360 | |
100 | 0 | |a Zhiqiang Liu |e verfasserin |4 aut | |
245 | 1 | 2 | |a A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs |
264 | 1 | |c 2019 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs are proposed for 2D CNNs, while very few are for 3D CNNs. 3D CNNs are far more computationally intensive and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU. | ||
650 | 4 | |a 2D CNN | |
650 | 4 | |a 3D CNN | |
650 | 4 | |a accelerator | |
650 | 4 | |a uniform architecture | |
650 | 4 | |a FPGA | |
650 | 4 | |a HLS | |
650 | 4 | |a matrix multiplication | |
650 | 4 | |a 2D MAC array | |
653 | 0 | |a Electronics | |
700 | 0 | |a Paul Chow |e verfasserin |4 aut | |
700 | 0 | |a Jinwei Xu |e verfasserin |4 aut | |
700 | 0 | |a Jingfei Jiang |e verfasserin |4 aut | |
700 | 0 | |a Yong Dou |e verfasserin |4 aut | |
700 | 0 | |a Jie Zhou |e verfasserin |4 aut | |
773 | 0 | 8 | |i In |t Electronics |d MDPI AG, 2013 |g 8(2019), 1, p 65 |w (DE-627)718626478 |w (DE-600)2662127-7 |x 20799292 |7 nnns |
773 | 1 | 8 | |g volume:8 |g year:2019 |g number:1, p 65 |
856 | 4 | 0 | |u https://doi.org/10.3390/electronics8010065 |z kostenfrei |
856 | 4 | 0 | |u https://doaj.org/article/9f7657db1096408782bb4963b7b4ff78 |z kostenfrei |
856 | 4 | 0 | |u http://www.mdpi.com/2079-9292/8/1/65 |z kostenfrei |
856 | 4 | 2 | |u https://doaj.org/toc/2079-9292 |y Journal toc |z kostenfrei |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_DOAJ | ||
912 | |a GBV_ILN_20 | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_23 | ||
912 | |a GBV_ILN_24 | ||
912 | |a GBV_ILN_39 | ||
912 | |a GBV_ILN_40 | ||
912 | |a GBV_ILN_60 | ||
912 | |a GBV_ILN_62 | ||
912 | |a GBV_ILN_63 | ||
912 | |a GBV_ILN_65 | ||
912 | |a GBV_ILN_69 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_73 | ||
912 | |a GBV_ILN_95 | ||
912 | |a GBV_ILN_105 | ||
912 | |a GBV_ILN_110 | ||
912 | |a GBV_ILN_151 | ||
912 | |a GBV_ILN_161 | ||
912 | |a GBV_ILN_170 | ||
912 | |a GBV_ILN_213 | ||
912 | |a GBV_ILN_230 | ||
912 | |a GBV_ILN_285 | ||
912 | |a GBV_ILN_293 | ||
912 | |a GBV_ILN_370 | ||
912 | |a GBV_ILN_602 | ||
912 | |a GBV_ILN_2014 | ||
912 | |a GBV_ILN_4012 | ||
912 | |a GBV_ILN_4037 | ||
912 | |a GBV_ILN_4112 | ||
912 | |a GBV_ILN_4125 | ||
912 | |a GBV_ILN_4126 | ||
912 | |a GBV_ILN_4249 | ||
912 | |a GBV_ILN_4305 | ||
912 | |a GBV_ILN_4306 | ||
912 | |a GBV_ILN_4307 | ||
912 | |a GBV_ILN_4313 | ||
912 | |a GBV_ILN_4322 | ||
912 | |a GBV_ILN_4323 | ||
912 | |a GBV_ILN_4324 | ||
912 | |a GBV_ILN_4325 | ||
912 | |a GBV_ILN_4335 | ||
912 | |a GBV_ILN_4338 | ||
912 | |a GBV_ILN_4367 | ||
912 | |a GBV_ILN_4700 | ||
951 | |a AR | ||
952 | |d 8 |j 2019 |e 1, p 65 |
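The abstract's core idea, mapping a convolution to a matrix multiplication, is the well-known im2col transformation: sliding receptive fields are unrolled into matrix columns so that a single matrix product computes all output pixels. The sketch below is a minimal NumPy illustration of that general technique, not the paper's HLS implementation (the paper generates feature-matrix tilings on the fly rather than materializing the whole matrix); function names are illustrative.

```python
import numpy as np

def im2col_2d(x, kh, kw, stride=1):
    """Unroll sliding windows of a (C, H, W) input into a matrix.

    Returns a matrix of shape (C*kh*kw, out_h*out_w); each column holds
    one flattened receptive field. This is the 'enlarged feature matrix'
    that the paper's mapping module avoids storing in full.
    """
    C, H, W = x.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    cols = np.empty((C * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, idx] = patch.ravel()
            idx += 1
    return cols

def conv2d_as_matmul(x, w, stride=1):
    """Convolution as one matrix product: (M, C*kh*kw) @ (C*kh*kw, P).

    x: input of shape (C, H, W); w: weights of shape (M, C, kh, kw),
    where M is the number of output channels.
    """
    M, C, kh, kw = w.shape
    cols = im2col_2d(x, kh, kw, stride)
    out = w.reshape(M, -1) @ cols          # one big matmul does all pixels
    out_h = (x.shape[1] - kh) // stride + 1
    out_w = (x.shape[2] - kw) // stride + 1
    return out.reshape(M, out_h, out_w)
```

A 3D convolution unrolls the same way, with each column flattening a C*kd*kh*kw volume instead; that shared structure is what makes a uniform 2D/3D accelerator architecture possible, since both cases reduce to the same matrix multiplication fed to the MAC array.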
author_variant |
z l zl p c pc j x jx j j jj y d yd j z jz |
---|---|
matchkey_str |
article:20799292:2019----::uiomrhtcueeinoaclrtn2 |
hierarchy_sort_str |
2019 |
callnumber-subject-code |
TK |
publishDate |
2019 |
allfields |
10.3390/electronics8010065 doi (DE-627)DOAJ085167320 (DE-599)DOAJ9f7657db1096408782bb4963b7b4ff78 DE-627 ger DE-627 rakwb eng TK7800-8360 Zhiqiang Liu verfasserin aut A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs are proposed for 2D CNNs, while very few are for 3D CNNs. Three-D CNNs are far more computationally intensive and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU. 
2D CNN 3D CNN accelerator uniform architecture FPGA HLS matrix multiplication 2D MAC array Electronics Paul Chow verfasserin aut Jinwei Xu verfasserin aut Jingfei Jiang verfasserin aut Yong Dou verfasserin aut Jie Zhou verfasserin aut In Electronics MDPI AG, 2013 8(2019), 1, p 65 (DE-627)718626478 (DE-600)2662127-7 20799292 nnns volume:8 year:2019 number:1, p 65 https://doi.org/10.3390/electronics8010065 kostenfrei https://doaj.org/article/9f7657db1096408782bb4963b7b4ff78 kostenfrei http://www.mdpi.com/2079-9292/8/1/65 kostenfrei https://doaj.org/toc/2079-9292 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 8 2019 1, p 65 |
spelling |
10.3390/electronics8010065 doi (DE-627)DOAJ085167320 (DE-599)DOAJ9f7657db1096408782bb4963b7b4ff78 DE-627 ger DE-627 rakwb eng TK7800-8360 Zhiqiang Liu verfasserin aut A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs are proposed for 2D CNNs, while very few are for 3D CNNs. Three-D CNNs are far more computationally intensive and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU. 
2D CNN 3D CNN accelerator uniform architecture FPGA HLS matrix multiplication 2D MAC array Electronics Paul Chow verfasserin aut Jinwei Xu verfasserin aut Jingfei Jiang verfasserin aut Yong Dou verfasserin aut Jie Zhou verfasserin aut In Electronics MDPI AG, 2013 8(2019), 1, p 65 (DE-627)718626478 (DE-600)2662127-7 20799292 nnns volume:8 year:2019 number:1, p 65 https://doi.org/10.3390/electronics8010065 kostenfrei https://doaj.org/article/9f7657db1096408782bb4963b7b4ff78 kostenfrei http://www.mdpi.com/2079-9292/8/1/65 kostenfrei https://doaj.org/toc/2079-9292 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 8 2019 1, p 65 |
allfields_unstemmed |
10.3390/electronics8010065 doi (DE-627)DOAJ085167320 (DE-599)DOAJ9f7657db1096408782bb4963b7b4ff78 DE-627 ger DE-627 rakwb eng TK7800-8360 Zhiqiang Liu verfasserin aut A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs are proposed for 2D CNNs, while very few are for 3D CNNs. Three-D CNNs are far more computationally intensive and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU. 
2D CNN 3D CNN accelerator uniform architecture FPGA HLS matrix multiplication 2D MAC array Electronics Paul Chow verfasserin aut Jinwei Xu verfasserin aut Jingfei Jiang verfasserin aut Yong Dou verfasserin aut Jie Zhou verfasserin aut In Electronics MDPI AG, 2013 8(2019), 1, p 65 (DE-627)718626478 (DE-600)2662127-7 20799292 nnns volume:8 year:2019 number:1, p 65 https://doi.org/10.3390/electronics8010065 kostenfrei https://doaj.org/article/9f7657db1096408782bb4963b7b4ff78 kostenfrei http://www.mdpi.com/2079-9292/8/1/65 kostenfrei https://doaj.org/toc/2079-9292 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 8 2019 1, p 65 |
allfieldsGer |
10.3390/electronics8010065 doi (DE-627)DOAJ085167320 (DE-599)DOAJ9f7657db1096408782bb4963b7b4ff78 DE-627 ger DE-627 rakwb eng TK7800-8360 Zhiqiang Liu verfasserin aut A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs are proposed for 2D CNNs, while very few are for 3D CNNs. Three-D CNNs are far more computationally intensive and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU. 
2D CNN 3D CNN accelerator uniform architecture FPGA HLS matrix multiplication 2D MAC array Electronics Paul Chow verfasserin aut Jinwei Xu verfasserin aut Jingfei Jiang verfasserin aut Yong Dou verfasserin aut Jie Zhou verfasserin aut In Electronics MDPI AG, 2013 8(2019), 1, p 65 (DE-627)718626478 (DE-600)2662127-7 20799292 nnns volume:8 year:2019 number:1, p 65 https://doi.org/10.3390/electronics8010065 kostenfrei https://doaj.org/article/9f7657db1096408782bb4963b7b4ff78 kostenfrei http://www.mdpi.com/2079-9292/8/1/65 kostenfrei https://doaj.org/toc/2079-9292 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 8 2019 1, p 65 |
allfieldsSound |
10.3390/electronics8010065 doi (DE-627)DOAJ085167320 (DE-599)DOAJ9f7657db1096408782bb4963b7b4ff78 DE-627 ger DE-627 rakwb eng TK7800-8360 Zhiqiang Liu verfasserin aut A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs are proposed for 2D CNNs, while very few are for 3D CNNs. Three-D CNNs are far more computationally intensive and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU. 
2D CNN 3D CNN accelerator uniform architecture FPGA HLS matrix multiplication 2D MAC array Electronics Paul Chow verfasserin aut Jinwei Xu verfasserin aut Jingfei Jiang verfasserin aut Yong Dou verfasserin aut Jie Zhou verfasserin aut In Electronics MDPI AG, 2013 8(2019), 1, p 65 (DE-627)718626478 (DE-600)2662127-7 20799292 nnns volume:8 year:2019 number:1, p 65 https://doi.org/10.3390/electronics8010065 kostenfrei https://doaj.org/article/9f7657db1096408782bb4963b7b4ff78 kostenfrei http://www.mdpi.com/2079-9292/8/1/65 kostenfrei https://doaj.org/toc/2079-9292 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 8 2019 1, p 65 |
language |
English |
source |
In Electronics 8(2019), 1, p 65 volume:8 year:2019 number:1, p 65 |
sourceStr |
In Electronics 8(2019), 1, p 65 volume:8 year:2019 number:1, p 65 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
2D CNN 3D CNN accelerator uniform architecture FPGA HLS matrix multiplication 2D MAC array Electronics |
isfreeaccess_bool |
true |
container_title |
Electronics |
authorswithroles_txt_mv |
Zhiqiang Liu @@aut@@ Paul Chow @@aut@@ Jinwei Xu @@aut@@ Jingfei Jiang @@aut@@ Yong Dou @@aut@@ Jie Zhou @@aut@@ |
publishDateDaySort_date |
2019-01-01T00:00:00Z |
hierarchy_top_id |
718626478 |
id |
DOAJ085167320 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">DOAJ085167320</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230311033754.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230311s2019 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.3390/electronics8010065</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ085167320</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJ9f7657db1096408782bb4963b7b4ff78</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK7800-8360</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">Zhiqiang Liu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="2"><subfield code="a">A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2019</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield 
code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs are proposed for 2D CNNs, while very few are for 3D CNNs. Three-D CNNs are far more computationally intensive and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. 
Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">2D CNN</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">3D CNN</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">accelerator</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">uniform architecture</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">FPGA</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">HLS</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">matrix multiplication</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">2D MAC array</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Electronics</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Paul Chow</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Jinwei Xu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Jingfei Jiang</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Yong Dou</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Jie Zhou</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">Electronics</subfield><subfield code="d">MDPI AG, 
2013</subfield><subfield code="g">8(2019), 1, p 65</subfield><subfield code="w">(DE-627)718626478</subfield><subfield code="w">(DE-600)2662127-7</subfield><subfield code="x">20799292</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:8</subfield><subfield code="g">year:2019</subfield><subfield code="g">number:1, p 65</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.3390/electronics8010065</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/9f7657db1096408782bb4963b7b4ff78</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">http://www.mdpi.com/2079-9292/8/1/65</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/2079-9292</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield 
code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_370</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " 
ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">8</subfield><subfield code="j">2019</subfield><subfield code="e">1, p 65</subfield></datafield></record></collection>
callnumber-first |
T - Technology |
author |
Zhiqiang Liu |
spellingShingle |
Zhiqiang Liu misc TK7800-8360 misc 2D CNN misc 3D CNN misc accelerator misc uniform architecture misc FPGA misc HLS misc matrix multiplication misc 2D MAC array misc Electronics A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs |
authorStr |
Zhiqiang Liu |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)718626478 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut aut aut aut aut aut |
collection |
DOAJ |
remote_str |
true |
callnumber-label |
TK7800-8360 |
illustrated |
Not Illustrated |
issn |
20799292 |
topic_title |
TK7800-8360 A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs 2D CNN 3D CNN accelerator uniform architecture FPGA HLS matrix multiplication 2D MAC array |
topic |
misc TK7800-8360 misc 2D CNN misc 3D CNN misc accelerator misc uniform architecture misc FPGA misc HLS misc matrix multiplication misc 2D MAC array misc Electronics |
topic_unstemmed |
misc TK7800-8360 misc 2D CNN misc 3D CNN misc accelerator misc uniform architecture misc FPGA misc HLS misc matrix multiplication misc 2D MAC array misc Electronics |
topic_browse |
misc TK7800-8360 misc 2D CNN misc 3D CNN misc accelerator misc uniform architecture misc FPGA misc HLS misc matrix multiplication misc 2D MAC array misc Electronics |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Electronics |
hierarchy_parent_id |
718626478 |
hierarchy_top_title |
Electronics |
isfreeaccess_txt |
true |
familylinks_str_mv |
(DE-627)718626478 (DE-600)2662127-7 |
title |
A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs |
ctrlnum |
(DE-627)DOAJ085167320 (DE-599)DOAJ9f7657db1096408782bb4963b7b4ff78 |
title_full |
A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs |
author_sort |
Zhiqiang Liu |
journal |
Electronics |
journalStr |
Electronics |
callnumber-first-code |
T |
lang_code |
eng |
isOA_bool |
true |
recordtype |
marc |
publishDateSort |
2019 |
contenttype_str_mv |
txt |
author_browse |
Zhiqiang Liu Paul Chow Jinwei Xu Jingfei Jiang Yong Dou Jie Zhou |
container_volume |
8 |
class |
TK7800-8360 |
format_se |
Elektronische Aufsätze |
author-letter |
Zhiqiang Liu |
doi_str_mv |
10.3390/electronics8010065 |
author2-role |
verfasserin |
title_sort |
uniform architecture design for accelerating 2d and 3d cnns on fpgas |
callnumber |
TK7800-8360 |
title_auth |
A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs |
abstract |
Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs have been proposed for 2D CNNs, while very few are for 3D CNNs. 3D CNNs are far more computationally intensive, and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.
abstractGer |
Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs have been proposed for 2D CNNs, while very few are for 3D CNNs. 3D CNNs are far more computationally intensive, and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.
abstract_unstemmed |
Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs have been proposed for 2D CNNs, while very few are for 3D CNNs. 3D CNNs are far more computationally intensive, and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 |
container_issue |
1, p 65 |
title_short |
A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs |
url |
https://doi.org/10.3390/electronics8010065 https://doaj.org/article/9f7657db1096408782bb4963b7b4ff78 http://www.mdpi.com/2079-9292/8/1/65 https://doaj.org/toc/2079-9292 |
remote_bool |
true |
author2 |
Paul Chow Jinwei Xu Jingfei Jiang Yong Dou Jie Zhou |
author2Str |
Paul Chow Jinwei Xu Jingfei Jiang Yong Dou Jie Zhou |
ppnlink |
718626478 |
callnumber-subject |
TK - Electrical and Nuclear Engineering |
mediatype_str_mv |
c |
isOA_txt |
true |
hochschulschrift_bool |
false |
doi_str |
10.3390/electronics8010065 |
callnumber-a |
TK7800-8360 |
up_date |
2024-07-04T02:05:53.553Z |
_version_ |
1803612327567687680 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">DOAJ085167320</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230311033754.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230311s2019 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.3390/electronics8010065</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ085167320</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJ9f7657db1096408782bb4963b7b4ff78</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK7800-8360</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">Zhiqiang Liu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="2"><subfield code="a">A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2019</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield 
code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized accelerators based on FPGAs are proposed for 2D CNNs, while very few are for 3D CNNs. Three-D CNNs are far more computationally intensive and the design space for 3D CNN acceleration has been further expanded since one more dimension is introduced, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate the feature matrix tilings with no need to store the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to adapt to the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. 
Experimental results show that the accelerator achieves state-of-the-art throughput performance on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">2D CNN</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">3D CNN</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">accelerator</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">uniform architecture</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">FPGA</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">HLS</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">matrix multiplication</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">2D MAC array</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Electronics</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Paul Chow</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Jinwei Xu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Jingfei Jiang</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Yong Dou</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Jie Zhou</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">Electronics</subfield><subfield code="d">MDPI AG, 
2013</subfield><subfield code="g">8(2019), 1, p 65</subfield><subfield code="w">(DE-627)718626478</subfield><subfield code="w">(DE-600)2662127-7</subfield><subfield code="x">20799292</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:8</subfield><subfield code="g">year:2019</subfield><subfield code="g">number:1, p 65</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.3390/electronics8010065</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/9f7657db1096408782bb4963b7b4ff78</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">http://www.mdpi.com/2079-9292/8/1/65</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/2079-9292</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield 
code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_370</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " 
ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">8</subfield><subfield code="j">2019</subfield><subfield code="e">1, p 65</subfield></datafield></record></collection>
score |
7.3976746 |