ShiDianNao
In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. This paper focuses on image applications, arguably the most important category among recognition and mining applications. The state-of-the-art neural networks for these applications are Convolutional Neural Networks (CNNs), which have an important property: weights are shared among many neurons, considerably reducing the network's memory footprint. This property makes it possible to map a CNN entirely within SRAM, eliminating all DRAM accesses for weights. By further hoisting the accelerator next to the image sensor, the remaining DRAM accesses (for inputs and outputs) can be eliminated as well. The paper proposes such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses, combined with careful exploitation of the specific data access patterns within CNNs, yields an accelerator that is 60× more energy efficient than the previous state-of-the-art neural network accelerator. The authors present a full design down to the layout at 65 nm, with a modest footprint of 4.86 mm² consuming only 320 mW, yet still about 30× faster than high-end GPUs.
Detailed description

Author: Du, Zidong (author)
Format: Article
Language: English
Published: 2016
Parent work: Contained in: Computer architecture news - New York, NY : ACM, 1972, 43(2016), 3, pages 92-104
Parent work: volume:43 ; year:2016 ; number:3 ; pages:92-104
Links:
DOI / URN: 10.1145/2872887.2750389
Catalog ID: OLC1973673711
LEADER 01000caa a2200265 4500
001    OLC1973673711
003    DE-627
005    20230714184925.0
007    tu
008    160430s2016 xx ||||| 00| ||eng c
024 7  |a 10.1145/2872887.2750389 |2 doi
028 52 |a PQ20160430
035    |a (DE-627)OLC1973673711
035    |a (DE-599)GBVOLC1973673711
035    |a (PRQ)acm_primary_27503890
035    |a (KEY)0040085820160000043000300092shidiannao
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
082 04 |a 004 |q DNB
100 1  |a Du, Zidong |e verfasserin |4 aut
245 10 |a ShiDianNao
264  1 |c 2016
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
520    |a In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86 mm² and consuming only 320 mW, but still about 30× faster than high-end GPUs.
700 1  |a Fasthuber, Robert |4 oth
700 1  |a Chen, Tianshi |4 oth
700 1  |a Ienne, Paolo |4 oth
700 1  |a Li, Ling |4 oth
700 1  |a Luo, Tao |4 oth
700 1  |a Feng, Xiaobing |4 oth
700 1  |a Chen, Yunji |4 oth
700 1  |a Temam, Olivier |4 oth
773 08 |i Enthalten in |t Computer architecture news |d New York, NY : ACM, 1972 |g 43(2016), 3, Seite 92-104 |w (DE-627)129397881 |w (DE-600)186012-4 |w (DE-576)014781093 |x 0163-5964 |7 nnns
773 18 |g volume:43 |g year:2016 |g number:3 |g pages:92-104
856 41 |u http://dx.doi.org/10.1145/2872887.2750389 |3 Volltext
856 42 |u http://dl.acm.org/citation.cfm?id=2750389
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_OLC
912    |a SSG-OLC-MAT
912    |a GBV_ILN_70
912    |a GBV_ILN_134
912    |a GBV_ILN_2021
912    |a GBV_ILN_2190
951    |a AR
952    |d 43 |j 2016 |e 3 |h 92-104
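The abstract's central point, that CNN weight sharing shrinks the weight footprint enough to hold an entire network in on-chip SRAM, can be illustrated with a rough parameter count. The layer sizes below are hypothetical examples chosen for illustration, not figures from the paper:

```python
# Back-of-the-envelope sketch: weights in a shared-kernel convolutional
# layer vs. a hypothetical fully-connected layer over the same maps.
# All sizes are made up; only the scale of the gap matters.

def conv_params(k, c_in, c_out):
    """Weights in a k x k convolution: one kernel set, shared by
    every output position on the feature map."""
    return k * k * c_in * c_out

def dense_params(h, w, c_in, c_out):
    """Weights if every output neuron instead had its own private
    connection to every input pixel (no sharing)."""
    return (h * w * c_in) * (h * w * c_out)

h = w = 32                     # hypothetical 32x32 feature maps
k, c_in, c_out = 3, 16, 32     # hypothetical 3x3 kernel, 16 -> 32 channels

shared = conv_params(k, c_in, c_out)        # 3*3*16*32 = 4,608 weights
unshared = dense_params(h, w, c_in, c_out)  # ~537 million weights

print(f"conv weights:  {shared:,}")
print(f"dense weights: {unshared:,}")
print(f"reduction:     ~{unshared // shared:,}x")
```

Even at this toy scale, sharing cuts the weight count by five orders of magnitude, which is the property that lets the paper's design keep all weights in SRAM and drop DRAM accesses entirely.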
author_variant |
z d zd |
---|---|
matchkey_str |
article:01635964:2016----::hda |
hierarchy_sort_str |
2016 |
publishDate |
2016 |
allfields |
10.1145/2872887.2750389 doi PQ20160430 (DE-627)OLC1973673711 (DE-599)GBVOLC1973673711 (PRQ)acm_primary_27503890 (KEY)0040085820160000043000300092shidiannao DE-627 ger DE-627 rakwb eng 004 DNB Du, Zidong verfasserin aut ShiDianNao 2016 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm 2 and consuming only 320mW, but still about 30× faster than high-end GPUs. 
Fasthuber, Robert oth Chen, Tianshi oth Ienne, Paolo oth Li, Ling oth Luo, Tao oth Feng, Xiaobing oth Chen, Yunji oth Temam, Olivier oth Enthalten in Computer architecture news New York, NY : ACM, 1972 43(2016), 3, Seite 92-104 (DE-627)129397881 (DE-600)186012-4 (DE-576)014781093 0163-5964 nnns volume:43 year:2016 number:3 pages:92-104 http://dx.doi.org/10.1145/2872887.2750389 Volltext http://dl.acm.org/citation.cfm?id=2750389 GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 GBV_ILN_134 GBV_ILN_2021 GBV_ILN_2190 AR 43 2016 3 92-104 |
spelling |
10.1145/2872887.2750389 doi PQ20160430 (DE-627)OLC1973673711 (DE-599)GBVOLC1973673711 (PRQ)acm_primary_27503890 (KEY)0040085820160000043000300092shidiannao DE-627 ger DE-627 rakwb eng 004 DNB Du, Zidong verfasserin aut ShiDianNao 2016 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm 2 and consuming only 320mW, but still about 30× faster than high-end GPUs. 
Fasthuber, Robert oth Chen, Tianshi oth Ienne, Paolo oth Li, Ling oth Luo, Tao oth Feng, Xiaobing oth Chen, Yunji oth Temam, Olivier oth Enthalten in Computer architecture news New York, NY : ACM, 1972 43(2016), 3, Seite 92-104 (DE-627)129397881 (DE-600)186012-4 (DE-576)014781093 0163-5964 nnns volume:43 year:2016 number:3 pages:92-104 http://dx.doi.org/10.1145/2872887.2750389 Volltext http://dl.acm.org/citation.cfm?id=2750389 GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 GBV_ILN_134 GBV_ILN_2021 GBV_ILN_2190 AR 43 2016 3 92-104 |
allfields_unstemmed |
10.1145/2872887.2750389 doi PQ20160430 (DE-627)OLC1973673711 (DE-599)GBVOLC1973673711 (PRQ)acm_primary_27503890 (KEY)0040085820160000043000300092shidiannao DE-627 ger DE-627 rakwb eng 004 DNB Du, Zidong verfasserin aut ShiDianNao 2016 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm 2 and consuming only 320mW, but still about 30× faster than high-end GPUs. 
Fasthuber, Robert oth Chen, Tianshi oth Ienne, Paolo oth Li, Ling oth Luo, Tao oth Feng, Xiaobing oth Chen, Yunji oth Temam, Olivier oth Enthalten in Computer architecture news New York, NY : ACM, 1972 43(2016), 3, Seite 92-104 (DE-627)129397881 (DE-600)186012-4 (DE-576)014781093 0163-5964 nnns volume:43 year:2016 number:3 pages:92-104 http://dx.doi.org/10.1145/2872887.2750389 Volltext http://dl.acm.org/citation.cfm?id=2750389 GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 GBV_ILN_134 GBV_ILN_2021 GBV_ILN_2190 AR 43 2016 3 92-104 |
allfieldsGer |
10.1145/2872887.2750389 doi PQ20160430 (DE-627)OLC1973673711 (DE-599)GBVOLC1973673711 (PRQ)acm_primary_27503890 (KEY)0040085820160000043000300092shidiannao DE-627 ger DE-627 rakwb eng 004 DNB Du, Zidong verfasserin aut ShiDianNao 2016 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm 2 and consuming only 320mW, but still about 30× faster than high-end GPUs. 
Fasthuber, Robert oth Chen, Tianshi oth Ienne, Paolo oth Li, Ling oth Luo, Tao oth Feng, Xiaobing oth Chen, Yunji oth Temam, Olivier oth Enthalten in Computer architecture news New York, NY : ACM, 1972 43(2016), 3, Seite 92-104 (DE-627)129397881 (DE-600)186012-4 (DE-576)014781093 0163-5964 nnns volume:43 year:2016 number:3 pages:92-104 http://dx.doi.org/10.1145/2872887.2750389 Volltext http://dl.acm.org/citation.cfm?id=2750389 GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 GBV_ILN_134 GBV_ILN_2021 GBV_ILN_2190 AR 43 2016 3 92-104 |
allfieldsSound |
10.1145/2872887.2750389 doi PQ20160430 (DE-627)OLC1973673711 (DE-599)GBVOLC1973673711 (PRQ)acm_primary_27503890 (KEY)0040085820160000043000300092shidiannao DE-627 ger DE-627 rakwb eng 004 DNB Du, Zidong verfasserin aut ShiDianNao 2016 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm 2 and consuming only 320mW, but still about 30× faster than high-end GPUs. 
Fasthuber, Robert oth Chen, Tianshi oth Ienne, Paolo oth Li, Ling oth Luo, Tao oth Feng, Xiaobing oth Chen, Yunji oth Temam, Olivier oth Enthalten in Computer architecture news New York, NY : ACM, 1972 43(2016), 3, Seite 92-104 (DE-627)129397881 (DE-600)186012-4 (DE-576)014781093 0163-5964 nnns volume:43 year:2016 number:3 pages:92-104 http://dx.doi.org/10.1145/2872887.2750389 Volltext http://dl.acm.org/citation.cfm?id=2750389 GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 GBV_ILN_134 GBV_ILN_2021 GBV_ILN_2190 AR 43 2016 3 92-104 |
language |
English |
source |
Enthalten in Computer architecture news 43(2016), 3, Seite 92-104 volume:43 year:2016 number:3 pages:92-104 |
sourceStr |
Enthalten in Computer architecture news 43(2016), 3, Seite 92-104 volume:43 year:2016 number:3 pages:92-104 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
Computer architecture news |
authorswithroles_txt_mv |
Du, Zidong @@aut@@ Fasthuber, Robert @@oth@@ Chen, Tianshi @@oth@@ Ienne, Paolo @@oth@@ Li, Ling @@oth@@ Luo, Tao @@oth@@ Feng, Xiaobing @@oth@@ Chen, Yunji @@oth@@ Temam, Olivier @@oth@@ |
publishDateDaySort_date |
2016-01-01T00:00:00Z |
hierarchy_top_id |
129397881 |
dewey-sort |
14 |
id |
OLC1973673711 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1973673711</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230714184925.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">160430s2016 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1145/2872887.2750389</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20160430</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1973673711</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1973673711</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)acm_primary_27503890</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0040085820160000043000300092shidiannao</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">DNB</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Du, Zidong</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">ShiDianNao</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2016</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" 
"><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60&times more energy efficient than the previous state-of-the-art neural network accelerator. 
We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm 2 and consuming only 320mW, but still about 30× faster than high-end GPUs.</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Fasthuber, Robert</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Chen, Tianshi</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ienne, Paolo</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Li, Ling</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Luo, Tao</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Feng, Xiaobing</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Chen, Yunji</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Temam, Olivier</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Computer architecture news</subfield><subfield code="d">New York, NY : ACM, 1972</subfield><subfield code="g">43(2016), 3, Seite 92-104</subfield><subfield code="w">(DE-627)129397881</subfield><subfield code="w">(DE-600)186012-4</subfield><subfield code="w">(DE-576)014781093</subfield><subfield code="x">0163-5964</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:43</subfield><subfield code="g">year:2016</subfield><subfield code="g">number:3</subfield><subfield code="g">pages:92-104</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1145/2872887.2750389</subfield><subfield 
code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://dl.acm.org/citation.cfm?id=2750389</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_134</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2021</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2190</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">43</subfield><subfield code="j">2016</subfield><subfield code="e">3</subfield><subfield code="h">92-104</subfield></datafield></record></collection>
|
author |
Du, Zidong |
spellingShingle |
Du, Zidong ddc 004 ShiDianNao |
authorStr |
Du, Zidong |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)129397881 |
format |
Article |
dewey-ones |
004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0163-5964 |
topic_title |
004 DNB ShiDianNao |
topic |
ddc 004 |
topic_unstemmed |
ddc 004 |
topic_browse |
ddc 004 |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
author2_variant |
r f rf t c tc p i pi l l ll t l tl x f xf y c yc o t ot |
hierarchy_parent_title |
Computer architecture news |
hierarchy_parent_id |
129397881 |
dewey-tens |
000 - Computer science, knowledge & systems |
hierarchy_top_title |
Computer architecture news |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)129397881 (DE-600)186012-4 (DE-576)014781093 |
title |
ShiDianNao |
ctrlnum |
(DE-627)OLC1973673711 (DE-599)GBVOLC1973673711 (PRQ)acm_primary_27503890 (KEY)0040085820160000043000300092shidiannao |
title_full |
ShiDianNao |
author_sort |
Du, Zidong |
journal |
Computer architecture news |
journalStr |
Computer architecture news |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2016 |
contenttype_str_mv |
txt |
container_start_page |
92 |
author_browse |
Du, Zidong |
container_volume |
43 |
class |
004 DNB |
format_se |
Aufsätze |
author-letter |
Du, Zidong |
doi_str_mv |
10.1145/2872887.2750389 |
dewey-full |
004 |
title_sort |
shidiannao |
title_auth |
ShiDianNao |
abstract |
In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm 2 and consuming only 320mW, but still about 30× faster than high-end GPUs. |
abstractGer |
In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm 2 and consuming only 320mW, but still about 30× faster than high-end GPUs. |
abstract_unstemmed |
In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86mm 2 and consuming only 320mW, but still about 30× faster than high-end GPUs. |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 GBV_ILN_134 GBV_ILN_2021 GBV_ILN_2190 |
container_issue |
3 |
title_short |
ShiDianNao |
url |
http://dx.doi.org/10.1145/2872887.2750389 http://dl.acm.org/citation.cfm?id=2750389 |
remote_bool |
false |
author2 |
Fasthuber, Robert Chen, Tianshi Ienne, Paolo Li, Ling Luo, Tao Feng, Xiaobing Chen, Yunji Temam, Olivier |
author2Str |
Fasthuber, Robert Chen, Tianshi Ienne, Paolo Li, Ling Luo, Tao Feng, Xiaobing Chen, Yunji Temam, Olivier |
ppnlink |
129397881 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
author2_role |
oth oth oth oth oth oth oth oth |
doi_str |
10.1145/2872887.2750389 |
up_date |
2024-07-04T02:55:17.852Z |
_version_ |
1803615435859427328 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1973673711</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230714184925.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">160430s2016 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1145/2872887.2750389</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20160430</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1973673711</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1973673711</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)acm_primary_27503890</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0040085820160000043000300092shidiannao</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">DNB</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Du, Zidong</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">ShiDianNao</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2016</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" 
"><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The neural networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property allows to entirely map a CNN within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60&times more energy efficient than the previous state-of-the-art neural network accelerator. 
We present a full design down to the layout at 65 nm, with a modest footprint of 4.86 mm² and consuming only 320 mW, but still about 30× faster than high-end GPUs.</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Fasthuber, Robert</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Chen, Tianshi</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ienne, Paolo</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Li, Ling</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Luo, Tao</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Feng, Xiaobing</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Chen, Yunji</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Temam, Olivier</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Computer architecture news</subfield><subfield code="d">New York, NY : ACM, 1972</subfield><subfield code="g">43(2016), 3, Seite 92-104</subfield><subfield code="w">(DE-627)129397881</subfield><subfield code="w">(DE-600)186012-4</subfield><subfield code="w">(DE-576)014781093</subfield><subfield code="x">0163-5964</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:43</subfield><subfield code="g">year:2016</subfield><subfield code="g">number:3</subfield><subfield code="g">pages:92-104</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1145/2872887.2750389</subfield><subfield 
code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://dl.acm.org/citation.cfm?id=2750389</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_134</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2021</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2190</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">43</subfield><subfield code="j">2016</subfield><subfield code="e">3</subfield><subfield code="h">92-104</subfield></datafield></record></collection>
|
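The abstract's central argument is the weight-sharing property of CNNs: one small kernel is reused across every output position, so a convolutional layer's weight footprint is orders of magnitude smaller than that of a fully connected layer over the same feature maps, small enough to fit in on-chip SRAM. As a rough illustration (a minimal sketch; the layer sizes below are made-up examples, not figures from the paper):

```python
# Illustration of CNN weight sharing: a conv layer stores one k x k kernel
# per (input channel, output channel) pair, reused at every spatial position,
# while a fully connected layer over the same feature maps needs a distinct
# weight for every (input neuron, output neuron) pair.

def conv_weight_count(in_channels, out_channels, kernel_size):
    """Weights of a convolutional layer (biases ignored for simplicity)."""
    return in_channels * out_channels * kernel_size * kernel_size

def fc_weight_count(in_channels, out_channels, height, width):
    """Weights of a fully connected layer over same-sized feature maps."""
    n_in = in_channels * height * width
    n_out = out_channels * height * width
    return n_in * n_out

# Hypothetical example layer: 32x32 feature maps, 16 -> 32 channels, 5x5 kernels.
conv_w = conv_weight_count(16, 32, 5)      # 12,800 shared weights
fc_w = fc_weight_count(16, 32, 32, 32)     # ~537 million weights

# At 16-bit weights, the conv layer needs only ~25 KB, easily held in
# on-chip SRAM -- the property the accelerator exploits to eliminate
# DRAM accesses for weights.
conv_bytes = conv_w * 2
```

For this hypothetical layer, the conv representation is roughly 40,000× smaller than the fully connected one, which is why an entire CNN's weights can be mapped into a few hundred kilobytes of SRAM.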