The position-based compression techniques for DNN model
Abstract: In deep neural network (DNN) accelerators, transferring model parameters from main memory to the processing elements is expensive; data movement accounts for a large share of inference latency and energy consumption. In this paper, we present three position-based, lossless techniques for compressing DNN model parameters, which can yield significant energy and performance improvements. The first technique exploits the regular repetition of DNN weight values. The second stores the relative distance between occurrences of a weight instead of the weight itself. The third applies Huffman coding to the relative distances produced by the second technique. The proposed techniques are evaluated on several DNNs. The results show that the first technique reduces latency by 38% and energy by 36%, the second by 41% and 39%, and the third by 45% and 43%. Applying Huffman coding on top of the second technique yields an additional 7% reduction in both latency and energy.
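The record's abstract gives no implementation details, but the second and third techniques as described (storing the relative distance, i.e. the gap, between successive occurrences of each weight value, then Huffman-coding those gaps) can be illustrated with a minimal sketch. Everything below — the function names, the toy weight stream, and the per-value grouping — is a hypothetical reading of the abstract, assuming quantized weights that take only a few distinct values; it is not the paper's actual implementation.

```python
from collections import Counter
import heapq

def relative_distance_encode(weights):
    """Sketch of the second technique: for each distinct weight value,
    keep its first position and then the gaps (relative distances)
    between consecutive occurrences, instead of the raw weight stream."""
    positions = {}
    for i, w in enumerate(weights):
        positions.setdefault(w, []).append(i)
    return {w: [p[0]] + [b - a for a, b in zip(p, p[1:])]
            for w, p in positions.items()}

def huffman_code(symbols):
    """Sketch of the third technique: build a Huffman code over the gap
    symbols, so the most frequent gaps receive the shortest codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol stream
        return {next(iter(freq)): "0"}
    heap = [[n, [sym, ""]] for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]          # left branch of the merged node
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]          # right branch of the merged node
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}

# Toy quantized weight stream with heavy value repetition.
weights = [0, 3, 0, 0, 3, 7, 0, 3, 0, 7, 0, 0]
gaps = relative_distance_encode(weights)     # {0: [0, 2, 1, 3, 2, 2, 1], ...}
gap_stream = [g for gs in gaps.values() for g in gs]
code = huffman_code(gap_stream)
print(sum(len(code[g]) for g in gap_stream), "bits for the gap stream")
```

Under this reading, the extra 7% latency and energy reduction the abstract attributes to Huffman coding would come from the shorter codewords assigned to the most frequent gap values.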
Detailed description

Author: Tang, Minghua [author]
Format: Article
Language: English
Published: 2023
Subjects: Deep neural networks; Deep neural network accelerator; Weights compression
Note: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Corrected publication 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Contained in: The journal of supercomputing - Springer US, 1987, 79(2023), issue 15, 08 May, pages 17445-17474
Contained in: volume:79 ; year:2023 ; number:15 ; day:08 ; month:05 ; pages:17445-17474
Full text: https://doi.org/10.1007/s11227-023-05339-4 (license required)
DOI / URN: 10.1007/s11227-023-05339-4
Catalog ID: OLC2145310800
LEADER 01000naa a22002652 4500
001    OLC2145310800
003    DE-627
005    20240118103847.0
007    tu
008    240118s2023 xx ||||| 00| ||eng c
024 7  |a 10.1007/s11227-023-05339-4 |2 doi
035    |a (DE-627)OLC2145310800
035    |a (DE-He213)s11227-023-05339-4-p
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
082 04 |a 004 |a 620 |q VZ
100 1  |a Tang, Minghua |e verfasserin |4 aut
245 10 |a The position-based compression techniques for DNN model
264  1 |c 2023
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
500    |a © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. corrected publication 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
520    |a Abstract In deep neural network (DNN) accelerators, it is expensive to transfer model parameters from the main memory to the processing elements. Data movement accounts for a large number of the inference latency and energy consumption. In this paper, we present three position-based techniques to compress the DNN model parameters. The techniques could lead to significant energy and performance improvement. The three presented compression techniques are lossless. The first technique takes into consideration the regularly repeat property of the DNN weights to compress them. The second technique saves the relative distance between weights instead of the weights to compress the model. The third technique applies Huffman coding on the relative distance based on the second technique. The proposed techniques are assessed on several DNNs. The results show that, the first technique could decrease 38% of latency and 36% energy, respectively. The second technique could decrease 41% of latency and 39% energy, respectively. The third technique could decrease 45% of latency and 43% energy, respectively. Applying Huffman code could achieve additional 7% reduction in both latency and energy based on the second technique.
650  4 |a Deep neural networks
650  4 |a Deep neural network accelerator
650  4 |a Weights compression
700 1  |a Russo, Enrico |4 aut
700 1  |a Palesi, Maurizio |4 aut
773 08 |i Enthalten in |t The journal of supercomputing |d Springer US, 1987 |g 79(2023), 15 vom: 08. Mai, Seite 17445-17474 |w (DE-627)13046466X |w (DE-600)740510-8 |w (DE-576)018667775 |x 0920-8542 |7 nnns
773 18 |g volume:79 |g year:2023 |g number:15 |g day:08 |g month:05 |g pages:17445-17474
856 41 |u https://doi.org/10.1007/s11227-023-05339-4 |z lizenzpflichtig |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_OLC
912    |a SSG-OLC-TEC
912    |a SSG-OLC-MAT
951    |a AR
952    |d 79 |j 2023 |e 15 |b 08 |c 05 |h 17445-17474