The position-based compression techniques for DNN model
Abstract: In deep neural network (DNN) accelerators, transferring model parameters from main memory to the processing elements is expensive; data movement accounts for a large share of inference latency and energy consumption. In this paper, we present three position-based, lossless techniques for compressing DNN model parameters, which can yield significant energy and performance improvements. The first technique exploits the regular repetition of DNN weight values. The second stores the relative distance between occurrences of a weight instead of the weight itself. The third applies Huffman coding to the relative distances produced by the second technique. The proposed techniques are evaluated on several DNNs. The results show that the first technique reduces latency by 38% and energy by 36%, the second by 41% and 39%, and the third by 45% and 43%. Applying Huffman coding on top of the second technique yields an additional 7% reduction in both latency and energy.
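The record's abstract gives no implementation details, but the second and third techniques as described (storing the relative distance, i.e. the gap, between successive occurrences of each weight value, then Huffman-coding those gaps) can be illustrated with a minimal sketch. Everything below — the function names, the toy weight stream, and the per-value grouping — is a hypothetical reading of the abstract, assuming quantized weights that take only a few distinct values; it is not the paper's actual implementation.

```python
from collections import Counter
import heapq

def relative_distance_encode(weights):
    """Sketch of the second technique: for each distinct weight value,
    keep its first position and then the gaps (relative distances)
    between consecutive occurrences, instead of the raw weight stream."""
    positions = {}
    for i, w in enumerate(weights):
        positions.setdefault(w, []).append(i)
    return {w: [p[0]] + [b - a for a, b in zip(p, p[1:])]
            for w, p in positions.items()}

def huffman_code(symbols):
    """Sketch of the third technique: build a Huffman code over the gap
    symbols, so the most frequent gaps receive the shortest codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol stream
        return {next(iter(freq)): "0"}
    heap = [[n, [sym, ""]] for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]          # left branch of the merged node
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]          # right branch of the merged node
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}

# Toy quantized weight stream with heavy value repetition.
weights = [0, 3, 0, 0, 3, 7, 0, 3, 0, 7, 0, 0]
gaps = relative_distance_encode(weights)     # {0: [0, 2, 1, 3, 2, 2, 1], ...}
gap_stream = [g for gs in gaps.values() for g in gs]
code = huffman_code(gap_stream)
print(sum(len(code[g]) for g in gap_stream), "bits for the gap stream")
```

Under this reading, the extra 7% latency and energy reduction the abstract attributes to Huffman coding would come from the shorter codewords assigned to the most frequent gap values.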
Detailed description

Author: Tang, Minghua [author]
Format: Article
Language: English
Published: 2023
Subjects: Deep neural networks; Deep neural network accelerator; Weights compression
Note: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Corrected publication 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Contained in: The journal of supercomputing - Springer US, 1987, 79(2023), issue 15, 08 May, pages 17445-17474
Contained in: volume:79 ; year:2023 ; number:15 ; day:08 ; month:05 ; pages:17445-17474
Full text: https://doi.org/10.1007/s11227-023-05339-4 (license required)
DOI / URN: 10.1007/s11227-023-05339-4
Catalog ID: OLC2145310800
LEADER 01000naa a22002652 4500
001    OLC2145310800
003    DE-627
005    20240118103847.0
007    tu
008    240118s2023 xx ||||| 00| ||eng c
024 7  |a 10.1007/s11227-023-05339-4 |2 doi
035    |a (DE-627)OLC2145310800
035    |a (DE-He213)s11227-023-05339-4-p
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
082 04 |a 004 |a 620 |q VZ
100 1  |a Tang, Minghua |e verfasserin |4 aut
245 10 |a The position-based compression techniques for DNN model
264  1 |c 2023
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
500    |a © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. corrected publication 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
520    |a Abstract In deep neural network (DNN) accelerators, it is expensive to transfer model parameters from the main memory to the processing elements. Data movement accounts for a large number of the inference latency and energy consumption. In this paper, we present three position-based techniques to compress the DNN model parameters. The techniques could lead to significant energy and performance improvement. The three presented compression techniques are lossless. The first technique takes into consideration the regularly repeat property of the DNN weights to compress them. The second technique saves the relative distance between weights instead of the weights to compress the model. The third technique applies Huffman coding on the relative distance based on the second technique. The proposed techniques are assessed on several DNNs. The results show that, the first technique could decrease 38% of latency and 36% energy, respectively. The second technique could decrease 41% of latency and 39% energy, respectively. The third technique could decrease 45% of latency and 43% energy, respectively. Applying Huffman code could achieve additional 7% reduction in both latency and energy based on the second technique.
650  4 |a Deep neural networks
650  4 |a Deep neural network accelerator
650  4 |a Weights compression
700 1  |a Russo, Enrico |4 aut
700 1  |a Palesi, Maurizio |4 aut
773 08 |i Enthalten in |t The journal of supercomputing |d Springer US, 1987 |g 79(2023), 15 vom: 08. Mai, Seite 17445-17474 |w (DE-627)13046466X |w (DE-600)740510-8 |w (DE-576)018667775 |x 0920-8542 |7 nnns
773 18 |g volume:79 |g year:2023 |g number:15 |g day:08 |g month:05 |g pages:17445-17474
856 41 |u https://doi.org/10.1007/s11227-023-05339-4 |z lizenzpflichtig |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_OLC
912    |a SSG-OLC-TEC
912    |a SSG-OLC-MAT
951    |a AR
952    |d 79 |j 2023 |e 15 |b 08 |c 05 |h 17445-17474