Optimising lossless stages in a GPU-based MPEG encoder

Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even m...

Author:	Montero, Pablo [author]; Gulías, Víctor M.; Taibo, Javier; Rivas, Samuel

Format:	Article
Language:	English

Published:	2012

Keywords:	Zigzag scan; Huffman coding; GPU; Video compression; MPEG; Entropy coding

Note:	© Springer Science+Business Media, LLC 2012

Parent work:	Contained in: Multimedia tools and applications - Springer US, 1995, 65(2012), no. 3, 7 March, pages 495-520
Parent work:	volume:65 ; year:2012 ; number:3 ; day:07 ; month:03 ; pages:495-520

Links:	Full text

DOI / URN:	10.1007/s11042-012-1053-9

Catalog ID:	OLC2035007747

Internal format


LEADER	01000caa a22002652 4500
001	OLC2035007747
003	DE-627
005	20230503192619.0
007	tu
008	200819s2012 xx \|\|\|\|\| 00\| \|\|eng c
024	7		\|a 10.1007/s11042-012-1053-9 \|2 doi
035			\|a (DE-627)OLC2035007747
035			\|a (DE-He213)s11042-012-1053-9-p
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
082	0	4	\|a 070 \|a 004 \|q VZ
100	1		\|a Montero, Pablo \|e verfasserin \|4 aut
245	1	0	\|a Optimising lossless stages in a GPU-based MPEG encoder
264		1	\|c 2012
336			\|a Text \|b txt \|2 rdacontent
337			\|a ohne Hilfsmittel zu benutzen \|b n \|2 rdamedia
338			\|a Band \|b nc \|2 rdacarrier
500			\|a © Springer Science+Business Media, LLC 2012
520			\|a Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even more appealing, since the images to encode are already in the GPU, eliminating the costs of transferring raw video from the CPU to the GPU. However, after a raw frame has been transformed and quantized by the GPU, the resulting coefficients must be reordered, entropy encoded and framed into the resulting MPEG bitstream. These last steps are essentially sequential and their straightforward GPU implementation is inefficient compared to CPU-based implementations. We present different approaches to implement part of these steps in GPU, aiming for a better usage of the memory bus, compensating the suboptimal use of the GPU with the gains in transfer time. We analyze three approaches to perform the zigzag scan and Huffman coding combining GPU and CPU, and two approaches to assemble the results to build the actual output bitstream both in GPU and CPU memory. Our experiments show that optimising the amount of data transferred from GPU to CPU implementing the last sequential compression steps in the GPU, and using a parallel fast scan implementation of the zigzag scanning improve the overall performance of the system. Savings in transfer time outweigh the extra cost incurred in the GPU.
650		4	\|a Zigzag scan
650		4	\|a Huffman coding
650		4	\|a GPU
650		4	\|a Video compression
650		4	\|a MPEG
650		4	\|a Entropy coding
700	1		\|a Gulías, Víctor M. \|4 aut
700	1		\|a Taibo, Javier \|4 aut
700	1		\|a Rivas, Samuel \|4 aut
773	0	8	\|i Enthalten in \|t Multimedia tools and applications \|d Springer US, 1995 \|g 65(2012), 3 vom: 07. März, Seite 495-520 \|w (DE-627)189064145 \|w (DE-600)1287642-2 \|w (DE-576)052842126 \|x 1380-7501 \|7 nnns
773	1	8	\|g volume:65 \|g year:2012 \|g number:3 \|g day:07 \|g month:03 \|g pages:495-520
856	4	1	\|u https://doi.org/10.1007/s11042-012-1053-9 \|z lizenzpflichtig \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_OLC
912			\|a SSG-OLC-MAT
912			\|a SSG-OLC-BUB
912			\|a SSG-OLC-MKW
912			\|a GBV_ILN_70
951			\|a AR
952			\|d 65 \|j 2012 \|e 3 \|b 07 \|c 03 \|h 495-520

Index fields

author_variant	p m pm v m g vm vmg j t jt s r sr
matchkey_str	article:13807501:2012----::piiigosestgsngua
hierarchy_sort_str	2012
publishDate	2012
allfields	10.1007/s11042-012-1053-9 doi (DE-627)OLC2035007747 (DE-He213)s11042-012-1053-9-p DE-627 ger DE-627 rakwb eng 070 004 VZ Montero, Pablo verfasserin aut Optimising lossless stages in a GPU-based MPEG encoder 2012 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media, LLC 2012 Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even more appealing, since the images to encode are already in the GPU, eliminating the costs of transferring raw video from the CPU to the GPU. However, after a raw frame has been transformed and quantized by the GPU, the resulting coefficients must be reordered, entropy encoded and framed into the resulting MPEG bitstream. These last steps are essentially sequential and their straightforward GPU implementation is inefficient compared to CPU-based implementations. We present different approaches to implement part of these steps in GPU, aiming for a better usage of the memory bus, compensating the suboptimal use of the GPU with the gains in transfer time. We analyze three approaches to perform the zigzag scan and Huffman coding combining GPU and CPU, and two approaches to assemble the results to build the actual output bitstream both in GPU and CPU memory. Our experiments show that optimising the amount of data transferred from GPU to CPU implementing the last sequential compression steps in the GPU, and using a parallel fast scan implementation of the zigzag scanning improve the overall performance of the system. Savings in transfer time outweigh the extra cost incurred in the GPU. Zigzag scan Huffman coding GPU Video compression MPEG Entropy coding Gulías, Víctor M. 
aut Taibo, Javier aut Rivas, Samuel aut Enthalten in Multimedia tools and applications Springer US, 1995 65(2012), 3 vom: 07. März, Seite 495-520 (DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 1380-7501 nnns volume:65 year:2012 number:3 day:07 month:03 pages:495-520 https://doi.org/10.1007/s11042-012-1053-9 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70 AR 65 2012 3 07 03 495-520
language	English
source	Enthalten in Multimedia tools and applications 65(2012), 3 vom: 07. März, Seite 495-520 volume:65 year:2012 number:3 day:07 month:03 pages:495-520
sourceStr	Enthalten in Multimedia tools and applications 65(2012), 3 vom: 07. März, Seite 495-520 volume:65 year:2012 number:3 day:07 month:03 pages:495-520
format_phy_str_mv	Article
institution	findex.gbv.de
topic_facet	Zigzag scan Huffman coding GPU Video compression MPEG Entropy coding
dewey-raw	070
isfreeaccess_bool	false
container_title	Multimedia tools and applications
authorswithroles_txt_mv	Montero, Pablo @@aut@@ Gulías, Víctor M. @@aut@@ Taibo, Javier @@aut@@ Rivas, Samuel @@aut@@
publishDateDaySort_date	2012-03-07T00:00:00Z
hierarchy_top_id	189064145
dewey-sort	270
id	OLC2035007747
language_de	englisch
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2035007747</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503192619.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2012 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11042-012-1053-9</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2035007747</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11042-012-1053-9-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Montero, Pablo</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Optimising lossless stages in a GPU-based MPEG encoder</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2012</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield 
code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2012</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even more appealing, since the images to encode are already in the GPU, eliminating the costs of transferring raw video from the CPU to the GPU. However, after a raw frame has been transformed and quantized by the GPU, the resulting coefficients must be reordered, entropy encoded and framed into the resulting MPEG bitstream. These last steps are essentially sequential and their straightforward GPU implementation is inefficient compared to CPU-based implementations. We present different approaches to implement part of these steps in GPU, aiming for a better usage of the memory bus, compensating the suboptimal use of the GPU with the gains in transfer time. We analyze three approaches to perform the zigzag scan and Huffman coding combining GPU and CPU, and two approaches to assemble the results to build the actual output bitstream both in GPU and CPU memory. Our experiments show that optimising the amount of data transferred from GPU to CPU implementing the last sequential compression steps in the GPU, and using a parallel fast scan implementation of the zigzag scanning improve the overall performance of the system. 
Savings in transfer time outweigh the extra cost incurred in the GPU.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Zigzag scan</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Huffman coding</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">GPU</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Video compression</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">MPEG</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Entropy coding</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Gulías, Víctor M.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Taibo, Javier</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rivas, Samuel</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Multimedia tools and applications</subfield><subfield code="d">Springer US, 1995</subfield><subfield code="g">65(2012), 3 vom: 07. 
März, Seite 495-520</subfield><subfield code="w">(DE-627)189064145</subfield><subfield code="w">(DE-600)1287642-2</subfield><subfield code="w">(DE-576)052842126</subfield><subfield code="x">1380-7501</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:65</subfield><subfield code="g">year:2012</subfield><subfield code="g">number:3</subfield><subfield code="g">day:07</subfield><subfield code="g">month:03</subfield><subfield code="g">pages:495-520</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11042-012-1053-9</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MKW</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">65</subfield><subfield code="j">2012</subfield><subfield code="e">3</subfield><subfield code="b">07</subfield><subfield code="c">03</subfield><subfield code="h">495-520</subfield></datafield></record></collection>
author	Montero, Pablo
spellingShingle	Montero, Pablo ddc 070 misc Zigzag scan misc Huffman coding misc GPU misc Video compression misc MPEG misc Entropy coding Optimising lossless stages in a GPU-based MPEG encoder
authorStr	Montero, Pablo
ppnlink_with_tag_str_mv	@@773@@(DE-627)189064145
format	Article
dewey-ones	070 - News media, journalism & publishing 004 - Data processing & computer science
delete_txt_mv	keep
author_role	aut aut aut aut
collection	OLC
remote_str	false
illustrated	Not Illustrated
issn	1380-7501
topic_title	070 004 VZ Optimising lossless stages in a GPU-based MPEG encoder Zigzag scan Huffman coding GPU Video compression MPEG Entropy coding
topic	ddc 070 misc Zigzag scan misc Huffman coding misc GPU misc Video compression misc MPEG misc Entropy coding
topic_unstemmed	ddc 070 misc Zigzag scan misc Huffman coding misc GPU misc Video compression misc MPEG misc Entropy coding
topic_browse	ddc 070 misc Zigzag scan misc Huffman coding misc GPU misc Video compression misc MPEG misc Entropy coding
format_facet	Aufsätze Gedruckte Aufsätze
format_main_str_mv	Text Zeitschrift/Artikel
carriertype_str_mv	nc
hierarchy_parent_title	Multimedia tools and applications
hierarchy_parent_id	189064145
dewey-tens	070 - News media, journalism & publishing 000 - Computer science, knowledge & systems
hierarchy_top_title	Multimedia tools and applications
isfreeaccess_txt	false
familylinks_str_mv	(DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126
title	Optimising lossless stages in a GPU-based MPEG encoder
ctrlnum	(DE-627)OLC2035007747 (DE-He213)s11042-012-1053-9-p
title_full	Optimising lossless stages in a GPU-based MPEG encoder
author_sort	Montero, Pablo
journal	Multimedia tools and applications
journalStr	Multimedia tools and applications
lang_code	eng
isOA_bool	false
dewey-hundreds	000 - Computer science, information & general works
recordtype	marc
publishDateSort	2012
contenttype_str_mv	txt
container_start_page	495
author_browse	Montero, Pablo Gulías, Víctor M. Taibo, Javier Rivas, Samuel
container_volume	65
class	070 004 VZ
format_se	Aufsätze
author-letter	Montero, Pablo
doi_str_mv	10.1007/s11042-012-1053-9
dewey-full	070 004
title_sort	optimising lossless stages in a gpu-based mpeg encoder
title_auth	Optimising lossless stages in a GPU-based MPEG encoder
abstract	Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even more appealing, since the images to encode are already in the GPU, eliminating the costs of transferring raw video from the CPU to the GPU. However, after a raw frame has been transformed and quantized by the GPU, the resulting coefficients must be reordered, entropy encoded and framed into the resulting MPEG bitstream. These last steps are essentially sequential and their straightforward GPU implementation is inefficient compared to CPU-based implementations. We present different approaches to implement part of these steps in GPU, aiming for a better usage of the memory bus, compensating the suboptimal use of the GPU with the gains in transfer time. We analyze three approaches to perform the zigzag scan and Huffman coding combining GPU and CPU, and two approaches to assemble the results to build the actual output bitstream both in GPU and CPU memory. Our experiments show that optimising the amount of data transferred from GPU to CPU implementing the last sequential compression steps in the GPU, and using a parallel fast scan implementation of the zigzag scanning improve the overall performance of the system. Savings in transfer time outweigh the extra cost incurred in the GPU. © Springer Science+Business Media, LLC 2012
collection_details	GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70
container_issue	3
title_short	Optimising lossless stages in a GPU-based MPEG encoder
url	https://doi.org/10.1007/s11042-012-1053-9
remote_bool	false
author2	Gulías, Víctor M. Taibo, Javier Rivas, Samuel
author2Str	Gulías, Víctor M. Taibo, Javier Rivas, Samuel
ppnlink	189064145
mediatype_str_mv	n
isOA_txt	false
hochschulschrift_bool	false
doi_str	10.1007/s11042-012-1053-9
up_date	2024-07-03T23:24:39.635Z
_version_	1803602183730495488
fullrecord_marcxml	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2035007747</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503192619.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2012 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11042-012-1053-9</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2035007747</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11042-012-1053-9-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Montero, Pablo</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Optimising lossless stages in a GPU-based MPEG encoder</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2012</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield 
code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2012</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even more appealing, since the images to encode are already in the GPU, eliminating the costs of transferring raw video from the CPU to the GPU. However, after a raw frame has been transformed and quantized by the GPU, the resulting coefficients must be reordered, entropy encoded and framed into the resulting MPEG bitstream. These last steps are essentially sequential and their straightforward GPU implementation is inefficient compared to CPU-based implementations. We present different approaches to implement part of these steps in GPU, aiming for a better usage of the memory bus, compensating the suboptimal use of the GPU with the gains in transfer time. We analyze three approaches to perform the zigzag scan and Huffman coding combining GPU and CPU, and two approaches to assemble the results to build the actual output bitstream both in GPU and CPU memory. Our experiments show that optimising the amount of data transferred from GPU to CPU implementing the last sequential compression steps in the GPU, and using a parallel fast scan implementation of the zigzag scanning improve the overall performance of the system. 
Savings in transfer time outweigh the extra cost incurred in the GPU.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Zigzag scan</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Huffman coding</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">GPU</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Video compression</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">MPEG</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Entropy coding</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Gulías, Víctor M.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Taibo, Javier</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rivas, Samuel</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Multimedia tools and applications</subfield><subfield code="d">Springer US, 1995</subfield><subfield code="g">65(2012), 3 vom: 07. 
März, Seite 495-520</subfield><subfield code="w">(DE-627)189064145</subfield><subfield code="w">(DE-600)1287642-2</subfield><subfield code="w">(DE-576)052842126</subfield><subfield code="x">1380-7501</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:65</subfield><subfield code="g">year:2012</subfield><subfield code="g">number:3</subfield><subfield code="g">day:07</subfield><subfield code="g">month:03</subfield><subfield code="g">pages:495-520</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11042-012-1053-9</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MKW</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">65</subfield><subfield code="j">2012</subfield><subfield code="e">3</subfield><subfield code="b">07</subfield><subfield code="c">03</subfield><subfield code="h">495-520</subfield></datafield></record></collection>
