Optimising lossless stages in a GPU-based MPEG encoder
Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even m...
Detailed description
Author: |
Montero, Pablo [author] |
---|
Format: |
Article |
---|---|
Language: |
English |
Published: |
2012 |
---|
Keywords: |
---|
Note: |
© Springer Science+Business Media, LLC 2012 |
---|
Parent work: |
Contained in: Multimedia tools and applications - Springer US, 1995, 65(2012), 3, 7 March, pages 495-520 |
---|---|
Parent work: |
volume:65 ; year:2012 ; number:3 ; day:07 ; month:03 ; pages:495-520 |
Links: |
---|
DOI / URN: |
10.1007/s11042-012-1053-9 |
---|
Catalog ID: |
OLC2035007747 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | OLC2035007747 | ||
003 | DE-627 | ||
005 | 20230503192619.0 | ||
007 | tu | ||
008 | 200819s2012 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1007/s11042-012-1053-9 |2 doi | |
035 | |a (DE-627)OLC2035007747 | ||
035 | |a (DE-He213)s11042-012-1053-9-p | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 070 |a 004 |q VZ |
100 | 1 | |a Montero, Pablo |e verfasserin |4 aut | |
245 | 1 | 0 | |a Optimising lossless stages in a GPU-based MPEG encoder |
264 | 1 | |c 2012 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia | ||
338 | |a Band |b nc |2 rdacarrier | ||
500 | |a © Springer Science+Business Media, LLC 2012 | ||
520 | |a Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even more appealing, since the images to encode are already in the GPU, eliminating the costs of transferring raw video from the CPU to the GPU. However, after a raw frame has been transformed and quantized by the GPU, the resulting coefficients must be reordered, entropy encoded and framed into the resulting MPEG bitstream. These last steps are essentially sequential and their straightforward GPU implementation is inefficient compared to CPU-based implementations. We present different approaches to implement part of these steps in GPU, aiming for a better usage of the memory bus, compensating the suboptimal use of the GPU with the gains in transfer time. We analyze three approaches to perform the zigzag scan and Huffman coding combining GPU and CPU, and two approaches to assemble the results to build the actual output bitstream both in GPU and CPU memory. Our experiments show that optimising the amount of data transferred from GPU to CPU implementing the last sequential compression steps in the GPU, and using a parallel fast scan implementation of the zigzag scanning improve the overall performance of the system. Savings in transfer time outweigh the extra cost incurred in the GPU. | ||
650 | 4 | |a Zigzag scan | |
650 | 4 | |a Huffman coding | |
650 | 4 | |a GPU | |
650 | 4 | |a Video compression | |
650 | 4 | |a MPEG | |
650 | 4 | |a Entropy coding | |
700 | 1 | |a Gulías, Víctor M. |4 aut | |
700 | 1 | |a Taibo, Javier |4 aut | |
700 | 1 | |a Rivas, Samuel |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Multimedia tools and applications |d Springer US, 1995 |g 65(2012), 3 vom: 07. März, Seite 495-520 |w (DE-627)189064145 |w (DE-600)1287642-2 |w (DE-576)052842126 |x 1380-7501 |7 nnns |
773 | 1 | 8 | |g volume:65 |g year:2012 |g number:3 |g day:07 |g month:03 |g pages:495-520 |
856 | 4 | 1 | |u https://doi.org/10.1007/s11042-012-1053-9 |z lizenzpflichtig |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a SSG-OLC-MAT | ||
912 | |a SSG-OLC-BUB | ||
912 | |a SSG-OLC-MKW | ||
912 | |a GBV_ILN_70 | ||
951 | |a AR | ||
952 | |d 65 |j 2012 |e 3 |b 07 |c 03 |h 495-520 |
author_variant |
p m pm v m g vm vmg j t jt s r sr |
---|---|
matchkey_str |
article:13807501:2012----::piiigosestgsngua |
hierarchy_sort_str |
2012 |
publishDate |
2012 |
allfields |
10.1007/s11042-012-1053-9 doi (DE-627)OLC2035007747 (DE-He213)s11042-012-1053-9-p DE-627 ger DE-627 rakwb eng 070 004 VZ Montero, Pablo verfasserin aut Optimising lossless stages in a GPU-based MPEG encoder 2012 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media, LLC 2012 Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even more appealing, since the images to encode are already in the GPU, eliminating the costs of transferring raw video from the CPU to the GPU. However, after a raw frame has been transformed and quantized by the GPU, the resulting coefficients must be reordered, entropy encoded and framed into the resulting MPEG bitstream. These last steps are essentially sequential and their straightforward GPU implementation is inefficient compared to CPU-based implementations. We present different approaches to implement part of these steps in GPU, aiming for a better usage of the memory bus, compensating the suboptimal use of the GPU with the gains in transfer time. We analyze three approaches to perform the zigzag scan and Huffman coding combining GPU and CPU, and two approaches to assemble the results to build the actual output bitstream both in GPU and CPU memory. Our experiments show that optimising the amount of data transferred from GPU to CPU implementing the last sequential compression steps in the GPU, and using a parallel fast scan implementation of the zigzag scanning improve the overall performance of the system. Savings in transfer time outweigh the extra cost incurred in the GPU. Zigzag scan Huffman coding GPU Video compression MPEG Entropy coding Gulías, Víctor M. 
aut Taibo, Javier aut Rivas, Samuel aut Enthalten in Multimedia tools and applications Springer US, 1995 65(2012), 3 vom: 07. März, Seite 495-520 (DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 1380-7501 nnns volume:65 year:2012 number:3 day:07 month:03 pages:495-520 https://doi.org/10.1007/s11042-012-1053-9 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70 AR 65 2012 3 07 03 495-520 |
language |
English |
source |
Contained in Multimedia tools and applications 65(2012), 3, 7 March, pages 495-520 volume:65 year:2012 number:3 day:07 month:03 pages:495-520 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Zigzag scan Huffman coding GPU Video compression MPEG Entropy coding |
dewey-raw |
070 |
isfreeaccess_bool |
false |
container_title |
Multimedia tools and applications |
authorswithroles_txt_mv |
Montero, Pablo @@aut@@ Gulías, Víctor M. @@aut@@ Taibo, Javier @@aut@@ Rivas, Samuel @@aut@@ |
publishDateDaySort_date |
2012-03-07T00:00:00Z |
hierarchy_top_id |
189064145 |
dewey-sort |
270 |
id |
OLC2035007747 |
language_de |
English |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2035007747</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503192619.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2012 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11042-012-1053-9</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2035007747</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11042-012-1053-9-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Montero, Pablo</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Optimising lossless stages in a GPU-based MPEG encoder</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2012</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield 
code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2012</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even more appealing, since the images to encode are already in the GPU, eliminating the costs of transferring raw video from the CPU to the GPU. However, after a raw frame has been transformed and quantized by the GPU, the resulting coefficients must be reordered, entropy encoded and framed into the resulting MPEG bitstream. These last steps are essentially sequential and their straightforward GPU implementation is inefficient compared to CPU-based implementations. We present different approaches to implement part of these steps in GPU, aiming for a better usage of the memory bus, compensating the suboptimal use of the GPU with the gains in transfer time. We analyze three approaches to perform the zigzag scan and Huffman coding combining GPU and CPU, and two approaches to assemble the results to build the actual output bitstream both in GPU and CPU memory. Our experiments show that optimising the amount of data transferred from GPU to CPU implementing the last sequential compression steps in the GPU, and using a parallel fast scan implementation of the zigzag scanning improve the overall performance of the system. 
Savings in transfer time outweigh the extra cost incurred in the GPU.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Zigzag scan</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Huffman coding</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">GPU</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Video compression</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">MPEG</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Entropy coding</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Gulías, Víctor M.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Taibo, Javier</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rivas, Samuel</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Multimedia tools and applications</subfield><subfield code="d">Springer US, 1995</subfield><subfield code="g">65(2012), 3 vom: 07. 
März, Seite 495-520</subfield><subfield code="w">(DE-627)189064145</subfield><subfield code="w">(DE-600)1287642-2</subfield><subfield code="w">(DE-576)052842126</subfield><subfield code="x">1380-7501</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:65</subfield><subfield code="g">year:2012</subfield><subfield code="g">number:3</subfield><subfield code="g">day:07</subfield><subfield code="g">month:03</subfield><subfield code="g">pages:495-520</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11042-012-1053-9</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MKW</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">65</subfield><subfield code="j">2012</subfield><subfield code="e">3</subfield><subfield code="b">07</subfield><subfield code="c">03</subfield><subfield code="h">495-520</subfield></datafield></record></collection>
|
author |
Montero, Pablo |
spellingShingle |
Montero, Pablo ddc 070 misc Zigzag scan misc Huffman coding misc GPU misc Video compression misc MPEG misc Entropy coding Optimising lossless stages in a GPU-based MPEG encoder |
authorStr |
Montero, Pablo |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)189064145 |
format |
Article |
dewey-ones |
070 - News media, journalism & publishing 004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut aut aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
1380-7501 |
topic_title |
070 004 VZ Optimising lossless stages in a GPU-based MPEG encoder Zigzag scan Huffman coding GPU Video compression MPEG Entropy coding |
topic |
ddc 070 misc Zigzag scan misc Huffman coding misc GPU misc Video compression misc MPEG misc Entropy coding |
format_facet |
Articles Printed articles |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
nc |
hierarchy_parent_title |
Multimedia tools and applications |
hierarchy_parent_id |
189064145 |
dewey-tens |
070 - News media, journalism & publishing 000 - Computer science, knowledge & systems |
hierarchy_top_title |
Multimedia tools and applications |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 |
title |
Optimising lossless stages in a GPU-based MPEG encoder |
ctrlnum |
(DE-627)OLC2035007747 (DE-He213)s11042-012-1053-9-p |
title_full |
Optimising lossless stages in a GPU-based MPEG encoder |
author_sort |
Montero, Pablo |
journal |
Multimedia tools and applications |
journalStr |
Multimedia tools and applications |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2012 |
contenttype_str_mv |
txt |
container_start_page |
495 |
author_browse |
Montero, Pablo Gulías, Víctor M. Taibo, Javier Rivas, Samuel |
container_volume |
65 |
class |
070 004 VZ |
format_se |
Articles |
author-letter |
Montero, Pablo |
doi_str_mv |
10.1007/s11042-012-1053-9 |
dewey-full |
070 004 |
title_sort |
optimising lossless stages in a gpu-based mpeg encoder |
title_auth |
Optimising lossless stages in a GPU-based MPEG encoder |
abstract |
Abstract Modern GPUs excel in parallel computations, so they are an interesting target to perform matrix transformations such as the DCT, a fundamental part of MPEG video coding algorithms. Considering a system to encode synthetic video (e.g., computer-generated frames), this approach becomes even more appealing, since the images to encode are already in the GPU, eliminating the costs of transferring raw video from the CPU to the GPU. However, after a raw frame has been transformed and quantized by the GPU, the resulting coefficients must be reordered, entropy encoded and framed into the resulting MPEG bitstream. These last steps are essentially sequential and their straightforward GPU implementation is inefficient compared to CPU-based implementations. We present different approaches to implement part of these steps in GPU, aiming for a better usage of the memory bus, compensating the suboptimal use of the GPU with the gains in transfer time. We analyze three approaches to perform the zigzag scan and Huffman coding combining GPU and CPU, and two approaches to assemble the results to build the actual output bitstream both in GPU and CPU memory. Our experiments show that optimising the amount of data transferred from GPU to CPU implementing the last sequential compression steps in the GPU, and using a parallel fast scan implementation of the zigzag scanning improve the overall performance of the system. Savings in transfer time outweigh the extra cost incurred in the GPU. © Springer Science+Business Media, LLC 2012 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70 |
container_issue |
3 |
title_short |
Optimising lossless stages in a GPU-based MPEG encoder |
url |
https://doi.org/10.1007/s11042-012-1053-9 |
remote_bool |
false |
author2 |
Gulías, Víctor M. Taibo, Javier Rivas, Samuel |
author2Str |
Gulías, Víctor M. Taibo, Javier Rivas, Samuel |
ppnlink |
189064145 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s11042-012-1053-9 |
up_date |
2024-07-03T23:24:39.635Z |
_version_ |
1803602183730495488 |
März, Seite 495-520</subfield><subfield code="w">(DE-627)189064145</subfield><subfield code="w">(DE-600)1287642-2</subfield><subfield code="w">(DE-576)052842126</subfield><subfield code="x">1380-7501</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:65</subfield><subfield code="g">year:2012</subfield><subfield code="g">number:3</subfield><subfield code="g">day:07</subfield><subfield code="g">month:03</subfield><subfield code="g">pages:495-520</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11042-012-1053-9</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MKW</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">65</subfield><subfield code="j">2012</subfield><subfield code="e">3</subfield><subfield code="b">07</subfield><subfield code="c">03</subfield><subfield code="h">495-520</subfield></datafield></record></collection>
|