Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform
Abstract: Compared with other video semantic cues, such as gestures and motions, video text generally provides highly useful and fairly precise semantic information, whose analysis can greatly facilitate video and scene understanding. Video text can be observed to exhibit stronger edges than its surroundings. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-directional expansion that preserves the edges and silhouettes of text characters well. This paper therefore proposes a new NSCT-based approach to video text detection. First, the 8 directional coefficients of the NSCT are combined to build a directional edge map (DEM), which keeps the horizontal, vertical, and diagonal edge features while suppressing edges in other directions. The directional pixels of the DEM are then integrated into a single binary edge image (BE). Based on the BE, text frame classification determines whether a video frame contains text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate video text from non-text regions. Experimental evaluations on the authors' collected TV video data set demonstrate that the method significantly outperforms three other video text detection algorithms in both detection speed and accuracy, especially under challenges such as text of varying sizes, languages, colors, fonts, and line lengths.
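The pipeline described in the abstract (directional edge maps fused into a binary edge image, then edge-density frame classification) can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: four simple directional gradients substitute for the 8 NSCT directional subbands, and all function names and thresholds are assumptions made for the sketch.

```python
import numpy as np

def directional_edge_maps(gray):
    """Approximate directional edge responses (stand-in for NSCT subbands).

    Returns absolute gradients along the horizontal, vertical, and the two
    diagonal directions, mirroring the DEM's retained edge orientations.
    """
    g = gray.astype(float)
    h  = np.abs(np.roll(g, -1, axis=1) - g)                        # horizontal
    v  = np.abs(np.roll(g, -1, axis=0) - g)                        # vertical
    d1 = np.abs(np.roll(np.roll(g, -1, axis=0), -1, axis=1) - g)   # diagonal
    d2 = np.abs(np.roll(np.roll(g, -1, axis=0),  1, axis=1) - g)   # anti-diagonal
    return [h, v, d1, d2]

def binary_edge_image(gray, thresh=30.0):
    """Fuse the directional maps into one binary edge image (the 'BE')."""
    fused = np.maximum.reduce(directional_edge_maps(gray))
    return (fused > thresh).astype(np.uint8)

def is_text_frame(gray, min_density=0.02):
    """Classify a frame as text-bearing when its BE edge density is high,
    reflecting the observation that superimposed text shows strong edges."""
    return binary_edge_image(gray).mean() > min_density
```

A real implementation would replace `directional_edge_maps` with an actual NSCT decomposition and add the paper's consecutive-frame verification step on top of the per-frame classification.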
Detailed description

Author: Huang, Xiaodong [author]
Format: Article
Language: English
Published: 2017
Subjects: Video text; Nonsubsampled Contourlet Transform; Text detection; Text frame classification
Note: © Springer Science+Business Media New York 2017
Contained in: Multimedia tools and applications - Springer US, 1995, 77(2017), no. 6, 25 March, pages 7033-7049
Contained in: volume:77 ; year:2017 ; number:6 ; day:25 ; month:03 ; pages:7033-7049
DOI / URN: 10.1007/s11042-017-4619-8
Catalog ID: OLC2035045304
MARC record

LEADER 01000caa a22002652 4500
001    OLC2035045304
003    DE-627
005    20230503193442.0
007    tu
008    200819s2017 xx ||||| 00| ||eng c
024 7_ |a 10.1007/s11042-017-4619-8 |2 doi
035 __ |a (DE-627)OLC2035045304
035 __ |a (DE-He213)s11042-017-4619-8-p
040 __ |a DE-627 |b ger |c DE-627 |e rakwb
041 __ |a eng
082 04 |a 070 |a 004 |q VZ
100 1_ |a Huang, Xiaodong |e verfasserin |0 (orcid)0000-0002-7953-750X |4 aut
245 10 |a Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform
264 _1 |c 2017
336 __ |a Text |b txt |2 rdacontent
337 __ |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 __ |a Band |b nc |2 rdacarrier
500 __ |a © Springer Science+Business Media New York 2017
520 __ |a Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines.
650 _4 |a Video text
650 _4 |a Nonsubsampled Contourlet Transform
650 _4 |a Text detection
650 _4 |a Text frame classification
773 08 |i Enthalten in |t Multimedia tools and applications |d Springer US, 1995 |g 77(2017), 6 vom: 25. März, Seite 7033-7049 |w (DE-627)189064145 |w (DE-600)1287642-2 |w (DE-576)052842126 |x 1380-7501 |7 nnns
773 18 |g volume:77 |g year:2017 |g number:6 |g day:25 |g month:03 |g pages:7033-7049
856 41 |u https://doi.org/10.1007/s11042-017-4619-8 |z lizenzpflichtig |3 Volltext
912 __ |a GBV_USEFLAG_A
912 __ |a SYSFLAG_A
912 __ |a GBV_OLC
912 __ |a SSG-OLC-MAT
912 __ |a SSG-OLC-BUB
912 __ |a SSG-OLC-MKW
912 __ |a GBV_ILN_70
951 __ |a AR
952 __ |d 77 |j 2017 |e 6 |b 25 |c 03 |h 7033-7049
author_variant |
x h xh |
---|---|
matchkey_str |
article:13807501:2017----::uoaivdoueipsdeteetobsdnosbap |
hierarchy_sort_str |
2017 |
publishDate |
2017 |
allfields |
10.1007/s11042-017-4619-8 doi (DE-627)OLC2035045304 (DE-He213)s11042-017-4619-8-p DE-627 ger DE-627 rakwb eng 070 004 VZ Huang, Xiaodong verfasserin (orcid)0000-0002-7953-750X aut Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform 2017 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media New York 2017 Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines. 
Video text Nonsubsampled Contourlet Transform Text detection Text frame classification Enthalten in Multimedia tools and applications Springer US, 1995 77(2017), 6 vom: 25. März, Seite 7033-7049 (DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 1380-7501 nnns volume:77 year:2017 number:6 day:25 month:03 pages:7033-7049 https://doi.org/10.1007/s11042-017-4619-8 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70 AR 77 2017 6 25 03 7033-7049 |
spelling |
10.1007/s11042-017-4619-8 doi (DE-627)OLC2035045304 (DE-He213)s11042-017-4619-8-p DE-627 ger DE-627 rakwb eng 070 004 VZ Huang, Xiaodong verfasserin (orcid)0000-0002-7953-750X aut Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform 2017 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media New York 2017 Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines. 
Video text Nonsubsampled Contourlet Transform Text detection Text frame classification Enthalten in Multimedia tools and applications Springer US, 1995 77(2017), 6 vom: 25. März, Seite 7033-7049 (DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 1380-7501 nnns volume:77 year:2017 number:6 day:25 month:03 pages:7033-7049 https://doi.org/10.1007/s11042-017-4619-8 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70 AR 77 2017 6 25 03 7033-7049 |
allfields_unstemmed |
10.1007/s11042-017-4619-8 doi (DE-627)OLC2035045304 (DE-He213)s11042-017-4619-8-p DE-627 ger DE-627 rakwb eng 070 004 VZ Huang, Xiaodong verfasserin (orcid)0000-0002-7953-750X aut Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform 2017 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media New York 2017 Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines. 
Video text Nonsubsampled Contourlet Transform Text detection Text frame classification Enthalten in Multimedia tools and applications Springer US, 1995 77(2017), 6 vom: 25. März, Seite 7033-7049 (DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 1380-7501 nnns volume:77 year:2017 number:6 day:25 month:03 pages:7033-7049 https://doi.org/10.1007/s11042-017-4619-8 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70 AR 77 2017 6 25 03 7033-7049 |
allfieldsGer |
10.1007/s11042-017-4619-8 doi (DE-627)OLC2035045304 (DE-He213)s11042-017-4619-8-p DE-627 ger DE-627 rakwb eng 070 004 VZ Huang, Xiaodong verfasserin (orcid)0000-0002-7953-750X aut Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform 2017 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media New York 2017 Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines. 
Video text Nonsubsampled Contourlet Transform Text detection Text frame classification Enthalten in Multimedia tools and applications Springer US, 1995 77(2017), 6 vom: 25. März, Seite 7033-7049 (DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 1380-7501 nnns volume:77 year:2017 number:6 day:25 month:03 pages:7033-7049 https://doi.org/10.1007/s11042-017-4619-8 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70 AR 77 2017 6 25 03 7033-7049 |
allfieldsSound |
10.1007/s11042-017-4619-8 doi (DE-627)OLC2035045304 (DE-He213)s11042-017-4619-8-p DE-627 ger DE-627 rakwb eng 070 004 VZ Huang, Xiaodong verfasserin (orcid)0000-0002-7953-750X aut Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform 2017 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media New York 2017 Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines. 
Video text Nonsubsampled Contourlet Transform Text detection Text frame classification Enthalten in Multimedia tools and applications Springer US, 1995 77(2017), 6 vom: 25. März, Seite 7033-7049 (DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 1380-7501 nnns volume:77 year:2017 number:6 day:25 month:03 pages:7033-7049 https://doi.org/10.1007/s11042-017-4619-8 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70 AR 77 2017 6 25 03 7033-7049 |
language |
English |
source |
Enthalten in Multimedia tools and applications 77(2017), 6 vom: 25. März, Seite 7033-7049 volume:77 year:2017 number:6 day:25 month:03 pages:7033-7049 |
sourceStr |
Enthalten in Multimedia tools and applications 77(2017), 6 vom: 25. März, Seite 7033-7049 volume:77 year:2017 number:6 day:25 month:03 pages:7033-7049 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Video text Nonsubsampled Contourlet Transform Text detection Text frame classification |
dewey-raw |
070 |
isfreeaccess_bool |
false |
container_title |
Multimedia tools and applications |
authorswithroles_txt_mv |
Huang, Xiaodong @@aut@@ |
publishDateDaySort_date |
2017-03-25T00:00:00Z |
hierarchy_top_id |
189064145 |
dewey-sort |
270 |
id |
OLC2035045304 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2035045304</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503193442.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2017 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11042-017-4619-8</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2035045304</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11042-017-4619-8-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Huang, Xiaodong</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0002-7953-750X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2017</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield 
code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media New York 2017</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. 
Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Video text</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Nonsubsampled Contourlet Transform</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Text detection</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Text frame classification</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Multimedia tools and applications</subfield><subfield code="d">Springer US, 1995</subfield><subfield code="g">77(2017), 6 vom: 25. März, Seite 7033-7049</subfield><subfield code="w">(DE-627)189064145</subfield><subfield code="w">(DE-600)1287642-2</subfield><subfield code="w">(DE-576)052842126</subfield><subfield code="x">1380-7501</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:77</subfield><subfield code="g">year:2017</subfield><subfield code="g">number:6</subfield><subfield code="g">day:25</subfield><subfield code="g">month:03</subfield><subfield code="g">pages:7033-7049</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11042-017-4619-8</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield 
code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MKW</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">77</subfield><subfield code="j">2017</subfield><subfield code="e">6</subfield><subfield code="b">25</subfield><subfield code="c">03</subfield><subfield code="h">7033-7049</subfield></datafield></record></collection>
|
author |
Huang, Xiaodong |
spellingShingle |
Huang, Xiaodong ddc 070 misc Video text misc Nonsubsampled Contourlet Transform misc Text detection misc Text frame classification Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform |
authorStr |
Huang, Xiaodong |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)189064145 |
format |
Article |
dewey-ones |
070 - News media, journalism & publishing 004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
1380-7501 |
topic_title |
070 004 VZ Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform Video text Nonsubsampled Contourlet Transform Text detection Text frame classification |
topic |
ddc 070 misc Video text misc Nonsubsampled Contourlet Transform misc Text detection misc Text frame classification |
topic_unstemmed |
ddc 070 misc Video text misc Nonsubsampled Contourlet Transform misc Text detection misc Text frame classification |
topic_browse |
ddc 070 misc Video text misc Nonsubsampled Contourlet Transform misc Text detection misc Text frame classification |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
hierarchy_parent_title |
Multimedia tools and applications |
hierarchy_parent_id |
189064145 |
dewey-tens |
070 - News media, journalism & publishing 000 - Computer science, knowledge & systems |
hierarchy_top_title |
Multimedia tools and applications |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 |
title |
Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform |
ctrlnum |
(DE-627)OLC2035045304 (DE-He213)s11042-017-4619-8-p |
title_full |
Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform |
author_sort |
Huang, Xiaodong |
journal |
Multimedia tools and applications |
journalStr |
Multimedia tools and applications |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2017 |
contenttype_str_mv |
txt |
container_start_page |
7033 |
author_browse |
Huang, Xiaodong |
container_volume |
77 |
class |
070 004 VZ |
format_se |
Aufsätze |
author-letter |
Huang, Xiaodong |
doi_str_mv |
10.1007/s11042-017-4619-8 |
normlink |
(ORCID)0000-0002-7953-750X |
normlink_prefix_str_mv |
(orcid)0000-0002-7953-750X |
dewey-full |
070 004 |
title_sort |
automatic video superimposed text detection based on nonsubsampled contourlet transform |
title_auth |
Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform |
abstract |
Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines. © Springer Science+Business Media New York 2017 |
abstractGer |
Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines. © Springer Science+Business Media New York 2017 |
abstract_unstemmed |
Abstract Compared with other video semantic clues, such as gestures, motions etc., video text generally provides highly useful and fairly precise semantic information, the analysis of which can to a great extent facilitate video and scene understanding. It can be observed that the video texts show stronger edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion, which can preserve the edge/silhouette of the text characters well. Therefore, in this paper, a new approach has been proposed to detect video text based on NSCT. First of all, the 8 directional coefficients of NSCT are combined to build the directional edge map (DEM), which can keep the horizontal, vertical and diagonal edge features and suppress other directional edge features. Then various directional pixels of DEM are integrated into a whole binary image (BE). Based on the BE, text frame classification is carried out to determine whether the video frames contain the text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate the video text from non-text regions. Experimental evaluations based on our collected TV videos data set demonstrate that our method significantly outperforms the other 3 video text detection algorithms in both detection speed and accuracy, especially when there are challenges such as video text with various sizes, languages, colors, fonts, short or long text lines. © Springer Science+Business Media New York 2017 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW GBV_ILN_70 |
container_issue |
6 |
title_short |
Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform |
url |
https://doi.org/10.1007/s11042-017-4619-8 |
remote_bool |
false |
ppnlink |
189064145 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s11042-017-4619-8 |
up_date |
2024-07-03T23:34:50.175Z |
_version_ |
1803602823927037952 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>01000caa a22002652 4500</leader>
    <controlfield tag="001">OLC2035045304</controlfield>
    <controlfield tag="003">DE-627</controlfield>
    <controlfield tag="005">20230503193442.0</controlfield>
    <controlfield tag="007">tu</controlfield>
    <controlfield tag="008">200819s2017 xx ||||| 00| ||eng c</controlfield>
    <datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11042-017-4619-8</subfield><subfield code="2">doi</subfield></datafield>
    <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2035045304</subfield></datafield>
    <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11042-017-4619-8-p</subfield></datafield>
    <datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield>
    <datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield>
    <datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Huang, Xiaodong</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0002-7953-750X</subfield><subfield code="4">aut</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform</subfield></datafield>
    <datafield tag="264" ind1=" " ind2="1"><subfield code="c">2017</subfield></datafield>
    <datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield>
    <datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield>
    <datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield>
    <datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media New York 2017</subfield></datafield>
    <datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Compared with other video semantic cues, such as gestures and motions, video text generally provides highly useful and fairly precise semantic information, the analysis of which can greatly facilitate video and scene understanding. It can be observed that video text regions exhibit strong edges. The Nonsubsampled Contourlet Transform (NSCT) is a fully shift-invariant, multi-scale, and multi-directional expansion that preserves the edges/silhouettes of text characters well. Therefore, this paper proposes a new NSCT-based approach to video text detection. First, the eight directional coefficients of the NSCT are combined to build a directional edge map (DEM), which preserves horizontal, vertical, and diagonal edge features while suppressing edges in other directions. The directional pixels of the DEM are then integrated into a single binary edge image (BE). Based on the BE, text frame classification is carried out to determine whether a video frame contains text lines. Finally, BE-based text detection is performed on consecutive frames to discriminate video text from non-text regions. Experimental evaluations on our collected TV video data set demonstrate that our method significantly outperforms three other video text detection algorithms in both detection speed and accuracy, especially under challenges such as text of various sizes, languages, colors, and fonts, and both short and long text lines.</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Video text</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Nonsubsampled Contourlet Transform</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Text detection</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Text frame classification</subfield></datafield>
    <datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Multimedia tools and applications</subfield><subfield code="d">Springer US, 1995</subfield><subfield code="g">77(2017), 6 vom: 25. März, Seite 7033-7049</subfield><subfield code="w">(DE-627)189064145</subfield><subfield code="w">(DE-600)1287642-2</subfield><subfield code="w">(DE-576)052842126</subfield><subfield code="x">1380-7501</subfield><subfield code="7">nnns</subfield></datafield>
    <datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:77</subfield><subfield code="g">year:2017</subfield><subfield code="g">number:6</subfield><subfield code="g">day:25</subfield><subfield code="g">month:03</subfield><subfield code="g">pages:7033-7049</subfield></datafield>
    <datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11042-017-4619-8</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MKW</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield>
    <datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield>
    <datafield tag="952" ind1=" " ind2=" "><subfield code="d">77</subfield><subfield code="j">2017</subfield><subfield code="e">6</subfield><subfield code="b">25</subfield><subfield code="c">03</subfield><subfield code="h">7033-7049</subfield></datafield>
  </record>
</collection>
|
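The pipeline described in the abstract (directional subbands → directional edge map (DEM) → binary edge image (BE)) can be sketched as follows. This is a minimal illustration, not the paper's method: simple 8-direction difference filters stand in for the NSCT directional subband coefficients, and the mean-plus-standard-deviation threshold is an assumption for demonstration only.

```python
import numpy as np

def directional_maps(img):
    """Crude stand-in for the 8 NSCT directional subbands: absolute
    gray-level differences toward the 8 neighbor directions."""
    # Offsets: E, NE, N, NW, W, SW, S, SE
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    img = img.astype(float)
    maps = []
    for dy, dx in offsets:
        shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        maps.append(np.abs(img - shifted))
    return maps

def directional_edge_map(maps):
    """Combine the directional responses into a single DEM by taking
    the per-pixel maximum over all directions."""
    return np.maximum.reduce(maps)

def binary_edge_image(dem, thresh=None):
    """Integrate the DEM into one binary edge image (BE).
    The adaptive threshold below is an assumption, not from the paper."""
    if thresh is None:
        thresh = dem.mean() + dem.std()
    return (dem > thresh).astype(np.uint8)

# Synthetic frame: a bright block plays the role of a superimposed caption.
frame = np.zeros((20, 20))
frame[8:12, 5:15] = 255
be = binary_edge_image(directional_edge_map(directional_maps(frame)))
# BE marks the block's border pixels; its interior and the background stay 0.
```

A real implementation would replace `directional_maps` with an actual NSCT decomposition and, per the abstract, keep only the horizontal, vertical, and diagonal subbands before building the DEM.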