Age group classification and gender recognition from speech with temporal convolutional neural networks
Abstract This paper analyses the performance of different types of Deep Neural Networks to jointly estimate age and identify gender from speech, to be applied in Interactive Voice Response systems available in call centres. Deep Neural Networks are used, because they have recently demonstrated discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks with different sizes are analysed to obtain information on how performance depends on the network architecture and the number of free parameters. The speech corpus used for the experiments is Mozilla’s Common Voice dataset, an open and crowdsourced speech corpus. The results are really good for gender classification, independently of the type of neural network, but improve with the network size. Regarding the classification by age groups, the combination of convolutional neural networks and temporal neural networks seems to be the best option among the analysed, and again, the larger the size of the network, the better the results. The results are promising for use in IVR systems, with the best systems achieving a gender identification error of less than 2% and a classification error by age group of less than 20%.
Detailed description
Author: Sánchez-Hevia, Héctor A. [author]
Format: Article
Language: English
Published: 2022
Keywords:
Note: © The Author(s) 2021
Parent work: Contained in: Multimedia tools and applications - Springer US, 1995, 81(2022), 3, Jan., pages 3535-3552
Parent work: volume:81 ; year:2022 ; number:3 ; month:01 ; pages:3535-3552
Links:
DOI / URN: 10.1007/s11042-021-11614-4
Catalogue ID: OLC2078099147
LEADER 01000caa a22002652 4500
001 OLC2078099147
003 DE-627
005 20230505225227.0
007 tu
008 221220s2022 xx ||||| 00| ||eng c
024 7_ |a 10.1007/s11042-021-11614-4 |2 doi
035 __ |a (DE-627)OLC2078099147
035 __ |a (DE-He213)s11042-021-11614-4-p
040 __ |a DE-627 |b ger |c DE-627 |e rakwb
041 __ |a eng
082 04 |a 070 |a 004 |q VZ
100 1_ |a Sánchez-Hevia, Héctor A. |e verfasserin |4 aut
245 10 |a Age group classification and gender recognition from speech with temporal convolutional neural networks
264 _1 |c 2022
336 __ |a Text |b txt |2 rdacontent
337 __ |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 __ |a Band |b nc |2 rdacarrier
500 __ |a © The Author(s) 2021
520 __ |a Abstract This paper analyses the performance of different types of Deep Neural Networks to jointly estimate age and identify gender from speech, to be applied in Interactive Voice Response systems available in call centres. Deep Neural Networks are used, because they have recently demonstrated discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks with different sizes are analysed to obtain information on how performance depends on the network architecture and the number of free parameters. The speech corpus used for the experiments is Mozilla’s Common Voice dataset, an open and crowdsourced speech corpus. The results are really good for gender classification, independently of the type of neural network, but improve with the network size. Regarding the classification by age groups, the combination of convolutional neural networks and temporal neural networks seems to be the best option among the analysed, and again, the larger the size of the network, the better the results. The results are promising for use in IVR systems, with the best systems achieving a gender identification error of less than 2% and a classification error by age group of less than 20%.
650 _4 |a Interactive voice response
650 _4 |a Age estimation
650 _4 |a Gender recognition
650 _4 |a Human-robot interaction
650 _4 |a Machine learning
700 1_ |a Gil-Pita, Roberto |4 aut
700 1_ |a Utrilla-Manso, Manuel |4 aut
700 1_ |a Rosa-Zurera, Manuel |0 (orcid)0000-0002-3073-3278 |4 aut
773 08 |i Enthalten in |t Multimedia tools and applications |d Springer US, 1995 |g 81(2022), 3 vom: Jan., Seite 3535-3552 |w (DE-627)189064145 |w (DE-600)1287642-2 |w (DE-576)052842126 |x 1380-7501 |7 nnns
773 18 |g volume:81 |g year:2022 |g number:3 |g month:01 |g pages:3535-3552
856 41 |u https://doi.org/10.1007/s11042-021-11614-4 |z lizenzpflichtig |3 Volltext
912 __ |a GBV_USEFLAG_A
912 __ |a SYSFLAG_A
912 __ |a GBV_OLC
912 __ |a SSG-OLC-MAT
912 __ |a SSG-OLC-BUB
912 __ |a SSG-OLC-MKW
951 __ |a AR
952 __ |d 81 |j 2022 |e 3 |c 01 |h 3535-3552
author_variant |
h a s h has hash r g p rgp m u m mum m r z mrz |
matchkey_str |
article:13807501:2022----::ggoplsiiainngnercgiinrmpehiheprlo |
hierarchy_sort_str |
2022 |
publishDate |
2022 |
allfields |
10.1007/s11042-021-11614-4 doi (DE-627)OLC2078099147 (DE-He213)s11042-021-11614-4-p DE-627 ger DE-627 rakwb eng 070 004 VZ Sánchez-Hevia, Héctor A. verfasserin aut Age group classification and gender recognition from speech with temporal convolutional neural networks 2022 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © The Author(s) 2021 Abstract This paper analyses the performance of different types of Deep Neural Networks to jointly estimate age and identify gender from speech, to be applied in Interactive Voice Response systems available in call centres. Deep Neural Networks are used, because they have recently demonstrated discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks with different sizes are analysed to obtain information on how performance depends on the network architecture and the number of free parameters. The speech corpus used for the experiments is Mozilla’s Common Voice dataset, an open and crowdsourced speech corpus. The results are really good for gender classification, independently of the type of neural network, but improve with the network size. Regarding the classification by age groups, the combination of convolutional neural networks and temporal neural networks seems to be the best option among the analysed, and again, the larger the size of the network, the better the results. The results are promising for use in IVR systems, with the best systems achieving a gender identification error of less than 2% and a classification error by age group of less than 20%. 
Interactive voice response Age estimation Gender recognition Human-robot interaction Machine learning Gil-Pita, Roberto aut Utrilla-Manso, Manuel aut Rosa-Zurera, Manuel (orcid)0000-0002-3073-3278 aut Enthalten in Multimedia tools and applications Springer US, 1995 81(2022), 3 vom: Jan., Seite 3535-3552 (DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 1380-7501 nnns volume:81 year:2022 number:3 month:01 pages:3535-3552 https://doi.org/10.1007/s11042-021-11614-4 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW AR 81 2022 3 01 3535-3552 |
language |
English |
source |
Enthalten in Multimedia tools and applications 81(2022), 3 vom: Jan., Seite 3535-3552 volume:81 year:2022 number:3 month:01 pages:3535-3552 |
sourceStr |
Enthalten in Multimedia tools and applications 81(2022), 3 vom: Jan., Seite 3535-3552 volume:81 year:2022 number:3 month:01 pages:3535-3552 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Interactive voice response Age estimation Gender recognition Human-robot interaction Machine learning |
dewey-raw |
070 |
isfreeaccess_bool |
false |
container_title |
Multimedia tools and applications |
authorswithroles_txt_mv |
Sánchez-Hevia, Héctor A. @@aut@@ Gil-Pita, Roberto @@aut@@ Utrilla-Manso, Manuel @@aut@@ Rosa-Zurera, Manuel @@aut@@ |
publishDateDaySort_date |
2022-01-01T00:00:00Z |
hierarchy_top_id |
189064145 |
dewey-sort |
270 |
id |
OLC2078099147 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2078099147</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230505225227.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">221220s2022 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11042-021-11614-4</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2078099147</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11042-021-11614-4-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Sánchez-Hevia, Héctor A.</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Age group classification and gender recognition from speech with temporal convolutional neural networks</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2022</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" 
"><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2021</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract This paper analyses the performance of different types of Deep Neural Networks to jointly estimate age and identify gender from speech, to be applied in Interactive Voice Response systems available in call centres. Deep Neural Networks are used, because they have recently demonstrated discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks with different sizes are analysed to obtain information on how performance depends on the network architecture and the number of free parameters. The speech corpus used for the experiments is Mozilla’s Common Voice dataset, an open and crowdsourced speech corpus. The results are really good for gender classification, independently of the type of neural network, but improve with the network size. Regarding the classification by age groups, the combination of convolutional neural networks and temporal neural networks seems to be the best option among the analysed, and again, the larger the size of the network, the better the results. 
The results are promising for use in IVR systems, with the best systems achieving a gender identification error of less than 2% and a classification error by age group of less than 20%.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Interactive voice response</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Age estimation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Gender recognition</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Human-robot interaction</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Machine learning</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Gil-Pita, Roberto</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Utrilla-Manso, Manuel</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rosa-Zurera, Manuel</subfield><subfield code="0">(orcid)0000-0002-3073-3278</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Multimedia tools and applications</subfield><subfield code="d">Springer US, 1995</subfield><subfield code="g">81(2022), 3 vom: Jan., Seite 3535-3552</subfield><subfield code="w">(DE-627)189064145</subfield><subfield code="w">(DE-600)1287642-2</subfield><subfield code="w">(DE-576)052842126</subfield><subfield code="x">1380-7501</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:81</subfield><subfield code="g">year:2022</subfield><subfield code="g">number:3</subfield><subfield code="g">month:01</subfield><subfield code="g">pages:3535-3552</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield 
code="u">https://doi.org/10.1007/s11042-021-11614-4</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MKW</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">81</subfield><subfield code="j">2022</subfield><subfield code="e">3</subfield><subfield code="c">01</subfield><subfield code="h">3535-3552</subfield></datafield></record></collection>
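The embedded MARC21/slim XML above can be processed with standard tooling. As a minimal sketch using only the Python standard library: the namespace and the field tags (245 for the title, 024 for the DOI) come from the record itself, while the helper name `extract_fields` and the trimmed-down sample record are illustrative.

```python
import xml.etree.ElementTree as ET

# MARC21/slim namespace, as declared in the fullrecord field above
NS = {"m": "http://www.loc.gov/MARC21/slim"}

def extract_fields(marcxml: str) -> dict:
    """Pull the title (245 $a) and DOI (024 $a) out of a MARC21/slim record."""
    root = ET.fromstring(marcxml)
    out = {}
    for df in root.iter("{http://www.loc.gov/MARC21/slim}datafield"):
        tag = df.get("tag")
        if tag == "245":
            out["title"] = df.find("m:subfield[@code='a']", NS).text
        elif tag == "024":
            out["doi"] = df.find("m:subfield[@code='a']", NS).text
    return out

# Trimmed sample containing just the two fields of interest
record = """<collection xmlns="http://www.loc.gov/MARC21/slim"><record>
<datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11042-021-11614-4</subfield><subfield code="2">doi</subfield></datafield>
<datafield tag="245" ind1="1" ind2="0"><subfield code="a">Age group classification and gender recognition from speech with temporal convolutional neural networks</subfield></datafield>
</record></collection>"""

print(extract_fields(record))
```

For the full record, a dedicated MARC library (e.g. pymarc) would be the more robust choice; the sketch only shows that the catalog data is directly machine-readable.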
author |
Sánchez-Hevia, Héctor A. |
spellingShingle |
Sánchez-Hevia, Héctor A. ddc 070 misc Interactive voice response misc Age estimation misc Gender recognition misc Human-robot interaction misc Machine learning Age group classification and gender recognition from speech with temporal convolutional neural networks |
authorStr |
Sánchez-Hevia, Héctor A. |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)189064145 |
format |
Article |
dewey-ones |
070 - News media, journalism & publishing 004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut aut aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
1380-7501 |
topic_title |
070 004 VZ Age group classification and gender recognition from speech with temporal convolutional neural networks Interactive voice response Age estimation Gender recognition Human-robot interaction Machine learning |
topic |
ddc 070 misc Interactive voice response misc Age estimation misc Gender recognition misc Human-robot interaction misc Machine learning |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
hierarchy_parent_title |
Multimedia tools and applications |
hierarchy_parent_id |
189064145 |
dewey-tens |
070 - News media, journalism & publishing 000 - Computer science, knowledge & systems |
hierarchy_top_title |
Multimedia tools and applications |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)189064145 (DE-600)1287642-2 (DE-576)052842126 |
title |
Age group classification and gender recognition from speech with temporal convolutional neural networks |
ctrlnum |
(DE-627)OLC2078099147 (DE-He213)s11042-021-11614-4-p |
title_full |
Age group classification and gender recognition from speech with temporal convolutional neural networks |
author_sort |
Sánchez-Hevia, Héctor A. |
journal |
Multimedia tools and applications |
journalStr |
Multimedia tools and applications |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2022 |
contenttype_str_mv |
txt |
container_start_page |
3535 |
author_browse |
Sánchez-Hevia, Héctor A. Gil-Pita, Roberto Utrilla-Manso, Manuel Rosa-Zurera, Manuel |
container_volume |
81 |
class |
070 004 VZ |
format_se |
Aufsätze |
author-letter |
Sánchez-Hevia, Héctor A. |
doi_str_mv |
10.1007/s11042-021-11614-4 |
normlink |
(ORCID)0000-0002-3073-3278 |
normlink_prefix_str_mv |
(orcid)0000-0002-3073-3278 |
dewey-full |
070 004 |
title_sort |
age group classification and gender recognition from speech with temporal convolutional neural networks |
title_auth |
Age group classification and gender recognition from speech with temporal convolutional neural networks |
abstract |
Abstract This paper analyses the performance of different types of Deep Neural Networks to jointly estimate age and identify gender from speech, to be applied in Interactive Voice Response systems available in call centres. Deep Neural Networks are used, because they have recently demonstrated discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks with different sizes are analysed to obtain information on how performance depends on the network architecture and the number of free parameters. The speech corpus used for the experiments is Mozilla’s Common Voice dataset, an open and crowdsourced speech corpus. The results are really good for gender classification, independently of the type of neural network, but improve with the network size. Regarding the classification by age groups, the combination of convolutional neural networks and temporal neural networks seems to be the best option among the analysed, and again, the larger the size of the network, the better the results. The results are promising for use in IVR systems, with the best systems achieving a gender identification error of less than 2% and a classification error by age group of less than 20%. © The Author(s) 2021 |
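The abstract names temporal convolutional networks as the best-performing option for age-group classification, but the record does not reproduce the paper's architecture. Purely as an illustration of the building block such networks rest on, a causal dilated 1-D convolution can be sketched as follows; all names and values here are invented for the example, not taken from the paper.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution: output[t] depends only on
    x[t], x[t-d], x[t-2d], ... (never on future samples), the
    property that lets TCNs model temporal context in speech."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            j = t - i * dilation  # reach back i*dilation steps
            if j >= 0:            # implicit left zero-padding
                acc += kernel[i] * x[j]
        out.append(acc)
    return out

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
print(causal_dilated_conv1d(signal, kernel=[0.5, 0.5], dilation=2))
# → [0.5, 1.0, 2.0, 3.0, 4.0]
```

Stacking such layers with growing dilation (1, 2, 4, ...) widens the receptive field exponentially, which is the usual design rationale behind TCN-style models.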
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OLC-MKW |
container_issue |
3 |
title_short |
Age group classification and gender recognition from speech with temporal convolutional neural networks |
url |
https://doi.org/10.1007/s11042-021-11614-4 |
remote_bool |
false |
author2 |
Gil-Pita, Roberto Utrilla-Manso, Manuel Rosa-Zurera, Manuel |
author2Str |
Gil-Pita, Roberto Utrilla-Manso, Manuel Rosa-Zurera, Manuel |
ppnlink |
189064145 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s11042-021-11614-4 |
up_date |
2024-07-03T18:48:47.905Z |
_version_ |
1803584827982610433 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2078099147</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230505225227.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">221220s2022 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11042-021-11614-4</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2078099147</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11042-021-11614-4-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Sánchez-Hevia, Héctor A.</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Age group classification and gender recognition from speech with temporal convolutional neural networks</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2022</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" 
"><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2021</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract This paper analyses the performance of different types of Deep Neural Networks to jointly estimate age and identify gender from speech, to be applied in Interactive Voice Response systems available in call centres. Deep Neural Networks are used because they have recently demonstrated discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks of different sizes are analysed to obtain information on how performance depends on the network architecture and the number of free parameters. The speech corpus used for the experiments is Mozilla’s Common Voice dataset, an open and crowdsourced speech corpus. The results are very good for gender classification, regardless of the type of neural network, and improve with network size. Regarding classification by age group, the combination of convolutional neural networks and temporal neural networks seems to be the best option among those analysed, and again, the larger the network, the better the results. 
The results are promising for use in IVR systems, with the best systems achieving a gender identification error of less than 2% and a classification error by age group of less than 20%.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Interactive voice response</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Age estimation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Gender recognition</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Human-robot interaction</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Machine learning</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Gil-Pita, Roberto</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Utrilla-Manso, Manuel</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rosa-Zurera, Manuel</subfield><subfield code="0">(orcid)0000-0002-3073-3278</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Multimedia tools and applications</subfield><subfield code="d">Springer US, 1995</subfield><subfield code="g">81(2022), 3 vom: Jan., Seite 3535-3552</subfield><subfield code="w">(DE-627)189064145</subfield><subfield code="w">(DE-600)1287642-2</subfield><subfield code="w">(DE-576)052842126</subfield><subfield code="x">1380-7501</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:81</subfield><subfield code="g">year:2022</subfield><subfield code="g">number:3</subfield><subfield code="g">month:01</subfield><subfield code="g">pages:3535-3552</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield 
code="u">https://doi.org/10.1007/s11042-021-11614-4</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MKW</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">81</subfield><subfield code="j">2022</subfield><subfield code="e">3</subfield><subfield code="c">01</subfield><subfield code="h">3535-3552</subfield></datafield></record></collection>