Generalisation and robustness investigation for facial and speech emotion recognition using bio-inspired spiking neural networks
Abstract: Emotion recognition from facial expressions and non-verbal speech is an important area of affective computing. Both modalities have been studied extensively, from classical feature-extraction techniques to more recent deep-learning approaches. However, most of these approaches face two major challenges: (1) robustness (can a model still make correct predictions when the input is degraded, for example by noise?) and (2) cross-dataset generalisation (can a model trained on one dataset make accurate inferences on another?). To address these challenges directly, we first propose applying a spiking neural network (SNN) to predict emotional states from facial-expression and speech data, and then investigate and compare its accuracy under data degradation and on unseen input. We evaluate our approach on third-party, publicly available datasets and compare it with state-of-the-art techniques. Our approach is robust to noise: it achieves 56.2% accuracy for facial expression recognition (FER), compared with 22.64% for a CNN and 14.10% for an SVM, when input images are degraded with noise of intensity 0.5, and the highest accuracy of 74.3% for speech emotion recognition (SER), compared with 21.95% for a CNN and 14.75% for an SVM, when white noise is added to the audio. For generalisation, our approach achieves consistently high accuracy of 89% for FER and 70% for SER in cross-dataset evaluation, suggesting that it learns more effective feature representations that generalise well across subjects' facial features and vocal characteristics.
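The robustness numbers in the abstract come from classifying test inputs after deliberate degradation. As a minimal sketch of the kind of corruption described (the record does not specify the exact noise model, so salt-and-pepper noise is an assumption here), "intensity 0.5" can be read as corrupting half of an image's pixels:

```python
import numpy as np

def degrade_image(image: np.ndarray, intensity: float, seed: int = 0) -> np.ndarray:
    """Apply salt-and-pepper noise to a grayscale image scaled to [0, 1].

    Each pixel is independently corrupted with probability `intensity`
    (0.5 matches the harshest setting quoted in the abstract); corrupted
    pixels are set to 1.0 ("salt") or 0.0 ("pepper") with equal chance.
    """
    rng = np.random.default_rng(seed)
    noisy = image.copy()
    corrupt = rng.random(image.shape) < intensity   # which pixels to corrupt
    salt = rng.random(image.shape) < 0.5            # half salt, half pepper
    noisy[corrupt & salt] = 1.0
    noisy[corrupt & ~salt] = 0.0
    return noisy

# A uniform mid-gray image: after degradation, roughly `intensity`
# of its pixels differ from the original.
clean = np.full((48, 48), 0.5)
noisy = degrade_image(clean, intensity=0.5)
changed = float(np.mean(noisy != clean))
```

Evaluating a trained classifier on such degraded copies of the test set, at increasing intensities, yields robustness curves like those the abstract summarises.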
Detailed description

Authors: Mansouri-Benssassi, Esma (author); Ye, Juan (author)
Format: electronic article
Language: English
Published: 2021
Subject headings: Spiking neural network; Facial emotion recognition; Speech emotion recognition; Unsupervised learning
Parent work: Contained in: Soft Computing, Springer-Verlag, 2003, 25(2021), no. 3, 16 Jan., pp. 1717-1730
DOI / URN: 10.1007/s00500-020-05501-7
Catalogue ID: SPR043174310
LEADER 01000naa a22002652 4500
001    SPR043174310
003    DE-627
005    20210215092903.0
007    cr uuu---uuuuu
008    210215s2021 xx |||||o 00| ||eng c
024 7  |a 10.1007/s00500-020-05501-7 |2 doi
035    |a (DE-627)SPR043174310
035    |a (DE-599)SPRs00500-020-05501-7-e
035    |a (SPR)s00500-020-05501-7-e
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Mansouri-Benssassi, Esma |e verfasserin |4 aut
245 10 |a Generalisation and robustness investigation for facial and speech emotion recognition using bio-inspired spiking neural networks
264  1 |c 2021
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
520    |a Abstract Emotion recognition through facial expression and non-verbal speech represents an important area in affective computing. They have been extensively studied from classical feature extraction techniques to more recent deep learning approaches. However, most of these approaches face two major challenges: (1) robustness—in the face of degradation such as noise, can a model still make correct predictions? and (2) cross-dataset generalisation—when a model is trained on one dataset, can it be used to make inference on another dataset? To directly address these challenges, we first propose the application of a spiking neural network (SNN) in predicting emotional states based on facial expression and speech data, then investigate, and compare their accuracy when facing data degradation or unseen new input. We evaluate our approach on third-party, publicly available datasets and compare to the state-of-the-art techniques. Our approach demonstrates robustness to noise, where it achieves an accuracy of 56.2% for facial expression recognition (FER) compared to 22.64% and 14.10% for CNN and SVM, respectively, when input images are degraded with the noise intensity of 0.5, and the highest accuracy of 74.3% for speech emotion recognition (SER) compared to 21.95% of CNN and 14.75% for SVM when audio white noise is applied. For generalisation, our approach achieves consistently high accuracy of 89% for FER and 70% for SER in cross-dataset evaluation and suggests that it can learn more effective feature representations, which lead to good generalisation of facial features and vocal characteristics across subjects.
650  4 |a Spiking neural network |7 (dpeaa)DE-He213
650  4 |a Facial emotion recognition |7 (dpeaa)DE-He213
650  4 |a Speech emotion recognition |7 (dpeaa)DE-He213
650  4 |a Unsupervised learning |7 (dpeaa)DE-He213
700 1  |a Ye, Juan |e verfasserin |4 aut
773 08 |i Enthalten in |t Soft Computing |d Springer-Verlag, 2003 |g 25(2021), 3 vom: 16. Jan., Seite 1717-1730 |w (DE-627)SPR006469531 |7 nnns
773 18 |g volume:25 |g year:2021 |g number:3 |g day:16 |g month:01 |g pages:1717-1730
856 40 |u https://dx.doi.org/10.1007/s00500-020-05501-7 |z kostenfrei |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_SPRINGER
951    |a AR
952    |d 25 |j 2021 |e 3 |b 16 |c 01 |h 1717-1730
code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Emotion recognition through facial expression and non-verbal speech represents an important area in affective computing. Both have been extensively studied, from classical feature extraction techniques to more recent deep learning approaches. However, most of these approaches face two major challenges: (1) robustness (in the face of degradation such as noise, can a model still make correct predictions?) and (2) cross-dataset generalisation (when a model is trained on one dataset, can it be used to make inferences on another?). To address these challenges directly, we first propose the application of a spiking neural network (SNN) to predicting emotional states from facial expression and speech data, then investigate and compare its accuracy when facing data degradation or unseen new input. We evaluate our approach on third-party, publicly available datasets and compare it to state-of-the-art techniques. Our approach demonstrates robustness to noise: it achieves an accuracy of 56.2% for facial expression recognition (FER), compared to 22.64% for a CNN and 14.10% for an SVM, when input images are degraded with a noise intensity of 0.5, and the highest accuracy of 74.3% for speech emotion recognition (SER), compared to 21.95% for the CNN and 14.75% for the SVM, when audio white noise is applied. For generalisation, our approach achieves consistently high accuracies of 89% for FER and 70% for SER in cross-dataset evaluation, suggesting that it learns more effective feature representations, which lead to good generalisation of facial features and vocal characteristics across subjects.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Spiking neural network</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Facial emotion recognition</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Speech emotion recognition</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Unsupervised learning</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ye, Juan</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Soft Computing</subfield><subfield code="d">Springer-Verlag, 2003</subfield><subfield code="g">25(2021), 3 vom: 16.
Jan., Seite 1717-1730</subfield><subfield code="w">(DE-627)SPR006469531</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:25</subfield><subfield code="g">year:2021</subfield><subfield code="g">number:3</subfield><subfield code="g">day:16</subfield><subfield code="g">month:01</subfield><subfield code="g">pages:1717-1730</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1007/s00500-020-05501-7</subfield><subfield code="z">kostenfrei</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">25</subfield><subfield code="j">2021</subfield><subfield code="e">3</subfield><subfield code="b">16</subfield><subfield code="c">01</subfield><subfield code="h">1717-1730</subfield></datafield></record></collection>
|