Topical Text Classification of Russian News: a Comparison of BERT and Standard Models
The paper is devoted to the single-label topical classification of Russian news. The author compares the BERT features and standard character, word and structure-level features as text models. Experiments with OpenCorpora show that the BERT model is superior to standard ones, and achieves good class...
Detailed description
Author: | Ksenia Lagutina [author] |
---|---|
Format: | E-article |
Language: | English |
Published: | 2022 |
Keywords: | news classification; text classification; bert; topical classification; russian text classification; opencorpora |
Parent work: | In: Proceedings of the XXth Conference of Open Innovations Association FRUCT - FRUCT, 2017, 31(2022), 1, pages 160-166 |
Parent work: | volume:31 ; year:2022 ; number:1 ; pages:160-166 |
DOI / URN: | 10.23919/FRUCT54823.2022.9770920 |
Catalog ID: | DOAJ043087701 |
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | DOAJ043087701 | ||
003 | DE-627 | ||
005 | 20230308070122.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230227s2022 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.23919/FRUCT54823.2022.9770920 |2 doi | |
035 | |a (DE-627)DOAJ043087701 | ||
035 | |a (DE-599)DOAJeecb2bf21f0e47808f75a9e0fbb090c5 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
050 | 0 | |a TK5101-6720 | |
100 | 0 | |a Ksenia Lagutina |e verfasserin |4 aut | |
245 | 1 | 0 | |a Topical Text Classification of Russian News: a Comparison of BERT and Standard Models |
264 | 1 | |c 2022 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a The paper is devoted to the single-label topical classification of Russian news. The author compares the BERT features and standard character, word and structure-level features as text models. Experiments with OpenCorpora show that the BERT model is superior to standard ones, and achieves good classification quality for a small dataset of long news. Comparison with the state-of-the-art research allows to consider BERT as a baseline for future investigations of analysis of texts in Russian. | ||
650 | 4 | |a news classification | |
650 | 4 | |a text classification | |
650 | 4 | |a bert | |
650 | 4 | |a topical classification | |
650 | 4 | |a russian text classification | |
650 | 4 | |a opencorpora | |
653 | 0 | |a Telecommunication | |
773 | 0 | 8 | |i In |t Proceedings of the XXth Conference of Open Innovations Association FRUCT |d FRUCT, 2017 |g 31(2022), 1, Seite 160-166 |w (DE-627)1760594334 |x 23430737 |7 nnns |
773 | 1 | 8 | |g volume:31 |g year:2022 |g number:1 |g pages:160-166 |
856 | 4 | 0 | |u https://doi.org/10.23919/FRUCT54823.2022.9770920 |z kostenfrei |
856 | 4 | 0 | |u https://doaj.org/article/eecb2bf21f0e47808f75a9e0fbb090c5 |z kostenfrei |
856 | 4 | 0 | |u https://www.fruct.org/publications/fruct31/files/Lag.pdf |z kostenfrei |
856 | 4 | 2 | |u https://doaj.org/toc/2305-7254 |y Journal toc |z kostenfrei |
856 | 4 | 2 | |u https://doaj.org/toc/2343-0737 |y Journal toc |z kostenfrei |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_DOAJ | ||
951 | |a AR | ||
952 | |d 31 |j 2022 |e 1 |h 160-166 |
author_variant |
k l kl |
---|---|
matchkey_str |
article:23430737:2022----::oiatxcasfctoorsinescmaioobr |
hierarchy_sort_str |
2022 |
callnumber-subject-code |
TK |
publishDate |
2022 |
allfields |
10.23919/FRUCT54823.2022.9770920 doi (DE-627)DOAJ043087701 (DE-599)DOAJeecb2bf21f0e47808f75a9e0fbb090c5 DE-627 ger DE-627 rakwb eng TK5101-6720 Ksenia Lagutina verfasserin aut Topical Text Classification of Russian News: a Comparison of BERT and Standard Models 2022 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier The paper is devoted to the single-label topical classification of Russian news. The author compares the BERT features and standard character, word and structure-level features as text models. Experiments with OpenCorpora show that the BERT model is superior to standard ones, and achieves good classification quality for a small dataset of long news. Comparison with the state-of-the-art research allows to consider BERT as a baseline for future investigations of analysis of texts in Russian. news classification text classification bert topical classification russian text classification opencorpora Telecommunication In Proceedings of the XXth Conference of Open Innovations Association FRUCT FRUCT, 2017 31(2022), 1, Seite 160-166 (DE-627)1760594334 23430737 nnns volume:31 year:2022 number:1 pages:160-166 https://doi.org/10.23919/FRUCT54823.2022.9770920 kostenfrei https://doaj.org/article/eecb2bf21f0e47808f75a9e0fbb090c5 kostenfrei https://www.fruct.org/publications/fruct31/files/Lag.pdf kostenfrei https://doaj.org/toc/2305-7254 Journal toc kostenfrei https://doaj.org/toc/2343-0737 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ AR 31 2022 1 160-166 |
language |
English |
source |
In Proceedings of the XXth Conference of Open Innovations Association FRUCT 31(2022), 1, Seite 160-166 volume:31 year:2022 number:1 pages:160-166 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
news classification text classification bert topical classification russian text classification opencorpora Telecommunication |
isfreeaccess_bool |
true |
container_title |
Proceedings of the XXth Conference of Open Innovations Association FRUCT |
authorswithroles_txt_mv |
Ksenia Lagutina @@aut@@ |
publishDateDaySort_date |
2022-01-01T00:00:00Z |
hierarchy_top_id |
1760594334 |
id |
DOAJ043087701 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">DOAJ043087701</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230308070122.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230227s2022 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.23919/FRUCT54823.2022.9770920</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ043087701</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJeecb2bf21f0e47808f75a9e0fbb090c5</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK5101-6720</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">Ksenia Lagutina</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Topical Text Classification of Russian News: a Comparison of BERT and Standard Models</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2022</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield 
code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">The paper is devoted to the single-label topical classification of Russian news. The author compares the BERT features and standard character, word and structure-level features as text models. Experiments with OpenCorpora show that the BERT model is superior to standard ones, and achieves good classification quality for a small dataset of long news. Comparison with the state-of-the-art research allows to consider BERT as a baseline for future investigations of analysis of texts in Russian.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">news classification</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">text classification</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">bert</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">topical classification</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">russian text classification</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">opencorpora</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Telecommunication</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">Proceedings of the XXth Conference of Open Innovations Association FRUCT</subfield><subfield code="d">FRUCT, 2017</subfield><subfield code="g">31(2022), 1, Seite 160-166</subfield><subfield code="w">(DE-627)1760594334</subfield><subfield code="x">23430737</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:31</subfield><subfield code="g">year:2022</subfield><subfield code="g">number:1</subfield><subfield code="g">pages:160-166</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield 
code="u">https://doi.org/10.23919/FRUCT54823.2022.9770920</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/eecb2bf21f0e47808f75a9e0fbb090c5</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://www.fruct.org/publications/fruct31/files/Lag.pdf</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/2305-7254</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/2343-0737</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">31</subfield><subfield code="j">2022</subfield><subfield code="e">1</subfield><subfield code="h">160-166</subfield></datafield></record></collection>
|
callnumber-first |
T - Technology |
author |
Ksenia Lagutina |
spellingShingle |
Ksenia Lagutina misc TK5101-6720 misc news classification misc text classification misc bert misc topical classification misc russian text classification misc opencorpora misc Telecommunication Topical Text Classification of Russian News: a Comparison of BERT and Standard Models |
authorStr |
Ksenia Lagutina |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)1760594334 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut |
collection |
DOAJ |
remote_str |
true |
callnumber-label |
TK5101-6720 |
illustrated |
Not Illustrated |
issn |
23430737 |
topic_title |
TK5101-6720 Topical Text Classification of Russian News: a Comparison of BERT and Standard Models news classification text classification bert topical classification russian text classification opencorpora |
topic |
misc TK5101-6720 misc news classification misc text classification misc bert misc topical classification misc russian text classification misc opencorpora misc Telecommunication |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Proceedings of the XXth Conference of Open Innovations Association FRUCT |
hierarchy_parent_id |
1760594334 |
hierarchy_top_title |
Proceedings of the XXth Conference of Open Innovations Association FRUCT |
isfreeaccess_txt |
true |
familylinks_str_mv |
(DE-627)1760594334 |
title |
Topical Text Classification of Russian News: a Comparison of BERT and Standard Models |
ctrlnum |
(DE-627)DOAJ043087701 (DE-599)DOAJeecb2bf21f0e47808f75a9e0fbb090c5 |
title_full |
Topical Text Classification of Russian News: a Comparison of BERT and Standard Models |
author_sort |
Ksenia Lagutina |
journal |
Proceedings of the XXth Conference of Open Innovations Association FRUCT |
journalStr |
Proceedings of the XXth Conference of Open Innovations Association FRUCT |
callnumber-first-code |
T |
lang_code |
eng |
isOA_bool |
true |
recordtype |
marc |
publishDateSort |
2022 |
contenttype_str_mv |
txt |
container_start_page |
160 |
author_browse |
Ksenia Lagutina |
container_volume |
31 |
class |
TK5101-6720 |
format_se |
Elektronische Aufsätze |
author-letter |
Ksenia Lagutina |
doi_str_mv |
10.23919/FRUCT54823.2022.9770920 |
title_sort |
topical text classification of russian news: a comparison of bert and standard models |
callnumber |
TK5101-6720 |
title_auth |
Topical Text Classification of Russian News: a Comparison of BERT and Standard Models |
abstract |
The paper is devoted to the single-label topical classification of Russian news. The author compares the BERT features and standard character, word and structure-level features as text models. Experiments with OpenCorpora show that the BERT model is superior to standard ones, and achieves good classification quality for a small dataset of long news. Comparison with the state-of-the-art research allows to consider BERT as a baseline for future investigations of analysis of texts in Russian. |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ |
container_issue |
1 |
title_short |
Topical Text Classification of Russian News: a Comparison of BERT and Standard Models |
url |
https://doi.org/10.23919/FRUCT54823.2022.9770920 https://doaj.org/article/eecb2bf21f0e47808f75a9e0fbb090c5 https://www.fruct.org/publications/fruct31/files/Lag.pdf https://doaj.org/toc/2305-7254 https://doaj.org/toc/2343-0737 |
remote_bool |
true |
ppnlink |
1760594334 |
callnumber-subject |
TK - Electrical and Nuclear Engineering |
mediatype_str_mv |
c |
isOA_txt |
true |
hochschulschrift_bool |
false |
doi_str |
10.23919/FRUCT54823.2022.9770920 |
callnumber-a |
TK5101-6720 |
up_date |
2024-07-03T15:38:41.661Z |
_version_ |
1803572867670999040 |
score |
7.3980417 |