Evaluating Various Tokenizers for Arabic Text Classification

Abstract: The first step in any NLP pipeline is to split the text into individual tokens. The most obvious and straightforward approach is to use words as tokens. However, given a large text corpus, representing all the words is not efficient in terms of vocabulary size. In the literature, many token...

Author(s):

Alyafeai, Zaid [author]

Al-shaibani, Maged S.

Ghaleb, Mustafa

Ahmad, Irfan

Format:

E-article

Language:

English

Published:

2022

Keywords:

Text Tokenization

Arabic NLP

Text Classification

Sentiment Analysis

Poem-meter Classification

Note:

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Parent work:

Contained in: Neural processing letters - Dordrecht [etc.] : Springer Science + Business Media B.V, 1994, 55(2022), no. 3 (18 Aug.), pp. 2911-2933

Parent work:

volume:55 ; year:2022 ; number:3 ; day:18 ; month:08 ; pages:2911-2933

DOI / URN:

10.1007/s11063-022-10990-8

Catalog ID:

SPR052193098
