Text Classification Using Compression-Based Dissimilarity Measures
Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be...
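The full abstract further down in this record describes mapping texts into a feature space built from dissimilarity measures and then applying a classical classifier such as nearest neighbor. The sketch below is only a rough illustration of that general idea. It uses the normalized compression distance (NCD) computed with zlib as the dissimilarity, which is a well-known compression-based measure swapped in here as an assumption; the record does not say which measures the paper actually uses. The function names (`ncd`, `embed`, `classify_1nn`) and the use of the training texts as prototypes are hypothetical choices made for this example.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts (assumed measure)."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def embed(text: str, prototypes: list[str]) -> list[float]:
    """Represent a text by its dissimilarities to a fixed list of prototype texts."""
    return [ncd(text, p) for p in prototypes]


def classify_1nn(text: str, train_texts: list[str], train_labels: list[str],
                 prototypes: list[str]) -> str:
    """Nearest-neighbor classification in the dissimilarity space."""
    query = embed(text, prototypes)
    train_vecs = [embed(t, prototypes) for t in train_texts]
    # Euclidean distance between dissimilarity vectors
    dists = [math.dist(query, v) for v in train_vecs]
    return train_labels[dists.index(min(dists))]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    texts = [
        "the cat sat on the mat and purred quietly " * 20,
        "stock prices fell sharply after the earnings report " * 20,
    ]
    labels = ["pets", "finance"]
    print(classify_1nn("my cat sat on the mat all day " * 20, texts, labels, texts))
```

No tokenization, stemming, or hand-built features appear anywhere in the sketch: raw strings go straight into the compressor, which is the property the abstract emphasizes. A support vector machine could be trained on the `embed` vectors instead of using nearest neighbor.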
Detailed description
Author: | Coutinho, David Pereira [author] |
---|---|
Format: | Article |
Language: | English |
Published: | 2015 |
Rights information: | Usage rights: © 2015, World Scientific Publishing Company |
Subjects: | Document Analysis |
Parent work: | Contained in: International journal of pattern recognition and artificial intelligence - Singapore [u.a.] : World Scientific Publ. Co., 1987, 29(2015), 5 |
Parent work: | volume:29 ; year:2015 ; number:5 |
Links: | http://dx.doi.org/10.1142/S0218001415530043 (full text) |
DOI / URN: | 10.1142/S0218001415530043 |
Catalog ID: | OLC1957828625 |
LEADER | 01000caa a2200265 4500 | ||
---|---|---|---|
001 | OLC1957828625 | ||
003 | DE-627 | ||
005 | 20220216100940.0 | ||
007 | tu | ||
008 | 160206s2015 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1142/S0218001415530043 |2 doi | |
028 | 5 | 2 | |a PQ20160617 |
035 | |a (DE-627)OLC1957828625 | ||
035 | |a (DE-599)GBVOLC1957828625 | ||
035 | |a (PRQ)s1013-5e0bc91cdde7fd18132dcc5c30608ca8f2461dd4e209cf89ef3f94ddaed093db0 | ||
035 | |a (KEY)0163438020150000029000500000textclassificationusingcompressionbaseddissimilari | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 510 |q ZDB |
084 | |a 54.72 |2 bkl | ||
084 | |a 54.74 |2 bkl | ||
100 | 1 | |a Coutinho, David Pereira |e verfasserin |4 aut | |
245 | 1 | 0 | |a Text Classification Using Compression-Based Dissimilarity Measures |
264 | 1 | |c 2015 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia | ||
338 | |a Band |b nc |2 rdacarrier | ||
520 | |a Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that they approach, and sometimes even outperform, previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering. | ||
540 | |a Nutzungsrecht: © 2015, World Scientific Publishing Company | ||
650 | 4 | |a Document Analysis | |
700 | 1 | |a Figueiredo, Mário A. T |4 oth | |
773 | 0 | 8 | |i Enthalten in |t International journal of pattern recognition and artificial intelligence |d Singapore [u.a.] : World Scientific Publ. Co., 1987 |g 29(2015), 5 |w (DE-627)129238694 |w (DE-600)58282-7 |w (DE-576)018613543 |x 0218-0014 |7 nnns |
773 | 1 | 8 | |g volume:29 |g year:2015 |g number:5 |
856 | 4 | 1 | |u http://dx.doi.org/10.1142/S0218001415530043 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a SSG-OLC-TEC | ||
912 | |a SSG-OLC-MAT | ||
912 | |a GBV_ILN_60 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_4324 | ||
936 | b | k | |a 54.72 |q AVZ |
936 | b | k | |a 54.74 |q AVZ |
951 | |a AR | ||
952 | |d 29 |j 2015 |e 5 |
author_variant |
d p c dp dpc |
---|---|
matchkey_str |
article:02180014:2015----::etlsiiainsncmrsinaedsi |
hierarchy_sort_str |
2015 |
bklnumber |
54.72 54.74 |
publishDate |
2015 |
allfields |
10.1142/S0218001415530043 doi PQ20160617 (DE-627)OLC1957828625 (DE-599)GBVOLC1957828625 (PRQ)s1013-5e0bc91cdde7fd18132dcc5c30608ca8f2461dd4e209cf89ef3f94ddaed093db0 (KEY)0163438020150000029000500000textclassificationusingcompressionbaseddissimilari DE-627 ger DE-627 rakwb eng 510 ZDB 54.72 bkl 54.74 bkl Coutinho, David Pereira verfasserin aut Text Classification Using Compression-Based Dissimilarity Measures 2015 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering. Nutzungsrecht: © 2015, World Scientific Publishing Company Document Analysis Figueiredo, Mário A. T oth Enthalten in International journal of pattern recognition and artificial intelligence Singapore [u.a.] : World Scientific Publ. 
Co., 1987 29(2015), 5 (DE-627)129238694 (DE-600)58282-7 (DE-576)018613543 0218-0014 nnns volume:29 year:2015 number:5 http://dx.doi.org/10.1142/S0218001415530043 Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT GBV_ILN_60 GBV_ILN_70 GBV_ILN_4324 54.72 AVZ 54.74 AVZ AR 29 2015 5 |
language |
English |
source |
Enthalten in International journal of pattern recognition and artificial intelligence 29(2015), 5 volume:29 year:2015 number:5 |
sourceStr |
Enthalten in International journal of pattern recognition and artificial intelligence 29(2015), 5 volume:29 year:2015 number:5 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Document Analysis |
dewey-raw |
510 |
isfreeaccess_bool |
false |
container_title |
International journal of pattern recognition and artificial intelligence |
authorswithroles_txt_mv |
Coutinho, David Pereira @@aut@@ Figueiredo, Mário A. T @@oth@@ |
publishDateDaySort_date |
2015-01-01T00:00:00Z |
hierarchy_top_id |
129238694 |
dewey-sort |
3510 |
id |
OLC1957828625 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1957828625</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20220216100940.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">160206s2015 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1142/S0218001415530043</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20160617</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1957828625</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1957828625</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)s1013-5e0bc91cdde7fd18132dcc5c30608ca8f2461dd4e209cf89ef3f94ddaed093db0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0163438020150000029000500000textclassificationusingcompressionbaseddissimilari</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">510</subfield><subfield code="q">ZDB</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.72</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.74</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Coutinho, David Pereira</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" 
ind2="0"><subfield code="a">Text Classification Using Compression-Based Dissimilarity Measures</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2015</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. 
The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.</subfield></datafield><datafield tag="540" ind1=" " ind2=" "><subfield code="a">Nutzungsrecht: © 2015, World Scientific Publishing Company</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Document Analysis</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Figueiredo, Mário A. T</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">International journal of pattern recognition and artificial intelligence</subfield><subfield code="d">Singapore [u.a.] : World Scientific Publ. Co., 1987</subfield><subfield code="g">29(2015), 5</subfield><subfield code="w">(DE-627)129238694</subfield><subfield code="w">(DE-600)58282-7</subfield><subfield code="w">(DE-576)018613543</subfield><subfield code="x">0218-0014</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:29</subfield><subfield code="g">year:2015</subfield><subfield code="g">number:5</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1142/S0218001415530043</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" 
"><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.72</subfield><subfield code="q">AVZ</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.74</subfield><subfield code="q">AVZ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">29</subfield><subfield code="j">2015</subfield><subfield code="e">5</subfield></datafield></record></collection>
|
author |
Coutinho, David Pereira |
spellingShingle |
Coutinho, David Pereira ddc 510 bkl 54.72 bkl 54.74 misc Document Analysis Text Classification Using Compression-Based Dissimilarity Measures |
authorStr |
Coutinho, David Pereira |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)129238694 |
format |
Article |
dewey-ones |
510 - Mathematics |
delete_txt_mv |
keep |
author_role |
aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0218-0014 |
topic_title |
510 ZDB 54.72 bkl 54.74 bkl Text Classification Using Compression-Based Dissimilarity Measures Document Analysis |
topic |
ddc 510 bkl 54.72 bkl 54.74 misc Document Analysis |
topic_unstemmed |
ddc 510 bkl 54.72 bkl 54.74 misc Document Analysis |
topic_browse |
ddc 510 bkl 54.72 bkl 54.74 misc Document Analysis |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
author2_variant |
m a t f mat matf |
hierarchy_parent_title |
International journal of pattern recognition and artificial intelligence |
hierarchy_parent_id |
129238694 |
dewey-tens |
510 - Mathematics |
hierarchy_top_title |
International journal of pattern recognition and artificial intelligence |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)129238694 (DE-600)58282-7 (DE-576)018613543 |
title |
Text Classification Using Compression-Based Dissimilarity Measures |
ctrlnum |
(DE-627)OLC1957828625 (DE-599)GBVOLC1957828625 (PRQ)s1013-5e0bc91cdde7fd18132dcc5c30608ca8f2461dd4e209cf89ef3f94ddaed093db0 (KEY)0163438020150000029000500000textclassificationusingcompressionbaseddissimilari |
title_full |
Text Classification Using Compression-Based Dissimilarity Measures |
author_sort |
Coutinho, David Pereira |
journal |
International journal of pattern recognition and artificial intelligence |
journalStr |
International journal of pattern recognition and artificial intelligence |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
500 - Science |
recordtype |
marc |
publishDateSort |
2015 |
contenttype_str_mv |
txt |
author_browse |
Coutinho, David Pereira |
container_volume |
29 |
class |
510 ZDB 54.72 bkl 54.74 bkl |
format_se |
Aufsätze |
author-letter |
Coutinho, David Pereira |
doi_str_mv |
10.1142/S0218001415530043 |
dewey-full |
510 |
title_sort |
text classification using compression-based dissimilarity measures |
title_auth |
Text Classification Using Compression-Based Dissimilarity Measures |
abstract |
Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that they approach, and sometimes even outperform, previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering. |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT GBV_ILN_60 GBV_ILN_70 GBV_ILN_4324 |
container_issue |
5 |
title_short |
Text Classification Using Compression-Based Dissimilarity Measures |
url |
http://dx.doi.org/10.1142/S0218001415530043 |
remote_bool |
false |
author2 |
Figueiredo, Mário A. T |
author2Str |
Figueiredo, Mário A. T |
ppnlink |
129238694 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
author2_role |
oth |
doi_str |
10.1142/S0218001415530043 |
up_date |
2024-07-04T01:29:17.249Z |
_version_ |
1803610024574976001 |