Towards website domain name classification using graph based semi-supervised learning
In this work, we tackle the problem of classifying website domain names into categories, e.g., mapping bbc.com to the “News and Media” class. Domain name classification is challenging due to the high number of class labels and the highly skewed class distributions. Unlike prior efforts that...
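The abstract describes classifying domains from the sequence of websites users visit, using graphs and semi-supervised learning. The following is only a minimal sketch of that general idea, not the authors' method: it assumes a plain co-visit graph and a basic label propagation scheme, and the sessions, seed labels, and the names `build_graph` and `propagate` are all hypothetical.

```python
from collections import defaultdict


def build_graph(sessions):
    """Undirected weighted graph; edge weight counts consecutive co-visits."""
    graph = defaultdict(lambda: defaultdict(float))
    for visits in sessions:
        for a, b in zip(visits, visits[1:]):
            if a != b:  # ignore repeated visits to the same domain
                graph[a][b] += 1.0
                graph[b][a] += 1.0
    return graph


def propagate(graph, seeds, iterations=20):
    """Spread seed labels over the graph; seed nodes stay clamped."""
    classes = sorted(set(seeds.values()))
    scores = {node: {c: 0.0 for c in classes} for node in graph}
    for node, label in seeds.items():
        if node in scores:
            scores[node][label] = 1.0

    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            if node in seeds:
                updated[node] = scores[node]  # keep known labels fixed
                continue
            total = sum(neighbours.values())
            updated[node] = {
                c: sum(w * scores[m][c] for m, w in neighbours.items()) / total
                for c in classes
            }
        scores = updated

    # Pick the highest-scoring class; None if no label ever reached the node.
    return {
        node: max(s, key=s.get) if any(s.values()) else None
        for node, s in scores.items()
    }


# Hypothetical browsing sessions (ordered domain visits) and two labeled seeds.
sessions = [
    ["bbc.com", "cnn.com", "reuters.com"],
    ["espn.com", "nba.com"],
    ["cnn.com", "reuters.com"],
    ["nba.com", "espn.com", "fifa.com"],
]
seeds = {"bbc.com": "News and Media", "espn.com": "Sports"}

labels = propagate(build_graph(sessions), seeds)
for domain in sorted(labels):
    print(domain, "->", labels[domain])
```

Only two domains are labeled here, yet every domain connected to a seed receives a class, which is the property the abstract attributes to the graph-based approach. A realistic setup would also need to handle the skewed class distribution the abstract mentions, which this sketch ignores.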
Detailed description
Author: | Faroughi, Azadeh [author] |
Format: | E-article |
Language: | English |
Published: | 2021 |
Keywords: | Domain names; Network measurements; Semi-supervised learning; Classification; Passive measurements |
Parent work: | Contained in: Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls - Poo, J.L. ELSEVIER, 2016, the international journal of computer and telecommunications networking, Amsterdam [u.a.] |
Parent work: | volume:188 ; year:2021 ; day:7 ; month:04 ; pages:0 |
DOI / URN: | 10.1016/j.comnet.2021.107865 |
Catalog ID: | ELV053338790 |
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | ELV053338790 | ||
003 | DE-627 | ||
005 | 20230626034651.0 | ||
007 | cr uuu---uuuuu | ||
008 | 210910s2021 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1016/j.comnet.2021.107865 |2 doi | |
028 | 5 | 2 | |a /cbs_pica/cbs_olc/import_discovery/elsevier/einzuspielen/GBV00000000001321.pica |
035 | |a (DE-627)ELV053338790 | ||
035 | |a (ELSEVIER)S1389-1286(21)00038-4 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 610 |q VZ |
084 | |a 44.44 |2 bkl | ||
100 | 1 | |a Faroughi, Azadeh |e verfasserin |4 aut | |
245 | 1 | 0 | |a Towards website domain name classification using graph based semi-supervised learning |
264 | 1 | |c 2021transfer abstract | |
336 | |a nicht spezifiziert |b zzz |2 rdacontent | ||
337 | |a nicht spezifiziert |b z |2 rdamedia | ||
338 | |a nicht spezifiziert |b zu |2 rdacarrier | ||
520 | |a In this work, we tackle the problem of classifying website domain names into categories, e.g., mapping bbc.com to the “News and Media” class. Domain name classification is challenging due to the high number of class labels and the highly skewed class distributions. Unlike prior efforts that need to crawl and use the web pages’ actual content, we rely only on passively collected traffic logs, observing the traffic regularly flowing in the network, without the burden of crawling and parsing web pages. We exploit the information carried by network logs, using just the names of the websites and the sequence of websites visited by users. To this end, we propose and evaluate different classification methods based on machine learning. Using a large dataset with hundreds of thousands of domain names and 25 different categories, we show that semi-supervised learning methods are more suitable for this task than traditional supervised approaches. Using graphs, we incorporate into the classifier aspects not strictly related to the labeled data, and we can classify most of the unlabeled domains. However, in this framework, classification scores are lower than those usually obtained when exploiting page-specific content. To the best of our knowledge, our work is the first to perform an extensive evaluation of domain name classification using only passive flow-level logs. | ||
650 | 7 | |a Domain names |2 Elsevier | |
650 | 7 | |a Network measurements |2 Elsevier | |
650 | 7 | |a Semi-supervised learning |2 Elsevier | |
650 | 7 | |a Classification |2 Elsevier | |
650 | 7 | |a Passive measurements |2 Elsevier | |
700 | 1 | |a Morichetta, Andrea |4 oth | |
700 | 1 | |a Vassio, Luca |4 oth | |
700 | 1 | |a Figueiredo, Flavio |4 oth | |
700 | 1 | |a Mellia, Marco |4 oth | |
700 | 1 | |a Javidan, Reza |4 oth | |
773 | 0 | 8 | |i Enthalten in |n Elsevier |a Poo, J.L. ELSEVIER |t Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls |d 2016 |d the international journal of computer and telecommunications networking |g Amsterdam [u.a.] |w (DE-627)ELV013796984 |
773 | 1 | 8 | |g volume:188 |g year:2021 |g day:7 |g month:04 |g pages:0 |
856 | 4 | 0 | |u https://doi.org/10.1016/j.comnet.2021.107865 |3 Volltext |
912 | |a GBV_USEFLAG_U | ||
912 | |a GBV_ELV | ||
912 | |a SYSFLAG_U | ||
912 | |a SSG-OLC-PHA | ||
912 | |a GBV_ILN_40 | ||
936 | b | k | |a 44.44 |j Parasitologie |x Medizin |q VZ |
951 | |a AR | ||
952 | |d 188 |j 2021 |b 7 |c 0407 |h 0 |
author_variant |
a f af |
---|---|
matchkey_str |
faroughiazadehmorichettaandreavassioluca:2021----:oadwbieoanaelsiiainsngahaes |
hierarchy_sort_str |
2021transfer abstract |
bklnumber |
44.44 |
publishDate |
2021 |
allfields |
10.1016/j.comnet.2021.107865 doi /cbs_pica/cbs_olc/import_discovery/elsevier/einzuspielen/GBV00000000001321.pica (DE-627)ELV053338790 (ELSEVIER)S1389-1286(21)00038-4 DE-627 ger DE-627 rakwb eng 610 VZ 610 VZ 44.44 bkl Faroughi, Azadeh verfasserin aut Towards website domain name classification using graph based semi-supervised learning 2021transfer abstract nicht spezifiziert zzz rdacontent nicht spezifiziert z rdamedia nicht spezifiziert zu rdacarrier In this work, we tackle the problem of classifying websites domain names to a category, e.g., mapping bbc.com to the ”News and Media” class. Domain name classification is challenging due to the high number of class labels and the highly skewed class distributions. Differently from prior efforts that need to crawl and use the web pages’ actual content, we rely only on traffic logs passively collected, observing traffic regularly flowing in the network, without the burden to crawl and parse web pages. We exploit the information carried by network logs, using just the name of the websites and the sequence of visited websites by users. For this, we propose and evaluate different classification methods based on machine learning. Using a large dataset with hundreds of thousands of domain names and 25 different categories, we show that semi-supervised learning methods are more suitable for this task than traditional supervised approaches. Using graphs, we incorporate in the classifier aspects not strictly related to the labeled data, and we can classify most of the unlabeled domains. However, in this framework, classification scores are lower than those usually found when exploiting the page-specific content. Our work is the first to perform an extensive evaluation of domain name classification using only passive flow-level logs to the best of our knowledge. In this work, we tackle the problem of classifying websites domain names to a category, e.g., mapping bbc.com to the ”News and Media” class. 
Domain name classification is challenging due to the high number of class labels and the highly skewed class distributions. Differently from prior efforts that need to crawl and use the web pages’ actual content, we rely only on traffic logs passively collected, observing traffic regularly flowing in the network, without the burden to crawl and parse web pages. We exploit the information carried by network logs, using just the name of the websites and the sequence of visited websites by users. For this, we propose and evaluate different classification methods based on machine learning. Using a large dataset with hundreds of thousands of domain names and 25 different categories, we show that semi-supervised learning methods are more suitable for this task than traditional supervised approaches. Using graphs, we incorporate in the classifier aspects not strictly related to the labeled data, and we can classify most of the unlabeled domains. However, in this framework, classification scores are lower than those usually found when exploiting the page-specific content. Our work is the first to perform an extensive evaluation of domain name classification using only passive flow-level logs to the best of our knowledge. Domain names Elsevier Network measurements Elsevier Semi-supervised learning Elsevier Classification Elsevier Passive measurements Elsevier Morichetta, Andrea oth Vassio, Luca oth Figueiredo, Flavio oth Mellia, Marco oth Javidan, Reza oth Enthalten in Elsevier Poo, J.L. ELSEVIER Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls 2016 the international journal of computer and telecommunications networking Amsterdam [u.a.] (DE-627)ELV013796984 volume:188 year:2021 day:7 month:04 pages:0 https://doi.org/10.1016/j.comnet.2021.107865 Volltext GBV_USEFLAG_U GBV_ELV SYSFLAG_U SSG-OLC-PHA GBV_ILN_40 44.44 Parasitologie Medizin VZ AR 188 2021 7 0407 0 |
language |
English |
source |
Enthalten in Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls Amsterdam [u.a.] volume:188 year:2021 day:7 month:04 pages:0 |
sourceStr |
Enthalten in Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls Amsterdam [u.a.] volume:188 year:2021 day:7 month:04 pages:0 |
format_phy_str_mv |
Article |
bklname |
Parasitologie |
institution |
findex.gbv.de |
topic_facet |
Domain names Network measurements Semi-supervised learning Classification Passive measurements |
dewey-raw |
610 |
isfreeaccess_bool |
false |
container_title |
Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls |
authorswithroles_txt_mv |
Faroughi, Azadeh @@aut@@ Morichetta, Andrea @@oth@@ Vassio, Luca @@oth@@ Figueiredo, Flavio @@oth@@ Mellia, Marco @@oth@@ Javidan, Reza @@oth@@ |
publishDateDaySort_date |
2021-01-07T00:00:00Z |
hierarchy_top_id |
ELV013796984 |
dewey-sort |
3610 |
id |
ELV053338790 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">ELV053338790</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230626034651.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">210910s2021 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1016/j.comnet.2021.107865</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">/cbs_pica/cbs_olc/import_discovery/elsevier/einzuspielen/GBV00000000001321.pica</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)ELV053338790</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ELSEVIER)S1389-1286(21)00038-4</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">610</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">610</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">44.44</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Faroughi, Azadeh</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Towards website domain name classification using graph based semi-supervised learning</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2021transfer 
abstract</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">zzz</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">z</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">zu</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">In this work, we tackle the problem of classifying websites domain names to a category, e.g., mapping bbc.com to the ”News and Media” class. Domain name classification is challenging due to the high number of class labels and the highly skewed class distributions. Differently from prior efforts that need to crawl and use the web pages’ actual content, we rely only on traffic logs passively collected, observing traffic regularly flowing in the network, without the burden to crawl and parse web pages. We exploit the information carried by network logs, using just the name of the websites and the sequence of visited websites by users. For this, we propose and evaluate different classification methods based on machine learning. Using a large dataset with hundreds of thousands of domain names and 25 different categories, we show that semi-supervised learning methods are more suitable for this task than traditional supervised approaches. Using graphs, we incorporate in the classifier aspects not strictly related to the labeled data, and we can classify most of the unlabeled domains. However, in this framework, classification scores are lower than those usually found when exploiting the page-specific content. 
Our work is the first to perform an extensive evaluation of domain name classification using only passive flow-level logs to the best of our knowledge.</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Domain names</subfield><subfield code="2">Elsevier</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Network measurements</subfield><subfield code="2">Elsevier</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Semi-supervised learning</subfield><subfield code="2">Elsevier</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Classification</subfield><subfield code="2">Elsevier</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Passive measurements</subfield><subfield code="2">Elsevier</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Morichetta, Andrea</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Vassio, Luca</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Figueiredo, Flavio</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Mellia, Marco</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Javidan, Reza</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="n">Elsevier</subfield><subfield code="a">Poo, J.L. 
ELSEVIER</subfield><subfield code="t">Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls</subfield><subfield code="d">2016</subfield><subfield code="d">the international journal of computer and telecommunications networking</subfield><subfield code="g">Amsterdam [u.a.]</subfield><subfield code="w">(DE-627)ELV013796984</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:188</subfield><subfield code="g">year:2021</subfield><subfield code="g">day:7</subfield><subfield code="g">month:04</subfield><subfield code="g">pages:0</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.1016/j.comnet.2021.107865</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ELV</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">44.44</subfield><subfield code="j">Parasitologie</subfield><subfield code="x">Medizin</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">188</subfield><subfield code="j">2021</subfield><subfield code="b">7</subfield><subfield code="c">0407</subfield><subfield code="h">0</subfield></datafield></record></collection>
|
author |
Faroughi, Azadeh |
spellingShingle |
Faroughi, Azadeh ddc 610 bkl 44.44 Elsevier Domain names Elsevier Network measurements Elsevier Semi-supervised learning Elsevier Classification Elsevier Passive measurements Towards website domain name classification using graph based semi-supervised learning |
authorStr |
Faroughi, Azadeh |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)ELV013796984 |
format |
electronic Article |
dewey-ones |
610 - Medicine & health |
delete_txt_mv |
keep |
author_role |
aut |
collection |
elsevier |
remote_str |
true |
illustrated |
Not Illustrated |
topic_title |
610 VZ 44.44 bkl Towards website domain name classification using graph based semi-supervised learning Domain names Elsevier Network measurements Elsevier Semi-supervised learning Elsevier Classification Elsevier Passive measurements Elsevier |
topic |
ddc 610 bkl 44.44 Elsevier Domain names Elsevier Network measurements Elsevier Semi-supervised learning Elsevier Classification Elsevier Passive measurements |
topic_unstemmed |
ddc 610 bkl 44.44 Elsevier Domain names Elsevier Network measurements Elsevier Semi-supervised learning Elsevier Classification Elsevier Passive measurements |
topic_browse |
ddc 610 bkl 44.44 Elsevier Domain names Elsevier Network measurements Elsevier Semi-supervised learning Elsevier Classification Elsevier Passive measurements |
format_facet |
Electronic articles Articles Electronic resource |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
zu |
author2_variant |
a m am l v lv f f ff m m mm r j rj |
hierarchy_parent_title |
Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls |
hierarchy_parent_id |
ELV013796984 |
dewey-tens |
610 - Medicine & health |
hierarchy_top_title |
Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)ELV013796984 |
title |
Towards website domain name classification using graph based semi-supervised learning |
ctrlnum |
(DE-627)ELV053338790 (ELSEVIER)S1389-1286(21)00038-4 |
title_full |
Towards website domain name classification using graph based semi-supervised learning |
author_sort |
Faroughi, Azadeh |
journal |
Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls |
journalStr |
Pharmacokinetics of the Antifibrotic Drug Pirfenidone in Child Pugh A and B Cirrhotic Patients Compared to Healthy Age-Matched Controls |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
600 - Technology |
recordtype |
marc |
publishDateSort |
2021 |
contenttype_str_mv |
zzz |
container_start_page |
0 |
author_browse |
Faroughi, Azadeh |
container_volume |
188 |
class |
610 VZ 44.44 bkl |
format_se |
Electronic articles |
author-letter |
Faroughi, Azadeh |
doi_str_mv |
10.1016/j.comnet.2021.107865 |
dewey-full |
610 |
title_sort |
towards website domain name classification using graph based semi-supervised learning |
title_auth |
Towards website domain name classification using graph based semi-supervised learning |
abstract |
In this work, we tackle the problem of classifying website domain names into categories, e.g., mapping bbc.com to the “News and Media” class. Domain name classification is challenging due to the high number of class labels and the highly skewed class distributions. Unlike prior efforts that need to crawl and use the web pages’ actual content, we rely only on passively collected traffic logs, observing traffic regularly flowing in the network, without the burden of crawling and parsing web pages. We exploit the information carried by network logs, using just the names of the websites and the sequence of websites visited by users. For this, we propose and evaluate different classification methods based on machine learning. Using a large dataset with hundreds of thousands of domain names and 25 different categories, we show that semi-supervised learning methods are more suitable for this task than traditional supervised approaches. Using graphs, we incorporate into the classifier aspects not strictly related to the labeled data, and we can classify most of the unlabeled domains. However, in this framework, classification scores are lower than those usually found when exploiting page-specific content. To the best of our knowledge, our work is the first to perform an extensive evaluation of domain name classification using only passive flow-level logs. |
collection_details |
GBV_USEFLAG_U GBV_ELV SYSFLAG_U SSG-OLC-PHA GBV_ILN_40 |
title_short |
Towards website domain name classification using graph based semi-supervised learning |
url |
https://doi.org/10.1016/j.comnet.2021.107865 |
remote_bool |
true |
author2 |
Morichetta, Andrea Vassio, Luca Figueiredo, Flavio Mellia, Marco Javidan, Reza |
author2Str |
Morichetta, Andrea Vassio, Luca Figueiredo, Flavio Mellia, Marco Javidan, Reza |
ppnlink |
ELV013796984 |
mediatype_str_mv |
z |
isOA_txt |
false |
hochschulschrift_bool |
false |
author2_role |
oth oth oth oth oth |
doi_str |
10.1016/j.comnet.2021.107865 |
up_date |
2024-07-06T18:40:39.955Z |
_version_ |
1803856107228102656 |
score |
7.402647 |