Offline evaluation options for recommender systems
Abstract: We undertake a detailed examination of the steps that make up offline experiments for recommender system evaluation, including the manner in which the available ratings are filtered and split into training and test; the selection of a subset of the available users for the evaluation; the choice of strategy to handle the background effects that arise when the system is unable to provide scores for some items or users; the use of either full or condensed output lists for the purposes of scoring; scoring methods themselves, including alternative top-weighted mechanisms for condensed rankings; and the application of statistical testing on a weighted-by-user or weighted-by-volume basis as a mechanism for providing confidence in measured outcomes. We carry out experiments that illustrate the impact that each of these choice points can have on the usefulness of an end-to-end system evaluation, and provide examples of possible pitfalls. In particular, we show that varying the split between training and test data, or changing the evaluation metric, or how target items are selected, or how empty recommendations are dealt with, can give rise to comparisons that are vulnerable to misinterpretation, and may lead to different or even opposite outcomes, depending on the exact combination of settings used.
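The abstract's closing observation, that details such as how empty recommendations are handled can flip the outcome of a system comparison, can be made concrete with a small, purely illustrative sketch. The Python below uses invented toy data and function names (none of it is taken from the paper): it averages precision@5 over users in two ways, either skipping users for whom a system produced no ranking or scoring them as zero, and shows the two conventions ordering two hypothetical systems differently.

```python
def precision_at_k(ranking, relevant, k=5):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranking[:k] if item in relevant) / k

def mean_precision(runs, relevant_by_user, k=5, skip_empty=True):
    """Average precision@k over users, either skipping users with an empty
    recommendation list or counting them as a zero score."""
    scores = []
    for user, ranking in runs.items():
        if not ranking:
            if skip_empty:
                continue           # leave this user out of the average
            scores.append(0.0)     # or count the failure as zero
        else:
            scores.append(precision_at_k(ranking, relevant_by_user[user], k))
    return sum(scores) / len(scores)

# Toy data: system A returns a ranking for every user; system B is more
# accurate for the users it can serve, but produces nothing for u3.
relevant = {"u1": {"a", "b", "c"}, "u2": {"d", "e", "f"}, "u3": {"g", "h"}}
system_a = {"u1": ["a", "b", "c", "x", "y"],
            "u2": ["d", "e", "x", "y", "z"],
            "u3": ["g", "h", "x", "y", "z"]}
system_b = {"u1": ["c", "a", "b", "x", "y"],
            "u2": ["d", "e", "f", "x", "y"],
            "u3": []}              # no recommendation produced

for skip in (True, False):
    pa = mean_precision(system_a, relevant, skip_empty=skip)
    pb = mean_precision(system_b, relevant, skip_empty=skip)
    print(f"skip_empty={skip}: A={pa:.3f}  B={pb:.3f}")
# skip_empty=True  -> A=0.467, B=0.600 (B ahead)
# skip_empty=False -> A=0.467, B=0.400 (A ahead)
```

The same averages, under two equally defensible conventions, rank the two hypothetical systems in opposite order, which is the kind of pitfall the paper examines.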
Detailed description

Author: Cañamares, Rocío (author)
Format: Article
Language: English
Published: 2020
Subjects: Recommender systems; Evaluation; Effectiveness metric; Experimental design
Note: © Springer Nature B.V. 2020
Contained in: Information retrieval journal - Springer Netherlands, 1999, 23(2020), 4, dated 18 March, pages 387-410
Contained in: volume:23 ; year:2020 ; number:4 ; day:18 ; month:03 ; pages:387-410
Links: https://doi.org/10.1007/s10791-020-09371-3 (full text, license required)
DOI / URN: 10.1007/s10791-020-09371-3
Catalog ID: OLC2034068572
LEADER 01000caa a22002652 4500
001 OLC2034068572
003 DE-627
005 20230504153657.0
007 tu
008 200819s2020 xx ||||| 00| ||eng c
024 7_ |a 10.1007/s10791-020-09371-3 |2 doi
035 __ |a (DE-627)OLC2034068572
035 __ |a (DE-He213)s10791-020-09371-3-p
040 __ |a DE-627 |b ger |c DE-627 |e rakwb
041 __ |a eng
082 04 |a 020 |a 070 |a 004 |q VZ
084 __ |a 24,1 |2 ssgn
084 __ |a 06.74$jInformationssysteme |2 bkl
100 1_ |a Cañamares, Rocío |e verfasserin |0 (orcid)0000-0002-2278-0445 |4 aut
245 10 |a Offline evaluation options for recommender systems
264 _1 |c 2020
336 __ |a Text |b txt |2 rdacontent
337 __ |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 __ |a Band |b nc |2 rdacarrier
500 __ |a © Springer Nature B.V. 2020
520 __ |a Abstract We undertake a detailed examination of the steps that make up offline experiments for recommender system evaluation, including the manner in which the available ratings are filtered and split into training and test; the selection of a subset of the available users for the evaluation; the choice of strategy to handle the background effects that arise when the system is unable to provide scores for some items or users; the use of either full or condensed output lists for the purposes of scoring; scoring methods themselves, including alternative top-weighted mechanisms for condensed rankings; and the application of statistical testing on a weighted-by-user or weighted-by-volume basis as a mechanism for providing confidence in measured outcomes. We carry out experiments that illustrate the impact that each of these choice points can have on the usefulness of an end-to-end system evaluation, and provide examples of possible pitfalls. In particular, we show that varying the split between training and test data, or changing the evaluation metric, or how target items are selected, or how empty recommendations are dealt with, can give rise to comparisons that are vulnerable to misinterpretation, and may lead to different or even opposite outcomes, depending on the exact combination of settings used.
650 _4 |a Recommender systems
650 _4 |a Evaluation
650 _4 |a Effectiveness metric
650 _4 |a Experimental design
700 1_ |a Castells, Pablo |0 (orcid)0000-0003-0668-6317 |4 aut
700 1_ |a Moffat, Alistair |0 (orcid)0000-0002-6638-0232 |4 aut
773 08 |i Enthalten in |t Information retrieval journal |d Springer Netherlands, 1999 |g 23(2020), 4 vom: 18. März, Seite 387-410 |w (DE-627)245716939 |w (DE-600)1432556-1 |w (DE-576)066689066 |x 1386-4564 |7 nnns
773 18 |g volume:23 |g year:2020 |g number:4 |g day:18 |g month:03 |g pages:387-410
856 41 |u https://doi.org/10.1007/s10791-020-09371-3 |z lizenzpflichtig |3 Volltext
912 __ |a GBV_USEFLAG_A
912 __ |a SYSFLAG_A
912 __ |a GBV_OLC
912 __ |a SSG-OLC-BUB
912 __ |a SSG-OPC-BBI
936 bk |a 06.74$jInformationssysteme |q VZ |0 106415212 |0 (DE-625)106415212
951 __ |a AR
952 __ |d 23 |j 2020 |e 4 |b 18 |c 03 |h 387-410