Offline evaluation options for recommender systems
Abstract: We undertake a detailed examination of the steps that make up offline experiments for recommender system evaluation, including the manner in which the available ratings are filtered and split into training and test; the selection of a subset of the available users for the evaluation; the choice of strategy to handle the background effects that arise when the system is unable to provide scores for some items or users; the use of either full or condensed output lists for the purposes of scoring; scoring methods themselves, including alternative top-weighted mechanisms for condensed rankings; and the application of statistical testing on a weighted-by-user or weighted-by-volume basis as a mechanism for providing confidence in measured outcomes. We carry out experiments that illustrate the impact that each of these choice points can have on the usefulness of an end-to-end system evaluation, and provide examples of possible pitfalls. In particular, we show that varying the split between training and test data, or changing the evaluation metric, or how target items are selected, or how empty recommendations are dealt with, can give rise to comparisons that are vulnerable to misinterpretation, and may lead to different or even opposite outcomes, depending on the exact combination of settings used.
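The abstract's closing observation, that details such as how empty recommendations are handled can flip the outcome of a system comparison, can be made concrete with a small, purely illustrative sketch. The Python below uses invented toy data and function names (none of it is taken from the paper): it averages precision@5 over users in two ways, either skipping users for whom a system produced no ranking or scoring them as zero, and shows the two conventions ordering two hypothetical systems differently.

```python
def precision_at_k(ranking, relevant, k=5):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranking[:k] if item in relevant) / k

def mean_precision(runs, relevant_by_user, k=5, skip_empty=True):
    """Average precision@k over users, either skipping users with an empty
    recommendation list or counting them as a zero score."""
    scores = []
    for user, ranking in runs.items():
        if not ranking:
            if skip_empty:
                continue           # leave this user out of the average
            scores.append(0.0)     # or count the failure as zero
        else:
            scores.append(precision_at_k(ranking, relevant_by_user[user], k))
    return sum(scores) / len(scores)

# Toy data: system A returns a ranking for every user; system B is more
# accurate for the users it can serve, but produces nothing for u3.
relevant = {"u1": {"a", "b", "c"}, "u2": {"d", "e", "f"}, "u3": {"g", "h"}}
system_a = {"u1": ["a", "b", "c", "x", "y"],
            "u2": ["d", "e", "x", "y", "z"],
            "u3": ["g", "h", "x", "y", "z"]}
system_b = {"u1": ["c", "a", "b", "x", "y"],
            "u2": ["d", "e", "f", "x", "y"],
            "u3": []}              # no recommendation produced

for skip in (True, False):
    pa = mean_precision(system_a, relevant, skip_empty=skip)
    pb = mean_precision(system_b, relevant, skip_empty=skip)
    print(f"skip_empty={skip}: A={pa:.3f}  B={pb:.3f}")
# skip_empty=True  -> A=0.467, B=0.600 (B ahead)
# skip_empty=False -> A=0.467, B=0.400 (A ahead)
```

The same averages, under two equally defensible conventions, rank the two hypothetical systems in opposite order, which is the kind of pitfall the paper examines.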
Detailed description

Author: Cañamares, Rocío (author)
Format: Article
Language: English
Published: 2020
Subjects: Recommender systems; Evaluation; Effectiveness metric; Experimental design
Note: © Springer Nature B.V. 2020
Contained in: Information retrieval journal - Springer Netherlands, 1999, 23(2020), 4, dated 18 March, pages 387-410
Contained in: volume:23 ; year:2020 ; number:4 ; day:18 ; month:03 ; pages:387-410
Links: https://doi.org/10.1007/s10791-020-09371-3 (full text, license required)
DOI / URN: 10.1007/s10791-020-09371-3
Catalog ID: OLC2034068572
LEADER 01000caa a22002652 4500
001 OLC2034068572
003 DE-627
005 20230504153657.0
007 tu
008 200819s2020 xx ||||| 00| ||eng c
024 7_ |a 10.1007/s10791-020-09371-3 |2 doi
035 __ |a (DE-627)OLC2034068572
035 __ |a (DE-He213)s10791-020-09371-3-p
040 __ |a DE-627 |b ger |c DE-627 |e rakwb
041 __ |a eng
082 04 |a 020 |a 070 |a 004 |q VZ
084 __ |a 24,1 |2 ssgn
084 __ |a 06.74$jInformationssysteme |2 bkl
100 1_ |a Cañamares, Rocío |e verfasserin |0 (orcid)0000-0002-2278-0445 |4 aut
245 10 |a Offline evaluation options for recommender systems
264 _1 |c 2020
336 __ |a Text |b txt |2 rdacontent
337 __ |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 __ |a Band |b nc |2 rdacarrier
500 __ |a © Springer Nature B.V. 2020
520 __ |a Abstract We undertake a detailed examination of the steps that make up offline experiments for recommender system evaluation, including the manner in which the available ratings are filtered and split into training and test; the selection of a subset of the available users for the evaluation; the choice of strategy to handle the background effects that arise when the system is unable to provide scores for some items or users; the use of either full or condensed output lists for the purposes of scoring; scoring methods themselves, including alternative top-weighted mechanisms for condensed rankings; and the application of statistical testing on a weighted-by-user or weighted-by-volume basis as a mechanism for providing confidence in measured outcomes. We carry out experiments that illustrate the impact that each of these choice points can have on the usefulness of an end-to-end system evaluation, and provide examples of possible pitfalls. In particular, we show that varying the split between training and test data, or changing the evaluation metric, or how target items are selected, or how empty recommendations are dealt with, can give rise to comparisons that are vulnerable to misinterpretation, and may lead to different or even opposite outcomes, depending on the exact combination of settings used.
650 _4 |a Recommender systems
650 _4 |a Evaluation
650 _4 |a Effectiveness metric
650 _4 |a Experimental design
700 1_ |a Castells, Pablo |0 (orcid)0000-0003-0668-6317 |4 aut
700 1_ |a Moffat, Alistair |0 (orcid)0000-0002-6638-0232 |4 aut
773 08 |i Enthalten in |t Information retrieval journal |d Springer Netherlands, 1999 |g 23(2020), 4 vom: 18. März, Seite 387-410 |w (DE-627)245716939 |w (DE-600)1432556-1 |w (DE-576)066689066 |x 1386-4564 |7 nnns
773 18 |g volume:23 |g year:2020 |g number:4 |g day:18 |g month:03 |g pages:387-410
856 41 |u https://doi.org/10.1007/s10791-020-09371-3 |z lizenzpflichtig |3 Volltext
912 __ |a GBV_USEFLAG_A
912 __ |a SYSFLAG_A
912 __ |a GBV_OLC
912 __ |a SSG-OLC-BUB
912 __ |a SSG-OPC-BBI
936 bk |a 06.74$jInformationssysteme |q VZ |0 106415212 |0 (DE-625)106415212
951 __ |a AR
952 __ |d 23 |j 2020 |e 4 |b 18 |c 03 |h 387-410