Moved but not gone: an evaluation of real-time methods for discovering replacement web pages
Abstract Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. I...
Detailed description
Author: |
Klein, Martin [author] |
---|
Format: |
Article |
---|---|
Language: |
English |
Published: |
2014 |
---|
Keywords: |
---|
Note: |
© The Author(s) 2014 |
---|
Parent work: |
Contained in: International journal on digital libraries - Springer Berlin Heidelberg, 1997, 14(2014), 1-2, 1 Feb., pages 17-38 |
---|---|
Parent work: |
volume:14 ; year:2014 ; number:1-2 ; day:01 ; month:02 ; pages:17-38 |
Links: |
---|
DOI / URN: |
10.1007/s00799-014-0108-0 |
---|
Catalog ID: |
OLC2051434379 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | OLC2051434379 | ||
003 | DE-627 | ||
005 | 20230502153658.0 | ||
007 | tu | ||
008 | 200819s2014 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1007/s00799-014-0108-0 |2 doi | |
035 | |a (DE-627)OLC2051434379 | ||
035 | |a (DE-He213)s00799-014-0108-0-p | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 020 |a 004 |q VZ |
084 | |a 24,1 |2 ssgn | ||
100 | 1 | |a Klein, Martin |e verfasserin |4 aut | |
245 | 1 | 0 | |a Moved but not gone: an evaluation of real-time methods for discovering replacement web pages |
264 | 1 | |c 2014 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia | ||
338 | |a Band |b nc |2 rdacarrier | ||
500 | |a © The Author(s) 2014 | ||
520 | |a Abstract Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, we are able to recommend not only the best performing methods but also the sequence in which they should be applied, based on their performance, complexity required to generate them, and evolution over time. Our least complex single method results in a rediscovery rate of almost $$70\,\%$$ of Web pages of our sample dataset based on URIs sampled from the Open Directory Project (DMOZ). By increasing the complexity level and combining three different methods, our results show an increase of the success rate of up to $$77\,\%$$. The results, based on our sample dataset, indicate that Web pages are often not completely lost but have moved to a different location and “just” need to be rediscovered. | ||
650 | 4 | |a Missing Web Pages | |
650 | 4 | |a Web Page Discovery | |
650 | 4 | |a 404 Error | |
650 | 4 | |a Web Preservation | |
650 | 4 | |a Web Archives | |
650 | 4 | |a Memento | |
700 | 1 | |a Nelson, Michael L. |4 aut | |
773 | 0 | 8 | |i Enthalten in |t International journal on digital libraries |d Springer Berlin Heidelberg, 1997 |g 14(2014), 1-2 vom: 01. Feb., Seite 17-38 |w (DE-627)223267902 |w (DE-600)1357321-4 |w (DE-576)059412127 |x 1432-5012 |7 nnns |
773 | 1 | 8 | |g volume:14 |g year:2014 |g number:1-2 |g day:01 |g month:02 |g pages:17-38 |
856 | 4 | 1 | |u https://doi.org/10.1007/s00799-014-0108-0 |z lizenzpflichtig |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a SSG-OLC-MAT | ||
912 | |a SSG-OLC-BUB | ||
912 | |a SSG-OPC-BBI | ||
912 | |a GBV_ILN_11 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_2018 | ||
912 | |a GBV_ILN_4012 | ||
912 | |a GBV_ILN_4277 | ||
951 | |a AR | ||
952 | |d 14 |j 2014 |e 1-2 |b 01 |c 02 |h 17-38 |
author_variant |
m k mk m l n ml mln |
---|---|
matchkey_str |
article:14325012:2014----::oebtognaeautooratmmtosodsoei |
hierarchy_sort_str |
2014 |
publishDate |
2014 |
allfields |
10.1007/s00799-014-0108-0 doi (DE-627)OLC2051434379 (DE-He213)s00799-014-0108-0-p DE-627 ger DE-627 rakwb eng 020 004 VZ 24,1 ssgn Klein, Martin verfasserin aut Moved but not gone: an evaluation of real-time methods for discovering replacement web pages 2014 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © The Author(s) 2014 Abstract Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, we are able to recommend not only the best performing methods but also the sequence in which they should be applied, based on their performance, complexity required to generate them, and evolution over time. Our least complex single method results in a rediscovery rate of almost $$70\,\%$$ of Web pages of our sample dataset based on URIs sampled from the Open Directory Project (DMOZ). By increasing the complexity level and combining three different methods, our results show an increase of the success rate of up to $$77\,\%$$. The results, based on our sample dataset, indicate that Web pages are often not completely lost but have moved to a different location and “just” need to be rediscovered. Missing Web Pages Web Page Discovery 404 Error Web Preservation Web Archives Memento Nelson, Michael L. aut Enthalten in International journal on digital libraries Springer Berlin Heidelberg, 1997 14(2014), 1-2 vom: 01. 
Feb., Seite 17-38 (DE-627)223267902 (DE-600)1357321-4 (DE-576)059412127 1432-5012 nnns volume:14 year:2014 number:1-2 day:01 month:02 pages:17-38 https://doi.org/10.1007/s00799-014-0108-0 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OPC-BBI GBV_ILN_11 GBV_ILN_70 GBV_ILN_2018 GBV_ILN_4012 GBV_ILN_4277 AR 14 2014 1-2 01 02 17-38 |
language |
English |
source |
Enthalten in International journal on digital libraries 14(2014), 1-2 vom: 01. Feb., Seite 17-38 volume:14 year:2014 number:1-2 day:01 month:02 pages:17-38 |
sourceStr |
Enthalten in International journal on digital libraries 14(2014), 1-2 vom: 01. Feb., Seite 17-38 volume:14 year:2014 number:1-2 day:01 month:02 pages:17-38 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Missing Web Pages Web Page Discovery 404 Error Web Preservation Web Archives Memento |
dewey-raw |
020 |
isfreeaccess_bool |
false |
container_title |
International journal on digital libraries |
authorswithroles_txt_mv |
Klein, Martin @@aut@@ Nelson, Michael L. @@aut@@ |
publishDateDaySort_date |
2014-02-01T00:00:00Z |
hierarchy_top_id |
223267902 |
dewey-sort |
220 |
id |
OLC2051434379 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2051434379</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230502153658.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2014 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s00799-014-0108-0</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2051434379</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s00799-014-0108-0-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">020</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">24,1</subfield><subfield code="2">ssgn</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Klein, Martin</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Moved but not gone: an evaluation of real-time methods for discovering replacement web pages</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2014</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield 
code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2014</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, we are able to recommend not only the best performing methods but also the sequence in which they should be applied, based on their performance, complexity required to generate them, and evolution over time. Our least complex single method results in a rediscovery rate of almost $$70\,\%$$ of Web pages of our sample dataset based on URIs sampled from the Open Directory Project (DMOZ). By increasing the complexity level and combining three different methods, our results show an increase of the success rate of up to $$77\,\%$$. 
The results, based on our sample dataset, indicate that Web pages are often not completely lost but have moved to a different location and “just” need to be rediscovered.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Missing Web Pages</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Web Page Discovery</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">404 Error</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Web Preservation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Web Archives</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memento</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Nelson, Michael L.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">International journal on digital libraries</subfield><subfield code="d">Springer Berlin Heidelberg, 1997</subfield><subfield code="g">14(2014), 1-2 vom: 01. 
Feb., Seite 17-38</subfield><subfield code="w">(DE-627)223267902</subfield><subfield code="w">(DE-600)1357321-4</subfield><subfield code="w">(DE-576)059412127</subfield><subfield code="x">1432-5012</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:14</subfield><subfield code="g">year:2014</subfield><subfield code="g">number:1-2</subfield><subfield code="g">day:01</subfield><subfield code="g">month:02</subfield><subfield code="g">pages:17-38</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s00799-014-0108-0</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_11</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2018</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4277</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">14</subfield><subfield code="j">2014</subfield><subfield code="e">1-2</subfield><subfield 
code="b">01</subfield><subfield code="c">02</subfield><subfield code="h">17-38</subfield></datafield></record></collection>
|
author |
Klein, Martin |
spellingShingle |
Klein, Martin ddc 020 ssgn 24,1 misc Missing Web Pages misc Web Page Discovery misc 404 Error misc Web Preservation misc Web Archives misc Memento Moved but not gone: an evaluation of real-time methods for discovering replacement web pages |
authorStr |
Klein, Martin |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)223267902 |
format |
Article |
dewey-ones |
020 - Library & information sciences 004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
1432-5012 |
topic_title |
020 004 VZ 24,1 ssgn Moved but not gone: an evaluation of real-time methods for discovering replacement web pages Missing Web Pages Web Page Discovery 404 Error Web Preservation Web Archives Memento |
topic |
ddc 020 ssgn 24,1 misc Missing Web Pages misc Web Page Discovery misc 404 Error misc Web Preservation misc Web Archives misc Memento |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
hierarchy_parent_title |
International journal on digital libraries |
hierarchy_parent_id |
223267902 |
dewey-tens |
020 - Library & information sciences 000 - Computer science, knowledge & systems |
hierarchy_top_title |
International journal on digital libraries |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)223267902 (DE-600)1357321-4 (DE-576)059412127 |
title |
Moved but not gone: an evaluation of real-time methods for discovering replacement web pages |
ctrlnum |
(DE-627)OLC2051434379 (DE-He213)s00799-014-0108-0-p |
title_full |
Moved but not gone: an evaluation of real-time methods for discovering replacement web pages |
author_sort |
Klein, Martin |
journal |
International journal on digital libraries |
journalStr |
International journal on digital libraries |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2014 |
contenttype_str_mv |
txt |
container_start_page |
17 |
author_browse |
Klein, Martin Nelson, Michael L. |
container_volume |
14 |
class |
020 004 VZ 24,1 ssgn |
format_se |
Aufsätze |
author-letter |
Klein, Martin |
doi_str_mv |
10.1007/s00799-014-0108-0 |
dewey-full |
020 004 |
title_sort |
moved but not gone: an evaluation of real-time methods for discovering replacement web pages |
title_auth |
Moved but not gone: an evaluation of real-time methods for discovering replacement web pages |
abstract |
Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, we are able to recommend not only the best performing methods but also the sequence in which they should be applied, based on their performance, complexity required to generate them, and evolution over time. Our least complex single method results in a rediscovery rate of almost 70% of Web pages of our sample dataset based on URIs sampled from the Open Directory Project (DMOZ). By increasing the complexity level and combining three different methods, our results show an increase of the success rate of up to 77%. The results, based on our sample dataset, indicate that Web pages are often not completely lost but have moved to a different location and “just” need to be rediscovered. © The Author(s) 2014 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OPC-BBI GBV_ILN_11 GBV_ILN_70 GBV_ILN_2018 GBV_ILN_4012 GBV_ILN_4277 |
container_issue |
1-2 |
title_short |
Moved but not gone: an evaluation of real-time methods for discovering replacement web pages |
url |
https://doi.org/10.1007/s00799-014-0108-0 |
remote_bool |
false |
author2 |
Nelson, Michael L. |
author2Str |
Nelson, Michael L. |
ppnlink |
223267902 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s00799-014-0108-0 |
up_date |
2024-07-04T04:25:32.075Z |
_version_ |
1803621113083723776 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2051434379</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230502153658.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2014 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s00799-014-0108-0</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2051434379</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s00799-014-0108-0-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">020</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">24,1</subfield><subfield code="2">ssgn</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Klein, Martin</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Moved but not gone: an evaluation of real-time methods for discovering replacement web pages</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2014</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield 
code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2014</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, we are able to recommend not only the best performing methods but also the sequence in which they should be applied, based on their performance, complexity required to generate them, and evolution over time. Our least complex single method results in a rediscovery rate of almost $$70\,\%$$ of Web pages of our sample dataset based on URIs sampled from the Open Directory Project (DMOZ). By increasing the complexity level and combining three different methods, our results show an increase of the success rate of up to $$77\,\%$$. 
The results, based on our sample dataset, indicate that Web pages are often not completely lost but have moved to a different location and “just” need to be rediscovered.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Missing Web Pages</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Web Page Discovery</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">404 Error</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Web Preservation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Web Archives</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memento</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Nelson, Michael L.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">International journal on digital libraries</subfield><subfield code="d">Springer Berlin Heidelberg, 1997</subfield><subfield code="g">14(2014), 1-2 vom: 01. 
Feb., Seite 17-38</subfield><subfield code="w">(DE-627)223267902</subfield><subfield code="w">(DE-600)1357321-4</subfield><subfield code="w">(DE-576)059412127</subfield><subfield code="x">1432-5012</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:14</subfield><subfield code="g">year:2014</subfield><subfield code="g">number:1-2</subfield><subfield code="g">day:01</subfield><subfield code="g">month:02</subfield><subfield code="g">pages:17-38</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s00799-014-0108-0</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_11</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2018</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4277</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">14</subfield><subfield code="j">2014</subfield><subfield code="e">1-2</subfield><subfield 
code="b">01</subfield><subfield code="c">02</subfield><subfield code="h">17-38</subfield></datafield></record></collection>
|