Improved characters distance sampling for online and offline text searching
Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online...
Ausführliche Beschreibung
Autor*in: |
Faro, Simone [verfasserIn] Marino, Francesco Pio [verfasserIn] Pavone, Arianna [verfasserIn] |
---|
Format: |
E-Artikel |
---|---|
Sprache: |
Englisch |
Erschienen: |
2022 |
---|
Schlagwörter: |
---|
Übergeordnetes Werk: |
Enthalten in: Theoretical computer science - Amsterdam [u.a.] : Elsevier, 1975, 946 |
---|---|
Übergeordnetes Werk: |
volume:946 |
DOI / URN: |
10.1016/j.tcs.2022.12.034 |
---|
Katalog-ID: |
ELV009137246 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | ELV009137246 | ||
003 | DE-627 | ||
005 | 20231021093121.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230510s2022 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1016/j.tcs.2022.12.034 |2 doi | |
035 | |a (DE-627)ELV009137246 | ||
035 | |a (ELSEVIER)S0304-3975(22)00764-2 | ||
040 | |a DE-627 |b ger |c DE-627 |e rda | ||
041 | |a eng | ||
082 | 0 | 4 | |a 004 |q VZ |
084 | |a 54.10 |2 bkl | ||
100 | 1 | |a Faro, Simone |e verfasserin |0 (orcid)0000-0001-5937-5796 |4 aut | |
245 | 1 | 0 | |a Improved characters distance sampling for online and offline text searching |
264 | 1 | |c 2022 | |
336 | |a nicht spezifiziert |b zzz |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text. | ||
650 | 4 | |a Text processing | |
650 | 4 | |a Experimental algorithms | |
650 | 4 | |a String matching | |
700 | 1 | |a Marino, Francesco Pio |e verfasserin |4 aut | |
700 | 1 | |a Pavone, Arianna |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Theoretical computer science |d Amsterdam [u.a.] : Elsevier, 1975 |g 946 |h Online-Ressource |w (DE-627)265784174 |w (DE-600)1466347-8 |w (DE-576)074891030 |7 nnns |
773 | 1 | 8 | |g volume:946 |
912 | |a GBV_USEFLAG_U | ||
912 | |a GBV_ELV | ||
912 | |a SYSFLAG_U | ||
912 | |a GBV_ILN_20 | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_23 | ||
912 | |a GBV_ILN_24 | ||
912 | |a GBV_ILN_31 | ||
912 | |a GBV_ILN_32 | ||
912 | |a GBV_ILN_40 | ||
912 | |a GBV_ILN_60 | ||
912 | |a GBV_ILN_62 | ||
912 | |a GBV_ILN_63 | ||
912 | |a GBV_ILN_65 | ||
912 | |a GBV_ILN_69 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_73 | ||
912 | |a GBV_ILN_74 | ||
912 | |a GBV_ILN_90 | ||
912 | |a GBV_ILN_95 | ||
912 | |a GBV_ILN_100 | ||
912 | |a GBV_ILN_101 | ||
912 | |a GBV_ILN_105 | ||
912 | |a GBV_ILN_110 | ||
912 | |a GBV_ILN_150 | ||
912 | |a GBV_ILN_151 | ||
912 | |a GBV_ILN_224 | ||
912 | |a GBV_ILN_370 | ||
912 | |a GBV_ILN_602 | ||
912 | |a GBV_ILN_702 | ||
912 | |a GBV_ILN_2003 | ||
912 | |a GBV_ILN_2004 | ||
912 | |a GBV_ILN_2005 | ||
912 | |a GBV_ILN_2011 | ||
912 | |a GBV_ILN_2014 | ||
912 | |a GBV_ILN_2015 | ||
912 | |a GBV_ILN_2020 | ||
912 | |a GBV_ILN_2021 | ||
912 | |a GBV_ILN_2025 | ||
912 | |a GBV_ILN_2027 | ||
912 | |a GBV_ILN_2034 | ||
912 | |a GBV_ILN_2038 | ||
912 | |a GBV_ILN_2044 | ||
912 | |a GBV_ILN_2048 | ||
912 | |a GBV_ILN_2049 | ||
912 | |a GBV_ILN_2050 | ||
912 | |a GBV_ILN_2056 | ||
912 | |a GBV_ILN_2059 | ||
912 | |a GBV_ILN_2061 | ||
912 | |a GBV_ILN_2064 | ||
912 | |a GBV_ILN_2065 | ||
912 | |a GBV_ILN_2068 | ||
912 | |a GBV_ILN_2111 | ||
912 | |a GBV_ILN_2112 | ||
912 | |a GBV_ILN_2113 | ||
912 | |a GBV_ILN_2118 | ||
912 | |a GBV_ILN_2122 | ||
912 | |a GBV_ILN_2129 | ||
912 | |a GBV_ILN_2143 | ||
912 | |a GBV_ILN_2147 | ||
912 | |a GBV_ILN_2148 | ||
912 | |a GBV_ILN_2152 | ||
912 | |a GBV_ILN_2153 | ||
912 | |a GBV_ILN_2190 | ||
912 | |a GBV_ILN_2336 | ||
912 | |a GBV_ILN_2507 | ||
912 | |a GBV_ILN_2522 | ||
912 | |a GBV_ILN_4012 | ||
912 | |a GBV_ILN_4035 | ||
912 | |a GBV_ILN_4037 | ||
912 | |a GBV_ILN_4112 | ||
912 | |a GBV_ILN_4125 | ||
912 | |a GBV_ILN_4126 | ||
912 | |a GBV_ILN_4242 | ||
912 | |a GBV_ILN_4249 | ||
912 | |a GBV_ILN_4251 | ||
912 | |a GBV_ILN_4305 | ||
912 | |a GBV_ILN_4306 | ||
912 | |a GBV_ILN_4307 | ||
912 | |a GBV_ILN_4313 | ||
912 | |a GBV_ILN_4322 | ||
912 | |a GBV_ILN_4323 | ||
912 | |a GBV_ILN_4324 | ||
912 | |a GBV_ILN_4325 | ||
912 | |a GBV_ILN_4326 | ||
912 | |a GBV_ILN_4333 | ||
912 | |a GBV_ILN_4334 | ||
912 | |a GBV_ILN_4335 | ||
912 | |a GBV_ILN_4338 | ||
912 | |a GBV_ILN_4367 | ||
912 | |a GBV_ILN_4393 | ||
912 | |a GBV_ILN_4700 | ||
936 | b | k | |a 54.10 |j Theoretische Informatik |q VZ |
951 | |a AR | ||
952 | |d 946 |
author_variant |
s f sf f p m fp fpm a p ap |
---|---|
matchkey_str |
farosimonemarinofrancescopiopavoneariann:2022----:mrvdhrcesitneapigoolnadf |
hierarchy_sort_str |
2022 |
bklnumber |
54.10 |
publishDate |
2022 |
allfields |
10.1016/j.tcs.2022.12.034 doi (DE-627)ELV009137246 (ELSEVIER)S0304-3975(22)00764-2 DE-627 ger DE-627 rda eng 004 VZ 54.10 bkl Faro, Simone verfasserin (orcid)0000-0001-5937-5796 aut Improved characters distance sampling for online and offline text searching 2022 nicht spezifiziert zzz rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text. Text processing Experimental algorithms String matching Marino, Francesco Pio verfasserin aut Pavone, Arianna verfasserin aut Enthalten in Theoretical computer science Amsterdam [u.a.] : Elsevier, 1975 946 Online-Ressource (DE-627)265784174 (DE-600)1466347-8 (DE-576)074891030 nnns volume:946 GBV_USEFLAG_U GBV_ELV SYSFLAG_U GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_32 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_90 GBV_ILN_95 GBV_ILN_100 GBV_ILN_101 GBV_ILN_105 GBV_ILN_110 GBV_ILN_150 GBV_ILN_151 GBV_ILN_224 GBV_ILN_370 GBV_ILN_602 GBV_ILN_702 GBV_ILN_2003 GBV_ILN_2004 GBV_ILN_2005 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2015 GBV_ILN_2020 GBV_ILN_2021 GBV_ILN_2025 GBV_ILN_2027 GBV_ILN_2034 GBV_ILN_2038 GBV_ILN_2044 GBV_ILN_2048 GBV_ILN_2049 GBV_ILN_2050 GBV_ILN_2056 GBV_ILN_2059 GBV_ILN_2061 GBV_ILN_2064 GBV_ILN_2065 GBV_ILN_2068 GBV_ILN_2111 GBV_ILN_2112 GBV_ILN_2113 GBV_ILN_2118 GBV_ILN_2122 GBV_ILN_2129 GBV_ILN_2143 GBV_ILN_2147 GBV_ILN_2148 GBV_ILN_2152 GBV_ILN_2153 GBV_ILN_2190 GBV_ILN_2336 GBV_ILN_2507 GBV_ILN_2522 GBV_ILN_4012 GBV_ILN_4035 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4242 GBV_ILN_4249 GBV_ILN_4251 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4333 GBV_ILN_4334 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4393 GBV_ILN_4700 54.10 Theoretische Informatik VZ AR 946 |
spelling |
10.1016/j.tcs.2022.12.034 doi (DE-627)ELV009137246 (ELSEVIER)S0304-3975(22)00764-2 DE-627 ger DE-627 rda eng 004 VZ 54.10 bkl Faro, Simone verfasserin (orcid)0000-0001-5937-5796 aut Improved characters distance sampling for online and offline text searching 2022 nicht spezifiziert zzz rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text. Text processing Experimental algorithms String matching Marino, Francesco Pio verfasserin aut Pavone, Arianna verfasserin aut Enthalten in Theoretical computer science Amsterdam [u.a.] : Elsevier, 1975 946 Online-Ressource (DE-627)265784174 (DE-600)1466347-8 (DE-576)074891030 nnns volume:946 GBV_USEFLAG_U GBV_ELV SYSFLAG_U GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_32 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_90 GBV_ILN_95 GBV_ILN_100 GBV_ILN_101 GBV_ILN_105 GBV_ILN_110 GBV_ILN_150 GBV_ILN_151 GBV_ILN_224 GBV_ILN_370 GBV_ILN_602 GBV_ILN_702 GBV_ILN_2003 GBV_ILN_2004 GBV_ILN_2005 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2015 GBV_ILN_2020 GBV_ILN_2021 GBV_ILN_2025 GBV_ILN_2027 GBV_ILN_2034 GBV_ILN_2038 GBV_ILN_2044 GBV_ILN_2048 GBV_ILN_2049 GBV_ILN_2050 GBV_ILN_2056 GBV_ILN_2059 GBV_ILN_2061 GBV_ILN_2064 GBV_ILN_2065 GBV_ILN_2068 GBV_ILN_2111 GBV_ILN_2112 GBV_ILN_2113 GBV_ILN_2118 GBV_ILN_2122 GBV_ILN_2129 GBV_ILN_2143 GBV_ILN_2147 GBV_ILN_2148 GBV_ILN_2152 GBV_ILN_2153 GBV_ILN_2190 GBV_ILN_2336 GBV_ILN_2507 GBV_ILN_2522 GBV_ILN_4012 GBV_ILN_4035 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4242 GBV_ILN_4249 GBV_ILN_4251 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4333 GBV_ILN_4334 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4393 GBV_ILN_4700 54.10 Theoretische Informatik VZ AR 946 |
allfields_unstemmed |
10.1016/j.tcs.2022.12.034 doi (DE-627)ELV009137246 (ELSEVIER)S0304-3975(22)00764-2 DE-627 ger DE-627 rda eng 004 VZ 54.10 bkl Faro, Simone verfasserin (orcid)0000-0001-5937-5796 aut Improved characters distance sampling for online and offline text searching 2022 nicht spezifiziert zzz rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text. Text processing Experimental algorithms String matching Marino, Francesco Pio verfasserin aut Pavone, Arianna verfasserin aut Enthalten in Theoretical computer science Amsterdam [u.a.] : Elsevier, 1975 946 Online-Ressource (DE-627)265784174 (DE-600)1466347-8 (DE-576)074891030 nnns volume:946 GBV_USEFLAG_U GBV_ELV SYSFLAG_U GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_32 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_90 GBV_ILN_95 GBV_ILN_100 GBV_ILN_101 GBV_ILN_105 GBV_ILN_110 GBV_ILN_150 GBV_ILN_151 GBV_ILN_224 GBV_ILN_370 GBV_ILN_602 GBV_ILN_702 GBV_ILN_2003 GBV_ILN_2004 GBV_ILN_2005 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2015 GBV_ILN_2020 GBV_ILN_2021 GBV_ILN_2025 GBV_ILN_2027 GBV_ILN_2034 GBV_ILN_2038 GBV_ILN_2044 GBV_ILN_2048 GBV_ILN_2049 GBV_ILN_2050 GBV_ILN_2056 GBV_ILN_2059 GBV_ILN_2061 GBV_ILN_2064 GBV_ILN_2065 GBV_ILN_2068 GBV_ILN_2111 GBV_ILN_2112 GBV_ILN_2113 GBV_ILN_2118 GBV_ILN_2122 GBV_ILN_2129 GBV_ILN_2143 GBV_ILN_2147 GBV_ILN_2148 GBV_ILN_2152 GBV_ILN_2153 GBV_ILN_2190 GBV_ILN_2336 GBV_ILN_2507 GBV_ILN_2522 GBV_ILN_4012 GBV_ILN_4035 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4242 GBV_ILN_4249 GBV_ILN_4251 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4333 GBV_ILN_4334 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4393 GBV_ILN_4700 54.10 Theoretische Informatik VZ AR 946 |
allfieldsGer |
10.1016/j.tcs.2022.12.034 doi (DE-627)ELV009137246 (ELSEVIER)S0304-3975(22)00764-2 DE-627 ger DE-627 rda eng 004 VZ 54.10 bkl Faro, Simone verfasserin (orcid)0000-0001-5937-5796 aut Improved characters distance sampling for online and offline text searching 2022 nicht spezifiziert zzz rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text. Text processing Experimental algorithms String matching Marino, Francesco Pio verfasserin aut Pavone, Arianna verfasserin aut Enthalten in Theoretical computer science Amsterdam [u.a.] : Elsevier, 1975 946 Online-Ressource (DE-627)265784174 (DE-600)1466347-8 (DE-576)074891030 nnns volume:946 GBV_USEFLAG_U GBV_ELV SYSFLAG_U GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_32 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_90 GBV_ILN_95 GBV_ILN_100 GBV_ILN_101 GBV_ILN_105 GBV_ILN_110 GBV_ILN_150 GBV_ILN_151 GBV_ILN_224 GBV_ILN_370 GBV_ILN_602 GBV_ILN_702 GBV_ILN_2003 GBV_ILN_2004 GBV_ILN_2005 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2015 GBV_ILN_2020 GBV_ILN_2021 GBV_ILN_2025 GBV_ILN_2027 GBV_ILN_2034 GBV_ILN_2038 GBV_ILN_2044 GBV_ILN_2048 GBV_ILN_2049 GBV_ILN_2050 GBV_ILN_2056 GBV_ILN_2059 GBV_ILN_2061 GBV_ILN_2064 GBV_ILN_2065 GBV_ILN_2068 GBV_ILN_2111 GBV_ILN_2112 GBV_ILN_2113 GBV_ILN_2118 GBV_ILN_2122 GBV_ILN_2129 GBV_ILN_2143 GBV_ILN_2147 GBV_ILN_2148 GBV_ILN_2152 GBV_ILN_2153 GBV_ILN_2190 GBV_ILN_2336 GBV_ILN_2507 GBV_ILN_2522 GBV_ILN_4012 GBV_ILN_4035 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4242 GBV_ILN_4249 GBV_ILN_4251 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4333 GBV_ILN_4334 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4393 GBV_ILN_4700 54.10 Theoretische Informatik VZ AR 946 |
allfieldsSound |
10.1016/j.tcs.2022.12.034 doi (DE-627)ELV009137246 (ELSEVIER)S0304-3975(22)00764-2 DE-627 ger DE-627 rda eng 004 VZ 54.10 bkl Faro, Simone verfasserin (orcid)0000-0001-5937-5796 aut Improved characters distance sampling for online and offline text searching 2022 nicht spezifiziert zzz rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text. Text processing Experimental algorithms String matching Marino, Francesco Pio verfasserin aut Pavone, Arianna verfasserin aut Enthalten in Theoretical computer science Amsterdam [u.a.] : Elsevier, 1975 946 Online-Ressource (DE-627)265784174 (DE-600)1466347-8 (DE-576)074891030 nnns volume:946 GBV_USEFLAG_U GBV_ELV SYSFLAG_U GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_32 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_90 GBV_ILN_95 GBV_ILN_100 GBV_ILN_101 GBV_ILN_105 GBV_ILN_110 GBV_ILN_150 GBV_ILN_151 GBV_ILN_224 GBV_ILN_370 GBV_ILN_602 GBV_ILN_702 GBV_ILN_2003 GBV_ILN_2004 GBV_ILN_2005 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2015 GBV_ILN_2020 GBV_ILN_2021 GBV_ILN_2025 GBV_ILN_2027 GBV_ILN_2034 GBV_ILN_2038 GBV_ILN_2044 GBV_ILN_2048 GBV_ILN_2049 GBV_ILN_2050 GBV_ILN_2056 GBV_ILN_2059 GBV_ILN_2061 GBV_ILN_2064 GBV_ILN_2065 GBV_ILN_2068 GBV_ILN_2111 GBV_ILN_2112 GBV_ILN_2113 GBV_ILN_2118 GBV_ILN_2122 GBV_ILN_2129 GBV_ILN_2143 GBV_ILN_2147 GBV_ILN_2148 GBV_ILN_2152 GBV_ILN_2153 GBV_ILN_2190 GBV_ILN_2336 GBV_ILN_2507 GBV_ILN_2522 GBV_ILN_4012 GBV_ILN_4035 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4242 GBV_ILN_4249 GBV_ILN_4251 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4333 GBV_ILN_4334 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4393 GBV_ILN_4700 54.10 Theoretische Informatik VZ AR 946 |
language |
English |
source |
Enthalten in Theoretical computer science 946 volume:946 |
sourceStr |
Enthalten in Theoretical computer science 946 volume:946 |
format_phy_str_mv |
Article |
bklname |
Theoretische Informatik |
institution |
findex.gbv.de |
topic_facet |
Text processing Experimental algorithms String matching |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
Theoretical computer science |
authorswithroles_txt_mv |
Faro, Simone @@aut@@ Marino, Francesco Pio @@aut@@ Pavone, Arianna @@aut@@ |
publishDateDaySort_date |
2022-01-01T00:00:00Z |
hierarchy_top_id |
265784174 |
dewey-sort |
14 |
id |
ELV009137246 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">ELV009137246</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20231021093121.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230510s2022 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1016/j.tcs.2022.12.034</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)ELV009137246</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ELSEVIER)S0304-3975(22)00764-2</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.10</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Faro, Simone</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0001-5937-5796</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Improved characters distance sampling for online and offline text searching</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2022</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">zzz</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Text processing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Experimental algorithms</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">String matching</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Marino, Francesco Pio</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Pavone, Arianna</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Theoretical computer science</subfield><subfield code="d">Amsterdam [u.a.] : Elsevier, 1975</subfield><subfield code="g">946</subfield><subfield code="h">Online-Ressource</subfield><subfield code="w">(DE-627)265784174</subfield><subfield code="w">(DE-600)1466347-8</subfield><subfield code="w">(DE-576)074891030</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:946</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ELV</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_31</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_32</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_74</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_90</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_100</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_101</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_150</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_224</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_370</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_702</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2003</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2004</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2005</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2011</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2015</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2020</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2021</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2025</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2027</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2034</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2038</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2044</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2048</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2049</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2050</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2056</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2059</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2061</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2064</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2065</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2068</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2111</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2113</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2118</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2122</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2129</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2143</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2147</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2148</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2152</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2153</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2190</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2336</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2507</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2522</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4035</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4242</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4251</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4326</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4333</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4334</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4393</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.10</subfield><subfield code="j">Theoretische Informatik</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">946</subfield></datafield></record></collection>
|
author |
Faro, Simone |
spellingShingle |
Faro, Simone ddc 004 bkl 54.10 misc Text processing misc Experimental algorithms misc String matching Improved characters distance sampling for online and offline text searching |
authorStr |
Faro, Simone |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)265784174 |
format |
electronic Article |
dewey-ones |
004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut aut aut |
collection |
elsevier |
remote_str |
true |
illustrated |
Not Illustrated |
topic_title |
004 VZ 54.10 bkl Improved characters distance sampling for online and offline text searching Text processing Experimental algorithms String matching |
topic |
ddc 004 bkl 54.10 misc Text processing misc Experimental algorithms misc String matching |
topic_unstemmed |
ddc 004 bkl 54.10 misc Text processing misc Experimental algorithms misc String matching |
topic_browse |
ddc 004 bkl 54.10 misc Text processing misc Experimental algorithms misc String matching |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Theoretical computer science |
hierarchy_parent_id |
265784174 |
dewey-tens |
000 - Computer science, knowledge & systems |
hierarchy_top_title |
Theoretical computer science |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)265784174 (DE-600)1466347-8 (DE-576)074891030 |
title |
Improved characters distance sampling for online and offline text searching |
ctrlnum |
(DE-627)ELV009137246 (ELSEVIER)S0304-3975(22)00764-2 |
title_full |
Improved characters distance sampling for online and offline text searching |
author_sort |
Faro, Simone |
journal |
Theoretical computer science |
journalStr |
Theoretical computer science |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2022 |
contenttype_str_mv |
zzz |
author_browse |
Faro, Simone Marino, Francesco Pio Pavone, Arianna |
container_volume |
946 |
class |
004 VZ 54.10 bkl |
format_se |
Elektronische Aufsätze |
author-letter |
Faro, Simone |
doi_str_mv |
10.1016/j.tcs.2022.12.034 |
normlink |
(ORCID)0000-0001-5937-5796 |
normlink_prefix_str_mv |
(orcid)0000-0001-5937-5796 |
dewey-full |
004 |
author2-role |
verfasserin |
title_sort |
improved characters distance sampling for online and offline text searching |
title_auth |
Improved characters distance sampling for online and offline text searching |
abstract |
Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text. |
abstractGer |
Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text. |
abstract_unstemmed |
Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text. |
collection_details |
GBV_USEFLAG_U GBV_ELV SYSFLAG_U GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_32 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_90 GBV_ILN_95 GBV_ILN_100 GBV_ILN_101 GBV_ILN_105 GBV_ILN_110 GBV_ILN_150 GBV_ILN_151 GBV_ILN_224 GBV_ILN_370 GBV_ILN_602 GBV_ILN_702 GBV_ILN_2003 GBV_ILN_2004 GBV_ILN_2005 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2015 GBV_ILN_2020 GBV_ILN_2021 GBV_ILN_2025 GBV_ILN_2027 GBV_ILN_2034 GBV_ILN_2038 GBV_ILN_2044 GBV_ILN_2048 GBV_ILN_2049 GBV_ILN_2050 GBV_ILN_2056 GBV_ILN_2059 GBV_ILN_2061 GBV_ILN_2064 GBV_ILN_2065 GBV_ILN_2068 GBV_ILN_2111 GBV_ILN_2112 GBV_ILN_2113 GBV_ILN_2118 GBV_ILN_2122 GBV_ILN_2129 GBV_ILN_2143 GBV_ILN_2147 GBV_ILN_2148 GBV_ILN_2152 GBV_ILN_2153 GBV_ILN_2190 GBV_ILN_2336 GBV_ILN_2507 GBV_ILN_2522 GBV_ILN_4012 GBV_ILN_4035 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4242 GBV_ILN_4249 GBV_ILN_4251 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4333 GBV_ILN_4334 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4393 GBV_ILN_4700 |
title_short |
Improved characters distance sampling for online and offline text searching |
remote_bool |
true |
author2 |
Marino, Francesco Pio Pavone, Arianna |
author2Str |
Marino, Francesco Pio Pavone, Arianna |
ppnlink |
265784174 |
mediatype_str_mv |
c |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1016/j.tcs.2022.12.034 |
up_date |
2024-07-06T22:07:26.173Z |
_version_ |
1803869116090548224 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">ELV009137246</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20231021093121.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230510s2022 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1016/j.tcs.2022.12.034</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)ELV009137246</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ELSEVIER)S0304-3975(22)00764-2</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.10</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Faro, Simone</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0001-5937-5796</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Improved characters distance sampling for online and offline text searching</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2022</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">zzz</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Text processing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Experimental algorithms</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">String matching</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Marino, Francesco Pio</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Pavone, Arianna</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Theoretical computer science</subfield><subfield code="d">Amsterdam [u.a.] : Elsevier, 1975</subfield><subfield code="g">946</subfield><subfield code="h">Online-Ressource</subfield><subfield code="w">(DE-627)265784174</subfield><subfield code="w">(DE-600)1466347-8</subfield><subfield code="w">(DE-576)074891030</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:946</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ELV</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_31</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_32</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_74</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_90</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_100</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_101</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_150</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_224</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_370</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_702</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2003</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2004</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2005</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2011</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2015</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2020</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2021</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2025</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2027</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2034</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2038</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2044</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2048</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2049</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2050</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2056</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2059</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2061</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2064</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2065</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2068</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2111</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2113</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2118</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2122</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2129</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2143</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2147</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2148</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2152</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2153</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2190</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2336</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2507</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2522</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4035</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4242</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4251</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4326</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4333</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4334</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4393</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.10</subfield><subfield code="j">Theoretische Informatik</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">946</subfield></datafield></record></collection>
|
score |
7.4018707 |