Regret bounds for restless Markov bandits
We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp...
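The restless property stated above, that every arm keeps changing state whether or not it is pulled, can be illustrated with a small simulator. This is a minimal sketch, not the algorithm from the paper: the class `RestlessBandit`, the helper `regret`, and the toy instance are hypothetical. It assumes each arm is a finite Markov chain with a fixed reward per state, and it measures regret in the usual undiscounted way, as T times a benchmark average reward rho_star (supplied by the caller) minus the reward actually collected.

```python
import random
from typing import Sequence


class RestlessBandit:
    """Toy restless Markov bandit (hypothetical class, for illustration only).

    Each arm is a finite Markov chain. At every step *all* arms move to a new
    state according to their own transition matrix, whether or not they were
    pulled; the learner only collects the reward of the pulled arm.
    """

    def __init__(self, transitions: Sequence[Sequence[Sequence[float]]],
                 rewards: Sequence[Sequence[float]], seed: int = 0) -> None:
        self.transitions = transitions  # transitions[k][i][j] = P(arm k: i -> j)
        self.rewards = rewards          # rewards[k][i] = reward of arm k in state i
        self.rng = random.Random(seed)
        self.states = [0] * len(transitions)  # every arm starts in state 0

    def step(self, arm: int) -> float:
        # Collect the reward of the pulled arm in its current state.
        reward = self.rewards[arm][self.states[arm]]
        # Advance every arm, independently of which one was pulled.
        for k, matrix in enumerate(self.transitions):
            row = matrix[self.states[k]]
            self.states[k] = self.rng.choices(range(len(row)), weights=row)[0]
        return reward


def regret(total_reward: float, rho_star: float, T: int) -> float:
    """Undiscounted regret after T steps against a benchmark average reward rho_star."""
    return T * rho_star - total_reward


# Usage: arm 0 alternates between rewards 0 and 1; arm 1 always pays 0.5.
T = 10
P = [[[0.0, 1.0], [1.0, 0.0]], [[1.0]]]
r = [[0.0, 1.0], [0.5]]

env = RestlessBandit(P, r)
always_arm0 = sum(env.step(0) for _ in range(T))

env = RestlessBandit(P, r)
# Arm 0 keeps alternating while arm 1 is pulled, so pull arm 0 on odd steps only.
switching = sum(env.step(0 if t % 2 == 1 else 1) for t in range(T))

rho_star = 0.75  # best achievable average reward in this toy instance
print(always_arm0, switching)  # 5.0 7.5
print(regret(always_arm0, rho_star, T), regret(switching, rho_star, T))  # 2.5 0.0
```

The switching schedule only reaches an average of 0.75 because arm 0 keeps alternating while arm 1 is pulled. In a rested bandit, where unpulled arms stay frozen, the same schedule would average 0.5 instead, which is why the restless setting calls for a different state representation.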
Detailed description
Author: |
Ortner, Ronald [author] |
---|
Format: |
E-article |
---|---|
Language: |
English |
Published: |
2014 |
---|
Keywords: |
Regret; Markov decision processes; Restless bandits |
---|
Extent: |
15 pages |
---|
Parent work: |
Contained in: Theoretical Computer Science: the journal of the EATCS, Elsevier, Amsterdam [et al.] |
---|---|
Parent work: |
volume:558 ; year:2014 ; day:13 ; month:11 ; pages:62-76 ; extent:15 |
Links: |
---|
DOI / URN: |
10.1016/j.tcs.2014.09.026 |
---|
Catalog ID: |
ELV039336247 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | ELV039336247 | ||
003 | DE-627 | ||
005 | 20230625224644.0 | ||
007 | cr uuu---uuuuu | ||
008 | 180603s2014 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1016/j.tcs.2014.09.026 |2 doi | |
028 | 5 | 2 | |a GBVA2014010000025.pica |
035 | |a (DE-627)ELV039336247 | ||
035 | |a (ELSEVIER)S0304-3975(14)00704-X | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | |a 004 | |
082 | 0 | 4 | |a 004 |q DE-600 |
082 | 0 | 4 | |a 620 |q VZ |
082 | 0 | 4 | |a 690 |q VZ |
084 | |a 50.92 |2 bkl | ||
100 | 1 | |a Ortner, Ronald |e author |4 aut |
245 | 1 | 0 | |a Regret bounds for restless Markov bandits |
264 | 1 | |c 2014 |
300 | |a 15 | ||
336 | |a not specified |b zzz |2 rdacontent | ||
337 | |a not specified |b z |2 rdamedia | ||
338 | |a not specified |b zu |2 rdacarrier | ||
520 | |a We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning. | ||
650 | 7 | |a Regret |2 Elsevier | |
650 | 7 | |a Markov decision processes |2 Elsevier | |
650 | 7 | |a Restless bandits |2 Elsevier | |
700 | 1 | |a Ryabko, Daniil |4 oth | |
700 | 1 | |a Auer, Peter |4 oth | |
700 | 1 | |a Munos, Rémi |4 oth | |
773 | 0 | 8 | |i Contained in |n Elsevier |t Theoretical Computer Science |d the journal of the EATCS |g Amsterdam [et al.] |w (DE-627)ELV013125583 |
773 | 1 | 8 | |g volume:558 |g year:2014 |g day:13 |g month:11 |g pages:62-76 |g extent:15 |
856 | 4 | 0 | |u https://doi.org/10.1016/j.tcs.2014.09.026 |3 Full text |
912 | |a GBV_USEFLAG_U | ||
912 | |a GBV_ELV | ||
912 | |a SYSFLAG_U | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_40 | ||
936 | b | k | |a 50.92 |j Marine engineering |q VZ |
951 | |a AR | ||
952 | |d 558 |j 2014 |b 13 |c 1113 |h 62-76 |g 15 | ||
953 | |2 045F |a 004 |
author_variant |
r o ro |
---|---|
matchkey_str |
ortnerronaldryabkodaniilauerpetermunosrm:2014----:ertonsorslsmr |
hierarchy_sort_str |
2014 |
bklnumber |
50.92 |
publishDate |
2014 |
language |
English |
source |
Contained in Theoretical Computer Science Amsterdam [et al.] volume:558 year:2014 day:13 month:11 pages:62-76 extent:15 |
sourceStr |
Contained in Theoretical Computer Science Amsterdam [et al.] volume:558 year:2014 day:13 month:11 pages:62-76 extent:15 |
format_phy_str_mv |
Article |
bklname |
Marine engineering |
institution |
findex.gbv.de |
topic_facet |
Regret Markov decision processes Restless bandits |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
Theoretical Computer Science |
authorswithroles_txt_mv |
Ortner, Ronald @@aut@@ Ryabko, Daniil @@oth@@ Auer, Peter @@oth@@ Munos, Rémi @@oth@@ |
publishDateDaySort_date |
2014-11-13T00:00:00Z |
hierarchy_top_id |
ELV013125583 |
dewey-sort |
14 |
id |
ELV039336247 |
language_de |
English |
author |
Ortner, Ronald |
spellingShingle |
Ortner, Ronald ddc 004 ddc 620 ddc 690 bkl 50.92 Elsevier Regret Elsevier Markov decision processes Elsevier Restless bandits Regret bounds for restless Markov bandits |
authorStr |
Ortner, Ronald |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)ELV013125583 |
format |
electronic Article |
dewey-ones |
004 - Data processing & computer science 620 - Engineering & allied operations 690 - Buildings |
delete_txt_mv |
keep |
author_role |
aut |
collection |
elsevier |
remote_str |
true |
illustrated |
Not Illustrated |
topic_title |
004 004 DE-600 620 VZ 690 VZ 50.92 bkl Regret bounds for restless Markov bandits Regret Elsevier Markov decision processes Elsevier Restless bandits Elsevier |
topic |
ddc 004 ddc 620 ddc 690 bkl 50.92 Elsevier Regret Elsevier Markov decision processes Elsevier Restless bandits |
format_facet |
Electronic articles Articles Electronic resource |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
zu |
author2_variant |
d r dr p a pa r m rm |
hierarchy_parent_title |
Theoretical Computer Science |
hierarchy_parent_id |
ELV013125583 |
dewey-tens |
000 - Computer science, knowledge & systems 620 - Engineering 690 - Building & construction |
hierarchy_top_title |
Theoretical Computer Science |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)ELV013125583 |
title |
Regret bounds for restless Markov bandits |
ctrlnum |
(DE-627)ELV039336247 (ELSEVIER)S0304-3975(14)00704-X |
title_full |
Regret bounds for restless Markov bandits |
author_sort |
Ortner, Ronald |
journal |
Theoretical Computer Science |
journalStr |
Theoretical Computer Science |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works 600 - Technology |
recordtype |
marc |
publishDateSort |
2014 |
contenttype_str_mv |
zzz |
container_start_page |
62 |
author_browse |
Ortner, Ronald |
container_volume |
558 |
physical |
15 |
class |
004 004 DE-600 620 VZ 690 VZ 50.92 bkl |
format_se |
Electronic articles |
author-letter |
Ortner, Ronald |
doi_str_mv |
10.1016/j.tcs.2014.09.026 |
dewey-full |
004 620 690 |
title_sort |
regret bounds for restless markov bandits |
title_auth |
Regret bounds for restless Markov bandits |
abstract |
We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning. |
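The abstract lists (approximate) state aggregation as one of the concepts that ε-structured MDPs generalize. The record does not give the formal definition, so the sketch below checks only one common form of ε-approximate aggregation, stated here as an assumption: two states in the same block must have rewards within ε for every action, and their transition probabilities, summed per block, must differ by at most ε in L1 distance. The function `is_eps_aggregation` and its argument layout are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Dict, Hashable, Sequence


def is_eps_aggregation(rewards: Sequence[Sequence[float]],
                       transitions: Sequence[Sequence[Sequence[float]]],
                       phi: Sequence[Hashable],
                       eps: float) -> bool:
    """Check one (assumed) form of epsilon-approximate state aggregation.

    rewards[s][a]:        mean reward for action a in state s
    transitions[s][a][t]: probability of moving from s to t under action a
    phi[s]:               block label of state s
    """
    labels = set(phi)

    def block_probs(s: int, a: int) -> Dict[Hashable, float]:
        # Sum the next-state distribution over each block of the partition.
        out = {b: 0.0 for b in labels}
        for t, p in enumerate(transitions[s][a]):
            out[phi[t]] += p
        return out

    n_states, n_actions = len(rewards), len(rewards[0])
    for s, s2 in combinations(range(n_states), 2):
        if phi[s] != phi[s2]:
            continue  # only states in the same block are compared
        for a in range(n_actions):
            if abs(rewards[s][a] - rewards[s2][a]) > eps:
                return False
            p, q = block_probs(s, a), block_probs(s2, a)
            if sum(abs(p[b] - q[b]) for b in labels) > eps:
                return False
    return True


# Usage: states 0 and 1 behave almost alike; state 2 is absorbing.
rewards = [[1.0], [0.95], [0.0]]
transitions = [[[0.5, 0.3, 0.2]], [[0.1, 0.65, 0.25]], [[0.0, 0.0, 1.0]]]
print(is_eps_aggregation(rewards, transitions, [0, 0, 1], 0.15))  # True
print(is_eps_aggregation(rewards, transitions, [0, 0, 1], 0.02))  # False
```

When such a check passes, a learner could in principle estimate rewards and transitions per block rather than per state, which is one intuition for why additional structural information can speed up learning.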
collection_details |
GBV_USEFLAG_U GBV_ELV SYSFLAG_U GBV_ILN_22 GBV_ILN_40 |
title_short |
Regret bounds for restless Markov bandits |
url |
https://doi.org/10.1016/j.tcs.2014.09.026 |
remote_bool |
true |
author2 |
Ryabko, Daniil Auer, Peter Munos, Rémi |
author2Str |
Ryabko, Daniil Auer, Peter Munos, Rémi |
ppnlink |
ELV013125583 |
mediatype_str_mv |
z |
isOA_txt |
false |
hochschulschrift_bool |
false |
author2_role |
oth oth oth |
doi_str |
10.1016/j.tcs.2014.09.026 |
up_date |
2024-07-06T20:21:53.099Z |
_version_ |
1803862475381145600 |