Regret bounds for restless Markov bandits

We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp...
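The setting described in the abstract can be illustrated with a minimal simulation sketch. It is not the algorithm from the paper; the names `RestlessArm` and `run`, the two-state chains, and the reward values are assumptions chosen only to show that every arm's state keeps evolving whether or not the arm is pulled.

```python
import random

class RestlessArm:
    """Hypothetical arm: a Markov chain with one reward value per state."""
    def __init__(self, transition, rewards, state=0):
        self.transition = transition  # transition[s][s2] = P(next = s2 | current = s)
        self.rewards = rewards        # rewards[s] = reward paid when pulled in state s
        self.state = state

    def step(self, rng):
        # The chain advances at every time step, pulled or not ("restless").
        probs = self.transition[self.state]
        self.state = rng.choices(range(len(probs)), weights=probs)[0]

def run(arms, policy, horizon, seed=0):
    """Play `policy` (a function t -> arm index) and return the total reward."""
    rng = random.Random(seed)
    total = 0.0
    for t in range(horizon):
        chosen = policy(t)
        total += arms[chosen].rewards[arms[chosen].state]
        for arm in arms:              # all arms move, independently of the choice
            arm.step(rng)
    return total

# Assumed toy instance: two deterministic chains that flip state each step,
# started out of phase, paying 1.0 in state 0 and 0.0 in state 1.
flip = [[0.0, 1.0], [1.0, 0.0]]
def make_arms():
    return [RestlessArm(flip, [1.0, 0.0], 0), RestlessArm(flip, [1.0, 0.0], 1)]

fixed = run(make_arms(), lambda t: 0, 10)          # always pull arm 0
switching = run(make_arms(), lambda t: t % 2, 10)  # alternate between arms
```

In this toy instance the switching policy collects reward at every step while the fixed-arm policy collects it only every other step, which is why regret in the restless setting is measured against policies that may change arms over time.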

Author:	Ortner, Ronald [author]; Ryabko, Daniil; Auer, Peter; Munos, Rémi

Format:	E-article
Language:	English

Published:	2014

Keywords:	Regret; Markov decision processes; Restless bandits

Extent:	15 pages

Parent work:	Contained in: Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries - Schweiss, Rüdiger ELSEVIER, 2015, the journal of the EATCS, Amsterdam [etc.]
Parent work:	volume:558 ; year:2014 ; day:13 ; month:11 ; pages:62-76 ; extent:15

Links:	Full text

DOI / URN:	10.1016/j.tcs.2014.09.026

Catalog ID:	ELV039336247

Internal format


LEADER	01000caa a22002652 4500
001	ELV039336247
003	DE-627
005	20230625224644.0
007	cr uuu---uuuuu
008	180603s2014 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1016/j.tcs.2014.09.026 \|2 doi
028	5	2	\|a GBVA2014010000025.pica
035			\|a (DE-627)ELV039336247
035			\|a (ELSEVIER)S0304-3975(14)00704-X
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
082	0		\|a 004
082	0	4	\|a 004 \|q DE-600
082	0	4	\|a 620 \|q VZ
082	0	4	\|a 690 \|q VZ
084			\|a 50.92 \|2 bkl
100	1		\|a Ortner, Ronald \|e verfasserin \|4 aut
245	1	0	\|a Regret bounds for restless Markov bandits
264		1	\|c 2014transfer abstract
300			\|a 15
336			\|a nicht spezifiziert \|b zzz \|2 rdacontent
337			\|a nicht spezifiziert \|b z \|2 rdamedia
338			\|a nicht spezifiziert \|b zu \|2 rdacarrier
520			\|a We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm, that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning.
650		7	\|a Regret \|2 Elsevier
650		7	\|a Markov decision processes \|2 Elsevier
650		7	\|a Restless bandits \|2 Elsevier
700	1		\|a Ryabko, Daniil \|4 oth
700	1		\|a Auer, Peter \|4 oth
700	1		\|a Munos, Rémi \|4 oth
773	0	8	\|i Enthalten in \|n Elsevier \|a Schweiss, Rüdiger ELSEVIER \|t Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries \|d 2015transfer abstract \|d the journal of the EATCS \|g Amsterdam [u.a.] \|w (DE-627)ELV013125583
773	1	8	\|g volume:558 \|g year:2014 \|g day:13 \|g month:11 \|g pages:62-76 \|g extent:15
856	4	0	\|u https://doi.org/10.1016/j.tcs.2014.09.026 \|3 Volltext
912			\|a GBV_USEFLAG_U
912			\|a GBV_ELV
912			\|a SYSFLAG_U
912			\|a GBV_ILN_22
912			\|a GBV_ILN_40
936	b	k	\|a 50.92 \|j Meerestechnik \|q VZ
951			\|a AR
952			\|d 558 \|j 2014 \|b 13 \|c 1113 \|h 62-76 \|g 15
953			\|2 045F \|a 004

Index fields

author_variant	r o ro
matchkey_str	ortnerronaldryabkodaniilauerpetermunosrm:2014----:ertonsorslsmr
hierarchy_sort_str	2014transfer abstract
bklnumber	50.92
publishDate	2014
allfields	10.1016/j.tcs.2014.09.026 doi GBVA2014010000025.pica (DE-627)ELV039336247 (ELSEVIER)S0304-3975(14)00704-X DE-627 ger DE-627 rakwb eng 004 004 DE-600 620 VZ 690 VZ 50.92 bkl Ortner, Ronald verfasserin aut Regret bounds for restless Markov bandits 2014transfer abstract 15 nicht spezifiziert zzz rdacontent nicht spezifiziert z rdamedia nicht spezifiziert zu rdacarrier We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm, that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning. We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm, that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning. Regret Elsevier Markov decision processes Elsevier Restless bandits Elsevier Ryabko, Daniil oth Auer, Peter oth Munos, Rémi oth Enthalten in Elsevier Schweiss, Rüdiger ELSEVIER Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries 2015transfer abstract the journal of the EATCS Amsterdam [u.a.] 
(DE-627)ELV013125583 volume:558 year:2014 day:13 month:11 pages:62-76 extent:15 https://doi.org/10.1016/j.tcs.2014.09.026 Volltext GBV_USEFLAG_U GBV_ELV SYSFLAG_U GBV_ILN_22 GBV_ILN_40 50.92 Meerestechnik VZ AR 558 2014 13 1113 62-76 15 045F 004
language	English
source	Enthalten in Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries Amsterdam [u.a.] volume:558 year:2014 day:13 month:11 pages:62-76 extent:15
sourceStr	Enthalten in Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries Amsterdam [u.a.] volume:558 year:2014 day:13 month:11 pages:62-76 extent:15
format_phy_str_mv	Article
bklname	Meerestechnik
institution	findex.gbv.de
topic_facet	Regret Markov decision processes Restless bandits
dewey-raw	004
isfreeaccess_bool	false
container_title	Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries
authorswithroles_txt_mv	Ortner, Ronald @@aut@@ Ryabko, Daniil @@oth@@ Auer, Peter @@oth@@ Munos, Rémi @@oth@@
publishDateDaySort_date	2014-01-13T00:00:00Z
hierarchy_top_id	ELV013125583
dewey-sort	14
id	ELV039336247
language_de	englisch
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">ELV039336247</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230625224644.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">180603s2014 xx \|\|\|\|\|o 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1016/j.tcs.2014.09.026</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">GBVA2014010000025.pica</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)ELV039336247</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ELSEVIER)S0304-3975(14)00704-X</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">004</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">DE-600</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">620</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">690</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">50.92</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Ortner, Ronald</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Regret bounds for restless Markov 
bandits</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2014transfer abstract</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">15</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">zzz</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">z</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">zu</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm, that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning.</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm, that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. 
We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning.</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Regret</subfield><subfield code="2">Elsevier</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Markov decision processes</subfield><subfield code="2">Elsevier</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Restless bandits</subfield><subfield code="2">Elsevier</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ryabko, Daniil</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Auer, Peter</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Munos, Rémi</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="n">Elsevier</subfield><subfield code="a">Schweiss, Rüdiger ELSEVIER</subfield><subfield code="t">Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries</subfield><subfield code="d">2015transfer abstract</subfield><subfield code="d">the journal of the EATCS</subfield><subfield code="g">Amsterdam [u.a.]</subfield><subfield code="w">(DE-627)ELV013125583</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:558</subfield><subfield code="g">year:2014</subfield><subfield code="g">day:13</subfield><subfield code="g">month:11</subfield><subfield code="g">pages:62-76</subfield><subfield code="g">extent:15</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.1016/j.tcs.2014.09.026</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield 
code="a">GBV_USEFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ELV</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">50.92</subfield><subfield code="j">Meerestechnik</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">558</subfield><subfield code="j">2014</subfield><subfield code="b">13</subfield><subfield code="c">1113</subfield><subfield code="h">62-76</subfield><subfield code="g">15</subfield></datafield><datafield tag="953" ind1=" " ind2=" "><subfield code="2">045F</subfield><subfield code="a">004</subfield></datafield></record></collection>
author	Ortner, Ronald
spellingShingle	Ortner, Ronald ddc 004 ddc 620 ddc 690 bkl 50.92 Elsevier Regret Elsevier Markov decision processes Elsevier Restless bandits Regret bounds for restless Markov bandits
authorStr	Ortner, Ronald
ppnlink_with_tag_str_mv	@@773@@(DE-627)ELV013125583
format	electronic Article
dewey-ones	004 - Data processing & computer science 620 - Engineering & allied operations 690 - Buildings
delete_txt_mv	keep
author_role	aut
collection	elsevier
remote_str	true
illustrated	Not Illustrated
topic_title	004 004 DE-600 620 VZ 690 VZ 50.92 bkl Regret bounds for restless Markov bandits Regret Elsevier Markov decision processes Elsevier Restless bandits Elsevier
topic	ddc 004 ddc 620 ddc 690 bkl 50.92 Elsevier Regret Elsevier Markov decision processes Elsevier Restless bandits
topic_unstemmed	ddc 004 ddc 620 ddc 690 bkl 50.92 Elsevier Regret Elsevier Markov decision processes Elsevier Restless bandits
topic_browse	ddc 004 ddc 620 ddc 690 bkl 50.92 Elsevier Regret Elsevier Markov decision processes Elsevier Restless bandits
format_facet	Elektronische Aufsätze Aufsätze Elektronische Ressource
format_main_str_mv	Text Zeitschrift/Artikel
carriertype_str_mv	zu
author2_variant	d r dr p a pa r m rm
hierarchy_parent_title	Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries
hierarchy_parent_id	ELV013125583
dewey-tens	000 - Computer science, knowledge & systems 620 - Engineering 690 - Building & construction
hierarchy_top_title	Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries
isfreeaccess_txt	false
familylinks_str_mv	(DE-627)ELV013125583
title	Regret bounds for restless Markov bandits
ctrlnum	(DE-627)ELV039336247 (ELSEVIER)S0304-3975(14)00704-X
title_full	Regret bounds for restless Markov bandits
author_sort	Ortner, Ronald
journal	Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries
journalStr	Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries
lang_code	eng
isOA_bool	false
dewey-hundreds	000 - Computer science, information & general works 600 - Technology
recordtype	marc
publishDateSort	2014
contenttype_str_mv	zzz
container_start_page	62
author_browse	Ortner, Ronald
container_volume	558
physical	15
class	004 004 DE-600 620 VZ 690 VZ 50.92 bkl
format_se	Elektronische Aufsätze
author-letter	Ortner, Ronald
doi_str_mv	10.1016/j.tcs.2014.09.026
dewey-full	004 620 690
title_sort	regret bounds for restless markov bandits
title_auth	Regret bounds for restless Markov bandits
abstract	We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm, that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning.
collection_details	GBV_USEFLAG_U GBV_ELV SYSFLAG_U GBV_ILN_22 GBV_ILN_40
title_short	Regret bounds for restless Markov bandits
url	https://doi.org/10.1016/j.tcs.2014.09.026
remote_bool	true
author2	Ryabko, Daniil Auer, Peter Munos, Rémi
author2Str	Ryabko, Daniil Auer, Peter Munos, Rémi
ppnlink	ELV013125583
mediatype_str_mv	z
isOA_txt	false
hochschulschrift_bool	false
author2_role	oth oth oth
doi_str	10.1016/j.tcs.2014.09.026
up_date	2024-07-06T20:21:53.099Z
_version_	1803862475381145600
score	7.4011774
