Regret bounds for restless Markov bandits

We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that first represents the setting as an MDP that exhibits some special structural properties. In order to grasp...

Authors:

Ortner, Ronald [author]

Ryabko, Daniil

Auer, Peter

Munos, Rémi

Format:

E-article

Language:

English

Published:

2014

Keywords:

Regret

Markov decision processes

Restless bandits

Extent:

15 pages

Parent work:

Contained in: Influence of bulk fibre properties of PAN-based carbon felts on their performance in vanadium redox flow batteries - Schweiss, Rüdiger ELSEVIER, 2015, the journal of the EATCS, Amsterdam [et al.]

Parent work:

volume:558 ; year:2014 ; day:13 ; month:11 ; pages:62-76 ; extent:15

Links:

Full text

DOI / URN:

10.1016/j.tcs.2014.09.026

Catalog ID:

ELV039336247
