Online Reinforcement Learning Using a Probability Density Estimation
Function approximation in online, incremental, reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
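The mechanism the abstract describes, an online Gaussian mixture density estimate whose forgetting factor varies per component with the local density of samples, can be sketched compactly. The following is a minimal reconstruction from the abstract alone, not the authors' implementation: the class name `OnlineGMM`, the parameter `base_lambda`, and the exact update equations are illustrative assumptions.

```python
import numpy as np

class OnlineGMM:
    """Online Gaussian mixture density estimator with a per-component,
    density-modulated forgetting factor (illustrative sketch)."""

    def __init__(self, means, covs, base_lambda=0.99):
        self.mu = np.array(means, dtype=float)   # (K, d) component means
        self.cov = np.array(covs, dtype=float)   # (K, d, d) component covariances
        self.K = self.mu.shape[0]
        self.w = np.full(self.K, 1.0 / self.K)   # mixing weights
        self.n = np.ones(self.K)                 # effective sample count per component
        self.base_lambda = base_lambda           # global forgetting rate in [0, 1)

    def _responsibilities(self, x):
        # Posterior p(k | x) under the current mixture.
        d = self.mu.shape[1]
        dens = np.empty(self.K)
        for k in range(self.K):
            diff = x - self.mu[k]
            inv = np.linalg.inv(self.cov[k])
            norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(self.cov[k]))
            dens[k] = self.w[k] * np.exp(-0.5 * diff @ inv @ diff) / norm
        return dens / dens.sum()

    def update(self, x):
        x = np.asarray(x, dtype=float)
        r = self._responsibilities(x)
        for k in range(self.K):
            # Forget in proportion to the new information the sample brings
            # to this component: where r[k] is near 0 (a rarely sampled
            # region), lam_k is near 1 and the old estimate is preserved,
            # which is the remedy for biased sampling the abstract describes.
            lam_k = 1.0 - (1.0 - self.base_lambda) * r[k]
            self.n[k] = lam_k * self.n[k] + r[k]
            eta = r[k] / self.n[k]               # local, density-dependent step size
            diff = x - self.mu[k]
            self.mu[k] += eta * diff
            self.cov[k] += eta * (np.outer(diff, diff) - self.cov[k])
        self.w = self.n / self.n.sum()
```

In this sketch, components with negligible responsibility for the incoming sample neither learn nor forget, so estimates in rarely visited regions are not distorted, while frequently resampled components track the nonstationary targets produced by temporal-difference updates.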
Detailed Description

| Field | Value |
|---|---|
| Author | Alejandro Agostini [author] |
| Contributor | Enric Celaya |
| Format | Article |
| Language | English |
| Published | 2017 |
| Subject headings | Probability; Estimating techniques; Sampling; Neural networks |
| Contained in | Neural computation - Cambridge, Mass. : MIT Press, 1989, 29(2017), 1, pp. 220-246 |
| Contained in | volume:29 ; year:2017 ; number:1 ; pages:220-246 |
| DOI / URN | 10.1162/NECO_a_00906 |
| Catalog ID | OLC1989106862 |
MARC21 record:

```text
LEADER 01000caa a2200265 4500
001    OLC1989106862
003    DE-627
005    20210716170724.0
007    tu
008    170207s2017 xx ||||| 00| ||eng c
024 7_ |a 10.1162/NECO_a_00906 |2 doi
028 52 |a PQ20170206
035 __ |a (DE-627)OLC1989106862
035 __ |a (DE-599)GBVOLC1989106862
035 __ |a (PRQ)c1181-d763b29ba46238be82446f61ca1d110d5626d1712f16b5762f5ab8569a2b3f780
035 __ |a (KEY)0175809820170000029000100220onlinereinforcementlearningusingaprobabilitydensit
040 __ |a DE-627 |b ger |c DE-627 |e rakwb
041 __ |a eng
082 04 |a 004 |q DE-600
100 0_ |a Alejandro Agostini |e verfasserin |4 aut
245 10 |a Online Reinforcement Learning Using a Probability Density Estimation
264 _1 |c 2017
336 __ |a Text |b txt |2 rdacontent
337 __ |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 __ |a Band |b nc |2 rdacarrier
520 __ |a Function approximation in online, incremental, reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
650 _4 |a Probability
650 _4 |a Estimating techniques
650 _4 |a Sampling
650 _4 |a Neural networks
700 0_ |a Enric Celaya |4 oth
773 08 |i Enthalten in |t Neural computation |d Cambridge, Mass. : MIT Press, 1989 |g 29(2017), 1, Seite 220-246 |w (DE-627)16566682X |w (DE-600)1025692-1 |w (DE-576)023099836 |x 0899-7667 |7 nnns
773 18 |g volume:29 |g year:2017 |g number:1 |g pages:220-246
856 41 |u http://dx.doi.org/10.1162/NECO_a_00906 |3 Volltext
856 42 |u http://search.proquest.com/docview/1855913875
912 __ |a GBV_USEFLAG_A
912 __ |a SYSFLAG_A
912 __ |a GBV_OLC
912 __ |a SSG-OLC-PHY
912 __ |a SSG-OLC-MAT
912 __ |a GBV_ILN_59
912 __ |a GBV_ILN_2192
951 __ |a AR
952 __ |d 29 |j 2017 |e 1 |h 220-246
```
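The abstract applies this density estimate to function approximation, so the remaining step is reading a value prediction off the joint density. A common way to do that with a mixture over inputs and targets is Gaussian mixture regression, the conditional mean E[q | x]; the sketch below assumes that reading, and the helper `gmm_regression` with its input-dimension argument `d_in` is hypothetical rather than taken from the paper.

```python
import numpy as np

def gmm_regression(gmm, x, d_in):
    """E[q | x] from a joint GMM over z = (x, q), where the first d_in
    coordinates of each component form the input part (illustrative)."""
    x = np.asarray(x, dtype=float)
    num, den = 0.0, 0.0
    for k in range(gmm.K):
        mu_x, mu_q = gmm.mu[k][:d_in], gmm.mu[k][d_in:]
        S = gmm.cov[k]
        Sxx, Sqx = S[:d_in, :d_in], S[d_in:, :d_in]
        inv = np.linalg.inv(Sxx)
        diff = x - mu_x
        # Gating weight: how well component k explains the query input.
        beta = gmm.w[k] * np.exp(-0.5 * diff @ inv @ diff) / np.sqrt(
            (2.0 * np.pi) ** d_in * np.linalg.det(Sxx))
        # Component-wise linear predictor, blended by the gating weight.
        num = num + beta * (mu_q + Sqx @ inv @ diff)
        den = den + beta
    return num / den
```

With a mixture maintained over (state, action, Q) samples, a call such as `gmm_regression(gmm, np.concatenate([s, a]), d_in=len(s) + len(a))` would give the Q-value estimate at a query point.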