Online Reinforcement Learning Using a Probability Density Estimation
Function approximation in online, incremental, reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
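The mechanism the abstract describes, an online Gaussian mixture density estimate whose forgetting factor varies per component with the local density of samples, can be sketched compactly. The following is a minimal reconstruction from the abstract alone, not the authors' implementation: the class name `OnlineGMM`, the parameter `base_lambda`, and the exact update equations are illustrative assumptions.

```python
import numpy as np

class OnlineGMM:
    """Online Gaussian mixture density estimator with a per-component,
    density-modulated forgetting factor (illustrative sketch)."""

    def __init__(self, means, covs, base_lambda=0.99):
        self.mu = np.array(means, dtype=float)   # (K, d) component means
        self.cov = np.array(covs, dtype=float)   # (K, d, d) component covariances
        self.K = self.mu.shape[0]
        self.w = np.full(self.K, 1.0 / self.K)   # mixing weights
        self.n = np.ones(self.K)                 # effective sample count per component
        self.base_lambda = base_lambda           # global forgetting rate in [0, 1)

    def _responsibilities(self, x):
        # Posterior p(k | x) under the current mixture.
        d = self.mu.shape[1]
        dens = np.empty(self.K)
        for k in range(self.K):
            diff = x - self.mu[k]
            inv = np.linalg.inv(self.cov[k])
            norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(self.cov[k]))
            dens[k] = self.w[k] * np.exp(-0.5 * diff @ inv @ diff) / norm
        return dens / dens.sum()

    def update(self, x):
        x = np.asarray(x, dtype=float)
        r = self._responsibilities(x)
        for k in range(self.K):
            # Forget in proportion to the new information the sample brings
            # to this component: where r[k] is near 0 (a rarely sampled
            # region), lam_k is near 1 and the old estimate is preserved,
            # which is the remedy for biased sampling the abstract describes.
            lam_k = 1.0 - (1.0 - self.base_lambda) * r[k]
            self.n[k] = lam_k * self.n[k] + r[k]
            eta = r[k] / self.n[k]               # local, density-dependent step size
            diff = x - self.mu[k]
            self.mu[k] += eta * diff
            self.cov[k] += eta * (np.outer(diff, diff) - self.cov[k])
        self.w = self.n / self.n.sum()
```

In this sketch, components with negligible responsibility for the incoming sample neither learn nor forget, so estimates in rarely visited regions are not distorted, while frequently resampled components track the nonstationary targets produced by temporal-difference updates.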
Detailed Description

| Field | Value |
|---|---|
| Author | Alejandro Agostini [author] |
| Contributor | Enric Celaya |
| Format | Article |
| Language | English |
| Published | 2017 |
| Subject headings | Probability; Estimating techniques; Sampling; Neural networks |
| Contained in | Neural computation - Cambridge, Mass. : MIT Press, 1989, 29(2017), 1, pp. 220-246 |
| Contained in | volume:29 ; year:2017 ; number:1 ; pages:220-246 |
| DOI / URN | 10.1162/NECO_a_00906 |
| Catalog ID | OLC1989106862 |
MARC21 record:

```text
LEADER 01000caa a2200265 4500
001    OLC1989106862
003    DE-627
005    20210716170724.0
007    tu
008    170207s2017 xx ||||| 00| ||eng c
024 7_ |a 10.1162/NECO_a_00906 |2 doi
028 52 |a PQ20170206
035 __ |a (DE-627)OLC1989106862
035 __ |a (DE-599)GBVOLC1989106862
035 __ |a (PRQ)c1181-d763b29ba46238be82446f61ca1d110d5626d1712f16b5762f5ab8569a2b3f780
035 __ |a (KEY)0175809820170000029000100220onlinereinforcementlearningusingaprobabilitydensit
040 __ |a DE-627 |b ger |c DE-627 |e rakwb
041 __ |a eng
082 04 |a 004 |q DE-600
100 0_ |a Alejandro Agostini |e verfasserin |4 aut
245 10 |a Online Reinforcement Learning Using a Probability Density Estimation
264 _1 |c 2017
336 __ |a Text |b txt |2 rdacontent
337 __ |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 __ |a Band |b nc |2 rdacarrier
520 __ |a Function approximation in online, incremental, reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
650 _4 |a Probability
650 _4 |a Estimating techniques
650 _4 |a Sampling
650 _4 |a Neural networks
700 0_ |a Enric Celaya |4 oth
773 08 |i Enthalten in |t Neural computation |d Cambridge, Mass. : MIT Press, 1989 |g 29(2017), 1, Seite 220-246 |w (DE-627)16566682X |w (DE-600)1025692-1 |w (DE-576)023099836 |x 0899-7667 |7 nnns
773 18 |g volume:29 |g year:2017 |g number:1 |g pages:220-246
856 41 |u http://dx.doi.org/10.1162/NECO_a_00906 |3 Volltext
856 42 |u http://search.proquest.com/docview/1855913875
912 __ |a GBV_USEFLAG_A
912 __ |a SYSFLAG_A
912 __ |a GBV_OLC
912 __ |a SSG-OLC-PHY
912 __ |a SSG-OLC-MAT
912 __ |a GBV_ILN_59
912 __ |a GBV_ILN_2192
951 __ |a AR
952 __ |d 29 |j 2017 |e 1 |h 220-246
```
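The abstract applies this density estimate to function approximation, so the remaining step is reading a value prediction off the joint density. A common way to do that with a mixture over inputs and targets is Gaussian mixture regression, the conditional mean E[q | x]; the sketch below assumes that reading, and the helper `gmm_regression` with its input-dimension argument `d_in` is hypothetical rather than taken from the paper.

```python
import numpy as np

def gmm_regression(gmm, x, d_in):
    """E[q | x] from a joint GMM over z = (x, q), where the first d_in
    coordinates of each component form the input part (illustrative)."""
    x = np.asarray(x, dtype=float)
    num, den = 0.0, 0.0
    for k in range(gmm.K):
        mu_x, mu_q = gmm.mu[k][:d_in], gmm.mu[k][d_in:]
        S = gmm.cov[k]
        Sxx, Sqx = S[:d_in, :d_in], S[d_in:, :d_in]
        inv = np.linalg.inv(Sxx)
        diff = x - mu_x
        # Gating weight: how well component k explains the query input.
        beta = gmm.w[k] * np.exp(-0.5 * diff @ inv @ diff) / np.sqrt(
            (2.0 * np.pi) ** d_in * np.linalg.det(Sxx))
        # Component-wise linear predictor, blended by the gating weight.
        num = num + beta * (mu_q + Sqx @ inv @ diff)
        den = den + beta
    return num / den
```

With a mixture maintained over (state, action, Q) samples, a call such as `gmm_regression(gmm, np.concatenate([s, a]), d_in=len(s) + len(a))` would give the Q-value estimate at a query point.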