Simple statistical gradient-following algorithms for connectionist reinforcement learning
Abstract: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
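The abstract describes weight updates that follow the gradient of expected reinforcement without forming an explicit gradient estimate. As a minimal sketch of what such an update looks like in the simplest setting the paper treats, a single Bernoulli-logistic unit with an immediate scalar reward: the characteristic eligibility is d(ln g)/dw = (y - p)x, giving the update Δw = α(r - b)(y - p)x. The function name, learning rate, baseline, and toy reward below are illustrative assumptions, not part of this record.

```python
import numpy as np

# Minimal sketch of a REINFORCE update for one Bernoulli-logistic unit in an
# immediate-reinforcement task. Hypothetical names: reinforce_step, alpha
# (learning rate), baseline (the reinforcement baseline b).

rng = np.random.default_rng(0)

def reinforce_step(w, x, reward_fn, alpha=0.1, baseline=0.0):
    p = 1.0 / (1.0 + np.exp(-w @ x))   # firing probability g(w, x)
    y = float(rng.random() < p)        # stochastic output of the unit
    r = reward_fn(x, y)                # scalar reinforcement signal
    # Characteristic eligibility d(ln g)/dw = (y - p) * x, so the step
    # alpha * (r - baseline) * (y - p) * x moves, in expectation, along
    # the gradient of expected reinforcement.
    return w + alpha * (r - baseline) * (y - p) * x

# Toy usage: reinforce the unit for firing exactly when x[0] is positive.
w = np.zeros(3)
for _ in range(2000):
    x = rng.normal(size=3)
    w = reinforce_step(w, x, lambda x, y: float(y == (x[0] > 0)))
```

Note that no gradient of the reward is ever computed; the reward enters only as a scalar multiplier, which is the property the abstract emphasizes.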
Detailed description

Author: Williams, Ronald J. [author]
Format: Article
Language: English
Published: 1992
Subjects: Reinforcement learning; connectionist networks; gradient descent; mathematical analysis
Note: © Kluwer Academic Publishers 1992
Contained in: Machine learning - Kluwer Academic Publishers, 1986, 8(1992), 3-4, May, pages 229-256
Contained in: volume:8 ; year:1992 ; number:3-4 ; month:05 ; pages:229-256
Links: https://doi.org/10.1007/BF00992696 (full text, license required)
DOI: 10.1007/BF00992696
Catalog ID: OLC2026512213
LEADER 01000caa a22002652 4500
001    OLC2026512213
003    DE-627
005    20230503172148.0
007    tu
008    200820s1992 xx ||||| 00| ||eng c
024 7  |a 10.1007/BF00992696 |2 doi
035    |a (DE-627)OLC2026512213
035    |a (DE-He213)BF00992696-p
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
082 04 |a 150 |a 004 |q VZ
100 1  |a Williams, Ronald J. |e verfasserin |4 aut
245 10 |a Simple statistical gradient-following algorithms for connectionist reinforcement learning
264  1 |c 1992
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
500    |a © Kluwer Academic Publishers 1992
520    |a Abstract This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
650  4 |a Reinforcement learning
650  4 |a connectionist networks
650  4 |a gradient descent
650  4 |a mathematical analysis
773 08 |i Enthalten in |t Machine learning |d Kluwer Academic Publishers, 1986 |g 8(1992), 3-4 vom: Mai, Seite 229-256 |w (DE-627)12920403X |w (DE-600)54638-0 |w (DE-576)014457377 |x 0885-6125 |7 nnns
773 18 |g volume:8 |g year:1992 |g number:3-4 |g month:05 |g pages:229-256
856 41 |u https://doi.org/10.1007/BF00992696 |z lizenzpflichtig |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_OLC
912    |a SSG-OLC-MAT
912    |a GBV_ILN_21
912    |a GBV_ILN_22
912    |a GBV_ILN_31
912    |a GBV_ILN_70
912    |a GBV_ILN_130
912    |a GBV_ILN_2006
912    |a GBV_ILN_2010
912    |a GBV_ILN_2020
912    |a GBV_ILN_2093
912    |a GBV_ILN_2244
912    |a GBV_ILN_4012
912    |a GBV_ILN_4046
912    |a GBV_ILN_4266
912    |a GBV_ILN_4306
912    |a GBV_ILN_4307
912    |a GBV_ILN_4318
951    |a AR
952    |d 8 |j 1992 |e 3-4 |c 05 |h 229-256