Simple statistical gradient-following algorithms for connectionist reinforcement learning
Abstract: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
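The abstract describes weight updates that follow the gradient of expected reinforcement without forming an explicit gradient estimate. As a minimal sketch of what such an update looks like in the simplest setting the paper treats, a single Bernoulli-logistic unit with an immediate scalar reward: the characteristic eligibility is d(ln g)/dw = (y - p)x, giving the update Δw = α(r - b)(y - p)x. The function name, learning rate, baseline, and toy reward below are illustrative assumptions, not part of this record.

```python
import numpy as np

# Minimal sketch of a REINFORCE update for one Bernoulli-logistic unit in an
# immediate-reinforcement task. Hypothetical names: reinforce_step, alpha
# (learning rate), baseline (the reinforcement baseline b).

rng = np.random.default_rng(0)

def reinforce_step(w, x, reward_fn, alpha=0.1, baseline=0.0):
    p = 1.0 / (1.0 + np.exp(-w @ x))   # firing probability g(w, x)
    y = float(rng.random() < p)        # stochastic output of the unit
    r = reward_fn(x, y)                # scalar reinforcement signal
    # Characteristic eligibility d(ln g)/dw = (y - p) * x, so the step
    # alpha * (r - baseline) * (y - p) * x moves, in expectation, along
    # the gradient of expected reinforcement.
    return w + alpha * (r - baseline) * (y - p) * x

# Toy usage: reinforce the unit for firing exactly when x[0] is positive.
w = np.zeros(3)
for _ in range(2000):
    x = rng.normal(size=3)
    w = reinforce_step(w, x, lambda x, y: float(y == (x[0] > 0)))
```

Note that no gradient of the reward is ever computed; the reward enters only as a scalar multiplier, which is the property the abstract emphasizes.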
Detailed description

Author: Williams, Ronald J. [author]
Format: Article
Language: English
Published: 1992
Subjects: Reinforcement learning; connectionist networks; gradient descent; mathematical analysis
Note: © Kluwer Academic Publishers 1992
Contained in: Machine learning - Kluwer Academic Publishers, 1986, 8(1992), 3-4, May, pages 229-256
Contained in: volume:8 ; year:1992 ; number:3-4 ; month:05 ; pages:229-256
Links: https://doi.org/10.1007/BF00992696 (full text, license required)
DOI: 10.1007/BF00992696
Catalog ID: OLC2026512213
LEADER 01000caa a22002652 4500
001    OLC2026512213
003    DE-627
005    20230503172148.0
007    tu
008    200820s1992 xx ||||| 00| ||eng c
024 7  |a 10.1007/BF00992696 |2 doi
035    |a (DE-627)OLC2026512213
035    |a (DE-He213)BF00992696-p
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
082 04 |a 150 |a 004 |q VZ
100 1  |a Williams, Ronald J. |e verfasserin |4 aut
245 10 |a Simple statistical gradient-following algorithms for connectionist reinforcement learning
264  1 |c 1992
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
500    |a © Kluwer Academic Publishers 1992
520    |a Abstract This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
650  4 |a Reinforcement learning
650  4 |a connectionist networks
650  4 |a gradient descent
650  4 |a mathematical analysis
773 08 |i Enthalten in |t Machine learning |d Kluwer Academic Publishers, 1986 |g 8(1992), 3-4 vom: Mai, Seite 229-256 |w (DE-627)12920403X |w (DE-600)54638-0 |w (DE-576)014457377 |x 0885-6125 |7 nnns
773 18 |g volume:8 |g year:1992 |g number:3-4 |g month:05 |g pages:229-256
856 41 |u https://doi.org/10.1007/BF00992696 |z lizenzpflichtig |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_OLC
912    |a SSG-OLC-MAT
912    |a GBV_ILN_21
912    |a GBV_ILN_22
912    |a GBV_ILN_31
912    |a GBV_ILN_70
912    |a GBV_ILN_130
912    |a GBV_ILN_2006
912    |a GBV_ILN_2010
912    |a GBV_ILN_2020
912    |a GBV_ILN_2093
912    |a GBV_ILN_2244
912    |a GBV_ILN_4012
912    |a GBV_ILN_4046
912    |a GBV_ILN_4266
912    |a GBV_ILN_4306
912    |a GBV_ILN_4307
912    |a GBV_ILN_4318
951    |a AR
952    |d 8 |j 1992 |e 3-4 |c 05 |h 229-256