On the Minimax Risk of Dictionary Learning
We consider the problem of learning a dictionary matrix from a number of observed signals, which are assumed to be generated via a linear model with a common underlying dictionary. In particular, we derive lower bounds on the minimum achievable worst case mean squared error (MSE), regardless of the computational complexity of the dictionary learning (DL) schemes. By casting DL as a classical (or frequentist) estimation problem, the lower bounds on the worst case MSE are derived following an established information-theoretic approach to minimax estimation. The main contribution of this paper is the adaptation of these information-theoretic tools to the DL problem in order to derive lower bounds on the worst case MSE of any DL algorithm. We derive three different lower bounds applying to different generative models for the observed signals. The first bound requires only the existence of a covariance matrix of the (unknown) underlying coefficient vector. By specializing this bound to the case of sparse coefficient distributions, and assuming the true dictionary satisfies the restricted isometry property, we obtain a second lower bound on the worst case MSE of DL methods in terms of the signal-to-noise ratio (SNR). The third bound applies to a more restrictive subclass of coefficient distributions by requiring the non-zero coefficients to be Gaussian. Although its applicability is the most limited, this bound is the tightest of the three in the low SNR regime. A particular use of our lower bounds is the derivation of necessary conditions on the required number of observations (sample size) such that DL is feasible, i.e., such that accurate DL schemes might exist. By comparing these necessary conditions with sufficient conditions on the sample size for a particular DL technique to succeed, we are able to characterize the regimes where those algorithms are optimal in terms of required sample size.
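As background, the estimation problem the abstract describes can be stated compactly as follows (the notation here is ours, sketched from the abstract, not quoted from the paper). Each of the $N$ observed signals is a noisy linear combination of the columns of a common dictionary:

\[
  \mathbf{y}_{k} = \mathbf{D}\,\mathbf{x}_{k} + \mathbf{n}_{k},
  \qquad k = 1,\dots,N,
\]

where $\mathbf{D} \in \mathbb{R}^{m \times p}$ is the unknown dictionary, $\mathbf{x}_{k}$ is the (typically sparse) coefficient vector, and $\mathbf{n}_{k}$ is additive noise. The minimax risk is then the worst case MSE of the best possible DL scheme $\widehat{\mathbf{D}}(\mathbf{y}_{1},\dots,\mathbf{y}_{N})$,

\[
  \varepsilon^{\ast} \;=\; \inf_{\widehat{\mathbf{D}}}\;
  \sup_{\mathbf{D} \in \mathcal{D}}\;
  \mathbb{E}\,\bigl\|\widehat{\mathbf{D}} - \mathbf{D}\bigr\|_{\mathrm{F}}^{2},
\]

and the paper's lower bounds on $\varepsilon^{\ast}$ hold for every estimator, regardless of its computational cost.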
Detailed description

Author:           A Jung [author]
Format:           Article
Language:         English
Published:        2016
Subject headings: Normal distribution; Dictionaries; Matrix; Learning; Sample size; Signal to noise ratio
Classification:   SA 5570 (RVK)
Contained in:     IEEE transactions on information theory - Piscataway, NJ : IEEE, 1963, 62(2016), 3, pages 1501-1
Host item:        volume:62 ; year:2016 ; number:3 ; pages:1501-1
Link:             http://dx.doi.org/10.1109/TIT.2016.2517006 (full text)
DOI / URN:        10.1109/TIT.2016.2517006
Catalog ID:       OLC1972621599
LEADER    01000caa a2200265 4500
001       OLC1972621599
003       DE-627
005       20220221163259.0
007       tu
008       160427s2016 xx ||||| 00| ||eng c
024 7_    |a 10.1109/TIT.2016.2517006 |2 doi
028 52    |a PQ20160430
035 __    |a (DE-627)OLC1972621599
035 __    |a (DE-599)GBVOLC1972621599
035 __    |a (PRQ)c1067-de8c2131eeb76fca609373f00383d9eae73fe6d76d78b289e05139cf5344e0420
035 __    |a (KEY)0023448620160000062000301501ontheminimaxriskofdictionarylearning
040 __    |a DE-627 |b ger |c DE-627 |e rakwb
041 __    |a eng
082 04    |a 070 |a 620 |q DNB
084 __    |a SA 5570 |q AVZ |2 rvk
100 0_    |a A Jung |e verfasserin |4 aut
245 10    |a On the Minimax Risk of Dictionary Learning
264 _1    |c 2016
336 __    |a Text |b txt |2 rdacontent
337 __    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 __    |a Band |b nc |2 rdacarrier
520 __    |a We consider the problem of learning a dictionary matrix from a number of observed signals, which are assumed to be generated via a linear model with a common underlying dictionary. In particular, we derive lower bounds on the minimum achievable worst case mean squared error (MSE), regardless of the computational complexity of the dictionary learning (DL) schemes. By casting DL as a classical (or frequentist) estimation problem, the lower bounds on the worst case MSE are derived following an established information-theoretic approach to minimax estimation. The main contribution of this paper is the adaptation of these information-theoretic tools to the DL problem in order to derive lower bounds on the worst case MSE of any DL algorithm. We derive three different lower bounds applying to different generative models for the observed signals. The first bound requires only the existence of a covariance matrix of the (unknown) underlying coefficient vector. By specializing this bound to the case of sparse coefficient distributions, and assuming the true dictionary satisfies the restricted isometry property, we obtain a second lower bound on the worst case MSE of DL methods in terms of the signal-to-noise ratio (SNR). The third bound applies to a more restrictive subclass of coefficient distributions by requiring the non-zero coefficients to be Gaussian. Although its applicability is the most limited, this bound is the tightest of the three in the low SNR regime. A particular use of our lower bounds is the derivation of necessary conditions on the required number of observations (sample size) such that DL is feasible, i.e., such that accurate DL schemes might exist. By comparing these necessary conditions with sufficient conditions on the sample size for a particular DL technique to succeed, we are able to characterize the regimes where those algorithms are optimal in terms of required sample size.
650 _4    |a Normal distribution
650 _4    |a Dictionaries
650 _4    |a Matrix
650 _4    |a Learning
650 _4    |a Sample size
650 _4    |a Signal to noise ratio
700 0_    |a YC Eldar |4 oth
700 0_    |a N Gortz |4 oth
773 08    |i Enthalten in |t IEEE transactions on information theory |d Piscataway, NJ : IEEE, 1963 |g 62(2016), 3, Seite 1501-1 |w (DE-627)12954731X |w (DE-600)218505-2 |w (DE-576)01499819X |x 0018-9448 |7 nnns
773 18    |g volume:62 |g year:2016 |g number:3 |g pages:1501-1
856 41    |u http://dx.doi.org/10.1109/TIT.2016.2517006 |3 Volltext
912 __    |a GBV_USEFLAG_A
912 __    |a SYSFLAG_A
912 __    |a GBV_OLC
912 __    |a SSG-OLC-TEC
912 __    |a SSG-OLC-MAT
912 __    |a SSG-OLC-BUB
912 __    |a SSG-OPC-BBI
912 __    |a GBV_ILN_65
912 __    |a GBV_ILN_70
912 __    |a GBV_ILN_2002
912 __    |a GBV_ILN_2088
936 rv    |a SA 5570
951 __    |a AR
952 __    |d 62 |j 2016 |e 3 |h 1501-1
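For context on the "established information-theoretic approach to minimax estimation" cited in the abstract: the standard route (a generic sketch, not the paper's exact statements) reduces estimation over a $2\delta$-separated packing $\{\mathbf{D}_{1},\dots,\mathbf{D}_{L}\} \subset \mathcal{D}$ to an $L$-ary hypothesis test and applies Fano's inequality:

\[
  \inf_{\widehat{\mathbf{D}}} \sup_{\mathbf{D} \in \mathcal{D}}
  \mathbb{E}\,\bigl\|\widehat{\mathbf{D}} - \mathbf{D}\bigr\|_{\mathrm{F}}^{2}
  \;\ge\;
  \delta^{2}\left(1 - \frac{I(\mathsf{L};\mathbf{Y}) + \log 2}{\log L}\right),
\]

where $\mathsf{L}$ is a uniformly random packing index and $\mathbf{Y} = (\mathbf{y}_{1},\dots,\mathbf{y}_{N})$ collects the observations. Bounding the mutual information $I(\mathsf{L};\mathbf{Y})$, e.g. via pairwise Kullback-Leibler divergences that grow with the sample size $N$ and the SNR, is what turns this inequality into the necessary conditions on the sample size discussed in the abstract.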