Maximum Likelihood Estimation of Functionals of Discrete Distributions
We consider the problem of estimating functionals of discrete distributions, and focus on a tight (up to universal multiplicative constants for each specific functional) nonasymptotic analysis of the worst case squared error risk of widely used estimators. We apply concentration inequalities to analyze the random fluctuation of these estimators around their expectations, and the theory of approximation using positive linear operators to analyze their bias.
Detailed description

Author: Jiao, Jiantao (author)
Format: Article
Language: English
Published: 2017
Contained in: IEEE transactions on information theory - Piscataway, NJ : IEEE, 1963, 63(2017), 10, pages 6774-6798
Contained in: volume:63 ; year:2017 ; number:10 ; pages:6774-6798
DOI: 10.1109/TIT.2017.2733537
Catalog ID: OLC1996944002
LEADER 01000caa a2200265 4500
001 OLC1996944002
003 DE-627
005 20220221163315.0
007 tu
008 171125s2017 xx ||||| 00| ||eng c
024 7 |a 10.1109/TIT.2017.2733537 |2 doi
028 5 2 |a PQ20171228
035 |a (DE-627)OLC1996944002
035 |a (DE-599)GBVOLC1996944002
035 |a (PRQ)a1023-d7b684d5f7b3b3ec98e8ab70e5508eb0d03460fcbfacd4b0adbed995a54f96c10
035 |a (KEY)0023448620170000063001006774maximumlikelihoodestimationoffunctionalsofdiscrete
040 |a DE-627 |b ger |c DE-627 |e rakwb
041 |a eng
082 0 4 |a 070 |a 620 |q DE-600
084 |a SA 5570 |q AVZ |2 rvk
100 1 |a Jiao, Jiantao |e verfasserin |4 aut
245 1 0 |a Maximum Likelihood Estimation of Functionals of Discrete Distributions
264 1 |c 2017
336 |a Text |b txt |2 rdacontent
337 |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 |a Band |b nc |2 rdacarrier
520 |a We consider the problem of estimating functionals of discrete distributions, and focus on a tight (up to universal multiplicative constants for each specific functional) nonasymptotic analysis of the worst case squared error risk of widely used estimators. We apply concentration inequalities to analyze the random fluctuation of these estimators around their expectations, and the theory of approximation using positive linear operators to analyze the deviation of their expectations from the true functional, namely their bias. We explicitly characterize the worst case squared error risk incurred by the maximum likelihood estimator (MLE) in estimating the Shannon entropy $H(P) = \sum_{i=1}^{S} -p_i \ln p_i$ and the power sum $F_\alpha(P) = \sum_{i=1}^{S} p_i^\alpha$, $\alpha > 0$, up to universal multiplicative constants for each fixed functional, for any alphabet size $S \leq \infty$ and sample size $n$ for which the risk may vanish. As a corollary, for Shannon entropy estimation, we show that it is necessary and sufficient to have $n \gg S$ observations for the MLE to be consistent. In addition, we establish that it is necessary and sufficient to consider $n \gg S^{1/\alpha}$ samples for the MLE to consistently estimate $F_\alpha(P)$, $0 < \alpha < 1$. The minimax rate-optimal estimators for both problems require $S/\ln S$ and $S^{1/\alpha}/\ln S$ samples, which implies that the MLE has strictly sub-optimal sample complexity. When $1 < \alpha < 3/2$, we show that the worst case squared error rate of convergence for the MLE is $n^{-2(\alpha-1)}$ for infinite alphabet size, while the minimax squared error rate is $(n \ln n)^{-2(\alpha-1)}$. When $\alpha \geq 3/2$, the MLE achieves the minimax optimal rate $n^{-1}$ regardless of the alphabet size. As an application of the general theory, we analyze Dirichlet prior smoothing techniques for Shannon entropy estimation. In this context, one approach is to plug the Dirichlet prior smoothed distribution into the entropy functional, while the other is to calculate the Bayes estimator for entropy under the Dirichlet prior for squared error, which is the conditional expectation. We show that in general such estimators do not improve over the maximum likelihood estimator: no matter how we tune the parameters in the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation. The performance of the minimax rate-optimal estimator with $n$ samples is essentially at least as good as that of Dirichlet smoothed entropy estimators with $n \ln n$ samples.
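The abstract above compares plug-in (MLE) estimation of $H(P)$ and $F_\alpha(P)$ with Dirichlet prior smoothing. As a minimal sketch of the estimators being analyzed (this code is not from the paper; the function names and the additive-smoothing parameterization with pseudo-count `a` are our own illustration):

```python
import math
from collections import Counter


def mle_entropy(samples):
    """Plug-in (MLE) entropy estimate: plug the empirical frequencies
    into H(P) = -sum_i p_i ln p_i."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def mle_power_sum(samples, alpha):
    """Plug-in estimate of the power sum F_alpha(P) = sum_i p_i^alpha."""
    n = len(samples)
    return sum((c / n) ** alpha for c in Counter(samples).values())


def dirichlet_smoothed_entropy(samples, alphabet_size, a=1.0):
    """Entropy of the Dirichlet(a)-smoothed (add-a) distribution; per the
    abstract, no choice of a makes this minimax rate-optimal."""
    n = len(samples)
    counts = Counter(samples)
    denom = n + a * alphabet_size
    probs = [(counts.get(s, 0) + a) / denom for s in range(alphabet_size)]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

In the abstract's terms, `mle_entropy` is consistent only when $n \gg S$, and `dirichlet_smoothed_entropy` does no better in rate, whereas minimax rate-optimal estimators need only $n \gg S/\ln S$ samples.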
650 4 |a approximation theory
650 4 |a approximation using positive linear operators
650 4 |a high dimensional statistics
650 4 |a maximum likelihood estimator
650 4 |a Information theory
650 4 |a Entropy estimation
650 4 |a Approximation methods
650 4 |a Maximum likelihood estimation
650 4 |a Complexity theory
650 4 |a Entropy
650 4 |a Smoothing methods
650 4 |a Rényi entropy
650 4 |a Dirichlet prior smoothing
650 4 |a Statistics Theory
650 4 |a Computer Science
650 4 |a Information Theory
650 4 |a Mathematics
700 1 |a Venkat, Kartik |4 oth
700 1 |a Han, Yanjun |4 oth
700 1 |a Weissman, Tsachy |4 oth
773 0 8 |i Enthalten in |t IEEE transactions on information theory |d Piscataway, NJ : IEEE, 1963 |g 63(2017), 10, Seite 6774-6798 |w (DE-627)12954731X |w (DE-600)218505-2 |w (DE-576)01499819X |x 0018-9448 |7 nnns
773 1 8 |g volume:63 |g year:2017 |g number:10 |g pages:6774-6798
856 4 1 |u http://dx.doi.org/10.1109/TIT.2017.2733537 |3 Volltext
856 4 2 |u http://ieeexplore.ieee.org/document/7997814
856 4 2 |u http://arxiv.org/abs/1406.6959
912 |a GBV_USEFLAG_A
912 |a SYSFLAG_A
912 |a GBV_OLC
912 |a SSG-OLC-TEC
912 |a SSG-OLC-MAT
912 |a SSG-OLC-BUB
912 |a SSG-OPC-BBI
912 |a GBV_ILN_65
912 |a GBV_ILN_70
912 |a GBV_ILN_2002
912 |a GBV_ILN_2088
936 r v |a SA 5570
951 |a AR
952 |d 63 |j 2017 |e 10 |h 6774-6798
author_variant: j j jj
matchkey_str: article:00189448:2017----::aiulklhoetmtoofntoasfi
hierarchy_sort_str: 2017
publishDate: 2017
language |
English |
source |
Enthalten in IEEE transactions on information theory 63(2017), 10, Seite 6774-6798 volume:63 year:2017 number:10 pages:6774-6798 |
sourceStr |
Enthalten in IEEE transactions on information theory 63(2017), 10, Seite 6774-6798 volume:63 year:2017 number:10 pages:6774-6798 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
approximation theory approximation using positive linear operators high dimensional statistics maximum likelihood estimator Information theory Entropy estimation Approximation methods Maximum likelihood estimation Complexity theory Entropy Smoothing methods Rényi entropy Dirichlet prior smoothing Statistics Theory Computer Science Information Theory Mathematics |
dewey-raw |
070 |
isfreeaccess_bool |
false |
container_title |
IEEE transactions on information theory |
authorswithroles_txt_mv |
Jiao, Jiantao @@aut@@ Venkat, Kartik @@oth@@ Han, Yanjun @@oth@@ Weissman, Tsachy @@oth@@ |
publishDateDaySort_date |
2017-01-01T00:00:00Z |
hierarchy_top_id |
12954731X |
dewey-sort |
270 |
id |
OLC1996944002 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1996944002</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20220221163315.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">171125s2017 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1109/TIT.2017.2733537</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20171228</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1996944002</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1996944002</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)a1023-d7b684d5f7b3b3ec98e8ab70e5508eb0d03460fcbfacd4b0adbed995a54f96c10</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0023448620170000063001006774maximumlikelihoodestimationoffunctionalsofdiscrete</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">620</subfield><subfield code="q">DE-600</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SA 5570</subfield><subfield code="q">AVZ</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Jiao, Jiantao</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Maximum Likelihood Estimation of 
Functionals of Discrete Distributions</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2017</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">We consider the problem of estimating functionals of discrete distributions, and focus on a tight (up to universal multiplicative constants for each specific functional) nonasymptotic analysis of the worst case squared error risk of widely used estimators. We apply concentration inequalities to analyze the random fluctuation of these estimators around their expectations and the theory of approximation using positive linear operators to analyze the deviation of their expectations from the true functional, namely their bias . We explicitly characterize the worst case squared error risk incurred by the maximum likelihood estimator (MLE) in estimating the Shannon entropy <inline-formula> <tex-math notation="LaTeX">H(P) = \sum _{i = 1}^{S} -p_{i} \ln p_{i} </tex-math></inline-formula>, and the power sum <inline-formula> <tex-math notation="LaTeX">F_\alpha (P) = \sum _{i = 1}^{S} p_{i}^\alpha ,\alpha >0 </tex-math></inline-formula>, up to universal multiplicative constants for each fixed functional, for any alphabet size <inline-formula> <tex-math notation="LaTeX">S\leq \infty </tex-math></inline-formula> and sample size <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> for which the risk may vanish. 
As a corollary, for Shannon entropy estimation, we show that it is necessary and sufficient to have <inline-formula> <tex-math notation="LaTeX">n \gg S </tex-math></inline-formula> observations for the MLE to be consistent. In addition, we establish that it is necessary and sufficient to consider <inline-formula> <tex-math notation="LaTeX">n \gg S^{1/\alpha } </tex-math></inline-formula> samples for the MLE to consistently estimate <inline-formula> <tex-math notation="LaTeX">F_\alpha (P), 0<\alpha <1 </tex-math></inline-formula>. The minimax rate-optimal estimators for both problems require <inline-formula> <tex-math notation="LaTeX">S/\ln S </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">S^{1/\alpha }/\ln S </tex-math></inline-formula> samples, which implies that the MLE has a strictly sub-optimal sample complexity. When <inline-formula> <tex-math notation="LaTeX">1<\alpha <3/2 </tex-math></inline-formula>, we show that the worst case squared error rate of convergence for the MLE is <inline-formula> <tex-math notation="LaTeX">n^{-2(\alpha -1)} </tex-math></inline-formula> for infinite alphabet size, while the minimax squared error rate is <inline-formula> <tex-math notation="LaTeX">(n\ln n)^{-2(\alpha -1)} </tex-math></inline-formula>. When <inline-formula> <tex-math notation="LaTeX">\alpha \geq 3/2 </tex-math></inline-formula>, the MLE achieves the minimax optimal rate <inline-formula> <tex-math notation="LaTeX">n^{-1} </tex-math></inline-formula> regardless of the alphabet size. As an application of the general theory, we analyze the Dirichlet prior smoothing techniques for Shannon entropy estimation. In this context, one approach is to plug-in the Dirichlet prior smoothed distribution into the entropy functional, while the other one is to calculate the Bayes estimator for entropy under the Dirichlet prior for squared error, which is the conditional expectation. 
We show that in general such estimators do not improve over the maximum likelihood estimator. No matter how we tune the parameters in the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation. The performance of the minimax rate-optimal estimator with <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> samples is essentially at least as good as that of Dirichlet smoothed entropy estimators with <inline-formula> <tex-math notation="LaTeX">n\ln n </tex-math></inline-formula> samples.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">approximation theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">approximation using positive linear operators</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">high dimensional statistics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">maximum likelihood estimator</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Entropy estimation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Approximation methods</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Maximum likelihood estimation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Complexity theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Entropy</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Smoothing methods</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Rényi entropy</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Dirichlet prior smoothing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Statistics 
Theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computer Science</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information Theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Mathematics</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Venkat, Kartik</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Han, Yanjun</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Weissman, Tsachy</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">IEEE transactions on information theory</subfield><subfield code="d">Piscataway, NJ : IEEE, 1963</subfield><subfield code="g">63(2017), 10, Seite 6774-6798</subfield><subfield code="w">(DE-627)12954731X</subfield><subfield code="w">(DE-600)218505-2</subfield><subfield code="w">(DE-576)01499819X</subfield><subfield code="x">0018-9448</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:63</subfield><subfield code="g">year:2017</subfield><subfield code="g">number:10</subfield><subfield code="g">pages:6774-6798</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1109/TIT.2017.2733537</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://ieeexplore.ieee.org/document/7997814</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://arxiv.org/abs/1406.6959</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield 
tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2002</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2088</subfield></datafield><datafield tag="936" ind1="r" ind2="v"><subfield code="a">SA 5570</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">63</subfield><subfield code="j">2017</subfield><subfield code="e">10</subfield><subfield code="h">6774-6798</subfield></datafield></record></collection>
|
author |
Jiao, Jiantao |
spellingShingle |
Jiao, Jiantao ddc 070 rvk SA 5570 misc approximation theory misc approximation using positive linear operators misc high dimensional statistics misc maximum likelihood estimator misc Information theory misc Entropy estimation misc Approximation methods misc Maximum likelihood estimation misc Complexity theory misc Entropy misc Smoothing methods misc Rényi entropy misc Dirichlet prior smoothing misc Statistics Theory misc Computer Science misc Information Theory misc Mathematics Maximum Likelihood Estimation of Functionals of Discrete Distributions |
authorStr |
Jiao, Jiantao |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)12954731X |
format |
Article |
dewey-ones |
070 - News media, journalism & publishing 620 - Engineering & allied operations |
delete_txt_mv |
keep |
author_role |
aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0018-9448 |
topic_title |
070 620 DE-600 SA 5570 AVZ rvk Maximum Likelihood Estimation of Functionals of Discrete Distributions approximation theory approximation using positive linear operators high dimensional statistics maximum likelihood estimator Information theory Entropy estimation Approximation methods Maximum likelihood estimation Complexity theory Entropy Smoothing methods Rényi entropy Dirichlet prior smoothing Statistics Theory Computer Science Information Theory Mathematics |
topic |
ddc 070 rvk SA 5570 misc approximation theory misc approximation using positive linear operators misc high dimensional statistics misc maximum likelihood estimator misc Information theory misc Entropy estimation misc Approximation methods misc Maximum likelihood estimation misc Complexity theory misc Entropy misc Smoothing methods misc Rényi entropy misc Dirichlet prior smoothing misc Statistics Theory misc Computer Science misc Information Theory misc Mathematics |
topic_unstemmed |
ddc 070 rvk SA 5570 misc approximation theory misc approximation using positive linear operators misc high dimensional statistics misc maximum likelihood estimator misc Information theory misc Entropy estimation misc Approximation methods misc Maximum likelihood estimation misc Complexity theory misc Entropy misc Smoothing methods misc Rényi entropy misc Dirichlet prior smoothing misc Statistics Theory misc Computer Science misc Information Theory misc Mathematics |
topic_browse |
ddc 070 rvk SA 5570 misc approximation theory misc approximation using positive linear operators misc high dimensional statistics misc maximum likelihood estimator misc Information theory misc Entropy estimation misc Approximation methods misc Maximum likelihood estimation misc Complexity theory misc Entropy misc Smoothing methods misc Rényi entropy misc Dirichlet prior smoothing misc Statistics Theory misc Computer Science misc Information Theory misc Mathematics |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
author2_variant |
k v kv y h yh t w tw |
hierarchy_parent_title |
IEEE transactions on information theory |
hierarchy_parent_id |
12954731X |
dewey-tens |
070 - News media, journalism & publishing 620 - Engineering |
hierarchy_top_title |
IEEE transactions on information theory |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)12954731X (DE-600)218505-2 (DE-576)01499819X |
title |
Maximum Likelihood Estimation of Functionals of Discrete Distributions |
ctrlnum |
(DE-627)OLC1996944002 (DE-599)GBVOLC1996944002 (PRQ)a1023-d7b684d5f7b3b3ec98e8ab70e5508eb0d03460fcbfacd4b0adbed995a54f96c10 (KEY)0023448620170000063001006774maximumlikelihoodestimationoffunctionalsofdiscrete |
title_full |
Maximum Likelihood Estimation of Functionals of Discrete Distributions |
author_sort |
Jiao, Jiantao |
journal |
IEEE transactions on information theory |
journalStr |
IEEE transactions on information theory |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works 600 - Technology |
recordtype |
marc |
publishDateSort |
2017 |
contenttype_str_mv |
txt |
container_start_page |
6774 |
author_browse |
Jiao, Jiantao |
container_volume |
63 |
class |
070 620 DE-600 SA 5570 AVZ rvk |
format_se |
Aufsätze |
author-letter |
Jiao, Jiantao |
doi_str_mv |
10.1109/TIT.2017.2733537 |
dewey-full |
070 620 |
title_sort |
maximum likelihood estimation of functionals of discrete distributions |
title_auth |
Maximum Likelihood Estimation of Functionals of Discrete Distributions |
abstract |
We consider the problem of estimating functionals of discrete distributions, and focus on a tight (up to universal multiplicative constants for each specific functional) nonasymptotic analysis of the worst case squared error risk of widely used estimators. We apply concentration inequalities to analyze the random fluctuation of these estimators around their expectations and the theory of approximation using positive linear operators to analyze the deviation of their expectations from the true functional, namely their bias . We explicitly characterize the worst case squared error risk incurred by the maximum likelihood estimator (MLE) in estimating the Shannon entropy <inline-formula> <tex-math notation="LaTeX">H(P) = \sum _{i = 1}^{S} -p_{i} \ln p_{i} </tex-math></inline-formula>, and the power sum <inline-formula> <tex-math notation="LaTeX">F_\alpha (P) = \sum _{i = 1}^{S} p_{i}^\alpha ,\alpha >0 </tex-math></inline-formula>, up to universal multiplicative constants for each fixed functional, for any alphabet size <inline-formula> <tex-math notation="LaTeX">S\leq \infty </tex-math></inline-formula> and sample size <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> for which the risk may vanish. As a corollary, for Shannon entropy estimation, we show that it is necessary and sufficient to have <inline-formula> <tex-math notation="LaTeX">n \gg S </tex-math></inline-formula> observations for the MLE to be consistent. In addition, we establish that it is necessary and sufficient to consider <inline-formula> <tex-math notation="LaTeX">n \gg S^{1/\alpha } </tex-math></inline-formula> samples for the MLE to consistently estimate <inline-formula> <tex-math notation="LaTeX">F_\alpha (P), 0<\alpha <1 </tex-math></inline-formula>. 
The minimax rate-optimal estimators for both problems require <inline-formula> <tex-math notation="LaTeX">S/\ln S </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">S^{1/\alpha }/\ln S </tex-math></inline-formula> samples, which implies that the MLE has a strictly sub-optimal sample complexity. When <inline-formula> <tex-math notation="LaTeX">1<\alpha <3/2 </tex-math></inline-formula>, we show that the worst case squared error rate of convergence for the MLE is <inline-formula> <tex-math notation="LaTeX">n^{-2(\alpha -1)} </tex-math></inline-formula> for infinite alphabet size, while the minimax squared error rate is <inline-formula> <tex-math notation="LaTeX">(n\ln n)^{-2(\alpha -1)} </tex-math></inline-formula>. When <inline-formula> <tex-math notation="LaTeX">\alpha \geq 3/2 </tex-math></inline-formula>, the MLE achieves the minimax optimal rate <inline-formula> <tex-math notation="LaTeX">n^{-1} </tex-math></inline-formula> regardless of the alphabet size. As an application of the general theory, we analyze the Dirichlet prior smoothing techniques for Shannon entropy estimation. In this context, one approach is to plug the Dirichlet-prior-smoothed distribution into the entropy functional, while the other is to calculate the Bayes estimator for entropy under the Dirichlet prior for squared error, which is the conditional expectation. We show that in general such estimators do not improve over the maximum likelihood estimator. No matter how we tune the parameters in the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation. The performance of the minimax rate-optimal estimator with <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> samples is essentially at least as good as that of Dirichlet smoothed entropy estimators with <inline-formula> <tex-math notation="LaTeX">n\ln n </tex-math></inline-formula> samples.
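As a concrete illustration of the plug-in (MLE) entropy estimator discussed in the abstract, the following minimal Python sketch computes the empirical distribution and plugs it into the Shannon entropy functional. The symbols S (alphabet size) and n (sample size) follow the abstract; the uniform test distribution and the specific sample sizes are assumptions chosen only to make the bias visible.

```python
import math
import random
from collections import Counter

def mle_entropy(samples):
    """Plug-in (MLE) estimate of Shannon entropy H(P) in nats:
    replace each p_i by the empirical frequency n_i / n."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Uniform distribution over S symbols, so the true entropy is H(P) = ln S.
random.seed(0)
S = 100
true_H = math.log(S)
for n in (50, 1000, 100000):
    # Average over a few runs to smooth the random fluctuation.
    est = sum(mle_entropy([random.randrange(S) for _ in range(S and n)])
              for _ in range(20)) / 20
    # The plug-in estimate underestimates H(P); the gap shrinks once n >> S,
    # consistent with the n >> S consistency condition stated above.
    print(n, round(true_H - est, 3))
```

The estimate can never exceed ln(min(n, S)), so for n = 50 < S = 100 the underestimation is at least ln 2 regardless of the realized sample.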
abstractGer |
We consider the problem of estimating functionals of discrete distributions, and focus on a tight (up to universal multiplicative constants for each specific functional) nonasymptotic analysis of the worst case squared error risk of widely used estimators. We apply concentration inequalities to analyze the random fluctuation of these estimators around their expectations and the theory of approximation using positive linear operators to analyze the deviation of their expectations from the true functional, namely their bias . We explicitly characterize the worst case squared error risk incurred by the maximum likelihood estimator (MLE) in estimating the Shannon entropy <inline-formula> <tex-math notation="LaTeX">H(P) = \sum _{i = 1}^{S} -p_{i} \ln p_{i} </tex-math></inline-formula>, and the power sum <inline-formula> <tex-math notation="LaTeX">F_\alpha (P) = \sum _{i = 1}^{S} p_{i}^\alpha ,\alpha >0 </tex-math></inline-formula>, up to universal multiplicative constants for each fixed functional, for any alphabet size <inline-formula> <tex-math notation="LaTeX">S\leq \infty </tex-math></inline-formula> and sample size <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> for which the risk may vanish. As a corollary, for Shannon entropy estimation, we show that it is necessary and sufficient to have <inline-formula> <tex-math notation="LaTeX">n \gg S </tex-math></inline-formula> observations for the MLE to be consistent. In addition, we establish that it is necessary and sufficient to consider <inline-formula> <tex-math notation="LaTeX">n \gg S^{1/\alpha } </tex-math></inline-formula> samples for the MLE to consistently estimate <inline-formula> <tex-math notation="LaTeX">F_\alpha (P), 0<\alpha <1 </tex-math></inline-formula>. 
The minimax rate-optimal estimators for both problems require <inline-formula> <tex-math notation="LaTeX">S/\ln S </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">S^{1/\alpha }/\ln S </tex-math></inline-formula> samples, which implies that the MLE has a strictly sub-optimal sample complexity. When <inline-formula> <tex-math notation="LaTeX">1<\alpha <3/2 </tex-math></inline-formula>, we show that the worst case squared error rate of convergence for the MLE is <inline-formula> <tex-math notation="LaTeX">n^{-2(\alpha -1)} </tex-math></inline-formula> for infinite alphabet size, while the minimax squared error rate is <inline-formula> <tex-math notation="LaTeX">(n\ln n)^{-2(\alpha -1)} </tex-math></inline-formula>. When <inline-formula> <tex-math notation="LaTeX">\alpha \geq 3/2 </tex-math></inline-formula>, the MLE achieves the minimax optimal rate <inline-formula> <tex-math notation="LaTeX">n^{-1} </tex-math></inline-formula> regardless of the alphabet size. As an application of the general theory, we analyze the Dirichlet prior smoothing techniques for Shannon entropy estimation. In this context, one approach is to plug-in the Dirichlet prior smoothed distribution into the entropy functional, while the other one is to calculate the Bayes estimator for entropy under the Dirichlet prior for squared error, which is the conditional expectation. We show that in general such estimators do not improve over the maximum likelihood estimator. No matter how we tune the parameters in the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation. The performance of the minimax rate-optimal estimator with <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> samples is essentially at least as good as that of Dirichlet smoothed entropy estimators with <inline-formula> <tex-math notation="LaTeX">n\ln n </tex-math></inline-formula> samples. |
abstract_unstemmed |
We consider the problem of estimating functionals of discrete distributions, and focus on a tight (up to universal multiplicative constants for each specific functional) nonasymptotic analysis of the worst case squared error risk of widely used estimators. We apply concentration inequalities to analyze the random fluctuation of these estimators around their expectations and the theory of approximation using positive linear operators to analyze the deviation of their expectations from the true functional, namely their bias . We explicitly characterize the worst case squared error risk incurred by the maximum likelihood estimator (MLE) in estimating the Shannon entropy <inline-formula> <tex-math notation="LaTeX">H(P) = \sum _{i = 1}^{S} -p_{i} \ln p_{i} </tex-math></inline-formula>, and the power sum <inline-formula> <tex-math notation="LaTeX">F_\alpha (P) = \sum _{i = 1}^{S} p_{i}^\alpha ,\alpha >0 </tex-math></inline-formula>, up to universal multiplicative constants for each fixed functional, for any alphabet size <inline-formula> <tex-math notation="LaTeX">S\leq \infty </tex-math></inline-formula> and sample size <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> for which the risk may vanish. As a corollary, for Shannon entropy estimation, we show that it is necessary and sufficient to have <inline-formula> <tex-math notation="LaTeX">n \gg S </tex-math></inline-formula> observations for the MLE to be consistent. In addition, we establish that it is necessary and sufficient to consider <inline-formula> <tex-math notation="LaTeX">n \gg S^{1/\alpha } </tex-math></inline-formula> samples for the MLE to consistently estimate <inline-formula> <tex-math notation="LaTeX">F_\alpha (P), 0<\alpha <1 </tex-math></inline-formula>. 
The minimax rate-optimal estimators for both problems require <inline-formula> <tex-math notation="LaTeX">S/\ln S </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">S^{1/\alpha }/\ln S </tex-math></inline-formula> samples, which implies that the MLE has a strictly sub-optimal sample complexity. When <inline-formula> <tex-math notation="LaTeX">1<\alpha <3/2 </tex-math></inline-formula>, we show that the worst case squared error rate of convergence for the MLE is <inline-formula> <tex-math notation="LaTeX">n^{-2(\alpha -1)} </tex-math></inline-formula> for infinite alphabet size, while the minimax squared error rate is <inline-formula> <tex-math notation="LaTeX">(n\ln n)^{-2(\alpha -1)} </tex-math></inline-formula>. When <inline-formula> <tex-math notation="LaTeX">\alpha \geq 3/2 </tex-math></inline-formula>, the MLE achieves the minimax optimal rate <inline-formula> <tex-math notation="LaTeX">n^{-1} </tex-math></inline-formula> regardless of the alphabet size. As an application of the general theory, we analyze the Dirichlet prior smoothing techniques for Shannon entropy estimation. In this context, one approach is to plug-in the Dirichlet prior smoothed distribution into the entropy functional, while the other one is to calculate the Bayes estimator for entropy under the Dirichlet prior for squared error, which is the conditional expectation. We show that in general such estimators do not improve over the maximum likelihood estimator. No matter how we tune the parameters in the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation. The performance of the minimax rate-optimal estimator with <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> samples is essentially at least as good as that of Dirichlet smoothed entropy estimators with <inline-formula> <tex-math notation="LaTeX">n\ln n </tex-math></inline-formula> samples. |
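The Dirichlet prior smoothing approach mentioned in the abstract (plugging the smoothed distribution into the entropy functional) can be sketched as follows. The pseudocount parameter `a` and the helper name `dirichlet_smoothed_entropy` are illustrative assumptions, not notation from the paper; `a = 1` corresponds to Laplace (add-one) smoothing.

```python
import math
from collections import Counter

def dirichlet_smoothed_entropy(samples, S, a=1.0):
    """Plug-in entropy of the Dirichlet(a)-smoothed distribution
    p_i = (n_i + a) / (n + S*a) over an alphabet of size S."""
    n = len(samples)
    counts = Counter(samples)
    denom = n + S * a
    probs = [(counts.get(i, 0) + a) / denom for i in range(S)]
    return -sum(p * math.log(p) for p in probs)
```

With no observations the estimator returns the entropy of the prior mean, ln S; as n grows the pseudocounts wash out and it approaches the plain MLE plug-in estimate. Per the abstract, no choice of the prior parameter makes this approach minimax rate-optimal for entropy estimation.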
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT SSG-OLC-BUB SSG-OPC-BBI GBV_ILN_65 GBV_ILN_70 GBV_ILN_2002 GBV_ILN_2088 |
container_issue |
10 |
title_short |
Maximum Likelihood Estimation of Functionals of Discrete Distributions |
url |
http://dx.doi.org/10.1109/TIT.2017.2733537 http://ieeexplore.ieee.org/document/7997814 http://arxiv.org/abs/1406.6959 |
remote_bool |
false |
author2 |
Venkat, Kartik Han, Yanjun Weissman, Tsachy |
author2Str |
Venkat, Kartik Han, Yanjun Weissman, Tsachy |
ppnlink |
12954731X |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
author2_role |
oth oth oth |
doi_str |
10.1109/TIT.2017.2733537 |
up_date |
2024-07-04T01:48:21.194Z |
_version_ |
1803611224088248320 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1996944002</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20220221163315.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">171125s2017 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1109/TIT.2017.2733537</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20171228</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1996944002</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1996944002</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)a1023-d7b684d5f7b3b3ec98e8ab70e5508eb0d03460fcbfacd4b0adbed995a54f96c10</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0023448620170000063001006774maximumlikelihoodestimationoffunctionalsofdiscrete</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">620</subfield><subfield code="q">DE-600</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SA 5570</subfield><subfield code="q">AVZ</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Jiao, Jiantao</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Maximum Likelihood Estimation of 
Functionals of Discrete Distributions</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2017</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">We consider the problem of estimating functionals of discrete distributions, and focus on a tight (up to universal multiplicative constants for each specific functional) nonasymptotic analysis of the worst case squared error risk of widely used estimators. We apply concentration inequalities to analyze the random fluctuation of these estimators around their expectations and the theory of approximation using positive linear operators to analyze the deviation of their expectations from the true functional, namely their bias . We explicitly characterize the worst case squared error risk incurred by the maximum likelihood estimator (MLE) in estimating the Shannon entropy <inline-formula> <tex-math notation="LaTeX">H(P) = \sum _{i = 1}^{S} -p_{i} \ln p_{i} </tex-math></inline-formula>, and the power sum <inline-formula> <tex-math notation="LaTeX">F_\alpha (P) = \sum _{i = 1}^{S} p_{i}^\alpha ,\alpha >0 </tex-math></inline-formula>, up to universal multiplicative constants for each fixed functional, for any alphabet size <inline-formula> <tex-math notation="LaTeX">S\leq \infty </tex-math></inline-formula> and sample size <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> for which the risk may vanish. 
As a corollary, for Shannon entropy estimation, we show that it is necessary and sufficient to have <inline-formula> <tex-math notation="LaTeX">n \gg S </tex-math></inline-formula> observations for the MLE to be consistent. In addition, we establish that it is necessary and sufficient to consider <inline-formula> <tex-math notation="LaTeX">n \gg S^{1/\alpha } </tex-math></inline-formula> samples for the MLE to consistently estimate <inline-formula> <tex-math notation="LaTeX">F_\alpha (P), 0&lt;\alpha &lt;1 </tex-math></inline-formula>. The minimax rate-optimal estimators for both problems require <inline-formula> <tex-math notation="LaTeX">S/\ln S </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">S^{1/\alpha }/\ln S </tex-math></inline-formula> samples, which implies that the MLE has a strictly sub-optimal sample complexity. When <inline-formula> <tex-math notation="LaTeX">1&lt;\alpha &lt;3/2 </tex-math></inline-formula>, we show that the worst case squared error rate of convergence for the MLE is <inline-formula> <tex-math notation="LaTeX">n^{-2(\alpha -1)} </tex-math></inline-formula> for infinite alphabet size, while the minimax squared error rate is <inline-formula> <tex-math notation="LaTeX">(n\ln n)^{-2(\alpha -1)} </tex-math></inline-formula>. When <inline-formula> <tex-math notation="LaTeX">\alpha \geq 3/2 </tex-math></inline-formula>, the MLE achieves the minimax optimal rate <inline-formula> <tex-math notation="LaTeX">n^{-1} </tex-math></inline-formula> regardless of the alphabet size. As an application of the general theory, we analyze the Dirichlet prior smoothing techniques for Shannon entropy estimation. In this context, one approach is to plug the Dirichlet prior smoothed distribution into the entropy functional, while the other is to calculate the Bayes estimator for entropy under the Dirichlet prior for squared error, which is the conditional expectation. 
We show that in general such estimators do not improve over the maximum likelihood estimator. No matter how we tune the parameters in the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation. The performance of the minimax rate-optimal estimator with <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> samples is essentially at least as good as that of Dirichlet smoothed entropy estimators with <inline-formula> <tex-math notation="LaTeX">n\ln n </tex-math></inline-formula> samples.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">approximation theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">approximation using positive linear operators</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">high dimensional statistics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">maximum likelihood estimator</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Entropy estimation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Approximation methods</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Maximum likelihood estimation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Complexity theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Entropy</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Smoothing methods</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Rényi entropy</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Dirichlet prior smoothing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Statistics 
Theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computer Science</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information Theory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Mathematics</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Venkat, Kartik</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Han, Yanjun</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Weissman, Tsachy</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">IEEE transactions on information theory</subfield><subfield code="d">Piscataway, NJ : IEEE, 1963</subfield><subfield code="g">63(2017), 10, Seite 6774-6798</subfield><subfield code="w">(DE-627)12954731X</subfield><subfield code="w">(DE-600)218505-2</subfield><subfield code="w">(DE-576)01499819X</subfield><subfield code="x">0018-9448</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:63</subfield><subfield code="g">year:2017</subfield><subfield code="g">number:10</subfield><subfield code="g">pages:6774-6798</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1109/TIT.2017.2733537</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://ieeexplore.ieee.org/document/7997814</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://arxiv.org/abs/1406.6959</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield 
tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2002</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2088</subfield></datafield><datafield tag="936" ind1="r" ind2="v"><subfield code="a">SA 5570</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">63</subfield><subfield code="j">2017</subfield><subfield code="e">10</subfield><subfield code="h">6774-6798</subfield></datafield></record></collection>