The code below uses the EM algorithm derived in class.
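Written out, the update the code implements is (reading the E- and M-steps directly off the code; \(x_2\) denotes the expected latent portion of the first cell count \(y_1 = 1997\)):

\[ x_2^{(n)} = y_1\,\frac{\theta^{(n)}/4}{\tfrac{1}{2} + \theta^{(n)}/4}, \qquad \theta^{(n+1)} = \frac{x_2^{(n)} + y_4}{x_2^{(n)} + y_2 + y_3 + y_4} \]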
em.gen <- function(theta0, tolerr, rate = FALSE, IT = TRUE){
  y <- c(1997, 907, 904, 32)    # observed counts
  d_sum <- sum(y[2:4])
  maxit <- 200
  values <<- matrix(0, ncol = 3, nrow = maxit,
                    dimnames = list(rep("", maxit), c("IT", "Theta(n)", "MRE")))
  condition <- FALSE
  it <- 0
  while(it < maxit & condition == FALSE){
    it <- it + 1
    x2 <- y[1]*((theta0/4)/(.5 + (theta0/4)))   # E-step: expected latent count
    theta1 <- (x2 + y[4])/(x2 + d_sum)          # M-step: update theta
    mre <- abs(theta1 - theta0)/(max(1, abs(theta1)))   # modified relative error
    values[it, ] <<- c(format(it, digits = 0),
                       formatC(theta1, digits = 12, format = "f"),
                       format(x = mre, scientific = TRUE, digits = 2))   # store/format
    if(mre < tolerr){
      condition <- TRUE
    }
    theta0 <- theta1   # iterative step
  }
  if(IT == TRUE){
    print(values[1:it, ], quote = FALSE)   # print the iteration history
  }
  # Print rate-of-convergence ratios ||e^(n+1)|| / ||e^(n)||^B for B = 1, 2, 3/2,
  # using the last two iterates and the exact solution theta* = 0.03567466:
  if(rate == TRUE){
    print(paste('B = 1,', norm(as.numeric(values[it, 2]) - 0.03567466, type = "2")/(
      norm(as.numeric(values[it - 1, 2]) - 0.03567466, type = "2"))))
    print(paste('B = 2,', norm(as.numeric(values[it, 2]) - 0.03567466, type = "2")/(
      norm(as.numeric(values[it - 1, 2]) - 0.03567466, type = "2")^(2))))
    print(paste('B = 3/2,', norm(as.numeric(values[it, 2]) - 0.03567466, type = "2")/(
      norm(as.numeric(values[it - 1, 2]) - 0.03567466, type = "2")^(3/2))))
  }
}
em.gen(.02,10^(-6))
## IT Theta(n) MRE
## 1 0.027793132773 7.8e-03
## 2 0.031742941132 3.9e-03
## 3 0.033721123764 2e-03
## 4 0.034705946241 9.8e-04
## 5 0.035194771667 4.9e-04
## 6 0.035437045211 2.4e-04
## 7 0.035557033560 1.2e-04
## 8 0.035616437341 5.9e-05
## 9 0.035645841640 2.9e-05
## 10 0.035660395186 1.5e-05
## 11 0.035667598091 7.2e-06
## 12 0.035671162906 3.6e-06
## 13 0.035672927162 1.8e-06
## 14 0.035673800302 8.7e-07
The exact solution to this problem is \(\theta^{*} = (-1657 + \sqrt{3728689})/7680 \approx 0.03567466\). Numerically determine whether the EM algorithm is linearly, superlinearly, or quadratically convergent for this problem.
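A one-line numeric check of this closed-form value (simply evaluating the expression above in R):

(-1657 + sqrt(3728689))/7680   # approximately 0.03567466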
Note for the formula below:

- If \(0 < C < 1\) and \(B = 1\), the algorithm converges linearly.
- If \(0 < C < 1\) and \(B = 2\), the algorithm converges quadratically.
- If \(C = 0\) and \(1 < B < 2\), the algorithm converges superlinearly.
\[ \lim_{n\to\infty} \frac{\left\| e^{n+1}\right\|}{\left\| e^{n}\right\|^{B}} = C \]
where
\[ e^{n} = x_n - x^{*} \]
For our calculations, we use the true value \(\theta^{*}\): the estimate from the last iteration gives \(e^{n+1}\), and the estimate from the previous iteration gives \(e^{n}\). At the point of convergence, the estimated speed of convergence is
em.gen(.02, 10^(-6), rate = TRUE, IT = FALSE)
## [1] "B = 1, 0.496121391611294"
## [1] "B = 2, 286305.69713457"
## [1] "B = 3/2, 376.885103033597"
If we increase the number of iterations by tightening the MRE tolerance, we get
em.gen(.02, 10^(-10), rate = TRUE, IT = FALSE)
## [1] "B = 1, 0.978679503991574"
## [1] "B = 2, 224364856.624001"
## [1] "B = 3/2, 14818.2754257679"
We can see here that the true value of B is most likely 1: the B = 1 ratio stays between 0 and 1, while the ratios for B = 2 and B = 3/2 grow as the tolerance tightens. This implies the algorithm converges linearly.
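As an additional check, the iteration-by-iteration error ratios \(\|e^{n+1}\|/\|e^{n}\|\) can be computed directly from the stored iteration history; for linear convergence they should settle near a constant in \((0,1)\). A small sketch (it re-runs the algorithm so the global `values` matrix is fresh; for this problem the ratios hover near 0.5, consistent with the B = 1 value of 0.496 above):

em.gen(.02, 10^(-6), IT = FALSE)              # refresh the global values matrix
theta_star <- (-1657 + sqrt(3728689))/7680    # exact solution
theta_hist <- as.numeric(values[, 2])
theta_hist <- theta_hist[theta_hist != 0]     # keep only the iterations actually run
err <- abs(theta_hist - theta_star)
err[-1]/err[-length(err)]                     # successive ratios ||e^(n+1)|| / ||e^(n)||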
#### A. Plot a density histogram of the given data, superimposed with a kernel smoother.
data = read.csv("http://mathfaculty.fullerton.edu/mori/Math534/Homework/Chapter%204/ExJ42.txt", header = TRUE)
attach(data)
data_df = data.frame(data)
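The plotting call itself is not shown above; a minimal sketch of what part A asks for, assuming base R graphics and the column name `y` from the file read above (the number of bins and colors are arbitrary choices):

# Density histogram of y with a kernel density estimate overlaid
hist(data$y, freq = FALSE, breaks = 30, col = "grey90",
     main = "Density histogram of y with kernel smoother", xlab = "y")
lines(density(data$y), col = "blue", lwd = 2)   # default Gaussian kernel smoother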
The EM algorithm below fits a three-component normal mixture with parameters \(\alpha, \beta, \mu_1, \mu_2, \mu_3\), and \(\sigma^{2}\).

Provide initial values for \(\alpha, \beta, \mu_1, \mu_2, \mu_3\), and \(\sigma^{2}\).

E-step. Calculate \(E^{*}(Z_{ij})\):
\[ E^{*}(Z_{ij}) = \frac{\pi_{j}\,f(y_{i}\mid\mu_j,\sigma^{2})}{\sum_{k = 1}^{K}\pi_{k}\,f(y_{i}\mid\mu_k,\sigma^{2})} \]
M-step. Update the parameter estimates:

\[ \hat{\mu}_k = \frac{\sum_{i = 1}^{N} E^{*}(Z_{ik})\,y_i}{\sum_{i = 1}^{N} E^{*}(Z_{ik})} \]

\[ \hat{\sigma}^{2} = \frac{\sum_{i = 1}^{N}\sum_{k = 1}^{K} \left[E^{*}(Z_{ik})(y_i-\mu_k)^2 \right] }{\sum_{i = 1}^{N} \sum_{k = 1}^{K}\left[ E^{*}(Z_{ik})\right]} \]

\[ \hat{\alpha} = \frac{\sum_{i = 1}^{N} E^{*}(Z_{i1})}{N} \]

\[ \hat{\beta} = \frac{\sum_{i = 1}^{N} E^{*}(Z_{i2})}{N} \]
\[ \text{Relative error} \approx \left\|\frac{ \theta^{(n+1)} - \theta^{(n)}}{\max\left(1,\left|\theta^{(n+1)}\right|\right)}\right\| \]
theta0 <- matrix(c(1/3, 1/3, -2, 2, 6, 1), 6, 1)   # initial (alpha, beta, mu1, mu2, mu3, sigma^2), approximated from the histogram
y = data$y
em.gauss <- function(y, theta0){
  it = 0
  condition = FALSE
  res <- list()
  while(it < 200 & condition == FALSE){
    it = 1 + it
    p = c(theta0[1], theta0[2], (1 - theta0[1] - theta0[2]))   # mixing proportions (alpha, beta, 1 - alpha - beta)
    mu1 = theta0[3]; mu2 = theta0[4]; mu3 = theta0[5]; s2 = theta0[6]
    # E-step: component densities and posterior membership probabilities
    f1 = dnorm(y, mean = mu1, sd = sqrt(s2))
    f2 = dnorm(y, mean = mu2, sd = sqrt(s2))
    f3 = dnorm(y, mean = mu3, sd = sqrt(s2))
    N1 = f1*p[1]
    N2 = f2*p[2]
    N3 = f3*p[3]
    D = N1 + N2 + N3
    Post1 = N1/D   # posterior probability of belonging to group 1
    Post2 = N2/D   # posterior probability of belonging to group 2
    Post3 = N3/D   # posterior probability of belonging to group 3
    Post <<- cbind(Post1, Post2, Post3)   # stored globally; reused for classification in part E
    # M-step: update alpha, beta, the means, and the common variance
    theta1 <- numeric(6)
    theta1[1:2] = (apply(Post, 2, sum)/length(y))[1:2]
    theta1[3] <- sum(Post1*y)/sum(Post1)
    theta1[4] <- sum(Post2*y)/sum(Post2)
    theta1[5] <- sum(Post3*y)/sum(Post3)
    theta1[6] <- sum(Post1*(y - mu1)^2 + Post2*(y - mu2)^2 +
                       Post3*(y - mu3)^2)/sum(Post1 + Post2 + Post3)
    mre = norm((theta1 - theta0)/pmax(1, abs(theta1)), type = "2")   # relative error
    if(mre < 10^(-6)){
      condition = TRUE
    }
    values <- cbind(it, t(theta1), mre)
    dimnames(values) <- list("", c("IT", "Alpha", "Beta", "Mu1", "Mu2", "Mu3", "Sigma^2", "MRE"))
    res[[it]] <- values   # keep the full iteration history
    theta0 = theta1
  }
  res
}
a <- em.gauss(y,theta0)
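The returned list holds one row per iteration, so the converged estimates sit in the last element (a small usage note; for the run shown below the last iteration is 60, which is why `a[[60]]` appears in part D):

final <- a[[length(a)]]   # converged estimates; same as a[[60]] for this run
final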
IT | Alpha | Beta | Mu1 | Mu2 | Mu3 | Sigma^2 | MRE |
---|---|---|---|---|---|---|---|
1 | 0.1696706 | 0.4120828 | -1.231056 | 2.142406 | 5.173012 | 1.427437 | 0.7367365 |
2 | 0.1796142 | 0.3690914 | -1.073349 | 2.089254 | 5.006117 | 1.263665 | 0.2051593 |
3 | 0.1902183 | 0.3326898 | -1.011493 | 2.004341 | 4.953239 | 1.141629 | 0.1360620 |
… | … | … | … | … | … | … | … |
58 | 0.1808103 | 0.2954493 | -1.099085 | 1.680808 | 4.849162 | 1.005054 | 0.0000015 |
59 | 0.1808101 | 0.2954493 | -1.099086 | 1.680807 | 4.849161 | 1.005054 | 0.0000012 |
60 | 0.1808100 | 0.2954494 | -1.099087 | 1.680806 | 4.849161 | 1.005054 | 0.0000010 |
#### D. Again graph a density histogram of the data, and superimpose on it the three-component normal mixture density with the parameters obtained above.
x1fit <- seq(-4, 4, length = 40)
y1fit <- a[[60]][2]*dnorm(x1fit, mean = a[[60]][4], sd = sqrt(a[[60]][7]))   # alpha-weighted component 1
x2fit <- seq(-2, 6, length = 40)
y2fit <- a[[60]][3]*dnorm(x2fit, mean = a[[60]][5], sd = sqrt(a[[60]][7]))   # beta-weighted component 2
x3fit <- seq(0, 8, length = 40)
y3fit <- (1 - sum(a[[60]][2:3]))*dnorm(x3fit, mean = a[[60]][6], sd = sqrt(a[[60]][7]))   # (1 - alpha - beta)-weighted component 3
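The plotting commands are not shown above; a minimal sketch of the superimposed plot, assuming base R graphics and the three weighted component curves computed above (bin count and colors are arbitrary choices):

# Density histogram with the three weighted mixture components overlaid
hist(data$y, freq = FALSE, breaks = 30, col = "grey90",
     main = "Three-component normal mixture fit", xlab = "y")
lines(x1fit, y1fit, col = "red", lwd = 2)
lines(x2fit, y2fit, col = "green", lwd = 2)
lines(x3fit, y3fit, col = "blue", lwd = 2)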
#### E. For each case, use its corresponding posterior probability to determine to which group it is classified. You will compute the probabilities of belonging to group 1, 2, or 3 for each case and then assign the case to the group with the highest posterior probability. Then show your result graphically as follows: draw a graph with the x-axis indicating the case number and the y-axis indicating the group number (1, 2, or 3). Do you see a pattern, and if so, what does the pattern indicate about the data?
Note: Post was stored in the global environment by em.gauss.
class = numeric(length(data$y))
for(i in 1:length(data$y)){
  class[i] = which.max(Post[i, ])   # group with the highest posterior probability
}
compare <- cbind(seq_len(length(data$y)), class)   # case number and assigned group
red <- compare[compare[, 2] == 1, 1]     # case numbers assigned to group 1
green <- compare[compare[, 2] == 2, 1]   # case numbers assigned to group 2
blue <- compare[compare[, 2] == 3, 1]    # case numbers assigned to group 3
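The plot itself is not shown above; a minimal sketch of the graph described in part E, assuming base R graphics and the red/green/blue case-number vectors computed above:

# Case number (x-axis) vs. assigned group (y-axis), colored by group
plot(seq_along(class), class, type = "n", yaxt = "n",
     xlab = "Case number", ylab = "Assigned group",
     main = "Group assignment by highest posterior probability")
axis(2, at = 1:3)
points(red,   rep(1, length(red)),   col = "red")
points(green, rep(2, length(green)), col = "green")
points(blue,  rep(3, length(blue)),  col = "blue")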
Notice that the second distribution, the one with mean Mu2, has the most misclassified data points because it overlaps with the two distributions beside it. For the first and third distributions, the misclassified points occur, as expected, along the borders where they overlap with the second distribution. The pattern here is that when component means are close together and the mixing proportions are large, as with the second and third distributions, data points are more likely to be misclassified.