A scientist studied the effects of a very strong storm on trees in a Minnesota forest as a function of tree diameter and the severity of the winds at each tree's location. The data for \(n = 659\) trees are given at: http://mathfaculty.fullerton.edu/mori/data/math534/blowBF.txt
\[ \begin{aligned} D &= \text{Diameter of the tree.}\\ S &= \text{Severity of the storm at the location of the tree, on a scale of 0 to 1, with 1 being most severe.} \\ y &= \text{0 if the tree survived and 1 if the tree was blown down.}\\ \end{aligned} \]
Here, \(y_1, \dots, y_n\) is a set of \(n\) observations with \(y_i \sim \text{Bernoulli}(\pi_i)\). We are to estimate \(\pi_i = P(y_i = 1)\), the probability that a tree is blown down, as a function of \(D\) and \(S\). In particular, we parameterize this probability by the logit function:
\[ \pi_i(\beta) = \frac{\exp(x_{i}^{T}\beta)}{1 + \exp(x_{i}^{T}\beta)} \quad \text{for } i = 1,\dots, n \]
where \(x_{i}^{T} = (x_{i1}, x_{i2}, x_{i3})\), with \(x_{i1} = 1\) the intercept, \(x_{i2} = \log(D)\), and \(x_{i3} = S\), and \(\beta = (\beta_1,\beta_2, \beta_3)^{T}\) is the vector of 3 parameters to be estimated.
The log-likelihood of the observed data is \[ \begin{aligned} \ell(\beta) &= \sum_{i=1}^{n} y_i\log(\pi_i(\beta)) + (1-y_i)\log(1 - \pi_i(\beta)) \\ &= \sum_{i=1}^{n} y_i x_{i}^{T}\beta - \log(1 + \exp(x_{i}^{T}\beta)), \end{aligned} \] since \(\log \pi_i(\beta) = x_{i}^{T}\beta - \log(1 + \exp(x_{i}^{T}\beta))\) and \(\log(1 - \pi_i(\beta)) = -\log(1 + \exp(x_{i}^{T}\beta))\).
First differential of \(\pi_i(\beta)\): \[ d\pi_i(d\beta) = \frac{(x_{i}^{T}d\beta)\, e^{x_{i}^{T}\beta}}{(1 + e^{x_{i}^{T}\beta})^{2}} \]
Second differential of \(\pi_i(\beta)\): \[ dd\pi_i(d\beta,d\beta) = \frac{(x_{i}^{T}d\beta)^{2}\, e^{x_{i}^{T}\beta}}{(1 + e^{x_{i}^{T}\beta})^{2}} - \frac{2\,(x_{i}^{T}d\beta)^{2}\, e^{2x_{i}^{T}\beta}}{(1 + e^{x_{i}^{T}\beta})^{3}} \]
\[ \begin{aligned} d\ell(d\beta) &= \sum_{i = 1}^{n} \left[y_i\frac{1}{1+e^{x_{i}^{T}\beta}}x_{i}^{T}d\beta - (1-y_i)\frac{e^{x_{i}^{T}\beta}}{1+e^{x_{i}^{T}\beta}}x_{i}^{T}d\beta\right]\\ &= \sum_{i = 1}^{n} \left(y_i - \frac{e^{x_{i}^{T}\beta}}{1+e^{x_{i}^{T}\beta}}\right)x_{i}^{T}d\beta\\ &= \sum_{i =1}^{n} (y_i - \pi_i(\beta))x_{i}^{T}d\beta = V d\beta \end{aligned} \] where \(V = \sum_{i =1}^{n} (y_i - \pi_i(\beta))x_{i}^{T}\) is a \(1 \times 3\) row vector; in matrix form, \(V^{T} = X^{T}(y - \pi(\beta))\). This implies that the partial derivatives \(\partial\ell(\beta)/\partial\beta_i\) are \[ \frac{\partial\ell(\beta)}{\partial\beta_i} = V_i \quad \text{for } i = 1,2,3. \]
datat <- read.table("http://mathfaculty.fullerton.edu/mori/data/math534/blowBF.txt", header = TRUE)
attach(datat)
n = length(datat$y)
X = cbind(rep(1, n), log(datat$D), datat$S)
Y = matrix(datat$y, nrow = n, ncol = 1)
# likelihood
likemvn <- function(datat, B){
  n = length(datat$y)
  X = cbind(rep(1, n), log(datat$D), datat$S)
  Y = matrix(datat$y, nrow = n, ncol = 1)
  sum = 0
  for(i in 1:n){
    ex <- as.numeric(exp(X[i,] %*% B)) # compute exp(x_i^T B) once per observation
    sumnd <- Y[i]*log(ex/(1+ex)) + (1-Y[i])*log(1 - ex/(1+ex))
    sum = sum + sumnd
  }
  l = sum
  grad = grad(X, Y, B, n) # gradient from the grad() function defined below
  list(l = l, grad = grad)
}
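The loop above mirrors the summation in the derivation. As a cross-check, here is an equivalent vectorized version (a sketch, not used by the routines below); it computes \(\pi_i\) with R's built-in plogis(), which equals \(\exp(x_{i}^{T}\beta)/(1+\exp(x_{i}^{T}\beta))\) but avoids overflow in exp():

loglik_vec <- function(X, Y, B){
  p <- as.vector(plogis(X %*% B)) # pi_i(beta) for all i at once
  sum(Y*log(p) + (1 - Y)*log(1 - p))
}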
# gradient
grad = function(X, Y, B, n){
  sumg = 0
  for(i in 1:n){
    ex <- as.numeric(exp(X[i,] %*% B))
    sumndg <- (Y[i] - ex/(1+ex))*X[i,] # (y_i - pi_i) x_i^T
    sumg = sumg + sumndg
  }
  sumg <- matrix(sumg, 3, 1)
  return(sumg)
}
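As a quick sanity check on the analytic gradient, we can compare it against a central finite difference of the log-likelihood. This is a sketch; B_test is an arbitrary test point, not anything from the original analysis:

B_test <- matrix(c(-1, 0.5, 0.5), 3, 1) # arbitrary test point
num_grad <- sapply(1:3, function(j){
  h <- 1e-6; e <- matrix(0, 3, 1); e[j] <- h
  (likemvn(datat, B_test + e)$l - likemvn(datat, B_test - e)$l)/(2*h)
})
cbind(analytic = as.vector(likemvn(datat, B_test)$grad), numeric = num_grad)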
# steepest ascent
optmstp <- function(datat, B, maxit){
it = 0
condition = FALSE
res <- list() # creating list to store values
while(it < maxit & condition == FALSE){
like = rep("NA", 22)
grad_norm = rep("NA", 22)
it = it + 1
halve = 0
a = likemvn(datat, B)
like[1] <- formatC(a$l, digits = 4, format = "f") # formatted log-likelihood at the current B
grad_norm[1] <- format(norm(a$grad, type = "2"), scientific = TRUE, digits = 2) # formatted gradient norm at the current B
B1 = B + a$grad # Steepest Ascent
atmp = likemvn(datat, B1)
# Following prevents the function from stopping if it approaches NaNs
while(is.nan(atmp$l) == TRUE & halve <20){
halve = halve + 1
B1 = B + a$grad/2^halve
atmp = likemvn(datat, B1)
}
like[halve+2] <- formatC(atmp$l, digits =4, format = "f")
grad_norm[halve+2] <- format(norm(atmp$grad, type = "2"), scientific = TRUE, digits = 2)
# Step-halving
while(atmp$l < a$l & halve <20){
halve = halve + 1
B1 = B + a$grad/2^halve
atmp = likemvn(datat, B1)
like[halve+2] <- formatC(atmp$l, digits =4, format = "f")
grad_norm[halve+2] <- format(norm(atmp$grad, type = "2"),scientific = TRUE, digits = 2)
}
mre_norm <- norm((B1-B)/pmax(1,abs(B1)), type = "I") # maximum relative error (MRE) for this iteration
if(halve >= 20){
print(paste("Step Halving failed after 20 atempts", it))
condition = TRUE
}
if(mre_norm <= 10^(-4)){
print(paste('Convergence Criteria Met', mre_norm, it))
condition = TRUE
}
values <- cbind(rep(it, halve + 2), c("NA", seq.int(0, halve)),
like[1:(halve + 2)], grad_norm[1:(halve + 2)], c(rep("NA", halve + 1), mre_norm))
dimnames(values) <- list(rep("", (halve+2)),c("IT", "Halving","Log-Likelihood","||gradient||", "MRE"))
res[[it]] <- values
B = B1
}
print(res[1:5], quote = FALSE) # prints first 5
print(res[(length(res)-2):length(res)], quote = FALSE) # prints last 3
mleB <<- B1 # save the MLE in the global environment for later use
}
B <- matrix(0,3,1)
maxit <- 1200
optmstp(datat, B, maxit)
## [1] "Convergence Criteria Met 8.30780454975805e-05 607"
## [[1]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 1 NA -456.7840 1.7e+02 NA
## 1 0 -104399.5101 6.3e+02 NA
## 1 1 -52199.7551 6.3e+02 NA
## 1 2 -26099.8775 6.3e+02 NA
## 1 3 -13049.9388 6.3e+02 NA
## 1 4 -6524.9694 6.3e+02 NA
## 1 5 -3262.4913 6.3e+02 NA
## 1 6 -1632.9088 6.3e+02 NA
## 1 7 -845.4744 5.7e+02 NA
## 1 8 -533.3509 3.7e+02 NA
## 1 9 -451.1653 1.4e+02 0.274532877479257
##
## [[2]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 2 NA -451.1653 1.4e+02 NA
## 2 0 NA NA NA
## 2 1 NA NA NA
## 2 2 NA NA NA
## 2 3 NA NA NA
## 2 4 -8034.4810 9.7e+02 NA
## 2 5 -3858.5437 9.7e+02 NA
## 2 6 -1779.9833 9.5e+02 NA
## 2 7 -826.6247 7.5e+02 NA
## 2 8 -509.8254 3.8e+02 NA
## 2 9 -444.1664 1.2e+02 0.264789493142006
##
## [[3]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 3 NA -444.1664 1.2e+02 NA
## 3 0 -71781.6175 6.3e+02 NA
## 3 1 -35905.0973 6.3e+02 NA
## 3 2 -17966.8372 6.3e+02 NA
## 3 3 -8997.7071 6.3e+02 NA
## 3 4 -4513.1422 6.3e+02 NA
## 3 5 -2271.0244 6.3e+02 NA
## 3 6 -1158.2708 6.2e+02 NA
## 3 7 -653.5970 4.9e+02 NA
## 3 8 -480.2722 2.8e+02 NA
## 3 9 -440.1489 1e+02 0.184019118125658
##
## [[4]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 4 NA -440.1489 1e+02 NA
## 4 0 NA NA NA
## 4 1 NA NA NA
## 4 2 NA NA NA
## 4 3 NA NA NA
## 4 4 -5545.0425 9.7e+02 NA
## 4 5 -2635.5846 9.6e+02 NA
## 4 6 -1214.8275 8.9e+02 NA
## 4 7 -632.0816 6e+02 NA
## 4 8 -467.2475 2.7e+02 NA
## 4 9 -435.7997 9e+01 0.190815471290533
##
## [[5]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 5 NA -435.7997 9e+01 NA
## 5 0 -51165.8741 6.3e+02 NA
## 5 1 -25605.5854 6.3e+02 NA
## 5 2 -12825.4411 6.3e+02 NA
## 5 3 -6435.3689 6.3e+02 NA
## 5 4 -3240.3379 6.3e+02 NA
## 5 5 -1644.1610 6.3e+02 NA
## 5 6 -868.5748 5.8e+02 NA
## 5 7 -548.9491 4.1e+02 NA
## 5 8 -452.6562 2.1e+02 NA
## 5 9 -432.5543 7.9e+01 0.126895198259121
##
## [[1]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 605 NA -281.9534 9.7e-02 NA
## 605 0 -282.2357 1.8e+01 NA
## 605 1 -282.0219 9e+00 NA
## 605 2 -281.9694 4.5e+00 NA
## 605 3 -281.9568 2.2e+00 NA
## 605 4 -281.9540 1.1e+00 NA
## 605 5 -281.9534 5.4e-01 0.000304229562372205
##
## [[2]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 606 NA -281.9534 5.4e-01 NA
## 606 0 -363.5547 3e+02 NA
## 606 1 -302.3803 1.5e+02 NA
## 606 2 -287.0193 7.6e+01 NA
## 606 3 -283.1999 3.8e+01 NA
## 606 4 -282.2556 1.9e+01 NA
## 606 5 -282.0243 9e+00 NA
## 606 6 -281.9688 4.3e+00 NA
## 606 7 -281.9561 1.9e+00 NA
## 606 8 -281.9535 6.7e-01 NA
## 606 9 -281.9531 1.1e-01 0.000314638923434959
##
## [[3]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 607 NA -281.9531 1.1e-01 NA
## 607 0 -283.0717 3.6e+01 NA
## 607 1 -282.2311 1.8e+01 NA
## 607 2 -282.0213 8.9e+00 NA
## 607 3 -281.9695 4.4e+00 NA
## 607 4 -281.9569 2.2e+00 NA
## 607 5 -281.9539 1.1e+00 NA
## 607 6 -281.9532 5.1e-01 NA
## 607 7 -281.9531 2.3e-01 8.30780454975805e-05
|    | MLE |
|----|-----------|
| b0 | -9.508914 |
| b1 | 3.177876 |
| b2 | 4.486720 |

These estimates differ slightly from the Fisher-scoring estimates below because steepest ascent was stopped at the looser MRE tolerance of \(10^{-4}\).
The second differential of \(\ell(\beta)\) is \[ dd\ell(d\beta,d\beta) = \sum_{i =1}^{n} -x_{i}^{T}d\beta\left(\frac{e^{x_{i}^{T}\beta}}{(1+e^{x_{i}^{T}\beta})^2}\right)x_{i}^{T}d\beta \] Let \(W_i=\frac{e^{x_{i}^{T}\beta}}{(1+e^{x_{i}^{T}\beta})^2}\), a scalar. Then \(-E[dd\ell(d\beta,d\beta)]\) is \[ -E[dd\ell(d\beta,d\beta)] = \sum_{i =1}^{n} (x_{i}^{T}d\beta)\, W_i\, (x_{i}^{T}d\beta) \]
Furthermore, for \(k = l\), \[ \frac{\partial^{2}\ell}{\partial \beta_{k}\partial \beta_{l}} = -\sum_{i =1}^{n} x_{ik}\, W_i\, x_{ik} \] For \(k \neq l\), \[ \frac{\partial^{2}\ell}{\partial \beta_{k}\partial \beta_{l}} = -\sum_{i =1}^{n} x_{ik}\, W_i\, x_{il} \] This implies that the entries of the Fisher information matrix are, for both \(k = l\) and \(k \neq l\), \[ - E\left[\frac{\partial^{2}\ell}{\partial \beta_{k}\partial \beta_{l}}\right] = \sum_{i =1}^{n} x_{ik}\, W_i\, x_{il} \] We compute the Fisher information matrix from this single formula, since it covers both cases \(k \neq l\) and \(k = l\).
# Fisher Information Matrix
fish <- function(datat, B){
  n = length(datat$y)
  X = cbind(rep(1, n), log(datat$D), datat$S)
  f_mat <- matrix(0, 3, 3)
  for(k in 1:3){
    for(l in 1:3){
      for(i in 1:n){
        ex <- as.numeric(exp(X[i,] %*% B))
        sumnd <- X[i,k]*(ex/(1+ex)^2)*X[i,l] # x_ik W_i x_il with W_i = e^(x'B)/(1+e^(x'B))^2
        f_mat[k,l] <- f_mat[k,l] + sumnd
      }
    }
  }
  return(f_mat)
}
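Equivalently, the triple loop computes \(I(\beta) = X^{T} W X\) with \(W = \mathrm{diag}(W_1, \dots, W_n)\) and \(W_i = \pi_i(1-\pi_i)\). A vectorized sketch of the same matrix, useful for cross-checking fish():

fish_vec <- function(X, B){
  p <- as.vector(plogis(X %*% B)) # pi_i(beta)
  crossprod(X, p*(1 - p)*X)       # t(X) %*% diag(p(1-p)) %*% X without forming diag
}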
# Fisher-Scoring
optmfish <- function(datat, B, maxit){
it = 0
condition = FALSE
res <- list()
while(it < maxit & condition == FALSE){
like = rep("NA", 22)
grad_norm = rep("NA", 22)
it = it + 1
halve = 0
a = likemvn(datat, B)
like[1] <- formatC(a$l, digits =4, format = "f")
grad_norm[1] <- format(norm(a$grad, type = "2"), scientific = TRUE, digits = 2) # formatted gradient norm at the current B
# Fisher Info
I = fish(datat,B)
invI = solve(I)
B1 = B + invI%*%a$grad
atmp = likemvn(datat, B1)
while(is.nan(atmp$l) == TRUE & halve <20){
halve = halve + 1
B1 = B + invI%*%a$grad/2^halve # Fisher Scoring
atmp = likemvn(datat, B1)
}
like[halve+2] <- formatC(atmp$l, digits =4, format = "f")
grad_norm[halve+2] <- format(norm(atmp$grad, type = "2"), scientific = TRUE, digits = 2)
while(atmp$l < a$l & halve <20){
halve = halve + 1
B1 = B + invI%*%a$grad/2^halve
atmp = likemvn(datat, B1)
like[halve+2] <- formatC(atmp$l, digits =4, format = "f")
grad_norm[halve+2] <- format(norm(atmp$grad, type = "2"),scientific = TRUE, digits = 2)
}
mre_norm <- norm((B1-B)/pmax(1,abs(B1)), type = "I")
if(halve >= 20){
print(paste("Step Halving failed after 20 atempts", it))
condition = TRUE
}
if(mre_norm <= 10^(-6)){
print(paste('Convergence Criteria Met', mre_norm, it))
condition = TRUE
}
values <- cbind(rep(it, halve + 2), c("NA", seq.int(0, halve)),
like[1:(halve + 2)], grad_norm[1:(halve + 2)], c(rep("NA", halve + 1), mre_norm))
dimnames(values) <- list(rep("", (halve+2)),c("IT", "Halving","Log-Likelihood","||gradient||", "MRE"))
res[[it]] <- values
B = B1
}
print(res[1:5], quote = FALSE)
print(res[(length(res)-2):length(res)], quote = FALSE)
mleB <<- B1 # save the MLE in the global environment for later use
}
B <- matrix(0,3,1)
maxit <- 1200
optmfish(datat,B,maxit)
## [1] "Convergence Criteria Met 9.74061166215013e-07 78"
## [[1]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 1 NA -456.7840 1.7e+02 NA
## 1 0 -347.7468 9e+01 1
##
## [[2]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 2 NA -347.7468 9e+01 NA
## 2 0 -314.6320 5.5e+01 0.348326346654852
##
## [[3]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 3 NA -314.6320 5.5e+01 NA
## 3 0 -300.6367 3.7e+01 0.182343471553454
##
## [[4]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 4 NA -300.6367 3.7e+01 NA
## 4 0 -293.4913 2.7e+01 0.114705559049553
##
## [[5]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 5 NA -293.4913 2.7e+01 NA
## 5 0 -289.4290 2e+01 0.0794840532645471
##
## [[1]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 76 NA -281.9505 4.3e-04 NA
## 76 0 -281.9505 3.8e-04 1.29565165419555e-06
##
## [[2]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 77 NA -281.9505 3.8e-04 NA
## 77 0 -281.9505 3.3e-04 1.12340704889714e-06
##
## [[3]]
## IT Halving Log-Likelihood ||gradient|| MRE
## 78 NA -281.9505 3.3e-04 NA
## 78 0 -281.9505 2.8e-04 9.74061166215013e-07
|    | MLE |
|----|-----------|
| b0 | -9.562026 |
| b1 | 3.197544 |
| b2 | 4.508565 |
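As a sanity check (not part of the original routines), R's built-in glm() maximizes the same likelihood by iteratively reweighted least squares, so its coefficients should closely match mleB; approximate standard errors come from the inverse of the Fisher information evaluated at the MLE:

fit <- glm(y ~ log(D) + S, family = binomial, data = datat)
coef(fit)                            # compare to mleB
sqrt(diag(solve(fish(datat, mleB)))) # approximate standard errors of the MLEs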
s = seq(0,1, length.out = 1000) # range for wind severity
p = rep(NA, 1000) # vector for probabilities
for(i in 1:length(s)){
x = matrix(c(1, log(10), s[i]), 1, 3) # diameter fixed at 10; severity ranges over [0, 1]
ex <- exp(x%*%mleB) # mleB is obtained from Fisher scoring
p[i] = ex/(1+ex) # This is P(y_i = 1)
}
plot(s,p, type = 'l', xlab = "Wind Severity", ylab = "Probability Tree is Blown Down")
The plot shows an almost linear relationship between wind severity and the probability that a tree of diameter 10 is blown down, as is logically expected. Further evidence for this can be seen by taking the mean value of \(y\) over the observed trees with diameter 10 and severity of at least .7; this empirical estimate is similar to the probabilities on the plot.
d10 <- datat[which(datat$D == 10),]
pred = mean(d10[which(d10$S>=.7),]$y)
pred = matrix(pred, 1, 1, dimnames = list(c('Probability 10 Diameter Tree Falls'), c("Prediction")))
library(knitr)      # kable()
library(kableExtra) # kable_styling() and the %>% pipe
pred %>%
kable() %>%
kable_styling(full_width = F)
|  | Prediction |
|---|---|
| Probability 10 Diameter Tree Falls | 0.8333333 |
d = seq(5,30)
p = rep(NA, length(d)) # range of tree diameter
for(i in 1:length(d)){
x = matrix(c(1, log(d[i]), .4), 1, 3) # severity held constant at .4; diameter varies
ex <- exp(x%*%mleB) # mleB is obtained from Fisher scoring
p[i] = ex/(1+ex) # This is P(y_i = 1)
}
plot(d,p, type = 'l', xlab = "Tree Diameter", ylab = "Probability Tree is Blown Down")
The relationship between tree diameter and the probability of being blown down is much steeper than the one found in part (h). Once the diameter of a tree reaches about 10, its chance of being blown down increases at a much higher rate. This is logically sound, as wider trees have more surface area, which exposes them to more wind damage. The efficacy of this graph in explaining the data can also be demonstrated by calculating, from our data, the average diameter of a blown-down tree at severities near .40; it coincides with the diameter at which the fitted probability on the plot reaches .5 (roughly 12).
s40 <- datat[which(datat$S>= .39 & datat$S<=.41),]
pred = mean(s40[which(s40$y ==1),]$D)
pred = matrix(pred, 1,1, dimnames = list(c('Avg. Diameter Fallen Tree for Severity .40'), c("Prediction")))
pred %>%
kable() %>%
kable_styling(full_width = F)
|  | Prediction |
|---|---|
| Avg. Diameter Fallen Tree for Severity .40 | 11.4 |
d = seq(5,30)
s = seq(0,1,length.out = length(d))
prob <- function(d, s){
  len = length(d)
  r = numeric(len) # outer() passes d and s as equal-length vectors
  mleB <- c(-9.562026, 3.197544, 4.508565) # MLE from Fisher scoring above
  for(i in 1:len){
    x = matrix(c(1, log(d[i]), s[i]), 1, 3)
    ex <- exp(x%*%mleB)
    r[i] <- ex/(1+ex)
  }
  r
}
threed <- outer(d, s, prob) # rows indexed by diameter, columns by severity
library(plot3D)
# transpose threed so severity is on the x-axis and diameter on the y-axis, matching the labels
persp3D(x = s, y = d, z = t(threed), col = "red", alpha = .5, xlab = "Severity", ylab = "Diameter", zlab = "Probability Y = 1", main = "")
contour(x = s, y = d, z = t(threed), xlab = "Severity", ylab = "Diameter")
Although the 3D plot provides a unique visual of the joint effect of diameter and severity on the probability of a tree being blown down, the constant-value contours display their relationship more precisely. As the earlier single-variable plots demonstrated, a change in diameter has a more prominent effect on the probability that \(y = 1\) than a change in severity, and the probability responds to changes in severity much more sharply at larger diameters. For example, the probability that a tree of diameter 10 is blown down in severity-.8 winds is roughly equal to the probability that a tree of diameter about 23 is blown down in severity-.2 winds: both are approximately .8. Again, as larger trees have more surface area, changes in severity have much larger effects on their survival probabilities.
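That reading of the contours can be checked directly with the fitted coefficients. This is a sketch: invlogit is a small helper defined here (not from the original code), and mleB is the Fisher-scoring MLE from above:

invlogit <- function(z) exp(z)/(1 + exp(z))
invlogit(sum(c(1, log(10), 0.8)*mleB)) # roughly 0.80
invlogit(sum(c(1, log(23), 0.2)*mleB)) # roughly 0.80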