Problem 1

a)

Given that \(Y_{ij} \sim N(\alpha_{i} + \beta_{i}(x_{j}-\bar{x}), \sigma^{2})\) independently, the log-likelihood can be written as follows:

\[ l(\boldsymbol{\alpha}, \boldsymbol{\beta}, \sigma^{2}\mid \mathbf{y}) = -\frac{150}{2}\left[\log(2\pi) + \log(\sigma^{2})\right] - \frac{1}{2\sigma^2}\sum_{i = 1}^{30} \sum_{j = 1}^{5} \left(y_{ij} - \alpha_{i} - \beta_{i}(x_j - \bar{x}) \right)^{2} \]

Now we’ll find the maximum likelihood estimates for \(\alpha_{i}, \beta_{i}\), and \(\sigma^{2}\) by taking the partial derivatives and equating them to 0.

MLE for \(\alpha_{i}\):

\[ \begin{aligned} \frac{\partial l}{\partial \alpha_{i}} &= \frac{1}{\sigma^{2}}\sum_{j=1}^{5} \left(y_{ij} - \alpha_{i} - \beta_{i}(x_j - \bar{x}) \right)=0 \\ &\implies \sum_{j=1}^{5}y_{ij} - 5\alpha_{i} - \beta_{i}\sum_{j=1}^{5}(x_{j}-\bar{x}) =0 \end{aligned} \]

Note that \(\sum_{j=1}^{5}(x_{j}-\bar{x}) = 0\). Therefore, the MLE of \(\alpha_{i}\) satisfies:

\[ \frac{\partial l}{\partial \alpha_{i}} = -5\alpha_{i} + \sum_{j=1}^{5}y_{ij} =0 \] \[ \vdots\\ \] \[ \alpha_{i} = \frac{1}{5}\sum_{j = 1}^{5}y_{ij} \]

MLE for \(\beta_{i}\):

\[ \begin{aligned} \frac{\partial l}{\partial \beta_{i}} &= \frac{1}{\sigma^{2}}\sum_{j=1}^{5} \left(y_{ij} - \alpha_{i} - \beta_{i}(x_j - \bar{x}) \right)(x_{j} - \bar{x}) = 0\\ &\implies \sum_{j = 1}^{5}y_{ij}(x_j -\bar{x}) - \alpha_{i}\sum_{j = 1}^{5}(x_{j}-\bar{x}) - \beta_{i}\sum_{j = 1}^{5}(x_{j}-\bar{x})^2 = 0 \end{aligned} \]

Note that \(\alpha_{i}\sum_{j = 1}^{5}(x_{j}-\bar{x}) = 0\), which implies

\[ \frac{\partial l}{\partial \beta_{i}} = \sum_{j = 1}^{5}y_{ij}(x_j -\bar{x}) - \beta_{i}\sum_{j = 1}^{5}(x_{j}-\bar{x})^2 = 0 \]

\[ \vdots \\ \]

\[ \beta_{i}= \frac{\sum_{j = 1}^{5}y_{ij}(x_j -\bar{x})}{\sum_{j = 1}^{5}(x_{j}-\bar{x})^2} \]

MLE for \(\sigma^{2}\):

\[ \begin{aligned} \frac{\partial l}{\partial \sigma^{2}} &= \frac{-150}{2\sigma^2} + \frac{1}{2(\sigma^{2})^2}\sum_{i = 1}^{30} \sum_{j = 1}^{5} (y_{ij} - \alpha_{i} - \beta_{i}(x_{j} - \bar{x}))^{2} = 0\\ &\implies -150\sigma^{2} + \sum_{i = 1}^{30} \sum_{j = 1}^{5} (y_{ij} - \alpha_{i} - \beta_{i}(x_{j} - \bar{x}))^{2} = 0 \\ \end{aligned} \]

\[ \vdots\\ \]

\[ \sigma^{2} = \frac{1}{150} \sum_{i = 1}^{30} \sum_{j = 1}^{5} (y_{ij} - \alpha_{i} - \beta_{i}(x_{j} - \bar{x}))^{2} \]

data = read.csv("/Users/josegonzalez/Downloads/rats_data.csv", header = TRUE)
y = as.matrix(data[,-1])
## First six rows of y:
##  y1  y2  y3  y4  y5
## 151 199 246 283 320
## 145 199 249 293 354
## 147 214 263 312 328
## 155 200 237 272 297
## 135 188 230 280 323
## 159 210 252 298 331
alpha = numeric(30);beta = numeric(30);

xj_xb = c(8,15,22,29,36)-c(22,22,22,22,22) # x_{j} - xbar
alpha = rowSums(y)/5 # MLE for alpha_i's
beta = y%*%xj_xb/sum(xj_xb^2) #MLE for beta_i's

mle_ab = cbind(alpha,beta); colnames(mle_ab) = c("MLE Alpha_i", "MLE Beta_i") 
bxj = c()

for(i in 1:length(xj_xb)){
  temp = xj_xb[i]*beta
  bxj = cbind(bxj,temp)
}
sig = sum(rowSums(((y- alpha) - bxj)^2))/(150); names(sig) = c("MLE of Sigma^2") # MLE for sigma^{2}

The final maximum likelihood estimates for \(\alpha_{i}, \beta_{i}, \sigma^{2}\) are printed below:

MLE.Alpha_i MLE.Beta_i
1 239.8 6.028571
2 248.0 7.314286
3 252.8 6.571429
4 232.2 5.085714
5 231.2 6.685714
6 250.0 6.171429
7 228.2 5.914286
8 248.6 6.485714
9 284.8 7.314286
10 218.4 5.742857
11 258.8 6.985714
12 227.6 6.100000
13 242.4 6.157143
14 269.2 6.842857
15 242.8 5.185714
16 245.4 5.842857
17 231.8 6.300000
18 240.4 5.742857
19 254.2 6.471429
20 241.6 6.014286
21 248.8 6.471429
22 224.6 5.757143
23 228.0 5.614286
24 245.2 5.800000
25 234.2 7.128571
26 254.4 6.657143
27 254.8 5.814286
28 243.0 5.742857
29 217.0 5.514286
30 241.4 6.114286
MLE of Sigma^2
21.70533
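
As a quick cross-check (a minimal sketch, assuming the `y` and `xj_xb` objects defined in the code above), the same per-rat estimates can be recovered by fitting an ordinary least-squares line to each rat's five weights against the centred times, since for this model the MLEs coincide with the per-rat OLS estimates:

ols_check = t(sapply(1:30, function(i) coef(lm(y[i, ] ~ xj_xb)))) # intercept = alpha_i, slope = beta_i
colnames(ols_check) = c("OLS Alpha_i", "OLS Beta_i")
head(ols_check) # should agree with the MLE table above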

b)

i. Gibbs-Sampling Conditional Distributions

The goal is to generate values from the posterior distribution of \((\alpha_{1},\dots,\alpha_{30},\beta_{1},\dots,\beta_{30},\tau, \alpha,\beta)\) through Gibbs sampling, using the full conditional distributions of these parameters.

The posterior distribution in question is proportional to the joint distribution of the data and \((\alpha_{1},\dots,\alpha_{30},\beta_{1},\dots,\beta_{30},\tau, \alpha,\beta)\), from which we can derive the required conditional distributions. Since we know how the parameters above are distributed, and we know the likelihood, we can write the joint distribution as follows:

\[ \prod_{i = 1}^{30}\prod_{j = 1}^{5} \left[f(\mathbf{y_{ij}}|\alpha_{i},\beta_{i},\sigma^{2}) \right]\prod_{i = 1}^{30} \left[ f(\alpha_{i}|\alpha) f(\beta_{i}|\beta) \right] f(\alpha)f(\beta)f(\tau) \]

Conditional for \(\tau\):

\[ f(\tau|\mathbf{y_{ij},\alpha_{i},\beta_{i}}, \alpha, \beta) \propto \tau^{10^{-3} + 75 - 1} \exp \left( -\tau \left(10^{-3} + \frac{1}{2}\sum_{i = 1}^{30} \sum_{j = 1}^{5} \left(y_{ij} - \alpha_{i} - \beta_{i}(x_{j}-\bar{x}) \right)^2\right)\right) \] The above is a gamma kernel, which implies the following for \(\tau\),

\[ \tau \sim gamma\left(10^{-3} + 75,\; 10^{-3} + \frac{1}{2}\sum_{i = 1}^{30} \sum_{j = 1}^{5} \left(y_{ij} - \alpha_{i} - \beta_{i}(x_{j}-\bar{x}) \right)^2\right) \]

Conditional for \(\beta\):

\[ f(\beta | \mathbf{y_{ij},\alpha_{i},\beta_{i}}, \alpha, \tau) \propto e^{-\frac{30\beta^2 -2\beta \sum_{i = 1}^{30}\beta_i}{2}}e^{-\frac{\beta^{2}}{2000}} \]

Simplifying and completing the square yields,

\[ f(\beta | \mathbf{y_{ij},\alpha_{i},\beta_{i}}, \alpha, \tau) \propto e^{-\frac{30000 + 1}{2 \cdot 1000}(\beta - \frac{1000\sum_{i = 1}^{30}\beta_{i}}{30000 + 1})^2} \]

This is a normal kernel, which implies the following for \(\beta\),

\[ \beta \sim N\left(\frac{1000\sum_{i = 1}^{30}\beta_{i}}{30000 + 1}, \frac{1000}{(30000 + 1)}\right), \]

Conditional for \(\alpha\):

\[ f(\alpha|\mathbf{y_{ij},\alpha_{i},\beta_{i}}, \beta, \tau) \propto e^{-\frac{30\alpha^2 -2\alpha \sum_{i = 1}^{30}\alpha_i}{2\cdot 100}}e^{-\frac{\alpha^{2}}{2000}} \]

Simplifying and completing the square yields the following,

\[ f(\alpha|\mathbf{y_{ij},\alpha_{i},\beta_{i}}, \beta, \tau) \propto e^{-\frac{300 + 1}{2\cdot 1000}\left(\alpha - \frac{10\sum_{i = 1}^{30}\alpha_{i}}{300 + 1}\right)^2} \]

This implies the following for \(\alpha\),

\[ \alpha \sim N\left( \frac{10\sum_{i = 1}^{30}\alpha_{i}}{300 + 1},\frac{1000}{(300+1)}\right) \]

The conditional for \(\beta_{i}\), for \(i = 1, \dots,30\),

\[ f(\beta_{i}|\mathbf{y_{ij},\alpha_{i},\beta_{-i}}, \alpha,\beta, \tau) \propto e^{-\frac{\beta_{i}^{2}\sum_{j = 1}^{5}(x_{j} - \bar{x})^{2} -2\beta_{i}\sum_{j = 1}^{5} y_{ij}(x_{j}-\bar{x})}{2\sigma^2}}e^{-\frac{\beta_{i}^2 - 2\beta_{i}\beta}{2}} \]

Simplifying and completing the square yields the following,

\[ f(\beta_{i}|\mathbf{y_{ij},\alpha_{i},\beta_{-i}}, \alpha,\beta, \tau) \propto e^{-\frac{\sum_{j =1}^{5}(x_{j}-\bar{x})^2 + \sigma^2}{2\sigma^2}\left(\beta_{i} - \frac{\sum_{j = 1}^{5}y_{ij}(x_{j}-\bar{x}) + \beta\sigma^2}{\sum_{j = 1}^{5}(x_{j}-\bar{x})^2 + \sigma^2}\right)^2} \]

This implies the following for \(\beta_{i}\),

\[ \beta_{i} \sim N\left( \frac{\sum_{j = 1}^{5}y_{ij}(x_{j}-\bar{x}) + \beta\sigma^2}{\sum_{j = 1}^{5}(x_{j}-\bar{x})^2 + \sigma^2}, \frac{\sigma^2}{\sum_{j = 1}^{5}(x_{j} - \bar{x})^2 + \sigma^2}\right) \]

The conditional for \(\alpha_{i}\), for \(i = 1,\dots, 30\),

\[ f(\alpha_{i}|\mathbf{y_{ij},\alpha_{-i},\beta_{i}}, \alpha,\beta, \tau) \propto e^{-\frac{(5\alpha_{i}^2 - 2\alpha_{i}\sum_{j = 1}^{5} y_{ij})}{2\sigma^2}}e^{-\frac{\alpha_{i}^{2} -2\alpha_{i}\alpha}{200}} \]

Simplifying and completing the square yields the following,

\[ f(\alpha_{i}|\mathbf{y_{ij},\alpha_{-i},\beta_{i}}, \alpha,\beta, \tau) \propto e^{-\frac{500 + \sigma^2}{2\sigma^2 \cdot 100}\left(\alpha_{i} - \frac{100\sum_{j = 1}^{5} y_{ij} + \alpha\sigma^2}{500 + \sigma^2}\right)^2} \]

This implies the following for \(\alpha_{i}\)

\[ \alpha_{i} \sim N\left(\frac{100\sum_{j = 1}^{5} y_{ij} + \alpha\sigma^2}{500 + \sigma^2}, \frac{100\sigma^2}{500 + \sigma^2}\right) \]

ii. Gibbs-Sampling Algorithm for problem 1.
  1. Set t = 0
  2. Set \(\alpha^{(t)}\), \(\beta^{(t)}\), and \(\tau^{(t)}\) equal to the means of their respective prior distributions; in this case, set \(\alpha^{(t)} =0\), \(\beta^{(t)}=0\), and \(\tau^{(t)}=1\). Furthermore, set \((\sigma^2)^{(t)} = \frac{1}{\tau^{(t)}}\).
  3. Given \(\alpha^{(t)}\), \(\beta^{(t)}\), and \(\tau^{(t)}\) (note that \((\sigma^2)^{(t)} = \frac{1}{\tau^{(t)}}\)), draw \(\alpha_{i}^{(t)}\) and \(\beta_{i}^{(t)}\), for \(i = 1,\dots,30\), from the following conditional distributions:

\[ \alpha_{i}^{(t)} \sim N\left(\frac{100\sum_{j = 1}^{5} y_{ij} + \alpha^{(t)}(\sigma^2)^{(t)}}{500 + (\sigma^2)^{(t)}}, \frac{100(\sigma^2)^{(t)}}{500 + (\sigma^2)^{(t)}}\right) \\ \]

\[ \beta_{i}^{(t)} \sim N\left( \frac{\sum_{j = 1}^{5}y_{ij}(x_{j}-\bar{x}) + \beta^{(t)}(\sigma^2)^{(t)}}{\sum_{j = 1}^{5}(x_{j}-\bar{x})^2 + (\sigma^2)^{(t)}}, \frac{(\sigma^2)^{(t)}}{\sum_{j = 1}^{5}(x_{j} - \bar{x})^2 + (\sigma^2)^{(t)}}\right) \]

  4. Now, given \(\alpha_{i}^{(t)}\) and \(\beta_{i}^{(t)}\), for \(i = 1,\dots,30\), draw \(\alpha^{(t+1)}\), \(\beta^{(t+1)}\), and \(\tau^{(t+1)}\) from the following conditional distributions,

\[ \alpha^{(t+1)} \sim N\left( \frac{10\sum_{i = 1}^{30}\alpha_{i}^{(t)}}{300 + 1},\frac{1000}{(300+1)}\right) \\ \]

\[ \beta^{(t+1)} \sim N\left(\frac{1000\sum_{i = 1}^{30}\beta_{i}^{(t)}}{30000 + 1}, \frac{1000}{(30000 + 1)}\right), \\ \]

\[ \tau^{(t+1)} \sim gamma\left(10^{-3} + 75,\; 10^{-3} + \frac{1}{2}\sum_{i = 1}^{30} \sum_{j = 1}^{5} \left(y_{ij} - \alpha_{i}^{(t)} - \beta_{i}^{(t)}(x_{j}-\bar{x}) \right)^2\right) \]

  5. Let \((\sigma^2)^{(t + 1)} = \frac{1}{\tau^{(t+1)}}\), increment \(t\) by 1, and return to step 3.
iii.
gibbs = function(nsim, burn){
  gibb = matrix(NA, nrow = nsim, ncol = 63)
  colnames(gibb) = c(rep("", 60), "alpha", "beta", "tau")
  for(i in 1:30){
    colnames(gibb)[i] = paste("a", i, sep = "")
    colnames(gibb)[30 + i] = paste("b", i, sep = "")
  }
  # gibb[,1:30]  = alpha_i's
  # gibb[,31:60] = beta_i's
  # gibb[,61:63] = alpha, beta, and tau respectively
  # starting values
  gibb[1,61] = 0 # alpha: expected value of N(0,10000)
  gibb[1,62] = 0 # beta: expected value of N(0,10000)
  gibb[1,63] = 1 # tau: expected value of gamma(10^-3, 10^-3)

  # set values equal to variables for computation:
  sig2 = 1/gibb[1,63]; alpha = gibb[1,61]; beta = gibb[1,62]

  # calculate the first set of alpha_i's and beta_i's
  gibb[1,1:30] = rnorm(30, (100*rowSums(y) + sig2*alpha)/(500 + sig2),
                       sqrt((sig2*100)/(500 + sig2)))     # alpha_i
  gibb[1,31:60] = rnorm(30, (y%*%xj_xb + sig2*beta)/(sum(xj_xb^2) + sig2),
                        sqrt(sig2/(sum(xj_xb^2) + sig2))) # beta_i

  # all initial values are set; each subsequent row is generated from the
  # previous row's alpha_i's and beta_i's
  for(i in 2:nsim){
    gibb[i,61] = rnorm(1, 10*sum(gibb[i-1,1:30])/(300 + 1), sqrt(1000/(300 + 1)))          # alpha
    gibb[i,62] = rnorm(1, (1000*sum(gibb[i-1,31:60]))/(30000 + 1), sqrt(1000/(30000 + 1))) # beta
    # build the matrix of beta_i*(x_j - xbar) terms to avoid a longer for loop
    # in the sum of squares used for tau
    bxj = c()
    for(w in 1:length(xj_xb)){
      temp = xj_xb[w]*gibb[i-1,31:60]
      bxj = cbind(bxj, temp)
    }
    sig_sum = sum(rowSums(((y - gibb[i-1,1:30]) - bxj)^2))/2
    gibb[i,63] = rgamma(1, 10^{-3} + 75, 10^{-3} + sig_sum) # tau

    sig2 = 1/gibb[i,63]; alpha = gibb[i,61]; beta = gibb[i,62] # reassign values

    gibb[i,1:30] = rnorm(30, (100*rowSums(y) + sig2*alpha)/(500 + sig2),
                         sqrt((sig2*100)/(500 + sig2)))     # alpha_i
    gibb[i,31:60] = rnorm(30, (y%*%xj_xb + sig2*beta)/(sum(xj_xb^2) + sig2),
                          sqrt(sig2/(sum(xj_xb^2) + sig2))) # beta_i
  }
  gibb[-c(1:burn), ] # return draws with the burn-in period discarded
}
iv.
nsim = 50000
burn = 10000
gs.rat = gibbs(nsim,burn)
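
The trace plots discussed below can be produced, for example, as follows (a minimal sketch using base R graphics, assuming `gs.rat` as returned by `gibbs()` above):

par(mfrow = c(3, 2))
for(p in c("a1", "b1", "a15", "b15", "tau")){
  plot(gs.rat[, p], type = "l", xlab = "Iteration", ylab = p,
       main = paste("Trace of", p)) # one trace plot per monitored parameter
}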

The mixing of \(\alpha_{1}, \beta_{1}, \alpha_{15}, \beta_{15}\), and \(\tau\), as can be seen in the trace plots above, looks good. None of the traces shows a parameter getting stuck in a particular region of the parameter space; that is, none of the parameters has spurts of concentration in one area of the space followed by a jump to a very different area. The trace of \(\tau\) comes closest to looking like a poor mix, with apparent gaps near the 22,000, 27,000, and 44,000 iteration marks where \(\tau\) may have briefly gotten stuck, but overall the mixing looks good.

v.

The following are point estimates for \(\alpha_{1}, \beta_{1}, \alpha_{15}, \beta_{15},\) and \(\tau\).

sum_df = data.frame("Mean" = apply(gs.rat[,c("a1","b1","a15","b15","tau")], 2,mean)) # point estimates
rownames(sum_df) = c("Alpha 1", "Beta 1", "Alpha 15", "Beta 15", "Tau")
kable(t(sum_df))%>%
  kable_styling(full_width = F)
Alpha 1 Beta 1 Alpha 15 Beta 15 Tau
Mean 239.9342 6.03862 242.7326 5.256386 0.0274962

c)

interval <- function(x){
c(quantile(x,.025),"mean" =  mean(x), quantile(x,.975))
}
quant = t(data.frame(apply(gs.rat[,c("a1","b1")],2,interval)))
kable(quant)%>%
  kable_styling(full_width = F)
2.5% mean 97.5%
a1 234.812964 239.93422 245.060202
b1 5.517144 6.03862 6.559514

Given that the weight of rat \(i\) at time \(j\) is modeled as \(Y_{ij} \sim N(\alpha_{i} + \beta_{i}(x_{j}-\bar{x}), \sigma^{2})\), the credible intervals for \(\alpha_{1}\) and \(\beta_{1}\) show that \(Y_{ij}\) could have a negative mean, which does not make sense in the context of the problem: a negative weight is not possible.

d)

Examining the posterior distribution of \(\tau\) with the prior distribution overlaid shows that the posterior distribution is very different from the prior. If the variables were independent, the posterior distribution would factor into the product of the distributions of the independent variables, and hence the prior for \(\tau\) and its posterior would have been much more similar than they are.
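
One way to produce the overlay described above (a minimal sketch, assuming the `gs.rat` draws from part (b)) is to plot a density estimate of the posterior draws of \(\tau\) and superimpose the gamma(\(10^{-3}\), \(10^{-3}\)) prior density:

plot(density(gs.rat[, "tau"]), main = "Posterior of tau with prior overlaid",
     xlab = "tau")                                                  # posterior density estimate
curve(dgamma(x, shape = 10^-3, rate = 10^-3), add = TRUE, lty = 2)  # prior density
legend("topright", legend = c("posterior", "prior"), lty = c(1, 2))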

Problem 2

a) Standardized Weights Importance Sampling Algorithm

  1. Randomly generate 20,000 numbers from a standard normal distribution i.e. \(x_{i} \sim N(0,1)\), for \(i = 1,\dots,20000\).

  2. Given that \(q(x) = e^{-\frac{|x^{3}|}{3}}\) and \(g(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\) (note that we are drawing from \(g(x)\)), calculate the importance ratio \(w^{*}(x_i) = \frac{q(x_{i})}{g(x_{i})}\), for \(i = 1, \dots, 20000\)

  3. For \(h(x) = x^2\), compute \(h(x_{i})\) for \(i = 1, \dots, 20000\)

  4. Compute the following estimation:

\[ \hat{\theta}_{IS} = \frac{\sum_{i = 1}^{20000} h(x_{i})\cdot w^{*}(x_i)}{\sum_{i =1}^{20000}w^{*}(x_i)} \]

b)

n = 20000
x.draws = rnorm(n, 0, 1)      # draws from the envelope g = N(0,1)

q = exp(-abs(x.draws^3)/3)    # unnormalized target q(x)
g = dnorm(x.draws, 0, 1)      # envelope density g(x)

h = x.draws^2                 # h(x) = x^2
w = q/g                       # importance ratios w*(x_i)
IS_W = sum(h*w)/sum(w)        # standardized importance sampling estimate

The estimate of \(\theta\) is 0.77576.
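
As a rough check on this estimate (a sketch using base R's `integrate`), \(\theta = E_q[X^2]\) can also be approximated directly by numerical integration of the unnormalized target:

num = integrate(function(x) x^2 * exp(-abs(x)^3/3), -Inf, Inf)$value # integral of x^2 q(x)
den = integrate(function(x) exp(-abs(x)^3/3), -Inf, Inf)$value       # normalizing constant of q
num/den # should be close to the importance sampling estimate above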

Problem 3

a) Metropolis-Hastings Algorithm for Bivariate Normal

  1. Set t = 0

  2. Start with initial values \(\mathbf{x^{(0)}}\), where \(\mathbf{x} = (x_{1},x_{2})\).

  3. Let \(\mathbf{x^{*}} = \mathbf{x^{(t)}}\) and set i = 1

  4. Generate \(x_{i}^{*} \sim N(x_{i}^{(t)}, \sigma_{i}^{2})\)

  5. Replace the ith element of \(\mathbf{x^{*}}\) with \(x_{i}^{*}\)

  6. Compute \(R = \frac{f(\mathbf{x^{*}})}{f(\mathbf{x^{(t)}})}\), where \(f\) is the pdf of the bivariate normal distribution with mean \(\mathbf{\mu}\) and covariance matrix \(\mathbf{\Sigma}\) stated in Problem 3.

  7. Set \(\mathbf{x_{i}^{*}}\) based on the following probabilities

\[ \mathbf{x_{i}^{*}} = \begin{cases} x_{i}^{*}& \text{with probability min(R,1)}\\ x_{i}^{(t)}&\text{otherwise} \end{cases} \]

  8. Let i = 2 and repeat steps 4-7.

  9. Set \(\mathbf{x^{(t+1)}} = \mathbf{x^{*}}\), increment \(t\), and return to step 3.

b)

library(mvtnorm)
library(scales)
library(coda)
sqrtm <- function (A) {
  # Obtain the matrix square root of a matrix A via its eigendecomposition
  a = eigen(A)
  sqm = a$vectors %*% diag(sqrt(a$values)) %*% t(a$vectors)
  (sqm + t(sqm))/2 # symmetrize and return the result
}
gen <- function(n,p,mu, sig){
  # generate data from normal mu sigma
  x = matrix(rnorm(n*p),n,p)
  datan = x %*% sqrtm(sig) + matrix(mu,n,p, byrow = TRUE)
  datan
}

randomwalk <- function(nsim, x0,mu, sig, s){
  rw = matrix(NA, nsim, 2)
  rw[1,] = x0 # initial values 
  xt = x0 # x(t) vector, t =0 for this vector
  accept = numeric(2) # column vector keeping count of acceptance, corresponding to var1, var2
  for(i in 2:nsim){
    xs = xt 
    for(j in 1:2){
      xs[j] = rnorm(1,xt[j],s[j]) # gen x*[j] from N(x(t)[j], s^2)
      R = dmvnorm(xs,mu,sig)/(dmvnorm(xt,mu,sig)) # f(x*)/f(x(t))
      R = min(1,R,na.rm = TRUE) 
      if(R == 1 || (R < 1 && (runif(1) < R))){
        accept[j] = accept[j] + 1 # xs[j] does not change
      } else{
        xs[j] = xt[j] # keep xt[j], reject xs[j]
      }
    }
    rw[i,] = xs # add pair to the ith step of our random walk
    xt = xs # xs now becomes the xt vector for the next iteration
  }
  # This will be done in pairs of course like gibbs
  rate = accept/nsim
  list(rw = rw, rate = rate)
}

c)

nsim = 10000
x0 = c(0,0)
mu = c(1,2)
sig = matrix(c(1,.9,.9,1),2,2)
s = c(.001,.001)
bi.mh = randomwalk(nsim,c(0,0),mu, sig,s)
i.
bi.mh$rate
## [1] 0.9974 0.9967
mh.draws = mcmc(bi.mh$rw) # mcmc object
acceptance.rate = 1 - rejectionRate(mh.draws)
acceptance.rate
##      var1      var2 
## 0.9974997 0.9967997

The acceptance rate is unreasonably high for both variables. A rule of thumb states that a good acceptance rate for M-H with few variables is around 50%, with a range between 25% and 50% preferable in general. When nearly every proposal is accepted, it suggests that the algorithm is most likely stuck in a particular area, taking tiny steps and never moving away from the initial value. This is caused by the extremely small choice of \(\sigma_{1}, \sigma_{2}\): with such small standard deviations, the algorithm will only generate points in the immediate vicinity of \(x_{0}\).

ii.
mh.draws = mcmc(bi.mh$rw[-(1:1000), ], start = 1001) # discard the first 1000 draws as burn-in

The traces for both variables are erratic and show no clear sign of convergence. The variables wander in a neighborhood around 0, which was the initial value for both, but it is clear that the algorithm gets stuck at particular points and then jumps to another point only to get stuck again. What we would like to see instead is a few initial jumps (removed by the burn-in) followed by a random walk that settles into a stable region around the target means, rather than jumping from point to point. Furthermore, the densities for both variables have many bumps, almost resembling a mixture of densities rather than the normal densities they should represent, centered around means 1 and 2 respectively. Again, this reflects the many different places where the algorithm got stuck and produced several points before jumping elsewhere.
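
One way to quantify this poor mixing (a minimal sketch using `effectiveSize()` from the `coda` package loaded above):

effectiveSize(mh.draws) # estimated number of effectively independent draws per variable;
                        # values far below the number of iterations indicate poor mixing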

iii.
bvn = gen(9000,2,mu,sig)
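
The overlay plot discussed below might be produced along the following lines (a minimal sketch, assuming the `bvn`, `bi.mh`, and `nsim` objects above; `alpha()` from the already-loaded `scales` package sets point transparency):

plot(bvn, col = alpha("grey40", 0.4), pch = 16,
     xlab = "x1", ylab = "x2", main = "M-H draws over bivariate normal draws") # true draws
points(bi.mh$rw[1001:nsim, ], col = alpha("red", 0.4), pch = 16)               # post burn-in M-H draws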

The overlay of our 9000 Metropolis-Hastings draws on actual draws from the bivariate normal makes it visually clear that the M-H draws do not resemble the bivariate normal. As expected, the tiny proposal standard deviations restrict the algorithm to a small neighborhood around the initial point, \(x_{0} = (0,0)\).

d)

Perhaps a more appropriate choice of \(\sigma_{1},\sigma_{2}\) would be \(\sigma_{1} = 1, \sigma_{2} = 1\), since these equal the square roots of the variances on the diagonal of the covariance matrix.

s = c(1,1)
bi.mh = randomwalk(nsim,c(0,0),mu, sig,s)
mh.draws = mcmc(bi.mh$rw) # mcmc object
acceptance.rate = 1 - rejectionRate(mh.draws)
acceptance.rate
##      var1      var2 
## 0.4632463 0.4577458

The acceptance rates are now much closer to the rule-of-thumb range of 25%-50%.

mh.draws = mcmc(bi.mh$rw[-(1:1000), ], start = 1001) # discard the first 1000 draws as burn-in

The mixing for these draws is much better. The steps are consistently erratic about the respective variables’ target means, 1 and 2, and the random walk moves steadily within an interval reflecting the standard deviations chosen. Furthermore, the density for each variable appears normal and is centered around its respective mean, 1 and 2 respectively, with standard deviation close to 1, which was the goal of the Metropolis-Hastings sampler.

bvn = gen(9000,2,mu,sig)

As can be seen above, the points drawn from the M-H sampler closely resemble those drawn from the actual bivariate normal distribution. This visual shows that the change of proposal variance greatly affects the outcome of M-H: drawing from the M-H sampler now appears equivalent to drawing from the actual bivariate normal with the stated \(\mathbf{\mu, \Sigma}\).

e)

As \(\sigma_{1}, \sigma_{2}\) increase, more of the proposed points fall outside the region where the bivariate normal has most of its density. Such points produce extremely small values of \(R\) and are therefore rejected, so the rejection rate increases and the acceptance rate falls.
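
This can be checked empirically (a quick sketch, reusing the `randomwalk()` function and the `mu`, `sig`, and `nsim` objects from above) by re-running the sampler with increasing proposal standard deviations and comparing the acceptance rates:

for(sd_prop in c(1, 2, 5, 10)){
  out = randomwalk(nsim, c(0, 0), mu, sig, c(sd_prop, sd_prop))
  cat("proposal sd =", sd_prop, " acceptance rates:", round(out$rate, 3), "\n")
}
# the acceptance rates should decrease as the proposal standard deviations grow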