In linear regression, residuals can be used to assess the shape, constant variance, and normality assumptions. In Poisson regression, we can use quantile residuals to assess both the shape and Poisson distribution assumptions. The purpose of this class activity is to explore how quantile residual plots can diagnose violations of the Poisson regression assumptions.
To begin, let’s generate data from the model
\[X_i \sim N(0, 0.5)\] \[\log(\lambda_i) = X_i\] \[Y_i \sim Poisson(\lambda_i)\] Note that the Poisson regression assumptions are satisfied for this model.
<- 1000
n <- rnorm(n, sd = 0.5)
x <- rpois(n, lambda = exp(x))
y1
<- glm(y1 ~ x, family = poisson)
m1
data.frame(x = x, resids = qresid(m1)) %>%
ggplot(aes(x = x, y = resids)) +
geom_point() +
geom_smooth() +
theme_bw() +
labs(x = "Age", y = "Quantile residuals")
Now what happens if we change the distribution of \(Y\)? The negative binomial distribution is another count distribution, which is different from the Poisson (we will see much more of the negative binomial later!). Let’s generate negative binomial data (violating the Poisson distribution assumption), and see how the quantile residual plot changes.
<- 1000
n <- rnorm(n, sd = 0.5)
x <- rnbinom(n, size=0.5, mu=exp(x))
y2
<- glm(y2 ~ x, family = poisson)
m2
data.frame(x = x, resids = qresid(m2)) %>%
ggplot(aes(x = x, y = resids)) +
geom_point() +
geom_smooth() +
theme_bw() +
labs(x = "Age", y = "Quantile residuals")
Finally, let’s try violating the shape assumption with Poisson data.
Modify your code from question 1 to violate the shape assumption, but not the Poisson distribution assumption. What differences do you see in your plot compared to question 1?
Summarize how quantile residual plots can be used to identify issues with the Poisson regression assumptions.