\documentclass{article}
\author{Ben Bolker}
\date{\today}
\title{GLMMs in R: analyzing overdispersed data}
\newcommand{\code}[1]{{\tt #1}}
\usepackage{url}
\usepackage{hyperref}
\usepackage{natbib}
\usepackage{verbatim}
\usepackage[utf8]{inputenc}
\begin{document}
\bibliographystyle{chicago}
\maketitle
Looking for the simplest possible example that encapsulates
overdispersion which can be sensibly modeled via
lognormal-Poisson approaches (i.e., individual-level random
effects).
Unfortunately I haven't yet found a good, non-problematic dataset
that uses Poisson or binomial data, has overdispersion, but
doesn't have other issues [zero-inflation, too small/messy for
straightforward analysis, etc.] \ldots
(``All [simple data sets] are alike.
Every [messy data set] is [messy] in its own way.'')
From \url{http://glmm.wikidot.com/faq}:
\fbox{
\parbox{\textwidth}{
\begin{itemize}
\item quasilikelihood estimation: \code{MASS::glmmPQL}
(the ``quasi-'' families may be unreliable in \code{lme4},
and may disappear; not clear whether there is a good
theoretical foundation for extending quasilikelihood to the GLMM case);
\code{geepack::geeglm}, \code{gee::gee}
\item individual-level random effects (\code{MCMCglmm} or hacked
\code{lme4}) [or WinBUGS or AD Model Builder or \ldots] [note that
individual-level random effect estimation is probably dodgy for PQL
approaches]
\item alternative distributions
\begin{itemize}
\item Poisson-lognormal (see above, ``individual-level random effects'')
\item negative binomial
\begin{itemize}
\item \code{glmmADMB::glmm.admb} (off-CRAN: \url{http://otter-rsch.com/admbre/examples/glmmadmb/glmmADMB.html})
\item \code{repeated::gnlmm} (off-CRAN: \url{http://www.commanster.eu/rcode.html})
\item WinBUGS/JAGS (via R2WinBUGS/Rjags)
\item AD Model Builder (via R2ADMB?)
\end{itemize}
\end{itemize}
\item beta-binomial: all of the above except (?) \code{MCMCglmm},
\code{glmm.admb}
\item zero-inflated: all of the above except \code{gnlmm}
\end{itemize}
}}
\section{Examples (simulation)}
It's easy enough to generate lognormal-Poisson-distributed
``data'' and show that a (hacked) version of lme4 recovers
them appropriately, but it may not be very informative.
This is a basic Poisson simulation with a single covariate (uniformly
randomly distributed), random intercept differences among blocks, and
random intercept differences among individuals.
<<>>=
simfun <- function(ng=20,nr=100,fsd=1,indsd=0.5,
b=c(1,2)) {
ntot <- nr*ng
b.reff <- rnorm(ng,sd=fsd)
b.rind <- rnorm(ntot,sd=indsd)
x <- runif(ntot)
dd <- data.frame(x,f=factor(rep(LETTERS[1:ng],each=nr)),
obs=1:ntot)
dd$eta0 <- model.matrix(~x,data=dd) %*% b
dd$eta <- with(dd,eta0 + b.reff[f]+b.rind[obs])
dd$mu <- exp(dd$eta)
dd$y <- with(dd,rpois(ntot,lambda=mu))
dd
}
@
Try it out:
<<>>=
library(lme4)
set.seed(1001)
dd <- simfun()
(m0 <- glmer(y~x+(1|f),family="poisson",data=dd))
(m1 <- glmer(y~x+(1|f)+(1|obs),family="poisson",data=dd))
@
A summary function to fit the full model and extract parameters:
<<>>=
cfun <- function(d) {
m <- glmer(y~x+(1|f)+(1|obs),family="poisson",data=d)
c(sqrt(unlist(VarCorr(m))),fixef(m))
}
@
Run it 50 times:
<>=
rr <- replicate(50,cfun(simfun()))
@
This works pretty well (Figure~\ref{fig:simplot1}).
\begin{figure}
<>=
library(plotrix)
library(reshape)
mm <- melt(t(rr))
library(lattice)
print(bwplot(value~X2,data=mm,
panel=function(...) {
panel.bwplot(...)
panel.points(1:4,c(1,1,0.5,2),col=2,pch=16)
}))
@
\caption{Basic results for Poisson-lognormal model}
\label{fig:simplot1}
\end{figure}
\section{Examples (real)}
\begin{itemize}
\item \textbf{Count data: sheep tick burdens
on the heads of red grouse chicks} \citep{elston_analysis_2001}:
\begin{itemize}
\item Originally analyzed with the ``GLMM procedure in Genstat
5.4.1 (Genstat 5 Committee, 1997 ; Payne \& Arnold, 1998) and the
SAS GLIMMIX macro (Littell et al. 1996)''. (Both of these are
marginal [P/MQL] algorithms, individual-level effect estimation
is supposed to be dodgy in this case \ldots although I don't
have a peer-reviewed reference handy [see
e.g. the paragraph headed ``final remark'' in \url{https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001488.html}].
\item I can run \code{MASS::glmmPQL}, but don't get the same results as quoted in the paper --- haven't looked into the details \ldots
\item When I try to run this analysis in a hacked version of
\code{lme4} I get \code{Cholmod ... 'not positive definite'} and
\verb+mer_finalize ... false convergence+ warnings \ldots
\item In \code{MCMCglmm}, I get \code{Mixed model equations singular: use
a (stronger) prior} after 8000 iterations.
\end{itemize}
In any case, this does not look like a straightforward/simple analysis.
\item \textbf{Count data: owl nestling begging} \citep{roulin_nestling_2007},
reproduced as an example in \cite{zuur_mixed_2009}: data available
from \url{http://www.highstat.com/Book2/ZuurDataMixedModelling.zip}
\begin{itemize}
\item I have run this analysis in lme4, and the results are
reasonably sensible. However, the residuals are a bit funny, and
Alain Zuur has mentioned that he is going to use the data in
a methods paper on zero-inflation.
\item could try this in \code{glmm.admb} or \code{MCMCglmm},
which both allow zero-inflation
\end{itemize}
\item \textbf{Count data: gopher tortoise shell counts} \citep{ozgul_upper_2009}: tried analysis in various ways, ended up coding in WinBUGS. Random effect (site) has quite limited sample sizes (only 10 sites), and \code{glmer} finds a best estimate of zero variance among sites (even among sites once we drop the overdispersion variation).
<>=
library(lme4)
load("gopherdat2.RData")
head(Gdat)
Gdat$obs <- 1:nrow(Gdat)
g1 <- glmer(shells~factor(year)+prev+offset(log(Area))+(1|Site)+(1|obs),
data=Gdat,family="poisson")
g0 <- glmer(shells~factor(year)+prev+offset(log(Area))+(1|Site),
data=Gdat,family="poisson")
@
\item \textbf{Binomial data: \emph{Glycera} cell survival} I'm working
on an analysis of a big factorial experiment on the response
of \emph{Glycera} (a marine worm) cells to various stressors.
The data aren't (yet) mine to release. In addition, I had
convergence problems with \code{glmer} --- I ended up analyzing
the data with \code{MCMCglmm}. (The version of \code{glmer} in
\code{lme4a} gives slightly different results
(more than round-off error), and does \emph{not} produce
convergence warnings \ldots
\item I have various binary data sets, but these are not
particularly good for exploring overdispersion, because
overdispersion is unidentifiable in binary data.
\end{itemize}
\bibliography{glmm}
\end{document}