\documentclass[11pt,letterpaper]{article}
\usepackage{graphicx}
%\usepackage[lmargin=3cm,rmargin=3cm,tmargin=3cm,bmargin=3cm]{geometry}
\usepackage{subfig}
\usepackage{paralist}
\usepackage{natbib}
\usepackage{color}
\usepackage{url}
\usepackage{hyperref}
\usepackage[utf8]{inputenc}
\newcommand{\correct}[1]{{\color{red}#1\color{black}}}
\newcommand{\todo}[1]{{\color{red}{\bf #1}\color{black}}}
\newcommand{\code}[1]{{\tt #1}}
\newcommand{\daic}{\Delta \mbox{AIC}}
\newcommand{\pkglink}[1]{\href{http://cran.r-project.org/web/packages/#1}{\nolinkurl{#1}}}
\newcommand{\rflink}[1]{\href{https://r-forge.r-project.org/projects/#1/}{\nolinkurl{#1}}}
%% only works for base packages
%%\newcommand{\fnlink}[2]{\href{http://stat.ethz.ch/R-manual/R-patched/library/#1/html/#2.html}{\nolinkurl{#1:#2}}}
\newcommand{\fnlink}[2]{\href{http://finzi.psych.upenn.edu/R/library/#1/html/#2.html}{\nolinkurl{#1:#2}}}

\usepackage{fancyvrb}
\VerbatimFootnotes
\begin{document}
%\linenumbers
\title{GLMMs in action: gene-by-environment interaction in total fruit production of wild populations of \emph{Arabidopsis thaliana} \\
    \textbf{Revised version, part 1}}
\author{Benjamin M. Bolker\footnote{
    $^1$Departments of Mathematics \& Statistics and Biology, McMaster University, Hamilton, Ontario, Canada L8S 4K1; 
    $^2$ Department of Biology, University of Florida, PO Box 118525, Gainesville FL 32611-8525;
    $^3$ Nicholas School of the Environment, Duke University, PO Box 90328, Durham, NC  27708;
    $^4$School of Biological Sciences, Victoria University of Wellington PO Box 600, Wellington 6140, New Zealand;
    $^5$Department of Botany, Miami University, Oxford OH, 45056;
    $^6$Department of Biological Sciences, California State University, Chico, Chico, CA 9529-515.
    \emph{Corresponding author}: Bolker, B. M. ({\tt bolker@mcmaster.ca})} \and 
  Mollie E. Brooks$^2$\and Connie J. Clark$^3$\and Shane W. Geange$^4$\and John R. Poulsen$^3$\and M. Henry H. Stevens$^5$ \and Jada-Simone S. White$^6$}
\maketitle
\SweaveOpts{eps=FALSE,prefix=FALSE,keep.source=TRUE,include=FALSE}

<<utils, echo=FALSE, results=hide>>=
options(width=80, contrasts=c("contr.treatment", "contr.poly"), continue=" ",
        SweaveHooks=list(fig=function() par(bty="l",las=1)))
if (require(cacheSweave)) {
  setCacheDir("banta_cache_part1")
}
@ 

\section{Introduction}
Spatial variation in nutrient availability and herbivory is likely to cause population differentiation and maintain genetic diversity in plant populations. Here we measure the extent to which mouse-ear cress (\emph{Arabidopsis thaliana}) exhibits population and genotypic variation in their responses to these important environmental factors. We are particularly interested in whether these populations exhibit nutrient mediated \emph{compensation}, where higher nutrient levels allow individuals to better tolerate herbivory \citep{banta_comprehensive_2010}. We use GLMMs to estimate the effect of nutrient levels, simulated hebivory, and their interaction on fruit production in \emph{Arabidopsis thaliana} (fixed effects), and the extent to which populations vary in their responses (random effects, or variance components)\footnote{Data and context are provided courtesy of J. Banta and M. Pigliucci, State University of New York (Stony Brook).}. 

Below we follow guidelines from \citep{Bolker+2008b} to step deliberately through the process of model building and analysis. In this example, we use the \code{lme4} package \citep{lme4} in the \textbf{R} language and environment \citep{R}; other approaches to fitting GLMMs are illustrated in part~2.

We will use some other packages for plotting, manipulating data, and
interpreting results:
<<libs>>=
library(lme4)
library(coefplot2)  ## for coefplot2
library(reshape)
library(plyr)
library(ggplot2)
library(gridExtra)
library(emdbook)   ## for qchibarsq
source("glmm_funs.R")
@ 

\footnote{
We used \Sexpr{R.version$version.string} and package versions:
<<echo=FALSE>>=
usedpkgs <- sort(c("coefplot2","ggplot2","plyr","reshape","gridExtra","emdbook",
                   "lme4","doBy","RLRsim"))
i1 <- installed.packages()
print(i1[usedpkgs,"Version"],quote=FALSE)
@ 
The \rflink{coefplot2} package must be
installed from \url{http://r-forge.r-project.org}; all others are on CRAN.
}

\section{The data set}
In this data set, the response variable is the number of fruits (i.e. seed capsules) per plant. The number of fruits produced by an individual plant (the experimental unit) was hypothesized to be a function of fixed effects, including nutrient levels (low \emph{vs.} high), simulated herbivory (none \emph{vs.} apical meristem damage), region (Sweden, Netherlands, Spain), and interactions among these. Fruit number was also a function of random effects including both the population and individual genotype. Because \emph{Arabidopsis} is highly selfing, seeds of a single individual served as replicates of that individual. There were also nuisance variables, including the placement of the plant in the greenhouse, and the method used to germinate seeds. These were estimated as fixed effects but interactions were excluded. 

\section{Specifying fixed and random effects}
Here we need to select a realistic full model, based on the scientific questions and the data actually at hand. We first load the data set and make sure that each variable is appropriately designated as numeric or factor (i.e. categorical variable). 
<<getdata,keep.source=TRUE>>=
dat.tf <- read.csv("Banta_TotalFruits.csv")
str(dat.tf)
@ 
The \code{X}, \code{gen}, \code{rack} and \code{nutrient} variables are
coded as integers, but we want them to be factors: we could either
redo the \code{read.csv} command with \code{colClasses} set, or
set the variables to factors by hand.
\begin{itemize}
\item{We use \code{transform()}, which operates within the data set,
to avoid typing lots of commands like \verb!dat.tf$rack <- factor(dat.tf$rack)!}
\item{At the same time, we reorder the clipping variable so that
  \code{"unclipped"} is the reference level: we could also have
  used \code{relevel(amd,"unclipped")}}
\end{itemize}
<<transdata>>=
dat.tf <- transform(dat.tf,
                    X=factor(X),
                    gen=factor(gen),
                    rack=factor(rack),
                    amd=factor(amd,levels=c("unclipped","clipped")),
                    nutrient=factor(nutrient,label=c("Low","High")))
@                  
\footnote{There is sometimes a tradeoff between clarity and
  comprehension.  More laboriously but perhaps more clearly, we could
  have made the same changes to the integer-coded factors as follows:
<<oldway,eval=FALSE>>=
dat.tf$X <- factor(dat.tf$X)
dat.tf$gen <- factor(dat.tf$gen)
dat.tf$rack <- factor(dat.tf$rack)
dat.tf$nutrient <- factor(dat.tf$nutrient)
@   
We find the code with \code{transform} clearer, but you should
use whatever form makes more sense to you (and is easier to
understand when you come back to the code later).}

The above output shows that the data include,
\begin{compactdesc}
\item[\code{X}~] observation number (we will use this observation number later, when we are accounting for overdispersion)
\item[\code{reg}~] a factor for region (Netherlands, Spain, Sweden).
\item[\code{popu}~] a factor with a level for each population.
\item[\code{gen}~] a factor with a level for each genotype.
\item[\code{rack}~] a nuisance factor for one of two greenhouse racks.
\item[\code{nutrient}~] a factor with levels for minimal or additional nutrients.
\item[\code{amd}~] a factor with levels for no damage or simulated herbivory (apical meristem damage; we will sometimes refer to this as ``clipping'')
\item[\code{status}~] a nuisance factor for germination method.
\item[\code{total.fruits}~] the response; an integer count of the number of fruits per plant.
\end{compactdesc}

Now we check replication for each genotype (columns) within each population (rows).
{\small
<<reps>>=
(reptab <- with(dat.tf, table(popu, gen)))
@ 
}
\footnote{you could also inspect this graphically:
\code{with(dat.tf,plot(table(popu,gen)))} produces a \emph{mosaic plot} of
the two-way table \ldots}
This reveals that we have only 2--4 populations per region and 2--3 genotypes per population. However, we also have 2--13 replicates per genotype for each treatment combination (four unique treatment combinations: 2 levels of nutrients by 2 levels of simulated herbivory). Thus, even though this was a reasonably large experiment (625 plants), there were a very small number of replicates with which to estimate variance components, and many more potential interactions than our data can support. Therefore, judicious selection of model terms, based on both biology and the data, is warranted. We note that we don't really have enough levels per random effect, nor enough replication per unique treatment combination. Therefore, we decide to omit the fixed effect of ``region'', although we recognize that populations in different regions are widely geographically separated\footnote{In hindsight, we should probably have included region as a \emph{fixed} effect; re-doing the analysis with this change is left as an exercise for the reader.}.  

We have only two random effects (population, individual), and so Laplace or Gauss-Hermite Quadrature (GHQ) should suffice, rather than requiring more complex methods. However, as in all GLMMs where the scale parameter is treated as fixed and deviations from the fixed scale parameter would be identifiable (i.e. Poisson and binomial ($N>1$), but not binary, models) we may have to deal with overdispersion.

\section{Choose an error distribution; graphical checks}

The data are non-normal in principle (i.e., count data, so our first guess would be a Poisson distribution). 
If we transform total fruits with the canonical link function (log), 
we hope to see relatively homogeneous variances across categories and groups.

First we define a new factor that represents every combination of genotype
and treatment (nutrient $\times$ clipping) treatment, and sort it in
order of increasing mean fruit set.
<<>>=
## use within() to make multiple changes within a data frame
dat.tf <- within(dat.tf,
                 {
                   gna <- interaction(gen,nutrient,amd)
                   gna <- reorder(gna, total.fruits, mean)
                 })
@ 

Now we use the \code{bwplot} function in the \code{lattice} package to
generate a box-and-whisker plot of $\log(\mbox{fruits}+1)$.
<<bwplots, fig=TRUE, width=10, height=5>>=
print(bwplot(log(total.fruits + 1) ~  gna,  
             data=dat.tf, 
             scales=list(x=list(rot=90))  ## rotate x-axis labels
             ))
@
\begin{figure}[ht]
  \centering
  \includegraphics[width=.9\linewidth]{bwplots}
  \caption{Box-and-whisker plots of log(total fruits+1) by treatment combinations and genotypes reveal heterogeneity of variances across groups.}
  \label{fig:bw}
\end{figure}
\footnote{
  The \code{print()} function is not necessary in interactive use, only when you want
  to send \code{lattice} or \code{ggplot} graphs to a file during an R batch run.}
\footnote{
  Alternatively, one can also use the \code{ggplot2} package to draw these
  graphs.  In this case, we rotate the graph 90 degrees
  (using \verb+coord_flip()+) rather than rotating the $x$-axis labels \ldots
<<eval=FALSE>>=
ggplot(dat.tf,aes(x=gna,y=log(1+total.fruits)))+geom_boxplot()+coord_flip()+
  theme_bw()
@ 
}

Taking the logarithm of total fruit (or in this case $\log(1+\mbox{fruit})$ to avoid taking the log of zero) should stabilize the variance \citep{McCullaghNelder1989}, but this does not seem to be the case (Figure \ref{fig:bw}). 
The variance actually seems to decline slightly in the largest groups.

We could also calculate the variance  for each genotype $\times$ treatment combination and provide a statistical summary of these variances.
<<checklambda>>=
grpVarL <- with(dat.tf, tapply(log(1+total.fruits), 
                               list(gna), var) )
summary(grpVarL)
@ 
This reveals substantial variation among the sample variances on the transformed data.  In addition to heterogeneous variances across groups, Figure \ref{fig:bw} reveals many zeroes in groups, and some groups with a mean and variance of zero, further suggesting we need a non-normal error distribution, and perhaps something other than a Poisson distribution. 

Figure \ref{fig:bw} also indicates that the mean ($\lambda$) of the Poisson distribution 
is well above 5 (i.e. in general the center of each group is greater than $\log(6)=1.8$), suggesting that PQL may be appropriate, if it is the only option available. 
We could calculate $\lambda$ for each genotype $\times$ treatment combination and provide a statistical summary of each group's $\lambda$.
<<checklambda>>=
grpMeans <- with(dat.tf, tapply(total.fruits, list(gna), mean))
summary(grpMeans)
@ 
The average  by-group $\bar{\lambda}$ is  $\Sexpr{round(mean(grpMeans))}$; the  median is $\approx \Sexpr{round(median(grpMeans))}$; and at least one group has $\lambda=0$ (i.e., complete absence of fruit set for that genotype $\times$ treatment combination).

A core property of the Poisson distribution is that the variance is equal to the mean. A simple diagnostic is a plot of the group variances against the group means
(Figure~\ref{fig:VM}): 
\begin{itemize}
\item Poisson-distributed data will result in a linear pattern with slope=1
\item as long as the variance is generally greater than the mean, we call 
  the data \emph{overdispersed}. Overdispersion comes in various forms:
  \begin{itemize}
  \item a linear mean-variance relationship with $\mbox{Var} = \phi \mu$ (a line through
    the origin) with $\phi>1$ is called a \emph{quasi-Poisson} pattern 
    (this term describes the mean-variance
    relationship, not any particular probability distribution); we can implement it 
    statistically via quasilikelihood \citep{venables_modern_2002}
    or by using a particular parameterization of the negative binomial
    distribution (``NB1'' in the terminology of \cite{hardin_generalized_2007}).
  \item a semi-quadratic pattern, $\mbox{Var}=\mu(1+\alpha \mu)$ or $\mu(1+\mu/k)$, is
    characteristic of overdispersed data that is driven by underlying heterogeneity among
    samples, either the negative binomial (gamma-Poisson) or the lognormal-Poisson
    \citep{elston_analysis_2001}; we will see below how to fit data following these distributions.
  \end{itemize}
\end{itemize}

We calculated the (genotype $\times$ treatment) variances in the
same way as the means:
<<logvar>>=
grpVars <- with(dat.tf, 
                tapply(total.fruits, 
                       list(gna), var) )  ## variance of UNTRANSFORMED data
@ 
\footnote{Here are two alternative ways to compute the mean
  and variance of total fruits by group, one using \code{plyr::ddply}
  and the other \code{aggregate} from base R:
<<results=hide>>=
ddply(dat.tf,"gna",
      summarise,
      mean=mean(total.fruits),var=var(total.fruits))
with(dat.tf,aggregate(total.fruits,
                      list(gna),
                      FUN=function(x) c(mean=mean(x),var=var(x))))
@ 
}

We can get approximate estimates of the quasi-Poisson (linear)
pattern using \code{lm}, as follows:
(The \code{-1} in the formulas below
specifies a model with the intercept set to zero.)
<<>>=
lm1 <- lm(grpVars~grpMeans-1)  ## `quasi-Poisson' fit
phi.fit <- coef(lm1)
@ 

We can also use \code{lm} to 
estimate the negative binomial (linear/quadratic) pattern
$V=\mu+\mu^2/k$
by adding an offset (\code{offset(grpMeans)}) to 
the right-hand side to specify that we want
the group means added as a term with its
coefficient fixed to 1
(we use \code{I()} to specify that we really want to
square the group means):
<<>>=
lm2 <- lm(grpVars~I(grpMeans^2)+offset(grpMeans)-1)
k.fit <- 1/coef(lm2)
@ 

We add a non-parametric \code{loess} fit to
the plot of the means vs. variances and the fitted relationships;

<<logvarplot,fig=TRUE>>=
plot( grpVars ~ grpMeans, xlab="group means", ylab="group variances" )
abline(c(0,1), lty=2)
text(105,500,"Poisson")
curve(phi.fit*x, col=2,add=TRUE)
## bquote() is used to substitute numeric values 
##   in equations with symbols
text(110,3900,
     bquote(paste("QP: ",sigma^2==.(round(phi.fit,1))*mu)),
     col=2)
curve(x*(1+x/k.fit),col=4,add=TRUE)
text(104,7200,paste("NB: k=",round(k.fit,1),sep=""),col=4)
## loess fit
Lfit <- loess(grpVars~grpMeans)
mvec <- 0:120
lines(mvec,predict(Lfit,mvec),col=5)
text(118,2000,"loess",col=5)
@ 

\footnote{\code{ggplot} solution:
<<>>=
ggplot(data.frame(grpMeans,grpVars),
       aes(x=grpMeans,y=grpVars))+geom_point()+
  geom_smooth(colour="blue",fill="blue")+
  geom_smooth(method="lm",formula=y~x-1,colour="red",fill="red")+
  geom_smooth(method="lm",formula=y~I(x^2)+offset(x)-1,
              colour="purple",fill="purple")
@ 
}

These fits are not rigorous statistical tests --- they violate
a variety of assumptions of linear regression (e.g. constant
variance, independence), but they are good enough to give
us an initial guess about what distributions we should use.


\begin{figure}[ht]
  \centering
  \includegraphics[width=.6\linewidth]{logvarplot}
  \caption{The variance-mean relation of Poisson random variables should exhibit a slope of approximately one (dashed line). In contrast, these data reveal a slope of much greater than one (the wedge or cornupia shape is expected for small Poisson samples, but the slope here is much, much greater than 1).
  The colored lines show estimated mean-variance relationships
  for quasi-Poisson (or negative-binomial ``type I'', \cite{hardin_generalized_2007}) [red]:
  $V=\phi \mu$;
  negative binomial or lognormal-Poisson \citep{elston_analysis_2001} [blue]: $V=\mu(1+\mu/k)$ or
  $V=\mu(1+\mu \cdot (\exp(\sigma^2)-1))$; or a loess fit [cyan].
}
\label{fig:VM}
\end{figure}
We find that the group variances increase with the mean much more rapidly than expected under the Poisson distribution. This indicates that we need to account
for overdispersion in the model. The linear (quasilikelihood/NB1) fit
may be better than the semi-quadratic (NB/lognormal-Poisson) fit, but
it's hard to tell without additional model-fitting and diagnostics.
 

\subsection{Plotting the responses \emph{vs.} treatments}
We would like to plot the response by treatment effects, to make sure there are no surprises.
<<plot1,fig=TRUE>>=
print(stripplot(log(total.fruits+1) ~ amd|popu, 
                data=dat.tf,
                groups=nutrient, 
                auto.key=list(lines=TRUE, columns=2),
                strip=strip.custom(strip.names=c(TRUE,TRUE)),
                type=c('p','a'), layout=c(5,2,1),
                scales=list(x=list(rot=90)), main="Panel: Population"))
@ 
There seems to be a consistent nutrient effect (different intercepts), and perhaps a weaker clipping effect (negative slopes) --- maybe even some compensation (steeper slope for zero nutrients, Fig. \ref{fig:plot1}).
\begin{figure}[ht]
  \centering
  \includegraphics[width=.6\linewidth]{plot1}
  \caption{Interaction plots of total fruit response to nutrients and clipping, by population. Different lines connect population means of log-transformed data for different nutrient levels; colors represent nutrient levels
  (low=1=blue, high=8=pink).}
  \label{fig:plot1}
\end{figure}

\begin{figure}[ht]
  \centering
  \includegraphics[width=.6\linewidth]{plot2}
  \caption{Interaction plots of total fruit response to clipping, for each genotype (lines and colors). Different panels show different nutrient levels.}
  \label{fig:plot2}
\end{figure}
Because we have replicates at the genotype level, we are also able to examine the responses of genotypes (Fig. \ref{fig:plot2}).
<<plot2,fig=TRUE>>=
print(stripplot(log(total.fruits+1) ~ amd|nutrient, dat.tf, 
                groups=gen, 
                strip=strip.custom(strip.names=c(TRUE,TRUE)),
                type=c('p','a'), ## points and panel-average value -- 
                                 ## see ?panel.xyplot
                scales=list(x=list(rot=90)),
                main="Panel: nutrient, Color: genotype"))
@ 
\footnote{
\code{ggplot} versions:  
<<eval=FALSE>>=
ggplot(dat.tf,aes(x=amd,y=log(total.fruits+1),colour=nutrient))+
  geom_point()+
  ## need to use as.numeric(amd) to get lines
  stat_summary(aes(x=as.numeric(amd)),fun.y=mean,geom="line")+
  theme_bw()+opts(panel.margin=unit(0,"lines"))+
  facet_wrap(~popu)
ggplot(dat.tf,aes(x=amd,y=log(total.fruits+1),colour=gen))+
  geom_point()+
  stat_summary(aes(x=as.numeric(amd)),fun.y=mean,geom="line")+
  theme_bw()+
  ## label_both adds variable name ('nutrient') to facet labels
  facet_grid(.~nutrient,labeller=label_both)
@ 
}

There appears to be a consistent nutrient effect among genotypes (means in right panel higher than left panel), but the variation increases dramatically, obscuring the clipping effects and any possible interaction (Fig. \ref{fig:plot2}).

\section{Fitting group-wise GLMs}
Another useful starting point is to fit GLMs to each group of data separately,
equivalent to treating the grouping variables
as fixed effects.
Odd patterns
in the parameters could call into question our assumption
of a single underlying normal distribution in the parameters.
In particular, groups with extreme parameter values or bimodal
patterns in the parameter values could indicate a problem, although
extreme estimated parameter values for poorly sampled groups are
expected.


We first fit the models, and then examine the coefficients.
We would normally use \code{glm} to fit a GLM, but 
we use \code{lmList} here because
can fit a single GLM to many
separate groups conveniently.
We fit Poisson models rather than negative binomial models because
\code{lmList} does not support negative binomial fits: we could
use \code{family="quasipoisson"}, but it would not affect the
parameter estimates (only the standard errors or confidence
intervals), which we are not examining here.
\footnote{If we insisted
on fitting a negative binomial GLM we could most conveniently
use \code{ldply} from the \code{plyr} package to split the
data set by genotype and apply \code{glm.nb} from the \code{MASS} package
to each chunk:
<<>>=
library(plyr)
glmnbfitlist <- dlply(dat.tf,"gen",MASS::glm.nb,formula=total.fruits~nutrient*amd)
coefmat <- t(sapply(glmnbfitlist,coef))
@
}
<<GLMfits>>=
glm.lis <- lmList(total.fruits~nutrient*amd|gen,data=dat.tf,
                  family="poisson")
@ 
\footnote{The versions of \code{lmList} in the \code{lme4} and
  \code{nlme} packages are different (only the one in 
  \code{lme4} has a \code{family} argument for running GLMs), 
  and the \code{RLRsim} package loads the \code{nlme} package.
  You should \code{detach("package:RLRsim"); detach("package:nlme")}
  before running this command if you have loaded \code{RLRsim}
  in the meantime \ldots
}


There are a variety of ways to examine the different coefficients. 
We have written a function to save all the coefficients in a data frame, 
order the data frame by rank order of one of the coefficients, 
and then plot all the cofficients together (see \cite{pinheiro_mixed-effects_2000});
it was loaded when we ran \verb+source("glmm_funs.R")+.

Plot the list of models created above:
<<GLMlis, fig=TRUE>>=
print(plot(glm.lis,scale=list(x=list(relation="free"))))
@ 
\begin{figure}[ht]
  \centering
  \includegraphics[width=.6\linewidth]{GLMlis}
  \caption{Model coefficients for GLM fits on each genotype.}
  \label{fig:glmcoefs}
\end{figure}
Three genotypes (5, 6, 34) have extreme coefficients (Fig. \ref{fig:glmcoefs}). 

A mixed model assumes that the underlying random effects are normally distributed, although we
note that we are not actually plotting the random effects, 
or even estimates of random effects (which are not themselves guaranteed to be normally distributed),
but rather separate estimates for each group. 

The \code{glmm\_funs.R} file contains
a plotting function 
(specifically, a \code{qqmath} method) that draws Q-Q plots 
based on these fits to visualize the departure from normality
(Figure \ref{fig:glmlisqq}):
<<qqmath1,fig=TRUE>>=
print(qqmath(glm.lis))
@ 
\begin{figure}[ht]
  \centering
  \includegraphics[width=.8\linewidth]{qqmath1}
  \caption{Q-Q plots of model coefficients for GLM fits on each genotype.}
  \label{fig:glmlisqq}
\end{figure}
We see that these extreme coefficients fall far outside a normal error distribution (Fig. \ref{fig:glmlisqq}).
As mentioned above, these 
are not estimates of random effects, 
however, but rather separate estimates for each group.  Especially if these groups have relatively small sample sizes, the estimates may eventually be ``shrunk'' closer to the mean when we do the mixed model. Shrinkage, the tendency of extreme (and poorly estimated) fixed effect parameters to be estimated as less extreme in a mixed model, is a general property of mixed models \citep[p. 152]{pinheiro_mixed-effects_2000}.

It turns out that two out of three of the outlier genotypes have small sample sizes, although the third is near the maximum sample size:
<<>>=
gentab <- with(dat.tf,table(gen))
summary(as.numeric(gentab))
gentab[c("5","6","34")]
@ 
We should nonetheless take care to see if the coefficients for these genotypes from the GLMM are still outliers, and take the same precautions as we usually do for outliers. For example, we can look back at the original data to see if there is something weird about the way those genotypes were collected, or try re-running the analysis without those genotypes to see if the results are robust.\footnote{Other options such as weakening the distributional assumptions on the random effects (making them completely nonparametric or using a fatter-tailed distribution such as the Student $t$) are beyond the scope of this note, although they can be achieved via general-purpose tools such as WinBUGS or AD Model Builder.}

\section{Fitting and evaluating the full GLMM}
Now we (try to) build and fit a full model, using \code{glmer} in the \code{lme4} package\footnote{As of this writing, \code{glmer} and \code{lmer} are nearly synonymous: calling \code{lmer} with a \code{family} argument automatically hands off to \code{glmer} to run a GLMM.  This may change in the future, and using \code{glmer} makes it clearer that one is really trying to run a GLMM rather than a LMM.}. This model has random effects for all genotype and population $\times$ treatment random effects, and for the nuisance variables for the rack and germination method (\code{status}). (Given the mean-variance relationship we saw it's pretty clear that we are going to have to proceed eventually to a model with overdispersion, but we fit the Poisson model first for illustration.)
<<full1,cache=TRUE>>=
mp1 <- glmer(total.fruits ~ nutrient*amd +
             rack + status +
             (amd*nutrient|popu)+
             (amd*nutrient|gen),
             data=dat.tf, family="poisson")
@ 
This fits without any (apparent) problems --- the model converges without warnings. 

We can use the Pearson residuals (i.e., $r_P = (y_i-u_i)/\sqrt{Var_i}$, where the variance is $\lambda$ and is therefore the fitted value for each observation) to assess overdispersion quantitatively,
although it's really unnecessary in this case because the pattern (Figure~\ref{fig:VM}) is so obvious:
if $r_{Pi}$ are the Pearson residuals and $d$ is the number of residual
degrees of freedom, we want $\sum{r_{Pi}^2} \approx d$ --- or
$\sum{r_{Pi}^2}/d \approx 1$, or more precisely,
$\sum{r_{Pi}^2} \sim \chi^2_d$ \citep[p. 208: but note warnings about applying this criterion for data with small sample means]{venables_modern_2002}.

We use a convenience function, \code{overdisp\_fun}, defined
in \code{glmm\_funs.R}:
<<overdisp>>=
overdisp_fun(mp1)  
@ 
This shows that the data are (extremely) over-dispersed, given the model
(the $p$-value is so low --- less than $10^{-308}$ --- that R just reports 0).

The \emph{deviance} of the model (in this case, a penalized version of
$-2$ times the log-likelihood) gives a similar value (also approximate for the
purposes of inference):
<<q.overdisp>>=
deviance(mp1)
@ 

Now we add the observation-level random effect to the model to account
for overdispersion, which gives us
a lognormal-Poisson model for the individual
samples \citep{elston_analysis_2001}:
<<full2,cache=TRUE>>=
mp2 <- update(mp1,.~.+(1|X))
@ 
We get a warning that the 
\code{Number of levels of a grouping factor for the random effects
is *equal* to n, the number of observations}, but we know this --- we did it on
purpose. The model also takes much longer to fit.

We start by looking at the variance components.  In particular, if we look
at the correlation matrix among the genotype random effects, we see
a perfect correlation:
<<>>=
attr(VarCorr(mp2)$gen,"correlation")
@ 

The \code{printvc} is a utility function that prints the variance-covariance
matrices along with an equals sign (=) appended to any covariance component
that corresponds to a perfect (or nearly perfect $\pm 1$ correlation):

<<>>=
printvc(mp2)
@ 

We'll try getting rid of the correlations between clipping
(\code{amd}) and nutrients, using \code{amd+nutrient} instead
of \code{amd*nutrient} in the random effects specification
(here it seems easier to re-do the model rather than 
using \code{update} to add and subtract terms):
<<full3,cache=TRUE>>=
mp3 <- glmer(total.fruits ~ nutrient*amd +
             rack + status +
             (amd+nutrient|popu)+
             (amd+nutrient|gen)+(1|X),
             data=dat.tf, family="poisson")
@ 


Unfortunately, we still have a problem with
perfect correlations among the random effects
terms:
<<>>=
printvc(mp3)
@ 

For some models (e.g. random-slope models), it is possible to fit
random effects models in such a way that the correlation between
the different parameters (intercept and slope in the case of random-slope
models) is constrained to be zero, by fitting a model
like \verb!(1|f)+(0+x|f)!; unfortunately, because of the way \code{lme4} is set up,
this is considerably more difficult with categorical predictors (factors).

We have to reduce the model further in some way in order not
to overfit (i.e., in order to avoid perfect $\pm 1$ correlations
among random effects).  It looks like we can't allow both nutrients and
clipping in the random effect model at either the population or the genotype
level.  However, it's hard to know whether we should proceed with
\code{amd} or \code{nutrient} in the model.  

A convenient way to proceed if we are going to try fitting several
different combinations of random effects is to fit the model with 
all the fixed effects but only observation-level random effects, and then to use \code{update} to add
various components to it:
<<mpobs,cache=TRUE>>=
mp_obs <- glmer(total.fruits ~ nutrient*amd +
               rack + status +
               (1|X),
               data=dat.tf, family="poisson")
@ 

Now, for example, \verb!update(mp_obs,.~.+(1|gen)+(amd|popu))!
fits the model with intercept random effects at the genotype level
and variation in clipping effects across populations.
\footnote{One can also model the interaction of categorical
  predictors (such as clipping) with a random effect in a slightly more
  restricted way, for example coding the \verb+(amd|popu)+ term
  in the model above as \verb+(1|popu/amd)+ instead.  In effect,
  we are treating the different \code{amd} levels as nested groups
  within populations. This approach is slightly less flexible (for
  example, one cannot have negative correlations between treatment
  levels) but slightly easier computationally.  In this case, trying
  to fit models with \code{amd} or \code{nutrient} nested within
  population or genotype leads to zero variances for the lowest level --- in
  other words, just as we concluded before, we don't have enough
  power to detect variation in treatment effects among groups.}


It is not too hard to discover by fitting various models such as
<<eval=FALSE>>=
amd_gen_amd_popu  <- update(mp_obs,. ~ . +(amd|gen)+(amd|popu))
@ 
that \emph{every} model more complicated than a simple
random intercept at the genotype and population levels
(\code{(1|gen)+(1|popu)}) is subject to problems with
zero variances or perfect correlations among variance components.
Thus  we used this genotype-intercept+population-intercept model for
the rest of our analysis.
\footnote{In our original analysis of this data,
we (slightly desperately) decided to fit a whole
series of models with \code{amd}, \code{nutrient}, just an intercept,
or no random effect in either the population or the genotype model, leading to a total of 24 models. 
We ran all 24 models in a batch file, and saved the whole set as a list (\verb+mp_fits+)
with appropriate names; we did some magic to generate the set of 24 models
automatically, but we could also have done it by brute force
enumeration, for example:
<<eval=FALSE>>=
amd_gen_amd_popu  <- update(mp_obs,. ~ . +(amd|gen)+(amd|popu))
amd_gen_nut_popu  <- update(mp_obs,. ~ . +(amd|gen)+(nutrient|popu))
@ 
(\emph{et cetera} \ldots)

We then used AIC (although there are some issues with applying 
AIC to mixed models
\citep{greven_non-standard_2008,greven_behaviour_2010}, 
it is a reasonable rough-and-ready
indication of the relative goodness of fit of different models)
to keep models with $\daic>10$ and without fitting problems \ldots
which ended up being only a few of the simplest ones.
}
In other words, while it's biologically plausible that there is
some variation in the nutrient or clipping effect at the genotype 
or population levels, and while this variation may be what
we are most interested in, 
it seems that we really don't have enough data to speak
confidently about these effects.

Fit the genotype+population-intercept model:
<<fitintgenintpopu,cache=TRUE>>=
mp4 <- update(mp_obs,. ~ . + (1|gen) + (1|popu))
@ 

All three levels (observation, genotype,
population) have non-zero variance, and there
is no possibility of correlation since we are
fitting only a single random effect at each level:
<<>>=
printvc(mp4)
@ 

Testing overdispersion:
<<>>=
overdisp_fun(mp4)
@ 
The reduced model (\code{mp4}) now meets the residual
deviance criterion (in fact, the residual deviance is
now \emph{smaller} than expected).

Next we apply a function from \verb+glmm_funs.R+ 
that defines a location-scale plot (Figure~\ref{fig:locscale}: this is plot \#3 in
the standard \code{plot.lm} sequence). 
The variance pattern is slightly alarming, with 
higher variance for small fitted values.  Some but not all
of this pattern is generated by zeros in the original data,
so later on (in Part~2) we'll try fitting a zero-inflated model.
<<locscale,fig=TRUE>>=
locscaleplot(mp4,col=ifelse(dat.tf$total.fruits==0,"blue","black"))
@
\begin{figure}[ht]
  \centering
  \includegraphics[width=.9\linewidth]{locscale}
  \caption{Location-scale plot, with superimposed loess fit.
  Zeros in the data are marked in blue.}
  \label{fig:locscale}
\end{figure}

\section{Inference}
Now that we have  a full model, we want to make inferences about the parameters. 

\subsection{Random effects}

We could use \code{coefplot2(mp4,ptype="vcov",intercept=TRUE)} to
look at a graphical representation of the variance terms
(without error bars, because \code{glmer} doesn't give us
information about the uncertainty of the variance terms),
but in this model with only three variance components
it's not much more informative than printing the values,
as we did above with \code{printvc(mp4)}.

If we want to test the significance of the random effects
we can fit reduced models and run likelihood ratio tests
via \code{anova}, keeping in mind that in this case
(testing a null hypothesis of zero variance, where
the parameter is on the boundary of its feasible region)
the reported $p$ value is approximately twice what
it should be \citep{pinheiro_mixed-effects_2000}:\footnote{The \code{pchibarsq} function in the \code{emdbook} package can be
used for these kinds of tests, although in this case it's overkill (we get
the same answer by simply dividing the $p$ values by 2)}

<<redfit,cache=TRUE>>=
mp4v1 <- update(mp_obs,.~.+(1|popu)) ## popu only (drop gen)
mp4v2 <- update(mp_obs,.~.+(1|gen))  ## gen only (drop popu)
@ 

<<var1anova>>=
anova(mp4,mp4v1)
anova(mp4,mp4v2)
@ 

For various forms of linear mixed models, the \pkglink{RLRsim} package
can do efficient simulation-based hypothesis testing of variance components --- unfortunately,
that doesn't include GLMMs.  If we are sufficiently patient we can do hypothesis
testing via brute-force parametric bootstrapping where we repeatedly simulate data from
the reduced (null) model, fit both the reduced and full models to the simulated
data, and compute the distribution of the deviance (change in -2 log likelihood).
The code below took about half an hour on a reasonably modern
desktop computer:

<<echo=FALSE>>=
load("pbbatch.RData")
@ 

<<eval=FALSE>>=
simdev <- function() {
  newdat <- simulate(mp4v1)
  reduced <- refit(mp4v1,newdat)
  full <- refit(mp4,newdat)
  2*(c(logLik(full)-logLik(reduced)))
}
library(plyr)
set.seed(101)
## raply is a convenient wrapper for repeating the simulation many times
nulldist <- raply(200,simdev(),.progress="text")
## zero spurious (small) negative values
nulldist[nulldist<0 & abs(nulldist)<1e-5] <- 0
@ 

Calculate the observed statistic and compare it
to the null distribution (including the observed
value in the null distribution, because under
the null hypothesis it would be in the null
distribution):
<<>>=
obsdev <- 2*c(logLik(mp4)-logLik(mp4v1))
mean(c(nulldist,obsdev)>=obsdev)
@ 

The true $p$-value is actually closer to
0.05 than 0.02. In other words, here 
the deviations from the original statistical model
from that for which the original ``$p$ value is inflated by 2''
rule of thumb was derived --- fitting a GLMM instead of a LMM,
and using a moderate-sized rather than an arbitrarily large
(asymptotic) data set --- have made the likelihood ratio test
liberal (increased type~I error) rather than conservative
(decreased type~I error).

We can visualize the distribution by
drawing a Q-Q plot, 
comparing the simulated distribution to the
the theoretically expected $\bar \chi^2_1$ distribution
rather than the normal distribution
(Figure~\ref{fig:qqdist}):
<<qqdist,fig=TRUE>>=
library(emdbook)
print(qqmath(c(nulldist,obsdev), 
             distribution=qchibarsq,
             aspect="iso",
             xlab="theoretical",
             ylab="simulated",
             panel = function(x, ...) {
               panel.abline(a=0,b=1,col="gray")
               panel.qqmath(x, ...)
               panel.abline(v=obsdev,lty=2)
               panel.abline(h=obsdev,lty=2)
             }))
@ 
We can see that the null distribution has more
large values than expected under the
theoretical distribution.

\begin{figure}[ht]
  \centering
  \includegraphics[width=.9\linewidth]{qqdist}
  \caption{Q-Q plot of the simulated null deviance between
  the models with and without a variance component at the genotype level,
  compared against the $\bar \chi^2_1$ distribution \citep{GoldmanWhelan2000}
}
  \label{fig:qqdist}
\end{figure}

We can also inspect the random effects estimates themselves
(in proper statistical jargon, these might be considered ``predictions''
rather than ``estimates'' \citep{robinson_that_1991}:
Figure~\ref{fig:reffplot}).
We use the built-in \code{dotplot} method for the random effects
extracted from \code{glmer} fits (i.e. \code{ranef(model,postVar=TRUE)}),
which returns a list of plots, one for each random effect level in the
model.  (We used a bit of trickery from the \code{gridExtra} package
to arrange these plots on the page, and we omit the observation-level
random effects, which are returned as the \verb+$obs+ component of
the list of plots.)

<<reffplot,fig=TRUE>>=
## squash margins a bit ...
pp <- list(layout.widths=list(left.padding=0, right.padding=0),
           layout.heights=list(top.padding=0, bottom.padding=0))
r2 <- ranef(mp4,postVar=TRUE)
d2 <- dotplot(r2,  par.settings=pp)
library(gridExtra)
grid.arrange(d2$gen,d2$popu,nrow=1)
@ 

As expected from the similarity of the variance estimates,
the population-level estimates (the only shared component)
do not differ much between the two models. There
is a hint of regional differentiation --- the
Spanish populations have higher fruit sets
than the Swedish and Dutch populations.
Genotype~34 again looks a little bit unusual.


\begin{figure}[ht]
  \centering
  \includegraphics[width=.9\linewidth]{reffplot}
  \caption{Random effects estimates/predictions for 
    the final model}
  \label{fig:reffplot}
\end{figure}


\subsection{Fixed effects}

Now we want to do inference on the fixed effects.
We use the \code{drop1} function to assess both the AIC difference
and the likelihood ratio test between models.  
(In \code{glmm\_funs.R} we define a convenience function 
\code{dfun} to convert the AIC tables
returned by \code{drop1} (which we will create momentarily)
into $\daic$ tables.)
Although the likelihood ratio test
(and the AIC) are asymptotic tests, comparing fits between full and reduced models
is still more accurate than the Wald (curvature-based) tests shown in the
\code{summary} tables for \code{glmer} fits.%
\footnote{In general you should \emph{not} explore both AIC- and LRT-based comparisons
          when analyzing a models;
          they represent different inferential approaches, and you should decide
          before analyzing the model which you prefer, to avoid the temptation of
          data snooping or ``cherry-picking'' (i.e., comparing the results of the
                                               two approaches and choosing the ones you prefer).
          Here we show both analyses for illustration.}


<<drop1,cache=TRUE>>=
(dd_AIC <- dfun(drop1(mp4)))
@ 
<<echo=FALSE>>=
dd_AIC
@ 
<<drop1LRT,cache=TRUE>>=
(dd_LRT <- drop1(mp4,test="Chisq"))
@ 
<<echo=FALSE>>=
dd_LRT
@ 

On the basis of these comparisons, there appears 
to be a very strong effect of rack
and weak effects of status and of the interaction term.
Dropping the \code{nutrient:amd} interaction gives
a (slightly) increased AIC
($\daic=\Sexpr{round(dd_AIC$dAIC[4],1)}$), so the full model has the 
best expected predictive capability (by a small margin).  On the other hand,
the $p$-value is slightly above 0.05
($p=\Sexpr{round(dd_LRT$`Pr(Chi)`[4],2)}$).

At this point we remove the non-significant
interaction term so we can test the main effects.
\footnote{There is significant controversy
over this point. Some researchers state that centering the input variables
(in this case using \code{contr.sum} to change to ``sum to zero'' contrasts)
will make the main effects interpretable even in the presence of the interaction term
\cite{schielzeth_simple_2010}, while others say one
should almost never violate the ``principle of marginality'' in this way
\citep{venables_exegeses_1998}.  Dropping the interactions may be an
appropriate way forward \citep{pinheiro_mixed-effects_2000}, although
others say that dropping non-significant terms from models
is almost never justified \citep{harrell_regression_2001,whittingham_why_2006} \ldots}
(We don't worry about removing \code{status} because
it measures an aspect of experimental design that
we want to leave in the model whether it is significant or not.)

Once we have fitted the reduced model,
we can run the LRT via \code{anova}:
<<mp6,cache=TRUE>>=
mp5 <- update(mp4, . ~ . - amd:nutrient)
@ 
<<mp6anova>>=
anova(mp5,mp4)
@ 

We can also test all the reduced models:
<<dropamdnut,cache=TRUE>>=
dd_AIC2 <- dfun(drop1(mp5))
@ 
<<echo=FALSE>>=
dd_AIC2
@ 
<<drop2,cache=TRUE>>=
dd_LRT2 <- drop1(mp5,test="Chisq")
@ 
<<echo=FALSE>>=
dd_LRT2
@ 
In the reduced model, we find that both nutrients
and clipping have strong effects, whether measured
by AIC or LRT.

If we wanted to be still more careful about our
interpretation, we would try to relax the asymptotic assumption.
In classical linear models, we would do this by doing
$F$ tests with the appropriate denominator degrees of freedom.
In ``modern'' mixed model approaches, we might try to use
denominator-degree-of-freedom approximations such as the
Kenward-Roger (despite the controversy over these approximations,
they are actually available in the \pkglink{doBy} package
(see \url{www.warwick.ac.uk/statsdept/useR-2011/abstracts/290311-halekohulrich.pdf})), but they do not apply to GLMMs.

We can use a parametric bootstrap comparison 
between nested models to test fixed effects, as
we did above for random effects, with the caveat that
is computationally slow.
(Here we illustrate how to use functions in the \code{doBy}
package rather than the home-grown functions we used above.)

<<>>=
library(doBy)
@ 

The following command took about 35 minutes:
<<echo=FALSE>>=
load("pbbatch2.RData")
@ 
<<eval=FALSE>>=
pb1 <- PBrefdist(mp4,mp5)
@ 
Once we have the null distribution, doing inference
with it is quick.
<<>>=
PBmodcomp(mp4,mp5,ref=pb1)
@ 
This compares the likelihood ratio and
parametric bootstrap tests: the parametric
bootstrap test is slightly more conservative,
but both tests have $0.05<p<0.1$.

At this point, we could re-check our assumptions. 

<<>>=
overdisp_fun(mp5)  
@ 

If we did \code{locscaleplot(mp5)} we would see
very much the same pattern as in Figure~\ref{fig:locscale}.

In addition, we can check the normality of the random effects and find they are reasonable (Fig. \ref{fig:reN}).
We use \code{ldply} from the \code{reshape} package to collapse the list of random effect values into a data
frame (we might have to do something different if there were more than one random effect within each level,
e.g. a model including \code{(nutrient|gen)}).  The fancy \code{panel} code in the figure adds a reference
line to the Q-Q plot: see \code{?qqmath}.
<<reN, fig=TRUE>>=
reStack <- ldply(ranef(mp5))
print( qqmath( ~`(Intercept)`|.id, data=reStack, scales=list(relation="free"),
              prepanel = prepanel.qqmathline,
              panel = function(x, ...) {
                panel.qqmathline(x, ...)
                panel.qqmath(x, ...)
            },
              layout=c(3,1)))
@ 
\begin{figure}[ht]
  \centering
  \includegraphics[width=.6\linewidth]{reN}
  \caption{Quantile-quantile plot of the genotype random effects of the final model.}
  \label{fig:reN}
\end{figure}

\section{Conclusions}

Our final model includes fixed effects of nutrients 
and clipping, as well as the nuisance variables \code{rack} and \code{status};
observation-level random effects to account for overdispersion;
and variation in overall fruit set at the population and genotype levels.
However, we don't (apparently) have quite enough information to estimate the
variation in clipping and nutrient effects, or their interaction,
at the genotype or population levels.  There is a strong overall positive effect
of nutrients and a slightly weaker negative effect of clipping.
The interaction between
clipping and nutrients is only weakly supported (i.e. the $p$-value is not
very small), but it is positive and about the same magnitude as the
clipping effect, which is consistent 
with the statement that ``nutrients cancel out the effect of herbivory''.
To test this equivalence formally, we could set the analysis up as a one-way
layout (with levels [\code{Nutrient1:Unclipped}, 
\code{Nutrient1:Clipped}, 
\code{Nutrient8:Clipped},
\code{Nutrient8:Unclipped}]); the contrast between 
\code{Nutrient1:Unclipped} and \code{Nutrient8:Clipped} would then
test the hypothesis (see
\url{http://glmm.wikidot.com/local--files/examples/culcita_glmm.pdf} for a
similar example).

This concludes part~1.  To see the comparisons of the same analysis with
a variety of different estimation approaches using different R packages, see part~2.

Save results to use in part~2:
<<>>=
save("dat.tf","mp4",file="Banta_part1.RData")
@ 
\bibliographystyle{ESA1009}
\bibliography{glmm}


\end{document}
