The extension of the classical linear models to generalized linear models has two parts: a distribution from the exponential family and a link function.
The first part extends the linear model to response variables that are members of a large family of distributions called the exponential family. Members of the exponential family have probability distribution functions for an observed response y of this general form:

$$f(y;\theta,\phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right\}$$
where a(∙), b(∙), and c(∙) depend on the distribution of the response variable. The parameter θ is a location parameter, often called the canonical parameter, and ϕ is called the dispersion parameter. The function a(ϕ) usually has the form a(ϕ) = ϕ/ω, where ω is a known constant or weight that may vary from one observation to another. (In Minitab, when weights are given, the function a(ϕ) is adjusted accordingly.)
Members of the exponential family can be discrete distributions or continuous distributions. Examples of continuous distributions that are members of the exponential family are the normal and the gamma distributions. Examples of discrete distributions that are members of the exponential family are the binomial and the Poisson distributions. The following table gives the characteristics of some of these distributions.
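Distribution | ϕ | b(θ) | a(ϕ) | c(y, ϕ) |
---|---|---|---|---|
Normal | σ^{2} | θ^{2}/2 | ϕ/ω | −½[ωy^{2}/ϕ + ln(2πϕ/ω)] |
Binomial | 1 | ln(1 + exp(θ)) | ϕ/ω | ln C(ω, ωy) |
Poisson | 1 | exp(θ) | ϕ/ω | −ln(y!) |

For the binomial, y is the observed proportion of events, ω is the number of trials, and C(n, k) is the binomial coefficient.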
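As a quick numerical check of the general form and the table, the sketch below evaluates exp{(yθ − b(θ))/a(ϕ) + c(y, ϕ)} with a(ϕ) = ϕ/ω and compares the result with the familiar Poisson, normal, and binomial formulas. The parameter values are arbitrary, and the binomial case assumes that y is a proportion of events out of ω trials; the helper names are hypothetical.

```python
import math

def exp_family_density(y, theta, phi, omega, b, c):
    """Evaluate exp((y*theta - b(theta)) / a(phi) + c(y, phi)) with a(phi) = phi / omega."""
    a = phi / omega
    return math.exp((y * theta - b(theta)) / a + c(y, phi, omega))

# Poisson: theta = ln(mu), phi = 1, b(theta) = exp(theta), c = -ln(y!)
def poisson_c(y, phi, omega):
    return -math.lgamma(y + 1)

# Normal: theta = mu, phi = sigma^2, b(theta) = theta^2 / 2
def normal_c(y, phi, omega):
    return -0.5 * (omega * y**2 / phi + math.log(2 * math.pi * phi / omega))

# Binomial proportion: theta = logit(pi), phi = 1, omega = number of trials,
# b(theta) = ln(1 + exp(theta)), c = ln C(omega, omega * y)
def binomial_c(y, phi, omega):
    return math.log(math.comb(round(omega), round(omega * y)))

# Poisson with mean 3, observed y = 2
mu = 3.0
pois_expfam = exp_family_density(2, math.log(mu), 1.0, 1.0, math.exp, poisson_c)
pois_direct = mu**2 * math.exp(-mu) / math.factorial(2)

# Normal with mean 1.2, variance 0.5, observed y = 0.7, unit weight
m, s2, yn = 1.2, 0.5, 0.7
norm_expfam = exp_family_density(yn, m, s2, 1.0, lambda t: t**2 / 2, normal_c)
norm_direct = math.exp(-((yn - m) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)

# Binomial: 3 events in 10 trials with event probability 0.4
n, k, p = 10, 3, 0.4
binom_expfam = exp_family_density(
    k / n, math.log(p / (1 - p)), 1.0, n, lambda t: math.log1p(math.exp(t)), binomial_c
)
binom_direct = math.comb(n, k) * p**k * (1 - p) ** (n - k)
```

Each pair of values agrees, which is a way to confirm that the b(θ), a(ϕ), and c(y, ϕ) entries reproduce the usual densities.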
The second part is the link function. The link function relates the mean of the response in the i^{th} observation to a linear predictor in this form:

$$g(\mu_i) = \mathbf{X}_i'\boldsymbol{\beta}$$
The choice of link function in the second part depends on the specific distribution of the exponential family in the first part. In particular, each distribution in the exponential family has a special link function called the canonical link function. This link function satisfies $g(\mu_i) = \mathbf{X}_i'\boldsymbol{\beta} = \theta_i$, where θ_i is the canonical parameter. The canonical link gives the model some desirable statistical properties; for example, the observed and expected information matrices coincide, so Newton–Raphson and Fisher scoring produce the same iterations. The logit is the canonical link function for binomial models. Because it models the log odds of the event, exp(β_j) estimates the odds ratio for a one-unit increase in the j^{th} predictor, holding the other predictors fixed.
Term | Description |
---|---|
μ_{i} | the mean response of the i^{th} observation |
g(μ_{i}) | the link function |
X_{i} | the vector of predictor values for the i^{th} observation |
β | the vector of coefficients associated with the predictors |
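One way to see why the logit is canonical for the binomial: in the exponential-family form the mean satisfies μ = b′(θ), so the canonical link is the inverse of b′. The sketch below differentiates b(θ) = ln(1 + exp(θ)) numerically at θ = logit(μ) and recovers μ, does the same for the Poisson b(θ) = exp(θ) (whose canonical link is the log), and then uses hypothetical coefficients to show that, under a logit link, exp(β_1) equals the odds ratio for a one-unit increase in the predictor.

```python
import math

def b_binomial(theta):
    """Cumulant function b(theta) = ln(1 + exp(theta)) for a binomial proportion."""
    return math.log1p(math.exp(theta))

def central_difference(f, x, h=1e-6):
    """Numerical first derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Binomial: theta = logit(mu), and b'(theta) should give back mu
mu = 0.3
theta = math.log(mu / (1 - mu))
mu_recovered = central_difference(b_binomial, theta)

# Poisson: b(theta) = exp(theta), theta = ln(mu), so b'(theta) = mu as well
mu_pois_recovered = central_difference(math.exp, math.log(4.0))

# Hypothetical logit-link coefficients: intercept beta0, slope beta1
beta0, beta1 = -1.0, 0.5

def odds(x):
    """Odds of the event at predictor value x under the logit model."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
    return p / (1.0 - p)

odds_ratio = odds(2.0) / odds(1.0)  # effect of a one-unit increase in x
```

Here `mu_recovered` is close to 0.3, `mu_pois_recovered` is close to 4, and `odds_ratio` matches `math.exp(beta1)`.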
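To calculate a predicted probability, invert the link function for the model, which is the logit link function. The predicted event probability $\hat{\pi}$ is:

$$\hat{\pi} = \frac{\exp(\mathbf{X}'\hat{\boldsymbol{\beta}})}{1 + \exp(\mathbf{X}'\hat{\boldsymbol{\beta}})}$$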
Term | Description |
---|---|
exp(·) | the exponential function |
X' | the transpose of the vector of points to predict for |
β̂ | the vector of estimated coefficients |
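A minimal sketch of the inverse-logit prediction, assuming a hypothetical model with an intercept and one predictor; the coefficient values and the function name `predict_probability` are made up for illustration. The two-branch form computes the same quantity as exp(X′β̂)/(1 + exp(X′β̂)) but avoids overflow when |X′β̂| is large.

```python
import math

def predict_probability(x, beta_hat):
    """Inverse logit of the linear predictor x'beta_hat."""
    eta = sum(xi * bi for xi, bi in zip(x, beta_hat))
    # Equivalent to exp(eta) / (1 + exp(eta)), arranged so exp() never overflows
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

beta_hat = [-1.5, 0.8]  # hypothetical estimates: intercept, slope
x_new = [1.0, 2.0]      # leading 1 multiplies the intercept
p = predict_probability(x_new, beta_hat)
```

With these values the linear predictor is 0.1, so `p` is about 0.525.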
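The standard error of the fitted mean response at a new design point x_{h} is:

$$SE(\hat{\mu}_h) = \frac{\sqrt{\phi\,\mathbf{x}_h'(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{x}_h}}{g'(\hat{\mu}_h)}$$

Term | Description |
---|---|
ϕ | the dispersion parameter; 1 for the binomial and Poisson models |
x_{h} | the vector of a new design point |
x_{h}' | the transpose of x_{h} |
X | the design matrix |
W | the weight matrix |
g'(μ̂_{h}) | the first derivative of the link function evaluated at μ̂_{h} |
μ̂_{h} | the predicted mean response |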
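The sketch below evaluates this standard-error formula for a logit-link binomial model, where g′(μ) = 1/(μ(1 − μ)) and the diagonal of W is n_i π̂_i(1 − π̂_i). The events/trials data are hypothetical, ϕ is fixed at 1, and the coefficients come from a plain Newton–Raphson fit written for this example (for the canonical link this is the same as Fisher scoring); it is not Minitab's own routine.

```python
import numpy as np

def fit_logistic(X, events, trials, iters=25):
    """Newton-Raphson fit of a logit-link binomial model; returns beta_hat and X'WX."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = trials * p * (1.0 - p)            # diagonal of W
        score = X.T @ (events - trials * p)   # gradient of the log-likelihood
        info = X.T @ (w[:, None] * X)         # X'WX
        beta = beta + np.linalg.solve(info, score)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = trials * p * (1.0 - p)
    return beta, X.T @ (w[:, None] * X)

def se_fit(x_h, beta_hat, xtwx, phi=1.0):
    """Fitted probability at x_h and its standard error."""
    p_h = 1.0 / (1.0 + np.exp(-x_h @ beta_hat))
    g_prime = 1.0 / (p_h * (1.0 - p_h))       # derivative of logit(mu) at p_h
    se_eta = np.sqrt(phi * x_h @ np.linalg.solve(xtwx, x_h))
    return p_h, se_eta / g_prime

# Hypothetical data: 2, 4, 6, 8 events out of 10 trials at x = 0, 1, 2, 3
X = np.column_stack([np.ones(4), np.arange(4.0)])
events = np.array([2.0, 4.0, 6.0, 8.0])
trials = np.full(4, 10.0)

beta_hat, xtwx = fit_logistic(X, events, trials)
p_h, se = se_fit(np.array([1.0, 1.5]), beta_hat, xtwx)
```

Because these data are symmetric about x = 1.5, the fitted probability there is 0.5, and the standard error is 0.25 times the standard error of the linear predictor.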
The confidence limits use the Wald approximation method. This is the formula for a 100(1 − α)% two-sided confidence interval:

$$g^{-1}\left(\mathbf{x}'\hat{\boldsymbol{\beta}} \pm z_{1-\alpha/2}\sqrt{\phi\,\mathbf{x}'(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{x}}\right)$$

Term | Description |
---|---|
g^{-1}(·) | the inverse of the link function |
x' | the transpose of the vector of the predictors |
β̂ | the vector of estimated coefficients |
z_{1−α/2} | the value of the inverse cumulative distribution function for the normal distribution evaluated at 1 − α/2 |
α | the significance level |
X | the design matrix |
W | the weight matrix |
ϕ | the dispersion parameter; 1 for binomial and Poisson models |
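A minimal sketch of the Wald interval for a predicted probability under a logit link. The coefficients and the matrix standing in for (X′WX)^{-1} are hypothetical values chosen for illustration, ϕ is 1, and the function name `wald_ci_probability` is made up; `statistics.NormalDist` supplies the normal quantile.

```python
import math
from statistics import NormalDist

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def wald_ci_probability(x, beta_hat, cov, alpha=0.05, phi=1.0):
    """Wald interval on the linear-predictor scale, mapped back through the inverse link.

    cov plays the role of (X'WX)^{-1}.
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta_hat))
    quad = sum(x[i] * cov[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))
    se_eta = math.sqrt(phi * quad)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return inv_logit(eta - z * se_eta), inv_logit(eta + z * se_eta)

beta_hat = [-1.5, 0.8]                # hypothetical estimates
cov = [[0.20, -0.05], [-0.05, 0.04]]  # hypothetical (X'WX)^{-1}
lower, upper = wald_ci_probability([1.0, 2.0], beta_hat, cov)
```

Here x′β̂ = 0.1 and the standard error on the linear-predictor scale is 0.4. Building the interval on that scale and then applying g^{-1} keeps both limits inside (0, 1), although the interval is no longer symmetric around the predicted probability.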