With the average linkage method, the distance between two clusters is the average distance between an observation in one cluster and an observation in the other cluster. The distance from the merged cluster m to any other cluster j is calculated with the following formula:

d_{mj} = (N_{k} d_{kj} + N_{l} d_{lj}) / N_{m}

Term | Description |
---|---|
d_{mj} | distance between clusters m and j |
m | merged cluster that consists of clusters k and l, with m = (k,l) |
d_{kj} | distance between clusters k and j |
d_{lj} | distance between clusters l and j |
N_{k} | number of observations in cluster k |
N_{l} | number of observations in cluster l |
N_{m} | number of observations in cluster m |
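The following is a minimal sketch of the average linkage update, assuming the size-weighted form d_{mj} = (N_{k} d_{kj} + N_{l} d_{lj}) / N_{m}. The function names and the sample points are hypothetical and chosen only for illustration. The sketch checks that the update agrees with averaging all pairwise distances directly.

```python
from itertools import product
from math import dist


def average_update(d_kj, d_lj, n_k, n_l):
    """Distance from merged cluster m = (k, l) to cluster j (assumed average linkage update)."""
    n_m = n_k + n_l  # the merged cluster holds every observation of k and l
    return (n_k * d_kj + n_l * d_lj) / n_m


def mean_pairwise(a, b):
    """Average Euclidean distance over all pairs with one point in a and one in b."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


# Hypothetical 2-D observations, made up for illustration.
k = [(0.0, 0.0), (1.0, 0.0)]
l = [(4.0, 0.0)]
j = [(0.0, 3.0), (4.0, 3.0)]

updated = average_update(mean_pairwise(k, j), mean_pairwise(l, j), len(k), len(l))
direct = mean_pairwise(k + l, j)
print(updated, direct)  # the two values agree up to rounding
```

Because the update needs only the previous distances and the cluster sizes, the raw observations do not have to be revisited after each merge.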

With the centroid linkage method, the distance between two clusters is the distance between the cluster centroids or means. The distance from the merged cluster m to any other cluster j is calculated with the following formula:

d_{mj} = (N_{k} d_{kj} + N_{l} d_{lj}) / N_{m} - (N_{k} N_{l} d_{kl}) / N_{m}^{2}

Term | Description |
---|---|
d_{mj} | distance between clusters m and j |
m | merged cluster that consists of clusters k and l, with m = (k,l) |
d_{kj} | distance between clusters k and j |
d_{lj} | distance between clusters l and j |
d_{kl} | distance between clusters k and l |
N_{k} | number of observations in cluster k |
N_{l} | number of observations in cluster l |
N_{m} | number of observations in cluster m |
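As a hedged illustration, the sketch below assumes the centroid update d_{mj} = (N_{k} d_{kj} + N_{l} d_{lj}) / N_{m} - (N_{k} N_{l} d_{kl}) / N_{m}^{2} and assumes that the distances are squared Euclidean distances between centroids, which is the setting where this identity is exact. The helper names and the sample points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def centroid_update(d_kj, d_lj, d_kl, n_k, n_l):
    """Distance from merged cluster m = (k, l) to cluster j (assumed centroid linkage update)."""
    n_m = n_k + n_l
    return (n_k * d_kj + n_l * d_lj) / n_m - (n_k * n_l * d_kl) / n_m**2


# Hypothetical 2-D observations, made up for illustration.
k = [(0.0, 0.0), (2.0, 0.0)]
l = [(5.0, 1.0)]
j = [(1.0, 4.0), (3.0, 6.0)]

c_k, c_l, c_j = centroid(k), centroid(l), centroid(j)
updated = centroid_update(sq_dist(c_k, c_j), sq_dist(c_l, c_j), sq_dist(c_k, c_l),
                          len(k), len(l))
direct = sq_dist(centroid(k + l), c_j)  # recompute the merged centroid from the raw points
print(updated, direct)  # both are 197/9 up to rounding
```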

With the complete linkage method (also called the furthest neighbor method), the distance between two clusters is the maximum distance between an observation in one cluster and an observation in the other cluster. The distance from the merged cluster m to any other cluster j is calculated with the following formula:

d_{mj} = max (d_{kj}, d_{lj})

Term | Description |
---|---|
d_{mj} | distance between clusters m and j |
m | merged cluster that consists of clusters k and l, with m = (k,l) |
d_{kj} | distance between clusters k and j |
d_{lj} | distance between clusters l and j |
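The update d_{mj} = max (d_{kj}, d_{lj}) can be checked against a brute-force search for the furthest pair. The sketch below uses hypothetical function names and made-up points, and assumes Euclidean distances between observations.

```python
from itertools import product
from math import dist


def complete_update(d_kj, d_lj):
    """Distance from merged cluster m = (k, l) to cluster j under complete linkage."""
    return max(d_kj, d_lj)


def max_pairwise(a, b):
    """Largest Euclidean distance over all pairs with one point in a and one in b."""
    return max(dist(p, q) for p, q in product(a, b))


# Hypothetical 2-D observations, made up for illustration.
k = [(0.0, 0.0), (1.0, 0.0)]
l = [(4.0, 0.0)]
j = [(0.0, 3.0), (4.0, 3.0)]

updated = complete_update(max_pairwise(k, j), max_pairwise(l, j))
direct = max_pairwise(k + l, j)
print(updated, direct)  # identical: the furthest pair lies in k x j or in l x j
```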

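With McQuitty's linkage method, the distance from the merged cluster m to any other cluster j is the unweighted average of the distances from clusters k and l to cluster j, regardless of the cluster sizes. The distance is calculated with the following formula:

d_{mj} = (d_{kj} + d_{lj}) / 2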

Term | Description |
---|---|
d_{mj} | distance between clusters m and j |
m | merged cluster that consists of clusters k and l, with m = (k,l) |
d_{kj} | distance between clusters k and j |
d_{lj} | distance between clusters l and j |

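To illustrate how McQuitty's method differs from average linkage, the sketch below assumes the McQuitty update d_{mj} = (d_{kj} + d_{lj}) / 2 and the size-weighted average linkage update d_{mj} = (N_{k} d_{kj} + N_{l} d_{lj}) / N_{m}. The distances and cluster sizes are hypothetical values chosen so that the two clusters being merged are very unequal in size.

```python
def mcquitty_update(d_kj, d_lj):
    """Assumed McQuitty update: plain mean of the two previous distances."""
    return (d_kj + d_lj) / 2


def average_update(d_kj, d_lj, n_k, n_l):
    """Assumed average linkage update: mean weighted by cluster size."""
    return (n_k * d_kj + n_l * d_lj) / (n_k + n_l)


# Hypothetical values: a large cluster k close to j, a single observation l far from j.
d_kj, d_lj = 2.0, 10.0
n_k, n_l = 9, 1

mcquitty = mcquitty_update(d_kj, d_lj)
average = average_update(d_kj, d_lj, n_k, n_l)
print(mcquitty, average)  # McQuitty gives l as much weight as all nine observations in k
```

With equal cluster sizes the two updates coincide, so the methods differ only when merged clusters are unbalanced.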
With the median linkage method, the distance between two clusters is the median distance between an observation in one cluster and an observation in the other cluster. The distance from the merged cluster m to any other cluster j is calculated with the following formula:

d_{mj} = (d_{kj} + d_{lj}) / 2 - d_{kl} / 4

Term | Description |
---|---|
d_{mj} | distance between clusters m and j |
m | merged cluster that consists of clusters k and l, with m = (k,l) |
d_{kj} | distance between clusters k and j |
d_{lj} | distance between clusters l and j |
d_{kl} | distance between clusters k and l |
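The sketch below assumes the median update d_{mj} = (d_{kj} + d_{lj}) / 2 - d_{kl} / 4 and assumes squared Euclidean distances. Under those assumptions the update has a simple geometric reading: the merged cluster is represented by the midpoint of the two points that represented clusters k and l, whatever the cluster sizes. The representative points here are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def median_update(d_kj, d_lj, d_kl):
    """Distance from merged cluster m = (k, l) to cluster j (assumed median linkage update)."""
    return (d_kj + d_lj) / 2 - d_kl / 4


# Hypothetical representative points for clusters k, l, and j.
c_k, c_l, c_j = (1.0, 0.0), (5.0, 1.0), (2.0, 5.0)

midpoint = tuple((a + b) / 2 for a, b in zip(c_k, c_l))
updated = median_update(sq_dist(c_k, c_j), sq_dist(c_l, c_j), sq_dist(c_k, c_l))
direct = sq_dist(midpoint, c_j)
print(updated, direct)  # both are 21.25
```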

With the single linkage method (also called the nearest neighbor method), the distance between two clusters is the minimum distance between an observation in one cluster and an observation in the other cluster. When observations lie close together, single linkage tends to identify long chain-like clusters, with relatively large distances separating observations at either end of the chain.

The distance from the merged cluster m to any other cluster j is calculated with the following formula:

d_{mj} = min (d_{kj}, d_{lj})

Term | Description |
---|---|
d_{mj} | distance between clusters m and j |
m | merged cluster that consists of clusters k and l, with m = (k,l) |
d_{kj} | distance between clusters k and j |
d_{lj} | distance between clusters l and j |
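The chaining behavior can be seen in a small agglomeration loop that applies d_{mj} = min (d_{kj}, d_{lj}) after each merge. This is a minimal sketch rather than a production implementation: the function name, the dictionary-based distance matrix, and the one-dimensional sample points are all assumptions made for illustration.

```python
def single_linkage_heights(points):
    """Merge 1-D points with the single linkage update; return the distance at each merge."""
    clusters = {i: [p] for i, p in enumerate(points)}
    # Distance matrix stored as {(a, b): distance} with a < b.
    d = {(a, b): abs(points[a] - points[b])
         for a in clusters for b in clusters if a < b}
    heights = []
    next_id = len(points)  # new cluster ids are always larger than existing ones
    while len(clusters) > 1:
        (k, l), d_kl = min(d.items(), key=lambda item: item[1])  # closest pair
        heights.append(d_kl)
        merged = clusters.pop(k) + clusters.pop(l)
        # Keep the distances that do not involve k or l.
        new_d = {pair: v for pair, v in d.items() if k not in pair and l not in pair}
        for j in clusters:
            d_kj = d[(min(k, j), max(k, j))]
            d_lj = d[(min(l, j), max(l, j))]
            new_d[(j, next_id)] = min(d_kj, d_lj)  # single linkage update
        clusters[next_id] = merged
        d = new_d
        next_id += 1
    return heights


# Hypothetical points with gaps of 1, 1.5, 2, and 2.5 between neighbors.
points = [0.0, 1.0, 2.5, 4.5, 7.0]
heights = single_linkage_heights(points)
print(heights)                     # [1.0, 1.5, 2.0, 2.5]
print(max(points) - min(points))   # 7.0
```

Every merge happens at the size of one gap between neighbors, so the final cluster forms at a distance of 2.5 even though its two end points are 7.0 apart.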

With Ward's linkage method, the distance between two clusters is the sum of squared deviations from points to centroids. The objective of Ward's linkage is to minimize the within-cluster sum of squares. The distance from the merged cluster m to any other cluster j is calculated with the following formula:

d_{mj} = [(N_{j} + N_{k}) d_{kj} + (N_{j} + N_{l}) d_{lj} - N_{j} d_{kl}] / (N_{j} + N_{m})

With Ward's linkage method, the distance between two clusters can be larger than d_{max}, which is the maximum value in the original distance matrix, **D**. If this happens, the similarity is negative.

Term | Description |
---|---|
d_{mj} | distance between clusters m and j |
m | merged cluster that consists of clusters k and l, with m = (k,l) |
d_{kj} | distance between clusters k and j |
d_{lj} | distance between clusters l and j |
d_{kl} | distance between clusters k and l |
N_{j} | number of observations in cluster j |
N_{k} | number of observations in cluster k |
N_{l} | number of observations in cluster l |
N_{m} | number of observations in cluster m |
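The sketch below assumes the Ward update d_{mj} = [(N_{j} + N_{k}) d_{kj} + (N_{j} + N_{l}) d_{lj} - N_{j} d_{kl}] / (N_{j} + N_{m}) applied to squared Euclidean distances. Under that assumption, the distance between two clusters equals twice the increase in the within-cluster sum of squares that merging them would cause, which the code uses as a direct check. The similarity scale 100 (1 - d / d_{max}) is also an assumption, as are the helper names and the sample points.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def ward_distance(a, b):
    """Assumed scaling: twice the increase in within-cluster sum of squares from merging a and b."""
    n_a, n_b = len(a), len(b)
    return 2 * n_a * n_b / (n_a + n_b) * sq_dist(centroid(a), centroid(b))


def ward_update(d_kj, d_lj, d_kl, n_j, n_k, n_l):
    """Distance from merged cluster m = (k, l) to cluster j (assumed Ward update)."""
    n_m = n_k + n_l
    return ((n_j + n_k) * d_kj + (n_j + n_l) * d_lj - n_j * d_kl) / (n_j + n_m)


# Hypothetical 2-D observations, made up for illustration.
k = [(0.0, 0.0), (2.0, 0.0)]
l = [(5.0, 1.0)]
j = [(1.0, 4.0), (3.0, 6.0)]

updated = ward_update(ward_distance(k, j), ward_distance(l, j), ward_distance(k, l),
                      len(j), len(k), len(l))
direct = ward_distance(k + l, j)

# Largest entry of the original matrix of squared distances between observations.
d_max = max(sq_dist(p, q) for p, q in combinations(k + l + j, 2))
similarity = 100 * (1 - updated / d_max)  # assumed similarity scale
print(updated, direct, d_max, similarity)
```

In this example the updated distance exceeds d_{max}, so the similarity on the assumed scale is negative.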