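In average linkage, the distance between two clusters is the average distance between each variable in one cluster and each variable in the other cluster. After clusters k and l merge into cluster m, the distance from m to another cluster j is updated with the following formula:

$$d_{mj} = \frac{N_k\,d_{kj} + N_l\,d_{lj}}{N_m}$$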

Term | Description
---|---
d_{mj} | distance between clusters m and j
m | merged cluster that consists of clusters k and l, with m = (k,l)
d_{kj} | distance between clusters k and j
d_{lj} | distance between clusters l and j
N_{k} | number of variables in cluster k
N_{l} | number of variables in cluster l
N_{m} | number of variables in cluster m
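The average-linkage update can be checked directly. Below is a minimal sketch, not a definitive implementation. The 1-D values and the absolute-difference distance are assumptions for illustration only; they stand in for whatever distance between variables is actually used. The sketch applies the update formula and compares it with a brute-force average over all pairs of variables in the merged cluster and cluster j.

```python
import math
from itertools import product


def average_update(d_kj, d_lj, n_k, n_l):
    """Average-linkage update: size-weighted mean of d_kj and d_lj."""
    n_m = n_k + n_l
    return (n_k * d_kj + n_l * d_lj) / n_m


def mean_pairwise(a, b, dist):
    """Brute force: mean distance over every pair (x in a, y in b)."""
    return sum(dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))


# Assumption: toy 1-D values and absolute difference stand in for the
# variables and their distance measure (illustration only).
def dist(x, y):
    return abs(x - y)


k, l, j = [1.0, 2.0], [4.0], [7.0, 9.0, 10.0]

d_kj = mean_pairwise(k, j, dist)
d_lj = mean_pairwise(l, j, dist)
d_mj = average_update(d_kj, d_lj, len(k), len(l))

# The update matches recomputing the average from scratch for m = k + l.
assert math.isclose(d_mj, mean_pairwise(k + l, j, dist))
print(d_mj)
```

Both routes give 57/9 (about 6.333). The update needs only the two old distances and the cluster sizes, so the original variables never have to be revisited.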

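In centroid linkage, the distance between two clusters is the distance between the cluster centroids (means). After clusters k and l merge into cluster m, the distance from m to another cluster j is updated with the following formula:

$$d_{mj} = \frac{N_k\,d_{kj} + N_l\,d_{lj}}{N_m} - \frac{N_k N_l\,d_{kl}}{N_m^2}$$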

Term | Description
---|---
d_{mj} | distance between clusters m and j
m | merged cluster that consists of clusters k and l, with m = (k,l)
d_{kj} | distance between clusters k and j
d_{lj} | distance between clusters l and j
d_{kl} | distance between clusters k and l
N_{k} | number of variables in cluster k
N_{l} | number of variables in cluster l
N_{m} | number of variables in cluster m
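The centroid update reproduces the exact centroid-to-centroid distance when d is the squared Euclidean distance. Below is a minimal sketch under that assumption. The 2-D points standing in for variables are hypothetical. The sketch applies the update and compares it with the squared distance from the merged cluster's centroid to the centroid of cluster j.

```python
import math


def centroid_update(d_kj, d_lj, d_kl, n_k, n_l):
    """Centroid-linkage update using cluster sizes and the k-l distance."""
    n_m = n_k + n_l
    return (n_k * d_kj + n_l * d_lj) / n_m - n_k * n_l * d_kl / n_m ** 2


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical points used only to check the formula.
k = [(0.0, 0.0), (2.0, 0.0)]
l = [(0.0, 3.0)]
j = [(5.0, 5.0), (7.0, 5.0)]

ck, cl, cj = centroid(k), centroid(l), centroid(j)
d_mj = centroid_update(sq_dist(ck, cj), sq_dist(cl, cj), sq_dist(ck, cl), len(k), len(l))
direct = sq_dist(centroid(k + l), cj)

assert math.isclose(d_mj, direct)
print(d_mj, direct)
```

Both values equal 400/9. The subtracted d_{kl} term is what separates centroid linkage from average linkage.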

With the complete linkage method (also called the furthest neighbor method), the distance between two clusters is the maximum distance between a variable in one cluster and a variable in the other cluster. After clusters k and l merge into cluster m, the distance from m to another cluster j is updated with the following formula:

$$d_{mj} = \max(d_{kj}, d_{lj})$$

Term | Description
---|---
d_{mj} | distance between clusters m and j
m | merged cluster that consists of clusters k and l, with m = (k,l)
d_{kj} | distance between clusters k and j
d_{lj} | distance between clusters l and j
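Because the maximum over a union is the larger of the two maxima, the complete-linkage update is exact. Below is a minimal sketch that checks this against a brute-force search. The toy 1-D values and absolute-difference distance are assumptions for illustration only.

```python
from itertools import product


def complete_update(d_kj, d_lj):
    """Complete-linkage update: keep the larger of the two distances."""
    return max(d_kj, d_lj)


def max_pairwise(a, b, dist):
    """Brute force: largest distance over every pair (x in a, y in b)."""
    return max(dist(x, y) for x, y in product(a, b))


# Assumption: toy values and distance, for illustration only.
def dist(x, y):
    return abs(x - y)


k, l, j = [1.0, 2.0], [4.0], [7.0, 9.0, 10.0]
d_mj = complete_update(max_pairwise(k, j, dist), max_pairwise(l, j, dist))

assert d_mj == max_pairwise(k + l, j, dist)
print(d_mj)  # 9.0
```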

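With McQuitty's linkage method (also called weighted average linkage), the distance from a merged cluster to another cluster is the simple average of the distances from its two component clusters, regardless of how many variables each contains. After clusters k and l merge into cluster m, the distance from m to another cluster j is updated with the following formula:

$$d_{mj} = \frac{d_{kj} + d_{lj}}{2}$$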

Term | Description
---|---
d_{mj} | distance between clusters m and j
m | merged cluster that consists of clusters k and l, with m = (k,l)
d_{kj} | distance between clusters k and j
d_{lj} | distance between clusters l and j
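McQuitty's update differs from average linkage only in ignoring cluster sizes. Below is a minimal sketch that contrasts the two on hypothetical values, where cluster k holds 5 variables and cluster l holds 1.

```python
def mcquitty_update(d_kj, d_lj):
    """McQuitty update: unweighted mean of d_kj and d_lj (sizes ignored)."""
    return (d_kj + d_lj) / 2


def average_update(d_kj, d_lj, n_k, n_l):
    """Average-linkage update, included only for comparison."""
    return (n_k * d_kj + n_l * d_lj) / (n_k + n_l)


# Hypothetical distances and cluster sizes.
d_kj, d_lj, n_k, n_l = 7.0, 3.0, 5, 1
print(mcquitty_update(d_kj, d_lj))  # 5.0
print(average_update(d_kj, d_lj, n_k, n_l))
```

Average linkage gives 38/6 (about 6.33) because the larger cluster k pulls the result toward d_{kj}. McQuitty's method gives both clusters equal weight.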

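In median linkage, a merged cluster is represented by the midpoint of the two clusters that formed it, so both clusters contribute equally regardless of how many variables each contains. After clusters k and l merge into cluster m, the distance from m to another cluster j is updated with the following formula:

$$d_{mj} = \frac{d_{kj} + d_{lj}}{2} - \frac{d_{kl}}{4}$$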

Term | Description
---|---
d_{mj} | distance between clusters m and j
m | merged cluster that consists of clusters k and l, with m = (k,l)
d_{kj} | distance between clusters k and j
d_{lj} | distance between clusters l and j
d_{kl} | distance between clusters k and l
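When d is the squared Euclidean distance, the median update equals the distance from the midpoint of the two merged representatives to cluster j. Below is a minimal sketch under that assumption. The representative points are hypothetical.

```python
import math


def median_update(d_kj, d_lj, d_kl):
    """Median-linkage update: (d_kj + d_lj) / 2 - d_kl / 4."""
    return (d_kj + d_lj) / 2 - d_kl / 4


def sq_dist(a, b):
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical representative points for clusters k, l, and j.
ck, cl, cj = (1.0, 0.0), (0.0, 3.0), (6.0, 5.0)
d_mj = median_update(sq_dist(ck, cj), sq_dist(cl, cj), sq_dist(ck, cl))

# The merged cluster is represented by the midpoint of ck and cl, whatever the sizes.
cm = tuple((a + b) / 2 for a, b in zip(ck, cl))
assert math.isclose(d_mj, sq_dist(cm, cj))
print(d_mj)  # 42.5
```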

With the single linkage method (also called the nearest neighbor method), the distance between two clusters is the minimum distance between a variable in one cluster and a variable in the other cluster. After clusters k and l merge into cluster m, the distance from m to another cluster j is updated with the following formula:

$$d_{mj} = \min(d_{kj}, d_{lj})$$

Term | Description
---|---
d_{mj} | distance between clusters m and j
m | merged cluster that consists of clusters k and l, with m = (k,l)
d_{kj} | distance between clusters k and j
d_{lj} | distance between clusters l and j
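The single-linkage update is the mirror image of the complete-linkage update. Below is a minimal sketch that checks it against a brute-force minimum. The toy 1-D values and absolute-difference distance are assumptions for illustration only.

```python
from itertools import product


def single_update(d_kj, d_lj):
    """Single-linkage update: keep the smaller of the two distances."""
    return min(d_kj, d_lj)


def min_pairwise(a, b, dist):
    """Brute force: smallest distance over every pair (x in a, y in b)."""
    return min(dist(x, y) for x, y in product(a, b))


# Assumption: toy values and distance, for illustration only.
def dist(x, y):
    return abs(x - y)


k, l, j = [1.0, 2.0], [4.0], [7.0, 9.0, 10.0]
d_mj = single_update(min_pairwise(k, j, dist), min_pairwise(l, j, dist))

assert d_mj == min_pairwise(k + l, j, dist)
print(d_mj)  # 3.0
```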

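In Ward's linkage, the distance between two clusters is based on the sum of squared deviations from points to their cluster centroids. The objective of Ward's linkage is to minimize the within-cluster sum of squares, so each step merges the pair of clusters whose union increases it the least. After clusters k and l merge into cluster m, the distance from m to another cluster j is updated with the following formula:

$$d_{mj} = \frac{(N_j + N_k)\,d_{kj} + (N_j + N_l)\,d_{lj} - N_j\,d_{kl}}{N_j + N_m}$$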

In Ward's linkage, the distance between two clusters can exceed d(max), the maximum value in the original distance matrix **D**. When this happens, the corresponding similarity level is negative.

Term | Description
---|---
d_{mj} | distance between clusters m and j
m | merged cluster that consists of clusters k and l, with m = (k,l)
d_{kj} | distance between clusters k and j
d_{lj} | distance between clusters l and j
d_{kl} | distance between clusters k and l
N_{j} | number of variables in cluster j
N_{k} | number of variables in cluster k
N_{l} | number of variables in cluster l
N_{m} | number of variables in cluster m
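The Ward update can be checked by defining the distance between two clusters as the increase in the within-cluster sum of squares (ESS) caused by merging them. Below is a minimal sketch under that assumption. The points standing in for variables are hypothetical, and `ward_dist` is a hypothetical helper. The sketch applies the update and compares it with the ESS increase computed directly for the merged cluster m and cluster j.

```python
import math


def ward_update(d_kj, d_lj, d_kl, n_j, n_k, n_l):
    """Ward's linkage update using the sizes of clusters j, k, and l."""
    n_m = n_k + n_l
    return ((n_j + n_k) * d_kj + (n_j + n_l) * d_lj - n_j * d_kl) / (n_j + n_m)


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ess(points):
    """Within-cluster sum of squared deviations from the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def ward_dist(a, b):
    """Hypothetical helper: increase in ESS caused by merging clusters a and b."""
    return ess(a + b) - ess(a) - ess(b)


# Hypothetical points used only to check the formula.
k = [(0.0, 0.0), (2.0, 0.0)]
l = [(0.0, 3.0)]
j = [(5.0, 5.0), (7.0, 5.0)]

d_mj = ward_update(ward_dist(k, j), ward_dist(l, j), ward_dist(k, l),
                   len(j), len(k), len(l))
assert math.isclose(d_mj, ward_dist(k + l, j))
print(d_mj)
```

Both routes give 160/3 (about 53.33). This value is larger than every input distance used here (50, 80/3, and 20/3), which illustrates how Ward distances can grow beyond the largest original distance.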