Neighbor-Joining method
Previous Top Next

The neighbor-joining method (NJ) is a distance based method (requires a distance matrix) and uses the star decomposition method.

Algorithm
Neighbor-joining is a recursive algorithm. Each step in the recursion consists of the following steps:
1.   Based on the current distance matrix calculate a modified distance matrix Q (see below).
2.   Find the least distant pair of nodes in Q (= the closest neighbors = the pair with the lowest distance value). Create a new node on the tree joining the two closest nodes: the two nodes are linked by their common ancestral node.
3.   Calculate the distance of each of the nodes in the pair to their ancestral node.
4.   Calculate the distance of all nodes outside of this pair to their ancestral node.
5.   Start the algorithm again, considering the pair of joined neighbors as a single taxon (the terminal nodes are replaced by their ancestral node and the ancestral node is then treated as a terminal node) and using the distances calculated in the previous step.

Tab. 1. Formula used in the NJ clustering method
Description
Formula
Distance matrix Q
Each member in the distance matrix Q is calculated as follows:
Q(i,j) = (r-2) * d(i,j) - sum[k=1..r](d(i,k)) - sum[k=1..r](d(j,k))
Neighbors in the pair
For each neighbor in the pair just joined, use the following formula to calculate to the new node:
d(f,u) = 0.5 * d(f,g) + 1/(2*(r-2)) * [sum[k=1..r](d(f,k)) - sum[k=1..r](d(g,k))]

With:
f and g are the paired taxa
u is the newly generated node
Each node/taxon outside the pair
The distance of the other taxa to the new node the distance to the new node is calculated as follows:
d(u,k) = 0.5 * [ d(f,k) - d(f,u) ] + 0.5 * [ d(g,k) - d(g,u) ]

With:
u is the new node
k is the node for which we want to calculate the distance
f and g are the members of the pair just joined


Negative branch lengths
NJ represents the data in an additive tree. Therefore it can assign negative branch lengths. Usually, branch lengths can be interpreted as an estimate for the substitutions. However, here we have difficulties in doing so.
If this occurs one can set negative branch length to zero and transfer the difference to the adjacent branch length so that the total distance between an adjacent pair of terminal nodes remains unaffected. This does not alter the overall topology of the tree (Kuhner and Felsenstein, 1994).


Advantages of NJ
·     fast (suited for large datasets)
·     does not require ultrametric data: suited for datasets comprising lineages with largely varying rates of evolution
·     permits correction for multiple substitutions


Disadvantages of NJ
·     information is reduced (distance matrix based)
·     gives only one tree (out of several possible trees)
·     the resulting tree depends on the model of evolution used


References
Kuhner, M.K., Felsenstein, J. (1994): A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol. 11(3): 459-68.

Saitou, N.,  Nei, M. (1987): The neighbor-joining method: a new method for reconstructing phylogenetic trees. In: Molecular Biology and Evolution. 1987, Vol 4(4), p. 406-425.



See
< Cluster analysis