The pair_normalise option for TreeSequence.pair_coalescence_counts should result in the output being divided by the possible number of coalescing sample pairs---so that if pair_normalise and span_normalise are both set, the output will sum to one over the time dimension. However, if there are isolated samples over a portion of the tree sequence, then this won't be the case---the denominator is incorrect.
Specifically, let $c_i(x)$ be the number of sample pairs coalescing in node $i$ at position $x$. Without normalisation, pair_coalescence_counts returns $C_i = \int_a^b c_i(x) dx$ where the integral is over the interval $[a, b)$ (ignoring "empty" trees). Let $p(x)$ be the number of potentially coalescing sample pairs at position $x$ (that is, omitting isolated samples and pairs where one sample is a descendant of the other). The "full normalisation" should return $C_i / \int_a^b p(x) dx$. If all sample pairs are present over the entirety of the sequence, then $\int_a^b p(x) dx = {n \choose 2} (b - a)$, and the current implementation is hunky dory. But otherwise $\int_a^b p(x) dx < {n \choose 2} (b - a)$ and the normalised output is biased upwards.
A consequence is that pair coalescence rates (which rely on the normalised counts) are biased when there are isolated samples. For example, the figure below shows empirical rates estimated by TreeSequence.pair_coalescence_rates (colored lines that are individual simulations) around the theoretical expectation for the Zigzag model (black line). The "full" panel is when all samples are present, and the "drop" panel is when ancestry is partially removed so that the number of non-isolated samples varies dramatically across the sequence.

The
pair_normaliseoption forTreeSequence.pair_coalescence_countsshould result in the output being divided by the possible number of coalescing sample pairs---so that ifpair_normaliseandspan_normaliseare both set, the output will sum to one over the time dimension. However, if there are isolated samples over a portion of the tree sequence, then this won't be the case---the denominator is incorrect.Specifically, let$c_i(x)$ be the number of sample pairs coalescing in node $i$ at position $x$ . Without normalisation, $C_i = \int_a^b c_i(x) dx$ where the integral is over the interval $[a, b)$ (ignoring "empty" trees). Let $p(x)$ be the number of potentially coalescing sample pairs at position $x$ (that is, omitting isolated samples and pairs where one sample is a descendant of the other). The "full normalisation" should return $C_i / \int_a^b p(x) dx$ . If all sample pairs are present over the entirety of the sequence, then $\int_a^b p(x) dx = {n \choose 2} (b - a)$ , and the current implementation is hunky dory. But otherwise $\int_a^b p(x) dx < {n \choose 2} (b - a)$ and the normalised output is biased upwards.
pair_coalescence_countsreturnsA consequence is that pair coalescence rates (which rely on the normalised counts) are biased when there are isolated samples. For example, the figure below shows empirical rates estimated by
TreeSequence.pair_coalescence_rates(colored lines that are individual simulations) around the theoretical expectation for the Zigzag model (black line). The "full" panel is when all samples are present, and the "drop" panel is when ancestry is partially removed so that the number of non-isolated samples varies dramatically across the sequence.