Should we BQSR ancient genomes?
Published:
A regular protocol for variant discovery pipeline includes base call recalibration after an alignment to a reference genome. This is usually done by GATK. Recalibration accounts for the fact that quality scores sometimes can depend on a cluster density of the sequencing machine, length of the read, position of a bp in the read and other factors. When sequencing ancient DNA, all these parameters go out the window because of DNA post mortem damage (PMD). So should we recalibrate anyway?
Here is what Kousathanas et al say in their paper about heterozygosity in low-coverage genomes:
first, [.BQSR.] cannot be applied to ancient DNA since they do not take PMD into account. Second, both require a reference sequence as well as knowledge of polymorphic positions, such that they can be excluded from the analysis.
See their paper for more details how they overcome this problem. Instead of classical BQSR we could use MapDamage2 –rescale option that specifically looks into deamination profile of the sample.
Update from Gaunitz et al., 2018 Science paper
Stright from the supplementary:
We explored the impact of base alignment quality (BAQ) and extended BAQ (eBAQ) (110) on low coverage samples. We down-sampled five genomes, including two ancient and three modern horses. Using ANGSD, we estimated the heterozygosity levels on chromosome 31, from the original depth-of-coverage to 1.0X (Fig. S19). We found that considering eBAQ results in filtering fewer variants than the original BAQ. This is especially true for coverage lower than 10.0X, where the eBAQ procedure leads to an over-estimate of the nucleotide diversity. The original BAQ, instead, produces a more stable heterozygosity estimate from 2.0X coverage and upwards. As the original BAQ is comparable across samples of different coverages, we used this procedure throughout all analyses.