Creating consensus FASTA using IUPAC codes

Published: October 27, 2018

FASTA files can be generated from VCF calls. There are two ways of doing that: (1) concatenate SNPs together, using either variant sites only or variant sites plus called monomorphic (homozygous reference) sites; (2) use the reference genome as a backbone and incorporate the variants into it. To keep information about heterozygous sites, IUPAC ambiguity codes can be used. Here is a collection of available scripts:

Using fasta reference

The best and most convenient way (lets you choose a sample from a multi-sample VCF and apply a BED mask):
```
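# reference.fa, SAMPLE, mask.bed and calls.vcf.gz are placeholders
bcftools consensus --fasta-ref reference.fa --iupac-codes --sample SAMPLE --mask mask.bed calls.vcf.gz > consensus.fa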
```
vcf2fasta.py - phased only
For GATK fans: FastaAlternateReferenceMaker.
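
To make the reference-backbone idea concrete, here is a minimal Python sketch of what such tools do conceptually: substitute each SNP genotype into the reference, write heterozygotes as IUPAC codes, and replace masked intervals with N. The function names (`iupac_code`, `apply_snps`) and the toy input are hypothetical, it handles SNPs only (no indels), and it is not how bcftools or any of the scripts above are implemented.

```python
# IUPAC ambiguity codes for unordered pairs of bases (standard nomenclature).
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def iupac_code(allele1, allele2):
    """Return the single-letter code for a diploid SNP genotype."""
    a, b = allele1.upper(), allele2.upper()
    if a == b:
        return a  # homozygous: the base itself
    return IUPAC[frozenset((a, b))]


def apply_snps(reference, snps, masked=()):
    """Build a consensus sequence from a reference string.

    snps:   {0-based position: (allele1, allele2)}, SNPs only
    masked: 0-based, half-open (start, end) intervals, as in BED
    """
    seq = list(reference)
    for pos, (a1, a2) in snps.items():
        seq[pos] = iupac_code(a1, a2)
    # In this sketch the mask is applied last, so it overrides any variant.
    for start, end in masked:
        for i in range(start, min(end, len(seq))):
            seq[i] = "N"
    return "".join(seq)


# Toy example (made-up data): one C/T het, one G/G hom alt, one A/G het.
consensus = apply_snps(
    "ACGTACGT",
    {1: ("C", "T"), 4: ("G", "G"), 6: ("A", "G")},
    masked=[(2, 4)],
)
print(consensus)  # AYNNGCRT
```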

No reference

seqTools vcf2fasta
VCF2FASTA_IUPACcoding.r - something complicated that can be adapted to your needs.
VCF-to-Tab_to_Fasta_IUPAC_Converter.py - this code can be used on a BSNP file. Keep in mind that if you need to filter out transitions (for example, with aDNA) and you use BSNP output, you need a reference to see which substitutions are transitions. Thus, additional masking is needed.
freebayes_vcf2fa.py - freebayes only
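
The no-reference approach can be sketched in a few lines of Python: each site contributes one character per sample, and the characters are concatenated into one sequence per sample. The input layout (a list of dicts with `ref`, `alt` and `gt` keys) and the names `encode` and `snps_to_fasta` are assumptions made for illustration, not the format or API of any script listed above. The optional transition filter shows why the reference allele is needed: without it, a homozygous alternative call cannot be classified as a transition or a transversion.

```python
# Heterozygote codes keyed by the two alleles in alphabetical order.
IUPAC_HET = {"AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M"}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def encode(genotype):
    """Turn 'A/G' or 'A|G' into 'R'; missing calls such as './.' become 'N'."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return "N"
    a, b = sorted(x.upper() for x in alleles)
    return a if a == b else IUPAC_HET[a + b]


def snps_to_fasta(sites, samples, drop_transitions=False):
    """Concatenate per-site genotypes into one FASTA record per sample."""
    seqs = {s: [] for s in samples}
    for site in sites:
        # Classifying a substitution requires the reference allele.
        if drop_transitions and (site["ref"], site["alt"]) in TRANSITIONS:
            continue
        for s in samples:
            seqs[s].append(encode(site["gt"][s]))
    return "".join(f">{s}\n{''.join(seqs[s])}\n" for s in samples)


# Toy biallelic sites (made-up data).
sites = [
    {"ref": "C", "alt": "T", "gt": {"s1": "C/T", "s2": "T/T"}},
    {"ref": "A", "alt": "C", "gt": {"s1": "A/A", "s2": "A/C"}},
    {"ref": "G", "alt": "T", "gt": {"s1": "./.", "s2": "G|T"}},
]
fasta_all = snps_to_fasta(sites, ["s1", "s2"])
fasta_tv = snps_to_fasta(sites, ["s1", "s2"], drop_transitions=True)
print(fasta_all)
print(fasta_tv)
```

With the toy input, the first call yields `YAN` for `s1` and `TMK` for `s2`; with `drop_transitions=True` the C/T site is skipped, leaving `AN` and `MK`.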

Tools to deal with aDNA damage

Published: October 26, 2018

mapDamage - the --rescale option changes base quality scores in BAM files.

PMDtools - filters out contaminant reads based on deamination profiles and adjusts quality scores too. Works on SAM files.

ATLAS toolset - can do BQSR.

AntCaller - genotype caller for ancient genomes. Works better than GATK according to the paper.

SpAl - estimates the proportion of spurious alignments in ancient DNA. Works with any species, as long as there is data on variation in the genome.

Should we BQSR ancient genomes?

Published: August 14, 2016

A standard variant discovery pipeline includes base quality score recalibration (BQSR) after alignment to a reference genome. This is usually done with GATK. Recalibration accounts for the fact that quality scores can depend on the cluster density on the sequencing machine, the length of the read, the position of a base in the read, and other factors. When sequencing ancient DNA, all these parameters go out the window because of DNA post-mortem damage (PMD). So should we recalibrate anyway?
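
To see what recalibration does in principle, here is a heavily simplified Python sketch: bases are grouped by their reported quality, mismatches against the reference are counted, and an empirical Phred score is derived from the observed mismatch rate using Q = -10 * log10(p). This is not GATK's implementation. The function name `empirical_quality`, the single covariate, and the add-one smoothing are all assumptions made for illustration; real BQSR uses several covariates and excludes known variant sites.

```python
import math
from collections import defaultdict


def empirical_quality(observations):
    """Map each reported quality to an empirical Phred score.

    observations: iterable of (reported_q, is_mismatch) pairs, one per base.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1

    table = {}
    for reported_q, (errors, total) in counts.items():
        # Add-one smoothing (an assumption of this sketch) avoids log10(0).
        p_error = (errors + 1) / (total + 2)
        table[reported_q] = round(-10 * math.log10(p_error))
    return table


# Toy data: 98 bases reported as Q30, 9 of which mismatch the reference.
observations = [(30, True)] * 9 + [(30, False)] * 89
table = empirical_quality(observations)
print(table)  # {30: 10}
```

In this sketch every mismatch that is not excluded beforehand counts as a sequencing error, so mismatches caused by post-mortem damage would be counted as well.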

Alisa Vershinina, PhD
