Developer: edgardomortiz
License: GNU General Public License v3.0

vcf2phylip

Convert SNPs in VCF format to PHYLIP, NEXUS, binary NEXUS, or FASTA alignments for phylogenetic analysis

Brief description

This script takes a VCF file as input and uses the SNP genotypes to create a matrix for phylogenetic analysis in the PHYLIP (relaxed version), FASTA, NEXUS, or binary NEXUS formats. For heterozygous SNPs, a consensus is made and the corresponding IUPAC nucleotide ambiguity code is written to the final matrix(ces). Any ploidy level is allowed and is detected automatically. The code is optimized for large VCF matrices (hundreds of samples and millions of genotypes); for example, in our tests it processed a 20GB VCF (~3 million SNPs x 650 individuals) in ~27 minutes. The initial version of the script produced only a PHYLIP matrix, but we have since added other popular formats, including the binary NEXUS file used to run SNP analyses with the SNAPP plugin for BEAST (only for diploid genotypes).
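The heterozygous-consensus step can be illustrated with a minimal Python sketch. It is not taken from vcf2phylip.py: the names `IUPAC` and `genotype_to_base`, passing only the GT field plus the REF and ALT alleles, and writing `N` for missing or non-SNP calls are all assumptions made for this example. Ploidy is read from the number of allele calls in the GT field, so diploid and polyploid genotypes follow the same path.

```python
# Sketch only (not vcf2phylip's code): collapse one VCF genotype (GT field)
# into a single nucleotide, writing an IUPAC ambiguity code for heterozygotes.

# IUPAC codes keyed by the set of nucleotides they stand for
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def genotype_to_base(gt: str, ref: str, alts: list) -> str:
    """Return one character for a genotype such as '0/1', '1|1' or '0/0/1/1'."""
    alleles = [ref] + alts
    # '/' (unphased) and '|' (phased) both separate allele calls; the number
    # of calls is the ploidy, so any ploidy level is handled the same way.
    calls = gt.replace("|", "/").split("/")
    if "." in calls:
        return "N"  # assumption: a partly or fully missing call becomes N
    bases = frozenset(alleles[int(c)] for c in calls)
    # Assumption: alleles that are not single nucleotides (e.g. indels)
    # have no IUPAC code and are written as N.
    return IUPAC.get(bases, "N")


print(genotype_to_base("0/1", "A", ["G"]))  # R
```

With the same alleles, the tetraploid call `0/0/1/1` also gives `R`, and a triploid `0/1/2` with REF `A` and ALT `C,G` gives `V`.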

Additionally, you can choose a minimum number of samples per SNP to control the final amount of missing data. Since phylogenetic programs usually root the trees at the first sequence in the alignment (e.g. RAxML, IQTREE, and MrBayes), the script also allows you to specify an OUTGROUP sequence that will be written first in the alignment.
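Both behaviours can be sketched in a few lines of Python. This is an illustration under assumptions, not the script's code: each SNP is represented as a list of per-sample characters, `N` is assumed to mark missing data, and the helper names `keep_snp` and `order_samples` are hypothetical. The comparison uses `>=` because the option sets a minimum number of samples that must be present.

```python
# Sketch only: per-SNP missing-data filter and outgroup-first ordering.

MISSING = "N"  # assumption: missing genotypes are coded as N


def keep_snp(column: list, min_samples: int) -> bool:
    """True if at least `min_samples` samples have a non-missing call at this SNP."""
    present = sum(1 for base in column if base != MISSING)
    return present >= min_samples


def order_samples(names: list, outgroup: str = "") -> list:
    """Return the sample names with the outgroup (if given and present) first."""
    if outgroup and outgroup in names:
        return [outgroup] + [n for n in names if n != outgroup]
    return list(names)


print(keep_snp(["A", "N", "G", "R", "T"], min_samples=4))  # True
print(order_samples(["s2", "sample1", "s3"], outgroup="sample1"))  # ['sample1', 's2', 's3']
```

If the outgroup name is not found, this sketch simply keeps the original order; how vcf2phylip itself handles that case is not described here.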

Compressed VCF files can be analyzed directly, but the extension must be `.vcf.gz`.
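Because the requirement is on the extension, one natural way to support both kinds of input is to choose the reader from the file name. The minimal sketch below uses Python's standard `gzip` module; the function name `open_vcf` is hypothetical, and this is an assumption about the approach rather than a description of the script's internals.

```python
import gzip


def open_vcf(path: str):
    """Open a VCF as text, decompressing files whose names end in .vcf.gz."""
    if path.endswith(".vcf.gz"):
        return gzip.open(path, "rt")  # "rt": read in text mode
    return open(path, "r")


# Usage (hypothetical file name):
# with open_vcf("myfile.vcf.gz") as vcf:
#     for line in vcf:
#         ...
```

With this design, a gzipped file saved under a different extension would be read as plain text and fail to parse, which is why the extension matters.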

The script has been tested with VCF files produced by pyrad v.3.0.66, ipyrad v.0.7.x, Stacks v.1.47, dDocent, GATK, and freebayes.

Please don't hesitate to open an Issue if you find any problem or have suggestions for a new feature.

Usage

Just type `python vcf2phylip.py -h` to show the program's help:
```
usage: vcf2phylip.py [-h] -i FILENAME [-m MIN_SAMPLES_LOCUS] [-o OUTGROUP]
                     [-p] [-f] [-n] [-b] [-r] [-v]

The script converts a collection of SNPs in VCF format into a PHYLIP, FASTA, NEXUS, or binary NEXUS file for phylogenetic analysis. The code is optimized to process VCF files with sizes >1GB. For small VCF files the algorithm slows down as the number of taxa increases (but is still fast).

Any ploidy is allowed, but binary NEXUS is produced only for diploid VCFs.

optional arguments:
  -h, --help            show this help message and exit
  -i FILENAME, --input FILENAME
                        Name of the input VCF file, can be gzipped
  -m MIN_SAMPLES_LOCUS, --min-samples-locus MIN_SAMPLES_LOCUS
                        Minimum of samples required to be present at a locus
                        (default=4)
  -o OUTGROUP, --outgroup OUTGROUP
                        Name of the outgroup in the matrix. Sequence will be
                        written as first taxon in the alignment.
  -p, --phylip-disable  A PHYLIP matrix is written by default unless you
                        enable this flag
  -f, --fasta           Write a FASTA matrix, disabled by default
  -n, --nexus           Write a NEXUS matrix, disabled by default
  -b, --nexus-binary    Write a binary NEXUS matrix for analysis of biallelic
                        SNPs in SNAPP, only diploid genotypes will be
                        processed, disabled by default.
  -r, --resolve-IUPAC   Randomly resolve heterozygous genotypes to avoid IUPAC
                        ambiguities in the matrices
  -v, --version         show program's version number and exit
```
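For `-b/--nexus-binary`, each diploid genotype at a biallelic SNP has to become a single character. A common SNAPP coding counts the copies of the ALT allele (0, 1 or 2); the minimal sketch below assumes that coding and `?` for missing data, which may differ in detail from what vcf2phylip writes. The name `genotype_to_binary` is hypothetical.

```python
# Sketch only: SNAPP-style coding of a diploid, biallelic genotype as the
# number of ALT alleles it carries. The output characters are assumptions.


def genotype_to_binary(gt: str) -> str:
    """Map '0/0' -> '0', '0/1' or '1|0' -> '1', '1/1' -> '2', './.' -> '?'."""
    calls = gt.replace("|", "/").split("/")
    if len(calls) != 2:
        raise ValueError(f"not a diploid genotype: {gt!r}")
    if "." in calls:
        return "?"  # assumption: '?' marks missing data
    if any(c not in ("0", "1") for c in calls):
        raise ValueError(f"not a biallelic genotype: {gt!r}")
    return str(calls.count("1"))


print([genotype_to_binary(g) for g in ("0/0", "0/1", "1|1", "./.")])  # ['0', '1', '2', '?']
```

This sketch raises an error on non-diploid or multiallelic genotypes; a converter would more likely skip such sites.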

Examples

In the following examples you can omit `python` if you change the permissions of `vcf2phylip.py` to make it executable.

_Example 1:_ Use default parameters to create a PHYLIP matrix with a minimum of 4 samples per SNP:
```bash
python vcf2phylip.py --input myfile.vcf
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf
# This command will create a PHYLIP matrix called myfile_min4.phy
```
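The relaxed PHYLIP format written by default has a header line with the number of taxa and the number of sites, followed by one line per sample with its name and sequence; unlike strict PHYLIP, names are not cut to 10 characters. Below is a minimal Python writer under those assumptions; the name `to_relaxed_phylip`, the dict input, and the padding width are illustrative choices, not the script's.

```python
# Sketch only: write aligned sequences as relaxed PHYLIP.


def to_relaxed_phylip(seqs: dict) -> str:
    """Format a {sample_name: aligned_sequence} dict as relaxed PHYLIP text."""
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in seqs) + 2  # names padded into one column
    lines = [f"{len(seqs)} {nchar}"]
    lines += [name.ljust(width) + seq for name, seq in seqs.items()]
    return "\n".join(lines) + "\n"


print(to_relaxed_phylip({"sample1": "ACGR", "sample2_long_name": "ANGT"}), end="")
```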

_Example 2:_ Create a PHYLIP and a FASTA matrix using a minimum of 60 samples per SNP:
```bash
python vcf2phylip.py --input myfile.vcf --fasta --min-samples-locus 60
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf -f -m 60
# This command will create a PHYLIP matrix called myfile_min60.phy and a FASTA matrix called myfile_min60.fasta
```

_Example 3:_ Create all output formats, and select "sample1" as outgroup:
```bash
python vcf2phylip.py --input myfile.vcf --outgroup sample1 --fasta --nexus --nexus-binary
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf -o sample1 -f -n -b
# This command will create a PHYLIP matrix called myfile_min4.phy, a FASTA matrix called myfile_min4.fasta, a NEXUS matrix called myfile_min4.nexus, and a binary NEXUS matrix called myfile_min4.bin.nexus
```
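The output names in these examples follow a consistent pattern: the input name without `.vcf`, then `_min` plus the value of `-m`, then a format-specific extension. The small helper below reproduces that pattern; the function, the format keys such as `nexus_binary`, and the `.vcf.gz` handling are assumptions, since the examples only show an uncompressed `myfile.vcf`.

```python
# Sketch only: reproduce the output-name pattern seen in the examples.

EXTENSIONS = {  # hypothetical format keys; extensions as shown in the examples
    "phylip": ".phy",
    "fasta": ".fasta",
    "nexus": ".nexus",
    "nexus_binary": ".bin.nexus",
}


def output_name(vcf_path: str, min_samples: int, fmt: str) -> str:
    """E.g. ('myfile.vcf', 4, 'nexus_binary') -> 'myfile_min4.bin.nexus'."""
    stem = vcf_path
    for suffix in (".vcf.gz", ".vcf"):  # assumption: .vcf.gz is stripped too
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    return f"{stem}_min{min_samples}{EXTENSIONS[fmt]}"


print([output_name("myfile.vcf", 4, fmt) for fmt in EXTENSIONS])
# ['myfile_min4.phy', 'myfile_min4.fasta', 'myfile_min4.nexus', 'myfile_min4.bin.nexus']
```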

_Example 4:_ If, for example, you wish to disable the creation of the PHYLIP matrix and only create a NEXUS matrix:
```bash
python vcf2phylip.py --input myfile.vcf --phylip-disable --nexus
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf -p -n
# This command will create only a NEXUS matrix called myfile_min4.nexus
```
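A NEXUS matrix wraps the same sample names and sequences in a `DATA` block with explicit dimensions and format. The minimal writer below is an assumption about the layout, not a copy of the script's output: it uses `DATATYPE=DNA` (which accepts IUPAC ambiguity codes), `N` for missing data and `-` for gaps, and it does not quote names, so sample names containing spaces or punctuation would need extra handling. The name `to_nexus` is hypothetical.

```python
# Sketch only: write aligned sequences as a NEXUS DATA block.


def to_nexus(seqs: dict) -> str:
    """Format a {sample_name: aligned_sequence} dict as NEXUS text."""
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in seqs) + 2  # names padded into one column
    lines = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"\tDIMENSIONS NTAX={len(seqs)} NCHAR={nchar};",
        "\tFORMAT DATATYPE=DNA MISSING=N GAP=-;",  # assumption: N = missing
        "\tMATRIX",
    ]
    # Names are written unquoted, so they must not contain spaces
    lines += [f"\t{name.ljust(width)}{seq}" for name, seq in seqs.items()]
    lines += ["\t;", "END;"]
    return "\n".join(lines) + "\n"


print(to_nexus({"sample1": "ACGR", "sample2": "ANGT"}), end="")
```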

_Example 5:_ If for some reason you don't want IUPAC ambiguities representing heterozygous genotypes:
```bash
python vcf2phylip.py --input myfile.vcf --resolve-IUPAC
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf -r
# This command will create only a PHYLIP matrix called myfile_min4.phy, in which IUPAC ambiguities have been randomly resolved
```
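Random resolution means writing one of the called alleles instead of an ambiguity code. Below is a minimal sketch, assuming each allele copy in the GT field is equally likely to be chosen (so `0/0/1` picks REF two times out of three); the function name is hypothetical and the script's exact sampling rule is not described here. Passing a seeded `random.Random` makes the output reproducible between runs.

```python
import random


def resolve_genotype(gt: str, ref: str, alts: list, rng: random.Random) -> str:
    """Return one allele drawn at random from a genotype such as '0/1'."""
    alleles = [ref] + alts
    calls = gt.replace("|", "/").split("/")
    if "." in calls:
        return "N"  # assumption: missing calls stay missing
    # Draw one allele copy uniformly; homozygous calls always give the same base
    return alleles[int(rng.choice(calls))]


rng = random.Random(42)  # fixed seed so repeated runs give the same matrix
print([resolve_genotype("0/1", "A", ["G"], rng) for _ in range(5)])
```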

Credits

Citation

Ortiz, E.M. 2019. vcf2phylip v2.0: convert a VCF matrix into several matrix formats for phylogenetic analysis. DOI:10.5281/zenodo.2540861
