Boosting RNA-Seq assemblies with partial or related genomic sequences
BRANCH is a software tool that extends de novo transfrags and identifies novel transfrags using DNA contigs or genes of closely related species. BRANCH first discovers novel exons and then extends or joins fragmented de novo transfrags, so that the resulting transfrags are more complete.
BRANCH is under the Artistic License 2.0.
If you use BRANCH, please cite the following paper:
Bao E, Jiang T, Girke T (2013). BRANCH: boosting RNA-Seq assemblies with partial or related genomic sequences. Bioinformatics: epub.
BRANCH is suitable for 32-bit or 64-bit machines with Linux operating systems. At least 4GB of system memory is recommended for assembling larger data sets.
The LEMON graph library is required to compile and run BRANCH.
The BLAT aligner is required to run BRANCH; the modified version distributed with BRANCH is highly recommended.
* Download the .cpp file.
* If LEMON is already installed in your system, execute the command line:
g++ -o BRANCH BRANCH.cpp -lemon -lpthread
Otherwise, download LEMON, compile it, and execute:
g++ -o BRANCH -I PATH2LEMON/include BRANCH.cpp -L PATH2LEMON/bin -lpthread
* To use the modified BLAT, add the directory that contains it to your $PATH.
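The $PATH step can be sketched as follows. The directory $HOME/tools/blat is a hypothetical location chosen for illustration, not one prescribed by the BRANCH distribution; substitute the directory where you placed the modified BLAT binary.

```shell
# Hypothetical directory holding the modified BLAT binary; adjust to your system.
BLAT_DIR="$HOME/tools/blat"

# Prepend it so this blat takes precedence over any other copy on the system.
export PATH="$BLAT_DIR:$PATH"

# Show which blat executable the shell resolves, or report that none was found.
command -v blat || echo "blat not found on PATH"
```

Prepending rather than appending matters only if another BLAT is already installed; the setting lasts for the current shell session unless it is added to a startup file.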
BRANCH --read1 reads_1.fa --read2 reads_2.fa --transfrag transfrags.fa --contig contigs.fa --transcript transcripts.fa [--insertLow insertLow --insertHigh insertHigh --threshSize threshSize --threshCov threshCov --threshSplit threshSplit --threshConn threshConn --closeGap --noAlignment --misassemblyRemoval]
--read1 is the file of first mates of paired-end (PE) RNA reads, or of single-end RNA reads, in FASTA format
--read2 is the file of second mates of PE RNA reads, in FASTA format
--transfrag is the de novo RNA transfrags to be extended
--contig is the reference DNA contigs
--transcript is the output file for the extended de novo transfrags
--insertLow is the lower bound of insert length (highly recommended; default: 0)
--insertHigh is the upper bound of insert length (highly recommended; default: 99999)
--threshSize is the minimum size of a genome region that could be identified as an exon (default: 2 bp)
--threshCov is the minimum coverage of a genome region that could be identified as an exon (default: 2)
--threshSplit is the minimum upstream and downstream junction coverage required to split a genome region into more than one exon (default: 2)
--threshConn is the minimum connectivity of two exons that could be identified as a splice junction (default: 2)
--closeGap closes sequencing gaps using PE read information (default: none)
--noAlignment skips the time-consuming initial alignment step, provided all the alignment files are already present in the tmp directory (default: none)
--misassemblyRemoval detects misassembled regions and then breaks at or removes them (default: none)
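As an illustration of how the options above combine, here is a minimal sketch of a paired-end run with explicit insert-length bounds and gap closing. The file names and the bounds 150 and 350 are assumptions made for the example; only the option names come from the list above. The script prints the command instead of running it when BRANCH is not on the $PATH.

```shell
# Hypothetical input names and insert-length bounds; replace with your own values.
READ1=reads_1.fa
READ2=reads_2.fa
INSERT_LOW=150
INSERT_HIGH=350

# Build the argument list from the documented options only.
set -- --read1 "$READ1" --read2 "$READ2" \
       --transfrag transfrags.fa --contig contigs.fa \
       --transcript transcripts.fa \
       --insertLow "$INSERT_LOW" --insertHigh "$INSERT_HIGH" --closeGap

if command -v BRANCH >/dev/null 2>&1; then
    BRANCH "$@"
else
    # BRANCH is not on PATH: show the command that would have been run.
    echo "BRANCH $*"
fi
```

Setting both insert bounds follows the "highly recommended" note on --insertLow and --insertHigh; the values themselves should come from your library preparation, not from this example.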
BRANCH outputs the transfrag file in FASTA format. It contains all the improved transfrags.
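A quick sanity check on the output is to count its FASTA records. The sketch below assumes the output was written to transcripts.fa (the name given to --transcript in the usage line); count_fasta_records is a helper defined here for illustration and is not part of BRANCH.

```shell
# Count FASTA records by counting header lines, which start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# transcripts.fa is the assumed output name; skip quietly if it does not exist yet.
if [ -f transcripts.fa ]; then
    count_fasta_records transcripts.fa
fi
```

Running the same helper on the input transfrag file gives a number to compare against, since joining fragmented transfrags would be expected to change the record count.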