Current version: 1.1.1, Jul 18, 2023

Bioconvert¶

Bioconvert is a collaborative project to facilitate the interconversion of life science data from one format to another.

Contributions: Want to add a converter? Please join https://github.com/bioconvert/bioconvert/issues/1

Overview¶

The life sciences use many different data formats. Some are old or have complex syntax, and converting between them can be a challenge. Bioconvert aims to provide a common tool and interface for converting life science data from one format to another.

Many conversion tools already exist, but they may be dispersed, focused on a few specific formats, difficult to install, or not optimised. With Bioconvert, we plan to cover a wide spectrum of format conversions; we re-use existing tools when possible and provide facilities to compare different conversion tools or methods via benchmarking. New implementations are provided when they are considered better than existing ones.

As of January 2023, 50 formats and 100 direct conversions were available.

https://raw.githubusercontent.com/bioconvert/bioconvert/main/doc/conversion.png

Installation¶

Bioconvert is developed in Python. Please use conda or any Python environment manager to install Bioconvert using the pip command:

pip install bioconvert

50% of the conversions should work out of the box. However, many conversions require external tools, which is why we recommend using a conda environment. In particular, most external tools are available on the bioconda channel. For instance, if you want to convert a SAM file to a BAM file, you need to install samtools as follows:

conda install -c bioconda samtools
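
Before running a conversion that depends on an external tool, it can help to check that the tool is actually on the PATH. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of Bioconvert, and samtools is used only because it is the example given above.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on the PATH."""
    # shutil.which returns None when no executable with that name is found
    return [tool for tool in tools if shutil.which(tool) is None]


# Example: SAM to BAM conversion needs samtools (see the text above)
missing = missing_tools(["samtools"])
if missing:
    print("Missing external tools:", ", ".join(missing))
    print("Try: conda install -c bioconda " + " ".join(missing))
else:
    print("All required external tools were found.")
```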

Since Bioconvert is available on bioconda, one solution that installs Bioconvert and all its dependencies is to use conda/mamba:

conda create --name bioconvert mamba
conda activate bioconvert
mamba install bioconvert
bioconvert --help

See the Installation section for more details and alternative solutions (docker, singularity).

Quick Start¶

There are many conversions available. Type:

bioconvert --help

to get a list of valid conversions. For example, to convert a FastQ file into a FastA file, you could run any of the following:

bioconvert fastq2fasta input.fastq output.fasta
bioconvert fastq2fasta input.fq    output.fasta
bioconvert fastq2fasta input.fq.gz output.fasta.gz
bioconvert fastq2fasta input.fq.gz output.fasta.bz2

When there is no ambiguity, you can be implicit:

bioconvert input.fastq output.fasta
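
Implicit mode is possible because a converter name can be derived from the two file extensions. The sketch below illustrates the idea only; it is not Bioconvert's actual implementation. The functions `file_format` and `guess_converter`, the alias table, and the set of compression suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumed for illustration: compression suffixes to ignore and extension aliases
COMPRESSION_SUFFIXES = {".gz", ".bz2"}
ALIASES = {"fq": "fastq", "fa": "fasta"}


def file_format(filename):
    """Return the format implied by a filename, ignoring compression suffixes."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Drop trailing compression suffixes such as .gz or .bz2
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        raise ValueError(f"cannot infer a format from {filename!r}")
    ext = suffixes[-1].lstrip(".")
    return ALIASES.get(ext, ext)


def guess_converter(infile, outfile):
    """Build a converter name such as 'fastq2fasta' from two filenames."""
    return f"{file_format(infile)}2{file_format(outfile)}"


print(guess_converter("input.fastq", "output.fasta"))      # fastq2fasta
print(guess_converter("input.fq.gz", "output.fasta.bz2"))  # fastq2fasta
```

When a pair of extensions could match more than one converter, the explicit form with the converter name removes the ambiguity.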

The default conversion method is used, but you may choose another one. Check out the available methods with:

bioconvert fastq2fasta --show-methods

For more help about a conversion, just type:

bioconvert fastq2fasta --help

and more generally:

bioconvert --help

You may also call Bioconvert from a Python shell:

# import a converter
from bioconvert.fastq2fasta import FASTQ2FASTA

# Instantiate with infile/outfile names (example filenames)
infile = "input.fastq"
outfile = "output.fasta"
convert = FASTQ2FASTA(infile, outfile)

# the conversion itself:
convert()
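
Because a converter is a class that takes an input and an output filename, it can be applied to many files in a loop. The following sketch relies only on the calling convention shown above (`Converter(infile, outfile)` followed by a call); the helper `convert_directory` and the directory names in the usage comment are hypothetical.

```python
from pathlib import Path


def convert_directory(converter_cls, indir, outdir, in_ext, out_ext):
    """Convert every file in `indir` ending in `in_ext`; return the output paths."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for infile in sorted(Path(indir).glob(f"*{in_ext}")):
        # Keep the base name and swap the extension, e.g. a.fastq -> a.fasta
        outfile = outdir / (infile.name[: -len(in_ext)] + out_ext)
        convert = converter_cls(str(infile), str(outfile))
        convert()
        outputs.append(outfile)
    return outputs


# Usage (requires Bioconvert to be installed; the paths are examples):
# from bioconvert.fastq2fasta import FASTQ2FASTA
# convert_directory(FASTQ2FASTA, "reads", "converted", ".fastq", ".fasta")
```

Passing the converter class as an argument keeps the helper independent of any particular conversion.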

Available Converters¶

Conversion table¶
Converters	CI testing	Default method
abi2fasta		BIOPYTHON
abi2fastq		BIOPYTHON
abi2qual		BIOPYTHON
bam2bedgraph		BEDTOOLS
bam2bigwig		DEEPTOOLS
bam2cov		BEDTOOLS
bam2cram		SAMTOOLS
bam2fasta		SAMTOOLS
bam2fastq		SAMTOOLS
bam2json		BAMTOOLS
bam2sam		SAMBAMBA
bam2tsv		SAMTOOLS
bam2wiggle		WIGGLETOOLS
bcf2vcf		BCFTOOLS
bcf2wiggle		WIGGLETOOLS
bed2wiggle		WIGGLETOOLS
bedgraph2bigwig		UCSC
bedgraph2cov		BIOCONVERT
bedgraph2wiggle		WIGGLETOOLS
bigbed2bed		DEEPTOOLS
bigbed2wiggle		WIGGLETOOLS
bigwig2bedgraph		DEEPTOOLS
bigwig2wiggle		WIGGLETOOLS
bplink2plink		PLINK
bplink2vcf		PLINK
bz22gz		Unix commands
clustal2fasta		BIOPYTHON
clustal2nexus		GOALIGN
clustal2phylip		BIOPYTHON
clustal2stockholm		BIOPYTHON
cram2bam		SAMTOOLS
cram2fasta		SAMTOOLS
cram2fastq		SAMTOOLS
cram2sam		SAMTOOLS
csv2tsv		BIOCONVERT
csv2xls		Pandas
dsrc2gz		DSRC software
embl2fasta		BIOPYTHON
embl2genbank		BIOPYTHON
fasta2clustal		BIOPYTHON
fasta2faa		BIOCONVERT
fasta2fasta_agp		BIOCONVERT
fasta2fastq		PYSAM
fasta2genbank		BIOCONVERT
fasta2nexus		GOALIGN
fasta2phylip		BIOPYTHON
fasta2twobit		UCSC
fasta_qual2fastq		PYSAM
fastq2fasta		BIOCONVERT
fastq2fasta_qual		BIOCONVERT
fastq2qual		READFQ
genbank2embl		BIOPYTHON
genbank2fasta		BIOPYTHON
genbank2gff3		BIOCODE
gfa2fasta		BIOCONVERT
gff22gff3		BIOCONVERT
gff32gff2		BIOCONVERT
gff32gtf		BIOCONVERT
gz2bz2		pigz/pbzip2 software
gz2dsrc		DSRC software
json2yaml		Python
maf2sam		BIOCONVERT
newick2nexus		GOTREE
newick2phyloxml		GOTREE
nexus2clustal		GOALIGN
nexus2fasta		BIOPYTHON
nexus2newick		GOTREE
nexus2phylip		GOALIGN
nexus2phyloxml		GOTREE
ods2csv		pyexcel library
pdb2faa		BIOCONVERT
phylip2clustal		BIOPYTHON
phylip2fasta		BIOPYTHON
phylip2nexus		GOALIGN
phylip2stockholm		BIOPYTHON
phylip2xmfa		BIOPYTHON
phyloxml2newick		GOTREE
phyloxml2nexus		GOTREE
plink2bplink		PLINK
plink2vcf		PLINK
sam2bam		SAMTOOLS
sam2cram		SAMTOOLS
sam2paf		BIOCONVERT
scf2fasta		BIOCONVERT
scf2fastq		BIOCONVERT
sra2fastq		FASTQDUMP
stockholm2clustal		BIOPYTHON
stockholm2phylip		BIOPYTHON
tsv2csv		BIOCONVERT
twobit2fasta		DEEPTOOLS
vcf2bcf		BCFTOOLS
vcf2bed		BIOCONVERT
vcf2bplink		PLINK
vcf2plink		PLINK
vcf2wiggle		WIGGLETOOLS
wig2bed		BEDOPS
xls2csv
xlsx2csv		Pandas library
xmfa2phylip		BIOPYTHON
yaml2json		Pandas library
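
Converter names in the table follow the pattern `<input>2<output>`, but some format names themselves contain the digit 2 (for example bz22gz or gff22gff3), so splitting on the first "2" is not always correct. The sketch below shows one way to split a name unambiguously when a set of known formats is given. The function `split_converter` and the small `FORMATS` set (a hand-picked subset of the table) are illustrative assumptions, not part of Bioconvert.

```python
# Hand-picked subset of formats from the table above (assumption for illustration)
FORMATS = {"bz2", "gz", "gff2", "gff3", "gtf", "fastq", "fasta"}


def split_converter(name, formats):
    """Split a converter name into (input, output) using a set of known formats."""
    for i, char in enumerate(name):
        if char != "2":
            continue
        source, target = name[:i], name[i + 1:]
        # Accept the first split where both sides are known formats
        if source in formats and target in formats:
            return source, target
    raise ValueError(f"cannot split {name!r} with the given formats")


for name in ["fastq2fasta", "bz22gz", "gz2bz2", "gff22gff3", "gff32gtf"]:
    print(name, "->", split_converter(name, FORMATS))
```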

Contributors¶

Setting up and maintaining Bioconvert has been possible thanks to users and contributors. Thanks to all:

https://contrib.rocks/image?repo=bioconvert/bioconvert

Changes¶

Version	Description
1.1.1	Fix benchmark labels; NEW: fast52pod5 conversion; FIX: set goalign and gotree instead of go requirements
1.1.0	Implement ability to benchmark the CPU and memory usage (not just time); benchmark incorporates CPU/memory usage
1.0.0	Fix bam2fastq for paired data that computed useless intermediate file https://github.com/bioconvert/bioconvert/issues/325; more realistic fastq simulator; pin openpyxl to <=3.0.10 to prevent regression error in v3.1.0
0.6.3	add picard method in bam2sam; fixed all CI workflows to use mamba; drop python3.7 support and add 3.10 support; update bedops test file to fit the latest bedops 2.4.41 version; revisit logging system
0.6.2	added gff3 to gtf conversion; added pdb to faa conversion; added missing --reference argument to the cram2sam conversion
0.6.1	output file can be in sub-directories allowing syntax such as 'bioconvert fastq2fasta test.fastq outputs/test.fasta'; fix all CI actions; add more examples as notebooks in ./examples; add a Snakefile for the paper in ./doc/Snakefile_paper
0.6.0	Fix bug in bam2sam (method sambamba); fix graph layout; add threading in fastq2fasta (seqkit method); multibenchmark feature added; stable version used for web interface
0.5.2	Update requirements and environment.yml and add a conda spec-file.txt file
0.5.1	add genbank2gff3 requirement material in bioconvert.utils.biocode
0.5.0	Add CI actions for all converters; remove sniffer (now in biosniff on pypi https://pypi.org/project/biosniff/); a complete benchmarking suite (see doc/Snakefile_benchmark file and benchmarking); documentation and tests for all converters; removed the validators (we assume inputs are correct)
0.4.X	(Aug 2019) added nexus2fasta, cram2fasta, fasta2faa ... ; 1-to-many and many-to-one converters are now part of the API.
0.3.X	(May 2019) new methods abi2qual, bigbed2bed, etc.; added --threads option
0.2.X	(Aug 2018) abi2fastx, bioconvert_stats tool added
0.1.X	major refactoring to have subcommands with implicit/explicit mode

Complete documentation including User and Developer Guides¶