7.2. Reference converters

7.2.1. Summary

bioconvert.abi2fasta

Convert ABI format to FASTA format

bioconvert.abi2fastq

Convert ABI format to FASTQ format

bioconvert.abi2qual

Convert ABI format to QUAL format

bioconvert.bam2bedgraph

Convert BAM format to BEDGRAPH format

bioconvert.bam2cov

Convert BAM format to COV format

bioconvert.bam2bigwig

Convert BAM file to BIGWIG format

bioconvert.bam2cram

Convert BAM file to CRAM format

bioconvert.bam2fasta

Convert BAM format to FASTA format

bioconvert.bam2fastq

Convert BAM format to FASTQ foarmat

bioconvert.bam2json

Convert BAM format to JSON format

bioconvert.bam2sam

Convert SAM file to BAM format

bioconvert.bam2tsv

Convert BAM file to TSV format

bioconvert.bam2wiggle

Convert BAM to WIGGLE format

bioconvert.bcf2vcf

Convert BCF file to VCF format

bioconvert.bcf2wiggle

Convert BCF format to WIGGLE format

bioconvert.bed2wiggle

Convert BED format to WIGGLE format

bioconvert.bedgraph2cov

Convert BEDGRAPH file to COV format

bioconvert.bedgraph2bigwig

Convert BEDGRAPH to BIGWIG format

bioconvert.bedgraph2wiggle

Convert BEDGRAPH format to WIGGLE format

bioconvert.bigbed2wiggle

Convert BIGBED format to WIGGLE format

bioconvert.bigbed2bed

Convert BIGBED format to BED format

bioconvert.bigwig2bedgraph

Convert BIGWIG to BEDGRAPH format

bioconvert.bigwig2wiggle

Convert BIGWIG format to WIGGLE format

bioconvert.bplink2plink

Convert BPLINK to PLINK format

bioconvert.bz22gz

Convert BZ2 to GZ format

bioconvert.clustal2fasta

Convert CLUSTAL to FASTA format

bioconvert.clustal2phylip

Convert CLUSTAL to PHYLIP format

bioconvert.clustal2stockholm

Convert CLUSTAL to STOCKHOLM format

bioconvert.cram2bam

Convert CRAM file to BAM format

bioconvert.cram2fasta

Convert CRAM file to FASTQ format

bioconvert.cram2fastq

Convert CRAM file to FASTQ format

bioconvert.cram2sam

Convert CRAM file to SAM format

bioconvert.csv2tsv

Convert CSV format to TSV format

bioconvert.csv2xls

convert CSV to XLS format

bioconvert.dsrc2gz

Convert a compressed FASTQ from DSRC to FASTQ format

bioconvert.embl2fasta

Convert EMBL file to FASTA format

bioconvert.embl2genbank

Convert EMBL file to GENBANK format

bioconvert.fasta_qual2fastq

Convert FASTA format to FASTQ format

bioconvert.fasta2clustal

Convert FASTA to CLUSTAL format

bioconvert.fasta2faa

Convert FASTA format to FAA format

bioconvert.fasta2fasta_agp

Convert FASTA (scaffold) to FASTA (contig) and AGP formats

bioconvert.fasta2fastq

Convert FASTA format to FASTQ format

bioconvert.fasta2genbank

Convert FASTA to GENBANK format

bioconvert.fasta2nexus

Convert FASTA to NEXUS format

bioconvert.fasta2phylip

Convert FASTA to PHYLIP format

bioconvert.fasta2twobit

Convert FASTA to TWOBIT format

bioconvert.fastq2fasta

Convert FASTQ to FASTA format

bioconvert.genbank2embl

Convert GENBANK to EMBL format

bioconvert.genbank2fasta

Convert GENBANK to EMBL format

bioconvert.genbank2gff3

Convert GENBANK to GFF3 format

bioconvert.gff22gff3

Convert GFF2 to GFF3 format

bioconvert.gff32gff2

Convert GFF3 to GFF2 format

bioconvert.gfa2fasta

Convert GFA to FASTA format

bioconvert.gz2bz2

Convert GZ file to BZ2 format

bioconvert.gz2dsrc

Convert GZ to DSRC format

bioconvert.json2yaml

Convert JSON to YAML format

bioconvert.maf2sam

Convert MAF file to SAM format

bioconvert.newick2nexus

Converts NEWICK file to NEXUS format.

bioconvert.newick2phyloxml

Converts NEWICK file to PHYLOXML format.

bioconvert.nexus2fasta

Convert NEXUS to FASTA format

bioconvert.nexus2newick

Converts NEXUS file to NEWICK format.

bioconvert.nexus2phylip

Converts NEXUS file to PHYLIP format.

bioconvert.nexus2phyloxml

Converts NEXUS file to PHYLOXML format.

bioconvert.ods2csv

Convert XLS format to CSV format

bioconvert.pdb2faa

Convert PDB to FAA format

bioconvert.phylip2clustal

Converts PHYLIP file to CLUSTAL format.

bioconvert.phylip2fasta

Converts PHYLIP file to FASTA format.

bioconvert.phylip2nexus

Converts PHYLIP file to NEXUS format.

bioconvert.phylip2stockholm

Converts PHYLIP file to STOCKHOLM format.

bioconvert.phylip2xmfa

Converts PHYLIP file to XMFA format.

bioconvert.phyloxml2newick

Converts PHYLOXML file to NEWICK format.

bioconvert.phyloxml2nexus

Converts PHYLOXML file to NEXUS format.

bioconvert.plink2bplink

Convert PLINK to BPLINK

bioconvert.sam2bam

Convert SAM file to BAM format

bioconvert.sam2cram

Convert SAM file to CRAM format

bioconvert.sam2paf

"Convert CRAM to BAM format

bioconvert.scf2fasta

Convert SCF file to FASTA file

bioconvert.scf2fastq

Convert SCF file to FASTQ file

bioconvert.sra2fastq

Convert SRA format to FASTA format

bioconvert.stockholm2clustal

Converts STOCKHOLM file to CLUSTAL file.

bioconvert.stockholm2phylip

Converts STOCKHOLM to PHYLIP format.

bioconvert.tsv2csv

Convert TSV format to CSV format

bioconvert.twobit2fasta

Conversion from TWOBIT to FASTA format

bioconvert.vcf2bcf

Convert VCF to BCF format

bioconvert.vcf2bed

Convert VCF to BED3 file

bioconvert.vcf2wiggle

Convert VCF format to WIGGLE format

bioconvert.xls2csv

Convert XLS format to CSV format

bioconvert.xlsx2csv

Convert XLS format to CSV format

bioconvert.xmfa2phylip

Convert XMFA to PHYLIP format

bioconvert.yaml2json

Convert YAML to JSON format

7.2.2. All converters documentation

Convert ABI format to FASTA format

class ABI2FASTA(infile, outfile, *args, **kargs)[source]

Convert ABI file to FASTQ file

ABI files are created by ABI sequencing machine and includes PHRED quality scores for base calls. This allows the creation of FastA files.

Method implemented is based on BioPython [BIOPYTHON].

constructor

Parameters:
  • infile (str) -- input ABI file

  • outfile (str) -- output FASTA filename

_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Reference:

Bio.SeqIO Documentation

Convert ABI format to FASTQ format

class ABI2FASTQ(infile, outfile, *args, **kargs)[source]

Convert ABI file to FASTQ file

ABI files are created by ABI sequencing machine and includes PHRED quality scores for base calls. This allows the creation of FastQ files.

Method implemented is based on BioPython [BIOPYTHON].

constructor

Parameters:
  • infile (str) -- input ABI file

  • outfile (str) -- output FASTQ filename

_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Bio.SeqIO Documentation

Convert ABI format to QUAL format

class ABI2QUAL(infile, outfile, *args, **kargs)[source]

Convert ABI file to QUAL file

ABI files are created by ABI sequencing machine and includes PHRED quality scores for base calls. This allows the creation of QUAL files.

Method implemented is based on BioPython [BIOPYTHON].

constructor

Parameters:
  • infile (str) -- input ABI file

  • outfile (str) -- output QUAL filename

_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Bio.SeqIO Documentation

Convert BAM format to BEDGRAPH format

class BAM2BEDGRAPH(infile, outfile)[source]

Convert sorted BAM file into BEDGRAPH file

Compute the coverage (depth) in BEDGRAPH. Regions with zero coverage are also reported.

Note that this BEDGRAPH format is of the form:

chrom chromStart chromEnd dataValue

Note that consecutive positions with same values are compressed.

chr1    0   75  0
chr1    75  176 1
chr1    176  177 2

Warning

the BAM file must be sorted. This can be achieved with bamtools.

Methods available are based on bedtools [BEDTOOLS] and mosdepth [MOSDEPTH].

Constructor

Parameters:
  • infile (str) -- The path to the input BAM file. It must be sorted.

  • outfile (str) -- The path to the output file

_default_method = 'bedtools'

Default value

_method_bedtools(*args, **kwargs)[source]

Do the conversion using bedtools.

bedtools documentation

_method_mosdepth(*args, **kwargs)[source]

Do the conversion using mosdepth.

mosdepth documentation

Convert BAM format to COV format

class BAM2COV(infile, outfile)[source]

Convert sorted BAM file into COV file

Note that the COV format is of the form:

chr1    1   0
chr1    2   0
chr1    3   0
chr1    4   0
chr1    5   0

that is contig name, position, coverage.

Warning

the BAM file must be sorted. This can be achieved with bamtools using bamtools sort -in INPUT.bam

Methods available are based on samtools [SAMTOOLS] or bedtools [BEDTOOLS].

Constructor

Parameters:
  • infile (str) -- The path to the input BAM file. It must be sorted.

  • outfile (str) -- The path to the output file

_method_bedtools(*args, **kwargs)[source]

Do the conversion sorted BAM -> BED using bedtools

bedtools documentation

_method_samtools(*args, **kwargs)[source]

Do the conversion sorted BAM -> BED using samtools

SAMtools documentation

Convert BAM file to BIGWIG format

class BAM2BIGWIG(infile, outfile, *args, **kargs)[source]

Convert BAM file to BIGWIG file

Convert BAM into a binary version of WIG format.

Methods are base on bamCoverage [DEEPTOOLS] and bedGraphToBigWig from wiggletools [WIGGLETOOLS]. Wiggletools method requires an extra argument (--chrom-sizes) therefore default one is bamCoverage for now.

Moreover, the two methods do not return exactly the same info!

You can check this by using bioconvert to convert into a human readable file such as wiggle. We will use the bamCoverage as our default conversion.

constructor

Parameters:
  • infile (str) -- input BAM file

  • outfile (str) -- output BIGWIG filename

_default_method = 'bamCoverage'

Default value

_method_bamCoverage(*args, **kwargs)[source]

run bamCoverage package.

bamCoverage documentation

_method_ucsc(*args, **kwargs)[source]

Run ucsc tool bedGraphToBigWig.

Requires extra argument (chrom_sizes) required by the bioconvert stanalone.

Convert BAM file to CRAM format

class BAM2CRAM(infile, outfile, *args, **kargs)[source]

Convert BAM file to CRAM file

The conversion requires the reference corresponding to the input file It can be provided as an argument with the standalone (--reference). Otherwise, users are asked to provide it.

Methods available are based on samtools [SAMTOOLS].

constructor

Parameters:
  • infile (str) -- input BAM file

  • outfile (str) -- output CRAM filename

_default_method = 'samtools'

Default value

_method_samtools(*args, **kwargs)[source]

Here we use the SAMtools tool.

SAMtools documentation

Convert BAM format to FASTA format

class BAM2FASTA(infile, outfile)[source]

Convert sorted BAM file into FASTA file

Methods available are based on samtools [SAMTOOLS] or bedtools [BEDTOOLS].

Warning

Using the bedtools method, the R1 and R2 reads must be next to each other so that the reads are sorted similarly

Warning

there is no guarantee that the R1/R2 output file are sorted similarly in paired-end case due to supp and second reads

constructor

Parameters:
  • infile (str) -- BAM file

  • outfile (str) -- FASTA file

_default_method = 'samtools'

Default value

_method_samtools(*args, **kwargs)[source]

do the conversion BAM -> FASTA using samtools.

Note

fasta are on one line

SAMtools documentation

Convert BAM format to FASTQ foarmat

class BAM2FASTQ(infile, outfile)[source]

Convert sorted BAM file into FASTQ file

Methods available are based on samtools [SAMTOOLS] or bedtools [BEDTOOLS].

Warning

Using the bedtools method, the R1 and R2 reads must be next to each other so that the reads are sorted similarly

Warning

there is no guarantee that the R1/R2 output file are sorted similarly in paired-end case due to supp and second reads

constructor

Parameters:
  • infile (str) --

  • outfile (str) --

_default_method = 'samtools'

Default value

_method_bedtools(*args, **kwargs)[source]

Do the conversion BAM -> Fastq using bedtools

bedtools documentation

_method_samtools(*args, **kwargs)[source]

Do the conversion BAM -> FASTQ using samtools

SAMtools documentation

Convert BAM format to JSON format

class BAM2JSON(infile, outfile)[source]

Convert BAM format to JSON file

Methods available are based on bamtools [BAMTOOLS].

constructor

Parameters:
  • infile (str) --

  • outfile (str) --

_default_method = 'bamtools'

Default value

_method_bamtools(*args, **kwargs)[source]

Do the conversion BAM -> JSON using bamtools.

BAMTools documentation

Convert SAM file to BAM format

class BAM2SAM(infile, outfile, *args, **kargs)[source]

Convert BAM file to SAM file

Methods available are based on samtools [SAMTOOLS] , sam-to-bam [SAMTOBAM] , sambamba [SAMBAMBA] and pysam [PYSAM].

constructor

Parameters:
  • infile (str) --

  • outfile (str) --

_default_method = 'sambamba'

default value

_method_pysam(*args, **kwargs)[source]

We use here the python module Pysam.

Pysam documentation

_method_sambamba(*args, **kwargs)[source]

Here we use the Sambamba tool. This is the default method because it is the fastest.

Sambamba documentation

_method_samtools(*args, **kwargs)[source]

Here we use the SAMtools tool.

SAMtools documentation

Convert BAM file to TSV format

class BAM2TSV(infile, outfile, *args, **kargs)[source]

Convert sorted BAM file into TSV stats

This is not a conversion per se but the extraction of BAM statistics saved into a TSV format. The 4 columns of the TSV file are:

Reference sequence name, Sequence length,Mapped reads, Unmapped reads

Methods are based on samtools [SAMTOOLS] and pysam [PYSAM].

constructor

Parameters:
  • infile (str) -- BAM file

  • outfile (str) -- TSV file

Methods are based on samtools [SAMTOOLS] and pysam [PYSAM].

_default_method = 'samtools'

Default value

_method_pysam(*args, **kwargs)[source]

We use here the python module Pysam.

Pysam documentation

_method_samtools(*args, **kwargs)[source]

Here we use the SAMtools tool.

SAMtools documentation

Convert BAM to WIGGLE format

class BAM2WIGGLE(infile, outfile)[source]

Convert sorted BAM file into WIGGLE file

Methods available are based on wiggletools [WIGGLETOOLS].

Parameters:
  • infile (str) -- The path to the input BAM file. It must be sorted.

  • outfile (str) -- The path to the output file

_method_wiggletools(*args, **kwargs)[source]

Conversion using wiggletools

wiggletools documentation

Convert BCF file to VCF format

class BCF2VCF(infile, outfile, *args, **kargs)[source]

Convert BCF file to VCF file

Methods available are based on bcftools [BCFTOOLS].

constructor

Parameters:
  • infile (str) -- input BCF file

  • outfile (str) -- output VCF file

_method_bcftools(*args, **kwargs)[source]

Here we use the bcftools tool from samtools.

bcftools documentation

Convert BCF format to WIGGLE format

class BCF2WIGGLE(infile, outfile)[source]

Convert sorted BCF file into WIGGLE file

Methods available are based on wiggletools [WIGGLETOOLS].

Parameters:
  • infile (str) -- The path to the input BCF file. It must be sorted.

  • outfile (str) -- The path to the output file

_default_method = 'wiggletools'

Default value

_method_wiggletools(*args, **kwargs)[source]

Conversion using wiggletools

wiggletools documentation

Convert BED format to WIGGLE format

class BED2WIGGLE(infile, outfile)[source]

Convert sorted BED file into WIGGLE file

Methods available are based on wiggletools [WIGGLETOOLS].

Parameters:
  • infile (str) -- The path to the input BED file. It must be sorted.

  • outfile (str) -- The path to the output file

_method_wiggletools(*args, **kwargs)[source]

Convert BED to WIGGLE using wiggletools

wiggletools documentation

Convert BEDGRAPH file to COV format

class BEDGRAPH2COV(infile, outfile)[source]

Converts a BEDGRAPH (4 cols) to COV format (3 cols)

Input example:

chr19   49302000    4930205    -1
chr19   49302005    4930210    1

becomes:

chr19   4930201    -1
chr19   4930202    -1
chr19   4930203    -1
chr19   4930204    -1
chr19   4930205    -1
chr19   4930206    1
chr19   4930207    1
chr19   4930208    1
chr19   4930209    1
chr19   4930210    1

Method available is a Bioconvert implementation (Python).

constructor

Parameters:
_default_method = 'python'

Default value

_method_python(*args, **kwargs)[source]

Convert bedgraph file in coverage. Internal method.

Convert BEDGRAPH to BIGWIG format

class BEDGRAPH2BIGWIG(infile, outfile)[source]

Converts BEDGRAPH format to BIGWIG format

Conversion is based on bedGraph2BigWig tool. Note that an argument --chrom-sizes is required.

constructor

Parameters:
_default_method = 'ucsc'

Default value

_method_ucsc(*args, **kwargs)[source]

Convert bedgraph file in bigwig format using ucsc tool.

bigWig documentation

chromosome size

Convert BEDGRAPH format to WIGGLE format

class BEDGRAPH2WIGGLE(infile, outfile)[source]

Convert sorted BEDGRAPH file into WIGGLE file

Methods available are based on wiggletools [WIGGLETOOLS].

Parameters:
  • infile (str) -- The path to the input BEDGRAPH file. It must be sorted.

  • outfile (str) -- The path to the output file

_default_method = 'wiggletools'

Default value

_method_wiggletools(*args, **kwargs)[source]

wiggletools based method. Extension must be .bg.

wiggletools documentation

Convert BIGBED format to WIGGLE format

class BIGBED2WIGGLE(infile, outfile)[source]

Convert sorted BIGBED file into WIGGLE file

Methods available are based on wiggletools [WIGGLETOOLS].

Parameters:
  • infile (str) -- The path to the input BIGBED file. It must be sorted.

  • outfile (str) -- The path to the output file

_default_method = 'wiggletools'

Default value

_method_wiggletools(*args, **kwargs)[source]

Conversion using wiggletools

wiggletools documentation

Convert BIGBED format to BED format

class BIGBED2BED(infile, outfile)[source]

Converts a sequence alignment in BIGBED format to BED4 format

Methods available are based on pybigwig [DEEPTOOLS].

constructor

Parameters:
  • infile (str) -- input BIGBED file.

  • outfile (str) -- (optional) output BED4 file

_default_method = 'pybigwig'

Default value

_method_pybigwig(*args, **kwargs)[source]

In this method we use the python extension written in C, pyBigWig.

pyBigWig documentation

Convert BIGWIG to BEDGRAPH format

class BIGWIG2BEDGRAPH(infile, outfile)[source]

Converts a sequence alignment in BIGWIG format to BEDGRAPH format

Conversion is based on ucsc bigWigToBedGraph tool or pybigwig (default) [DEEPTOOLS].

constructor

Parameters:
_default_method = 'pybigwig'

Default value

_method_pybigwig(*args, **kwargs)[source]

In this method we use the python extension written in C, pyBigWig.

pyBigWig documentation

_method_ucsc(*args, **kwargs)[source]

Convert bigwig file in bedgraph format using ucsc tool.

ucsc.bedgraph documentation

Convert BIGWIG format to WIGGLE format

class BIGWIG2WIGGLE(infile, outfile)[source]

Convert sorted BIGWIG file into WIGGLE file

Methods available are based on pybigwig [DEEPTOOLS].

Parameters:
  • infile (str) -- The path to the input BIGWIG file. It must be sorted.

  • outfile (str) -- The path to the output file

_default_method = 'wiggletools'

Default value

_method_wiggletools(*args, **kwargs)[source]

Conversion using wiggletools

wiggletools documentation

Convert BPLINK to PLINK format

Converts a genotype dataset bed+bim+fam in BPLINK format to ped+map PLINK format.

Conversion is based on plink [PLINK] executable.

Warning

plink takes several inputs and outputs and does not need extensions. What is required is a prefix. Bioconvert usage is therefore:

bioconvert bplink2plink plink_toy

Since there is no extension, you must be explicit by providing the conversion name (bplink2plink). This command will search for 3 input files plink_toy.bed, plink_toy.bim and plink_toy.fam. It will then create two output files named plink_toy.ped and plink_toy.map

constructor

Parameters:
  • infile (str) -- input BPLINK files.

  • outfile (str) -- (optional) output PLINK files.

_default_method = 'plink'

Default value

Convert plink file in text using plink executable.

plink documentation

Convert BZ2 to GZ format

class BZ22GZ(infile, outfile, *args, **kargs)[source]

Convert BZ2 file to GZ file

Methods based on bunzip2 or zlib/bz2 Python libraries.

constructor

Parameters:
  • infile (str) -- input BZ2 file

  • outfile (str) -- output GZ filename

_default_method = 'bz2_gz'

Default value

_method_bz2_gz(*args, **kwargs)[source]

Method that uses bunzip2 gzip.

bunzip2 documentation gzip documentation

_method_python(*args, **kargs)[source]

Internal method

Convert CLUSTAL to FASTA format

class CLUSTAL2FASTA(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment from CLUSTAL to FASTA format.

Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert CLUSTAL interleaved file in PHYLIP format.

Bio.SeqIO Documentation

_method_goalign(*args, **kwargs)[source]

Convert CLUSTAL file in FASTA format using goalign.

goalign documentation

_method_squizz(*args, **kwargs)[source]

Convert CLUSTAL file in FASTA format.

Convert CLUSTAL to PHYLIP format

class CLUSTAL2PHYLIP(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment from CLUSTAL format to PHYLIP format.

Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert CLUSTAL interleaved file in PHYLIP format using biopython.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Convert CLUSTAL interleaved file in PHYLIP format using squizz tool.

Convert CLUSTAL to STOCKHOLM format

class CLUSTAL2STOCKHOLM(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment from CLUSTAL format to STOCKHOLM format.

Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert CLUSTAL interleaved file in PHYLIP format using biopython.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Convert CLUSTAL file in STOCKHOLM format using squizz tool.

Convert CRAM file to BAM format

class CRAM2BAM(infile, outfile, *args, **kargs)[source]

Convert CRAM file to BAM file

The conversion requires the reference corresponding to the input file It can be provided as an argument with the standalone (--reference). Otherwise, users are asked to provide it.

Methods available are based on samtools [SAMTOOLS].

constructor

Parameters:
  • infile (str) -- input CRAM file

  • outfile (str) -- output BAM filename

_default_method = 'samtools'

Default value

_method_samtools(*args, **kwargs)[source]

Here we use the SAMtools tool.

SAMtools documentation

Convert CRAM file to FASTQ format

class CRAM2FASTA(infile, outfile, *args, **kargs)[source]

Convert CRAM file to FASTA file

Methods available are based on samtools [SAMTOOLS].

constructor

Parameters:
  • infile (str) -- input CRAM file

  • outfile (str) -- output FASTA filename

_default_method = 'samtools'

Default value

_method_samtools(*args, **kwargs)[source]

do the conversion BAM -> FASTA using samtools

SAMtools documentation

Note

fasta are on one line

Convert CRAM file to FASTQ format

class CRAM2FASTQ(infile, outfile, *args, **kargs)[source]

Convert CRAM file to FASTQ file

Methods available are based on samtools [SAMTOOLS].

constructor

Parameters:
  • infile (str) -- input CRAM file

  • outfile (str) -- output FASTQ filename

_default_method = 'samtools'

Default value

_method_samtools(*args, **kwargs)[source]

Do the conversion BAM -> FASTQ using samtools

SAMtools documentation

Convert CRAM file to SAM format

class CRAM2SAM(infile, outfile, *args, **kargs)[source]

Convert CRAM file to SAM file

The conversion requires the reference corresponding to the input file It can be provided as an argument with the standalone (--reference). Otherwise, users are asked to provide it.

Methods available are based on samtools [SAMTOOLS].

constructor

Parameters:
  • infile (str) -- input CRAM file

  • outfile (str) -- output SAM filename

_default_method = 'samtools'

Default value

_method_samtools(*args, **kwargs)[source]

Here we use the SAMtools tool.

SAMtools documentation

Convert CSV format to TSV format

class CSV2TSV(infile, outfile)[source]

Convert CSV file into TSV file

Available methods: Python, Pandas

Methods available are based on python or Pandas [PANDAS].

See also

TSV2CSV

Constructor

Parameters:
  • infile (str) -- comma-separated file

  • outfile (str) -- tabulated file

_default_method = 'python'

Default value

_method_pandas(in_sep=',', out_sep='\t', line_terminator='\n', *args, **kwargs)[source]

Do the conversion CSV -> TSV using Pandas library

pandas documentation

_method_python(in_sep=',', out_sep='\t', line_terminator='\n', *args, **kwargs)[source]

Do the conversion CSV -> TSV using standard Python modules.

csv documentation

_method_python_v2(in_sep=',', out_sep='\t', line_terminator='\n', *args, **kwargs)[source]

Do the conversion CSV -> CSV using csv module.

Note

This method cannot escape nor quote output char

csv documentation

convert CSV to XLS format

class CSV2XLS(infile, outfile, *args, **kargs)[source]

Convert CSV file to XLS file

Methods available are based on python, pyexcel [PYEXCEL], or pandas [PANDAS].

constructor

Parameters:
  • infile (str) -- input CSV file

  • outfile (str) -- output XLS filename

_default_method = 'pandas'

Default value

_method_pandas(in_sep=',', sheet_name='Sheet 1', *args, **kwargs)[source]

Do the conversion CSV -> XLS using Panda modules.

pandas documentation

_method_pyexcel(in_sep=',', sheet_name='Sheet 1', *args, **kwargs)[source]

Do the conversion CSV -> XLS using pyexcel modules.

pyexcel documentation

Convert a compressed FASTQ from DSRC to FASTQ format

class DSRC2GZ(infile, outfile, *args, **kargs)[source]

Convert a compressed FASTQ from DSRC to GZ format

Methods available are based on dsrc [DSRC] and pigz [PIGZ].

constructor

Parameters:
  • infile (str) -- input DSRC filename

  • outfile (str) -- output GZ filename

_default_method = 'dsrcpigz'

Default value

_method_dsrcpigz(*args, **kwargs)[source]

Do the conversion dsrc -> GZ. Method that uses pigz and dsrc.

pigz documentation dsrc documentation

option threadig does not work with the dsrc version from conda so we do not add the -t threads option

Convert EMBL file to FASTA format

class EMBL2FASTA(infile, outfile, *args, **kargs)[source]

Convert EMBL file to FASTA file

Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON].

constructor

Parameters:
  • infile (str) -- input EMBL file

  • outfile (str) -- output FASTA filename

_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Header is less informative than the one obtained with biopython

Convert EMBL file to GENBANK format

class EMBL2GENBANK(infile, outfile, *args, **kargs)[source]

Convert EMBL file to GENBANK file

Methods available are based on squizz [SQUIZZ] and biopython [BIOPYTHON].

constructor

Parameters:
  • infile (str) -- input EMBL file

  • outfile (str) -- output GENBANK filename

_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Header is less informative than the one obtained with biopython.

Convert FASTA format to FASTQ format

class FASTA_QUAL2FASTQ(infile, outfile)[source]

Convert FASTA and QUAL back into a FASTQ file

Method based on pysam [PYSAM].

Parameters:
  • infile (list) -- The path to the input FASTA file, the path to the input QUAL file

  • outfile (str) -- The path to the output FASTQ file

_default_method = 'pysam'

Default value

_method_pysam(*args, **kwargs)[source]

This method uses the FastxFile function of the Pysam python module.

FastxFile documentation

Convert FASTA to CLUSTAL format

class FASTA2CLUSTAL(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment from FASTA to CLUSTAL format

Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert FASTA interleaved file in CLUSTAL format using biopython.

Bio.SeqIO Documentation

_method_goalign(*args, **kwargs)[source]

Convert FASTA file in CLUSTAL format using goalign tool.

goalign documentation

_method_squizz(*args, **kwargs)[source]

Convert FASTA file in CLUSTAL format using squizz tool.

Convert FASTA format to FAA format

class FASTA2FAA(infile, outfile)[source]

Methods available is a bioconvert implementation.

Parameters:
  • infile (str) -- The path to the input FASTA file

  • outfile (str) -- The path to the output FASTQ file

_default_method = 'bioconvert'

Default value

_method_bioconvert(*args, **kwargs)[source]

Internal method.

Convert FASTA format to FASTQ format

class FASTA2FASTQ(infile, outfile)[source]

Methods available are based on pysam [PYSAM].

Parameters:
  • infile (str) -- The path to the input FASTA file

  • outfile (str) -- The path to the output FASTQ file

_default_method = 'pysam'

Default value

_method_pysam(quality_file=None, *args, **kwargs)[source]

This method uses the FastxFile function of the Pysam python module.

FastxFile documentation

Convert FASTA to GENBANK format

class FASTA2GENBANK(infile, outfile, *args, **kargs)[source]

Convert FASTA file to GENBANK file

Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON] or Bioconvert pure implementation (default).

constructor

Parameters:
  • infile (str) -- input FASTA file

  • outfile (str) -- output GENBANK filename

_default_method = 'bioconvert'

Default value

_method_bioconvert(*args, **kwargs)[source]

Internal method

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Header is less informative than the one obtained with biopython

Convert FASTA to NEXUS format

class FASTA2NEXUS(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment in FASTA format to NEXUS format

Methods available are based on squizz [GOALIGN].

constructor

Parameters:
  • infile (str) -- input FASTA file.

  • outfile (str) -- (optional) output NEXUS file

_default_method = 'goalign'

Default value

_method_goalign(*args, **kwargs)[source]

Convert fasta file in Nexus format using goalign tool.

goalign documentation

The fasta file must be an alignemnt file, yhis mean all the sequences must have the same length (with the gap) otherwise an error will be raised.

Convert FASTA to PHYLIP format

class FASTA2PHYLIP(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment in FASTA format to PHYLIP format

Conversion is based on Bio Python modules

Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON] or goalign [GOALIGN]. Squizz is the default (https://github.com/bioconvert/bioconvert/issues/149). Phylip created is a strict phylip that is with 10 characters on the first column.

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Bio.SeqIO Documentation

_method_goalign(*args, **kwargs)[source]

Convert fasta file in Phylip interleaved format using goalign tool.

goalign documentation

The fasta file must be an alignemnt file, this means that all sequences must have the same length (with the gap) otherwise an error will be raised

_method_squizz(*args, **kwargs)[source]

Convert fasta file in Phylip interleaved format using squizz tool. The fasta file must be an alignement file, this means that all sequences must have the same length (with the gap) otherwise an error will be raised.

Convert FASTA to TWOBIT format

class FASTA2TWOBIT(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment in FASTA format to TWOBIT format

Methods available are based on UCSC faToTwoBit [UCSC].

constructor

Parameters:
_default_method = 'ucsc'

default value

_method_ucsc(*args, **kwargs)[source]

Convert fasta file in twobit format using ucsc faToTwoBit.

uscsc faToTwoBit Documentation

Convert FASTQ to FASTA format

class FASTQ2FASTA(infile, outfile)[source]

Convert FASTQ to FASTA

This converter has lots of methods. Some of them have also been removed or commented with time. BioPython for instance is commented due to poo performance compared to others. Does not mean that it is not to be considered. Performances are decrease due to lot of sanity checks.

Similarly, bioawk and python_external method are commented because redundant with other equivalent method.

Parameters:
  • infile (str) -- The path to the input FASTA file.

  • outfile (str) -- The path to the output file.

_default_method = 'bioconvert'

default value

_method_awk(*args, **kwargs)[source]

Here we are using the awk method.

Note

Another method with awk has been tested but is less efficient. Here is which one was used:

box.awkcmd = """awk '{{if(NR%4==1) {{printf(">%s\n",substr($0,2));}} else if(NR%4==2) print;}}' """

awk documentation

_method_bioconvert(*args, **kwargs)[source]

Bioconvert implementation in pure Python. This is the default method because it is the fastest.

_method_mappy(*args, **kwargs)[source]

This method provides a fast and accurate C program to align genomic sequences and transcribe nucleotides.

mappy method

_method_mawk(*args, **kwargs)[source]

This variant of the awk method uses mawk, a lighter and faster implementation of awk.

Note

Other methods with mawk have been tested but are less efficient. Here are which ones were used:

mawkcmd_v2 = """mawk '{{if(NR%4==1) {{printf(">%s\n",substr($0,2));}} else if(NR%4==2) print;}}' """
mawkcmd_v3 = """mawk '(++n<=0){next}(n!=1){print;n=-2;next}{print">"substr($0,2)}'"""

mawk documentation

_method_perl(*args, **kwargs)[source]

This method uses the perl command which will call the "fastq2fasta.pl" script.

Perl documentation

_method_readfq(*args, **kwargs)[source]

This method is inspired by Readfq coded by Heng Li.

original Readfq method

_method_sed(*args, **kwargs)[source]

This method uses the UNIX function sed which is a non-interactive editor.

Note

Another method with sed has been tested but is less efficient. Here is which one was used:

cmd = """sed -n 's/^@/>/p;n;p;n;n'"""

sed documentation

_method_seqkit(*args, **kwargs)[source]

We use the Seqkit library.

Documentation of the Seqkit method

_method_seqtk(*args, **kwargs)[source]

We use the Seqtk library.

Documentation of the Seqtk method

static just_name(record)[source]

This method takes a Biopython sequence record record and returns its name. The comment part is not included.

static unwrap_fasta(infile, outfile, strip_comment=False)[source]

This method reads fasta sequences from infile and writes them unwrapped in outfile. Used in the test suite.

Parameters:
  • infile (str) -- The path to the input FASTA file.

  • outfile (str) -- The path to the output file.

Convert GENBANK to EMBL format

class GENBANK2EMBL(infile, outfile, *args, **kargs)[source]

Convert GENBANK file to EMBL file

Some description.

constructor

Parameters:
  • infile (str) -- input GENBANK file

  • outfile (str) -- output EMBL filename

_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Header is less informative than the one obtained with biopython

Convert GENBANK to EMBL format

class GENBANK2FASTA(infile, outfile, *args, **kargs)[source]

Convert GENBANK file to FASTA file

Methods are based on biopython [BIOPYTHON], squizz [SQUIZZ] and our own Bioconvert implementation.

constructor

Parameters:
  • infile (str) -- input GENBANK file

  • outfile (str) -- output EMBL filename

_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Bio.SeqIO Documentation

_method_python(*args, **kwargs)[source]

Internal method.

_method_squizz(*args, **kwargs)[source]

Header is less informative than the one obtained with biopython

Convert GENBANK to GFF3 format

class GENBANK2GFF3(infile, outfile, *args, **kargs)[source]

Convert GENBANK file to GFF3 file

Method based on biocode.

constructor

Parameters:
  • infile (str) -- input GENBANK file

  • outfile (str) -- output GFF3 filename

_default_method = 'biocode'

Default value

_method_biocode(*args, **kwargs)[source]

Uses scripts from biocode copied and modified in bioconvert.utils.biocode

Please see Main entry

Convert GFF2 to GFF3 format

class GFF22GFF3(infile, outfile, *args, **kargs)[source]

Convert GFF2 to GFF3

constructor

Parameters:
  • infile (str) --

  • outfile (str) --

Method available is pure Python.

_default_method = 'bioconvert'

Default value

_method_bioconvert(*args, **kwargs)[source]

This method is a basic mapping of the 9th column of gff2 to gff3. Other methods with smart translations must be created for specific usages. There is no good solution for this translation.

Convert GFF3 to GFF2 format

class GFF32GFF2(infile, outfile, *args, **kargs)[source]

Convert GFF2 to GFF3

Method available is Python-based.

constructor

Parameters:
  • infile (str) -- input GFF3 filename

  • outfile (str) -- output GFF2 filename

_default_method = 'bioconvert'

Default value

_method_bioconvert(*args, **kwargs)[source]

This method is a basic mapping of the 9th column of gff2 to gff3. Other methods with smart translations must be created for specific usages. There is no good solution for this translation.

Convert GFA to FASTA format

class GFA2FASTA(infile, outfile)[source]

Convert sorted GFA file into FASTA file

Available methods are based on awk or python (default)

(Source code)

Reference:

https://github.com/GFA-spec/GFA-spec/blob/master/GFA-spec.md

See also

bioconvert.simulator.gfa

Parameters:
  • infile (str) -- The path to the input BAM file. It must be sorted.

  • outfile (str) -- The path to the output file

_default_method = 'python'

Default value

_method_awk(*args, **kwargs)[source]

For this method, we use the awk tools.

awk documentation

Returns:

the standard output

Return type:

io.StringIO object.

Note

this method fold the sequence to 80 characters

_method_python(*args, **kwargs)[source]

Internal method

Convert GZ file to BZ2 format

class GZ2BZ2(infile, outfile, *args, **kargs)[source]

Convert GZ file to BZ2 file

Unzip input file using pigz or gunzip and compress using pbzip2. Default is pigz/pbzip2.

constructor

Parameters:
  • infile (str) -- input GZ file

  • outfile (str) -- output BZ2 filename

_default_method = 'pigz_pbzip2'

Default value

_method_gunzip_bzip2(*args, **kwargs)[source]

Single theaded conversion. Method that uses gunzip bzip2.

gunzip documentation bzip2 documentation

_method_pigz_pbzip2(*args, **kwargs)[source]

Method that uses pigz pbzip2.

pigz documentation pbzip2 documentation

_method_python(*args, **kwargs)[source]

Internal method

Convert GZ to DSRC format

class GZ2DSRC(infile, outfile, *args, **kargs)[source]

Convert compressed fastq.gz file into DSRC compressed file

(Source code)

constructor

Parameters:
  • infile (str) -- input GZ filename

  • outfile (str) -- output DSRC filename

_default_method = 'pigzdsrc'

Default value

_method_pigzdsrc(*args, **kwargs)[source]

do the conversion gz -> DSRC

Returns:

the standard output

Return type:

io.StringIO object.

Method that uses pigz and dsrc.

pigz documentation dsrc documentation

Convert JSON to YAML format

class JSON2YAML(infile, outfile, *args, **kargs)[source]

Convert JSON file into YAML file

Conversion is based on yaml and json standard Python modules Indentation is set to 4 by default and affects the sections (not the list). For example:

fruits_list:
- apple
- orange
section1:
    do: true
    misc: 1

constructor

Parameters:
  • infile (str) -- input JSON file

  • outfile (str) -- input YAML file.

_default_method = 'yaml'

Default value

_method_yaml(*args, **kwargs)[source]

Internal method

Convert MAF file to SAM format

class MAF2SAM(infile, outfile)[source]

This is the Multiple alignment format or MIRA assembly format

This is not Mutation Annotation Format (somatic)

pbsim creates this kind of data

Some references:

Those two codes were in Py2 at the time of this implementation. We re-used some of the information from maf-convert but the code in bioconvert.io.maf can be considered original.

constructor

Parameters:
  • infile (str) -- the path of the input file.

  • outfile (str) -- the path of The output file

_default_method = 'python'

Default value

_method_python(*args, **kwargs)[source]

Internal module

MAF documentation

Converts NEWICK file to NEXUS format.

class NEWICK2NEXUS(infile, outfile=None, *args, **kwargs)[source]

Converts a tree file from NEWICK format to NEXUS format.

Methods available are based on gotree [GOTREE].

constructor

Parameters:
_default_method = 'gotree'

Default value

_method_gotree(*args, **kwargs)[source]

Convert NEWICK file in NEXUS format using gotree tool.

gotree documentation

Converts NEWICK file to PHYLOXML format.

class NEWICK2PHYLOXML(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a tree file from NEWICK format to PHYLOXML format.

Methods available are based on gotree [GOTREE].

constructor

Parameters:
_default_method = 'gotree'

Default value

_method_gotree(*args, **kwargs)[source]

Convert NEWICK file in PHYLOXML format using gotree tool.

gotree documentation

Convert NEXUS to FASTA format

class NEXUS2FASTA(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment from NEXUS format to FASTA format.

constructor

Parameters:
  • infile (str) -- input NEXUS file.

  • outfile (str) -- (optional) output FASTA file

_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]
Convert NEXUS interleaved or sequential file in FASTA format using biopython.

The FASTA output file will be an aligned FASTA file.

Bio.AlignIO

For instance:

We have a Nexus input file that look like

#NEXUS
[TITLE: Test file]

begin data;
dimensions ntax=3 nchar=123;
format interleave datatype=DNA missing=N gap=-;

matrix
read3                -AT--------CCCGCTCGATGGGCCTCATTGCGTCCACTAGTTGATCTT
read2                -----------------------GGAAGCCCACGCCACGGTCTTGATACG
read4                ---------------------AGGGATGAACGATGCTCGCAGTTGATGCT

read3                CTGGAGTAT---T----TAGGAAAGCAAGTAAACTCCTTGTACAAATAAA
read2                AATTTTTCTAATGGCTATCCCTACATAACCTAACCGGGCATGTAATGTGT
read4                CAGAAGTGCCATTGCGGTAGAAACAAATGTTCCCAGATTGTTGACTGATA

read3                GATCTTA-----GATGGGCAT--
read2                CACCGTTGTTTCGACGTAAAGAG
read4                AGTAGGACCTCAGTCGTGACT--
;

end;
begin assumptions;
options deftype=unord;
end;

the output file will look like

>read3
-AT--------CCCGCTCGATGGGCCTCATTGCGTCCACTAGTTGATCTTCTGGAGTAT-
--T----TAGGAAAGCAAGTAAACTCCTTGTACAAATAAAGATCTTA-----GATGGGCA
T--
>read2
-----------------------GGAAGCCCACGCCACGGTCTTGATACGAATTTTTCTA
ATGGCTATCCCTACATAACCTAACCGGGCATGTAATGTGTCACCGTTGTTTCGACGTAAA
GAG
>read4
---------------------AGGGATGAACGATGCTCGCAGTTGATGCTCAGAAGTGCC
ATTGCGGTAGAAACAAATGTTCCCAGATTGTTGACTGATAAGTAGGACCTCAGTCGTGAC
T--

and not

>read3
ATCCCGCTCGATGGGCCTCATTGCGTCCACTAGTTGATCTTCTGGAGTATTTAGGAAAGC
AAGTAAACTCCTTGTACAAATAAAGATCTTAGATGGGCAT
>read2
GGAAGCCCACGCCACGGTCTTGATACGAATTTTTCTAATGGCTATCCCTACATAACCTAA
CCGGGCATGTAATGTGTCACCGTTGTTTCGACGTAAAGAG
>read4
AGGGATGAACGATGCTCGCAGTTGATGCTCAGAAGTGCCATTGCGGTAGAAACAAATGTT
CCCAGATTGTTGACTGATAAGTAGGACCTCAGTCGTGACT
_method_goalign(*args, **kwargs)[source]

Convert NEXUS interleaved file in FASTA format using goalign tool.

goalign documentation

Warning

the sequential format is not supported

_method_squizz(*args, **kwargs)[source]

Convert NEXUS sequential or interleave file in FASTA format using squizz tool.

command used:

squizz -c FASTA infile > outfile

Converts NEXUS file to NEWICK format.

class NEXUS2NEWICK(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a tree file from NEXUS format to NEWICK format.

Methods available are based on biopython [BIOPYTHON] or goalign [GOALIGN].

constructor

Parameters:
_default_method = 'gotree'

Default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.Phylo.

Bio.Phylo Documentation

_method_gotree(*args, **kwargs)[source]

Convert NEXUS file in NEWICK format using gotree tool.

gotree documentation

Converts NEXUS file to PHYLIP format.

class NEXUS2PHYLIP(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment from NEXUS format to PHYLIP format.

Methods available are based on goalign [GOALIGN].

constructor

Parameters:
_default_method = 'goalign'

Default value

_method_goalign(*args, **kwargs)[source]

Convert NEXUS interleaved file in PHYLIP format using goalign tool.

goalign documentation

Converts NEXUS file to PHYLOXML format.

class NEXUS2PHYLOXML(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a tree file from NEXUS format to PHYLOXML format.

Methods available are based on squizz [SQUIZZ] or biopython [BIOPYTHON], and goalign [GOALIGN].

constructor

Parameters:
_default_method = 'gotree'

Default value

_method_gotree(*args, **kwargs)[source]

uses gotree tool:

gotree documentation

Convert XLS format to CSV format

class ODS2CSV(infile, outfile)[source]

Convert XLS file into CSV file

Method based on pyexcel [PYEXCEL].

constructor

Parameters:
  • infile (str) --

  • outfile (str) --

_default_method = 'pyexcel'

Default value

_method_pyexcel(out_sep=',', line_terminator='\n', sheet_name=0, *args, **kwargs)[source]

Do the conversion XLS -> CSV using Panda library

pyexcel documentation

Converts PHYLIP file to CLUSTAL format.

class PHYLIP2CLUSTAL(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment from PHYLIP format to CLUSTAL format

Methods available are based on biopython [BIOPYTHON], squiz [SQUIZZ].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert PHYLIP interleaved file in CLUSTAL format using biopython.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Convert PHYLIP interleaved file in CLUSTAL format using squizz tool.

Converts PHYLIP file to FASTA format.

class PHYLIP2FASTA(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment in PHYLIP format to FASTA format

Methods available are based on biopython [BIOPYTHON], squiz [SQUIZZ].

constructor

Parameters:
_default_method = 'biopython'

default value

_method_biopython(*args, **kwargs)[source]

For this method we use the biopython package Bio.SeqIO.

Bio.SeqIO Documentation

_method_goalign(*args, **kwargs)[source]

Convert fasta file in Phylip interleaved format using goalign tool.

goalign documentation

The fasta file must be an alignemnt file, yhis mean all the sequences must have the same length (with the gap) otherwise an error will be raised

_method_squizz(*args, **kwargs)[source]

Convert Phylip inteleaved file in fasta format using squizz tool. The fasta file is an alignemnt, that means the gap are conserved.

Converts PHYLIP file to NEXUS format.

class PHYLIP2NEXUS(infile, outfile=None, *args, **kwargs)[source]

Converts a sequence alignment from PHYLIP format to NEXUS format.

Methods available are based on goalign [GOALIGN].

constructor

Parameters:
_default_method = 'goalign'

Default value

_method_goalign(*args, **kwargs)[source]

Convert PHYLIP interleaved file in NEXUS format using goalign tool.

goalign documentation

Converts PHYLIP file to STOCKHOLM format.

class PHYLIP2STOCKHOLM(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment from PHYLIP interleaved to STOCKHOLM

Methods available are based on biopython [BIOPYTHON], squiz [SQUIZZ].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert PHYLIP interleaved file in STOCKHOLM format using biopython.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Convert PHYLIP interleaved file in STOCKHOLM format using squizz tool.

Converts PHYLIP file to XMFA format.

class PHYLIP2XMFA(infile, outfile=None, *args, **kwargs)[source]

Converts a sequence alignment from PHYLIP format to XMFA

Methods available are based on biopython [BIOPYTHON].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert PHYLIP interleaved file in XMFA (Mauve)format.

Bio.AlignIO

Converts PHYLOXML file to NEWICK format.

class PHYLOXML2NEWICK(infile, outfile=None, *args, **kwargs)[source]

Converts a tree file from PHYLOXML format to NEWICK format.

Methods available are based on gotree [GOTREE].

constructor

Parameters:
_default_method = 'gotree'

Default value

_method_gotree(*args, **kwargs)[source]

Convert PHYLOXML file in NEWICK format using gotree tool.

gotree documentation

Converts PHYLOXML file to NEXUS format.

class PHYLOXML2NEXUS(infile, outfile=None, *args, **kwargs)[source]

Converts a tree file from PHYLOXML format to NEXUS format.

Methods available are based on gotree [GOTREE].

constructor

Parameters:
_default_method = 'gotree'

Default value

_method_gotree(*args, **kwargs)[source]

Convert PHYLOXML file in NEXUS format using gotree tool.

gotree documentation

Convert PLINK to BPLINK

Converts a genotype dataset ped+map in PLINK format to bed+bim+fam BPLINK format

Conversion is based on plink executable

constructor

Parameters:
_default_method = 'plink'

Default value

Convert plink file in text using plink executable.

plink documentation

Convert SAM file to BAM format

class SAM2BAM(infile, outfile, *args, **kargs)[source]

Convert SAM file to BAM file

constructor

Parameters:
  • infile (str) --

  • outfile (str) --

_default_method = 'samtools'

Default value

_method_samtools(*args, **kwargs)[source]

Do the conversion SAM -> BAM using samtools

SAMtools documentation

Convert SAM file to CRAM format

class SAM2CRAM(infile, outfile, reference=None, *args, **kargs)[source]

Convert SAM file to CRAM file

The conversion requires the reference corresponding to the input file It can be provided as an argument with the standalone (--reference). Otherwise, users are asked to provide it.

Methods available are based on samtools [SAMTOOLS].

constructor

Parameters:
  • infile (str) -- input SAM file

  • outfile (str) -- output CRAM filename

_default_method = 'samtools'

Default value

_method_samtools(*args, **kwargs)[source]

Here we use the SAMtools tool.

SAMtools documentation

"Convert CRAM to BAM format

class SAM2PAF(infile, outfile, *args, **kargs)[source]

Convert SAM file to PAF file

The SAM and PAF formats are described in the Formats section.

Description:

The header of the SAM file (lines starting with @) are dropped. However, the length of the target is retrieved from the @SQ line that must be present.

Consider this SAM file with two alignements only. One is aligned on the target (first) while the other is not (indicated by the * characters):

@SQ     SN:ENA|K01711|K01711.1  LN:15894
@PG     ID:minimap2     PN:minimap2     VN:2.5-r572     CL:minimap2 -a measles.fa Hm2_GTGAAA_L005_R1_001.fastq.gz
HISEQ:426:C5T65ACXX:5:2302:1943:2127    0       ENA|K01711|K01711.1     448     60      101M    *       00      CTTACCTTCGCATCAAGAGGTACCAACATGGAGGATGAGGCGGACCAATACTTTTCACATGATGATCCAATTAGTAGTGATCAATCCAGGTTCGGATGGTT   BCCFFFFFHHHHHIIJJJJJJIIJJJJJJJJFHIHIJJJIJIIIIGHFFFFFFEEEEEEEDDDDDFDDDDDDDDD>CDDEDEEDDDDDDCCDDDDDDDDCD   NM:i:0  ms:i:202        AS:i:202        nn:i:0  tp:A:P  cm:i:14 s1:i:94 s2:i:0
HISEQ:426:C5T65ACXX:5:2302:4953:2090    4       *       0       0       *       *       0       0       AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAACAACCAAAAAGAGACGAACAA   CCCFFDDFAFFBHJHGGGIHIJBGGHIIJJJJJJJHGEIJGIFIIIHCBGHIJIIIIIJJHHHHEF@D@=;=,0)0&5&))+(((+((((&+(((()&&)(

The equivalent PAF file is

HISEQ:426:C5T65ACXX:5:2302:1943:2127    101     0       101     +       ENA|K01711|K01711.1     15894   447     548     101     101     60      NM:i:0  ms:i:202        AS:i:202        nn:i:0  tp:A:P  cm:i:14 s1:i:94 s2:i:0  cg:Z:101M

In brief, the sequences are dropped. The final file is therefore smaller. Extra fields (starting from NM:i:0) can be dropped or kept using the keep_extra_field argument. Alignement with * characters are dropped. The first line (@SQ) is used to retrieve the length of the contigs that is stored in the PAF file (column 6).

The 12 compulsary PAF fields are:

Col

Type

Description

1

string

Query sequence name

2

int

Query sequence length

3

int

Query start (0-based)

4

int

Query end (0-based)

5

char

Relative strand: "+" or "-"

6

string

Target sequence name

7

int

Target sequence length

8

int

Target start on original strand (0-based)

9

int

Target end on original strand (0-based)

10

int

Number of residue matches

11

int

Alignment block length

12

int

Mapping quality (0-255; 255 for missing)

For developesr:

Get the measles data from Sequana library (2 paired fastq files):

minimap2 measles.fa R1.fastq > approx-mapping.paf

You can ask minimap2 to generate CIGAR at the cg tag of PAF with:

minimap2 -c measles.fa R1.fastq > alignment.paf

or to output alignments in the SAM format:

minimap2 -a measles.fa R1.fastq > alignment.sam

The SAM lines must contains 11 positional element and the NM:i and nn:i fields (see example above).

constructor

Parameters:
  • infile (str) -- input SAM file

  • outfile (str) -- output PAF filename

Reference:

This function is a direct translation of https://github.com/lh3/miniasm/blob/master/misc/sam2paf.js (Dec. 2017).

_default_method = 'python'

Default value

_method_python(*args, **kwargs)[source]

Internal method

Convert SCF file to FASTA file

class SCF2FASTA(infile, outfile)[source]

Converts a binary SCF/ABI file to Fasta format.

Parameters:
  • infile (str) -- input SCF/ABI file

  • outfile (str) -- output name file

constructor

Parameters:
  • infile (str) -- the path of the input file.

  • outfile (str) -- the path of The output file

_default_method = 'python'

Default value

_method_python(*args, **kwargs)[source]

Internal method

Convert SCF file to FASTQ file

class SCF2FASTQ(infile, outfile)[source]

Converts a binary SCF file to FastQ file

Parameters:
  • infile (str) -- input SCF file

  • outfile (str) -- output name file

constructor

Parameters:
  • infile (str) -- the path of the input file.

  • outfile (str) -- the path of The output file

_default_method = 'python'

Default value

_method_python(*args, **kwargs)[source]

Internal method

Convert SRA format to FASTA format

class SRA2FASTQ(infile, outfile, test=False)[source]

Download FASTQ from SRA archive

bioconvert sra2fastq ERR043367

This may take some times since the files are downloaded from SRA website.

constructor

https://edwards.flinders.edu.au/fastq-dump/

library used: sra-toolkit

_default_method = 'fastq_dump'

Default value

_method_fastq_dump(*args, **kwargs)[source]

Uses Sratoolkit (fastq-dump) to convert a sra file to fastq

Fastq-dump documentation

Converts STOCKHOLM file to CLUSTAL file.

class STOCKHOLM2CLUSTAL(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment from STOCKHOLM format to CLUSTAL format

Methods available are based on squizz [SQUIZZ] and biopython [BIOPYTHON].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert STOCKHOLM interleaved file in CLUSTAL format using biopython.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Convert STOCKHOLM file in CLUSTAL format using squizz tool.

Converts STOCKHOLM to PHYLIP format.

class STOCKHOLM2PHYLIP(infile, outfile=None, *args, **kwargs)[source]

Converts a sequence alignment from STOCKHOLM format to PHYLIP interleaved format

Methods available are based on squizz [SQUIZZ], and biopython [BIOPYTHON].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert STOCKHOLM interleaved file in PHYLIP format using biopython.

Bio.SeqIO Documentation

_method_squizz(*args, **kwargs)[source]

Convert STOCKHOLM interleaved file in PHYLIP interleaved format using squizz tool.

Convert TSV format to CSV format

class TSV2CSV(infile, outfile)[source]

Convert TSV file into CSV file

Available methods: Python, Pandas

Methods available are based on python or Pandas [PANDAS].

See also

CSV2TSV

Constructor

Parameters:
  • infile (str) -- tabulated file

  • outfile (str) -- comma-separated file

_default_method = 'python'

Default value

_method_pandas(in_sep='\t', out_sep=',', line_terminator='\n', *args, **kwargs)[source]

Do the conversion TSV -> CSV using Pandas library

pandas documentation

_method_python(in_sep='\t', out_sep=',', line_terminator='\n', *args, **kwargs)[source]

Do the conversion TSV -> CSV using csv module.

csv documentation

_method_python_v2(in_sep='\t', out_sep=',', line_terminator='\n', *args, **kwargs)[source]

Do the conversion TSV -> CSV using csv module

Note

Note that this method cannot escape nor quote output char

csv documentation

Conversion from TWOBIT to FASTA format

class TWOBIT2FASTA(infile, outfile=None, alphabet=None, *args, **kwargs)[source]

Converts a sequence alignment in TWOBIT format to FASTA format

Conversion is based on UCSC [UCSC] and py2bit.

constructor

Parameters:
_default_method = 'py2bit'

Default value

_method_py2bit(*args, **kwargs)[source]

This method uses the py2bit python extension.

py2bit documentation

_method_ucsc(*args, **kwargs)[source]

Convert twobit file in fasta format using ucsc twobittofa.

uscsc faToTwoBit Documentation

Convert VCF to BCF format

class VCF2BCF(infile, outfile=None, *args, **kwargs)[source]

Convert VCF file to BCF format

Method based on bcftools [BCFTOOLS].

Parameters:
  • infile (str) -- The path to the input FASTA file.

  • outfile (str) -- The path to the output file.

_default_method = 'bcftools'

Default value

_method_bcftools(*args, **kwargs)[source]

For this method, we use the BCFtools tool

BCFtools documentation

command used:

bcftools view -Sb
Parameters:
  • args --

  • kwargs --

Returns:

Convert VCF to BED3 file

class VCF2BED(infile, outfile)[source]

Convert VCF file to BED3 file by extracting positions.

The awk method implemented here below reports an interval of 1 for SNP, the length of the insertion or the length of the deleted part in case of deletion.

constructor

Parameters:
  • infile (str) -- the path of the input file.

  • outfile (str) -- the path of The output file

_default_method = 'awk'

Default value

_method_awk(*args, **kwargs)[source]

do the conversion VCF -> BED using awk

awk documentation

Returns:

the standard output

Return type:

io.StringIO object.

Convert VCF format to WIGGLE format

class VCF2WIGGLE(infile, outfile)[source]

Convert sorted VCF file into WIGGLE file

Parameters:
  • infile (str) -- The path to the input VCF file. It must be sorted.

  • outfile (str) -- The path to the output file

_default_method = 'wiggletools'

Default value

_method_wiggletools(*args, **kwargs)[source]

Conversion using wiggletools

wiggletools documentation

Convert XLS format to CSV format

class XLS2CSV(infile, outfile)[source]

Convert XLS file into CSV file

Extra arguments when using Bioconvert executable.

name

Description

--sheet-name

The name or id of the sheet to convert

--out-sep

The separator used in the output file

--line-terminator

The line terminator used in the output file

Methods available are based on pandas [PANDAS] and pyexcel [PYEXCEL].

Constructor

Parameters:
  • infile (str) --

  • outfile (str) --

_method_pandas(out_sep=',', line_terminator='\n', sheet_name=0, *args, **kwargs)[source]

Do the conversion XLSX -> CSV using Pandas library.

pandas documentation

_method_pyexcel(out_sep=',', line_terminator='\n', sheet_name=0, *args, **kwargs)[source]

Do the conversion XLS -> CSV using pyexcel library

pyexcel documentation

Convert XLS format to CSV format

class XLSX2CSV(infile, outfile)[source]

Convert XLSX file into CSV file

Extra arguments when using Bioconvert executable.

name

Description

--sheet-name

The name or id of the sheet to convert

--out-sep

The separator used in the output file

--line-terminator

The line terminator used in the output file

Methods available are based on pandas [PANDAS] and pyexcel [PYEXCEL].

Constructor

Parameters:
  • infile (str) --

  • outfile (str) --

_default_method = 'pandas'

Default value

_method_pandas(out_sep=',', line_terminator='\n', sheet_name=0, *args, **kwargs)[source]

Do the conversion XLSX -> CSV using Pandas library.

pandas documentation

_method_pyexcel(out_sep=',', line_terminator='\n', sheet_name=0, *args, **kwargs)[source]

Do the conversion XLSX -> CSV using pyexcel library

pyexcel documentation

Convert XMFA to PHYLIP format

class XMFA2PHYLIP(infile, outfile=None, *args, **kwargs)[source]

Converts a sequence alignment from XMFA to PHYLIP format.

Method available based on biopython [BIOPYTHON].

constructor

Parameters:
_default_method = 'biopython'

Default value

_method_biopython(*args, **kwargs)[source]

Convert XMFA interleaved file in PHYLIP (Mauve)format.

Bio.SeqIO Documentation

Convert YAML to JSON format

class YAML2JSON(infile, outfile, *args, **kargs)[source]

Convert YAML file into JSON file

Conversion is based on yaml and json standard Python modules

Note

YAML comments will be lost in JSON output

Reference:

http://yaml.org/spec/1.2/spec.html#id2759572

constructor

Parameters:
  • infile (str) -- input YAML file.

  • outfile (str) -- input JSON file

_default_method = 'python'

Default value

_method_python(*args, **kwargs)[source]

Internal method

get_json()[source]

Return the JSON dictionary corresponding to the YAML input.