7.1. Core functions

bioconvert.core.base

Main factory of Bioconvert

bioconvert.core.benchmark

Tools for benchmarking

bioconvert.core.converter

bioconvert.core.decorators

Provides a general tool to perform pre/post compression

bioconvert.core.downloader

Download singularity image

bioconvert.core.extensions

List of formats and associated extensions

bioconvert.core.graph

Network tools to manipulate the graph of conversion

bioconvert.core.registry

Main bioconvert registry that fetches automatically the relevant converter

bioconvert.core.shell

Simplified version of shell.py module from snakemake package

bioconvert.core.utils

Miscellaneous utility functions

7.1.1. Base

Main factory of Bioconvert

class bioconvert.core.base.ConvArg(names, help, **kwargs)[source]

This class can be used to add specific extra arguments to any converter

For instance, imagine a conversion named A2B that requires the user to provide a reference. You may then want to provide the --reference extra argument. This is possible by adding a class method named get_additional_arguments that yields an instance of this class for each extra argument.

@classmethod
def get_additional_arguments(cls):
    yield ConvArg(
        names="--reference",
        default=None,
        help="the reference"
    )

Then, when calling bioconvert as follows:

bioconvert A2B --help

the new argument will be shown in the list of arguments.
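The mechanism can be tried without Bioconvert installed. The following is a minimal stdlib-only sketch, not Bioconvert's implementation: ExtraArg is a hypothetical stand-in for ConvArg, A2B is the imaginary converter from the text, and the sketch assumes that the stored keyword arguments are forwarded unchanged to argparse.

```python
import argparse


class ExtraArg:
    """Hypothetical stand-in for ConvArg: stores names, help and extra options."""

    def __init__(self, names, help, **kwargs):
        # Accept a single flag or a list of flags.
        self.names = [names] if isinstance(names, str) else list(names)
        self.help = help
        self.kwargs = kwargs

    def add_to_parser(self, parser):
        # Assumption: the stored options are passed straight to argparse.
        parser.add_argument(*self.names, help=self.help, **self.kwargs)


class A2B:
    """Imaginary converter that needs a reference file."""

    @classmethod
    def get_additional_arguments(cls):
        yield ExtraArg(names="--reference", default=None, help="the reference")


parser = argparse.ArgumentParser(prog="A2B")
for extra in A2B.get_additional_arguments():
    extra.add_to_parser(parser)

args = parser.parse_args(["--reference", "ref.fasta"])
print(args.reference)
```

Because the class method is a generator, a converter can declare any number of extra arguments by yielding one object per argument.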

Methods

add_to_sub_parser

file

class bioconvert.core.base.ConvBase(infile, outfile)[source]

Base class for all converters.

To build a new converter, create a new class that inherits from ConvBase and implement a method that performs the conversion. The name of the converter method must start with _method_.

For instance:

class FASTQ2FASTA(ConvBase):

    def _method_python(self, *args, **kwargs):
        # include your code here. You can use the infile and outfile
        # attributes.
        self.infile
        self.outfile

Attributes:
default
input_ext
name

The name of the class

output_ext

Methods

__call__(*args[, method_name])

boxplot_benchmark([rot_xticks, ...])

This function plots the benchmark computed in compute_benchmark()

compute_benchmark([N, to_exclude, to_include])

Simple wrapper to call Benchmark

get_IO_arguments()

install_tool(executable)

Install the given tool, using the script: bioconvert/install_script/install_executable.sh if the executable is not already present

add_argument_to_parser

execute

get_additional_arguments

get_common_arguments

get_common_arguments_for_converter

get_description

shell

constructor

Parameters:
  • infile (str) – the path of the input file.

  • outfile (str) – the path of the output file.
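The _method_ naming convention lends itself to automatic discovery of the available conversion methods. The sketch below shows one way this could work; it is an illustration under assumptions, not Bioconvert's code. ConverterSketch, available_methods and the two toy methods are hypothetical names, and the methods only return a string instead of converting files.

```python
class ConverterSketch:
    """Hypothetical base class, not Bioconvert's ConvBase."""

    prefix = "_method_"

    def __init__(self, infile, outfile):
        self.infile = infile
        self.outfile = outfile

    @classmethod
    def available_methods(cls):
        # Keep the suffix of every callable whose name starts with the prefix.
        return sorted(
            name[len(cls.prefix):]
            for name in dir(cls)
            if name.startswith(cls.prefix) and callable(getattr(cls, name))
        )

    def __call__(self, method_name):
        # Dispatch to the conversion method selected by name.
        return getattr(self, self.prefix + method_name)()


class FASTQ2FASTA(ConverterSketch):
    def _method_python(self):
        return f"python: {self.infile} -> {self.outfile}"

    def _method_awk(self):
        return f"awk: {self.infile} -> {self.outfile}"


converter = FASTQ2FASTA("test.fastq", "test.fasta")
print(FASTQ2FASTA.available_methods())
print(converter("python"))
```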

boxplot_benchmark(rot_xticks=90, boxplot_args={}, mode='time')[source]

This function plots the benchmark computed in compute_benchmark()

compute_benchmark(N=5, to_exclude=[], to_include=[])[source]

Simple wrapper to call Benchmark

This function computes the benchmark; see Benchmark for details.

install_tool(executable)[source]

Install the given tool, using the script: bioconvert/install_script/install_executable.sh if the executable is not already present

Parameters:

executable – executable to install

Returns:

nothing

property name

The name of the class

class bioconvert.core.base.ConvMeta(name, bases, namespace, /, **kwargs)[source]

This metaclass checks that the converter classes have

  • an attribute input_ext

  • an attribute output_ext

This is a meta class used by ConvBase class. For developers only.
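For readers unfamiliar with metaclasses, the kind of check described above can be sketched as follows. This is a simplified illustration, not the actual ConvMeta: CheckExtMeta, Base and Broken are hypothetical names, and the real metaclass may perform additional work.

```python
class CheckExtMeta(type):
    """Hypothetical metaclass: rejects classes lacking the two attributes."""

    required = ("input_ext", "output_ext")

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Skip the root class itself, which declares no extensions.
        if not bases:
            return
        for attr in cls.required:
            if not hasattr(cls, attr):
                raise TypeError(f"{name} must define {attr}")


class Base(metaclass=CheckExtMeta):
    pass


class FASTQ2FASTA(Base):
    input_ext = ["fastq", "fq"]
    output_ext = ["fasta", "fa"]


try:
    # output_ext is missing, so the class statement itself fails.
    class Broken(Base):
        input_ext = ["fastq"]
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

The check runs when the class is defined, so an incomplete converter is reported at import time rather than at conversion time.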

Methods

__call__(*args, **kwargs)

Call self as a function.

mro(/)

Return a type's method resolution order.

register(subclass)

Register a virtual subclass of an ABC.

lower_tuple

split_converter_to_format

bioconvert.core.base.make_chain(converter_map)[source]

Create a class performing step-by-step conversions following a path. converter_map is a list of pairs ((in_fmt, out_fmt), converter). It describes the conversion path.
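The idea of following a conversion path step by step can be sketched as below. This is an assumption-laden illustration rather than the library's make_chain: it returns a closure instead of a class, and each converter is assumed to be a plain function acting on in-memory data, not on files. The format names in the example are made up.

```python
def make_chain_sketch(converter_map):
    """Return a callable that applies each converter of the path in order.

    converter_map is a list of pairs ((in_fmt, out_fmt), converter).
    """
    # Check that consecutive steps connect: one output feeds the next input.
    for ((_, out_fmt), _), ((in_fmt, _), _) in zip(converter_map, converter_map[1:]):
        if out_fmt != in_fmt:
            raise ValueError(f"broken path: {out_fmt} does not feed {in_fmt}")

    def chain(data):
        for _, converter in converter_map:
            data = converter(data)
        return data

    return chain


path = [
    (("lower", "upper"), str.upper),
    (("upper", "reversed"), lambda text: text[::-1]),
]
convert = make_chain_sketch(path)
print(convert("acgt"))
```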

7.1.2. Benchmark

Tools for benchmarking

class bioconvert.core.benchmark.Benchmark(obj, N=5, to_exclude=None, to_include=None)[source]

Convenient class to benchmark several methods for a given converter

c = BAM2COV(infile, outfile)
b = Benchmark(c, N=5)
b.run_methods()
b.plot()

Methods

plot([rerun, ylabel, rot_xticks, ...])

Plots the benchmark results, running the benchmarks if needed or if rerun is True.

run_methods()

Runs the benchmarks, and stores the timings in self.results.

monitor_usage

Constructor

Parameters:
  • obj – can be an instance of a converter class or a class name

  • N (int) – number of replicates

  • to_exclude (list) – methods to exclude from the benchmark

  • to_include (list) – the only methods to include in the benchmark

Use one of to_exclude or to_include. If both are provided, only the to_include one is used.
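The selection rule and the replicate loop can be sketched with the standard library alone. The code below is a hypothetical illustration, not the Benchmark class: select_methods and run_benchmark are invented names, and the timed callables are toy functions rather than converter methods.

```python
import time


def select_methods(available, to_exclude=None, to_include=None):
    """Apply the rule stated above: to_include wins if both are given."""
    if to_include:
        return [m for m in available if m in to_include]
    if to_exclude:
        return [m for m in available if m not in to_exclude]
    return list(available)


def run_benchmark(methods, N=5):
    """Time each callable N times and return {name: [seconds, ...]}."""
    results = {}
    for name, func in methods.items():
        timings = []
        for _ in range(N):
            start = time.perf_counter()
            func()
            timings.append(time.perf_counter() - start)
        results[name] = timings
    return results


methods = {
    "sum": lambda: sum(range(1000)),
    "sorted": lambda: sorted(range(1000)),
}
# Both filters are given, so only to_include is honoured.
chosen = select_methods(methods, to_exclude=["sum"], to_include=["sum"])
results = run_benchmark({name: methods[name] for name in chosen}, N=3)
print(chosen, len(results["sum"]))
```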

plot(rerun=False, ylabel=None, rot_xticks=0, boxplot_args={}, mode='time')[source]

Plots the benchmark results, running the benchmarks if needed or if rerun is True.

Parameters:
  • rot_xticks – rotation of the x-axis tick labels

  • boxplot_args – dictionary with any of the pylab.boxplot arguments

  • mode – either time, CPU or memory

Returns:

dataframe with all results

run_methods()[source]

Runs the benchmarks, and stores the timings in self.results.

bioconvert.core.benchmark.plot_multi_benchmark_max(path_json, output_filename='multi_benchmark.png', min_ylim=0, mode=None)[source]

Plotting function for the Snakefile_benchmark to be found in the doc

The json file looks like:

{
  "awk":{
    "0":0.777020216,
    "1":0.9638044834,
    "2":1.7623617649,
    "3":0.8348755836
  },
  "seqtk":{
    "0":1.0024843216,
    "1":0.6313509941,
    "2":1.4048073292,
    "3":1.0554351807
  },
  "Benchmark":{
    "0":1,
    "1":1,
    "2":2,
    "3":2
  }
}

The number of benchmarks is inferred from the 'Benchmark' field.
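To make the layout of this file concrete, the sketch below loads the example and groups the timings by benchmark. It rests on two assumptions that the text does not confirm: that the number of benchmarks is the number of distinct values in the 'Benchmark' field, and that the "max" in the function name refers to the largest timing per method and benchmark. No plotting is done.

```python
import json
from collections import defaultdict

payload = """
{
  "awk": {"0": 0.777020216, "1": 0.9638044834, "2": 1.7623617649, "3": 0.8348755836},
  "seqtk": {"0": 1.0024843216, "1": 0.6313509941, "2": 1.4048073292, "3": 1.0554351807},
  "Benchmark": {"0": 1, "1": 1, "2": 2, "3": 2}
}
"""

data = json.loads(payload)
labels = data.pop("Benchmark")            # row index -> benchmark id
n_benchmarks = len(set(labels.values()))  # assumption: count distinct ids

# For each method and benchmark id, keep the largest timing (assumption).
max_per_benchmark = defaultdict(dict)
for method, timings in data.items():
    for row, seconds in timings.items():
        bench = labels[row]
        previous = max_per_benchmark[method].get(bench, float("-inf"))
        max_per_benchmark[method][bench] = max(previous, seconds)

print(n_benchmarks)
print(dict(max_per_benchmark))
```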

7.1.3. Converter

Standalone application dedicated to conversion

class bioconvert.core.converter.Bioconvert(infile, outfile, force=False, threads=None, extra=None)[source]

Universal converter used by the standalone application

from bioconvert import Bioconvert
c = Bioconvert("test.fastq", "test.fasta", threads=4, force=True)

Methods

__call__(*args, **kwargs)

Call self as a function.

constructor

Parameters:
  • infile (str) – The path of the input file.

  • outfile (str) – The path of the output file.

  • force (bool) – overwrite the output file if it already exists; otherwise an error is raised

7.1.4. Decorators

Provides a general tool to perform pre/post compression

bioconvert.core.decorators.compressor(func)[source]

Decompress/compress input file without pipes

No pipe is used: the input file is decompressed, then compressed back afterwards. The advantage is that this should work for any file, even very large ones.

This decorator should be used by methods that use pure Python code
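The general pattern of wrapping a pure Python conversion so that it never sees compressed data can be sketched as follows. This is not the library's compressor decorator: gz_roundtrip is a hypothetical name, only gzip is handled, and the sketch works on temporary copies, whereas the description above says the input file itself is decompressed and compressed back.

```python
import functools
import gzip
import os
import shutil
import tempfile


def gz_roundtrip(func):
    """Hypothetical decorator: func(infile, outfile) sees only plain files."""

    @functools.wraps(func)
    def wrapper(infile, outfile):
        if not infile.endswith(".gz"):
            return func(infile, outfile)
        with tempfile.TemporaryDirectory() as tmpdir:
            # Decompress the input to a temporary plain file (no pipes).
            plain_in = os.path.join(tmpdir, "input")
            with gzip.open(infile, "rb") as src, open(plain_in, "wb") as dst:
                shutil.copyfileobj(src, dst)
            if not outfile.endswith(".gz"):
                return func(plain_in, outfile)
            # Write a plain output first, then compress it.
            plain_out = os.path.join(tmpdir, "output")
            result = func(plain_in, plain_out)
            with open(plain_out, "rb") as src, gzip.open(outfile, "wb") as dst:
                shutil.copyfileobj(src, dst)
            return result

    return wrapper


@gz_roundtrip
def upper_case(infile, outfile):
    # Pure Python conversion that knows nothing about compression.
    with open(infile) as fin, open(outfile, "w") as fout:
        fout.write(fin.read().upper())
```

Streaming through copyfileobj keeps memory use bounded, which is in the spirit of the remark above about very large files.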

bioconvert.core.decorators.in_gz(func)[source]

Marks a function as accepting gzipped input.

bioconvert.core.decorators.make_in_gz_tester(converter)[source]

Generates a function testing whether a conversion method of converter has the in_gz tag.

bioconvert.core.decorators.out_compressor(func)[source]

Compress output file without pipes

This decorator should be used by methods that use pure Python code

bioconvert.core.decorators.requires(external_binary=None, python_library=None, external_binaries=None, python_libraries=None)[source]
Parameters:
  • external_binary – a system binary required for the method

  • python_library – a python library required for the method

  • external_binaries – an array of system binaries required for the method

  • python_libraries – an array of python libraries required for the method

Returns:

bioconvert.core.decorators.requires_nothing(func)[source]

Marks a function as not needing dependencies.
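How a dependency check of this kind might be written is sketched below, using only the standard library. It is an illustration under assumptions, not the behaviour of requires or requires_nothing: requires_sketch and the missing_dependencies attribute are hypothetical, and the binary and library names in the example are deliberately fictitious.

```python
import importlib.util
import shutil


def requires_sketch(external_binaries=(), python_libraries=()):
    """Hypothetical decorator attaching a dependency check to a function."""

    def decorator(func):
        def missing_dependencies():
            # A binary is missing if it cannot be found on PATH.
            missing = [b for b in external_binaries if shutil.which(b) is None]
            # A library is missing if no import spec can be located.
            missing += [
                lib for lib in python_libraries
                if importlib.util.find_spec(lib) is None
            ]
            return missing

        func.missing_dependencies = missing_dependencies
        return func

    return decorator


@requires_sketch(python_libraries=["json"])
def method_ok():
    return "converted"


@requires_sketch(
    external_binaries=["no-such-binary-xyz"],
    python_libraries=["no_such_library_xyz"],
)
def method_unavailable():
    return "converted"


print(method_ok.missing_dependencies())
print(method_unavailable.missing_dependencies())
```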

7.1.5. Downloader

Download singularity image

7.1.6. Extensions

List of formats and associated extensions

class bioconvert.core.extensions.AttrDict(**kwargs)[source]

Copy from easydev package.

Methods

clear()

copy()

fromkeys(iterable[, value])

Create a new dictionary with keys from iterable and values set to value.

get(key[, default])

Return the value for key if key is in the dictionary, else default.

items()

keys()

pop(key[, default])

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem(/)

Remove and return a (key, value) pair as a 2-tuple.

setdefault(key[, default])

Insert key with a value of default if key is not in the dictionary.

update(content)

See class/constructor documentation for details

values()

update(content)[source]

See class/constructor documentation for details

Parameters:

content (dict) – a valid dictionary

bioconvert.core.extensions.extensions = {'abi': ['abi', 'ab1'], 'agp': ['agp'], 'bam': ['bam'], 'bcf': ['bcf'], 'bed': ['bed'], 'bedgraph': ['bedgraph', 'bg'], 'bigbed': ['bb', 'bigbed'], 'bigwig': ['bigwig', 'bw'], 'bplink': ['bplink'], 'bz2': ['bz2'], 'cdao': ['cdao'], 'clustal': ['clustal', 'aln', 'clw'], 'cov': ['cov'], 'cram': ['cram'], 'csv': ['csv'], 'dsrc': ['dsrc'], 'embl': ['embl'], 'ena': ['ena'], 'faa': ['faa', 'mpfa', 'aa'], 'fast5': ['fast5'], 'fasta': ['fasta', 'fa', 'fst'], 'fastq': ['fastq', 'fq'], 'genbank': ['genbank', 'gbk', 'gb', 'gbff'], 'gexf': ['gexf'], 'gfa': ['gfa'], 'gff2': ['gff'], 'gff3': ['gff3'], 'gml': ['gml'], 'graphml': ['graphml'], 'gtf': ['gtf'], 'gz': ['gz'], 'jaspar': ['jaspar'], 'json': ['json'], 'maf': ['maf'], 'mol2': ['mol2'], 'newick': ['newick', 'nw', 'nhx', 'nwk'], 'nexus': ['nexus', 'nx', 'nex', 'nxs'], 'ods': ['ods'], 'paf': ['paf'], 'pajek': ['net'], 'pdb': ['pdb'], 'phylip': ['phy', 'ph', 'phylip'], 'phyloxml': ['phyloxml', 'xml'], 'plink': ['plink'], 'pod5': ['pod5'], 'qual': ['qual'], 'sam': ['sam'], 'scf': ['scf'], 'sdf': ['sdf'], 'smiles': ['smi', 'smiles'], 'sra': ['sra'], 'stockholm': ['sto', 'sth', 'stk', 'stockholm'], 'transfac': ['transfac', 'tf'], 'tsv': ['tsv'], 'twobit': ['2bit'], 'vcf': ['vcf'], 'wig': ['wig'], 'wiggle': ['wig', 'wiggle'], 'xls': ['xls'], 'xlsx': ['xlsx'], 'xmfa': ['xmfa'], 'yaml': ['yaml', 'YAML']}

List of formats and their extensions included in Bioconvert
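A common use of such a mapping is the reverse lookup from a file extension to the formats that use it. The sketch below works on a short excerpt copied from the dictionary above; formats_by_ext is a hypothetical name and the inversion is not claimed to be how Bioconvert does it.

```python
# Excerpt of the mapping above (format -> list of extensions).
extensions = {
    "fasta": ["fasta", "fa", "fst"],
    "fastq": ["fastq", "fq"],
    "gff2": ["gff"],
    "wig": ["wig"],
    "wiggle": ["wig", "wiggle"],
}

# Invert it: one extension may belong to several formats (see "wig").
formats_by_ext = {}
for fmt, exts in extensions.items():
    for ext in exts:
        formats_by_ext.setdefault(ext.lower(), []).append(fmt)

print(formats_by_ext["fq"])
print(formats_by_ext["wig"])
```

Storing a list per extension matters because the mapping is not one-to-one: wig is listed under both the wig and wiggle formats.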

7.1.7. Graph

Network tools to manipulate the graph of conversion

bioconvert.core.graph.create_graph(filename, layout='dot', use_singularity=False, color_for_disabled_converter='red', include_subgraph=False)[source]
Parameters:

filename – should end in .png or .svg or .dot

If the extension is .dot, only the dot file is created, without annotations. This is useful if you have issues installing graphviz. In that case, under Linux, you could use our singularity container; see github.com/cokelaer/graphviz4all

bioconvert.core.graph.create_graph_for_cytoscape(all_converter=False)[source]
Parameters:

all_converter – use all converters or only the ones available in the current installation

Returns:

7.1.8. Registry

Main bioconvert registry that fetches automatically the relevant converter

class bioconvert.core.registry.Registry[source]

Class to centralise information about available conversions

from bioconvert.core.registry import Registry
r = Registry()
r.conversion_exists("BAM", "BED")
r.info()  # returns number of available methods for each converter

conv_class = r[(".bam", ".bed")]
converter = conv_class(input_file, output_file)
converter.convert()

Methods

conversion_exists(input_fmt, output_fmt[, ...])

conversion_path(input_fmt, output_fmt)

Return the list of conversion steps needed to get from the input format to the output format

get_all_conversions()

get_conversions()

get_conversions_from_ext()

get_converters_names()

get_ext(ext_pair)

Copy the registry into a dict of lists, so that a single key can hold multiple values, and return for a given key all converters able to convert from the input extension to the output extension.

iter_converters([allow_indirect])

set_ext(ext_pair, convertor)

Register a new converter for a pair of input and output extensions in a list.

get_info

info

conversion_exists(input_fmt, output_fmt, allow_indirect=False)[source]
Parameters:
  • input_fmt (str) – the input format

  • output_fmt (str) – the output format

  • allow_indirect (boolean) – whether to count indirect conversions

Returns:

True if a converter which transform input_fmt into output_fmt exists

Return type:

boolean

conversion_path(input_fmt, output_fmt)[source]

Return the list of conversion steps needed to get from the input format to the output format.

Each step in the list is a pair of formats.
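One way to obtain such a list of steps is a breadth-first search over the graph of direct conversions. The sketch below is a generic illustration, not the registry's algorithm: conversion_path_sketch is a hypothetical name and the set of direct conversions is made up for the example.

```python
from collections import deque


def conversion_path_sketch(conversions, input_fmt, output_fmt):
    """Return the shortest list of (in_fmt, out_fmt) steps, or None."""
    # Build an adjacency list from the direct conversions.
    graph = {}
    for src, dst in conversions:
        graph.setdefault(src, []).append(dst)

    queue = deque([(input_fmt, [])])
    seen = {input_fmt}
    while queue:
        fmt, path = queue.popleft()
        if fmt == output_fmt:
            return path
        for nxt in graph.get(fmt, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(fmt, nxt)]))
    return None


# Made-up direct conversions, for illustration only.
direct = [("BAM", "SAM"), ("SAM", "FASTQ"), ("FASTQ", "FASTA"), ("BAM", "BED")]
steps = conversion_path_sketch(direct, "BAM", "FASTA")
print(steps)
```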

get_all_conversions()[source]
Returns:

a generator that iterates over all conversions and their availability; each conversion is encoded as a tuple of two strings (input format, output format)

Return type:

generator (input format, output format, status)

get_conversions()[source]
Returns:

a generator that iterates over all available conversions; each conversion is encoded as a tuple of two strings (input format, output format)

Return type:

generator

get_conversions_from_ext()[source]
Returns:

a generator that iterates over all available conversions; each conversion is encoded as a tuple of two strings (input extension, output extension)

Return type:

generator

get_converters_names()[source]
Returns:

a generator that yields the names of the converters, obtained from the subclasses (ConvBase objects)

Return type:

generator

get_ext(ext_pair)[source]

Copy the registry into a dict of lists, so that a single key can hold multiple values, and return for a given key all converters able to convert from the input extension to the output extension.

Parameters:

ext_pair (tuple of 2 strings) – the input extension, the output extension

Returns:

list of objects of subclass of ConvBase

iter_converters(allow_indirect: bool = False)[source]
Parameters:

allow_indirect (bool) – also return indirect conversion

Returns:

a generator to iterate over (in_fmt, out_fmt, converter class when direct, path when indirect)

Return type:

a generator

set_ext(ext_pair, convertor)[source]

Register a new converter for a pair of input and output extensions. Converters are stored in a list, so several converters can be registered for one ext_pair.

Parameters:
  • ext_pair (tuple) – tuple containing the input extensions and the output extensions, e.g. (("fastq",), ("fasta",))

  • convertor (list of ConvBase object) – the converter that handles the conversion from input_ext to output_ext
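The interplay between set_ext and get_ext can be pictured with a dictionary whose values are lists. The sketch below is a simplified, hypothetical model and not the Registry class: ExtRegistrySketch is an invented name, and plain strings stand in for converter classes.

```python
from collections import defaultdict


class ExtRegistrySketch:
    """Hypothetical registry keyed by (input extensions, output extensions)."""

    def __init__(self):
        self._ext_registry = defaultdict(list)

    def set_ext(self, ext_pair, convertor):
        # Several converters may be registered for the same pair.
        self._ext_registry[ext_pair].append(convertor)

    def get_ext(self, ext_pair):
        # Match one (in_ext, out_ext) against every registered pair.
        in_ext, out_ext = ext_pair
        found = []
        for (in_exts, out_exts), convertors in self._ext_registry.items():
            if in_ext in in_exts and out_ext in out_exts:
                found.extend(convertors)
        return found


registry = ExtRegistrySketch()
pair = (("fastq", "fq"), ("fasta", "fa"))
registry.set_ext(pair, "FASTQ2FASTA")       # placeholder for a class
registry.set_ext(pair, "FASTQ2FASTA_alt")   # second converter, same pair
print(registry.get_ext(("fq", "fasta")))
```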

7.1.9. Utils

Miscellaneous utility functions

class bioconvert.core.utils.TempFile(suffix='', dir=None)[source]

A small wrapper around the tempfile.NamedTemporaryFile function

f = TempFile(suffix="csv")
f.name
f.delete() # alias to delete=False and close() calls

Copy from easydev package
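A wrapper with this behaviour can be approximated in a few lines. The sketch below is an assumption about how such a class might be built on tempfile.NamedTemporaryFile with delete=False; TempFileSketch is a hypothetical name, and the context-manager support is an addition of the sketch, not something the text above states.

```python
import os
import tempfile


class TempFileSketch:
    """Hypothetical wrapper; the file persists until delete() is called."""

    def __init__(self, suffix="", dir=None):
        # delete=False keeps the file on disk after close().
        self._handle = tempfile.NamedTemporaryFile(
            suffix=suffix, dir=dir, delete=False
        )

    @property
    def name(self):
        return self._handle.name

    def delete(self):
        # Close the handle, then remove the file explicitly.
        self._handle.close()
        if os.path.exists(self.name):
            os.remove(self.name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.delete()


with TempFileSketch(suffix=".csv") as f:
    path = f.name
    existed_inside = os.path.exists(path)
print(existed_inside, os.path.exists(path))
```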

Attributes:
name

Methods

delete

class bioconvert.core.utils.Timer(times)[source]

Timer that works with the with statement

Copy from easydev package.
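A timer of this kind is typically a context manager. The sketch below assumes that the times argument is a list to which the elapsed time in seconds is appended when the block exits; this is an assumption about the interface, and TimerSketch is a hypothetical name.

```python
import time


class TimerSketch:
    """Hypothetical context manager appending elapsed seconds to a list."""

    def __init__(self, times):
        self.times = times

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.times.append(time.perf_counter() - self._start)
        return False  # do not swallow exceptions


times = []
with TimerSketch(times):
    sum(range(10000))
with TimerSketch(times):
    sorted(range(10000))
print(len(times))
```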

bioconvert.core.utils.generate_outfile_name(infile, out_extension)[source]

Simple utility to replace the file extension with the given one.

Parameters:
  • infile (str) – the path to the input file

  • out_extension (str) – Desired extension

Returns:

The file path with the given extension

Return type:

str
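The behaviour described can be reproduced with os.path.splitext. The function below is a hypothetical equivalent written for illustration, not the library's code; in particular, accepting an extension with a leading dot is a choice of the sketch.

```python
import os


def replace_extension(infile, out_extension):
    """Swap the last extension of infile for out_extension."""
    root, _ = os.path.splitext(infile)
    # Tolerate an extension given with or without its leading dot.
    return f"{root}.{out_extension.lstrip('.')}"


print(replace_extension("data/test.fastq", "fasta"))
```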

bioconvert.core.utils.get_extension(filename, remove_compression=False)[source]

Return extension of a filename

>>> get_extension("test.fastq")
fastq
>>> get_extension("test.fastq.gz")
fastq

bioconvert.core.utils.get_format_from_extension(extension)[source]

Get the format from an extension.

Parameters:

extension – the extension

Returns:

the corresponding format

Return type:

str

bioconvert.core.utils.md5(fname, chunk=65536)[source]

Return the MD5 checksum of a file
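The chunk parameter suggests that the file is read in blocks rather than loaded whole. A minimal sketch of that technique with hashlib follows; md5_sketch is a hypothetical name, and returning the hexadecimal digest is an assumption about the return value.

```python
import hashlib


def md5_sketch(fname, chunk=65536):
    """Return the hexadecimal MD5 digest of a file, read chunk by chunk."""
    digest = hashlib.md5()
    with open(fname, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for block in iter(lambda: handle.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()
```

Reading in fixed-size blocks keeps memory use constant regardless of file size, and the digest does not depend on the chunk size chosen.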