7.1. Core functions
Main factory of Bioconvert
Tools for benchmarking
Provides a general tool to perform pre/post compression
Download singularity image
List of formats and associated extensions
Network tools to manipulate the graph of conversion
Main bioconvert registry that fetches automatically the relevant converter
Simplified version of shell.py module from snakemake package
misc utility functions
7.1.1. Base
Main factory of Bioconvert
- class bioconvert.core.base.ConvArg(names, help, **kwargs)[source]
This class can be used to add specific extra arguments to any converter
For instance, imagine a conversion named A2B that requires the user to provide a reference. You may then want to provide the --reference extra argument. This is possible by adding a class method named get_additional_arguments that yields an instance of this class for each extra argument.
    @classmethod
    def get_additional_arguments(cls):
        yield ConvArg(
            names="--reference",
            default=None,
            help="the reference",
        )
Then, when calling bioconvert as follows:
bioconvert A2B --help
the new argument will be shown in the list of arguments.
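The mechanism described above can be sketched with argparse alone. This is only an illustration under stated assumptions: ExtraArg and its add_to_parser method are hypothetical stand-ins, not the real ConvArg, and the A2B class below is a bare placeholder rather than a Bioconvert converter.

```python
import argparse


class ExtraArg:
    """Hypothetical stand-in for ConvArg: stores names, help and extra keywords."""

    def __init__(self, names, help, **kwargs):
        # Accept a single option name or a list of names
        self.names = [names] if isinstance(names, str) else list(names)
        self.help = help
        self.kwargs = kwargs

    def add_to_parser(self, parser):
        # Forward the stored settings to argparse
        parser.add_argument(*self.names, help=self.help, **self.kwargs)


class A2B:
    """Hypothetical converter declaring one extra argument."""

    @classmethod
    def get_additional_arguments(cls):
        yield ExtraArg(names="--reference", default=None, help="the reference")


parser = argparse.ArgumentParser(prog="A2B")
for extra in A2B.get_additional_arguments():
    extra.add_to_parser(parser)

args = parser.parse_args(["--reference", "genome.fasta"])
help_text = parser.format_help()
```

Because the converter only yields argument descriptions, the parser construction stays in one place and each converter declares just what is specific to it.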
Methods
add_to_sub_parser
file
- class bioconvert.core.base.ConvBase(infile, outfile)[source]
Base class for all converters.
To build a new converter, create a new class that inherits from ConvBase and implement a method that performs the conversion. The name of the converter method must start with _method_. For instance:
    class FASTQ2FASTA(ConvBase):
        def _method_python(self, *args, **kwargs):
            # include your code here. You can use the infile and outfile
            # attributes.
            self.infile
            self.outfile
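To illustrate why the _method_ prefix matters, here is a minimal sketch of how conversion methods could be discovered and dispatched by name. SketchBase, available_methods and UPPER2LOWER are hypothetical names introduced for this example; the real ConvBase may work differently.

```python
class SketchBase:
    """Hypothetical base class used for illustration; not the real ConvBase."""

    prefix = "_method_"

    def __init__(self, infile, outfile):
        self.infile = infile
        self.outfile = outfile

    @classmethod
    def available_methods(cls):
        # Collect every callable attribute whose name starts with the prefix
        return sorted(
            name[len(cls.prefix):]
            for name in dir(cls)
            if name.startswith(cls.prefix) and callable(getattr(cls, name))
        )

    def __call__(self, method_name):
        # Dispatch to the conversion method selected by its short name
        return getattr(self, self.prefix + method_name)()


class UPPER2LOWER(SketchBase):
    """Hypothetical converter exposing two conversion methods."""

    def _method_python(self):
        return f"python: {self.infile} -> {self.outfile}"

    def _method_awk(self):
        return f"awk: {self.infile} -> {self.outfile}"


converter = UPPER2LOWER("in.txt", "out.txt")
methods = UPPER2LOWER.available_methods()
message = converter("python")
```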
- Attributes:
- default
- input_ext
- name: The name of the class
- output_ext
Methods
__call__(*args[, method_name])
boxplot_benchmark([rot_xticks, ...]): This function plots the benchmark computed in compute_benchmark()
compute_benchmark([N, to_exclude, to_include]): Simple wrapper to call Benchmark
get_IO_arguments()
install_tool(executable): Install the given tool, using the script: bioconvert/install_script/install_executable.sh if the executable is not already present
add_argument_to_parser
execute
get_additional_arguments
get_common_arguments
get_common_arguments_for_converter
get_description
shell
constructor
- Parameters:
- boxplot_benchmark(rot_xticks=90, boxplot_args={}, mode='time')[source]
This function plots the benchmark computed in compute_benchmark().
- compute_benchmark(N=5, to_exclude=[], to_include=[])[source]
Simple wrapper to call Benchmark. This function computes the benchmark; see Benchmark for details.
- install_tool(executable)[source]
Install the given tool, using the script: bioconvert/install_script/install_executable.sh if the executable is not already present
- Parameters:
executable – executable to install
- Returns:
nothing
- property name
The name of the class
- class bioconvert.core.base.ConvMeta(name, bases, namespace, /, **kwargs)[source]
This metaclass checks that the converter classes have
- an attribute input_ext
- an attribute output_ext
This is a metaclass used by the ConvBase class. For developers only.
Methods
__call__(*args, **kwargs): Call self as a function.
mro(/): Return a type's method resolution order.
register(subclass): Register a virtual subclass of an ABC.
lower_tuple
split_converter_to_format
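The attribute check described for ConvMeta can be sketched with a plain Python metaclass. This is an illustration only: CheckedMeta, Base and BROKEN are hypothetical names, and the real metaclass may perform additional work.

```python
class CheckedMeta(type):
    """Hypothetical metaclass: require input_ext and output_ext on subclasses."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if not bases:
            # The root class itself is exempt from the check
            return
        for attr in ("input_ext", "output_ext"):
            if not hasattr(cls, attr):
                raise TypeError(f"{name} must define {attr}")


class Base(metaclass=CheckedMeta):
    """Hypothetical root class for converters."""


class FASTQ2FASTA(Base):
    input_ext = ["fastq", "fq"]
    output_ext = ["fasta", "fa"]


error_message = None
try:
    # output_ext is missing, so the class statement itself fails
    class BROKEN(Base):
        input_ext = ["txt"]
except TypeError as err:
    error_message = str(err)
```

The point of doing the check in a metaclass is that a malformed converter fails at class definition time, before any file is touched.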
7.1.2. Benchmark
Tools for benchmarking
- class bioconvert.core.benchmark.Benchmark(obj, N=5, to_exclude=None, to_include=None)[source]
Convenient class to benchmark several methods for a given converter
    c = BAM2COV(infile, outfile)
    b = Benchmark(c, N=5)
    b.run_methods()
    b.plot()
Methods
plot([rerun, ylabel, rot_xticks, ...]): Plots the benchmark results, running the benchmarks if needed or if rerun is True.
run_methods(): Runs the benchmarks, and stores the timings in self.results.
monitor_usage
Constructor
- Parameters:
Use one of to_exclude or to_include. If both are provided, only the to_include one is used.
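The include/exclude rule and the repeated timing can be sketched as follows. This is a simplified illustration, not the Benchmark implementation: select_methods, time_methods and the toy functions are hypothetical, and only wall-clock time is measured.

```python
import time


def select_methods(methods, to_exclude=None, to_include=None):
    """Apply the rule stated above: to_include wins when both are given."""
    if to_include:
        return [m for m in methods if m in to_include]
    if to_exclude:
        return [m for m in methods if m not in to_exclude]
    return list(methods)


def time_methods(functions, N=5, to_exclude=None, to_include=None):
    """Run each selected function N times and collect the timings."""
    results = {}
    for name in select_methods(functions, to_exclude, to_include):
        timings = []
        for _ in range(N):
            start = time.perf_counter()
            functions[name]()
            timings.append(time.perf_counter() - start)
        results[name] = timings
    return results


# Toy workloads standing in for conversion methods
functions = {
    "join": lambda: "".join(str(i) for i in range(1000)),
    "sum": lambda: sum(range(1000)),
    "sort": lambda: sorted(range(1000), reverse=True),
}

# Both filters are given, so only to_include is taken into account
results = time_methods(functions, N=3, to_exclude=["sort"], to_include=["join"])
```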
- plot(rerun=False, ylabel=None, rot_xticks=0, boxplot_args={}, mode='time')[source]
Plots the benchmark results, running the benchmarks if needed or if rerun is True.
- Parameters:
rot_xticks – rotation of the x-axis tick labels
boxplot_args – dictionary with any of the pylab.boxplot arguments
mode – either time, CPU or memory
- Returns:
dataframe with all results
- bioconvert.core.benchmark.plot_multi_benchmark_max(path_json, output_filename='multi_benchmark.png', min_ylim=0, mode=None)[source]
Plotting function for the Snakefile_benchmark to be found in the doc
The json file looks like:
    {
        "awk": {
            "0": 0.777020216,
            "1": 0.9638044834,
            "2": 1.7623617649,
            "3": 0.8348755836
        },
        "seqtk": {
            "0": 1.0024843216,
            "1": 0.6313509941,
            "2": 1.4048073292,
            "3": 1.0554351807
        },
        "Benchmark": {
            "0": 1,
            "1": 1,
            "2": 2,
            "3": 2
        }
    }
The number of benchmarks is inferred from the ‘Benchmark’ field.
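As an illustration of how such a file could be read, the sketch below counts the distinct values of the ‘Benchmark’ field and groups the timings of each method by benchmark number. It is one possible reading of the structure, not the code used by plot_multi_benchmark_max; the variable names are hypothetical.

```python
import json

text = """
{
    "awk": {"0": 0.777020216, "1": 0.9638044834, "2": 1.7623617649, "3": 0.8348755836},
    "seqtk": {"0": 1.0024843216, "1": 0.6313509941, "2": 1.4048073292, "3": 1.0554351807},
    "Benchmark": {"0": 1, "1": 1, "2": 2, "3": 2}
}
"""

data = json.loads(text)

# The "Benchmark" entry maps each run index to a benchmark number
benchmark_ids = data.pop("Benchmark")
n_benchmarks = len(set(benchmark_ids.values()))

# Group the timings of every method by benchmark number
grouped = {}
for method, timings in data.items():
    for run, value in timings.items():
        grouped.setdefault(method, {}).setdefault(benchmark_ids[run], []).append(value)
```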
7.1.3. Converter
Standalone application dedicated to conversion
- class bioconvert.core.converter.Bioconvert(infile, outfile, force=False, threads=None, extra=None)[source]
Universal converter used by the standalone
    from bioconvert import Bioconvert
    c = Bioconvert("test.fastq", "test.fasta", threads=4, force=True)
Methods
__call__(*args, **kwargs): Call self as a function.
constructor
- Parameters:
7.1.4. Decorators
Provides a general tool to perform pre/post compression
- bioconvert.core.decorators.compressor(func)[source]
Decompress/compress input file without pipes
No pipe is used: the input file is decompressed, then compressed back afterwards. The advantage is that this should work for any file, even very large ones.
This decorator should be used by methods that use pure Python code.
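The pipe-free approach can be sketched with the standard library. The decorator below is a hypothetical illustration (gz_roundtrip is not part of Bioconvert): it assumes gzip input, works on temporary uncompressed copies, and compresses the result, which may differ from what compressor actually does.

```python
import functools
import gzip
import os
import shutil
import tempfile


def gz_roundtrip(func):
    """Hypothetical decorator: gunzip the input, run func, gzip the output."""

    @functools.wraps(func)
    def wrapper(infile, outfile):
        if not infile.endswith(".gz"):
            return func(infile, outfile)
        with tempfile.TemporaryDirectory() as tmpdir:
            plain_in = os.path.join(tmpdir, "input")
            plain_out = os.path.join(tmpdir, "output")
            # Decompress to a regular file; no pipe is involved
            with gzip.open(infile, "rb") as src, open(plain_in, "wb") as dst:
                shutil.copyfileobj(src, dst)
            result = func(plain_in, plain_out)
            # Compress the uncompressed result into the requested output
            with open(plain_out, "rb") as src, gzip.open(outfile, "wb") as dst:
                shutil.copyfileobj(src, dst)
        return result

    return wrapper


@gz_roundtrip
def to_upper(infile, outfile):
    # Pure Python "conversion" used only for the demonstration
    with open(infile) as fin, open(outfile, "w") as fout:
        fout.write(fin.read().upper())


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "in.txt.gz")
target = os.path.join(workdir, "out.txt.gz")
with gzip.open(source, "wt") as handle:
    handle.write("acgt\n")

to_upper(source, target)
with gzip.open(target, "rt") as handle:
    converted = handle.read()
```

Copying through files on disk costs extra I/O, but the data is streamed in chunks and never has to fit in memory, which is the advantage mentioned above.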
- bioconvert.core.decorators.make_in_gz_tester(converter)[source]
Generates a function testing whether a conversion method of converter has the in_gz tag.
- bioconvert.core.decorators.out_compressor(func)[source]
Compress output file without pipes
This decorator should be used by methods that use pure Python code.
- bioconvert.core.decorators.requires(external_binary=None, python_library=None, external_binaries=None, python_libraries=None)[source]
- Parameters:
external_binary – a system binary required for the method
python_library – a python library required for the method
external_binaries – an array of system binaries required for the method
python_libraries – an array of python libraries required for the method
- Returns:
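A requirement check of this kind can be sketched with shutil.which and importlib.util.find_spec. The helper names below (is_available, needs, the available attribute) are hypothetical and only illustrate the idea; they are not the behaviour of requires.

```python
import importlib.util
import shutil


def is_available(external_binaries=(), python_libraries=()):
    """Return True when every binary is on PATH and every library is importable."""
    binaries_ok = all(shutil.which(name) is not None for name in external_binaries)
    libraries_ok = all(
        importlib.util.find_spec(name) is not None for name in python_libraries
    )
    return binaries_ok and libraries_ok


def needs(external_binaries=(), python_libraries=()):
    """Hypothetical decorator that records whether the requirements are met."""

    def decorator(func):
        func.available = is_available(external_binaries, python_libraries)
        return func

    return decorator


@needs(python_libraries=["json"])
def parse_json(text):
    import json

    return json.loads(text)
```

Recording availability on the function, rather than raising at import time, lets a caller skip a method whose dependencies are missing while still using the others.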
7.1.5. Downloader
Download singularity image
7.1.6. Extensions
List of formats and associated extensions
- class bioconvert.core.extensions.AttrDict(**kwargs)[source]
Copy from easydev package.
Methods
clear()
copy()
fromkeys(iterable[, value]): Create a new dictionary with keys from iterable and values set to value.
get(key[, default]): Return the value for key if key is in the dictionary, else default.
items()
keys()
pop(key[, default]): If the key is not found, return the default if given; otherwise, raise a KeyError.
popitem(/): Remove and return a (key, value) pair as a 2-tuple.
setdefault(key[, default]): Insert key with a value of default if key is not in the dictionary.
update(content): See class/constructor documentation for details
values()
- bioconvert.core.extensions.extensions = {'abi': ['abi', 'ab1'], 'agp': ['agp'], 'bam': ['bam'], 'bcf': ['bcf'], 'bed': ['bed'], 'bedgraph': ['bedgraph', 'bg'], 'bigbed': ['bb', 'bigbed'], 'bigwig': ['bigwig', 'bw'], 'bplink': ['bplink'], 'bz2': ['bz2'], 'cdao': ['cdao'], 'clustal': ['clustal', 'aln', 'clw'], 'cov': ['cov'], 'cram': ['cram'], 'csv': ['csv'], 'dsrc': ['dsrc'], 'embl': ['embl'], 'ena': ['ena'], 'faa': ['faa', 'mpfa', 'aa'], 'fast5': ['fast5'], 'fasta': ['fasta', 'fa', 'fst'], 'fastq': ['fastq', 'fq'], 'genbank': ['genbank', 'gbk', 'gb', 'gbff'], 'gexf': ['gexf'], 'gfa': ['gfa'], 'gff2': ['gff'], 'gff3': ['gff3'], 'gml': ['gml'], 'graphml': ['graphml'], 'gtf': ['gtf'], 'gz': ['gz'], 'jaspar': ['jaspar'], 'json': ['json'], 'maf': ['maf'], 'mol2': ['mol2'], 'newick': ['newick', 'nw', 'nhx', 'nwk'], 'nexus': ['nexus', 'nx', 'nex', 'nxs'], 'ods': ['ods'], 'paf': ['paf'], 'pajek': ['net'], 'pdb': ['pdb'], 'phylip': ['phy', 'ph', 'phylip'], 'phyloxml': ['phyloxml', 'xml'], 'plink': ['plink'], 'pod5': ['pod5'], 'qual': ['qual'], 'sam': ['sam'], 'scf': ['scf'], 'sdf': ['sdf'], 'smiles': ['smi', 'smiles'], 'sra': ['sra'], 'stockholm': ['sto', 'sth', 'stk', 'stockholm'], 'transfac': ['transfac', 'tf'], 'tsv': ['tsv'], 'twobit': ['2bit'], 'vcf': ['vcf'], 'wig': ['wig'], 'wiggle': ['wig', 'wiggle'], 'xls': ['xls'], 'xlsx': ['xlsx'], 'xmfa': ['xmfa'], 'yaml': ['yaml', 'YAML']}
List of formats and their extensions included in Bioconvert
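A mapping of this shape is easy to invert when a format must be found from a file extension. The sketch below uses a small subset copied from the dictionary above; formats_by_extension is a hypothetical name. Note that an extension may belong to several formats, as 'wig' does.

```python
from collections import defaultdict

# Subset of the extensions dictionary shown above
extensions = {
    "fasta": ["fasta", "fa", "fst"],
    "fastq": ["fastq", "fq"],
    "wig": ["wig"],
    "wiggle": ["wig", "wiggle"],
}

# Invert the mapping: extension -> list of formats using it
formats_by_extension = defaultdict(list)
for fmt, exts in extensions.items():
    for ext in exts:
        formats_by_extension[ext].append(fmt)
```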
7.1.7. Graph
Network tools to manipulate the graph of conversion
- bioconvert.core.graph.create_graph(filename, layout='dot', use_singularity=False, color_for_disabled_converter='red', include_subgraph=False)[source]
- Parameters:
filename – should end in .png or .svg or .dot
If the extension is .dot, only the dot file is created, without annotations. This is useful if you have issues installing graphviz. In that case, under Linux you could use our singularity container; see github.com/cokelaer/graphviz4all
7.1.8. Registry
Main bioconvert registry that fetches automatically the relevant converter
- class bioconvert.core.registry.Registry[source]
class to centralise information about available conversions
    from bioconvert.core.registry import Registry
    r = Registry()
    r.conversion_exists("BAM", "BED")
    r.info()  # returns number of available methods for each converter
    conv_class = r[(".bam", ".bed")]
    converter = conv_class(input_file, output_file)
    converter.convert()
Methods
conversion_exists(input_fmt, output_fmt[, ...])
conversion_path(input_fmt, output_fmt): Return a list of conversion steps to get from input and output formats
get_ext(ext_pair): Copy the registry into a dict that behaves like a list to be able to have multiple values for a single key and from a key have all converter able to do the conversion from the input extension to the output extension.
iter_converters([allow_indirect])
set_ext(ext_pair, convertor): Register new convertor from input extension and output extension in a list.
get_info
info
- conversion_path(input_fmt, output_fmt)[source]
Return a list of conversion steps to get from the input format to the output format.
Each step in the list is a pair of formats.
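A path made of format pairs can be found with a breadth-first search over the conversion graph. The sketch below is an assumption about how such a lookup could work, not the Registry code; find_conversion_path and the list of conversions are hypothetical.

```python
from collections import deque


def find_conversion_path(conversions, input_fmt, output_fmt):
    """Return the shortest list of (source, target) steps, or None if unreachable."""
    graph = {}
    for src, dst in conversions:
        graph.setdefault(src, []).append(dst)

    queue = deque([[input_fmt]])
    seen = {input_fmt}
    while queue:
        path = queue.popleft()
        if path[-1] == output_fmt:
            # Turn the list of formats into consecutive pairs
            return list(zip(path, path[1:]))
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical set of direct conversions
conversions = [("FASTQ", "FASTA"), ("FASTA", "GENBANK"), ("BAM", "SAM")]
steps = find_conversion_path(conversions, "FASTQ", "GENBANK")
missing = find_conversion_path(conversions, "BAM", "FASTA")
```

Breadth-first search visits formats in order of distance from the input, so the first path that reaches the output uses the fewest intermediate conversions.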
- get_all_conversions()[source]
- Returns:
a generator that iterates over all available conversions and their availability; a conversion is encoded by a tuple of 2 strings (input format, output format)
- Return type:
generator (input format, output format, status)
- get_conversions()[source]
- Returns:
a generator that iterates over all available conversions; a conversion is encoded by a tuple of 2 strings (input format, output format)
- Return type:
generator
- get_conversions_from_ext()[source]
- Returns:
a generator that iterates over all available conversions; a conversion is encoded by a tuple of 2 strings (input extension, output extension)
- Return type:
generator
- get_converters_names()[source]
- Returns:
a generator that yields the name of each converter from the subclass (ConvBase object)
- Return type:
generator
- get_ext(ext_pair)[source]
Copy the registry into a dict whose values are lists, so that a single key can hold several values; a key then gives all the converters able to convert from the input extension to the output extension.
- Parameters:
ext_pair (tuple of 2 strings) – the input extension, the output extension
- Returns:
list of objects of subclass of ConvBase
- iter_converters(allow_indirect: bool = False)[source]
- Parameters:
allow_indirect (bool) – also return indirect conversion
- Returns:
a generator to iterate over (in_fmt, out_fmt, converter class when direct, path when indirect)
- Return type:
a generator
- set_ext(ext_pair, convertor)[source]
Register a new converter for a pair of input and output extensions, storing it in a list. Several converters can be registered for one ext_pair.
- Parameters:
ext_pair (tuple) – tuple containing the input extensions and the output extensions, e.g. (("fastq",), ("fasta",))
convertor (list of ConvBase object) – the converter which handles the conversion from input_ext to output_ext
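The pairing of set_ext and get_ext can be sketched with a dictionary of lists. ExtRegistry is a hypothetical class written for this example; in particular, expanding the two tuples of extensions into every (input, output) pair is an assumption, not documented behaviour.

```python
from collections import defaultdict
from itertools import product


class ExtRegistry:
    """Hypothetical registry keyed by (input extension, output extension)."""

    def __init__(self):
        self._by_ext = defaultdict(list)

    def set_ext(self, ext_pair, convertor):
        # Assumption: ext_pair holds two tuples of extensions; register the
        # convertor under every (input, output) combination
        in_exts, out_exts = ext_pair
        for key in product(in_exts, out_exts):
            self._by_ext[key].append(convertor)

    def get_ext(self, ext_pair):
        # Return a copy so callers cannot modify the registry by accident
        return list(self._by_ext.get(ext_pair, []))


registry = ExtRegistry()
registry.set_ext((("fastq", "fq"), ("fasta",)), "FASTQ2FASTA")
registry.set_ext((("fq",), ("fasta",)), "OTHER_FASTQ2FASTA")

found = registry.get_ext(("fq", "fasta"))
absent = registry.get_ext(("bam", "bed"))
```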
7.1.9. Utils
misc utility functions
- class bioconvert.core.utils.TempFile(suffix='', dir=None)[source]
A small wrapper around tempfile.NamedTemporaryFile function
    f = TempFile(suffix="csv")
    f.name
    f.delete()  # alias to delete=False and close() calls
Copy from easydev package
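A wrapper of this kind can be sketched on top of tempfile.NamedTemporaryFile. SketchTempFile is a hypothetical class; it assumes the file is created with delete=False and removed explicitly, which may not match TempFile in every detail.

```python
import os
import tempfile


class SketchTempFile:
    """Hypothetical wrapper: a named temporary file that is removed on demand."""

    def __init__(self, suffix="", dir=None):
        # delete=False keeps the file on disk after close()
        self._handle = tempfile.NamedTemporaryFile(suffix=suffix, dir=dir, delete=False)

    @property
    def name(self):
        return self._handle.name

    def delete(self):
        # Close the handle first, then remove the file if it is still there
        self._handle.close()
        if os.path.exists(self.name):
            os.remove(self.name)


f = SketchTempFile(suffix=".csv")
existed_before = os.path.exists(f.name)
f.delete()
exists_after = os.path.exists(f.name)
```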
- Attributes:
- name
Methods
delete
- class bioconvert.core.utils.Timer(times)[source]
Timer designed for use in a "with" statement
Copy from easydev package.
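A timer of this kind can be sketched as a context manager that appends the elapsed time to a list supplied by the caller. SketchTimer is a hypothetical class; it assumes that the times argument is a list receiving one duration per block.

```python
import time


class SketchTimer:
    """Hypothetical context manager appending elapsed seconds to a list."""

    def __init__(self, times):
        self.times = times

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.times.append(time.perf_counter() - self._start)
        # Returning False lets any exception propagate
        return False


times = []
with SketchTimer(times):
    sum(range(10000))
with SketchTimer(times):
    sorted(range(10000), reverse=True)
```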
- bioconvert.core.utils.generate_outfile_name(infile, out_extension)[source]
simple utility to replace the file extension with the given one.
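Replacing an extension can be sketched with os.path.splitext. The function below, replace_extension, is a hypothetical stand-in; accepting the new extension with or without a leading dot is an assumption of the sketch.

```python
import os


def replace_extension(infile, out_extension):
    """Swap the last extension of infile for out_extension."""
    root, _ = os.path.splitext(infile)
    # Tolerate an extension given with or without the leading dot
    return f"{root}.{out_extension.lstrip('.')}"


renamed = replace_extension("reads.fastq", "fasta")
renamed_in_dir = replace_extension("data/reads.fq", ".fasta")
```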
- bioconvert.core.utils.get_extension(filename, remove_compression=False)[source]
Return extension of a filename
    >>> get_extension("test.fastq")
    fastq
    >>> get_extension("test.fastq.gz")
    fastq