7.1. Core functions¶
Main factory of Bioconvert
Tools for benchmarking
Standalone application dedicated to conversion
Provides a general tool to perform pre/post compression
Download singularity image
List of formats and associated extensions
Network tools to manipulate the graph of conversion
Main bioconvert registry that fetches automatically the relevant converter
Simplified version of shell.py module from snakemake package
misc utility functions
7.1.1. Base¶
Main factory of Bioconvert
- class ConvArg(names, help, **kwargs)[source]¶
This class can be used to add specific extra arguments to any converter.
For instance, imagine a conversion named A2B that requires the user to provide a reference. You may then want to provide the --reference extra argument. This is possible by adding a class method named get_additional_arguments that yields an instance of this class for each extra argument.
    @classmethod
    def get_additional_arguments(cls):
        yield ConvArg(
            names="--reference",
            default=None,
            help="the reference"
        )
Then, when calling bioconvert as follows:
bioconvert A2B --help
the new argument will be shown in the list of arguments.
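To make the mechanism concrete, here is a minimal sketch of how arguments yielded by get_additional_arguments might end up in a command-line parser. The ConvArg stand-in, its add_to_parser method, and the build_parser helper are hypothetical names written for this illustration; they are not Bioconvert's actual implementation, which may wire arguments differently.

```python
import argparse


class ConvArg:
    """Hypothetical stand-in that stores the pieces of one extra argument."""

    def __init__(self, names, help, **kwargs):
        # Accept a single name or a list of names (e.g. ["-r", "--reference"]).
        self.names = [names] if isinstance(names, str) else list(names)
        self.help = help
        self.kwargs = kwargs

    def add_to_parser(self, parser):
        # Forward everything to argparse unchanged.
        parser.add_argument(*self.names, help=self.help, **self.kwargs)


class A2B:
    """Hypothetical converter that needs a reference file."""

    @classmethod
    def get_additional_arguments(cls):
        yield ConvArg(names="--reference", default=None, help="the reference")


def build_parser(converter_class):
    # Collect every extra argument declared by the converter class.
    parser = argparse.ArgumentParser(prog="A2B")
    for arg in converter_class.get_additional_arguments():
        arg.add_to_parser(parser)
    return parser


parser = build_parser(A2B)
args = parser.parse_args(["--reference", "genome.fa"])
print(args.reference)
```

Because the converter only yields descriptions of its arguments, the command-line layer stays generic: it never needs to know which converters require a reference.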
- class ConvBase(infile, outfile)[source]¶
Base class for all converters.
To build a new converter, create a new class that inherits from ConvBase and implement a method that performs the conversion. The name of the converter method must start with _method_. For instance:
    class FASTQ2FASTA(ConvBase):
        def _method_python(self, *args, **kwargs):
            # include your code here. You can use the infile and outfile
            # attributes.
            self.infile
            self.outfile
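The _method_ prefix is what lets a base class find the available conversion methods by introspection. The sketch below shows one way this could work; ConvBaseSketch, available_methods, and the dispatch in __call__ are hypothetical names for illustration and should not be read as ConvBase's real internals.

```python
class ConvBaseSketch:
    """Hypothetical stand-in for a converter base class."""

    prefix = "_method_"

    def __init__(self, infile, outfile):
        self.infile = infile
        self.outfile = outfile

    @classmethod
    def available_methods(cls):
        # Keep every callable attribute whose name starts with the prefix,
        # and strip the prefix to obtain the user-facing method name.
        return sorted(
            name[len(cls.prefix):]
            for name in dir(cls)
            if name.startswith(cls.prefix) and callable(getattr(cls, name))
        )

    def __call__(self, method):
        # Dispatch to the implementation selected by name.
        return getattr(self, self.prefix + method)()


class FASTQ2FASTA(ConvBaseSketch):
    def _method_python(self):
        return f"python: {self.infile} -> {self.outfile}"

    def _method_awk(self):
        return f"awk: {self.infile} -> {self.outfile}"


converter = FASTQ2FASTA("reads.fastq", "reads.fasta")
print(FASTQ2FASTA.available_methods())
print(converter("python"))
```

With this convention, adding a new implementation only requires adding one more _method_ function to the subclass; no registration step is needed.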
constructor
- Parameters:
- boxplot_benchmark(rot_xticks=90, boxplot_args={}, mode='time')[source]¶
This function plots the benchmark computed in compute_benchmark().
- compute_benchmark(N=5, to_exclude=[], to_include=[])[source]¶
Simple wrapper around Benchmark that computes the benchmark; see Benchmark for details.
- install_tool(executable)[source]¶
Install the given tool using the script bioconvert/install_script/install_executable.sh, if the executable is not already present.
- Parameters:
executable -- executable to install
- Returns:
nothing
- property name¶
The name of the class
7.1.2. Benchmark¶
Tools for benchmarking
- class Benchmark(obj, N=5, to_exclude=None, to_include=None)[source]¶
Convenient class to benchmark several methods for a given converter.
    c = BAM2COV(infile, outfile)
    b = Benchmark(c, N=5)
    b.run_methods()
    b.plot()
Constructor
- Parameters:
Use one of to_exclude or to_include. If both are provided, only the to_include one is used.
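As a rough illustration of what benchmarking several methods involves, the sketch below filters methods with the to_include/to_exclude rule stated above and times each remaining one N times. The helpers select_methods and time_methods and the toy callables are assumptions made for this example; the real Benchmark class may measure and store results differently.

```python
import time


def select_methods(available, to_exclude=None, to_include=None):
    """Apply the filtering rule: to_include wins when both are given."""
    if to_include:
        return [m for m in available if m in to_include]
    if to_exclude:
        return [m for m in available if m not in to_exclude]
    return list(available)


def time_methods(methods, N=5):
    """Run each callable N times and collect wall-clock durations."""
    results = {}
    for name, func in methods.items():
        durations = []
        for _ in range(N):
            start = time.perf_counter()
            func()
            durations.append(time.perf_counter() - start)
        results[name] = durations
    return results


# Two toy callables standing in for real conversion methods.
methods = {
    "join": lambda: "".join(str(i) for i in range(1000)),
    "total": lambda: sum(range(1000)),
}

# Both filters are given, so only to_include is honoured.
selected = select_methods(methods, to_exclude=["join"], to_include=["join"])
timings = time_methods({name: methods[name] for name in selected}, N=3)
print(selected, {name: len(values) for name, values in timings.items()})
```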
- plot(rerun=False, ylabel=None, rot_xticks=0, boxplot_args={}, mode='time')[source]¶
Plots the benchmark results, running the benchmarks if needed or if rerun is True.
- Parameters:
rot_xticks -- rotation of the x-axis tick labels
boxplot_args -- dictionary with any of the pylab.boxplot arguments
mode -- either time, CPU or memory
- Returns:
dataframe with all results
- plot_multi_benchmark_max(path_json, output_filename='multi_benchmark.png', min_ylim=0, mode=None)[source]¶
Plotting function for the Snakefile_benchmark to be found in the doc
The json file looks like:
    {
      "awk": {
        "0": 0.777020216,
        "1": 0.9638044834,
        "2": 1.7623617649,
        "3": 0.8348755836
      },
      "seqtk": {
        "0": 1.0024843216,
        "1": 0.6313509941,
        "2": 1.4048073292,
        "3": 1.0554351807
      },
      "Benchmark": {
        "0": 1,
        "1": 1,
        "2": 2,
        "3": 2
      }
    }
The number of benchmarks is inferred from the 'Benchmark' field.
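The following sketch shows how such a file could be read and how the number of benchmarks can be derived from the 'Benchmark' field. Taking the maximum value per method and per benchmark is an assumption suggested by the function name; the actual plotting function may aggregate the values differently.

```python
import json
from collections import defaultdict

raw = """
{
  "awk": {"0": 0.777020216, "1": 0.9638044834, "2": 1.7623617649, "3": 0.8348755836},
  "seqtk": {"0": 1.0024843216, "1": 0.6313509941, "2": 1.4048073292, "3": 1.0554351807},
  "Benchmark": {"0": 1, "1": 1, "2": 2, "3": 2}
}
"""

data = json.loads(raw)

# The 'Benchmark' entry maps each run index to the benchmark it belongs to.
benchmark_ids = data.pop("Benchmark")
n_benchmarks = len(set(benchmark_ids.values()))

# Assumed aggregation: keep the largest value per method and per benchmark.
max_per_benchmark = defaultdict(dict)
for method, runs in data.items():
    for run_index, value in runs.items():
        bench = benchmark_ids[run_index]
        current = max_per_benchmark[method].get(bench, float("-inf"))
        max_per_benchmark[method][bench] = max(current, value)

print(n_benchmarks)
print(dict(max_per_benchmark))
```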
7.1.3. Converter¶
Standalone application dedicated to conversion
7.1.4. Decorators¶
Provides a general tool to perform pre/post compression
- compressor(func)[source]¶
Decompress/compress input file without pipes.
No pipe is used: the input file is decompressed, then compressed back afterwards. The advantage is that this should work for any file, even a very large one.
This decorator should be used by methods that use pure Python code.
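A minimal sketch of the idea, assuming gzip compression only: the decorator gives the wrapped function plain, uncompressed paths and handles (de)compression around the call. Unlike the behaviour described above, this sketch works on temporary copies instead of touching the input file itself, and compressor_sketch and upper_case are hypothetical names.

```python
import functools
import gzip
import os
import shutil
import tempfile


def compressor_sketch(func):
    """Let func(infile, outfile) work on plain files even if paths end in .gz."""

    @functools.wraps(func)
    def wrapper(infile, outfile):
        with tempfile.TemporaryDirectory() as tmpdir:
            plain_in, plain_out = infile, outfile
            if infile.endswith(".gz"):
                # Decompress the input to a temporary plain file (no pipe).
                plain_in = os.path.join(tmpdir, os.path.basename(infile)[:-3])
                with gzip.open(infile, "rb") as src, open(plain_in, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            if outfile.endswith(".gz"):
                plain_out = os.path.join(
                    tmpdir, "out_" + os.path.basename(outfile)[:-3]
                )
            result = func(plain_in, plain_out)
            if plain_out != outfile:
                # Compress the plain output into the requested .gz file.
                with open(plain_out, "rb") as src, gzip.open(outfile, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            return result

    return wrapper


@compressor_sketch
def upper_case(infile, outfile):
    # A pure-Python "conversion" that knows nothing about compression.
    with open(infile) as src, open(outfile, "w") as dst:
        dst.write(src.read().upper())


workdir = tempfile.mkdtemp()
gz_in = os.path.join(workdir, "in.txt.gz")
gz_out = os.path.join(workdir, "out.txt.gz")
with gzip.open(gz_in, "wt") as handle:
    handle.write("acgt\n")

upper_case(gz_in, gz_out)
with gzip.open(gz_out, "rt") as handle:
    roundtrip = handle.read()
print(roundtrip)
```

Streaming through shutil.copyfileobj keeps memory use bounded, which is why this approach remains usable for large files.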
- make_in_gz_tester(converter)[source]¶
Generates a function testing whether a conversion method of converter has the in_gz tag.
- out_compressor(func)[source]¶
Compress output file without pipes.
This decorator should be used by methods that use pure Python code.
- requires(external_binary=None, python_library=None, external_binaries=None, python_libraries=None)[source]¶
- Parameters:
external_binary -- a system binary required for the method
python_library -- a python library required for the method
external_binaries -- an array of system binaries required for the method
python_libraries -- an array of python libraries required for the method
- Returns:
7.1.5. Downloader¶
Download singularity image
7.1.6. Extensions¶
List of formats and associated extensions
- extensions = {'abi': ['abi', 'ab1'], 'agp': ['agp'], 'bam': ['bam'], 'bcf': ['bcf'], 'bed': ['bed'], 'bedgraph': ['bedgraph', 'bg'], 'bigbed': ['bb', 'bigbed'], 'bigwig': ['bigwig', 'bw'], 'bplink': ['bplink'], 'bz2': ['bz2'], 'cdao': ['cdao'], 'clustal': ['clustal', 'aln', 'clw'], 'cov': ['cov'], 'cram': ['cram'], 'csv': ['csv'], 'dsrc': ['dsrc'], 'embl': ['embl'], 'ena': ['ena'], 'faa': ['faa', 'mpfa', 'aa'], 'fast5': ['fast5'], 'fasta': ['fasta', 'fa', 'fst'], 'fastq': ['fastq', 'fq'], 'genbank': ['genbank', 'gbk', 'gb'], 'gfa': ['gfa'], 'gff2': ['gff'], 'gff3': ['gff3'], 'gtf': ['gtf'], 'gz': ['gz'], 'json': ['json'], 'maf': ['maf'], 'newick': ['newick', 'nw', 'nhx', 'nwk'], 'nexus': ['nexus', 'nx', 'nex', 'nxs'], 'ods': ['ods'], 'paf': ['paf'], 'pdb': ['pdb'], 'phylip': ['phy', 'ph', 'phylip'], 'phyloxml': ['phyloxml', 'xml'], 'plink': ['plink'], 'pod5': ['pod5'], 'qual': ['qual'], 'sam': ['sam'], 'scf': ['scf'], 'sra': ['sra'], 'stockholm': ['sto', 'sth', 'stk', 'stockholm'], 'tsv': ['tsv'], 'twobit': ['2bit'], 'vcf': ['vcf'], 'wig': ['wig'], 'wiggle': ['wig', 'wiggle'], 'xls': ['xls'], 'xlsx': ['xlsx'], 'xmfa': ['xmfa'], 'yaml': ['yaml', 'YAML']}¶
List of formats and their extensions included in Bioconvert
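A common use of such a mapping is the reverse lookup: finding which format(s) a given file extension may correspond to. The sketch below builds that reverse index from a small excerpt of the dictionary above; formats_by_extension is a name introduced for this example only.

```python
from collections import defaultdict

# Small excerpt of the mapping shown above.
extensions = {
    "fasta": ["fasta", "fa", "fst"],
    "fastq": ["fastq", "fq"],
    "wig": ["wig"],
    "wiggle": ["wig", "wiggle"],
    "yaml": ["yaml", "YAML"],
}

# Reverse index: one extension can belong to several formats.
formats_by_extension = defaultdict(list)
for fmt, exts in extensions.items():
    for ext in exts:
        formats_by_extension[ext].append(fmt)

print(formats_by_extension["fq"])
print(formats_by_extension["wig"])
```

Note that the wig extension is listed under both the wig and wiggle formats, so a reverse lookup has to return a list rather than a single format.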
7.1.7. Graph¶
Network tools to manipulate the graph of conversion
- create_graph(filename, layout='dot', use_singularity=False, color_for_disabled_converter='red', include_subgraph=False)[source]¶
- Parameters:
filename -- should end in .png or .svg or .dot
If the extension is .dot, only the dot file is created, without annotations. This is useful if you have issues installing graphviz. In that case, under Linux, you could use our singularity container; see github.com/cokelaer/graphviz4all
7.1.8. Registry¶
Main bioconvert registry that fetches automatically the relevant converter
- class Registry[source]¶
class to centralise information about available conversions
    from bioconvert.core.registry import Registry
    r = Registry()
    r.conversion_exists("BAM", "BED")
    r.info()  # returns number of available methods for each converter
    conv_class = r[(".bam", ".bed")]
    converter = conv_class(input_file, output_file)
    converter.convert()
- conversion_path(input_fmt, output_fmt)[source]¶
Return a list of conversion steps to get from the input format to the output format.
Each step in the list is a pair of formats.
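To illustrate what a list of steps looks like, the sketch below finds a chain of direct conversions with a breadth-first search over a toy set of format pairs. Whether the registry uses a shortest-path search, and the example conversions listed here, are assumptions for this illustration; find_conversion_path is a hypothetical helper.

```python
from collections import deque


def find_conversion_path(direct, input_fmt, output_fmt):
    """Return the shortest chain of (in, out) pairs, or None if unreachable."""
    graph = {}
    for src, dst in direct:
        graph.setdefault(src, []).append(dst)

    # Breadth-first search, remembering how each format was reached.
    queue = deque([input_fmt])
    previous = {input_fmt: None}
    while queue:
        fmt = queue.popleft()
        if fmt == output_fmt:
            break
        for nxt in graph.get(fmt, []):
            if nxt not in previous:
                previous[nxt] = fmt
                queue.append(nxt)
    else:
        # The queue emptied without reaching the output format.
        return None

    # Walk back from the output format to rebuild the steps in order.
    steps = []
    node = output_fmt
    while previous[node] is not None:
        steps.append((previous[node], node))
        node = previous[node]
    return steps[::-1]


# Toy set of direct conversions, chosen for the example.
direct = [("BAM", "SAM"), ("SAM", "FASTQ"), ("FASTQ", "FASTA"), ("BAM", "BED")]
path = find_conversion_path(direct, "BAM", "FASTA")
print(path)
```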
- get_all_conversions()[source]¶
- Returns:
a generator that iterates over all available conversions and their availability; a conversion is encoded by a tuple of 2 strings (input format, output format)
- Return type:
generator (input format, output format, status)
- get_conversions()[source]¶
- Returns:
a generator that iterates over all available conversions; a conversion is encoded by a tuple of 2 strings (input format, output format)
- Return type:
generator
- get_conversions_from_ext()[source]¶
- Returns:
a generator that iterates over all available conversions; a conversion is encoded by a tuple of 2 strings (input extension, output extension)
- Return type:
generator
- get_converters_names()[source]¶
- Returns:
a generator that yields the name of each converter, taken from the subclass (ConvBase object)
- Return type:
generator
- get_ext(ext_pair)[source]¶
Copy the registry into a dict that behaves like a list, so that a single key can hold multiple values. For a given key, this returns all the converters able to convert from the input extension to the output extension.
- Parameters:
ext_pair (tuple of 2 strings) -- the input extension, the output extension
- Returns:
list of objects of subclasses of ConvBase
- iter_converters(allow_indirect: bool = False)[source]¶
- Parameters:
allow_indirect (bool) -- also return indirect conversion
- Returns:
a generator to iterate over (in_fmt, out_fmt, converter class when direct, path when indirect)
- Return type:
generator
- set_ext(ext_pair, convertor)[source]¶
Register a new converter for an input extension and an output extension. Several converters can be registered for a single ext_pair; they are stored in a list.
- Parameters:
ext_pair (tuple) -- tuple containing the input extensions and the output extensions, e.g. ( ("fastq",), ("fasta",) )
convertor (list of ConvBase objects) -- the converter that handles the conversion from input_ext to output_ext
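The get_ext/set_ext pair described above amounts to a mapping from an extension pair to a list of converters. A minimal sketch of that data structure follows; ExtRegistrySketch and the two dummy converter classes are hypothetical, and for brevity the key is a plain tuple of two strings.

```python
from collections import defaultdict


class ExtRegistrySketch:
    """Hypothetical registry keyed by (input extension, output extension)."""

    def __init__(self):
        # Each key holds a list, so several converters can share one pair.
        self._ext_registry = defaultdict(list)

    def set_ext(self, ext_pair, convertor):
        self._ext_registry[ext_pair].append(convertor)

    def get_ext(self, ext_pair):
        # Return a copy so callers cannot modify the registry by accident;
        # .get() avoids creating an empty entry for unknown pairs.
        return list(self._ext_registry.get(ext_pair, []))


class FASTQ2FASTA:
    """Dummy converter class used only for this example."""


class FQ2FA:
    """Second dummy converter handling the same extension pair."""


registry = ExtRegistrySketch()
registry.set_ext(("fastq", "fasta"), FASTQ2FASTA)
registry.set_ext(("fastq", "fasta"), FQ2FA)

print(registry.get_ext(("fastq", "fasta")))
print(registry.get_ext(("bam", "bed")))
```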
7.1.9. Utils¶
misc utility functions
- class TempFile(suffix='', dir=None)[source]¶
A small wrapper around the tempfile.NamedTemporaryFile function.
    f = TempFile(suffix="csv")
    f.name
    f.delete()  # alias to delete=False and close() calls
Copied from the easydev package.
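For readers unfamiliar with the pattern, here is a minimal sketch of such a wrapper, assuming it creates the file with delete=False and removes it explicitly later. TempFileSketch is a hypothetical re-creation for illustration, not the code shipped in Bioconvert or easydev.

```python
import os
import tempfile


class TempFileSketch:
    """Hypothetical wrapper around tempfile.NamedTemporaryFile."""

    def __init__(self, suffix="", dir=None):
        # delete=False keeps the file on disk after close(), so other
        # tools can reopen it by name until delete() is called.
        self._handle = tempfile.NamedTemporaryFile(
            suffix=suffix, dir=dir, delete=False
        )

    @property
    def name(self):
        return self._handle.name

    def delete(self):
        # Close the handle first, then remove the file if it still exists.
        self._handle.close()
        if os.path.exists(self.name):
            os.remove(self.name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.delete()


f = TempFileSketch(suffix=".csv")
path = f.name
existed = os.path.exists(path)
f.delete()
print(existed, os.path.exists(path))
```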
- generate_outfile_name(infile, out_extension)[source]¶
Simple utility that replaces the file extension with the given one.
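The behaviour can be sketched with os.path.splitext, as below. This is an assumed re-implementation under the name outfile_name_sketch; in particular, how the real function treats compressed names such as .fastq.gz is not covered here.

```python
import os


def outfile_name_sketch(infile, out_extension):
    """Replace the last extension of infile with out_extension."""
    # splitext only strips the final suffix, e.g. "reads.fastq" -> "reads".
    root, _ = os.path.splitext(infile)
    return f"{root}.{out_extension}"


print(outfile_name_sketch("reads.fastq", "fasta"))
print(outfile_name_sketch("data/sample.bam", "bed"))
```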
- get_extension(filename, remove_compression=False)[source]¶
Return the extension of a filename.
    >>> get_extension("test.fastq")
    fastq
    >>> get_extension("test.fastq.gz")
    fastq