2. Developer guide
2.1. How to add a new converter?
For now, a converter is a simple conversion from one format to another, with no auxiliary file involved. For instance, a conversion that needs a reference file is not part of the API for the moment.
Now, let us take a simple example such as a fastq to fasta conversion.
First, you need to create a module (Python file). We use the naming convention illustrated by fastq2fasta.py and sam2bam.py (input format, then 2, then output format), all in lowercase. In this file, copy and paste this example:

    """Convert :term:`FastQ` format to :term:`Fasta` formats"""
    from bioconvert import ConvBase

    __all__ = ["Fastq2Fasta"]


    class Fastq2Fasta(ConvBase):
        """

        """
        _default_method = "v1"

        def __init__(self, infile, outfile):
            """
            :param str infile: information
            :param str outfile: information
            """
            super().__init__(infile, outfile)

        def _method_v1(self, *args, **kwargs):
            # Conversion is made here.
            # You can use self.infile and self.outfile
            # If you use an external command, you can use:
            # self.execute(cmd)
            pass

        def _method_v2(self, *args, **kwargs):
            # another method
            pass
You may also use the bioconvert_init standalone to create this file. For instance, to create the sam to bam conversion, redirect the output of the following command to the correct file:
$ bioconvert_init -i sam -o bam > sam2bam.py
Of course, you will need to edit the file to add the conversion itself in the appropriate method (e.g. _method_samtools).
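As an illustration of what the body of a pure-Python conversion method might contain, here is a minimal sketch of a FastQ to Fasta conversion written as a standalone function. The name fastq_to_fasta is hypothetical and not part of bioconvert, and the sketch assumes a well-formed FastQ file with exactly four lines per record. Inside a converter, the two paths would come from self.infile and self.outfile.

```python
def fastq_to_fasta(infile, outfile):
    """Write each FastQ record of infile as a Fasta record in outfile.

    Assumes four lines per record: @header, sequence, +, quality.
    Returns the number of records written.
    """
    count = 0
    with open(infile) as fin, open(outfile, "w") as fout:
        while True:
            header = fin.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip stray blank lines between records
            sequence = fin.readline()
            fin.readline()  # '+' separator line, ignored
            fin.readline()  # quality line, ignored
            # Replace the leading '@' of the FastQ header with '>'
            fout.write(">" + header[1:].rstrip("\n") + "\n")
            fout.write(sequence.rstrip("\n") + "\n")
            count += 1
    return count
```

A real converter would usually delegate to an established tool or library; this sketch only shows the shape of the code that goes into a method.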
2.1.1. How to add a new method
As shown above, use this pattern:
    def _method_YOUuniqueMETHODname(self, *args, **kwargs):
        # from kwargs, you can use any kind of arguments.
        # threads is an example, reference, another example.
        # Your code here
        pass
The method will then be available in the class and in the bioconvert standalone.
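To make the naming convention concrete, the sketch below shows one way a base class could discover and dispatch methods prefixed with _method_. It is an assumption-laden stand-in and not bioconvert's actual ConvBase: the names MiniConvBase, list_methods and ToyConverter are hypothetical.

```python
class MiniConvBase:
    """Hypothetical stand-in for a converter base class (not ConvBase)."""

    _default_method = None

    def __init__(self, infile, outfile):
        self.infile = infile
        self.outfile = outfile

    @classmethod
    def list_methods(cls):
        """Return the sorted names of all methods prefixed with _method_."""
        prefix = "_method_"
        return sorted(
            name[len(prefix):]
            for name in dir(cls)
            if name.startswith(prefix) and callable(getattr(cls, name))
        )

    def __call__(self, method=None, **kwargs):
        """Dispatch to _method_<name>, falling back to the default method."""
        name = method or self._default_method
        if name not in self.list_methods():
            raise ValueError("unknown method: {}".format(name))
        return getattr(self, "_method_" + name)(**kwargs)


class ToyConverter(MiniConvBase):
    """Toy converter used only to show the dispatch."""

    _default_method = "v1"

    def _method_v1(self, *args, **kwargs):
        # Optional arguments such as threads arrive through kwargs.
        return "v1 with threads={}".format(kwargs.get("threads", 1))

    def _method_v2(self, *args, **kwargs):
        return "v2"
```

With this sketch, ToyConverter.list_methods() lists both methods, and calling an instance with method="v2" runs _method_v2 without any further registration.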
The code that you add may be of different kinds:
- pure Python: just write it.
- Python code relying on a third-party library, two options:
  - if the Python library is on PyPI and is simple, add it to requirements.txt
  - if the Python library requires lots of compilation, add it to requirements_tools.txt (assuming it is on bioconda).
- if the code is not on PyPI or bioconda (e.g., Go code), use self.install_tool(NAME) and add a script in ./misc/install_NAME.sh
2.1.2. Method decorators
Decorators have been defined in bioconvert/core/decorators.py that can be used to “flag” or “modify” conversion methods (actually, a new method is usually returned):
@in_gz can be used to indicate that the method is able to transparently handle input files that are compressed in .gz format. This is done by adding an in_gz attribute (set to True) to the method.
@compressor will wrap the method in code that handles input decompression from .gz format and output compression to .dsrc. This automatically applies @in_gz.
    @compressor
    def _method_noncompressor(self, *args, **kwargs):
        """This method does not handle compressed input or output."""
        pass
    # This results in a method that handles compressed input and output
    # The method has an in_gz attribute (which is set to True)
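The input side of such a wrapper can be sketched as follows. This is not the code in bioconvert/core/decorators.py: gunzip_input and copy_text are hypothetical names, the sketch works on plain functions taking (infile, outfile) rather than on converter methods, and it leaves output compression out.

```python
import functools
import gzip
import os
import shutil
import tempfile


def gunzip_input(func):
    """Hypothetical decorator: give func a decompressed copy of a .gz input."""

    @functools.wraps(func)
    def wrapper(infile, outfile, *args, **kwargs):
        if not infile.endswith(".gz"):
            return func(infile, outfile, *args, **kwargs)
        # Decompress into a temporary file, then call the wrapped function.
        fd, plain_path = tempfile.mkstemp()
        os.close(fd)
        try:
            with gzip.open(infile, "rb") as src, open(plain_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            return func(plain_path, outfile, *args, **kwargs)
        finally:
            os.remove(plain_path)  # always clean up the temporary copy

    wrapper.in_gz = True  # flag: the wrapped callable accepts .gz input
    return wrapper


@gunzip_input
def copy_text(infile, outfile):
    """Toy conversion that only understands plain-text input."""
    with open(infile) as fin, open(outfile, "w") as fout:
        fout.write(fin.read())
```

The decorated copy_text accepts both plain and .gz input, and carries the in_gz flag set by the wrapper.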
@out_compressor will wrap the method in code that handles output compression to .dsrc. It is intended to be used on methods that already handle compressed input transparently, and therefore do not need the input decompression provided by @compressor. Typically, one would also apply @in_gz to such methods. In that case, @in_gz should be applied “on top” of @out_compressor. The reason is that decorators closest to the function are applied first, and applying another decorator on top of @in_gz would typically not preserve the in_gz attribute.
    @in_gz
    @out_compressor
    def _method_incompressor(self, *args, **kwargs):
        """This method already handles compressed .gz input."""
        pass
    # This results in a method that handles compressed input and output
    # This method is further modified to have an in_gz attribute
    # (which is set to True)
(For more general explanations about decorators, see https://stackoverflow.com/a/1594484/1878788.)
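The ordering rule can be checked with a small self-contained sketch. The decorators below (flag_in_gz, naive_wrapper, preserving_wrapper) are hypothetical stand-ins, not the ones defined in bioconvert; they only reproduce the two behaviours discussed above: setting an attribute, and returning a new function.

```python
import functools


def flag_in_gz(func):
    """Hypothetical flag decorator: mark func and return it unchanged."""
    func.in_gz = True
    return func


def naive_wrapper(func):
    """Hypothetical wrapping decorator that does not copy attributes."""

    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


def preserving_wrapper(func):
    """Same wrapper, but functools.wraps copies func.__dict__ (so in_gz too)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


@flag_in_gz
@naive_wrapper
def flag_on_top():
    return "ok"


@naive_wrapper
@flag_in_gz
def flag_underneath():
    return "ok"


@preserving_wrapper
@flag_in_gz
def flag_underneath_preserved():
    return "ok"
```

Here flag_on_top keeps its in_gz attribute, whereas flag_underneath loses it because the naive wrapper returns a fresh function. A wrapper built with functools.wraps would keep the attribute, but it is safer not to rely on that and to apply the flag last.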
2.2. How to add a test and test file
Go to ./test and add a file named test_fastq2fasta.py (following the fastq2fasta example), with this content:
    import pytest
    from bioconvert.fastq2fasta import Fastq2Fasta
    from bioconvert import bioconvert_data
    from easydev import TempFile, md5


    @pytest.mark.parametrize("method", Fastq2Fasta.available_methods)
    def test_fastq2fasta(method):
        # your code here
        # you will need data, for instance "mydata.fastq" and "mydata.fasta".
        # Put it in bioconvert/bioconvert/data
        # you can then use ::
        infile = bioconvert_data("mydata.fastq")
        expected_outfile = bioconvert_data("mydata.fasta")
        with TempFile(suffix=".fasta") as tempfile:
            converter = Fastq2Fasta(infile, tempfile.name)
            converter(method=method)

            # Check that the output is correct with a checksum
            assert md5(tempfile.name) == md5(expected_outfile)
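The checksum comparison above relies on the md5 helper imported from easydev. As a rough illustration of what such a check computes, a similar helper can be sketched with the standard library only; file_md5 is a hypothetical name and is not claimed to match easydev's implementation.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large sequence files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical content yield the same digest, so comparing the digest of the converter output with that of the expected file is a compact equality check. It is strict, though: any difference in line endings or header formatting makes the test fail.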
Files used for testing should be added in ./bioconvert/data/testing/converter_name. For instance, test files for the sam2paf converter should be added to the bioconvert/data/testing/sam2paf directory, which should contain the test files, an __init__.py file and a README.rst file. The latter should contain the names of the test files and a short description.
2.3. How to locally run the tests
Go to the root directory. If not already done, install all packages listed in requirements_dev.txt. You can do so by running:
pip3 install -r requirements_dev.txt
Then, run the tests using:
pytest test/ -v
Or, to run a specific test file, for example for your new converter fastq2fasta:
pytest test/test_fastq2fasta.py -v
2.4. How to benchmark your new method vs others

    from bioconvert import Benchmark
    from bioconvert.fastq2fasta import Fastq2Fasta

    # infile and outfile are the paths of your input and output files
    converter = Fastq2Fasta(infile, outfile)
    b = Benchmark(converter)
    b.plot()
You can also use the bioconvert standalone with the -b option.
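To give an idea of what a comparison between methods measures, here is a minimal timing sketch based on the standard library. It does not reproduce how the Benchmark class works internally; time_methods and the two toy functions are hypothetical stand-ins for the conversion methods of a converter.

```python
import time


def time_methods(methods, repeat=3):
    """Return {name: best wall-clock seconds} for each zero-argument callable."""
    results = {}
    for name, func in methods.items():
        timings = []
        for _ in range(repeat):
            start = time.perf_counter()
            func()
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)  # keep the best run to reduce noise
    return results


# Toy stand-ins for two conversion methods of the same converter.
def method_join():
    return "".join(str(i) for i in range(1000))


def method_concat():
    text = ""
    for i in range(1000):
        text += str(i)
    return text


timings = time_methods({"join": method_join, "concat": method_concat})
fastest = min(timings, key=timings.get)
```

Running each method several times and keeping the best time limits the influence of disk caching and other transient effects on the comparison.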
2.5. How to add your new converter to the main documentation?
Edit doc/references.rst and add these lines:
    .. automodule:: bioconvert.fastq2fasta
        :members:
        :synopsis:
2.6. PEP8 and conventions
When writing your Python code, follow the PEP8 conventions as much as possible, as well as the conventions already used in the code. For instance:
    class A():
        """Some documentation"""

        def __init__(self):
            """some doc"""
            pass

        def another_method(self):
            """some doc"""
            c = 1 + 2


    class B():
        """Another class"""

        def __init__(self, *args, **kwargs):
            """some doc"""
            pass


    def AFunction(x):
        """some doc"""
        return x
- 2 blank lines between classes and functions
- 1 blank line between methods
- spaces around operators (e.g. =, +)
- Try to have 80 characters max on one line
- Add documentation in triple quotes
To check the PEP8 compliance of a Python source file, you can run flake8 on it:
$ flake8 bioconvert/fastq2fasta.py
2.7. Requirements files
- requirements.txt : should contain only the packages to be retrieved from PyPI. Those are downloaded and installed (if missing) when using python setup.py install
- environment_rtd.yml : do not touch. Simple file for readthedocs
- readthedocs.yml : all conda and pip dependencies to run the example and build the doc
- requirements_dev.txt : packages required for testing or building the doc (not required to run the bioconvert package)
- requirements_tools.txt : all conda dependencies