User Guide ========== .. contents:: Quick Start ----------- Most of the time, Bioconvert simply requires the input and output filenames. If there is no ambiguity, the extension are used to infer the type of conversion you wish to perform. For instance, to convert a FASTQ to a FASTA file, use this type of command:: bioconvert test.fastq test.fasta If the converter *fastq* to *fasta*² exists in **Bioconvert**, it will work out of the box. In order to get a list of all possible conversions, just type:: bioconvert --help To obtain more specific help about a converter that you found in the list:: bioconvert fastq2fasta --help .. note:: All converters are named as 2 Explicit conversion ------------------- Sometimes, Bioconvert won't be able to know what you want solely based on the input and ouput extensions. So, you may need to be explicit and use a subcommand. For instance to use the converter *fastq2fasta*, type:: bioconvert fastq2fasta input.fastq output.fasta The rationale behind the subcommand choice is manyfold. First, you may have dedicated help for a given conversion, which may be different from one conversion to the other:: bioconvert fastq2fasta --help Second, the extensions of your input and output may be non-standard or different from the choice made by the **bioconvert** developers. So, using the subcommand you can do:: bioconvert fastq2fasta input.fq output.fas where the extensions can actually be whatever you want. If you do not provide the output file, it will be created based on the input filename by replacing the extension automatically. So this command:: bioconvert fastq2fasta input.fq generates an output file called *input.fasta*. Note that it will be placed in the same directory as the input file, not locally. So:: bioconvert fastq2fasta ~/test/input.fq will create the *input.fasta* file in the ~/test directory. If an output file exists, it will not be overwritten. If you want to do so, use the --force argument:: bioconvert fastq2fasta input.fq output.fa --force Implicit conversion ------------------- If the extensions match the conversion name, you can perform implicit conversion:: bioconvert input.fastq output.fasta Internally, a format may be registered with several extensions. For instance the extensions possible for a FastA file are ``fasta`` and ``fa`` so you can also write:: bioconvert input.fastq output.fa Compression ----------- Input files may be compressed. For instance, most FASTQ are compressed in GZ format. Compression are handled in some converters. Basically, most of the humand-readable files handle compression. For instance, all those commands should work and can be used to compress output files, or handle input compressed files:: bioconvert test.fastq.gz test.fasta bioconvert test.fastq.gz test.fasta.gz bioconvert test.fastq.gz test.fasta.bz2 Note that you can also decompress and compress into another compression keeping without doing any conversion (note the fastq extension in both input and output files):: bioconvert test.fastq.gz test.fastq.dsrc Parallelization --------------- In Bioconvert, if the input contains a wildcard such as ``*`` or ``?`` characters, then, input filenames are treated separately and converted sequentially:: bioconvert fastq2fasta "*.fastq" Note, however, that the files are processed sequentially one by one. So, we may want to parallelise the computation. Iteration with unix commands ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can use a bash script under unix to run Bioconvert on a set of files. For instance the following script takes all files with the *.fastq* extension and convert them to fasta: .. literalinclude:: script.sh :language: shell Note, however, that this is still a sequential computation. Yet, you may now change it slightly to run the commands on a cluster. For instance, on a SLURM scheduler, you can use: .. literalinclude:: script2.sh :language: shell Snakemake option ~~~~~~~~~~~~~~~~ If you have lots of files to convert, a snakemake pipeline is available in the `Sequana `_ project and can be installed using **pip install sequana_bioconvert**. It also installs bioconvert with an ap ptainer image that contains all dependencies for you. Here is another way of running your jobs in parallel using a simple Snakefile (`snakemake `_) that can be run easily either locally or on a cluster. You can download the following file :download:`Snakefile` .. literalinclude:: Snakefile :language: python :lines: 12-28 and execute it locally as follows (assuming you have 4 CPUs):: snakemake -s Snakefile --cores 4 or on a cluster:: snakemake -s Snakefile --cluster "--mem=1000 -j 10"