.. _glossary:

Glossary
========


Note that the formats mentioned below each have a dedicated description in the
:ref:`formats` section.

.. glossary::
    :sorted:

    ABI

        File format produced by ABI sequencing machines. It contains the trace
        data (signal intensities for each of the four nucleotides) together with
        the base calls and their quality values. See the :ref:`format_abi` format
        page for details.

    ASQG

        The ASQG format describes an assembly graph. Each line is a tab-delimited
        record. The first field in each record describes the record type. See the
        :ref:`format_asqg` page for details.

    BAI

        Index file associated with a coordinate-sorted BAM file, allowing fast
        random access to the alignments in a given genomic region. See the
        :ref:`format_bai` page for details.

    BAM

        Binary version of the Sequence Alignment Map (SAM) format. See the
        :ref:`format_bam` format page for details. 

    BCF

        Binary version of the Variant Call Format (VCF).
        See the :ref:`format_bcf` page for details.

    BCL

        BCL (base call) is the raw binary format produced by Illumina sequencers.
        See the :ref:`format_bcl` format page for details.

    BED

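        BED (Browser Extensible Data) is a line-oriented, tab-delimited text format
        describing genomic intervals: each line gives a chromosome, a start and an
        end position, optionally followed by further columns (name, score, strand,
        etc.). See the :ref:`format_bed` format page for details.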
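
        A minimal sketch in plain Python (not bioconvert's API) of reading the
        three mandatory BED columns. ``read_bed_intervals`` is a hypothetical
        helper, and the input is assumed to be tab-delimited. BED coordinates are
        0-based and half-open, so an interval's length is simply ``end - start``.

        .. code-block:: python

            import io

            def read_bed_intervals(handle):
                """Yield (chrom, start, end) tuples from a BED stream (hypothetical helper)."""
                for line in handle:
                    line = line.rstrip("\n")
                    # Skip blank lines, comments and UCSC browser/track header lines.
                    if not line or line.startswith(("#", "track", "browser")):
                        continue
                    fields = line.split("\t")
                    # Only the three mandatory columns are kept; extra columns are ignored.
                    yield fields[0], int(fields[1]), int(fields[2])

            example = io.StringIO("track name=demo\nchr1\t100\t200\tgeneA\nchr2\t0\t50\n")
            for chrom, start, end in read_bed_intervals(example):
                # 0-based, half-open coordinates: the length is end - start.
                print(chrom, start, end, end - start)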


    BED3

        Variant of the BED format with 3 columns storing the chromosome (track)
        name and the start and end positions.
        See the :ref:`format_bed4` format page for details.

    BED4

        Variant of the BED format with 4 columns storing the chromosome (track)
        name, the start and end positions, and a value.
        See the :ref:`format_bed4` format page for details.

    BEDGRAPH

        BEDGRAPH is a line-oriented format derived from BED that stores
        continuous-valued data: each line gives a chromosome, a start and an end
        position, and a value. Similar to the WIG format.
        See the :ref:`format_bed` format page for details.

    BIGBED

        An indexed binary version of a BED file.
        See the :ref:`format_bigbed` page for details.

    BPLINK

        Binary version of the PLINK format used for analyzing genotypic data
        in Genome-Wide Association Studies (GWAS).
        See the :ref:`format_plink_binary` page for details.

    BZ2

        **bzip2** is a file compression program that uses the Burrows–Wheeler algorithm.
        The usual file extension is ``.bz2``. See the :ref:`format_bz2` page for details.
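
        For illustration only, Python's standard ``bz2`` module can compress and
        decompress such data in memory; the file name in the commented streaming
        example is hypothetical.

        .. code-block:: python

            import bz2

            data = b"ACGT" * 1000
            compressed = bz2.compress(data)
            # Compression is lossless: decompressing gives back the original bytes.
            assert bz2.decompress(compressed) == data
            # Highly repetitive input such as this compresses very well.
            print(len(data), "->", len(compressed), "bytes")

            # Streaming a compressed file in text mode ("reads.fastq.bz2" is a hypothetical name):
            # with bz2.open("reads.fastq.bz2", "rt") as handle:
            #     first_line = handle.readline()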

    BIGWIG

        Indexed binary version of the Wiggle format.
        See the :ref:`format_bigwig` page for details.

    CLUSTAL

        The multiple sequence alignment format produced by Clustal W and
        Clustal X. See the :ref:`format_clustal` page for details.

    COV

        A bioconvert-specific format storing coverage as a 3-column,
        tab-separated file. See the :ref:`format_cov` page for details.
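
        A minimal sketch computing the mean depth per chromosome. It assumes,
        without confirmation from this glossary, that the three columns are
        chromosome, position and depth (as in ``samtools depth`` output);
        ``mean_coverage`` is a hypothetical helper, not part of bioconvert.

        .. code-block:: python

            import io
            from collections import defaultdict

            def mean_coverage(handle):
                """Return {chrom: mean depth} from a 3-column coverage stream.

                Assumed (hypothetical) column layout: chromosome, position, depth.
                """
                totals = defaultdict(float)
                counts = defaultdict(int)
                for line in handle:
                    if not line.strip():
                        continue
                    chrom, _position, depth = line.rstrip("\n").split("\t")
                    totals[chrom] += float(depth)
                    counts[chrom] += 1
                return {chrom: totals[chrom] / counts[chrom] for chrom in totals}

            print(mean_coverage(io.StringIO("chr1\t1\t10\nchr1\t2\t20\nchr2\t1\t5\n")))
            # {'chr1': 15.0, 'chr2': 5.0}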

    CRAM

        A more compact alternative to BAM for storing Sequence Alignment Map
        (SAM) data, relying mainly on reference-based compression. See the
        :ref:`format_cram` page for details.

    CSV

        The comma-separated values (CSV) format is a delimited text format that
        uses a comma to separate values; fields that themselves contain commas are
        enclosed in double quotes. See the :ref:`format_csv` format page for
        details.
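
        As a small illustration with Python's standard ``csv`` module (unrelated
        to bioconvert's own readers), a quoted field keeps its embedded comma:

        .. code-block:: python

            import csv
            import io

            # The quoted description contains a comma but remains a single field.
            text = 'name,description\nseq1,"kinase, putative"\n'
            rows = list(csv.reader(io.StringIO(text)))
            print(rows)  # [['name', 'description'], ['seq1', 'kinase, putative']]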

    DSRC

        DSRC is a compression tool dedicated to FASTQ files.
        See the :ref:`format_dsrc` page for details.

    EMBL

        EMBL Flat File Format.
        See the :ref:`format_embl` page for details.

    FAA

        FASTA-formatted files containing amino acid (protein) sequences.
        See the :ref:`format_faa` page for details.
 
    FASTA

        FASTA-formatted files contain nucleic acid (e.g. DNA) or protein
        sequences. Each record starts with a header line beginning with ``>``,
        followed by one or more sequence lines, and a single file can store
        multiple records. See the :ref:`format_fasta` page for details.
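
        A minimal sketch of a multi-record FASTA reader that joins wrapped
        sequence lines. ``read_fasta`` is a hypothetical helper, not bioconvert's
        API; it keeps only the first word of each header as the identifier.

        .. code-block:: python

            import io

            def read_fasta(handle):
                """Yield (identifier, sequence) pairs from a FASTA stream (hypothetical helper)."""
                name, chunks = None, []
                for line in handle:
                    line = line.strip()
                    if not line:
                        continue
                    if line.startswith(">"):
                        # A new header closes the previous record.
                        if name is not None:
                            yield name, "".join(chunks)
                        # The identifier is the first word after '>' (empty if none).
                        name, chunks = (line[1:].split() or [""])[0], []
                    else:
                        # Sequence lines may be wrapped over several lines.
                        chunks.append(line)
                if name is not None:
                    yield name, "".join(chunks)

            text = ">seq1 first read\nACGT\nAC\n>seq2\nMKV\n"
            print(list(read_fasta(io.StringIO(text))))
            # [('seq1', 'ACGTAC'), ('seq2', 'MKV')]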

    FASTQ

        FASTQ-formatted files are used to represent high-throughput sequencing
        data. Each read is described by a name, its sequence and its per-base
        quality scores, usually on four lines (``@name``, sequence, ``+``,
        qualities). See the :ref:`format_fastq` page for details.
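
        A minimal sketch reading unwrapped, four-line FASTQ records and decoding
        the qualities. It assumes the Sanger / Illumina 1.8+ encoding (Phred score
        = ASCII code - 33); ``read_fastq`` is a hypothetical helper, not
        bioconvert's API.

        .. code-block:: python

            import io

            def read_fastq(handle):
                """Yield (name, sequence, phred_scores) from a 4-line-per-record FASTQ stream."""
                while True:
                    header = handle.readline()
                    if not header:
                        break
                    if not header.strip():
                        continue  # tolerate blank lines between records
                    sequence = handle.readline().rstrip("\n")
                    handle.readline()  # '+' separator line, optionally repeating the name
                    quality = handle.readline().rstrip("\n")
                    if not header.startswith("@") or len(sequence) != len(quality):
                        raise ValueError("malformed FASTQ record: " + header.strip())
                    name = (header[1:].split() or [""])[0]
                    # Assumed Phred+33 offset.
                    yield name, sequence, [ord(c) - 33 for c in quality]

            print(list(read_fastq(io.StringIO("@read1\nACGT\n+\nII#5\n"))))
            # [('read1', 'ACGT', [40, 40, 2, 20])]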

    GFA

        Graphical Fragment Assembly format, used to represent assembly graphs.
        The specification is available at https://github.com/GFA-spec/GFA-spec.

    GFF2

        General Feature Format (version 2), used to describe genes and other
        features associated with DNA, RNA and protein sequences.
        See the :ref:`format_gff` page for details.

    GFF3

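        General Feature Format (version 3), used to describe genes and other
        features associated with DNA, RNA and protein sequences. Each feature
        line has nine tab-separated columns, the last one holding ``key=value``
        attributes separated by semicolons. A description is available at
        http://genome.ucsc.edu/FAQ/FAQformat#format3.
        See the :ref:`format_gff` page for details.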
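
        A minimal sketch splitting one GFF3 feature line into its nine columns and
        decoding the attribute column. ``parse_gff3_line`` is a hypothetical
        helper; it assumes URL-escaped attribute values, as GFF3 specifies, and
        ignores multi-valued (comma-separated) attributes.

        .. code-block:: python

            from urllib.parse import unquote

            def parse_gff3_line(line):
                """Split a GFF3 feature line into selected columns and an attribute dict."""
                cols = line.rstrip("\n").split("\t")
                if len(cols) != 9:
                    raise ValueError("a GFF3 feature line has exactly 9 columns")
                seqid, _source, ftype, start, end, _score, strand, _phase, attributes = cols
                attrs = {}
                for item in attributes.split(";"):
                    if item:
                        key, _, value = item.partition("=")
                        # Reserved characters are URL-escaped in GFF3 (e.g. %3B for ';').
                        attrs[key] = unquote(value)
                # GFF3 coordinates are 1-based and inclusive.
                return {"seqid": seqid, "type": ftype, "start": int(start),
                        "end": int(end), "strand": strand, "attributes": attrs}

            line = "chr1\tdemo\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=ABC%3B1\n"
            print(parse_gff3_line(line)["attributes"])
            # {'ID': 'gene1', 'Name': 'ABC;1'}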

    GENBANK

        GenBank Flat File Format.
        See the :ref:`format_genbank` page for details.

    GZ

        **gzip** is a file compression program based on the DEFLATE algorithm.
        See the :ref:`format_gz` page for details.
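
        For illustration only, Python's standard ``gzip`` module handles this
        format in memory; the file name mentioned in the final comment is
        hypothetical.

        .. code-block:: python

            import gzip

            data = b">seq1\nACGTACGT\n"
            compressed = gzip.compress(data)
            # Every gzip stream starts with the two magic bytes 0x1f 0x8b, which is a
            # convenient way to detect gzipped input regardless of the file extension.
            print(compressed[:2] == b"\x1f\x8b")  # True
            assert gzip.decompress(compressed) == data

            # gzip.open(path, "rt") reads a .gz file as text, e.g. a hypothetical "reads.fastq.gz".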

    JSON

        JavaScript Object Notation, a lightweight, human-readable, text-based
        data-interchange format. See the :ref:`format_json` page for details.

    MAF

        Multiple Alignment Format, a human-readable format used to store
        multiple sequence alignments. See the :ref:`format_maf` page for details.

    NEXUS

        Plain-text format used to store multiple sequence alignments and
        phylogenetic trees. See the :ref:`format_nexus` page for details.

    NEWICK

        Minimal plain-text format used to store phylogenetic trees as nested
        parentheses, e.g. ``((A,B),C);``. See the :ref:`format_newick` page for
        details.

    PAF

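        PAF (Pairwise mApping Format) is a tab-delimited text format describing
        the approximate mapping positions between two sets of sequences. Each
        line has 12 mandatory columns, optionally followed by SAM-like tags.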
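
        A minimal sketch mapping the 12 mandatory PAF columns to named fields.
        The field labels and ``parse_paf_line`` are hypothetical names chosen for
        readability; only the column order follows the PAF description
        (query name/length/start/end, strand, target name/length/start/end,
        residue matches, alignment block length, mapping quality).

        .. code-block:: python

            # Illustrative labels, in the column order of the PAF format.
            PAF_COLUMNS = (
                "query_name", "query_length", "query_start", "query_end", "strand",
                "target_name", "target_length", "target_start", "target_end",
                "residue_matches", "block_length", "mapping_quality",
            )
            # Zero-based indices of the integer-valued mandatory columns.
            INT_COLUMNS = {1, 2, 3, 6, 7, 8, 9, 10, 11}

            def parse_paf_line(line):
                """Map the 12 mandatory PAF columns to named fields (hypothetical helper)."""
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 12:
                    raise ValueError("a PAF line has at least 12 columns")
                record = {
                    name: int(fields[i]) if i in INT_COLUMNS else fields[i]
                    for i, name in enumerate(PAF_COLUMNS)
                }
                # Columns 13+ are optional SAM-like tags such as "tp:A:P", kept as strings.
                record["tags"] = fields[12:]
                return record

            line = "read1\t1000\t10\t990\t+\tchr1\t50000\t2000\t2980\t900\t985\t60\ttp:A:P\n"
            rec = parse_paf_line(line)
            print(rec["target_name"], rec["target_start"], rec["mapping_quality"])
            # chr1 2000 60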

    PHYLIP

        Plain text format to store a multiple sequence alignment.
        See the :ref:`format_phylip` page for details.

    PHYLOXML

        XML-based format used to store phylogenetic trees and their associated
        data. See the :ref:`format_phyloxml` page for details.

    PLINK

        Flat (text) format used for analyzing genotypic data in Genome-Wide
        Association Studies (GWAS). See the :ref:`format_plink_flat` page for
        details.

    QUAL

        Quality scores associated with a sequence of nucleotides. Combined with
        the matching FASTA file, it allows the original FASTQ file to be rebuilt.
        See the :ref:`format_qual` page for details.
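
        A minimal sketch of how one FASTQ record can be rebuilt from a FASTA
        sequence and its QUAL line. It assumes whitespace-separated Phred
        integers in the QUAL file and the Sanger (+33) encoding in the output;
        ``to_fastq_record`` is a hypothetical helper, not bioconvert's converter.

        .. code-block:: python

            def to_fastq_record(name, sequence, qual_line):
                """Build one FASTQ record from a FASTA sequence and its QUAL scores."""
                scores = [int(q) for q in qual_line.split()]
                if len(scores) != len(sequence):
                    raise ValueError("sequence and quality lengths differ for " + name)
                # Encode each Phred score as one ASCII character with offset 33.
                quality = "".join(chr(q + 33) for q in scores)
                return "@{}\n{}\n+\n{}\n".format(name, sequence, quality)

            print(to_fastq_record("read1", "ACGT", "40 40 2 20"), end="")
            # @read1
            # ACGT
            # +
            # II#5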

    SAM

        Sequence Alignment Map is a generic nucleotide alignment format that
        describes the alignment of query sequences or sequencing reads to a
        reference sequence or assembly. Each alignment line has 11 mandatory
        tab-separated fields, including a bitwise FLAG. See the :ref:`format_sam`
        page for details.
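
        A minimal sketch decoding the bitwise FLAG of an alignment line (header
        lines starting with ``@`` are assumed to be filtered out beforehand). The
        flag labels and ``parse_sam_line`` are hypothetical names; the bit values
        follow the SAM specification.

        .. code-block:: python

            # FLAG bits from the SAM specification, with illustrative labels.
            SAM_FLAGS = {
                0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
                0x10: "reverse", 0x20: "mate_reverse", 0x40: "read1", 0x80: "read2",
                0x100: "secondary", 0x200: "qc_fail", 0x400: "duplicate",
                0x800: "supplementary",
            }

            def parse_sam_line(line):
                """Return a few mandatory fields and the decoded FLAG of a SAM line."""
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 11:
                    raise ValueError("a SAM alignment line has at least 11 fields")
                flag = int(fields[1])
                return {
                    "qname": fields[0],
                    "flags": [label for bit, label in SAM_FLAGS.items() if flag & bit],
                    "rname": fields[2],
                    "pos": int(fields[3]),  # 1-based leftmost mapping position
                    "mapq": int(fields[4]),
                    "cigar": fields[5],
                }

            line = "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII\n"
            print(parse_sam_line(line)["flags"])
            # ['paired', 'proper_pair', 'mate_reverse', 'read1']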

    SCF

        Standard Chromatogram Format, a binary chromatogram format described in
        the Staden package documentation.

    SRA

        The Sequence Read Archive (SRA) is an NCBI database of sequencing data,
        available at https://www.ncbi.nlm.nih.gov/sra. It is not a format per
        se. See the :ref:`format_sra` page for details.

    STOCKHOLM

        Stockholm is a multiple sequence alignment format (used, for example,
        by Pfam and Rfam) that can also carry per-sequence and per-column
        annotations. See the :ref:`format_stockholm` page for details.

    TSV

        The tab-separated values (TSV) format is a delimited text format that
        uses a tab character to separate values. See the :ref:`format_tsv`
        format page for details.
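
        A minimal sketch converting CSV text to TSV with Python's standard
        ``csv`` module; ``csv_to_tsv`` is a hypothetical helper, not
        bioconvert's converter. Parsing with a real CSV reader, rather than
        replacing commas, keeps quoted commas inside their field.

        .. code-block:: python

            import csv
            import io

            def csv_to_tsv(csv_text):
                """Convert CSV text to TSV text (hypothetical helper)."""
                out = io.StringIO()
                writer = csv.writer(out, delimiter="\t", lineterminator="\n")
                for row in csv.reader(io.StringIO(csv_text)):
                    writer.writerow(row)
                return out.getvalue()

            # The quoted "1,5" stays one field; it needs no quoting once tabs are the delimiter.
            print(csv_to_tsv('id,value\nA,"1,5"\n'), end="")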

    TWOBIT

        A **2bit** file stores multiple DNA sequences (up to 4 Gb in total) in a
        compact, randomly accessible format, encoding each nucleotide on 2 bits.
        The file also records blocks of N and soft-masked (lowercase) regions,
        as well as the DNA itself. See the :ref:`format_twobit` format page for
        details.
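
        A minimal sketch of the packing idea only, assuming the UCSC 2bit
        nucleotide codes (T=0, C=1, A=2, G=3) with four bases per byte and the
        first base in the most significant bits. The file header, index, N blocks
        and mask blocks are deliberately omitted; ``pack_2bit`` is a hypothetical
        helper.

        .. code-block:: python

            # Nucleotide codes used by the UCSC 2bit format.
            CODES = {"T": 0b00, "C": 0b01, "A": 0b10, "G": 0b11}

            def pack_2bit(sequence):
                """Pack DNA into bytes, 4 bases per byte, first base in the high bits.

                N is not supported here, and lowercase is packed like uppercase: a real
                2bit file stores N blocks and mask blocks separately.
                """
                out = bytearray()
                for i in range(0, len(sequence), 4):
                    chunk = sequence[i:i + 4].upper()
                    byte = 0
                    for base in chunk:
                        byte = (byte << 2) | CODES[base]
                    # Pad an incomplete final byte with zero bits on the right.
                    byte <<= 2 * (4 - len(chunk))
                    out.append(byte)
                return bytes(out)

            print(pack_2bit("ACGTA").hex())  # 9c80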

    VCF

        Variant Call Format (VCF) is a flexible and extensible format for
        storing sequence variation such as single nucleotide variants,
        insertions/deletions, copy number variants and structural variants.
        Each data line starts with 8 fixed tab-separated columns (CHROM, POS,
        ID, REF, ALT, QUAL, FILTER, INFO). See the :ref:`format_vcf` page for
        details.
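
        A minimal sketch parsing the 8 fixed columns of one VCF data line (meta
        and header lines starting with ``#`` are assumed to be skipped
        beforehand; genotype columns are ignored). ``parse_vcf_line`` is a
        hypothetical helper, not bioconvert's reader.

        .. code-block:: python

            def parse_vcf_line(line):
                """Parse the 8 fixed columns of a VCF data line; INFO becomes a dict."""
                chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
                info_dict = {}
                if info != ".":
                    for entry in info.split(";"):
                        key, sep, value = entry.partition("=")
                        # Flag entries (no '=') are stored as True.
                        info_dict[key] = value if sep else True
                return {
                    "chrom": chrom,
                    "pos": int(pos),  # 1-based position of the first REF base
                    "id": vid,
                    "ref": ref,
                    "alt": alt.split(","),  # several alternate alleles are comma-separated
                    "qual": None if qual == "." else float(qual),
                    "filter": filt,
                    "info": info_dict,
                }

            rec = parse_vcf_line("chr1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=14;DB\n")
            print(rec["alt"], rec["info"])
            # ['G', 'T'] {'DP': '14', 'DB': True}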

    WIG

        Abbreviation for the wiggle format (see :term:`WIGGLE`). See the
        :ref:`format_wig` page for details.

    WIGGLE

        The wiggle (WIG) format stores dense, continuous data such as GC
        percent, probability scores and transcriptome data, organized in
        ``fixedStep`` or ``variableStep`` sections. See the :ref:`format_wig`
        page for details.
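
        A minimal sketch expanding ``fixedStep`` and ``variableStep`` sections
        into (chromosome, position, value) triples. Positions are 1-based as in
        the wiggle format; the optional ``span`` parameter is ignored, and
        ``read_wiggle`` is a hypothetical helper, not bioconvert's reader.

        .. code-block:: python

            import io

            def read_wiggle(handle):
                """Yield (chrom, position, value) from fixedStep/variableStep wiggle data."""
                mode, chrom, pos, step = None, None, 0, 1
                for line in handle:
                    line = line.strip()
                    if not line or line.startswith(("#", "track", "browser")):
                        continue
                    if line.startswith(("fixedStep", "variableStep")):
                        words = line.split()
                        mode = words[0]
                        # Declaration parameters look like key=value (chrom=, start=, step=).
                        params = dict(word.split("=", 1) for word in words[1:])
                        chrom = params["chrom"]
                        if mode == "fixedStep":
                            pos, step = int(params["start"]), int(params.get("step", 1))
                        continue
                    if mode == "fixedStep":
                        # One value per line, at regularly spaced positions.
                        yield chrom, pos, float(line)
                        pos += step
                    elif mode == "variableStep":
                        # Each line gives an explicit position and a value.
                        position, value = line.split()
                        yield chrom, int(position), float(value)

            text = ("fixedStep chrom=chr1 start=11 step=10\n0.5\n0.7\n"
                    "variableStep chrom=chr2\n300 1.5\n")
            print(list(read_wiggle(io.StringIO(text))))
            # [('chr1', 11, 0.5), ('chr1', 21, 0.7), ('chr2', 300, 1.5)]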

    XLS

        Legacy binary spreadsheet format of Microsoft Excel (the default before
        Excel 2007). See the :ref:`format_xls` page for details.

    XLSX

        Spreadsheet file format defined in the Office Open XML specification.
        See :ref:`format_xlsx` page for details.


    XMFA

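        Extended Multi-FASTA format, used notably by the Mauve genome aligner to
        store multiple genome alignments.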

    YAML

        A human-readable data serialization language commonly used in
        configuration files. See the :ref:`format_yaml` page for details and
        https://en.wikipedia.org/wiki/YAML for a general description.