9. Glossary
Note that each format mentioned below has a dedicated description in the Formats section.
- ABI
File format produced by Applied Biosystems (ABI) sequencing instruments. It contains the trace (chromatogram) data, which records the signal for each of the four nucleotides. See the ABI format page for details.
- ASQG
The ASQG format describes an assembly graph. Each line is a tab-delimited record. The first field in each record describes the record type. See the ASQG page for details.
- BAI
Index file associated with a BAM file, enabling fast random access to its alignments. See the BAI page for details.
- BAM
Binary version of the Sequence Alignment Map (SAM) format. See the BAM format page for details.
- BCF
Binary version of the Variant Call Format (VCF). See the BCF page for details.
- BCL
BCL (base call) is the raw format produced by Illumina sequencers. See the BCL format page for details.
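- BED
Browser Extensible Data format: a line-oriented, tab-delimited text format describing genomic intervals. The first three columns (chromosome, start and end positions) are required; further columns are optional. See the BED format page for details.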
- BED3
Variant of the BED format with only the 3 required columns: chromosome name, start and end positions. See the BED3 format page for details.
- BED4
Variant of the BED format with 4 columns storing the chromosome name, start and end positions, and values. See the BED4 format page for details.
- BEDGRAPH
The bedGraph format is line-oriented and allows display of continuous-valued data, with one value per genomic interval (chromosome, start, end, value). It is similar to the WIG format. See the BED format page for details.
- BIGBED
Indexed binary version of a BED file. See the BIGBED page for details.
- BIGWIG
Indexed binary version of the Wiggle format. See the BIGWIG page for details.
- BPLINK
Binary version of the PLINK format used for analyzing genotypic data for Genome-wide Association Studies (GWAS). See the PLINK binary files (BED/BIM/FAM) page for details.
- BZ2
bzip2 is a file compression program based on the Burrows–Wheeler transform. The usual extension is .bz2. See the BZ2 page for details.
- CLUSTAL
The alignment format of Clustal W and Clustal X. See the CLUSTAL page for details.
- COV
A bioconvert format that stores coverage as a 3-column, tab-delimited file. See the COV page for details.
- CRAM
A more compact alternative to BAM for storing Sequence Alignment Map (SAM) data, achieving higher compression through reference-based encoding. See the CRAM page for details.
- CSV
Comma-separated values: a delimited text format that uses a comma to separate values. See the CSV format page for details.
- DSRC
A compression tool dedicated to FASTQ files. See the DSRC page for details.
- EMBL
EMBL Flat File Format. See the EMBL page for details.
- FAA
FASTA-formatted sequence files containing amino acid sequences. See the FAA page for details.
- FASTA
FASTA-formatted sequence files contain either nucleic acid sequence (such as DNA) or protein sequence information. FASTA files can also store multiple sequences in a single file. See FASTA page for details.
- FASTQ
FASTQ-formatted sequence files represent high-throughput sequencing data. Each read is described by a name, its sequence, and its per-base quality scores. See the FASTQ page for details.
- GENBANK
GenBank Flat File Format. See the GENBANK page for details.
- GFA
Graphical Fragment Assembly format, a tab-delimited text format describing assembly graphs. See the specification at https://github.com/GFA-spec/GFA-spec.
- GFF2
General Feature Format, version 2, used for describing genes and other features associated with DNA, RNA and protein sequences. See the GTF page for details.
- GFF3
General Feature Format, version 3, used for describing genes and other features associated with DNA, RNA and protein sequences. See http://genome.ucsc.edu/FAQ/FAQformat#format3 and the GTF page for details.
- GZ
gzip is a file compression program based on the DEFLATE algorithm. See the GZ page for details.
- JSON
JavaScript Object Notation, a lightweight, human-readable text format for data interchange built on key/value objects and arrays. See the JSON page for details.
- MAF
A human-readable multiple alignment format. See the MAF (Multiple Alignment Format) page for details.
- NEWICK
Minimal plain text format used to store phylogenetic trees as nested parentheses. See the NEWICK page for details.
- NEXUS
Block-structured plain text format used to store multiple alignments and phylogenetic trees. See the NEXUS page for details.
- PAF
The Pairwise mApping Format (PAF) is a tab-delimited text format describing the approximate mapping positions between two sets of sequences.
- PHYLIP
Plain text format used to store a multiple sequence alignment. See the PHYLIP page for details.
- PHYLOXML
XML format used to store phylogenetic trees and their associated data. See the PHYLOXML page for details.
- PLINK
Flat-file format used by PLINK to store genotypic data for Genome-wide Association Studies (GWAS). See the PLINK flat files (MAP/PED) page for details.
- QUAL
Stores the quality scores associated with a sequence of nucleotides. Combined with the corresponding FASTA file, it allows the original FASTQ file to be rebuilt. See the QUAL page for details.
- SAM
Sequence Alignment Map is a generic nucleotide alignment format that describes the alignment of query sequences or sequencing reads to a reference sequence or assembly. See the SAM page for details.
- SCF
Standard Chromatogram Format, a binary chromatogram format described in the Staden package documentation (SCF file format).
- SRA
The Sequence Read Archive (SRA) is a public repository of sequencing data hosted by NCBI at https://www.ncbi.nlm.nih.gov/sra. It is not a format per se. See the SRA page for details.
- STOCKHOLM
Stockholm format is a multiple sequence alignment format that can carry per-sequence and per-column annotations. It is used by databases such as Pfam and Rfam. See the STOCKHOLM page for details.
- TSV
Tab-separated values: a delimited text format that uses a tab character to separate values. See the TSV format page for details.
- TWOBIT
A 2bit file stores multiple DNA sequences (up to 4 Gb total) in a compact, randomly accessible format. The file contains masking information as well as the DNA itself. See the TWOBIT format page for details.
- VCF
Variant Call Format (VCF) is a flexible and extendable format for storing variation in sequences such as single nucleotide variants, insertions/deletions, copy number variants and structural variants. See the VCF page for details.
- WIG
Synonym for the wiggle format. See WIGGLE.
- WIGGLE
The wiggle (WIG) format stores dense, continuous data such as GC percent, probability scores, and transcriptome data. See the WIG page for details.
- XLS
Spreadsheet file format (Microsoft Excel file format). See the XLS page for details.
- XLSX
Spreadsheet file format defined in the Office Open XML specification. See the XLSX page for details.
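- XMFA
eXtended Multi-FastA format, used by Mauve to store multiple genome alignments as a series of aligned blocks.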
- YAML
A human-readable data serialization language commonly used in configuration files. See https://en.wikipedia.org/wiki/YAML and the YAML page for details.
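The FASTQ and QUAL entries above describe two related facts. First, a FASTQ record bundles a name, a sequence and per-base qualities. Second, a FASTA file together with its QUAL file is enough to rebuild the FASTQ. The minimal Python sketch below illustrates that round trip.

It is not bioconvert's implementation, and the function names are hypothetical. It assumes the following:

- FASTQ records are unwrapped and four lines long.
- Qualities use Phred+33 encoding.
- QUAL records hold space-separated integer scores on a single line.

```python
"""Round trip FASTQ -> FASTA + QUAL -> FASTQ (illustrative sketch only).

Assumptions (hypothetical, not bioconvert's API): unwrapped 4-line FASTQ
records, Phred+33 quality encoding, and single-line QUAL records holding
space-separated integer scores.
"""
from typing import Iterator, List, Tuple

Record = Tuple[str, str, List[int]]  # (name, sequence, Phred scores)


def read_fastq(text: str) -> Iterator[Record]:
    """Yield (name, sequence, scores) from unwrapped 4-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Phred+33: each quality character encodes the score ord(char) - 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def to_fasta_and_qual(records: List[Record]) -> Tuple[str, str]:
    """Split records into a FASTA string and a matching QUAL string."""
    fasta = "".join(f">{name}\n{seq}\n" for name, seq, _ in records)
    qual = "".join(
        f">{name}\n{' '.join(map(str, scores))}\n" for name, _, scores in records
    )
    return fasta, qual


def read_two_line(text: str) -> List[Tuple[str, str]]:
    """Parse '>name' header / single-line body pairs (FASTA or QUAL)."""
    lines = text.strip().splitlines()
    return [(lines[i][1:], lines[i + 1]) for i in range(0, len(lines), 2)]


def rebuild_fastq(fasta: str, qual: str) -> str:
    """Rebuild FASTQ text by pairing FASTA and QUAL records in file order."""
    seqs, quals = read_two_line(fasta), read_two_line(qual)
    if len(seqs) != len(quals):
        raise ValueError("FASTA and QUAL files hold different record counts")
    out = []
    for (name, seq), (qname, scores) in zip(seqs, quals):
        if name != qname:
            raise ValueError(f"record mismatch: {name!r} vs {qname!r}")
        # Re-encode each integer score as a Phred+33 character.
        encoded = "".join(chr(int(s) + 33) for s in scores.split())
        out.append(f"@{name}\n{seq}\n+\n{encoded}\n")
    return "".join(out)


if __name__ == "__main__":
    fastq = "@read1\nACGT\n+\nII#5\n"
    fasta, qual = to_fasta_and_qual(list(read_fastq(fastq)))
    print(qual, end="")  # prints ">read1" then "40 40 2 20"
    assert rebuild_fastq(fasta, qual) == fastq
```

The sketch pairs records by order and name. This mirrors the requirement that the QUAL file match its FASTA file record for record. Real files may wrap sequence lines or use other quality encodings, which this sketch does not handle.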