Bioinformatics

Molecular biology on the Internet


    Sequence comparisons on the Internet:

    Dot matrices

  • Examples of DotPlot Visualization Technique
  • DotPlot @ Univ. Düsseldorf, Germany
  • DotPlot: Dot matrix comparison of two sequences (does not work in Macintosh browsers!)
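
The idea behind the dot-matrix tools above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular server: it assumes identity scoring (no substitution matrix), a sliding window of length `window`, and a dot wherever at least `threshold` residues in the window are identical. The function names `dot_plot` and `render` are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=1, threshold=1):
    """Return the set of (i, j) start positions where the window
    seq_a[i:i+window] vs seq_b[j:j+window] has >= threshold identities."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # count identical residues position by position inside the window
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the matrix as text: rows follow seq_a, columns follow seq_b."""
    lines = ["  " + seq_b]
    for i, residue in enumerate(seq_a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    a, b = "GATTACA", "GATCACA"
    print(render(a, b, dot_plot(a, b)))
```

Runs of dots along a diagonal mark similar segments; increasing `window` and `threshold` suppresses isolated random matches, which is the usual way to reduce noise for longer protein sequences.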

    Classic algorithms

  • Needleman & Wunsch at EBI: Global alignment
  • Smith & Waterman at EBI: Local alignment
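
Both classic algorithms fill a dynamic-programming matrix; the local (Smith-Waterman) variant differs in that no cell may fall below zero and the best cell anywhere in the matrix is reported, whereas the global (Needleman-Wunsch) score is read from the bottom-right cell. The sketch below is a simplification under stated assumptions: a plain match/mismatch score and a linear gap penalty, whereas the EBI tools use substitution matrices such as BLOSUM62 together with separate gap-opening and gap-extension penalties. It returns only the optimal score, not the alignment; `align_score` is a hypothetical name.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Optimal alignment score of a and b.
    local=False: Needleman-Wunsch (global); local=True: Smith-Waterman."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # global alignment: leading gaps are penalized along the borders
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if local:
                score = max(0, score)  # local: restart instead of going negative
                best = max(best, score)
            H[i][j] = score
    return best if local else H[-1][-1]


if __name__ == "__main__":
    print(align_score("GATTACA", "GCATGCG"))
    print(align_score("GATTACA", "GCATGCG", local=True))
```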

    Pairwise sequence comparisons

  • BCM: Pairwise sequence comparisons: SIM, (ALIGN/LALIGN), BLAST2, LAP2, PGWISE, PCWISE
  • ExPASy: Sequence comparisons: SIM, LALIGN (see below), Dotlet
  • Sequence analysis Tools at EBI
    • EMBOSS Pairwise Alignment Algorithms (global and local)
    • Wise2 (basic) compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshifting errors
    • Wise2 (advanced)
    • Wise2 DNA Block Aligner aligns two DNA sequences, assuming that they share a number of collinear blocks of conservation separated by stretches of DNA whose lengths may be large and may differ between the two sequences
    • Wise2 PromoterWise compares two DNA sequences allowing for inversions and translocations, ideal for promoters

  • ALIGN at Genestream
  • BLAST 2 at NIH: Comparison of two sequences
  • BLAST 2 at Genestream
  • GLASS at MIT: GLobal Alignment SyStem
  • LALIGN at EMBnet: Finds multiple matching subsegments in two sequences (SIM-based code)
  • LALIGN at Genestream
  • LFASTA at PBIL: Local Alignment Tool for Nucleic Sequences
  • Palign (Stockholm): Protein alignment
  • SIM at SIB/ExPASy: Alignment of two protein sequences with SIM, results can be viewed with LALNVIEW
  • SIM4 at PBIL: Program to align cDNA and genomic DNA
  • ToPlign (Toolbox for Protein ALignment) at Fraunhofer: Pairwise and multiple sequence comparison
    • ToPlign: Login at BioSolveIT (OUTDATED)

  • PRSS3 at EMBnet: evaluates the significance of a protein sequence alignment
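
The shuffling idea behind PRSS-style significance testing can be sketched as follows: the real alignment score is compared with the scores obtained after repeatedly shuffling one sequence, which keeps its residue composition but destroys its order. This is a simplified empirical version under assumed match/mismatch/gap values; it does not reproduce the scoring or statistics of PRSS itself. The names `local_score` and `shuffle_p_value` are hypothetical.

```python
import random


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score with a linear gap penalty (simplified scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffle_p_value(a, b, n_shuffles=200, seed=0):
    """Return (observed score, empirical p-value): the fraction of shuffled
    versions of b that score at least as well against a as the real b."""
    rng = random.Random(seed)
    observed = local_score(a, b)
    residues = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition as b, random order
        if local_score(a, "".join(residues)) >= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_shuffles + 1)
```

A small p-value means that few shuffled sequences reach the observed score, i.e. the similarity is unlikely to be explained by composition alone.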

    Multiple sequence comparisons

  • BCM: Multiple sequence comparisons: ClustalW, CAP, MAP, PIMA, MSA, BLOCK MAKER, MEME, Match-Box
  • ExPASy: Sequence comparisons: ClustalW, KALIGN, MAFFT, Muscle, T-Coffee, MSA, DIALIGN, Match-Box, Multalin, MUSCA (see below)

  • ClustalW at EBI
  • ClustalW at PBIL
  • ClustalW at MyHits/SIB
  • ClustalW at EMBnet

  • COBALT at NIH: Multiple alignment incorporating pairwise constraints
  • T-Coffee family of programs at CNRS Marseille
  • T-Coffee family of programs at EBI
  • T-Coffee family of programs at SIB
  • T-Coffee at BioAssist Wageningen
  • DIALIGN at Bielefeld University: Multiple sequence alignment based on segment-to-segment comparison
  • Kalign at EBI: A fast and accurate multiple sequence alignment algorithm
  • Kalign at Karolinska
  • MAFFT at EBI: Multiple Alignment using Fast Fourier Transform
  • MAFFT at MyHits/SIB
  • MAFFT at Kyushu University
  • Match-Box at University of Namur (only proteins!)
  • MSA at Genestream: Multiple Sequence Alignment
  • Multalin at INRA
  • Multalin at PBIL (only proteins!)
  • MUSCA at IBM: Multiple sequence alignment using pattern discovery (only proteins!)
  • MUSCLE at Drive5: MUltiple Sequence Comparison by Log-Expectation
  • MUSCLE at EBI
  • MUSCLE at BioAssist Wageningen
  • PRALINE: PRofile ALIgNEment, at Vrije Universiteit Amsterdam
  • PROBCONS: Probabilistic Consistency-based Multiple Alignment of Amino Acid Sequences, at Stanford University

  • SAGA: Sequence Alignment by Genetic Algorithm (software for download)
  • SATCHMO at Drive5: Simultaneous Alignment and Tree Construction using Hidden Markov mOdels (software for download)
  • SSMAL at DKFZ: Shuffled Similarities with Multiple ALignments (software for download) (OUTDATED)
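
A widely used way to score a multiple alignment is the sum-of-pairs score: every column contributes the pairwise scores of all residue pairs it contains. The sketch below assumes simple match/mismatch/gap values instead of a substitution matrix, scores gap-against-gap as zero, and ignores affine gap costs; the programs listed above use their own, often more elaborate objective functions (for example the consistency-based COFFEE objective). `sum_of_pairs` is a hypothetical name.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment given as equal-length
    rows, with '-' marking gaps."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for col in range(length):
        # every unordered pair of rows in this column
        for x, y in combinations((row[col] for row in alignment), 2):
            if x == "-" and y == "-":
                continue  # gap against gap: scored 0 here
            elif x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total
```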

    Profile comparisons

  • COACH at Drive5: COmparison of Alignments by Constructing HMMs (software for download)
  • COMPASS at University of Texas: COmparison of Multiple Protein Alignments with Assessment of Statistical Significance
  • FFAS03: Fold and Function Assignment System

    Genome comparisons:

  • TaxPlot
  • GRAPe at University of Oxford: Probabilistic whole-genome re-alignment (for download)

    Similarity matrices:

  • PAM250: Percent Accepted Mutation matrix (Dayhoff et al., 1978)
  • BLOSUM62: Blocks Substitution Matrix (Henikoff and Henikoff, 1992)
  • More similarity matrices
    • Help on the proper use of similarity matrices
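
PAM250 and BLOSUM62 are both log-odds matrices. In the general form below, q_ab is the frequency with which residues a and b are observed aligned in the reference data (closely related sequences for PAM, conserved blocks for BLOSUM), p_a and p_b are the background frequencies of the two residues, and lambda is a scale factor chosen so that scores can be rounded to integers; BLOSUM scores are given in half-bit units, which corresponds to 1/lambda = 2/ln 2. The score S of an ungapped alignment of sequences x and y of length L is simply the sum of the matrix entries; gaps are penalized separately (compare the gap-open and gap-extension penalties in the SIM example).

```latex
s(a,b) = \frac{1}{\lambda}\,\ln\frac{q_{ab}}{p_a\,p_b},
\qquad
S = \sum_{i=1}^{L} s(x_i, y_i)
```

A positive entry means that a pair is seen aligned more often than expected by chance, a negative entry that it is seen less often.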

    Software for displaying a pairwise sequence alignment:

  • LalnView at PBIL
  • LalnView at SIB/ExPASy

    Software to generate sequence logos:

  • plogo: Protein sequence logos at CBS/Denmark
  • slogo: RNA structure logos at CBS/Denmark
  • GENIO/logo: Sequence logos
  • WebLogo: Sequence logos at Berkeley
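
In a sequence logo, the total height of each stack is the information content of that alignment column, and each letter's height is its frequency times that value. The sketch below computes both for one column; it assumes gaps are simply ignored and omits the small-sample correction that logo servers typically apply, so values for columns with few sequences will be overestimated. `logo_column` is a hypothetical name; use `alphabet_size=4` for nucleotides and 20 for proteins.

```python
import math
from collections import Counter


def logo_column(column, alphabet_size=20):
    """Return (information content in bits, {residue: letter height})
    for one alignment column; gap characters '-' are ignored."""
    counts = Counter(c for c in column.upper() if c != "-")
    n = sum(counts.values())
    if n == 0:
        return 0.0, {}
    freqs = {res: k / n for res, k in counts.items()}
    # Shannon entropy of the column in bits
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    info = math.log2(alphabet_size) - entropy
    heights = {res: f * info for res, f in freqs.items()}
    return info, heights


if __name__ == "__main__":
    print(logo_column("AAAT", alphabet_size=4))
```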

    Short online course about sequence comparisons:

  • Online course from "Biochemistry" (Jeremy M. Berg, John L. Tymoczko, Lubert Stryer; ISBN: 0-7167-3051-0)

    Example of a pairwise sequence alignment:

  • Beta-globin:
    mvhltpeeks avtalwgkvn vdevggealg rllvvypwtq rffesfgdls tpdavmgnpk
    vkahgkkvlg afsdglahld nlkgtfatls elhcdklhvd penfrllgnv lvcvlahhfg
    keftppvqaa yqkvvagvan alahkyh
    

  • Myoglobin:
    mglsdgewql vlnvwgkvea dipghgqevl irlfkghpet lekfdkfkhl ksedemkase
    dlkkhgatvl talggilkkk ghheaeikpl aqshatkhki pvkylefise ciiqvlqskh
    pgdfgadaqg amnkalelfr kdmasnykel gfqg
    

  • SIM alignment of beta-globin and myoglobin, computed with two different gap-open penalties:

    Gap open penalty: 24
    Gap extension penalty:  4
    Comparison Matrix: BLOSUM62 
    ------------------------------------------------------------------------
    23.8% identity in 122 residues overlap; Score: 91.0; Gap frequency: 0.0%
    
    beta-Globin   25 GGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKG
    Myoglobin     26 GQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGHHEA
                     * * * **    * *   *  *  *   *        * **  ** *    *        
    
    beta-Globin   85 TFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAH
    Myoglobin     86 EIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMAS
                         *   *  *                 **       *    * *  *         * 
    
    beta-Globin  145 KY
    Myoglobin    146 NY
                      *
    ------------------------------------------------------------------------
    40.9% identity in 22 residues overlap; Score: 38.0; Gap frequency: 0.0%
    
    beta-Globin    4 LTPEEKSAVTALWGKVNVDEVG
    Myoglobin      3 LSDGEWQLVLNVWGKVEADIPG
                     *   *   *   ****  *  *
    ------------------------------------------------------------------------
    

    Gap open penalty: 12
    Gap extension penalty:  4
    Comparison Matrix: BLOSUM62 
    ------------------------------------------------------------------------
    25.5% identity in 145 residues overlap; Score: 103.0; Gap frequency: 1.4%
    
    beta-Globin    4 LTPEEKSAVTALWGKVNVDEVGG--EALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV
    Myoglobin      3 LSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDL
                     *   *   *   ****  *  *   * * **    * *   *  *  *   *        
    
    beta-Globin   62 KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK
    Myoglobin     63 KKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPG
                     * **  ** *    *            *   *  *                 **      
    
    beta-Globin  122 EFTPPVQAAYQKVVAGVANALAHKY
    Myoglobin    123 DFGADAQGAMNKALELFRKDMASNY
                      *    * *  *         *  *
    ------------------------------------------------------------------------
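
The identity figures in the SIM output can be recomputed from the aligned rows. The sketch below assumes that identity and gap frequency are taken over all alignment columns, gaps included; this is consistent with the numbers shown (9 identical columns out of 22 give 40.9%, and 2 gap columns out of 145 give 1.4%), but it is an inference from the output, not SIM's documented definition. `alignment_stats` is a hypothetical name.

```python
def alignment_stats(row_a, row_b, gap="-"):
    """Return (percent identity, percent gap columns, match line) for two
    aligned rows; percentages are over all alignment columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(row_a, row_b))
    columns = len(pairs)
    identical = sum(x == y and x != gap for x, y in pairs)
    gapped = sum(x == gap or y == gap for x, y in pairs)
    # '*' under identical residues, as in the SIM printout
    match_line = "".join("*" if x == y and x != gap else " " for x, y in pairs)
    return 100 * identical / columns, 100 * gapped / columns, match_line


if __name__ == "__main__":
    identity, gaps, line = alignment_stats(
        "LTPEEKSAVTALWGKVNVDEVG", "LSDGEWQLVLNVWGKVEADIPG"
    )
    print(f"{identity:.1f}% identity, {gaps:.1f}% gaps")
    print(line)
```

Applied to the short 22-residue alignment above, this reproduces the reported 40.9% identity and the same row of asterisks.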
    
    Sequence examples:

  • Database of amino acid sequences via Entrez
  • Database of nucleotide sequences via Entrez

  • Bacteriorhodopsin from Halobacterium salinarium: Seven-helix bundle protein
  • TonB from Escherichia coli: Protein with an N-terminal transmembrane helix
  • Maltose-binding protein from Escherichia coli: Protein with an N-terminal signal sequence for secretion into the periplasmic space
  • OmpA from Escherichia coli: Two-domain protein: the N-terminal domain is embedded in the outer membrane in the form of an 8-stranded β barrel, while the C-terminal domain is located in the periplasmic space

    Abbreviations:

  • AMAS: Analyse Multiply Aligned Sequences
  • BCM: Baylor College of Medicine
  • BEAUTY: BLAST Enhanced Alignment Utility
  • BLAST: Basic Local Alignment Search Tool
  • BLOSUM: Blocks Substitution Matrix
  • CINEMA: Color INteractive Editor for Multiple Alignments
  • COACH: COmparison of Alignments by Constructing HMMs
  • COFFEE: Consistency based Objective Function For alignmEnt Evaluation
  • COMPASS: COmparison of Multiple Protein Alignments with Assessment of Statistical Significance
  • ExPASy: Expert Protein Analysis System
  • FFAS03: Fold and Function Assignment System
  • MAFFT: Multiple Alignment using Fast Fourier Transform
  • MSA: Multiple Sequence Alignment
  • NIH: National Institutes of Health
  • PAM: Percent Accepted Mutation
  • PIMA: Pattern-Induced Multiple-sequence Alignment program
  • PRALINE: PRofile ALIgNEment
  • SAGA: Sequence Alignment by Genetic Algorithm
  • SSMAL: Shuffled Similarities with Multiple ALignments

 

Latest update: October 14, 2009


Ralf Koebnik
Institut de recherche pour le développement
UMR 5096, CNRS-UP-IRD
911, Avenue Agropolis, BP 64501
34394 Montpellier, Cedex 5
FRANCE
Phone: +33 (0)4 67 41 62 28
Fax: +33 (0)4 67 41 61 81
Email: koebnik(at)gmx.de
Please replace (at) with @.

