    A few basic PatScan rules demonstrate the power of the pattern language:

  • p1=20...30 3...8 ~p1
    • The first pattern unit p1=20...30 says "find a string of length 20 - 30 bp and call it p1".
    • The second pattern unit matches any string of length 3 - 8 bp.
    • The third pattern unit ~p1 matches only the reverse complement of whatever was named p1.
  • p1=20...30 3...8 ~p1[2,0,0]
    • The numbers in [brackets] give the maximum number of mismatches, insertions, and deletions allowed, in that order. Here the reverse complement of p1 may contain up to two mismatches, but no insertions or deletions.
  • p1=4...4 p1[1,0,0] p1[1,0,0] p1[1,0,0] p1[1,0,0] p1[1,0,0]
    • This example searches for six tandem repeats, each four nucleotides long. One mismatch is allowed between the initial occurrence and each of the five that follow.
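The rules above can be approximated in a few lines of Python. This is a minimal brute-force sketch, not PatScan's own implementation: it assumes a DNA alphabet (A, C, G, T), handles mismatches only (no insertions or deletions), and the function names `find_hairpins` and `find_tandem_repeats` are hypothetical names chosen for illustration.

```python
# Complement table for DNA; assumes upper-case A, C, G, T only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_hairpins(seq, stem=(20, 30), loop=(3, 8), max_mismatches=0):
    """Sketch of `p1=20...30 3...8 ~p1[m,0,0]`.

    Yields (start, p1, spacer, p3) for every place where p3 equals the
    reverse complement of p1 with at most `max_mismatches` mismatches.
    """
    for start in range(len(seq)):
        for stem_len in range(stem[0], stem[1] + 1):
            p1 = seq[start:start + stem_len]
            if len(p1) < stem_len:
                break  # ran off the end; longer stems cannot fit either
            target = reverse_complement(p1)
            for loop_len in range(loop[0], loop[1] + 1):
                p3_start = start + stem_len + loop_len
                p3 = seq[p3_start:p3_start + stem_len]
                if len(p3) < stem_len:
                    break  # longer spacers cannot fit either
                if hamming(p3, target) <= max_mismatches:
                    yield start, p1, seq[start + stem_len:p3_start], p3


def find_tandem_repeats(seq, unit_len=4, copies=6, max_mismatches=1):
    """Sketch of `p1=4...4 p1[1,0,0] ... p1[1,0,0]` (five follow-up copies).

    Each later copy is compared with the first unit, not with its neighbour.
    """
    total = unit_len * copies
    for start in range(len(seq) - total + 1):
        unit = seq[start:start + unit_len]
        later = (
            seq[start + i * unit_len:start + (i + 1) * unit_len]
            for i in range(1, copies)
        )
        if all(hamming(unit, copy) <= max_mismatches for copy in later):
            yield start, unit
```

For example, `list(find_hairpins("GATTACATTTTTGTAATC", stem=(7, 7)))` reports the stem GATTACA, the spacer TTTT, and the pairing strand TGTAATC. The exhaustive scan is kept for readability; it is far slower than a dedicated pattern scanner on genome-sized input.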

    The PatScan pattern language makes it quite easy to search for complex structures such as repeats, hairpins and pseudoknots. PatScan can also specify weight matrices and search for alternative constructs. Here is an example of a more finely tuned pattern (SRP RNA):

  • Pairing rules:
  • r1={au,ua,gc,cg,gu,ug} r2={au,ua,gc,cg,gu,ug,ag,ga}
  • Pattern:
  • p1=5...5 BNA 1...2 p2=3...3 YNAGK[1,1,0] p3=3...3 G (YYYY|NR) A r1~p3 AGCAG[1,0,0] r2~p2[0,0,0] 2...3 r1~p1[1,2,1]
    • Ambiguity codes represent several possible nucleotides (for example, Y is a pyrimidine, R is a purine, and N is any nucleotide).
    • r1={au,ua,gc,cg,gu,ug} gives a non-standard notion of complementarity (allowing UG and GU pairs).
    • The pattern unit (YYYY|NR) provides two alternative subpatterns for matching. It is equivalent to saying "either YYYY or NR".
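The three ideas in this list (ambiguity codes, custom pairing rules, and alternatives) can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not PatScan's implementation: U and T are treated as the same base, each two-letter pair is read as "base in the named unit, then base in the pairing strand", and the helper names `matches_codes`, `matches_alternatives` and `pairs_with` are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Pairing rules copied from the SRP RNA example above.
R1 = {"au", "ua", "gc", "cg", "gu", "ug"}
R2 = R1 | {"ag", "ga"}


def matches_codes(pattern, seq):
    """True if `seq` matches `pattern` position by position (no errors allowed)."""
    seq = seq.upper().replace("U", "T")
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern.upper(), seq)
    )


def matches_alternatives(alternatives, seq):
    """True if `seq` matches any alternative, e.g. ["YYYY", "NR"] for (YYYY|NR)."""
    return any(matches_codes(alt, seq) for alt in alternatives)


def pairs_with(named, candidate, rule):
    """True if `candidate` is the reverse 'complement' of `named` under `rule`.

    The first base of `named` pairs with the last base of `candidate`,
    the second with the second-to-last, and so on.
    """
    named = named.lower().replace("t", "u")
    candidate = candidate.lower().replace("t", "u")
    return len(named) == len(candidate) and all(
        a + b in rule for a, b in zip(named, reversed(candidate))
    )
```

With these helpers, `pairs_with("ggu", "gcc", R1)` holds because the final u of the named unit is allowed to pair with g, whereas the same call with a strict Watson-Crick rule `{"au", "ua", "gc", "cg"}` fails.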


Latest update of content: October 4, 2004

