::Welcome to UKCS & Associates::


Pattern / Motif / Protein Family Finding Tools

  • InterProScan
InterPro is a database of protein families, domains, and functional sites that have been identified in known proteins. If an uncharacterized protein sequence contains one of these patterns, motifs, or family signatures, we can hypothesize that the protein has the corresponding function. InterProScan is the program that scans a query sequence against the signatures stored in the InterPro database. The InterPro member databases PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, and PIR SuperFamily are all searched by InterProScan in the same run.
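
To make the idea of scanning a sequence for a stored pattern concrete, here is a minimal sketch that covers only PROSITE-style patterns. It is not how InterProScan is implemented, and the member databases also use other signature types. The helper names prosite_to_regex and find_pattern are hypothetical, and the test sequence is invented for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            core = "[^" + core[1:-1] + "]"
        # [ABC] groups and single residues are already valid regex syntax.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_pattern(pattern: str, sequence: str):
    """Return (start, matched_text) for every hit, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern, scanned against a made-up sequence.
hits = find_pattern("N-{P}-[ST]-{P}.", "MKNGSAANPSL")
print(hits)
```

The second asparagine in the made-up sequence is followed by a proline, so the pattern rejects it and only the first site is reported.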

  • HMMER
HMMER performs sensitive database searches based on profile hidden Markov models. Given a query profile (a statistical description of a sequence family's consensus), HMMER searches an arbitrary sequence database and retrieves sequences that show significant similarity to that profile. Note that the query is a profile, which can be built from the output of a BLAST search, ClustalW, or other tools; it is not a single plain sequence.
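
The difference between a profile and a single sequence can be sketched with a position-specific scoring matrix. This is a deliberately simplified stand-in: a real profile hidden Markov model also has insert and delete states, which this ungapped sketch omits. The function names, the uniform background frequency, the pseudocount, and the toy alignment are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Return per-column log-odds scores from an ungapped alignment."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for col in range(len(aligned[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append(
            {aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS}
        )
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (score, start) of the best window."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Toy family alignment (invented): column 3 tolerates G or A.
family = ["HCGH", "HCAH", "HCGH"]
profile = build_profile(family)
score, start = best_hit(profile, "AAAHCGHAAA")
print(start, round(score, 2))
```

Because each column keeps its own residue preferences, the profile can reward a variable position differently from a strictly conserved one, which a single query sequence cannot do.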

  • Protein Clustering (GeneRAGE & TRIBE-MCL)
In many sequencing projects, researchers want to group the predicted proteins into families. The major difficulties in protein clustering are multi-domain proteins, peptide fragments, and proteins that contain promiscuous domains. This is a very active research field, and bioinformatics researchers have tried various algorithms to reduce the false positive rate. GeneRAGE uses the Smith-Waterman dynamic programming alignment algorithm. TRIBE-MCL uses the Markov Clustering (MCL) method. Users need to balance speed against sensitivity.
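
The Markov Clustering idea can be sketched in a few lines: alternate expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalizing) until the matrix stops changing. This is a minimal pure-Python sketch, not the TRIBE-MCL implementation. The similarity values are invented, and assigning each protein to its strongest attractor row is a simplification that forces non-overlapping clusters.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes from a square similarity matrix with a positive diagonal."""
    n = len(similarity)
    m = normalize_columns(similarity)
    for _ in range(max_iter):
        # Expansion: one more step of the random walk (matrix squaring).
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: strengthen strong flows, weaken weak ones.
        inflated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
        change = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if change < tol:
            break
    # Simplified interpretation: each node joins its strongest attractor row.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, []).append(j)
    return sorted(clusters.values())


# Two tight groups of three proteins joined by one weak similarity (invented).
similarity = [
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
]
print(mcl(similarity))
```

A higher inflation value tends to give more, smaller clusters, which is one of the knobs behind the speed and sensitivity trade-off mentioned above.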

  • Protein Structure Classification (SCOP & CATH)
Proteins can also be classified by their secondary (2D) and tertiary (3D) structure, which may reveal important structure-function relationships. It has been argued that nearly all proteins have structural similarities with other proteins and that, in some of these cases, they share a common evolutionary origin.
The SCOP (Structural Classification of Proteins) database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification. All PDB entries have been analyzed to build the SCOP database.
CATH is a hierarchical classification of protein domain structures that clusters proteins at four major levels: Class (C), Architecture (A), Topology (T), and Homologous superfamily (H). Class is derived from secondary structure content. Architecture describes the gross orientation of secondary structures, independent of their connectivities. The Topology level clusters structures according to their topological connections and numbers of secondary structures. The Homologous superfamily level clusters proteins with highly similar structures and functions. Structures are assigned to topology families and homologous superfamilies by sequence and structure comparisons.
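
The four-level hierarchy maps naturally onto a dotted four-number code, one number per level. The sketch below assumes that representation and reports the deepest level at which two domains still agree. The helper names are hypothetical, and the example codes are illustrative inputs that have not been checked against the CATH database.

```python
from typing import NamedTuple, Optional

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


class CathCode(NamedTuple):
    c: int  # Class
    a: int  # Architecture
    t: int  # Topology
    h: int  # Homologous superfamily


def parse_cath(code: str) -> CathCode:
    """Parse a dotted code such as '3.40.50.720' into its four levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers: " + code)
    return CathCode(*(int(part) for part in parts))


def deepest_shared_level(first: str, second: str) -> Optional[str]:
    """Return the name of the deepest level two codes share, or None."""
    shared = None
    for name, x, y in zip(LEVELS, parse_cath(first), parse_cath(second)):
        if x != y:
            break
        shared = name
    return shared


# Illustrative codes only: same C, A, and T numbers, different H number.
print(deepest_shared_level("3.40.50.720", "3.40.50.300"))
```

Two domains that agree down to the Topology level but differ at the last number share a fold without being grouped as homologs, which is the distinction the T and H levels are meant to capture.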

  • MACAW
MACAW is a program for locating, analyzing, and editing blocks of localized sequence similarity among multiple sequences and linking them into a composite multiple alignment. It uses a Gibbs sampling strategy for multiple alignment. As a result, it can only detect a pattern or motif that occurs once in the region of interest.
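
The once-per-region limitation follows from how a basic Gibbs motif sampler is set up: it keeps exactly one motif start position per sequence. The sketch below is a generic textbook-style site sampler, not MACAW's code. The function name, the pseudocount, the implicit uniform background, and the toy sequences are assumptions for illustration.

```python
import random


def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Sample one motif start per sequence; returns the list of starts."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    # State: a single start position per sequence, so repeats cannot be modeled.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for held_out, seq in enumerate(seqs):
            # Column counts from every other sequence's current site (pseudocount 1).
            counts = [{ch: 1.0 for ch in alphabet} for _ in range(width)]
            for idx, other in enumerate(seqs):
                if idx != held_out:
                    for col in range(width):
                        counts[col][other[starts[idx] + col]] += 1
            # Weight each candidate window by its probability under those counts.
            weights = []
            for pos in range(len(seq) - width + 1):
                weight = 1.0
                for col in range(width):
                    column = counts[col]
                    weight *= column[seq[pos + col]] / sum(column.values())
                weights.append(weight)
            # Resample the held-out sequence's site in proportion to the weights.
            starts[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Toy sequences (invented), each containing the word HCGH once.
sequences = ["AAKHCGHLLA", "HCGHKKAALL", "LLAAKKHCGH"]
print(gibbs_motif_search(sequences, width=4))
```

The sampler is stochastic, so the reported positions depend on the seed and the number of iterations. If a sequence contained the motif twice, this state representation could still record only one of the two copies.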