
Summary

The goal of protHMM is to help integrate profile hidden Markov model (HMM) representations of proteins into machine learning and bioinformatics workflows. protHMM ports a number of features originally developed for Position Specific Scoring Matrices (PSSMs) to HMMs and also implements features designed specifically for HMMs, which to our knowledge has not been done before. The adoption of HMM representations of proteins derived from HHblits and HMMER also presents an opportunity for innovation: it has been shown that HMMs can benefit from better multiple sequence alignments than PSSMs and thus achieve better results than corresponding PSSMs when similar feature extraction techniques are used (Lyons et al. 2015). protHMM implements 20 different feature extraction techniques to provide a comprehensive set of features for bioinformatics tasks ranging from protein fold classification to protein-protein interaction prediction.

Installation

You can install the development version of protHMM from GitHub with:

# install.packages("devtools")
devtools::install_github("semran9/protHMM")

protHMM can also be installed from CRAN:

install.packages("protHMM", repos = "http://cran.us.r-project.org")

Functions List

A more comprehensive list of functions and the calculations behind the functions can be found here.

hmm_ac()

hmm_bigrams()

hmm_cc()

chmm()

hmm_distance()

fp_hmm()

hmm_GA()

hmm_GSD()

IM_psehmm()

hmm_LBP()

hmm_LPC()

hmm_MA()

hmm_MB()

pse_hmm()

hmm_read()

hmm_SCSH()

hmm_SepDim()

hmm_Single_Average()

hmm_smooth()

hmm_svd()

hmm_trigrams()

Example

## This example shows hmm_distance(), which calculates a distance score between two proteins;
## a lower score indicates more similar proteins.
## Other functions are documented fully in the protHMM vignette.
library(protHMM)
## These two proteins are from the same fold and are similar, so h should be low.
h <- hmm_distance(
  system.file("extdata", "1DLHA2-7", package = "protHMM"),
  system.file("extdata", "1TEN-7", package = "protHMM")
)
## These two proteins are from different folds and are not similar, so h_2 should be high.
h_2 <- hmm_distance(
  system.file("extdata", "1DLHA2-7", package = "protHMM"),
  system.file("extdata", "1TAHA-23", package = "protHMM")
)
h < h_2
#> [1] TRUE
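
The comparison above can be extended to every pair of the bundled example profiles. The following is a minimal sketch, not part of protHMM: pairwise_scores() is a hypothetical helper written for this illustration, and it assumes that hmm_distance() accepts two file paths and returns a single number, as the example above suggests. The call to protHMM is guarded so that the helper can be tried on its own.

```r
# Hypothetical helper (not part of protHMM): apply a two-argument scoring
# function to every unordered pair of named file paths.
pairwise_scores <- function(paths, score_fn) {
  # All unordered pairs of names, e.g. list(c("a", "b"), c("a", "c"), ...)
  pairs <- utils::combn(names(paths), 2, simplify = FALSE)
  # Assumption: score_fn returns one numeric value per pair
  scores <- vapply(
    pairs,
    function(p) as.numeric(score_fn(paths[[p[1]]], paths[[p[2]]])),
    numeric(1)
  )
  names(scores) <- vapply(pairs, paste, character(1), collapse = " vs ")
  scores
}

# Usage with the example profiles referenced in this README, if protHMM is installed
if (requireNamespace("protHMM", quietly = TRUE)) {
  ids <- c("1DLHA2-7", "1TEN-7", "1TAHA-23")
  paths <- vapply(
    ids,
    function(id) system.file("extdata", id, package = "protHMM"),
    character(1)
  )
  # Sort so that the closest pair is listed first
  print(sort(pairwise_scores(paths, protHMM::hmm_distance)))
}
```

Passing the scoring function as an argument keeps the helper independent of protHMM, so any other two-argument distance function can be substituted.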

References

Lyons, J., Dehzangi, A., Heffernan, R., Yang, Y., Zhou, Y., Sharma, A., & Paliwal, K. K. (2015). Advancing the Accuracy of Protein Fold Recognition by Utilizing Profiles From Hidden Markov Models. IEEE Transactions on Nanobioscience, 14(7), 761–772.