This feature returns the 2-mer and 3-mer compositions of the protein sequence. First, all possible 2-mers and 3-mers for any protein are enumerated (\(20^2 = 400\) and \(20^3 = 8000\) possibilities, respectively). From these, two vectors of length 400 and 8000 are created, with each position corresponding to one 2-mer or 3-mer. Next, the protein sequence that corresponds to the HMM scores is extracted and placed in a bipartite graph with the protein sequence. Every path of length 1 or 2 in this graph is found, and the vertices along each path are read off as a 2-mer or 3-mer: a path of length 1 spans two vertices and gives a 2-mer, while a path of length 2 spans three vertices and gives a 3-mer. For each 2-mer or 3-mer found this way, 1 is added to the position that corresponds to it in the length-400 or length-8000 vector. The two vectors are then returned.
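The indexing and counting step described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's implementation: the helpers `all_kmers`, `count_kmers`, and `sliding_kmers` are hypothetical names, the order in which k-mers are enumerated is arbitrary and may differ from the order used by `hmm_SCSH`, and a simple sliding window over a toy sequence stands in for the k-mers that would be read off the bipartite graph paths.

```r
# Standard 20 amino acids (one-letter codes).
aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Enumerate every possible k-mer over the 20 amino acids.
# Hypothetical helper, not part of protHMM.
all_kmers <- function(k) {
  grid <- expand.grid(rep(list(aa), k), stringsAsFactors = FALSE)
  do.call(paste0, grid)
}

# Build a named count vector with one slot per possible k-mer,
# then add 1 to the matching slot for each observed k-mer.
count_kmers <- function(kmers, k) {
  counts <- setNames(integer(length(aa)^k), all_kmers(k))
  for (km in kmers) counts[km] <- counts[km] + 1L
  counts
}

# Assumption: sliding windows replace the graph-path step here.
sliding_kmers <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  substring(s, starts, starts + k - 1)
}

toy <- "ACDACD"
v2 <- count_kmers(sliding_kmers(toy, 2), 2)  # length-400 vector
v3 <- count_kmers(sliding_kmers(toy, 3), 3)  # length-8000 vector
```

Naming the vector positions by their k-mer makes it easy to check a single count, for example `v2[["AC"]]` or `v3[["ACD"]]`.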
References
Mohammadi, A. M., Zahiri, J., Mohammadi, S., Khodarahmi, M., & Arab, S. S. (2022). PSSMCOOL: a comprehensive R package for generating evolutionary-based descriptors of protein sequences from PSSM profiles. Biology Methods and Protocols, 7(1).
Examples
# Compute the feature once, then pull out the 2-mer and 3-mer vectors
h <- hmm_SCSH(system.file("extdata", "1DLHA2-7", package = "protHMM"))
h_400 <- h[[1]]
h_8000 <- h[[2]]