This feature first creates a grouping matrix \(G\) by assigning each position a number \(1:3\) based on the value at the corresponding position of the HMM matrix \(H\); \(1\) represents the low probability group, \(2\) the medium probability group and \(3\) the high probability group. The total number of points in each group is then calculated for each column, and the sequence is split based upon the positions of the 1st, 25th, 50th, 75th and 100th percentile (last) points for each of the three groups, in each of the 20 columns of the grouping matrix. Thus for column \(j\), \(S(k, j, z) = \sum_{i = 1}^{z \cdot 0.25 \cdot N} |G[i, j] = k|\), where \(k\) is the group number, \(z = 1:4\), \(N\) is the number of rows in matrix \(G\), and the summand \(|G[i, j] = k|\) is \(1\) when \(G[i, j] = k\) and \(0\) otherwise.
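The sum \(S(k, j, z)\) can be illustrated with a minimal sketch that follows the formula as written. It is not the package's implementation: `gsd_counts` is a hypothetical helper, the grouping matrix is a toy with 2 columns rather than 20, and flooring the upper index \(z \cdot 0.25 \cdot N\) when \(N\) is not divisible by 4 is an assumption. The rule used to assign groups from \(H\) is not reproduced here, so \(G\) is supplied directly.

```r
# Hypothetical helper (not part of protHMM): counts S(k, j, z) for a
# grouping matrix G whose entries are 1, 2 or 3.
gsd_counts <- function(G, groups = 1:3, n_segments = 4) {
  N <- nrow(G)
  S <- array(0, dim = c(length(groups), ncol(G), n_segments))
  for (z in seq_len(n_segments)) {
    # Upper row index z * 0.25 * N; flooring is an assumption for the
    # case where N is not divisible by 4.
    upper <- floor(z * N / n_segments)
    rows <- seq_len(upper)
    for (k in seq_along(groups)) {
      # Count entries equal to group k in the first `upper` rows of each column.
      S[k, , z] <- colSums(G[rows, , drop = FALSE] == groups[k])
    }
  }
  S
}

# Toy grouping matrix: 8 positions (rows) and 2 columns.
G <- cbind(c(1, 1, 2, 3, 3, 2, 1, 3),
           c(2, 2, 2, 1, 3, 3, 3, 1))
S <- gsd_counts(G)
S[1, 1, ]  # counts of group 1 in column 1 for z = 1:4
```

In this sketch `S[k, j, z]` is cumulative in `z`, so `S[, j, 4]` gives the total number of points in each group for column `j`.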
References
Jin, D., & Zhu, P. (2021). Protein Subcellular Localization Based on Evolutionary Information and Segmented Distribution. Mathematical Problems in Engineering, 2021, 1–14.
Examples
h <- hmm_GSD(system.file("extdata", "1DLHA2-7", package = "protHMM"))