Modeling RNA tertiary structure is one of the basic problems in molecular biophysics and is important for understanding the biological function of RNA and for designing new structures. RNA tertiary structure is mainly determined by seven torsion angles of the main chain and side chain, so accurate prediction of these angles is the basis for modeling RNA tertiary structure. At present, only a few methods use deep learning to predict RNA torsion angles, and their prediction accuracy needs further improvement before it can be used for tertiary structure modeling. In this study, we develop a deep learning method, 1dRNA, to predict RNA backbone torsion angles and pseudotorsion angles. It comprises two different deep learning models: a convolution model (DRCNN) that considers the features of adjacent nucleotides, and a hyper long short-term memory model (DHLSTM) that considers the features of all nucleotides in the chain. On the same datasets, both models outperform existing state-of-the-art methods for most torsion angles: the prediction accuracy of DRCNN improves by 5% to 28% for the β, δ, ζ, χ, η, and θ angles, and that of DHLSTM improves by 6% to 15% for the same angles.

DRCNN gives better predictions than DHLSTM and the existing models for the δ, ζ, χ, η, and θ angles; DHLSTM gives better predictions than DRCNN and the existing models for the β and ε angles; and the existing models give better predictions than both DRCNN and DHLSTM for the α and γ angles. DRCNN and the existing models predict a richer distribution of angles than DHLSTM. In terms of stability, DHLSTM is much more stable than DRCNN and the existing models, with fewer outliers. The results also show that the α and γ angles are the most difficult to predict, that angles in loop regions are more difficult to predict than those in helical regions, that the models are not sensitive to changes in the target sequence length, and that the deviation between the predicted angles and the angles of decoys can be used to assess the quality of RNA tertiary structure models.
Keywords:
- RNA structure
- torsional angle prediction
- deep learning
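The abstract notes that the deviation between predicted torsion angles and the angles of decoys can serve as a quality score. A minimal sketch of that idea follows, assuming the deviation is a mean absolute angular difference with 360° wraparound; this section does not state the exact formula used by 1dRNA. The function names, the decoy names, and all angle values below are hypothetical and only illustrate the calculation.

```python
def angular_diff(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a_deg - b_deg) % 360.0  # Python's % returns a value in [0, 360) here
    return min(d, 360.0 - d)


def mean_angular_deviation(predicted, observed):
    """Mean wrapped absolute difference over two equal-length angle lists."""
    if len(predicted) != len(observed):
        raise ValueError("angle lists must have the same length")
    if not predicted:
        raise ValueError("angle lists must not be empty")
    total = sum(angular_diff(p, o) for p, o in zip(predicted, observed))
    return total / len(predicted)


def rank_decoys(predicted, decoys):
    """Return (name, deviation) pairs sorted from smallest to largest deviation."""
    scores = {
        name: mean_angular_deviation(predicted, angles)
        for name, angles in decoys.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical example: three predicted angles and two made-up decoys.
predicted_angles = [-65.0, 175.0, 55.0]
example_decoys = {
    "decoy_a": [-70.0, -178.0, 60.0],
    "decoy_b": [60.0, 90.0, -170.0],
}
ranking = rank_decoys(predicted_angles, example_decoys)
for name, deviation in ranking:
    print(f"{name}: {deviation:.2f} degrees")
```

The wraparound step matters because torsion angles are periodic: 175° and -178° are only 7° apart, while a plain subtraction would report 353°.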
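Number of sequences in each length interval and secondary-structure composition of each dataset:

| Dataset | 20-50 | 50-100 | 100-200 | 200-300 | 300-400 | 400-512 | Brackets | Pseudoknot | Unpaired |
|---|---|---|---|---|---|---|---|---|---|
| Training set | 50 | 179 | 46 | 1 | 7 | 1 | 55.10% | 5.63% | 39.36% |
| Validation set | 20 | 10 | 0 | 0 | 0 | 0 | 52.19% | 9.8% | 38.01% |
| Test set I | 11 | 41 | 10 | 0 | 0 | 0 | 57.58% | 2.81% | 39.61% |
| Test set II | 8 | 16 | 6 | 0 | 0 | 0 | 58.42% | 5.25% | 36.33% |
| Test set III | 40 | 13 | 1 | 0 | 0 | 0 | 65.02% | 2.67% | 32.31% |

Values for the seven standard torsion angles (α, β, γ, δ, ε, ζ, χ) and the two pseudotorsion angles (η, θ), by model and dataset:

| Model | Dataset | α/(°) | β/(°) | γ/(°) | δ/(°) | ε/(°) | ζ/(°) | χ/(°) | η/(°) | θ/(°) |
|---|---|---|---|---|---|---|---|---|---|---|
| DHLSTM | Validation set | 47.91 | 20.22 | 37.18 | 16.57 | 18.23 | 35.02 | 19.85 | 28.09 | 32.85 |
| DHLSTM | Test set I | 48.20 | 20.66 | 37.13 | 13.08 | 18.82 | 30.27 | 17.33 | 25.74 | 29.22 |
| DHLSTM | Test set II | 47.95 | 19.89 | 35.30 | 15.19 | 17.87 | 30.99 | 17.67 | 27.20 | 31.49 |
| DHLSTM | Test set III | 45.45 | 22.30 | 40.80 | 13.51 | 21.43 | 30.69 | 16.96 | 23.87 | 29.84 |
| DRCNN | Validation set | 44.67 | 19.96 | 35.31 | 13.86 | 22.20 | 31.62 | 19.49 | 24.77 | 30.22 |
| DRCNN | Test set I | 44.84 | 20.74 | 36.27 | 10.51 | 21.48 | 27.53 | 16.39 | 23.12 | 26.34 |
| DRCNN | Test set II | 43.41 | 19.55 | 35.45 | 12.19 | 22.71 | 28.13 | 17.16 | 24.28 | 28.12 |
| DRCNN | Test set III | 27.14 | 15.81 | 25.20 | 9.73 | 14.51 | 17.98 | 11.58 | 13.67 | 17.77 |
| SPOT-RNA-1D[21] | Validation set | 45.18 | 20.58 | 33.88 | 17.99 | 20.72 | 37.50 | 23.01 | 33.55 | 37.02 |
| SPOT-RNA-1D[21] | Test set I | 43.94 | 21.94 | 32.98 | 14.61 | 20.69 | 33.27 | 19.59 | 30.25 | 32.91 |
| SPOT-RNA-1D[21] | Test set II | 39.50 | 18.92 | 29.47 | 16.01 | 17.46 | 28.91 | 18.20 | 28.14 | 30.25 |
| SPOT-RNA-1D[21] | Test set III | 37.89 | 21.04 | 34.68 | 13.83 | 22.32 | 27.87 | 17.01 | 25.31 | 27.22 |

Values for the seven standard torsion angles and the two pseudotorsion angles, by model and pairing type:

| Model | Pairing type | α/(°) | β/(°) | γ/(°) | δ/(°) | ε/(°) | ζ/(°) | χ/(°) | η/(°) | θ/(°) |
|---|---|---|---|---|---|---|---|---|---|---|
| DHLSTM | Brackets | 34.08 | 16.48 | 30.21 | 9.76 | 17.98 | 21.38 | 11.23 | 18.03 | 21.91 |
| DHLSTM | Pseudoknot | 34.20 | 14.98 | 27.06 | 6.80 | 14.25 | 20.29 | 10.98 | 27.41 | 18.02 |
| DHLSTM | Loop region | 66.77 | 32.60 | 60.72 | 21.05 | 27.54 | 47.85 | 28.52 | 35.41 | 46.16 |
| DRCNN | Brackets | 19.43 | 11.40 | 18.54 | 6.65 | 11.84 | 12.0 | 8.30 | 10.90 | 12.94 |
| DRCNN | Pseudoknot | 20.42 | 14.25 | 16.75 | 6.73 | 12.86 | 13.54 | 10.25 | 16.14 | 13.52 |
| DRCNN | Loop region | 40.84 | 23.26 | 37.44 | 15.59 | 19.07 | 29.07 | 18.44 | 19.25 | 27.08 |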
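The percentage improvements quoted in the abstract can be related to the tabulated values with a short calculation. The sketch below assumes that the tabulated numbers are prediction errors in degrees (lower is better) and that "improvement" means the relative reduction against SPOT-RNA-1D on test set I; this section does not spell out either point, so both are assumptions. Under them, DRCNN comes out near 5% for β and 28% for δ, and DHLSTM near 6% for β and 15% for η, which is consistent with the ranges stated in the abstract. The function and variable names are illustrative.

```python
# Test set I rows copied from the tabulated values (degrees).
ANGLES = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "chi", "eta", "theta"]
SPOT_RNA_1D = [43.94, 21.94, 32.98, 14.61, 20.69, 33.27, 19.59, 30.25, 32.91]
DRCNN = [44.84, 20.74, 36.27, 10.51, 21.48, 27.53, 16.39, 23.12, 26.34]
DHLSTM = [48.20, 20.66, 37.13, 13.08, 18.82, 30.27, 17.33, 25.74, 29.22]


def relative_improvement(baseline, model):
    """Percent reduction of `model` relative to `baseline`, per angle.

    Assumes lower values are better; a negative result means the model
    does worse than the baseline for that angle.
    """
    if len(baseline) != len(model):
        raise ValueError("value lists must have the same length")
    return [100.0 * (b - m) / b for b, m in zip(baseline, model)]


drcnn_gain = dict(zip(ANGLES, relative_improvement(SPOT_RNA_1D, DRCNN)))
dhlstm_gain = dict(zip(ANGLES, relative_improvement(SPOT_RNA_1D, DHLSTM)))

for name in ANGLES:
    print(f"{name:8s} DRCNN {drcnn_gain[name]:6.1f}%  DHLSTM {dhlstm_gain[name]:6.1f}%")
```

The negative entries for α and γ reflect the abstract's remark that the existing models do better on those two angles.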