The AI revolution sparked by natural language and image processing has brought new ideas and research paradigms to the field of protein computing. One significant advance is the development of pre-trained protein language models through self-supervised learning on massive sets of protein sequences. These pre-trained models encode rich information about protein sequences, evolution, structures, and even functions, which can be readily transferred to a variety of downstream tasks with robust generalization. More recently, researchers have begun developing multimodal pre-trained models that integrate more diverse types of data. This review summarizes recent studies in this direction from the following aspects. First, it reviews pre-trained models that integrate protein structures into language models; this is of particular importance because a protein's structure is the primary determinant of its function. Second, pre-trained models that integrate protein dynamics information are introduced; these models may benefit downstream tasks such as protein-protein interaction prediction, soft docking of ligands, and the study of interactions involving allosteric proteins and intrinsically disordered proteins. Third, pre-trained models that integrate knowledge such as gene ontology are described. Fourth, we briefly introduce pre-trained models in the RNA field. Lastly, we introduce the most recent developments in protein design and discuss how these models relate to the aforementioned pre-trained models that integrate protein structure information.
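The self-supervised learning mentioned above is, in most protein language models (for example, the ESM family), a masked-token objective: random residues are hidden and the model is trained to recover them from the surrounding sequence context. The following PyTorch sketch is purely illustrative and is not code from any model covered by this review; the tiny model, vocabulary handling, and example sequences are assumptions made for demonstration only.

```python
import torch
import torch.nn as nn

# Toy vocabulary: 20 standard amino acids plus padding and [MASK] tokens
# (real protein language models use a similar but richer alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, MASK = 20, 21
VOCAB_SIZE = 22

def tokenize(seq: str) -> torch.Tensor:
    """Map a protein sequence to integer token ids."""
    return torch.tensor([AMINO_ACIDS.index(a) for a in seq])

class TinyProteinLM(nn.Module):
    """A deliberately small Transformer encoder for illustration."""
    def __init__(self, d_model: int = 64, nhead: int = 4, nlayers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # (batch, length) -> per-position logits over the vocabulary.
        return self.head(self.encoder(self.embed(tokens)))

def masked_lm_step(model: nn.Module, tokens: torch.Tensor,
                   mask_prob: float = 0.15) -> torch.Tensor:
    """One self-supervised step: hide random residues, predict them back."""
    labels = tokens.clone()
    mask = torch.rand(tokens.shape) < mask_prob
    if not mask.any():            # guard: always mask at least one position
        mask[0, 0] = True
    corrupted = tokens.clone()
    corrupted[mask] = MASK
    logits = model(corrupted)
    # The loss is computed only at the masked positions.
    return nn.functional.cross_entropy(logits[mask], labels[mask])

model = TinyProteinLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.stack([tokenize("MKTAYIAKQR"), tokenize("GAVLIPFMWS")])
loss = masked_lm_step(model, batch)
loss.backward()
optimizer.step()
print(f"masked-LM loss: {loss.item():.3f}")
```

In practice such models use dozens of Transformer layers and are trained on hundreds of millions of sequences; it is the learned per-residue representations (the encoder outputs), rather than the masked-token head, that are transferred to downstream tasks.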