Recent developments in bioinformatics and computational biology have seen the emergence of advanced language models and neural networks designed to improve the annotation and prediction of protein functions. These models, such as NEAR, DeepGOMeta, and ProtHyena, utilize techniques like neural embeddings, residual convolutional neural networks, and evolutionary scale modeling to enhance the understanding and analysis of protein sequences. The introduction of the Receptance Weighted Key Value (RWKV) model by Peng et al. aims to address the trade-off between computational efficiency and model performance in sequence processing tasks by combining aspects of both Transformers and RNNs into a novel architecture.
ProtHyena is a fast and parameter-efficient foundation model that incorporates the Hyena operator. This architecture unlocks the potential to capture both long-range dependencies and single amino acid resolution in real protein sequences, going beyond attention-based approaches.
ProtHyena: A fast and efficient foundation protein language model at single amino acid resolution https://t.co/1bDziQtbqh https://t.co/lplJqK7tJV
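Not the authors' code, but a minimal sketch of the idea behind a Hyena-style operator: an implicit long convolution applied over the full sequence via FFT, combined with element-wise gating, so per-residue resolution is retained without quadratic attention cost. All class names, dimensions, and the single-filter simplification here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LongConvGate(nn.Module):
    """Toy Hyena-style block (illustrative only): an implicit long
    convolution realised with FFTs plus a data-controlled gate."""

    def __init__(self, dim: int, max_len: int):
        super().__init__()
        # Implicit filter: one learnable kernel as long as the sequence.
        self.kernel = nn.Parameter(torch.randn(dim, max_len) * 0.02)
        self.gate = nn.Linear(dim, dim)   # element-wise gating branch
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, dim]
        b, l, d = x.shape
        u = x.transpose(1, 2)                          # [batch, dim, seq_len]
        # Long convolution via zero-padded FFT: O(L log L) instead of O(L^2).
        n = 2 * l
        k_f = torch.fft.rfft(self.kernel[:, :l], n=n)
        u_f = torch.fft.rfft(u, n=n)
        y = torch.fft.irfft(u_f * k_f, n=n)[..., :l]   # [batch, dim, seq_len]
        y = y.transpose(1, 2)                          # back to [batch, seq_len, dim]
        # Multiplicative gating keeps single-residue resolution.
        return self.proj(torch.sigmoid(self.gate(x)) * y)

# Example: a batch of two length-1024 sequences at embedding dim 256.
block = LongConvGate(dim=256, max_len=1024)
out = block(torch.randn(2, 1024, 256))    # -> torch.Size([2, 1024, 256])
```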
NEAR is implemented as a 1D Residual Convolutional Neural Network. A batch of sequences is initially embedded as a [batch × 256 × sequence length] tensor using a context-unaware residue embedding layer. The tensor is then passed through 8 residual blocks.
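A minimal sketch of how such a 1D residual embedding network could look; the 256-dimensional context-unaware residue embedding and the 8 residual blocks follow the description above, but kernel sizes, activations, and class names are assumptions rather than the published implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """One 1D residual block: two convolutions with a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=5, padding=2)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=5, padding=2)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.conv2(self.act(self.conv1(x))))

class ResidueEmbedder(nn.Module):
    """Context-unaware residue embedding followed by 8 residual blocks,
    producing one embedding vector per residue."""
    def __init__(self, vocab_size: int = 25, dim: int = 256, blocks: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)       # per-residue lookup
        self.blocks = nn.Sequential(*[ResidualBlock1D(dim) for _ in range(blocks)])

    def forward(self, tokens):                           # tokens: [batch, seq_len]
        x = self.embed(tokens).transpose(1, 2)           # -> [batch, 256, seq_len]
        return self.blocks(x).transpose(1, 2)            # -> [batch, seq_len, 256]

# Example: embed two integer-encoded sequences of length 300.
model = ResidueEmbedder()
emb = model(torch.randint(0, 25, (2, 300)))              # torch.Size([2, 300, 256])
```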
NEAR initiates search by computing residue embeddings for a set of target proteins. These embeddings are used to generate a search index with the FAISS library for efficient similarity search in high dimensions.
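As a rough illustration of that indexing step (the exact index type and parameters are assumptions, not NEAR's published configuration):

```python
import faiss
import numpy as np

# Per-residue target embeddings, e.g. produced by an embedding network as above.
target_embeddings = np.random.rand(100_000, 256).astype("float32")

index = faiss.IndexFlatL2(256)     # exact L2 index; IVF/HNSW variants would scale further
index.add(target_embeddings)       # build the searchable index

# Query with per-residue embeddings; retrieve the 50 nearest target residues for each.
query_embeddings = np.random.rand(300, 256).astype("float32")
distances, neighbor_ids = index.search(query_embeddings, 50)
```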
NEAR's ResNet embedding model is trained using an N-pairs loss function guided by sequence alignments generated by the widely used HMMER3 tool.
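Roughly, an N-pairs loss treats each aligned residue pair as anchor and positive, with the remaining positives in the batch acting as negatives. A simplified formulation (my sketch, not the exact training code):

```python
import torch
import torch.nn.functional as F

def n_pairs_loss(anchors: torch.Tensor, positives: torch.Tensor) -> torch.Tensor:
    """Simplified N-pairs loss: anchors[i] should score highest against
    positives[i]; every other positive in the batch serves as a negative.

    anchors, positives: [batch, dim] embeddings of residues that HMMER3
    alignments mark as equivalent.
    """
    logits = anchors @ positives.T                  # [batch, batch] similarity matrix
    labels = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, labels)          # softmax over each row

# Example with random embeddings for 8 aligned residue pairs.
a, p = torch.randn(8, 256), torch.randn(8, 256)
loss = n_pairs_loss(a, p)
```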
NEAR's neural embedding model computes per-residue embeddings for target and query protein sequences, and identifies alignment candidates with a pipeline consisting of k-NN search, filtration, and neighbor aggregation.
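One way such a pipeline could be wired together, reusing the FAISS search above; the distance-based filtration threshold and the hit-counting aggregation are illustrative assumptions rather than NEAR's exact scoring rule.

```python
from collections import defaultdict

def rank_targets(distances, neighbor_ids, residue_to_target, max_distance=1.0):
    """Aggregate per-residue k-NN hits into per-target candidate scores.

    distances, neighbor_ids : [num_query_residues, k] arrays from FAISS
    residue_to_target       : maps a target residue index to its protein ID
    """
    scores = defaultdict(float)
    for dist_row, id_row in zip(distances, neighbor_ids):
        for dist, residue_idx in zip(dist_row, id_row):
            if dist <= max_distance:                  # filtration step
                scores[residue_to_target[residue_idx]] += 1.0
    # Highest-scoring targets become alignment candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```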
NEAR: Neural Embeddings for Amino acid Relationships https://t.co/TpiLv4C5DA https://t.co/aA3jpf3nLG
DeepGOMeta can predict protein functions even in the absence of explicit sequence similarity or homology to known proteins. To measure the semantic similarity between protein pairs, DeepGOMeta uses Resnik's similarity method combined with the Best Match Average strategy.
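A compact sketch of Resnik similarity with the Best Match Average strategy; the GO-ancestor lookup and information-content table are assumed inputs here, not part of DeepGOMeta's code.

```python
import numpy as np

def resnik(term_a, term_b, ancestors, ic):
    """Resnik similarity: information content of the most informative
    common ancestor of two GO terms (0 if they share none)."""
    common = ancestors[term_a] & ancestors[term_b]
    return max((ic[t] for t in common), default=0.0)

def best_match_average(terms_p, terms_q, ancestors, ic):
    """Best Match Average: for each term of one protein take its best Resnik
    match in the other protein, then average over both directions."""
    sim = np.array([[resnik(a, b, ancestors, ic) for b in terms_q] for a in terms_p])
    return 0.5 * (sim.max(axis=1).mean() + sim.max(axis=0).mean())
```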
DeepGOMeta incorporates ESM2 (Evolutionary Scale Modeling 2), a deep learning framework that extracts meaningful features from protein sequences by learning from evolutionary data.
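For context, extracting ESM2 features typically looks something like the following, using the public fair-esm package; this mirrors that package's documented usage and is not DeepGOMeta's exact feature pipeline.

```python
import torch
import esm  # pip install fair-esm

# Load a pretrained ESM2 model and its alphabet/tokenizer.
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])
per_residue = out["representations"][33]             # [1, seq_len + 2, 1280]
# Mean-pool residue embeddings (skipping BOS/EOS) into one protein-level feature vector.
protein_feature = per_residue[0, 1:-1].mean(dim=0)   # [1280]
```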
📌 The Receptance Weighted Key Value (RWKV) model, the architecture behind Eagle-7B, was introduced by Peng et al. to reconcile the trade-off between computational efficiency and model performance in sequence processing tasks. 📌 RWKV combines aspects of both Transformers and RNNs into a novel architecture. https://t.co/XOYvi3wDhK https://t.co/KncPEkfLmO
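A heavily simplified, numerically naive sketch of the WKV recurrence at the heart of RWKV's time mixing, written as the RNN-style step used at inference; the notation and the omission of the usual log-space stabilisation trick are my simplifications of the published formulation.

```python
import torch

def wkv_recurrence(k, v, w, u):
    """Simplified WKV recurrence (no numerical stabilisation).

    k, v : [seq_len, dim] keys and values
    w    : [dim] per-channel decay (positive)
    u    : [dim] per-channel bonus applied to the current token
    Returns per-step outputs of shape [seq_len, dim].
    """
    num = torch.zeros_like(w)   # running weighted sum of values
    den = torch.zeros_like(w)   # running sum of weights
    outputs = []
    for t in range(k.shape[0]):
        e_now = torch.exp(u + k[t])                        # extra weight for token t
        outputs.append((num + e_now * v[t]) / (den + e_now))
        # Decay the history, then fold token t in for future steps.
        num = torch.exp(-w) * num + torch.exp(k[t]) * v[t]
        den = torch.exp(-w) * den + torch.exp(k[t])
    return torch.stack(outputs)

# Example: a 16-step sequence with 8 channels.
out = wkv_recurrence(torch.randn(16, 8), torch.randn(16, 8),
                     torch.rand(8), torch.zeros(8))
```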
Use protein language models to annotate viral proteins with previously unknown functions. @zflam94 @microbegrrl @steve_biller https://t.co/UTWQl0785M
Large language models improve annotation of prokaryotic viral proteins | Nature Microbiology https://t.co/mUvD8Ce138