Researchers have developed new models for predicting protein functions and properties. DeepGOMeta, a deep learning framework, predicts protein functions even without explicit sequence similarity to known proteins. NEAR uses a neural model to compute per-residue embeddings for target proteins and searches them efficiently for similar residues. ProtHyena, a fast and parameter-efficient foundation model, captures both long-range context and single-amino-acid resolution in real protein sequences. ALBATROSS, a deep-learning model, predicts ensemble properties of intrinsically disordered proteins directly from sequence at a proteome-wide scale.
ALBATROSS is a deep-learning model that predicts ensemble dimensions of intrinsically disordered regions (IDRs), including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequence at a proteome-wide scale.
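To make the four ensemble properties concrete, here is a minimal Python sketch that computes them from a simulated conformational ensemble (a list of conformations, each a list of per-residue 3D coordinates). This is not ALBATROSS itself, which predicts these values from sequence; it only shows the quantities such a model is trained to reproduce. The function names, the use of simple ensemble means, and the log-log least-squares fit for the scaling exponent are assumptions for illustration.

```python
import math

def centroid(conf):
    n = len(conf)
    return tuple(sum(p[a] for p in conf) / n for a in range(3))

def radius_of_gyration(conf):
    """Rg = sqrt(mean squared distance of residues from the centroid)."""
    c = centroid(conf)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in conf) / len(conf))

def end_to_end_distance(conf):
    """Distance between the first and last residue."""
    return math.dist(conf[0], conf[-1])

def asphericity(conf):
    """Relative shape anisotropy from gyration-tensor invariants:
    delta = 1 - 3 * I2 / I1**2 (I1 = trace, I2 = sum of principal 2x2 minors).
    0 for an isotropic shape, 1 for a straight rod."""
    c, n = centroid(conf), len(conf)
    S = [[sum((p[a] - c[a]) * (p[b] - c[b]) for p in conf) / n for b in range(3)]
         for a in range(3)]
    i1 = S[0][0] + S[1][1] + S[2][2]
    i2 = (S[0][0] * S[1][1] - S[0][1] ** 2
          + S[1][1] * S[2][2] - S[1][2] ** 2
          + S[0][0] * S[2][2] - S[0][2] ** 2)
    return 1.0 - 3.0 * i2 / i1 ** 2

def scaling_exponent(ensemble, min_sep=1):
    """Fit R_ij = R0 * |i - j|**nu by least squares in log-log space, where R_ij is
    the root-mean-square distance at each separation (needs at least 3 residues)."""
    n = len(ensemble[0])
    xs, ys = [], []
    for s in range(min_sep, n):
        sq = [math.dist(conf[i], conf[i + s]) ** 2 for conf in ensemble for i in range(n - s)]
        xs.append(math.log(s))
        ys.append(0.5 * math.log(sum(sq) / len(sq)))  # log of the RMS distance
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    nu = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return nu, math.exp(my - nu * mx)

def ensemble_properties(ensemble):
    """Ensemble averages (plain means here, an assumption) plus the fitted nu and R0."""
    m = len(ensemble)
    nu, r0 = scaling_exponent(ensemble)
    return {
        "rg": sum(radius_of_gyration(c) for c in ensemble) / m,
        "re": sum(end_to_end_distance(c) for c in ensemble) / m,
        "asphericity": sum(asphericity(c) for c in ensemble) / m,
        "nu": nu,
        "r0": r0,
    }

# Usage: a fully extended five-residue chain (asphericity 1, nu 1).
rod = [(float(i), 0.0, 0.0) for i in range(5)]
props = ensemble_properties([rod])
```

Real IDR ensembles from simulation would contain thousands of conformations; the extended chain is only a sanity check with known answers.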
ALBATROSS: Direct prediction of intrinsically disordered protein conformational properties from sequences https://t.co/V9U9qyy7Z9 https://t.co/2aXmnEZSsO
Aggregating Residue-Level Protein Language Model Embeddings with Optimal Transport https://t.co/FRMKFjQ351 #biorxiv_bioinfo
Predicting protein functions using positive-unlabeled ranking with ontology-based priors https://t.co/IExcalEgmY #biorxiv_bioinfo
Approximate Nearest Neighbor Graph Provides Fast and Efficient Embedding with Applications in Large-scale Biological Data https://t.co/bhj3ReOacr #biorxiv_bioinfo
ALBATROSS: a deep learning approach to predict ensemble properties of intrinsically disordered proteins and regions, such as radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequence. @jefflotthammer https://t.co/yLBQzOWKnQ https://t.co/VJTJ309TCR
ProtHyena is a fast and parameter-efficient foundation model that incorporates the Hyena operator. Compared with attention-based approaches, this architecture can capture both long-range context and single-amino-acid resolution in real protein sequences.
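The Hyena operator replaces attention with alternating long convolutions and element-wise gating. Below is a minimal pure-Python sketch of an order-N Hyena recurrence. It assumes a decaying-sinusoid filter (the hypothetical `decaying_filter`) in place of Hyena's learned implicit filters and omits the short convolutions, bias terms, and FFT-based evaluation of the real operator; it is not ProtHyena's implementation.

```python
import math
import random

def causal_conv(signal, filt):
    """Direct causal convolution: out[t] = sum_{k<=t} filt[k] * signal[t - k].
    O(L^2) here; Hyena evaluates this long convolution with FFTs in O(L log L)."""
    return [sum(filt[k] * signal[t - k] for k in range(t + 1)) for t in range(len(signal))]

def decaying_filter(length, decay, freq):
    """Hypothetical stand-in for an implicit filter: a decaying sinusoid.
    Hyena instead generates filters with a small FFN over positional encodings."""
    return [math.exp(-decay * t) * math.cos(freq * t) for t in range(length)]

def project(u, W):
    """Apply the same linear map W (D x D) independently at every position of u (L x D)."""
    return [[sum(w * x for w, x in zip(row, vec)) for row in W] for vec in u]

def hyena_operator(u, W_v, W_gates, filters):
    """Order-N Hyena recurrence: z_1 = v, z_{n+1} = x_n * (h_n conv z_n).
    u: L x D input; W_v: value projection; W_gates: N gate projections;
    filters: N entries, each a list of D per-channel filters of length L."""
    L, D = len(u), len(u[0])
    z = project(u, W_v)
    for W_x, h in zip(W_gates, filters):
        x = project(u, W_x)
        # long convolution along the sequence, one channel at a time
        conv = [causal_conv([z[t][d] for t in range(L)], h[d]) for d in range(D)]
        # element-wise multiplicative gating
        z = [[x[t][d] * conv[d][t] for d in range(D)] for t in range(L)]
    return z

def random_matrix(rng, d):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]

# Usage: an order-2 operator on a toy sequence of L = 32 positions with D = 4 channels.
rng = random.Random(0)
L, D, order = 32, 4, 2
u = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]
W_v = random_matrix(rng, D)
W_gates = [random_matrix(rng, D) for _ in range(order)]
filters = [[decaying_filter(L, decay=0.1 * (d + 1), freq=0.3 * (n + 1)) for d in range(D)]
           for n in range(order)]
y = hyena_operator(u, W_v, W_gates, filters)
```

Because every step is causal, changing the last input position leaves all earlier outputs unchanged, which is what a left-to-right protein language model needs. The filters span the whole sequence, so each output can depend on every earlier residue without a quadratic attention matrix.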
ProtHyena: A fast and efficient foundation protein language model at single amino acid resolution https://t.co/1bDziQtbqh https://t.co/lplJqK7tJV
Happy to share the publication of our paper in @Nature: Conformational ensembles of the human intrinsically disordered proteome. Work led by @GiulioTesei and @AnnaIdaTrolle. I'll post more later, but for now here is a link: https://t.co/b3aZ6BbGpB and a short movie about the work: https://t.co/blmrjNKjCi
NEAR initiates search by computing residue embeddings for a set of target proteins. These embeddings are used to generate a search index with the FAISS library for efficient similarity search in high dimensions.
NEAR's ResNet embedding model is trained using an N-pairs loss function guided by sequence alignments generated by the widely used HMMER3 tool.
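The N-pairs objective pulls an anchor toward its positive while pushing it away from several negatives at once. A minimal sketch, assuming that residue pairs aligned by HMMER3 serve as (anchor, positive) pairs and that the other positives in a batch act as negatives; whether NEAR normalizes embeddings, adds a temperature, or samples negatives this way is not stated in the source.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def n_pairs_loss(anchors, positives):
    """Multi-class N-pair loss over a batch of (anchor, positive) residue pairs.
    For anchor i, the positives of every other pair j != i act as negatives:
        loss_i = log(1 + sum_{j != i} exp(a_i . p_j - a_i . p_i))
    computed stably as logsumexp([0, d_1, d_2, ...])."""
    total = 0.0
    for i, a in enumerate(anchors):
        pos = dot(a, positives[i])
        diffs = [dot(a, p) - pos for j, p in enumerate(positives) if j != i]
        total += logsumexp([0.0] + diffs)
    return total / len(anchors)

# Usage: well-separated pairs give a loss near 0.
anchors = [[5.0, 0.0], [0.0, 5.0]]
loss = n_pairs_loss(anchors, anchors)
```

When positives and negatives are indistinguishable, the loss is log(N) for a batch of N pairs, so training drives it down by separating aligned from unaligned residues in embedding space.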
NEAR's neural embedding model computes per-residue embeddings for target and query protein sequences, and identifies alignment candidates with a pipeline consisting of k-NN search, filtration, and neighbor aggregation.
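The search pipeline can be sketched in pure Python as below. A brute-force inner-product scan stands in for the FAISS index, and the filtration threshold (`min_score`) and aggregation rule (best hit per target for each query residue, summed across query residues) are hypothetical choices, not NEAR's documented criteria.

```python
import heapq
from collections import defaultdict

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_flat_index(targets):
    """Flatten per-residue target embeddings into (target_id, residue_pos, vector) rows.
    Brute-force stand-in for a flat FAISS inner-product index."""
    return [(tid, pos, vec) for tid, vecs in targets.items() for pos, vec in enumerate(vecs)]

def knn(index, query_vec, k):
    """Return the k rows with the highest inner product as (score, target_id, residue_pos)."""
    scored = ((dot(query_vec, vec), tid, pos) for tid, pos, vec in index)
    return heapq.nlargest(k, scored, key=lambda hit: hit[0])

def search(query, index, k=4, min_score=0.5):
    """k-NN per query residue, then filtration, then neighbor aggregation per target."""
    scores = defaultdict(float)
    for q in query:
        best = {}
        for score, tid, _pos in knn(index, q, k):
            if score < min_score:  # filtration: drop weak residue-level hits
                continue
            best[tid] = max(best.get(tid, score), score)
        for tid, score in best.items():  # aggregation: one vote per query residue per target
            scores[tid] += score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Usage: two toy target proteins with 2-D residue embeddings.
targets = {"A": [[1.0, 0.0], [0.0, 1.0]], "B": [[-1.0, 0.0], [0.0, -1.0]]}
index = build_flat_index(targets)
hits = search([[1.0, 0.0], [0.0, 1.0]], index, k=2, min_score=0.5)
```

Target A collects one strong hit from each query residue and ranks first, while B never passes the filter; the ranked targets are the alignment candidates handed on for further scoring.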
NEAR: Neural Embeddings for Amino acid Relationships https://t.co/TpiLv4C5DA https://t.co/aA3jpf3nLG
DeepGOMeta can predict protein functions even in the absence of explicit sequence similarity or homology to known proteins. To measure the semantic similarity between protein pairs, DeepGOMeta uses Resnik's similarity measure combined with the Best Match Average (BMA) strategy.
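Resnik similarity scores two ontology terms by the information content (IC) of their most informative common ancestor, and the Best Match Average (BMA) strategy turns term-level scores into a protein-level score. A minimal sketch, assuming a toy ontology given as a child-to-parents dict and IC estimated from a hypothetical annotation set; DeepGOMeta works with the Gene Ontology, and the corpus it uses to estimate IC is not specified here.

```python
import math

def ancestors(term, parents):
    """All ancestors of term in the DAG, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def information_content(annotations, parents):
    """IC(t) = -log p(t), where p(t) is the fraction of proteins annotated with t
    or any of its descendants (annotations propagated up the ontology)."""
    counts = {}
    for terms in annotations.values():
        closure = set().union(*(ancestors(t, parents) for t in terms))
        for t in closure:
            counts[t] = counts.get(t, 0) + 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items()}

def resnik(t1, t2, parents, ic):
    """IC of the most informative common ancestor (unseen terms default to 0 here)."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)

def bma(terms_a, terms_b, parents, ic):
    """Best Match Average: mean best-match score in each direction, then averaged."""
    def best(xs, ys):
        return sum(max(resnik(x, y, parents, ic) for y in ys) for x in xs) / len(xs)
    return 0.5 * (best(terms_a, terms_b) + best(terms_b, terms_a))

# Usage: root -> {a, b}, a -> {a1, a2}, with four hypothetical annotated proteins.
parents = {"a": {"root"}, "b": {"root"}, "a1": {"a"}, "a2": {"a"}}
annotations = {"P1": {"a1"}, "P2": {"a2"}, "P3": {"b"}, "P4": {"a1"}}
ic = information_content(annotations, parents)
```

Here a1 and a2 share the ancestor a (IC = log(4/3)), whereas a1 and b share only the root, whose IC is 0, so proteins annotated under the same branch score higher.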
DeepGOMeta incorporates ESM2 (Evolutionary Scale Modeling 2), a protein language model that extracts meaningful features from protein sequences by learning from evolutionary data.
DeepGOMeta: Predicting functions for microbes https://t.co/XCvhY9Y3qv https://t.co/9nlANHbOYD
DeepGOMeta: Predicting functions for microbes https://t.co/QjjVIylyj1 #biorxiv_bioinfo
NEAR: Neural Embeddings for Amino acid Relationships https://t.co/7FRgN1wUPj #biorxiv_bioinfo