Scientists have developed GPN-MSA, a new alignment-based DNA language model for predicting the effects of genetic variants genome-wide. The model could improve our understanding of genetic variation and its impact on human health, with implications for bioinformatics and genetic research. Separately, a study involving the genotyping, sequencing, and analysis of 140,000 adults from Mexico City has been published in the journal Nature. The study aims to provide insight into the genetic makeup of the population and could contribute to future research on genetic diseases and personalized medicine.
Single-nucleotide variant calling in single-cell sequencing data with Monopogen | Nature Biotechnology https://t.co/TiAXOMoUnk #Bioinformatics https://t.co/pV8E3t6jy0
GPN-MSA is trained with a weighted cross-entropy loss designed to down-weight repetitive elements and up-weight conserved elements. As data augmentation in non-conserved regions, the reference nucleotide is sometimes replaced by a random nucleotide before the loss is computed.
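To make the training objective above concrete, here is a minimal pure-Python sketch of a per-position weighted cross-entropy with random replacement of the reference in non-conserved regions. It is an illustration only, not the GPN-MSA code: the function names, the weighting rule in `position_weight`, and the values of `repeat_factor`, `floor`, and `replace_prob` are all hypothetical assumptions.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def position_weight(conservation, is_repeat, repeat_factor=0.1, floor=0.1):
    """Hypothetical weighting rule: more weight for conserved positions,
    less weight for positions inside repetitive elements."""
    weight = max(conservation, floor)  # floor keeps every weight positive
    return weight * repeat_factor if is_repeat else weight


def augment_reference(reference, conserved, replace_prob, rng):
    """In non-conserved positions, replace the reference nucleotide by a
    random one with probability replace_prob (assumed value)."""
    out = []
    for base, is_conserved in zip(reference, conserved):
        if not is_conserved and rng.random() < replace_prob:
            out.append(rng.choice(NUCLEOTIDES))
        else:
            out.append(base)
    return "".join(out)


def weighted_cross_entropy(probs, targets, weights):
    """Weighted mean of per-position negative log-likelihoods.

    probs:   list of dicts mapping nucleotide -> predicted probability
    targets: target nucleotide per position
    weights: positive loss weight per position
    """
    total = sum(w * -math.log(p[t]) for p, t, w in zip(probs, targets, weights))
    return total / sum(weights)


# Toy usage: four positions, the last two non-conserved and repetitive.
rng = random.Random(0)
reference = "ACGT"
conserved = [True, True, False, False]
targets = augment_reference(reference, conserved, replace_prob=0.5, rng=rng)
weights = [
    position_weight(0.9, False),
    position_weight(0.8, False),
    position_weight(0.05, True),
    position_weight(0.05, True),
]
uniform = [{n: 0.25 for n in NUCLEOTIDES} for _ in reference]
loss = weighted_cross_entropy(uniform, targets, weights)
```

With this weighting, errors at conserved, non-repetitive positions dominate the loss, while the augmentation is applied to the targets before the loss is computed.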
GPN-MSA is a novel DNA language model designed for genome-wide variant effect prediction. It is based on the biologically motivated integration of a multiple-sequence alignment (MSA) across diverse species using the flexible Transformer architecture.
GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction https://t.co/UfQqcBcqUm https://t.co/TjLk9EbkTs
cgMSI formulates strain identification as a maximum a posteriori (MAP) estimation problem to take both sequencing errors and genome similarity between different strains into consideration for accurate strain-typing at low abundance.
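As a rough illustration of what a MAP formulation of strain identification can look like, the toy sketch below scores each candidate strain by its log prior plus the log-likelihood of the reads under a simple per-base error model, then picks the highest-scoring strain. This is not cgMSI's actual model: the function names, the uniform substitution error model, the fixed `error_rate`, and the assumption that reads are already aligned at known offsets are all hypothetical simplifications.

```python
import math


def read_log_likelihood(read, segment, error_rate):
    """Log P(read | strain segment) under an assumed model where each base
    is correct with probability 1 - error_rate and otherwise one of the
    three other nucleotides, chosen uniformly."""
    matches = sum(a == b for a, b in zip(read, segment))
    mismatches = len(read) - matches
    return matches * math.log(1 - error_rate) + mismatches * math.log(error_rate / 3)


def map_strain(reads, strains, priors, error_rate=0.01):
    """Return the strain maximizing log prior + sum of read log-likelihoods.

    reads:   list of (offset, sequence) pairs, assumed pre-aligned
    strains: dict mapping strain name -> genome string
    priors:  dict mapping strain name -> prior probability
    """
    scores = {}
    for name, genome in strains.items():
        score = math.log(priors[name])
        for offset, seq in reads:
            segment = genome[offset:offset + len(seq)]
            score += read_log_likelihood(seq, segment, error_rate)
        scores[name] = score
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage: two highly similar strains that differ at a single position.
strains = {"strain_A": "ACGTACGTAC", "strain_B": "ACGTACGAAC"}
priors = {"strain_A": 0.5, "strain_B": 0.5}
reads = [(4, "ACGT"), (5, "CGTA")]
best, scores = map_strain(reads, strains, priors)
```

Because mismatches are penalized rather than treated as impossible, a read with a sequencing error still contributes evidence, and reads covering positions shared by both strains leave their relative scores unchanged.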
Genotyping, sequencing and analysis of 140,000 adults from Mexico City | Nature https://t.co/HzvEhRrKyF
GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction https://t.co/MHTGjs1caX #biorxiv_bioinfo