Nature Communications (May 2025)
A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data
Abstract
Accurately predicting the effect of missense variants is important for discovering disease risk genes and for clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We develop a method, MisFit, to estimate the fitness effect of missense variants using a probabilistic graphical model. MisFit jointly models the effect at the molecular level ($d$) and at the population level (selection coefficient, $s$), assuming that, within the same gene, missense variants with similar $d$ have similar $s$. We train it by maximizing the probability of the observed allele counts in 236,017 individuals of European ancestry. We show that $s$ is informative in predicting allele frequency across ancestries and is consistent with the fraction of de novo mutations at sites under strong selection. Further, $s$ outperforms previous methods in prioritizing de novo missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts $s$ and yields new insights from genomic data.
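To make the idea of fitting $s$ by maximizing the probability of observed allele counts concrete, here is a minimal toy sketch. It is not the MisFit model: it ignores the molecular effect $d$ and the graphical structure entirely. It assumes, purely for illustration, a mutation-selection balance in which the expected allele frequency is $\mu/s$ and the allele count is Poisson distributed. The mutation rate `mu`, the function names, and the example counts are all hypothetical.

```python
import math

# Toy assumption (not from the paper): at mutation-selection balance the
# expected allele frequency is q = mu / s, and the observed allele count
# among n_chromosomes is Poisson with mean n_chromosomes * q.

def poisson_log_likelihood(count, n_chromosomes, mu, s):
    """Log-probability of one variant's allele count under the toy model."""
    q = min(mu / s, 1.0)              # expected allele frequency, capped at 1
    lam = n_chromosomes * q           # expected allele count
    return count * math.log(lam) - lam - math.lgamma(count + 1)

def estimate_s(counts, n_chromosomes, mu, grid_size=400):
    """Grid-search maximum-likelihood estimate of one s shared by all variants."""
    best_s, best_ll = None, -math.inf
    for i in range(grid_size + 1):
        s = 10 ** (-5 + 5 * i / grid_size)   # log-spaced grid from 1e-5 to 1
        ll = sum(poisson_log_likelihood(c, n_chromosomes, mu, s) for c in counts)
        if ll > best_ll:
            best_s, best_ll = s, ll
    return best_s

# 236,017 diploid individuals give 2 * 236,017 sampled chromosomes.
N_CHROMOSOMES = 2 * 236_017
MU = 1e-8                              # hypothetical per-site mutation rate

rare_counts = [0, 1, 0, 2, 0]          # hypothetical variants kept rare by selection
common_counts = [50, 60, 40]           # hypothetical variants under weaker selection

s_rare = estimate_s(rare_counts, N_CHROMOSOMES, MU)
s_common = estimate_s(common_counts, N_CHROMOSOMES, MU)
print(f"s (rare variants):   {s_rare:.2e}")
print(f"s (common variants): {s_common:.2e}")
```

In this sketch, variants observed at lower counts receive a larger estimated $s$, which is the qualitative relationship between allele frequency and selection that the abstract relies on. Pooling several variants into one shared estimate loosely mirrors the assumption that similar variants in the same gene share a similar $s$.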