Many efforts have been made to investigate the role of single nucleotide variants (SNVs) in complex human diseases. The common and rare disease-susceptibility SNVs identified so far explain only a small proportion of disease heritability. Low allele frequency (LAF) SNVs are another class of variant that is difficult to investigate, because they require a large sample size for association analysis or multiple families with similar manifestations for linkage analysis. Disease-associated LAF SNVs could be located in Identity-By-Descent (IBD) regions that are inherited from a recent common ancestor and carried by sporadic patients. Extensive work has been done on IBD detection using common SNVs, but these approaches forgo the power of LAF SNVs to improve the sensitivity of detecting short IBD regions. In addition, previous tools consider only the effects of SNVs on gene function rather than on particular diseases. Identifying which SNV in an IBD region is truly associated with the disease poses a substantial challenge.
In this project, we plan to design a computational framework to prioritize disease-associated LAF SNVs from IBD regions. A novel statistical model is designed to detect IBD regions by making use of LAF SNVs from sporadic patients. For LAF SNVs, the calculation of haplotype frequency is substantially influenced by sequencing errors. Instead of computing the frequency directly, the model assumes that the haplotype is sampled from a Bernoulli distribution whose parameter can be estimated and updated using observations from the training and test sets. The model further removes the influence of sequencing errors during parameter estimation to avoid double counting them. Our preliminary study showed that shared LAF SNVs tend to be observed in IBD regions and are rarely observed by random chance.
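One way to picture a Bernoulli parameter that is estimated and then updated from two data sets is a conjugate Beta prior. The sketch below is only an illustration under that assumption: the project description does not state which estimator is used, the class name `BetaBernoulli` and the counts are hypothetical, and the correction for sequencing errors is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBernoulli:
    """Beta(alpha, beta) belief about a Bernoulli parameter (assumed model)."""

    alpha: float = 1.0  # prior pseudo-count of haplotype carriers
    beta: float = 1.0   # prior pseudo-count of non-carriers

    def update(self, carriers: int, total: int) -> "BetaBernoulli":
        """Return the posterior after observing `carriers` out of `total`."""
        if not 0 <= carriers <= total:
            raise ValueError("carriers must lie between 0 and total")
        return BetaBernoulli(self.alpha + carriers,
                             self.beta + (total - carriers))

    @property
    def mean(self) -> float:
        """Posterior mean estimate of the haplotype frequency."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical counts, for illustration only.
prior = BetaBernoulli()
after_training = prior.update(carriers=2, total=1000)
after_test = after_training.update(carriers=1, total=200)
print(after_training.mean, after_test.mean)
```

Because the update is additive in the counts, observations from the training set and the test set can be folded in one after the other without recomputing the estimate from scratch.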
The disease-specific impacts of LAF SNVs are evaluated by integrating gene functional weights and SNV pathogenicity scores. For each IBD region, we apply a bi-clustering algorithm to identify the patients sharing the same disease manifestations; these patients are then used to calculate the susceptibility of the IBD region by comparing their number with the number of carriers in controls. Disease seed genes are identified by examining patients’ manifestations and are used to calculate the gene functional weight, which is based on the functional similarity between the genes altered by SNVs and the disease seed genes. The gene functional weight is then integrated with SNV pathogenicity scores to produce a disease-specific SNV pathogenicity score.
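The integration of a gene functional weight with an SNV pathogenicity score can be sketched as follows. Everything specific here is an assumption made for illustration: functional similarity is taken to be the Jaccard index over sets of annotation terms, the weight is the best similarity to any seed gene, and the two scores are combined by multiplication. The gene names, terms, and the score 0.9 are toy values, not project data.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two annotation-term sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_functional_weight(gene_terms: Set[str],
                           seed_terms: Dict[str, Set[str]]) -> float:
    """Highest similarity between the altered gene and any disease seed gene."""
    return max((jaccard(gene_terms, terms) for terms in seed_terms.values()),
               default=0.0)


def disease_specific_score(pathogenicity: float, weight: float) -> float:
    """Combine the two scores by multiplication (an assumed rule)."""
    return pathogenicity * weight


# Toy inputs: hypothetical genes and annotation terms.
altered_gene_terms = {"term1", "term2", "term3"}
seed_terms = {
    "SEED_GENE_1": {"term1", "term2"},
    "SEED_GENE_2": {"term4"},
}

weight = gene_functional_weight(altered_gene_terms, seed_terms)
score = disease_specific_score(pathogenicity=0.9, weight=weight)
print(weight, score)
```

With a multiplicative rule, an SNV with a high generic pathogenicity score is down-weighted when its gene has little functional overlap with the disease seed genes, which is the disease-specific behavior the paragraph above describes.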
This project is supported by the Research Grants Council (RGC), Hong Kong SAR, China (Project 22201419).
For further information on this research topic, please contact Dr. Eric Lu Zhang.