Hong Kong Baptist University (HKBU) Research Cluster on Data Analytics and Artificial Intelligence in X

A computational framework to prioritize disease-associated low-frequency variants from Identity-By-Descent regions
Principal Investigator: Dr. Eric Lu Zhang (Department of Computer Science)

Many efforts have been made to investigate the role of single nucleotide variants (SNVs) in human complex diseases, yet the common and rare disease-susceptibility SNVs identified so far explain only a small proportion of disease heritability. Low allele frequency (LAF) SNVs are another class of variant that is difficult to investigate, because they require either a large sample size for association analysis or multiple families with similar manifestations for linkage analysis. Disease-associated LAF SNVs may be located in Identity-By-Descent (IBD) regions that are inherited from a recent common ancestor and carried by sporadic patients. Extensive work has been done on IBD detection using common SNVs, but this forgoes the power that LAF SNVs offer to improve sensitivity for short IBD regions. In addition, previous tools consider only the effects of SNVs on gene function rather than on particular diseases. Identifying which SNV in an IBD region is truly associated with the disease therefore poses a substantial challenge.

In this project, we plan to design a computational framework to prioritize disease-associated LAF SNVs from IBD regions. A novel statistical model is designed to detect IBD regions by making use of LAF SNVs from sporadic patients. For LAF SNVs, the calculation of haplotype frequency is substantially influenced by sequencing errors. Instead of relying on a direct frequency estimate, the model assumes that each haplotype is sampled from a Bernoulli distribution whose parameter can be estimated and updated using observations from the training and test sets. The model also removes the influence of sequencing errors from parameter estimation so that they are not counted twice. Our preliminary study showed that shared LAF SNVs tend to be observed in IBD regions and are rarely observed by chance.
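
One way to read the Bernoulli assumption above is as a conjugate Beta-Bernoulli update, in which the haplotype parameter is revised as new samples are observed. The sketch below illustrates that reading only and is not the project's model: the Beta prior, the class name `BetaPosterior`, and all counts are hypothetical, and the sequencing-error correction is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) belief about a Bernoulli haplotype parameter.

    Assumption: a uniform Beta(1, 1) prior; the source does not specify one.
    """
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, carriers: int, total: int) -> "BetaPosterior":
        """Return the posterior after seeing `carriers` out of `total` samples."""
        if not 0 <= carriers <= total:
            raise ValueError("carriers must be between 0 and total")
        return BetaPosterior(self.alpha + carriers,
                             self.beta + (total - carriers))

    @property
    def mean(self) -> float:
        """Posterior mean estimate of the Bernoulli parameter."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical counts: first a training set, then a test set.
prior = BetaPosterior()
after_training = prior.update(carriers=2, total=198)
after_test = after_training.update(carriers=1, total=100)

print(f"after training: {after_training.mean:.4f}")
print(f"after test:     {after_test.mean:.4f}")
```

Because the update is conjugate, the training and test observations can be folded in one after the other without refitting, which is the property the sketch is meant to show.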

The disease-specific impacts of LAF SNVs are evaluated by integrating gene functional weights and SNV pathogenicity scores. For each IBD region, we apply a bi-clustering algorithm to identify patients who share the same disease manifestations; these patients are then used to calculate the susceptibility of the IBD region by comparison with the number of carriers among controls. Disease seed genes are identified by examining patients’ manifestations and are used to calculate the gene functional weight, defined by the functional similarity between the gene altered by the SNV and the disease seed genes. The gene functional weight is then integrated with the SNV pathogenicity score to yield a disease-specific SNV pathogenicity score.
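
The integration step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's method: functional similarity is approximated by Jaccard overlap of annotation term sets, the gene functional weight is taken as the maximum similarity to any seed gene, and the two scores are combined by multiplication. All gene names, annotation terms, and pathogenicity values are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two annotation term sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_functional_weight(gene_terms: set, seed_term_sets: list) -> float:
    """Weight a gene by its best similarity to any disease seed gene (assumption)."""
    return max((jaccard(gene_terms, s) for s in seed_term_sets), default=0.0)


def disease_specific_score(weight: float, pathogenicity: float) -> float:
    """Combine the two scores by a simple product (assumption)."""
    return weight * pathogenicity


# Hypothetical functional annotations for candidate and seed genes.
annotations = {
    "GENE_A": {"t1", "t2", "t3"},
    "GENE_B": {"t4"},
    "SEED_1": {"t1", "t2"},
    "SEED_2": {"t3", "t5"},
}
seed_term_sets = [annotations["SEED_1"], annotations["SEED_2"]]

# Hypothetical SNVs: (identifier, altered gene, generic pathogenicity score).
snvs = [("snv1", "GENE_A", 0.6), ("snv2", "GENE_B", 0.9)]

# Score each SNV and rank from most to least disease-relevant.
ranked = sorted(
    (
        (snv_id, disease_specific_score(
            gene_functional_weight(annotations[gene], seed_term_sets), patho))
        for snv_id, gene, patho in snvs
    ),
    key=lambda item: item[1],
    reverse=True,
)

for snv_id, score in ranked:
    print(f"{snv_id}: {score:.3f}")
```

In this toy example the SNV with the higher generic pathogenicity score ranks lower because its gene shares no annotation with the seed genes, which is the effect a disease-specific weighting is meant to have.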


  • To design a novel statistical model for IBD detection based on LAF SNVs, which is expected to substantially improve the detection of short IBD regions.
  • To develop a computational tool to predict the impact of an SNV on a particular disease, comprising three modules: disease manifestation clustering and mapping, gene functional weight calculation, and SNV pathogenicity score prediction.
  • To build an online system for disease-specific SNV prioritization, which could be applied to genetic testing in the future.

Related Publications:

  • JiFeng Guo, Lu Zhang et al. Coding mutations in NUS1 contribute to Parkinson’s disease. Proceedings of the National Academy of Sciences of the United States of America. 2018 doi:10.1073/pnas.1809969115
  • Lu Liu, Lu Zhang et al. The SNP-set based association study identifies ITGA1 as a susceptibility gene of attention-deficit/hyperactivity disorder in Han Chinese. Translational Psychiatry 2017 doi:10.1038/tp.2017.156 (joint first author)
  • Lu Zhang, Jing Zhang, Jing Yang, Dingge Ying, Yu lung Lau, Wanling Yang. PriVar: a flexible toolkit for prioritizing SNV and indel from next generation sequencing data. Bioinformatics 2013 doi:10.1093/bioinformatics/bts627.
  • Lu Zhang, Wanling Yang, Dingge Ying, Stacey S. Cherny, Friedhelm Hildebrandt, Pak Chung Sham, Yu lung Lau. Homozygosity mapping on a single patient: identification of homozygous regions of recent common ancestry by using population data. Human Mutation 2011 doi:10.1002/humu.21432.

Grant Support:

This project is supported by the Research Grants Council (RGC), Hong Kong SAR, China (Project 22201419).

For further information on this research topic, please contact Dr. Eric Lu Zhang.