Optimal Bayesian Transfer Learning for Count Data.
abstract

Only a limited amount of omics data is often available for designing predictive models in biomedicine. Because these omics data come from underlying processes that may share common pathways and disease mechanisms, leveraging available data from relevant source domains may help in designing a more accurate and reliable predictor in a target domain of interest where labeled data are scarce. Here, we focus on developing Bayesian transfer learning methods for analyzing next-generation sequencing (NGS) data to help improve predictions in the target domain. We formulate transfer learning in a fully Bayesian framework and define the relatedness between domains by a joint prior distribution over the model parameters of the source and target domains. The joint prior acts as a bridge across domains, through which the related knowledge in the source data is transferred to the target domain. We focus on RNA-seq discrete count data, which are often overdispersed. To model them appropriately, we consider the Negative Binomial model and propose an Optimal Bayesian Transfer Learning (OBTL) classifier that minimizes the expected classification error in the target domain. We evaluate the performance of the OBTL classifier on both synthetic data and cancer data from The Cancer Genome Atlas (TCGA).
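To illustrate why a Negative Binomial model suits overdispersed counts, the following is a minimal sketch and not the OBTL classifier from the article. It assumes the common (r, p) parameterization, in which the count is the number of failures before r successes with success probability p; the function names `nb_log_pmf` and `nb_mean_var` are hypothetical and chosen only for this example. Under that assumption the variance equals the mean divided by p, so it always exceeds the mean, unlike a Poisson model where the two are equal.

```python
import math


def nb_log_pmf(k, r, p):
    """Log-probability of count k under a Negative Binomial with shape r > 0
    and success probability 0 < p < 1 (assumed parameterization)."""
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )


def nb_mean_var(r, p):
    """Mean and variance of the same distribution.

    The variance is mean / p, which is larger than the mean for any p < 1;
    this is the overdispersion that a Poisson model cannot express.
    """
    mean = r * (1.0 - p) / p
    var = mean / p
    return mean, var


if __name__ == "__main__":
    # Illustrative parameter values only, not taken from the article.
    mean, var = nb_mean_var(r=5.0, p=0.25)
    print(f"mean={mean}, variance={var}, variance/mean={var / mean}")
```

With these illustrative values the variance is four times the mean; a smaller p gives stronger overdispersion.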

publication outlet

IEEE/ACM Trans Comput Biol Bioinform

author list (cited authors)
Karbalayghareh, A., Qian, X., & Dougherty, E. R.
publication date
2021
keywords
  • Databases, Genetic
  • High-Throughput Nucleotide Sequencing
  • Neoplasms
  • Computational Biology
  • Models, Statistical
  • Bayes Theorem
  • Machine Learning
  • Humans
citation count

3

PubMed ID
31180899
identifier
501197SE
Digital Object Identifier (DOI)
start page
644
end page
655
volume
18
issue
2