Motivated by a logistic regression problem involving diet and cancer, we reconsider the problem of forming a confidence interval for the ratio of two location parameters. We develop a new methodology, which we call the Direct Integral Method for Ratios (DIMER). In simulations, we compare this method to many others, including Wald's method, Fieller's interval, Hayya's method, the nonparametric bootstrap and the parametric bootstrap. These simulations show that DIMER generally comes closer to the nominal confidence level, and that in those cases where the other methods also achieve the nominal level, DIMER generally yields shorter confidence intervals. We also show that DIMER eliminates the possibility of confidence intervals of infinite or enormous length, something that can occur with Fieller's interval. Furthermore, we study the real Healthy Eating Index-2005 (HEI-2005) data set from the NIH-AARP Study of Diet and Health and consider a weighted logistic regression model in which there are multiple subpopulations and multiple diseases within each subpopulation. Based on this model, we present six different approaches, including DIMER, for forming confidence intervals for the relative risks of different diseases in different subpopulations. We provide the asymptotic distributions of the estimates of the log relative risks obtained by maximum likelihood and by the nonparametric bootstrap. Next, we present algorithms for hypothesis tests and likelihood ratio tests of whether our proposed model differs significantly from three other logistic regression models. In addition, we describe the adaptive lasso and an estimator with bound constraints for variable selection, and propose a novel algorithm for solving the nonlinear regression model with an L1-norm penalty. The application of all these methods to the HEI-2005 data is illustrated.
Additionally, we extend the linear function of nutrition components inside the logistic regression model to a nonlinear case. Moreover, we take into account restrictions implied by knowledge of biology and nutrition, and propose a logistic regression model involving I-spline basis functions together with an algorithm to fit it. An application to the real HEI-2005 data set and a comparison to a logistic model with total HEI scores are also presented.
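
As an illustration of why Fieller's interval can have infinite length, the following minimal sketch contrasts it with a Wald (delta-method) interval for a ratio of two approximately normal estimates. It is not DIMER, which the abstract does not specify; the function names, argument names (`a`, `b` for the numerator and denominator estimates, `v11`, `v22`, `v12` for their variances and covariance) and the use of a normal critical value are all assumptions made for the example.

```python
import math

Z95 = 1.959963984540054  # standard normal 97.5% quantile (two-sided 95%)


def wald_ratio_interval(a, b, v11, v22, v12=0.0, z=Z95):
    """Delta-method interval for a/b; always finite when b != 0."""
    theta = a / b
    var = (v11 - 2.0 * theta * v12 + theta ** 2 * v22) / b ** 2
    half = z * math.sqrt(var)
    return theta - half, theta + half


def fieller_ratio_interval(a, b, v11, v22, v12=0.0, z=Z95):
    """Fieller confidence set: all theta with
    (a - theta*b)^2 <= z^2 * (v11 - 2*theta*v12 + theta^2*v22).

    Returns (kind, lo, hi), where kind is
      "bounded"    -> the set is [lo, hi]
      "two_rays"   -> the set is (-inf, lo] union [hi, +inf)
      "whole_line" -> the set is the entire real line
    """
    # Coefficients of the quadratic A*theta^2 - 2*B*theta + C <= 0.
    A = b * b - z * z * v22
    B = a * b - z * z * v12
    C = a * a - z * z * v11
    D = B * B - A * C
    if A > 0:
        # Denominator significantly different from zero: finite interval.
        root = math.sqrt(max(D, 0.0))
        return "bounded", (B - root) / A, (B + root) / A
    if A < 0 and D > 0:
        # Denominator not significantly different from zero:
        # the set is the complement of an open interval.
        root = math.sqrt(D)
        r1, r2 = sorted(((B - root) / A, (B + root) / A))
        return "two_rays", r1, r2
    # Remaining cases (including the boundary A == 0) are reported
    # conservatively as the whole real line.
    return "whole_line", -math.inf, math.inf
```

When the denominator estimate is many standard errors from zero, e.g. `fieller_ratio_interval(2.0, 4.0, 0.01, 0.01)`, the set is a bounded interval around 0.5; when it is within about two standard errors of zero, e.g. `fieller_ratio_interval(2.0, 0.1, 0.01, 0.01)`, the set becomes unbounded, whereas the Wald interval stays finite but may undercover.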
- Mallick, Bani Distinguished Professor