Statistics Seminar at Georgia State University

Fall 2022-Spring 2023, Fridays 3:00-4:00pm, Virtual Seminar

Organizer: Yichuan Zhao

If you would like to give a talk in the Statistics Seminar, please send an email to Yichuan Zhao at yichuan@gsu.edu



2:00-3:00pm, April 21, 2023, Statistics Seminar, Location: 25 Park Place, Room 1441, Webex Meeting Link: https://gsumeetings.webex.com/gsumeetings/j.php?MTID=m479e9d70c7c9f7ee6c9cc10c7c82cbbe , Professor Zhengjun Zhang, UCAS and University of Wisconsin at Madison,
Directly and Simultaneously Expressing Absolute and Relative Treatment Effects in Medical Data Models and Applications

Abstract: Logistic regression is widely used in the analysis of medical data with binary outcomes to study treatment effects through (absolute) treatment effect parameters in the models. However, logistic regression models do not include parameters that indicate relative treatment effects, which can be a severe problem in efficiently modeling treatment effects and can lead to wrong conclusions about them. This paper introduces a new enhanced logistic regression model that offers a new way of studying treatment effects by measuring the relative changes in the treatment effects, while also incorporating the way in which logistic regression models the treatment effects. The new model, called the Absolute and Relative Treatment Effects (AbRelaTEs) model, can be viewed as a generalization of logistic regression, with greater flexibility, interpretability, and applicability in real data applications than logistic regression. The AbRelaTEs model is capable of modeling significant treatment effects in an absolute way, a relative way, or both. The new model can be easily implemented using statistical software, with the logistic regression model being treated as a special case. As a result, the classical logistic regression models can be replaced by the AbRelaTEs model to gain greater applicability and to provide a new benchmark model for more efficiently studying treatment effects in clinical trials, economic developments, and many applied areas. Moreover, the estimators of the coefficients are consistent and asymptotically normal under regularity conditions. In both simulation and real data applications, the model provides both significant and more meaningful results. Joint work with Hao Yang Teng.

3:00-4:00pm, April 07, 2023, Statistics Seminar, Location: 25 Park Place, Room 1441, Webex Meeting Link: https://gsumeetings.webex.com/gsumeetings/j.php?MTID=m8beba471d9ad65a2b139730ce962ba9a , Dr. Jiancheng Jiang, Professor, Department of Mathematics and Statistics, The University of North Carolina at Charlotte,
Empirical likelihood ratio tests for nonnested model selection based on predictive losses

Abstract: We propose an empirical likelihood ratio (ELR) test for comparing any two supervised learning models, where the competing models may be nested, nonnested, overlapping, misspecified or correctly specified. It compares the prediction losses of models based on cross-validation. We develop its asymptotic distributions for comparing two nonparametric learning models under a general framework with convex loss functions. However, the prediction losses from the cross-validation involve repeatedly fitting the models with one observation left out, which is a heavy computational burden. We introduce an easy-to-implement ELR test that requires fitting the models only once and shares the same asymptotics as the original one. The proposed tests are applied to compare additive models with varying-coefficient models. Furthermore, a scalable distributed ELR test is proposed for testing the importance of a group of variables in possibly misspecified additive models with massive data. Simulations show that the proposed tests work well and have favorable finite sample performance over some existing approaches. The methodology is validated on an empirical application.
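
As a rough illustration of the leave-one-out prediction losses mentioned in the abstract above, the sketch below compares two simple models on toy data. It is not the ELR test from the talk: the two models (an intercept-only fit and a least-squares line), the squared-error loss, the data, and all function names are assumptions chosen for illustration. It only shows the per-observation loss differences such a comparison starts from, and why the naive version needs one model fit per observation.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Intercept-only model: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Simple least-squares line y = a + b*x."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    return lambda x: a + b * x

def loo_losses(fit, xs, ys):
    """Squared-error loss at each point, predicted by a model fit without it."""
    losses = []
    for i in range(len(xs)):
        # Refit with observation i left out: n fits in total.
        x_train = xs[:i] + xs[i + 1:]
        y_train = ys[:i] + ys[i + 1:]
        predict = fit(x_train, y_train)
        losses.append((ys[i] - predict(xs[i])) ** 2)
    return losses

# Hypothetical toy data with a roughly linear trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.2, 1.9, 3.2, 3.8, 5.1]

loss_mean = loo_losses(fit_mean, xs, ys)
loss_line = loo_losses(fit_line, xs, ys)

# Per-observation loss differences; a positive average favors the line.
diffs = [lm - ll for lm, ll in zip(loss_mean, loss_line)]
print(mean(diffs))
```

Each call to `loo_losses` refits the model once per observation, which is the computational burden the abstract refers to for more expensive learners.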

2:00-3:00pm, March 10, 2023, Colloquium, Location: 25 Park Place, Room 1441 , Dr. Huang Lin, NIH,
Microbiome Differential Abundance (DA) Analysis and Metabolomics Variational Autoencoder (VAE) Model

Abstract:

2:00-3:00pm, March 08, 2023, Colloquium, Location: 25 Park Place, Room 1441 , Dr. Li-Hsiang Lin, Department of Experimental Statistics, Louisiana State University,
High-Dimensional Multivariate Linear Regression with Weighted Nuclear Norm Regularization

Abstract: A low-rank matrix estimation problem is considered in which the data are assumed to be generated from the multivariate linear regression model. To induce a low-rank coefficient matrix, we employ the weighted nuclear norm (WNN) penalty, defined as the weighted sum of the singular values of the matrix. The weights are set in a non-decreasing order, which makes the WNN objective function non-convex in the parameter space. Such an objective function has been used in many applications, but studies of the estimation properties of the resulting estimator are limited. We propose an efficient algorithm under the framework of the alternating direction method of multipliers (ADMM) to estimate the coefficient matrix. The estimator from the suggested algorithm converges to a stationary point of an augmented Lagrangian function. Under the orthogonal design setting, the effects of the weights on estimating the singular values of the ground-truth coefficient matrix are derived. Under the Gaussian design setting, a minimax convergence rate on the estimation error is derived. We also propose a generalized cross-validation (GCV) criterion for selecting the tuning parameter and an iterative algorithm for updating the weights. Simulations and real data analyses demonstrate the competitive performance of our new method. Extensions of the proposed method will also be discussed.
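
For readers unfamiliar with the penalty, the definition in the abstract above can be written out as follows. The notation is an assumption made here, not taken from the talk: B is the coefficient matrix, r is the smaller of its two dimensions, the singular values are listed in decreasing order, and the least-squares form of the penalized objective with tuning parameter lambda is one plausible reading of "the WNN objective function".

```latex
% Weighted nuclear norm of B (assumed notation):
\[
  \|B\|_{w,*} \;=\; \sum_{i=1}^{r} w_i \,\sigma_i(B),
  \qquad
  \sigma_1(B) \ge \cdots \ge \sigma_r(B) \ge 0,
  \qquad
  0 \le w_1 \le \cdots \le w_r .
\]
% One plausible penalized objective for the model Y = XB + E (an assumption):
\[
  \widehat{B} \;\in\; \arg\min_{B}\;
  \tfrac{1}{2}\,\|Y - XB\|_F^2 \;+\; \lambda\,\|B\|_{w,*} .
\]
```

With all weights equal, the penalty reduces to the ordinary nuclear norm, which is convex. Non-decreasing weights penalize the largest singular values least, and this is the source of the non-convexity noted in the abstract.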

2:00-3:00pm, March 03, 2023, Distinguished Lecture, Location: 25 Park Place, Room 1441 , Professor Zhezhen Jin, Department of Biostatistics, Columbia University,
Statistical issues and challenges in biomedical studies

Abstract: In this talk, I will present statistical issues and challenges that I have encountered in my biomedical research. Through real biomedical studies in transplantation, aging, and environmental science, I will illustrate topics including data collection, data cleaning, formulation of research questions, data analysis, and related statistical methodology development. After a discussion of the issues and challenges, I will focus on item selection in disease screening, comparison and identification of biomarkers that are more informative for disease diagnosis, estimation of weights on the relative importance of exposure variables for health outcomes, subsampling, and variable selection and dimension reduction for adjusted analysis. I will also present our newly developed methods tied to the real studies, which can address some of the issues and challenges.

11:00am-12:00pm, March 03, 2023, Colloquium, Location: 25 Park Place, Room 1441 , Huimin Cheng, Department of Statistics, University of Georgia,
Masked Mirror Validation in Graphon Estimation

Abstract: Graphon, short for graph function, provides a generative model for networks. An accurate estimation of graphon plays a key role in many applications, such as link prediction. In recent decades, various methods for graphon estimation have been proposed. The success of most graphon estimation methods depends on a proper specification of hyperparameters. Some network cross-validation methods have been proposed, but they suffer from restrictive model assumptions, expensive computational costs, and a lack of theoretical guarantees. To address these issues, we propose a masked mirror validation (MMV) method. Asymptotic properties of the MMV are established. The effectiveness of the proposed method in terms of both computation and accuracy is demonstrated by extensive simulation experiments. We further apply MMV for drug repurposing in a real data application.

3:00-4:00pm, February 24, 2023, Statistics Seminar, Location: 25 Park Place, Room 1441 , Dr. Shihao Yang, School of Industrial & Systems Engineering at Georgia Tech,
Inference of dynamic systems from noisy and sparse data via manifold-constrained Gaussian processes

Abstract: Parameter estimation for nonlinear dynamic system models, represented by ordinary differential equations (ODEs) or partial differential equations (PDEs), using noisy and sparse experimental data is a vital task in many fields. We propose a fast and accurate method, manifold-constrained Gaussian process Inference, for this task. Our method uses a Gaussian process model over system components, explicitly conditioned on the manifold constraint that gradients of the Gaussian process must satisfy the ODE/PDE system. By doing so, we completely bypass the need for numerical integration and achieve substantial savings in computational time. Our method is also suitable for inference with unobserved system components, which often occur in real experiments. Our method is distinct from existing approaches as we provide a principled statistical construction under a Bayesian framework, which rigorously incorporates the ODE/PDE system through conditioning.

2:00-3:00pm, December 12, 2022, Colloquium, Location: 25 Park Place, Room 1441, Webex Meeting Link: https://gsumeetings.webex.com/webappng/sites/gsumeetings/meeting/info/7 , Dr. Yi Li, M. Anthony Schork Professor of Biostatistics and Professor of Global Public Health, Department of Biostatistics, University of Michigan,
Distinguished Lecture: Machine Learning in the Era of Big Data: Model Selection, Estimation, and Inference

Abstract: In the era of big data, high-throughput data are routinely collected. These high dimensional data defy classical regression models, which are either infeasible to fit or likely to incur low predictability because of overfitting. In this talk, we will introduce several cutting-edge machine learning methods developed by my group in the last few years for modeling (censored) outcome data with high dimensional predictors. Specifically, we will introduce a Dantzig selector for fitting survival models with high dimensional predictors, followed by various semiparametric and nonparametric feature screening methods for handling ultra-high dimensional predictors. We will also discuss statistical inference for regression models with high dimensional predictors. With high-dimensional outcome data, we will introduce a new class of high-dimensional Gaussian graphical regression models with predictors. The talk focuses on statistical principles and concepts behind these methods, which are motivated and illustrated by various biomedical examples with precision medicine contexts.

2:00-3:00pm, November 18, 2022, Virtual Colloquium, Webex Meeting Link: https://gsumeetings.webex.com/gsumeetings/j.php?MTID=m5315fe8659224e9317efc53c2f6c56a0 , Dr. Lili Yu, Professor and Karl E. Peace endowed Chair, Department of Biostatistics, Epidemiology and Environmental Health Sciences, Georgia Southern University,
Survival data analysis with Heteroscedastic Accelerated Failure Time model

Abstract: The Buckley-James method for the classical accelerated failure time model has been extended to accommodate heteroscedastic survival data in two ways. The first is the weighted least squares method, which estimates the heteroscedasticity nonparametrically, while the second is the local Buckley-James method, which uses the local Kaplan-Meier method to estimate heteroscedasticity. In this talk, we will discuss and compare these two methods theoretically and numerically with simulation studies. Two real data examples are used for practical illustration of the comparison.
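
For background on the Kaplan-Meier estimator named in the abstract above, here is a minimal sketch of the standard (global) product-limit estimator. It is not the local Kaplan-Meier variant or the Buckley-James procedure discussed in the talk. The function name, the data, and the convention that observations censored at an event time are still counted as at risk at that time are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed failure time.

    events[i] is 1 for an observed failure and 0 for a censored time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Failures and censored observations both leave the risk set.
        n_at_risk -= removed
    return curve

# Hypothetical toy data: five subjects, two of them censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

The curve steps down only at observed failure times. Censored observations contribute by shrinking the risk set for later times.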

3:00-4:00pm, November 04, 2022, Virtual Statistics Seminar, Webex Meeting Link: https://gsumeetings.webex.com/gsumeetings/j.php?MTID=mebd181d58e6c1ea7ab546b83c1bf8e39 , Dr. Tsz Chai Fung, Assistant Professor, Department of Risk Management & Insurance, Georgia State University,
A Posteriori Risk Classification and Ratemaking with Random Effects in the Mixture-of-Experts Model

Abstract: In the underwriting and pricing of non-life insurance products, the insurer needs to utilize both policyholder information and claim history to ensure profitability and proper risk management. While policyholder information such as age and gender reflects the observable risk characteristics, the claim history may be regarded as a manifestation of unmeasurable and unobservable risk factors, which could vary drastically across different policyholders. This presentation introduces a flexible regression model, called the Mixed LRMoE, for a posteriori ratemaking, which leverages policyholder information and claim history to categorize policyholders into groups with risk profiles and to determine a premium that accurately captures the unobserved risks. Our proposed framework outperforms the benchmark models in goodness-of-fit to a real, multiyear automobile insurance dataset, while offering an intuitive and interpretable characterization of policyholders' risk profiles that adequately reflects their claim history.