2:00-3:00pm, April 19, 2024, Distinguished Lecture, Location: 25 Park Place, Room 1441,
Professor Edsel Pena,
Department of Statistics, University of South Carolina,
Joint Dynamic Models of Competing Recurrent Events, Longitudinal Marker, and Performance/Health Status
Abstract:
Recurrent events occur in many areas of the medical and public health sciences; in engineering and reliability settings; in actuarial, financial, and economic settings; and in many other fields. For a given subject or experimental unit, there could be competing recurrent events. Furthermore, a longitudinal marker and a health, quality-of-life, or performance status could be monitored. In this talk, I will present a class of joint dynamic models for competing recurrent risk processes, a longitudinal marker process, and a health or performance process that could be applicable in these varied settings. Interpretations of the model parameters will be presented, and semi-parametric estimators of these parameters will be described. Finite-sample and asymptotic properties of the estimators will be presented. Some potential applications of this class of models in personalized medicine or dynamic system-based interventions, as well as in the dynamic prediction of the occurrence of terminal events, will be indicated. This is joint work with Lili Tong (New York University) and Piaomu Liu (Bentley University).
11:00am-12:00pm, March 25, 2024, Colloquium, Location: 25 Park Place, Room 1441,
Dr. Jiming Jiang,
Professor and Chair, Department of Statistics, University of California, Davis,
The Method of Limits and The Analysis of Count Data in Genome-wide Association Studies
Abstract:
We propose a new method of statistical inference, called the method of limits (MoL), which may be viewed as an extension of the method of moments. This method is motivated by the need to analyze count data in genome-wide association studies (GWAS), where statistical inference with existing methods is hindered by computational challenges. We establish the consistency and asymptotic normality of the MoL estimator of heritability from GWAS data, which is an advantage over the existing PQLseq method. Furthermore, we derive a consistent estimator of the proportion of causal SNPs. In our simulation studies, MoL also showed advantages over PQLseq in both statistical efficiency, measured by average statistical efficiency (ASE), and computational efficiency. We also illustrate the usefulness of MoL through an application to UK Biobank data, inferring the heritability of weekly champagne consumption and weekly red wine consumption from the count data. This work is joint with Leqi Xu, Yiliang Zhang, and Hongyu Zhao of Yale University.
2:00-3:00pm, February 2, 2024, Colloquium, Location: 25 Park Place, Room 1441,
Dr. Rogers Silva,
Research Scientist II, TReNDS Center, Georgia State University,
Multi-level subspace analysis for linkage detection in big neuroimaging data
Abstract:
Artificial intelligence and machine learning (AI/ML) are paving the way to help unravel the complexities of the human brain and shed light on the underpinnings of atypical behaviors. This is critical for the early detection of mental illnesses and for personalized treatment prescription. Here, key AI/ML strategies for leveraging the richness of brain data are discussed, including multimodal fusion, multidimensional and deep latent representations, multi-level linkage detection, and large-scale federated learning. Each of these is presented through the general, overarching concept of subspaces. Examples from original research and from the brain imaging field at large illustrate the application of each strategy to brain data from multiple modalities. Notably, these approaches address challenges that extend readily to fields beyond brain imaging.
2:00-3:00pm, January 30, 2024, Colloquium, Location: 25 Park Place, Room 1441,
Dr. Russell Jeter,
Motus Nova, Atlanta, Georgia,
Foundations of Big Data and Machine Learning in Stroke Recovery, Sepsis Treatment, and Large-Scale Learning Tools
Abstract:
In this talk, we will discuss three big data and machine learning projects. In these projects, we develop big data and machine learning methods to solve problems related to stroke recovery, sepsis treatment, and the development of large-scale learning tools. Using sensor data collected from in-home stroke rehabilitation robotics, we developed a model for determining the residual stroke severity of stroke survivors. This model is intentionally portable: it relies on therapeutic summary measures that would be available both on the home therapeutic device used in the study and in an outpatient rehabilitation facility. Utilizing up to 72 hours of clinical observations from septic patients in the ICU, we developed a machine learning approach to guide a fluid and vasopressor dosing strategy that adapts to patient-specific clinical states to improve the survival of septic patients. This method simultaneously learns to infer the hidden state of a septic patient and learns how to respond to that patient's specific needs when treating a hypotensive episode. Leveraging cognitive theory, we developed an algorithm for generating quality assessments with distractors. Using data collected from these assessments, we are developing machine learning methods to further train our algorithm to identify the misconceptions that lead to students' incorrect responses to free-response problems.
3:00-4:00pm, November 3, 2023, Statistics Seminar, Location: 25 Park Place, Room 1441,
Dr. Parichoy Choudhury,
Principal Scientist of Biostatistics, the American Cancer Society, Atlanta
Statistical methods in the development and prospective validation of absolute risk prediction models with applications in cancer
Abstract:
Risk-stratified disease prevention involves tailoring health decisions about screening and prevention to individualized risk predictions. This requires a comprehensive understanding of the risk factors, including genetic variants, biomarkers, and lifestyle/behavioral and environmental factors, leading to the development of a model for predicting the absolute risk of a disease of interest. Developing an absolute risk model requires information on the relative risks associated with the risk factors, population-based age-specific disease incidence rates, competing event rates, and the population distributions of the risk factors. Such a model should ideally be validated in independent prospective cohorts before clinical application. In this presentation, I will describe the iCARE software tool, which implements absolute risk estimation for a disease by integrating multiple data sources, leveraging the best information available for each input parameter, and which provides standardized approaches for risk model validation. I will describe a major recent application of this tool: the development and validation of a comprehensive risk prediction model for breast cancer integrating questionnaire-based risk factors and a polygenic risk score (PRS). Model validation in two-phase study settings often involves scenarios where expensive biomarkers (e.g., PRS) are measured in a smaller subsample of a prospective cohort, and subjects may be selected using complex sampling designs. I will describe a simple method for improving the precision of model validation statistics (e.g., AUC) using the partial risk factors from the full cohort and the complete risk factors from the subsample. I will show an application to breast cancer risk prediction with questionnaire-based risk factors and PRS.
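The abstract lists the ingredients of an absolute risk model: relative risks, age-specific incidence rates, competing event rates, and the risk factor distribution. One common way these combine is that absolute risk over a projection window is the integral of the individual's disease hazard, lambda0(t) * RR(x), times the probability of still being alive and disease-free, exp(-integral of [lambda0(u) * RR(x) + m(u)] du), where m is the competing hazard. The sketch below is a minimal illustration under stated assumptions: hazards are constant within one-year intervals, and the baseline hazard is calibrated by simply dividing registry incidence by the mean relative risk in a reference sample. The function names and numbers are hypothetical; this is not iCARE's API or its exact calibration procedure.

```python
import math
from typing import Sequence


def calibrate_baseline(population_incidence: Sequence[float],
                       reference_rrs: Sequence[float]) -> list[float]:
    """Hypothetical simplified calibration: divide each age-specific
    population incidence rate by the mean relative risk in a reference
    sample so the average modeled hazard matches the registry rate.
    (A full calibration would also weight by survival; this does not.)"""
    mean_rr = sum(reference_rrs) / len(reference_rrs)
    return [rate / mean_rr for rate in population_incidence]


def absolute_risk(rr: float,
                  baseline_hazards: Sequence[float],
                  competing_hazards: Sequence[float]) -> float:
    """Probability of developing the disease within the projection window,
    assuming hazards are constant within each one-year interval.

    rr                -- the individual's relative risk, e.g. exp(x'beta)
    baseline_hazards  -- baseline disease hazard for each year
    competing_hazards -- competing-event (e.g. other-cause death) hazard per year
    """
    if len(baseline_hazards) != len(competing_hazards):
        raise ValueError("hazard sequences must have equal length")
    risk = 0.0
    surv = 1.0  # P(alive and disease-free at the start of the interval)
    for lam0, mort in zip(baseline_hazards, competing_hazards):
        h = lam0 * rr        # individual's disease hazard in this interval
        total = h + mort     # hazard of leaving the at-risk state
        if total > 0:
            # exact integral of h * exp(-total * t) over one unit of time
            risk += surv * (h / total) * (1.0 - math.exp(-total))
        surv *= math.exp(-total)
    return risk


if __name__ == "__main__":
    incidence = [0.002] * 10            # hypothetical registry rates, 10 years
    mortality = [0.005] * 10            # hypothetical competing mortality rates
    reference = [0.8, 1.0, 1.2, 2.0]    # hypothetical reference-sample RRs
    baseline = calibrate_baseline(incidence, reference)
    print(absolute_risk(1.5, baseline, mortality))
```

Because the competing hazard removes people from the at-risk set, the absolute risk is always lower than the naive 1 - exp(-cumulative disease hazard); the two coincide only when the competing hazard is zero.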
3:00-4:00pm, October 20, 2023, Statistics Seminar, Location: 25 Park Place, Room 1441,
Dr. Cheng Mao,
Assistant Professor, School of Mathematics, Georgia Institute of Technology,
Information-Theoretic and Computational Thresholds for Finding a Planted Dense Cycle in a Random Graph
Abstract:
Planted dense cycles are a type of geometric latent structure appearing in many real-world networks that exhibit the small-world phenomenon. We consider a model where a cycle with a small bandwidth and a high edge density is planted in a random graph. We characterize the information-theoretic and computational thresholds for the detection and recovery problems in this model. In particular, a statistical-to-computational gap exists between the thresholds achieved by exponential-time algorithms and efficient algorithms based on subgraph counts. Moreover, a detection-to-recovery gap exists between the threshold for testing the existence of the planted cycle and that for estimating the location of the cycle.
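To make the detection problem concrete, here is a small simulation sketch. It assumes one simple parameterization of the model (vertices get a hidden cyclic order; pairs within cyclic distance `bandwidth` connect with probability `p`, all other pairs with probability `q`) and uses a centered triangle count, one of the simplest subgraph-count statistics. The function names and parameter values are hypothetical, and this is not necessarily the statistic or the thresholds analyzed in the talk.

```python
import numpy as np


def sample_graph(n, bandwidth, p, q, planted, rng):
    """Sample a symmetric 0/1 adjacency matrix with an empty diagonal.

    planted=True: each vertex gets a hidden random position on a cycle;
    pairs whose cyclic distance is at most `bandwidth` connect with
    probability p, all other pairs with probability q (assumed model).
    planted=False: Erdos-Renyi null graph with edge probability q.
    """
    probs = np.full((n, n), q)
    if planted:
        pos = rng.permutation(n)                      # hidden cyclic order
        diff = np.abs(pos[:, None] - pos[None, :])
        cyc_dist = np.minimum(diff, n - diff)
        probs[cyc_dist <= bandwidth] = p
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # keep pairs i < j only
    return (upper | upper.T).astype(float)


def centered_triangle_count(adj):
    """Triangle count minus its expectation under an Erdos-Renyi graph
    with the same empirical edge density."""
    n = adj.shape[0]
    triangles = np.trace(adj @ adj @ adj) / 6.0
    p_hat = adj.sum() / (n * (n - 1))                 # edges / C(n, 2)
    n_triples = n * (n - 1) * (n - 2) / 6.0
    return triangles - n_triples * p_hat ** 3


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # null density q=0.1 roughly matches the planted graph's overall density
    null_adj = sample_graph(200, bandwidth=10, p=0.9, q=0.1, planted=False, rng=rng)
    alt_adj = sample_graph(200, bandwidth=10, p=0.9, q=0.02, planted=True, rng=rng)
    print(centered_triangle_count(null_adj), centered_triangle_count(alt_adj))
```

Under the null the centered count fluctuates around zero, whereas a dense planted cycle creates many extra triangles among nearby vertices. The parameters here are chosen far from any threshold, so the separation is easy; they say nothing about where the information-theoretic or computational thresholds lie.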