Statistics Seminar at Georgia State University

Fall 2016-Spring 2017, Fridays 3:00-4:00pm, Paul Erdős Conference Room, 796 COE

Organizer: Yichuan Zhao

If you would like to give a talk in Statistics Seminar, please send an email to Yichuan Zhao at

2:00-3:00pm, April 14, 2017, 150 COE, Haoda Fu, Ph. D., Eli Lilly and Company
Individualized Treatment Recommendation (ITR) for Survival Outcomes

Abstract: ITR is a method to recommend treatment based on individual patient characteristics in order to maximize clinical benefit. During the past few years, we have developed and published methods on this topic with various applications, including comprehensive search algorithms, tree methods, benefit-risk algorithms, and multiple-treatment and multiple ordinal treatment algorithms. In this talk, we propose a new ITR method to handle survival outcomes for multiple treatments. The new model enjoys the following practical and theoretical features. Instead of fitting the data, our method directly searches for the optimal treatment policy, which improves efficiency. To adjust for censoring, we propose a doubly robust estimator: our method requires only that either the censoring model or the survival model be correct, but not both, and when both are correct it enjoys better efficiency. Our method handles multiple treatments with intuitive geometric explanations, and it is Fisher consistent even when either the censoring model or the survival model (but not both) is misspecified. This method has potential applications in multiple therapeutic areas. One direct impact for the Diabetes business unit is how we can leverage Lilly Diabetes' broad treatment options to reduce or delay diabetes comorbidities such as CV events, diabetes-related retinopathy, nephropathy, or neuropathy.
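As a rough sketch of the doubly robust idea described above (illustrative only: the function names, working-model interfaces, and the simplified single-augmentation form are ours, not the speaker's):

```python
import numpy as np

def dr_policy_value(T, delta, A, X, policy, prop, surv, cens):
    """Doubly robust estimate of the value of a treatment rule `policy`.

    T, delta : observed time and event indicator (0 = censored)
    A, X     : received treatment and covariate rows
    policy   : rule mapping a covariate row to a recommended treatment
    prop     : working model for P(A = a | X)
    surv     : working outcome model m(t, x, a)
    cens     : working censoring model G(t | x) = P(uncensored past t | x)
    """
    n = len(T)
    vals = np.empty(n)
    for i in range(n):
        # weight for patients whose received treatment matches the policy
        follow = float(policy(X[i]) == A[i]) / prop(A[i], X[i])
        G = max(cens(T[i], X[i]), 1e-8)
        m = surv(T[i], X[i], A[i])
        # IPCW term augmented by the outcome model (one common simplified form)
        vals[i] = follow * (delta[i] * T[i] / G + (1.0 - delta[i] / G) * m)
    return vals.mean()
```

If the censoring model is correct the inverse-weighted term is unbiased; if instead the outcome model is correct the augmentation recovers the estimand, which is the "either but not both" property mentioned in the abstract.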

3:00-4:00pm, March 31, 2017, 796 COE, Yichen Cheng, Ph. D., Assistant Professor of Analytics, Institute for Insight, Robinson College of Business, Georgia State University
Simulated Stochastic Approximation Annealing for Global Optimization with a Square-Root Cooling Schedule

Abstract: Simulated annealing has been widely used in the solution of optimization problems. As is well known, simulated annealing cannot be guaranteed to locate the global optima unless a logarithmic cooling schedule is used. However, the logarithmic cooling schedule is so slow that the required CPU time is prohibitive. This paper proposes a new stochastic optimization algorithm, the so-called simulated stochastic approximation annealing algorithm, which is a combination of simulated annealing and the stochastic approximation Monte Carlo algorithm. Under the framework of stochastic approximation, it is shown that the new algorithm can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic schedule, e.g., a square-root cooling schedule, while still guaranteeing that the global optima are reached as the temperature tends to zero. The new algorithm has been tested on a few benchmark optimization problems, including feed-forward neural network training and protein folding. The numerical results indicate that the new algorithm can significantly outperform simulated annealing and other competitors.
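A toy illustration of a square-root cooling schedule (plain Metropolis annealing on a one-dimensional multimodal function; this is not the stochastic-approximation scheme of the paper, and all tuning constants are ours):

```python
import numpy as np

def anneal(f, x0, t0=10.0, steps=20000, step=0.5, seed=0):
    """Metropolis annealing with a square-root cooling schedule."""
    rng = np.random.default_rng(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for k in range(steps):
        t = t0 / np.sqrt(1.0 + k / 100.0)   # square-root cooling
        y = x + rng.normal(0.0, step)        # random-walk proposal
        fy = f(y)
        # accept downhill moves always, uphill moves with Metropolis probability
        if fy < fx or rng.random() < np.exp((fx - fy) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest

# multimodal test function with global minimum 0 at x = 0
f = lambda x: x * x + 3.0 * (1.0 - np.cos(2.0 * np.pi * x))
xbest, fbest = anneal(f, x0=4.5)
```

Because the square-root schedule cools much faster than a logarithmic one, plain annealing of this kind has no global-convergence guarantee; the paper's contribution is to restore that guarantee via the stochastic-approximation framework.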

10:00-11:00am, March 30, 2017, 124 Petit Science Center, Yi Li, Ph. D. Professor of Global Public Health and Biostatistics, School of Public Health, University of Michigan
Molecular Basis of Disease Distinguished Lecture and Mathematics & Statistics Colloquium: Big Biomedical Data Analytics: New Tools and Applications

Abstract: In this talk, I will briefly introduce some big biomedical datasets (BBD) my group has been analyzing. I will then talk about some statistical work that aims to model and analyze them. Time permitting, I will specifically illustrate 3 new methods that our group has recently developed: (1) a Gateaux-differential based boosting method (GdBoost) for modeling and variable selection in the presence of high-dimensional time-varying effects; (2) a covariance-enhanced discriminating analytical (CEDA) tool for classifications in the presence of high-dimensional gene expression profiles; (3) a computationally efficient modeling technique for evaluating national dialysis facilities' performance in terms of 30-day readmission.

3:00-4:00pm, March 3, 2017, 796 COE, Suprateek Kundu, Assistant Professor, Department of Biostatistics & Bioinformatics, Emory University
Estimating Dynamic Brain Functional Networks Using Multi-subject fMRI Data

Abstract: A prominent assumption in the study of brain functional connectivity is the stationarity of the brain network. However, it is increasingly recognized that the brain network is prone to variations across the scanning session, fueling the need for dynamic brain functional connectivity approaches. One of the main challenges in developing such approaches is that the frequency and location of change points in brain organization are unknown, and that changes may occur often over the scanning session. In order to provide greater power to detect rapidly evolving connectivity changes, we propose a fully automated two-stage approach that pools information across multiple subjects to divide the scanning session into non-overlapping time intervals, each characterized by a distinct brain network. The number and positioning of the time intervals are unknown and are learned from the data in the first stage by approximating the multivariate time series of correlations with a piecewise constant function under a fused lasso approach. In the second stage, the brain functional network for each time interval is inferred via sparse inverse covariance matrices. Numerical experiments show the effectiveness of the proposed method, and we also apply it to saccade block task fMRI data.
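The first-stage goal, recovering non-overlapping intervals with distinct means, can be mimicked with a much simpler stand-in: binary segmentation on a univariate series. This sketch is only a conceptual stand-in; the talk's method uses a fused lasso over multivariate correlation series, and the thresholds here are arbitrary:

```python
import numpy as np

def binseg(y, min_seg=20, thresh=3.0):
    """Recursive binary segmentation: split wherever the two-sample
    CUSUM-type statistic for a mean shift exceeds `thresh`."""
    def split(lo, hi, out):
        n = hi - lo
        if n < 2 * min_seg:
            return
        seg = y[lo:hi]
        s = seg.std(ddof=1) + 1e-12
        best_t, best_z = None, thresh
        for t in range(min_seg, n - min_seg + 1):
            m1, m2 = seg[:t].mean(), seg[t:].mean()
            z = abs(m1 - m2) / (s * np.sqrt(1.0 / t + 1.0 / (n - t)))
            if z > best_z:
                best_t, best_z = t, z
        if best_t is not None:
            split(lo, lo + best_t, out)       # recurse on the left piece
            out.append(lo + best_t)
            split(lo + best_t, hi, out)       # recurse on the right piece
    cps = []
    split(0, len(y), cps)
    return cps
```

On a series with a single jump (e.g. 100 zeros followed by 100 ones) this returns the single change point at index 100; the second stage of the paper would then fit a sparse inverse covariance matrix within each recovered interval.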

2:00-3:00pm, February 3, 2017, 796 COE, Dr. Gang Li, Janssen Research & Development, Johnson & Johnson
Inconsistency and drop-minimum data analysis

Abstract: The multi-regional clinical trial (MRCT) has become the preferred strategy for drug development. Consistency among regions, meaning that the treatment effect is clinically meaningful and relevant to all regions being studied, is the underlying assumption of an MRCT. Yet even though consistency is an important issue and inconsistency is often anticipated, solutions for handling inconsistency are rare. If one region's treatment effect is inconsistent with that of the other regions, pooling all the regions to estimate the overall treatment effect may not be reasonable. Unlike the multi-center clinical trials conducted in the US and Europe, in an MRCT different regional regulatory agencies may have their own ways of interpreting data and approving new drugs. It is therefore practical to consider the case in which the data from the region with the minimal observed treatment effect are excluded from the analysis in order to attain regulatory approval of the study drug. In such cases, what is the appropriate statistical approach for the remaining regions? We provide a solution first formulated in a fixed effects framework and then extended to a discrete random effects model. This is joint work with Fei Chen and K.K. Gordon Lan.
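The setting can be illustrated with a naive drop-minimum computation (hypothetical numbers; the talk's point is precisely that this naive pooled estimate is selection-biased, which is what the fixed- and random-effects solutions correct):

```python
import numpy as np

def drop_min_pooled(effects, ses):
    """Naive drop-minimum analysis: exclude the region with the smallest
    observed treatment effect, then pool the remaining regions by
    inverse-variance weighting."""
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    keep = np.arange(len(effects)) != effects.argmin()
    w = 1.0 / ses[keep] ** 2              # inverse-variance weights
    est = np.sum(w * effects[keep]) / w.sum()
    se = np.sqrt(1.0 / w.sum())
    return est, se

# three hypothetical regional effect estimates with equal standard errors
est, se = drop_min_pooled([0.30, 0.25, 0.05], [0.10, 0.10, 0.10])
```

Because the excluded region is chosen by its observed (not true) effect, this naive estimate is biased upward, motivating the formal treatment in the talk.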

12:00-1:30pm, February 3, 2017, 796 COE, Dr. Gang Li, Janssen Research & Development, Johnson & Johnson
Career Development Luncheon for Graduate Students


3:00-4:00pm, January 13, 2017, 796 COE, Dr. Yunxiao Chen, Assistant Professor, Department of Psychology, Emory University
A Fused Latent and Graphical Model for Multivariate Binary Data

Abstract: We consider modeling, inference, and computation for analyzing multivariate binary data. We propose a new model that consists of a low dimensional latent variable component and a sparse graphical component. Our study is motivated by analysis of item response data in cognitive assessment and has applications to many disciplines where item response data are collected. Standard approaches to item response data in cognitive assessment adopt the multidimensional item response theory (IRT) models. However, human cognition is typically a complicated process and thus may not be adequately described by just a few factors. Consequently, a low-dimensional latent factor model, such as the multidimensional IRT models, is often insufficient to capture the structure of the data. The proposed model adds a sparse graphical component that captures the remaining ad hoc dependence. It reduces to a multidimensional IRT model when the graphical component becomes degenerate. Model selection and parameter estimation are carried out simultaneously through construction of a pseudo-likelihood function and properly chosen penalty terms. The convexity of the pseudo-likelihood function allows us to develop an efficient algorithm, while the penalty terms generate a low-dimensional latent component and a sparse graphical structure. Desirable theoretical properties are established under suitable regularity conditions. The method is applied to the revised Eysenck's personality questionnaire, revealing its usefulness in item analysis. Simulation results are reported that show the new method works well in practical situations. This is joint work with Xiaoou Li, Jingchen Liu, Zhiliang Ying.

3:00-4:00pm, December 2, 2016, 796 COE, Dr. Yiyuan She, Associate Professor, Department of Statistics, Florida State University
Indirect Gaussian Graph Learning beyond Gaussianity

Abstract: This paper studies how to capture dependency graph structures from real data which may not be multivariate Gaussian. Starting from marginal loss functions not necessarily derived from probability distributions, we use an additive over-parametrization with shrinkage to incorporate variable dependencies into the criterion. An iterative Gaussian graph learning algorithm that is easy to implement is proposed. Statistical analysis shows that, with the error measured in terms of a proper Bregman divergence, the estimators have a fast rate of convergence. Real-life examples in different settings are given to demonstrate the efficacy of the proposed methodology. This is joint work with Shao Tang and Qiaoya Zhang.
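For readers unfamiliar with the error measure, a Bregman divergence is generated by any differentiable convex function phi via D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>; a minimal sketch (the choice of phi here is the textbook example, not the one used in the paper):

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2 / 2 recovers half the squared Euclidean distance
phi = lambda x: 0.5 * np.dot(x, x)
grad = lambda x: x
x, y = np.array([1.0, 2.0]), np.array([0.0, 1.0])
d = bregman(phi, grad, x, y)  # = 0.5 * ||x - y||^2 = 1.0
```

Other choices of phi yield, e.g., the Kullback-Leibler divergence, which is why a Bregman divergence is a natural error measure once the loss is no longer a Gaussian log-likelihood.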

10:00-11:00am, November 17, 2016, Petit Science Center 124, Dr. Heping Zhang, Susan Dwight Bliss Professor of Biostatistics, Department of Biostatistics, Yale School of Public Health, Yale University
Distinguished Lecture: Statistical Strategies in Analyzing Data with Unequal Prior Knowledge

Abstract: The advent of technologies including high-throughput genotyping and computer information technologies has produced ever larger and more diverse databases that are potentially information rich. This creates the need to develop statistical strategies that have a sound mathematical foundation and are computationally feasible and reliable. In statistics, we commonly deal with relationships between variables using correlation and regression models. With diverse databases, the quality of the variables may vary, and we may know more about some variables than others. I will present some ideas on how to conduct statistical inference with unequal prior knowledge. Specifically, how do we define correlation between two sets of random variables conditional on a third set of random variables, and how do we select predictors when we have information from sources other than the databases with raw data? I will address some mathematical and computational challenges in order to answer these questions. Analysis of real genomic data will be presented to support the proposed methods and highlight remaining challenges. This is joint work with Xueqin Wang, Yuan Jiang, and Yunxiao He.
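One simple instance of "correlation conditional on a third set of variables" is the partial correlation obtained by residualizing on that set (a sketch of the standard construction only, not the generalized definition developed in the talk):

```python
import numpy as np

def partial_corr(x, y, Z):
    """Correlation between x and y after removing the linear effect of Z
    by least-squares residualization."""
    Z1 = np.column_stack([np.ones(len(x)), Z])   # add an intercept column
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# x and y are correlated only through z: the raw correlation is high,
# while the partial correlation given z is near zero
rng = np.random.default_rng(1)
z = rng.standard_normal(2000)
x = z + rng.standard_normal(2000)
y = z + rng.standard_normal(2000)
raw = np.corrcoef(x, y)[0, 1]
pc = partial_corr(x, y, z.reshape(-1, 1))
```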

2:00-3:00pm, November 4, 2016, 796 COE, Dr. Zhezhen Jin, Professor of Biostatistics, Department of Biostatistics, Mailman School of Public Health, Columbia University
Statistical issues and challenges in biomedical studies

Abstract: In this talk, I will present statistical issues and challenges that I have encountered in my biomedical collaborative studies: item selection in disease screening, comparison and identification of biomarkers that are more informative for disease diagnosis, and estimation of weights reflecting the relative importance of exposure variables on health outcomes. After discussing the issues and challenges with real examples, I will review available statistical methods and present our newly developed methods.

3:00-4:00pm, October 28, 2016, 796 COE, Dr. Guanghui (George) Lan, Associate Professor, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
Optimal Stochastic Gradient Descent

Abstract: The last few years have seen an increasing interest in the development and application of stochastic gradient descent (SGD) methods for large-scale optimization and data analysis. In this talk, we will first provide a brief introduction to our earlier work on optimal SGD methods, namely the mirror-descent stochastic approximation (MD-SA) and accelerated stochastic approximation (AC-SA) methods for minimizing a general expectation function, and show how they evolved into popular machine learning algorithms. We then consider a special class of stochastic optimization problems whose objective function is given by the summation of finitely many terms, and present a new SGD type method, i.e., the randomized primal-dual gradient (RPDG) method, for solving these problems in an optimal manner. We provide a unified and intuitive game interpretation for Nesterov's accelerated gradient method, the more general AC-SA method, and the aforementioned RPDG method.
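The contrast between plain SGD and an accelerated scheme can be sketched on a noisy quadratic (illustrative only; the MD-SA, AC-SA, and RPDG methods in the talk are far more general, and the momentum rule below is one textbook Nesterov variant):

```python
import numpy as np

def sgd(grad, x0, lr, steps, rng):
    """Plain stochastic gradient descent."""
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * grad(x, rng)
    return x

def accelerated(grad, x0, lr, steps, rng):
    """Nesterov-style accelerated scheme with stochastic gradients."""
    x, y = x0.copy(), x0.copy()
    for k in range(steps):
        x_new = y - lr * grad(y, rng)
        y = x_new + (k / (k + 3.0)) * (x_new - x)  # momentum extrapolation
        x = x_new
    return x

# noisy gradient of f(x) = 0.5 * ||x||^2 (minimum at the origin)
g = lambda x, rng: x + 0.01 * rng.standard_normal(x.shape)
rng = np.random.default_rng(0)
x_sgd = sgd(g, np.ones(5), lr=0.1, steps=500, rng=rng)
x_acc = accelerated(g, np.ones(5), lr=0.1, steps=500, rng=rng)
```

Both iterates end near the origin here; the talk's interest is in rates and optimality for the finite-sum setting, where the randomized primal-dual view pays off.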

3:00-4:00pm, October 14, 2016, 796 COE, Dr. Ruixuan Liu, Assistant Professor, Department of Economics, Emory University
A single-index Cox model driven by Lévy subordinators

Abstract: I propose a new duration model in which the survival time is defined as the first time a Lévy subordinator crosses an exponential threshold, with covariates acting multiplicatively on the latent process. The specification is a natural variant of the mixed proportional hazards model from a stochastic process point of view. When the covariates' effect is parameterized as a linear index, the model reduces to a single-index Cox model. The large-sample properties of a sieve maximum partial likelihood estimator are presented.
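The first-passage construction can be simulated directly. The sketch below uses a gamma subordinator on a crude time grid; the subordinator choice, discretization, and parameter values are ours for illustration and simplify the model in the talk:

```python
import numpy as np

def simulate_times(beta, X, dt=0.01, tmax=50.0, seed=0):
    """Simulate survival times as the first time a gamma subordinator,
    scaled multiplicatively by exp(X beta), crosses an Exp(1) threshold.
    Times that never cross by `tmax` are censored at `tmax`."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scale = np.exp(X @ beta)             # multiplicative covariate effect
    thresh = rng.exponential(1.0, n)     # exponential crossing threshold
    times = np.full(n, tmax)
    for i in range(n):
        s, t = 0.0, 0.0
        while t < tmax:
            t += dt
            s += rng.gamma(shape=dt, scale=1.0)  # subordinator increment
            if scale[i] * s >= thresh[i]:
                times[i] = t
                break
    return times

times = simulate_times(np.array([0.5]), np.zeros((5, 1)))
```

Larger values of the linear index scale the latent process up, so the threshold is crossed sooner, which is the multiplicative covariate effect of the model.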

2:00-3:00pm, October 7, 2016, 796 COE, Professor Ming-Yen Cheng, Department of Mathematics, National Taiwan University
A simple and adaptive two-sample test in high dimensions

Abstract: High-dimensional data are commonly encountered nowadays. Testing the equality of two means is a fundamental inference problem, but the conventional Hotelling's $T^2$ test performs poorly or becomes inapplicable in high dimensions. Several modifications have been proposed to address this challenging issue and have been shown to perform well. However, they all use a normal approximation to the null distributions of their test statistics, and thus all require strong regularity conditions. We study this issue thoroughly and propose an $L^2$-norm based test that works under milder conditions and even when there are fewer observations than the dimension. In particular, to cope with possible non-normality of the null distribution, we employ the Welch-Satterthwaite $\chi^2$-approximation. Simple ratio-consistent estimators for the parameters in the approximating distribution are given. Unlike existing tests, our test is adaptive to singularity or near-singularity of the unknown covariance structure, which is commonly seen in high dimensions and has great impact on the shape of the null distribution. The approximate and asymptotic powers of the proposed test are also investigated. Simulation studies and real data applications show that our test has better size control than a benchmark test, while the powers are comparable when their sizes are comparable.
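The moment-matching $\chi^2$-approximation can be sketched as follows: under the null, $\|\bar{x}-\bar{y}\|^2$ is approximated by $\beta\chi^2_d$ with $\beta$ and $d$ chosen to match its first two moments. This sketch uses naive plug-in covariance estimates; the paper supplies ratio-consistent estimators and the full test:

```python
import numpy as np

def l2_two_sample(x, y):
    """L2-norm two-sample statistic with a Welch-Satterthwaite chi-square
    approximation: ||xbar - ybar||^2 ~ beta * chi2_df under H0."""
    n1, n2 = len(x), len(y)
    d = x.mean(axis=0) - y.mean(axis=0)
    stat = d @ d                                   # squared L2 norm
    # covariance of the mean difference, estimated by plug-in
    S = np.cov(x, rowvar=False) / n1 + np.cov(y, rowvar=False) / n2
    tr1, tr2 = np.trace(S), np.trace(S @ S)
    beta = tr2 / tr1       # scale of the approximating chi-square
    df = tr1 ** 2 / tr2    # Welch-Satterthwaite degrees of freedom
    return stat, beta, df

rng = np.random.default_rng(0)
x = rng.standard_normal((30, 10))
y = rng.standard_normal((40, 10))
stat, beta, df = l2_two_sample(x, y)
```

When the covariance is near-singular, df drops well below the dimension, which is how the approximation adapts to the covariance structure; a p-value would follow from the chi-square survival function at stat / beta.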