Statistics Seminar at Georgia State University
Fall 2016-Spring 2017, Fridays 3:00-4:00pm, Paul Erdős Conference Room (796 COE)
Organizer: Yichuan Zhao
If you would like to give a talk in the Statistics Seminar, please send an email to
Yichuan Zhao at
2:00-3:00pm, April 14, 2017, 150 COE
Haoda Fu, Ph. D.,
Eli Lilly and Company
Individualized Treatment Recommendation (ITR) for Survival Outcomes
ITR is a method to recommend treatment based on individual patient characteristics in order to maximize clinical benefit.
During the past few years, we have developed and published methods on this topic with various applications, including
comprehensive search algorithms, tree methods, a benefit-risk algorithm, and algorithms for multiple treatments and
multiple ordinal treatments. In this talk, we propose a new ITR method to handle survival outcomes for multiple
treatments. This new model enjoys the following practical and theoretical features:
- Instead of fitting the data, our method directly searches for the optimal treatment policy, which improves efficiency.
- To adjust for censoring, we propose a doubly robust estimator. Our method only requires that either the censoring
model or the survival model is correct, not necessarily both. When both are correct, it enjoys better efficiency.
- Our method handles multiple treatments with an intuitive geometric explanation.
- Our method is Fisher consistent even when either the censoring model or the survival model (but not both) is misspecified.
This method has potential applications in multiple therapeutic areas. One direct impact for the Diabetes business unit is
how we can leverage Lilly Diabetes' broad treatment options to reduce or delay diabetes comorbidities such as cardiovascular
events, diabetic retinopathy, nephropathy, and neuropathy.
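The censoring adjustment above rests on inverse-probability-of-censoring weighting (IPCW), one ingredient of the doubly robust construction. Below is a minimal sketch of that ingredient alone, assuming a *known* exponential censoring distribution; the talk's estimator instead models both censoring and survival from data and combines them, so everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
T = rng.exponential(scale=2.0, size=n)   # true survival times, E[T] = 2
C = rng.exponential(scale=5.0, size=n)   # censoring times, G(t) = P(C > t) = exp(-t/5)

X = np.minimum(T, C)                     # observed follow-up time
delta = (T <= C).astype(float)           # event indicator

# Naive mean of observed times is biased toward zero by censoring:
# it targets E[min(T, C)], not E[T].
naive_mean = X.mean()

# IPCW: weight each observed event by 1 / G(X), since E[delta * X / G(X)] = E[T].
G = np.exp(-X / 5.0)
ipcw_mean = np.mean(delta * X / G)
```

With the censoring distribution known, the weighted mean is unbiased for E[T] (here 2.0), while the naive mean targets E[min(T, C)] ≈ 1.43. The doubly robust estimator in the talk adds an augmentation term so that a correct survival model can rescue a misspecified censoring model, and vice versa.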
3:00-4:00pm, March 31, 2017, 796 COE
Yichen Cheng, Ph. D.,
Assistant Professor of Analytics,
Institute for Insight,
Robinson College of Business,
Georgia State University
Simulated Stochastic Approximation Annealing for Global
Optimization with a Square-Root Cooling Schedule
Simulated annealing has been widely used in the solution of optimization problems. As is well known, simulated annealing cannot be guaranteed to locate the global optima unless a logarithmic cooling schedule is used. However, the logarithmic cooling schedule is so slow that the required CPU time is prohibitive. This paper proposes a new stochastic optimization algorithm, the so-called simulated stochastic approximation annealing algorithm, which is a combination of simulated annealing and the stochastic approximation Monte Carlo algorithm. Under the framework of stochastic approximation, it is shown that the new algorithm can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic cooling schedule, e.g., a square-root cooling schedule, while still guaranteeing that the global optima are reached as the temperature tends to zero. The new algorithm has been tested on a few benchmark optimization problems, including feed-forward neural network training and protein folding. The numerical results indicate that the new algorithm can significantly outperform simulated annealing and other competitors.
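To make the cooling-schedule trade-off concrete, here is a minimal plain simulated-annealing sketch using a square-root schedule on a toy one-dimensional multimodal function. This illustrates only the schedule itself, not the stochastic-approximation machinery the paper adds to retain the convergence guarantee; the test function and tuning constants are hypothetical:

```python
import math
import random

def f(x):
    # Toy multimodal objective: global minimum near x = -2.06 (f = -22.1),
    # shallower local minimum near x = 1.87 (f = -10.1).
    return x**4 - 8 * x**2 + 3 * x

random.seed(0)
x = 2.0                 # start in the basin of the *local* minimum
best_x, best_f = x, f(x)
t0 = 10.0

for k in range(1, 20001):
    temp = t0 / math.sqrt(k)            # square-root cooling schedule
    cand = x + random.gauss(0.0, 1.0)   # Gaussian proposal
    delta = f(cand) - f(x)
    # Metropolis acceptance at the current temperature.
    if delta <= 0 or random.random() < math.exp(-delta / temp):
        x = cand
    if f(x) < best_f:
        best_x, best_f = x, f(x)
```

Plain simulated annealing with this fast schedule loses the logarithmic-schedule guarantee; the paper's contribution is to recover a convergence guarantee under square-root cooling via the stochastic approximation framework.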
10:00-11:00am, March 30, 2017, 124 Petit Science Center
Yi Li, Ph. D.
Professor of Global Public Health and Biostatistics,
School of Public Health,
University of Michigan
Molecular Basis of Disease
Distinguished Lecture and Mathematics & Statistics Colloquium: Big Biomedical Data Analytics: New Tools and Applications
In this talk, I will briefly introduce some big biomedical datasets (BBD) my group has been analyzing.
I will then talk about some statistical work that aims to model and analyze them. Time permitting,
I will specifically illustrate three new methods that our group has recently developed: (1) a Gateaux-differential
based boosting method (GdBoost) for modeling and variable selection in the presence of high-dimensional time-varying
effects; (2) a covariance-enhanced discriminant analysis (CEDA) tool for classification in the presence of
high-dimensional gene expression profiles; and (3) a computationally efficient modeling technique for evaluating national
dialysis facilities' performance in terms of 30-day readmission.
3:00-4:00pm, March 3, 2017, 796 COE
Department of Biostatistics & Bioinformatics,
Estimating Dynamic Brain Functional Networks Using Multi-subject fMRI Data
A prominent assumption in the study of brain functional connectivity is the stationarity of the brain network. However, it is increasingly recognized that the brain network is prone to variations across the scanning session, fueling the need for dynamic brain functional connectivity approaches. One of the main challenges in developing such approaches is that the frequency and change points of the brain organization are unknown and that the changes may occur frequently over the scanning session. In order to provide greater power to detect rapidly evolving connectivity changes, we propose a fully automated two-stage approach which pools information across multiple subjects in order to divide the scanning session into non-overlapping time intervals, such that each interval is characterized by a distinct brain network. The number and positioning of the time intervals are unknown and learned from the data in the first stage, by approximating the multivariate time series of correlations using a piecewise constant function under a fused lasso approach. In the second stage, the brain functional network for each time interval is inferred via sparse inverse covariance matrices. Numerical experiments show the effectiveness of the proposed method, and we also apply it to saccade block task fMRI data.
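The first stage above approximates a time series of correlations by a piecewise-constant function with data-driven break points. As a hedged single-series illustration of that idea, here is an exact penalized least-squares segmentation solved by dynamic programming; the talk's method instead uses a fused lasso and pools across subjects, and all constants here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic series: mean 0, then 3, then 0, with change points at 50 and 100.
y = np.concatenate([rng.normal(0, 0.5, 50),
                    rng.normal(3, 0.5, 50),
                    rng.normal(0, 0.5, 50)])
n = len(y)
beta = 10.0                      # per-segment penalty (hypothetical tuning)

s1 = np.concatenate([[0.0], np.cumsum(y)])
s2 = np.concatenate([[0.0], np.cumsum(y**2)])

def sse(i, j):
    # Sum of squared errors of y[i:j] around its own segment mean.
    m = (s1[j] - s1[i]) / (j - i)
    return (s2[j] - s2[i]) - (j - i) * m * m

# F[j] = optimal penalized cost of segmenting y[:j] into constant pieces.
F = np.full(n + 1, np.inf)
F[0] = -beta
last = np.zeros(n + 1, dtype=int)
for j in range(1, n + 1):
    for i in range(j):
        c = F[i] + beta + sse(i, j)
        if c < F[j]:
            F[j], last[j] = c, i

# Backtrack through the optimal partition to recover the change points.
cps, j = [], n
while j > 0:
    j = last[j]
    if j > 0:
        cps.append(j)
cps = sorted(cps)
```

The dynamic program is exact for this penalized criterion; the fused lasso in the talk replaces the hard per-segment penalty with a convex total-variation penalty, which scales better and admits the multi-subject pooling described above.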
2:00-3:00pm, February 3, 2017, 796 COE
Dr. Gang Li,
Janssen Research & Development, Johnson & Johnson
Inconsistency and drop-minimum data analysis
The multi-regional clinical trial (MRCT) has become the preferred strategy for drug development.
Consistency among regions, which means that the treatment effect is clinically meaningful
and relevant to all regions being studied, is the underlying assumption of an MRCT.
Even though consistency is an important issue in MRCTs and inconsistency is often anticipated,
solutions for handling inconsistency are rare. If a region's treatment effect is
inconsistent with those of the other regions, pooling all the regions to estimate the
overall treatment effect may not be reasonable. Unlike the multi-center clinical trials
conducted in the US and Europe, in an MRCT different regional regulatory agencies may have
their own ways to interpret data and approve new drugs. It is therefore practical to
consider the case in which the data from the region with the minimal observed treatment
effect are excluded from the analysis in order to attain regulatory approval of the
study drug. In such cases, what is the appropriate statistical approach for the
remaining regions? We provide a solution, first formulated in a fixed-effects framework and
then extended to a discrete random-effects model.
This is joint work with Fei Chen and K.K. Gordon Lan.
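The fixed-effects setting above can be made concrete with inverse-variance pooling. A minimal sketch with hypothetical regional estimates, showing why naively repooling after dropping the minimum region pushes the estimate upward; the talk's point is that this data-driven selection must be accounted for in the analysis:

```python
# Hypothetical regional treatment-effect estimates and their variances.
effects = [0.50, 0.60, 0.10, 0.55]   # region 3 looks inconsistent
variances = [0.04, 0.05, 0.04, 0.06]

def pooled(eff, var):
    # Fixed-effects (inverse-variance weighted) pooled estimate.
    w = [1.0 / v for v in var]
    return sum(wi * ei for wi, ei in zip(w, eff)) / sum(w)

full = pooled(effects, variances)

# Drop the region with the minimal observed effect, then naively repool.
i_min = effects.index(min(effects))
rest_eff = [e for k, e in enumerate(effects) if k != i_min]
rest_var = [v for k, v in enumerate(variances) if k != i_min]
drop_min = pooled(rest_eff, rest_var)
```

Because the dropped region is chosen *by its observed minimum*, the naive repooled estimate is optimistically biased even when all regions share the same true effect; the talk's solution adjusts the drop-minimum analysis for this selection.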
12:00-1:30pm, February 3, 2017, 796 COE
Dr. Gang Li,
Janssen Research & Development, Johnson & Johnson
Career Development Luncheon for Graduate Students
3:00-4:00pm, January 13, 2017, 796 COE
Dr. Yunxiao Chen, Assistant Professor, Department of Psychology,
A Fused Latent and Graphical Model for Multivariate Binary Data
We consider modeling, inference, and computation for analyzing multivariate binary data.
We propose a new model that consists of a low dimensional latent variable component and a
sparse graphical component. Our study is motivated by analysis of item response data in
cognitive assessment and has applications to many disciplines where item response data are
collected. Standard approaches to item response data in cognitive assessment adopt the multidimensional
item response theory (IRT) models. However, human cognition is typically a complicated process
and thus may not be adequately described by just a few factors. Consequently, a low-dimensional
latent factor model, such as a multidimensional IRT model, is often insufficient to capture
the structure of the data. The proposed model adds a sparse graphical component that captures
the remaining ad hoc dependence. It reduces to a multidimensional IRT model when the graphical
component becomes degenerate. Model selection and parameter estimation are carried out
simultaneously through construction of a pseudo-likelihood function and properly
chosen penalty terms. The convexity of the pseudo-likelihood function allows us to
develop an efficient algorithm, while the penalty terms generate a low-dimensional
latent component and a sparse graphical structure. Desirable theoretical properties
are established under suitable regularity conditions. The method is applied to the
revised Eysenck Personality Questionnaire,
revealing its usefulness in item analysis. Simulation results are reported that show
the new method works well in practical situations.
This is joint work with Xiaoou Li, Jingchen Liu, Zhiliang Ying.
3:00-4:00pm, December 2, 2016, 796 COE
Dr. Yiyuan She, Associate Professor,
Department of Statistics,
Florida State University
Indirect Gaussian Graph Learning beyond Gaussianity
This paper studies how to capture dependency graph structures from
real data which may not be multivariate Gaussian. Starting from
marginal loss functions not necessarily derived from probability
distributions, we use an additive over-parametrization with shrinkage
to incorporate variable dependencies into the criterion. An
iterative Gaussian graph learning algorithm that is easy to implement
is proposed. Statistical analysis shows that, with the error measured
in terms of a proper Bregman divergence, the estimators enjoy fast rates
of convergence. Real-life examples in different settings are given to
demonstrate the efficacy of the proposed methodology. This is joint work
with Shao Tang and Qiaoya Zhang.
10:00-11:00am, November 17, 2016, Petit Science Center 124
Dr. Heping Zhang, Susan Dwight Bliss Professor of Biostatistics,
Department of Biostatistics,
Yale School of Public Health, Yale University
Distinguished Lecture: Statistical Strategies in Analyzing Data with Unequal Prior Knowledge
The advent of technologies including high-throughput genotyping and computer information
technologies has produced ever-larger and more diverse databases that are potentially information rich.
This creates the need to develop statistical strategies that have a sound mathematical foundation
and are computationally feasible and reliable. In statistics, we commonly deal with relationships
between variables using correlation and regression models. With diverse databases, the quality of
the variables may vary, and we may know more about some variables than others. I will present
some ideas on how to conduct statistical inference with unequal prior knowledge. Specifically, how
do we define correlation between two sets of random variables conditional on a third set of random
variables, and how do we select predictors when we have information from sources other than the databases
with raw data? I will address some mathematical and computational challenges in order to answer these
questions. Analysis of real genomic data will be presented to support the proposed methods.
This is a joint work with Xueqin Wang, Yuan Jiang, and Yunxiao He.
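As a baseline for the conditional-correlation question, here is the classical partial correlation of two variables given a third, computed by residualizing each variable on the conditioning variable. The talk concerns the harder problem of correlation between two *sets* of variables given a third set, under unequal prior knowledge, so this is only a hedged single-variable sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
z = rng.normal(size=n)
x = z + rng.normal(size=n)   # X and Y are linked only through Z
y = z + rng.normal(size=n)

marginal = np.corrcoef(x, y)[0, 1]   # population value 0.5 here

# Partial correlation: regress out Z from each variable, correlate residuals.
def residualize(v, z):
    Z = np.column_stack([np.ones(len(z)), z])
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ beta

partial = np.corrcoef(residualize(x, z), residualize(y, z))[0, 1]
```

Here the marginal correlation is sizable while the partial correlation is near zero, since the dependence runs entirely through Z; defining and estimating the analogous quantity between whole sets of variables, with some variables better understood than others, is the subject of the lecture.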
2:00-3:00pm, November 4, 2016, 796 COE
Dr. Zhezhen Jin,
Professor of Biostatistics,
Department of Biostatistics,
Mailman School of Public Health, Columbia University
Statistical issues and challenges in biomedical studies
In this talk, I will present statistical issues and challenges that I have encountered
in my biomedical collaborative studies of item selection in disease screening,
comparison and identification of biomarkers that are more informative for disease diagnosis, and
estimation of weights on the relative importance of exposure variables for health outcomes.
After a discussion on the issues and challenges with real examples, I will review available
statistical methods and present our newly developed methods.
3:00-4:00pm, October 28, 2016, 796 COE
Dr. Guanghui (George) Lan,
H. Milton Stewart School of Industrial and Systems Engineering,
Georgia Institute of Technology
Optimal Stochastic Gradient Descent
The last few years have seen an increasing interest in the development and application of
stochastic gradient descent (SGD) methods for large-scale optimization and data analysis.
In this talk, we will first provide a brief introduction to our earlier work on optimal SGD methods,
namely the mirror-descent stochastic approximation (MD-SA) and accelerated stochastic approximation
(AC-SA) methods for minimizing a general expectation function, and show how they evolved into popular
machine learning algorithms. We then consider a special class of stochastic optimization problems whose objective function is given by the summation of finitely many terms, and present a new SGD-type method, i.e., the randomized primal-dual gradient (RPDG) method, for solving these problems in an optimal manner. We provide a unified and intuitive game interpretation for Nesterov's accelerated gradient method, the more general AC-SA method, and the aforementioned RPDG method.
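As background, a minimal plain-SGD sketch minimizing an expected least-squares loss with single-sample gradients; the model, step size, and data are hypothetical, and the MD-SA, AC-SA, and RPDG methods in the talk refine exactly this template with mirror maps, acceleration, and randomized primal-dual updates:

```python
import numpy as np

rng = np.random.default_rng(3)
w_true = np.array([1.0, -2.0])

w = np.zeros(2)
lr = 0.01
for _ in range(20000):
    # Draw one fresh sample: a stochastic oracle for the expectation.
    x = rng.normal(size=2)
    y = x @ w_true + rng.normal(scale=0.1)
    grad = 2.0 * (x @ w - y) * x   # gradient of (x @ w - y)**2 at w
    w -= lr * grad
```

Each iteration touches one sample, so the per-step cost is independent of the dataset size; the optimal methods discussed in the talk achieve the best possible dependence of the error on the iteration count for this oracle model.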
3:00-4:00pm, October 14, 2016, 796 COE
Dr. Ruixuan Liu,
Department of Economics,
A single-index Cox model driven by Lévy subordinators
I propose a new duration model where the survival time is defined as the first time
a Lévy subordinator crosses an exponential threshold, with covariates acting multiplicatively
on the latent process. The specification is a natural variant of the mixed proportional hazards model
from a stochastic process point of view. When the covariates' effect is parameterized as a linear index, the model reduces to a single-index Cox model. The large sample property of a
sieve maximum partial likelihood estimator is presented.
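The first-passage construction can be illustrated with a gamma subordinator, where the baseline survival function is available in closed form: if Λ(t) ~ Gamma(shape ct, rate b) and the threshold E ~ Exp(1) is independent, then P(T > t) = E[e^{-Λ(t)}] = (b/(b+1))^{ct}. A hedged simulation sketch on a time grid, without covariates (all constants hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
c, b = 2.0, 1.0                  # subordinator parameters (hypothetical)
dt, horizon, n_paths = 0.01, 1.0, 10000
n_steps = int(horizon / dt)

# Gamma subordinator increments: Lambda(t) ~ Gamma(shape = c*t, rate = b).
inc = rng.gamma(shape=c * dt, scale=1.0 / b, size=(n_paths, n_steps))
Lam = np.cumsum(inc, axis=1)

E = rng.exponential(size=(n_paths, 1))   # exponential threshold per subject

# T > t iff Lambda(t) <= E, since the subordinator is nondecreasing.
crossed = Lam > E
k = int(1.0 / dt)
emp_surv_1 = 1.0 - crossed[:, k - 1].mean()          # empirical P(T > 1)
theory_surv_1 = (b / (b + 1.0)) ** (c * 1.0)         # closed form = 0.25
```

In the talk, covariates act multiplicatively on the latent process, and parameterizing their effect as a linear index reduces the model to a single-index Cox form.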
2:00-3:00pm, October 7, 2016, 796 COE
Professor Ming-Yen Cheng,
Department of Mathematics, National Taiwan University,
A simple and adaptive two-sample test in high dimensions
High-dimensional data are commonly encountered nowadays. Testing the equality of two means
is a fundamental problem in inference, but the conventional Hotelling's $T^2$ test performs
poorly or becomes inapplicable in high dimensions. Several modifications have been proposed to
address this challenging issue and have been shown to perform well. However, they all use a normal approximation
to the null distributions of their test statistics, and thus they all require strong regularity conditions.
We study this issue thoroughly and propose an $L^2$-norm based test that works under milder conditions
and even when there are fewer observations than the dimension. In particular, to cope with possible
non-normality of the null distribution, we employ the Welch-Satterthwaite $\chi^2$-approximation.
Simple ratio-consistent estimators for the parameters in the approximating distribution are given.
Unlike existing tests, our test is adaptive to singularity or near singularity of the unknown
covariance structure, which is commonly seen in high dimensions and has a great impact on the shape of the null
distribution. The approximate and asymptotic powers of the proposed test are also investigated. Simulation
studies and real data applications show that our test has better size control than a benchmark test,
while the powers are comparable when their sizes are comparable.
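The $\chi^2$-approximation above works by moment matching: with $S = \Sigma_1/n_1 + \Sigma_2/n_2$, a quadratic-form statistic with mean $\mathrm{tr}(S)$ and variance $2\,\mathrm{tr}(S^2)$ is approximated by $a\chi^2_d$ with $a = \mathrm{tr}(S^2)/\mathrm{tr}(S)$ and $d = \mathrm{tr}(S)^2/\mathrm{tr}(S^2)$. A small sketch (with hypothetical covariance structures) showing how near singularity drives the approximating degrees of freedom toward 1, so the null is far from normal:

```python
import numpy as np

def ws_df(S):
    # Welch-Satterthwaite degrees of freedom matching the first two
    # moments of a quadratic form: E = tr(S), Var = 2*tr(S^2).
    t1, t2 = np.trace(S), np.trace(S @ S)
    return t1 ** 2 / t2

p = 100
S_id = np.eye(p)                                  # well-conditioned covariance
S_spiked = np.diag([100.0] + [0.01] * (p - 1))    # near-singular covariance

d_id = ws_df(S_id)          # equals p: null is close to normal
d_spiked = ws_df(S_spiked)  # close to 1: null is strongly skewed
```

A test calibrated by $a\chi^2_d$ with ratio-consistent plug-in estimates of $\mathrm{tr}(S)$ and $\mathrm{tr}(S^2)$ therefore adapts to the shape of the null distribution, which a fixed normal approximation cannot do.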