Statistics Seminar at Georgia State University
Fall 2017-Spring 2018, Fridays 3:00-4:00pm, Seminar room (1441), 25 Park Place
Organizer: Yichuan Zhao
If you would like to give a talk in the Statistics Seminar, please send an email to
Yichuan Zhao at
2:00-3:00pm, April 13, 2018, 1441, 25 Park Place
Peng Zeng, Ph.D.,
Associate Professor of Statistics,
Department of Mathematics and Statistics,
Degrees of Freedom in Regression Models
The number of degrees of freedom is a measure of the effective number of parameters used to
fit a regression model. It has been used in various information criteria for model selection.
For the least squares fit of a linear regression model, the number of degrees of freedom is
exactly the number of parameters. Geometrically, it is also the dimension of the column space of
the design matrix. Much recent research has tried to extend these results beyond least
squares fits. In this talk, we will review some existing results and techniques on degrees of
freedom, which were developed mainly within the frameworks of Stein (1981), Ye (1998), and Efron (2004).
Additionally, we will discuss some recent developments on degrees of freedom in linear
regression with linear constraints.
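The least squares facts above can be checked numerically. So can the covariance form of degrees of freedom associated with Efron (2004). The Python sketch below is illustrative only. It assumes Gaussian errors with known sigma, and the function names (hat_matrix_df, covariance_df, least_squares_fit) are hypothetical, not taken from the talk. It computes two quantities, both of which should be close to the number of columns of X:
- the trace of the hat matrix;
- a Monte Carlo estimate of sum_i Cov(yhat_i, y_i) / sigma^2.

```python
import numpy as np


def hat_matrix_df(X):
    """Least squares degrees of freedom: trace of the hat matrix H = X (X'X)^{-1} X'."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return float(np.trace(H))


def least_squares_fit(X, y):
    """Fitted values of an ordinary least squares regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def covariance_df(fit, X, mu, sigma, n_reps=2000, seed=0):
    """Monte Carlo estimate of df = sum_i Cov(yhat_i, y_i) / sigma^2 for y ~ N(mu, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Y = mu + sigma * rng.standard_normal((n_reps, n))  # each row is one simulated response vector
    Yhat = np.array([fit(X, y) for y in Y])
    # Sample covariance between yhat_i and y_i for every coordinate i.
    cov = ((Yhat - Yhat.mean(axis=0)) * (Y - Y.mean(axis=0))).sum(axis=0) / (n_reps - 1)
    return float(cov.sum() / sigma**2)


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
mu = X @ np.array([1.0, -2.0, 0.5, 0.0])
print(hat_matrix_df(X))  # 4 up to rounding: the number of columns of X
print(covariance_df(least_squares_fit, X, mu, sigma=1.0))  # close to 4
```

covariance_df takes the fitting procedure as an argument. The same estimate can therefore, in principle, be applied to fits other than least squares, which is the setting the talk is concerned with.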
2:00-3:00pm, April 06, 2018, 1441, 25 Park Place
Regina Liu, PhD,
Distinguished Professor of Statistics,
iFusion: Individualized Fusion Learning
Inferences from different data sources can often be fused together to yield more powerful
findings than those from individual sources alone. We present a new fusion learning approach,
called ‘iFusion’, for drawing efficient individualized inference by fusing the learnings from relevant data sources. iFusion robustly handles heterogeneity arising from diverse data sources and is ideally suited for goal-directed applications such as precision medicine. Specifically, iFusion summarizes individual inferences as confidence distributions (CDs), adaptively forms a clique of individuals that bear relevance to the target individual, and then suitably combines the CDs from those relevant individuals in the clique to draw inference for the target individual. In essence, iFusion strategically ‘borrows strength’ from relevant individuals to improve efficiency while retaining the validity of its inference. Computationally, iFusion is parallel in nature and scales up well in comparison with competing approaches. The performance of the approach is demonstrated by simulations and real data applications.
This is joint work with Jieli Shen (Goldman Sachs) and Minge Xie (Rutgers University).
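To make the combining step concrete, here is a minimal Python sketch of fusing confidence distributions. It assumes each individual's CD is normal and uses the normal-inverse combination rule from the CD literature. The clique rule is a hypothetical placeholder, not the clique construction used by iFusion. It keeps individuals whose estimates lie within c joint standard errors of the target's. All numbers are made up for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_cd(estimate, se):
    """CD H(theta) = Phi((theta - estimate) / se) for an approximately normal estimator."""
    return lambda theta: STD_NORMAL.cdf((theta - estimate) / se)


def combine_cds(cds, weights):
    """Normal-inverse combination: Phi(sum_i w_i Phi^{-1}(H_i(theta)) / sqrt(sum_i w_i^2))."""
    scale = math.sqrt(sum(w * w for w in weights))

    def combined(theta):
        z = 0.0
        for cd, w in zip(cds, weights):
            p = min(max(cd(theta), 1e-15), 1.0 - 1e-15)  # keep inv_cdf finite in the far tails
            z += w * STD_NORMAL.inv_cdf(p)
        return STD_NORMAL.cdf(z / scale)

    return combined


def clique(estimates, ses, target, c=2.0):
    """Hypothetical screening rule: keep j if its estimate is within c joint standard errors of the target's."""
    t_est, t_se = estimates[target], ses[target]
    return [j for j in range(len(estimates))
            if abs(estimates[j] - t_est) <= c * math.hypot(ses[j], t_se)]


estimates = [1.0, 1.2, 0.9, 5.0]  # per-individual estimates (illustrative numbers)
ses = [0.5, 0.4, 0.6, 0.5]  # their standard errors
members = clique(estimates, ses, target=0)  # individual 3 is screened out
combined = combine_cds([normal_cd(estimates[j], ses[j]) for j in members],
                       [1.0 / ses[j] for j in members])
print(members, combined(1.0))
```

With weights w_i = 1/se_i, the combined CD is the normal CD centered at the inverse-variance weighted mean of the clique members. Screening out individual 3 therefore keeps its very different estimate from pulling the target's inference.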
2:00-3:00pm, February 23, 2018, 1441, 25 Park Place
Frazier Bindele, PhD,
Assistant Professor of Statistics,
University of South Alabama,
Rank-based estimating equation with non-ignorable missing responses via empirical likelihood
In this talk, we consider a general regression model with responses missing not at random.
From a rank-based estimating equation, a rank-based estimator of the regression parameter is
derived. Based on the estimator's asymptotic normality, a consistent sandwich estimator
of its asymptotic covariance matrix is obtained. To overcome the over-coverage issue of
the normal approximation procedure, an empirical likelihood based on the rank-based gradient function is
defined, and its asymptotic distribution is established. Extensive simulation experiments under different error
distributions and response probabilities show that
the proposed empirical likelihood approach performs better, in terms of coverage probability and average length of
confidence intervals for the regression parameters, than the normal approximation approach and its least-squares
counterpart. A data example is provided to illustrate the proposed methods.
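For readers new to rank-based regression, the sketch below shows only the complete-data building block. It fits a simple linear regression with Wilcoxon scores by minimizing Jaeckel's rank dispersion. It assumes one covariate and no missing responses. It does not implement the nonignorable-missingness adjustment, the sandwich variance estimator, or the empirical likelihood described in the abstract. The function names are hypothetical.

```python
import numpy as np


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half of the total weight."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return float(v[np.searchsorted(cum, 0.5 * cum[-1])])


def wilcoxon_rank_fit(x, y):
    """Rank-based (Wilcoxon-score) fit of y = a + b x by minimizing Jaeckel's dispersion.

    With Wilcoxon scores the dispersion is proportional to sum_{i<j} |e_i - e_j|
    = sum_{i<j} |x_j - x_i| * |s_ij - b|, where s_ij is the pairwise slope, so the
    minimizing slope is a weighted median of the pairwise slopes.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0  # pairs with equal x carry zero weight
    slopes = (y[j] - y[i])[keep] / dx[keep]
    b = weighted_median(slopes, np.abs(dx[keep]))
    # The dispersion ignores the intercept; the median of the residuals is one common choice.
    a = float(np.median(y - b * x))
    return a, b


x = np.arange(20.0)
y = 1.0 + 2.0 * x
y[3] += 50.0  # two gross outliers
y[15] -= 40.0
a, b = wilcoxon_rank_fit(x, y)
print(a, b)
```

The two gross outliers do not move the fitted line. That robustness is what motivates rank-based estimating equations in the first place.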
3:00-4:00pm, February 16, 2018, 1441, 25 Park Place
Daniel Pimentel-Alarcón, Ph. D.,
Assistant Professor of Computer Science, Georgia State University,
Sexy (Nonlinear) Matrix Completion
Low-rank matrix completion (LRMC) has received tremendous attention in recent years. The problem is
tantamount to identifying a linear variety (subspace) that explains an incomplete data matrix.
The reason for the great success of LRMC is that data often lies near a linear variety. Now think about this:
if "lines" are good models for explaining data, "polynomials" can only be better!
In this talk I will discuss the challenges of inferring algebraic varieties (polynomial structures) from
incomplete data. I will explain the main ideas behind our theory and algorithms. These include precise deterministic
sampling conditions (specifying which entries we need to observe), and computationally efficient algorithms guaranteed
to correctly complete matrices even in cases where traditional LRMC is guaranteed to fail. I'll present some experimental
results showing that the new approach significantly outperforms existing state-of-the-art methods in many situations,
including standard LRMC and the popular Subspace Clustering problem.
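As a point of reference for the linear case that the talk generalizes, the following Python sketch completes a low-rank matrix by a hard-impute style iteration. It repeatedly projects onto rank-r matrices and then restores the observed entries. This baseline for standard LRMC is chosen here as an assumption. It is not the algebraic-variety method from the talk, and the sizes, sampling rate, and iteration count are arbitrary.

```python
import numpy as np


def low_rank_complete(M_obs, mask, rank, n_iter=1000):
    """Estimate a rank-r matrix from the entries where mask is True."""
    X = np.where(mask, M_obs, 0.0)  # initialize missing entries at zero
    L = X
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r approximation (Eckart-Young)
        X = np.where(mask, M_obs, L)  # keep observed entries, update only the missing ones
    return L


rng = np.random.default_rng(0)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))  # rank-2 ground truth
mask = rng.random(M.shape) < 0.5  # observe roughly half of the entries
M_hat = low_rank_complete(M * mask, mask, rank=2)
rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
print(rel_err)
```

This iteration can only represent data lying near a linear subspace. That limitation is exactly what the polynomial (algebraic variety) models in the talk are meant to remove.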
3:00-4:00pm, February 9, 2018, 1441, 25 Park Place
Stephen Shore, Ph. D.,
R. Means Davis Professor of Risk Management and Insurance,
Robinson College of Business, Georgia State University,
Responses to Saving Commitments: Evidence from Mortgage Run-offs
We study how individuals respond to the removal of a saving constraint. Mortgage run-offs predictably relax
a saving constraint for borrowers who chose mortgage contracts that committed them to effectively save by paying
down mortgage principal. Using data on the universe of the Danish population, we identify individuals whose mortgages were
on track to run off between 1995 and 2014. We use mortgage run-offs to understand the importance of relaxing a saving
constraint for wealth, leisure, consumption, saving, and investment decisions, as well as the mechanisms individuals
use to circumvent the saving constraint. We find that on average, borrowers use 39 percent of the
resources previously devoted to mortgage payments to decrease labor income, and use 53 percent to pay down
other debts. The labor supply response is limited to those without substantial assets or debts prior to the run-off,
while the debt reduction response is limited to (and one-for-one among) those without substantial assets but with
other debt prior to the run-off. We find no statistically significant effects on wealth accumulation
in bank deposits, stocks, or bonds.
3:00-4:00pm, January 26, 2018, 1441, 25 Park Place
Yubao (Robert) Wu, Ph. D.,
Assistant Professor of Computer Science, Georgia State University,
Tensor Decompositions and Applications
Matrices have been widely used to model real-world data such as subject-feature matrices
and the adjacency matrices of graphs. Matrix low-rank decompositions give good sketches of
the original data. They have been well studied and provide elegant mathematical
foundations for many classic machine learning and data mining methods. Singular value
decomposition (SVD) and nonnegative matrix factorization (NMF) are two widely used examples.
In many real-world applications, the data have more than two dimensions, so matrices cannot
model them directly, and higher-order arrays (tensors) are used instead. Tensor analysis is
generally harder than matrix analysis. Even though tensors have been studied for many years, there is
still no unique definition of eigenvalues for tensors. Similar to matrix low-rank decompositions,
tensor low-rank decompositions have also been studied and applied in various real-world applications.
In this talk, we will give a tutorial on matrix and tensor decompositions and their
real-world applications. We will also discuss new ideas about tensor decompositions and try to
establish collaborations with researchers from the math department.
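The talk does not say which tensor decompositions it covers. As one illustration, the Python sketch below computes a truncated higher-order SVD (a Tucker decomposition) of a 3-way tensor with NumPy. It assumes a tensor of known multilinear rank, and the helper names are hypothetical. Each factor matrix comes from an ordinary matrix SVD of an unfolding, which shows how matrix tools carry over to tensors.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: the mode-n fibers of T become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def hosvd(T, ranks):
    """Truncated higher-order SVD of a 3-way tensor (a Tucker decomposition)."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])  # leading left singular vectors of the mode-n unfolding
    # Core tensor G = T x_1 U1' x_2 U2' x_3 U3'
    core = np.einsum("ijk,ia,jb,kc->abc", T, *factors)
    return core, factors


def tucker_to_tensor(core, factors):
    """Reassemble T = G x_1 U1 x_2 U2 x_3 U3."""
    return np.einsum("abc,ia,jb,kc->ijk", core, *factors)


rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))
A, B, C = rng.standard_normal((6, 2)), rng.standard_normal((5, 3)), rng.standard_normal((4, 2))
T = tucker_to_tensor(G, [A, B, C])  # 6 x 5 x 4 tensor with multilinear rank (2, 3, 2)
core, factors = hosvd(T, ranks=(2, 3, 2))
print(np.allclose(tucker_to_tensor(core, factors), T))  # True for this exactly low-rank tensor
```

For tensors that are only approximately low rank, truncated HOSVD gives a good but generally not the best approximation of the chosen multilinear rank. CP decompositions, another common choice, are usually fitted iteratively instead.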
2:00-3:00pm, January 26, 2018, 1441, 25 Park Place
Dr. John Rosen,
Adjunct Associate Professor of Math. and Stat., Georgia State University,
Math, Data, and Science: Statistical Myths
The talk will explore the relations among mathematical statistics, actual data, and the scientific fields that
collect data. The talk should be accessible to all of the department's graduate students. The most complicated
mathematical concept will be the definition of a random variable (I will summarize it).
The talk will present my ideas (apparently they're new) about the relations among math, data, and science.
The bottom line: The usual books have it wrong.
2:00-3:00pm, November 17, 2017, 1441, 25 Park Place
Professor Robert Lund,
NSF and Department of Mathematical Sciences, Clemson University,
Distinguished Lecture: Bayesian Multiple Breakpoint Detection: Mixing Documented and Undocumented Changepoints
This talk presents methods to estimate the number of changepoints
and their locations in time-ordered data sequences when prior information is
known about some of the changepoint times. A Bayesian version of a penalized
likelihood objective function is developed from minimum description length (MDL)
information theory principles. Optimizing the objective function yields estimates of the number
of changepoints and their locations. Our MDL penalty depends on where the changepoints lie,
not solely on their total number (as classical AIC and BIC penalties do).
Specifically, configurations with changepoints that occur relatively close to one another
are penalized more heavily than sparsely arranged changepoints. The techniques allow for
autocorrelation in the observations and mean shifts at each changepoint time. This scenario
arises in climate time series where a "metadata" record exists documenting some, but not necessarily
all, of the station move times and instrumentation changes. Applications to climate time series are presented throughout.
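The idea of optimizing a penalized likelihood over changepoint configurations can be sketched with exact dynamic programming over segmentations. In the Python sketch below, the penalty is a hypothetical stand-in: a constant beta per segment plus gamma / (segment length). This makes closely spaced changepoints more expensive, in the spirit of the abstract. It is not the MDL penalty from the talk. It also assumes independent Gaussian errors with known sigma, no autocorrelation, and no metadata prior.

```python
import numpy as np


def penalized_changepoints(y, sigma, beta, gamma):
    """Exact minimization of a penalized Gaussian mean-shift cost by dynamic programming.

    Cost of a segment = residual sum of squares / sigma^2 + beta + gamma / (segment length).
    The gamma term is a hypothetical surcharge on short segments, so closely spaced
    changepoints cost more; it is NOT the MDL penalty from the talk.
    """
    y = np.asarray(y, float)
    n = len(y)
    csum = np.concatenate([[0.0], np.cumsum(y)])
    csum2 = np.concatenate([[0.0], np.cumsum(y**2)])
    best = np.full(n + 1, np.inf)  # best[t] = optimal cost of y[:t]
    best[0] = 0.0
    last = np.zeros(n + 1, dtype=int)  # last[t] = start of the final segment in that optimum
    for t in range(1, n + 1):
        for s in range(t):
            m = t - s
            total = csum[t] - csum[s]
            rss = (csum2[t] - csum2[s]) - total**2 / m
            cost = best[s] + rss / sigma**2 + beta + gamma / m
            if cost < best[t]:
                best[t], last[t] = cost, s
    changepoints, t = [], n
    while t > 0:  # backtrack through the optimal segmentation
        t = int(last[t])
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)


rng = np.random.default_rng(0)
means = np.concatenate([np.zeros(20), np.full(20, 5.0), np.zeros(20)])
y = means + 0.5 * rng.standard_normal(60)  # true mean shifts at indices 20 and 40
print(penalized_changepoints(y, sigma=0.5, beta=4 * np.log(60), gamma=5.0))
```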
2:00-3:00pm, November 10, 2017, 1441, 25 Park Place
Dr. Dungang Liu, Assistant Professor,
Carl H. Lindner College of Business, University of Cincinnati
3:00-4:00pm, November 3, 2017, 1441, 25 Park Place
Dr. Roshan Vengazhiyil, Professor,
Stewart School of Industrial & Systems Engineering,
Georgia Institute of Technology
Deterministic Sampling: An Alternative to Monte Carlo
Monte Carlo techniques are widely used in all fields of science and engineering. However, when the system under study is
expensive or time-consuming to evaluate, carefully chosen deterministic samples can give faster results. In this talk, I will
focus on the applications of deterministic sampling in Bayesian computational problems and uncertainty quantification. I will
discuss two techniques, known as minimum energy designs and support points,
and explain how they can be used for sampling from arbitrary probability distributions with few function evaluations.
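Deterministic sampling can be illustrated with points that minimize the energy distance to a large sample from the target distribution, in the spirit of support points. The Python sketch below uses the fixed-point update obtained by setting the gradient of the energy distance to zero. It is a simplified illustration written for this page, not the speaker's implementation. The sample size, number of points, and iteration count are assumptions, and minimum energy designs are not shown.

```python
import numpy as np


def energy_criterion(x, y):
    """Energy distance between point set x and sample y, up to an additive constant (the y-y term)."""
    d_xy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1).mean()
    d_xx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1).mean()
    return 2.0 * d_xy - d_xx


def energy_points(y, n_points, n_iter=300, seed=0):
    """Fixed-point iteration from setting the gradient of the energy distance to zero."""
    rng = np.random.default_rng(seed)
    N, dim = y.shape
    # Start from a jittered random subsample so no point coincides with a sample point.
    x = y[rng.choice(N, n_points, replace=False)] + 1e-3 * rng.standard_normal((n_points, dim))
    for _ in range(n_iter):
        d_xy = np.maximum(np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1), 1e-12)
        diff_xx = x[:, None, :] - x[None, :, :]
        d_xx = np.linalg.norm(diff_xx, axis=-1)
        np.fill_diagonal(d_xx, np.inf)  # drop the j == i term
        attraction = (y[None, :, :] / d_xy[:, :, None]).sum(axis=1)
        repulsion = (diff_xx / d_xx[:, :, None]).sum(axis=1)
        x = (attraction + (N / n_points) * repulsion) / (1.0 / d_xy).sum(axis=1, keepdims=True)
    return x


rng = np.random.default_rng(1)
y = rng.standard_normal((2000, 1))  # large Monte Carlo sample from N(0, 1)
pts = energy_points(y, 20)
print(np.sort(pts.ravel()))
```

The 20 optimized points should represent the standard normal sample more closely, in energy distance, than 20 randomly chosen draws.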
3:00-4:00pm, October 13, 2017, 1441, 25 Park Place
Dr. Mayya Zhilova,
School of Mathematics,
Georgia Institute of Technology
Higher-order Berry-Esseen inequalities and accuracy of the bootstrap
In this talk, we study the higher-order accuracy of a bootstrap procedure for approximating the distribution
of a smooth function of a sample average in a high-dimensional, non-asymptotic framework. Our approach
is based on Berry-Esseen-type inequalities which extend the classical normal approximation bounds.
These results justify, in a non-asymptotic setting, that bootstrapping can outperform Gaussian (or chi-squared)
approximation in accuracy with respect to both dimension
and sample size. In addition, the presented results lead to improved
accuracy of a weighted bootstrap procedure for general log-likelihood ratio
statistics (under certain regularity conditions). The theoretical results are
illustrated with numerical experiments on simulated data.
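A weighted bootstrap for a smooth function of a sample average can be sketched in a few lines. The Python example below assumes i.i.d. Exp(1) multiplier weights (mean 1, variance 1) and uses the Euclidean norm of the mean as the smooth function. Both choices, and the function name weighted_bootstrap, are illustrative assumptions. The example shows the procedure being approximated, not the Berry-Esseen bounds from the talk.

```python
import numpy as np


def weighted_bootstrap(X, f, n_boot=2000, seed=0):
    """Draws of f(weighted mean) - f(sample mean) using i.i.d. Exp(1) multiplier weights."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float).reshape(len(X), -1)  # n observations as rows
    n = X.shape[0]
    xbar = X.mean(axis=0)
    W = rng.exponential(1.0, size=(n_boot, n))  # weights with mean 1 and variance 1
    boot_means = xbar + (W - 1.0) @ (X - xbar) / n  # one perturbed average per row
    return np.array([f(m) for m in boot_means]) - f(xbar)


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))  # n = 200 observations in dimension 10
draws = weighted_bootstrap(X, np.linalg.norm)
q95 = np.quantile(np.abs(draws), 0.95)
print(q95)
```

Here q95 is a bootstrap critical value for |f(sample mean) - f(true mean)|. The talk's results concern how accurately such quantiles track the true distribution as dimension and sample size grow.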
2:00-3:00pm, September 29, 2017, 1441, 25 Park Place
Dr. Ray Bai,
Department of Statistics,
University of Georgia
An invitation to the long memory phenomenon
We provide an introduction to the long memory phenomenon in time series analysis.
Some recent research progress on the topic will also be discussed.
The talk should be accessible to a general
audience including graduate students.
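A standard textbook example of long memory is the fractionally integrated ARFIMA(0, d, 0) process, whose autocorrelations decay hyperbolically rather than geometrically. The Python sketch below uses only the standard library and is not taken from the talk. It compares the ARFIMA autocorrelation function with that of an AR(1) process matched at lag 1.

```python
import math


def arfima_acf(d, max_lag):
    """Autocorrelations of ARFIMA(0, d, 0), -1/2 < d < 1/2, via rho(k) = rho(k-1) * (k-1+d) / (k-d)."""
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho


def ar1_acf(phi, max_lag):
    """Autocorrelations of a stationary AR(1): rho(k) = phi**k (geometric decay)."""
    return [phi**k for k in range(max_lag + 1)]


d = 0.3
long_mem = arfima_acf(d, 10_000)
short_mem = ar1_acf(long_mem[1], 10_000)  # AR(1) matched at lag 1: rho(1) = d / (1 - d)
for k in (1, 10, 100, 1000):
    print(k, long_mem[k], short_mem[k])
# rho(k) * k**(1 - 2d) approaches Gamma(1 - d) / Gamma(d) as k grows.
print(long_mem[10_000] * 10_000 ** (1 - 2 * d), math.gamma(1 - d) / math.gamma(d))
```

For 0 < d < 1/2 the autocorrelations are not summable, which is one common definition of long memory.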
2:00-3:00pm, September 8, 2017, 1441, 25 Park Place
Professor Hao Zhang,
Department of Statistics, Purdue University
The Big Data Issues in Spatial Statistics
One area where big data are collected is environmental and climate studies. Global Circulation Models
or Regional Circulation Models can generate huge amounts of data in space and time. Data collected through remote sensing
or sensor networks are also huge. All these data are correlated spatially and temporally. One therefore has to deal with
a huge covariance matrix in traditional likelihood-based or Bayesian inference. When the dimension is very
large, inverting the matrix becomes computationally infeasible, and it is also numerically unstable because the matrix is ill-conditioned.
The issue becomes more complex when multivariate data are observed whose components can have different spatial scales. In this talk,
I will discuss some recent developments in the theory and methods for dealing with the big spatial data, and
in particular highlight some numerical
algorithms that are applicable to large multivariate spatial data that are possibly of different scales.
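One widely used device for large spatial covariance matrices is covariance tapering, combined with Cholesky factorization rather than explicit inversion. The Python sketch below is an illustration chosen here, not necessarily a method from the talk. It assumes an exponential covariance, a Wendland taper, and arbitrary parameter values, and the function names are hypothetical. Because it uses dense NumPy routines, it shows the sparsity pattern and the likelihood computation but not the speed-up, which would require a sparse Cholesky solver.

```python
import numpy as np


def exponential_cov(coords, sill, range_):
    """Exponential covariance C(h) = sill * exp(-h / range_) and the distance matrix h."""
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sill * np.exp(-h / range_), h


def wendland_taper(h, theta):
    """Compactly supported Wendland taper (1 - h/theta)^4 (1 + 4h/theta) for h < theta, else 0."""
    r = h / theta
    return np.where(r < 1.0, (1.0 - r) ** 4 * (1.0 + 4.0 * r), 0.0)


def gaussian_loglik(y, cov):
    """Zero-mean Gaussian log-likelihood via a Cholesky factor instead of an explicit inverse."""
    L = np.linalg.cholesky(cov)  # cov = L L'
    z = np.linalg.solve(L, y)  # z = L^{-1} y, so z'z = y' cov^{-1} y
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2.0 * np.pi)


rng = np.random.default_rng(0)
coords = rng.random((300, 2))  # 300 sites in the unit square
cov, h = exponential_cov(coords, sill=1.0, range_=0.2)
y = np.linalg.cholesky(cov) @ rng.standard_normal(300)  # one simulated field
tapered = cov * wendland_taper(h, theta=0.15)  # sparse, still positive definite
print(gaussian_loglik(y, cov), gaussian_loglik(y, tapered))
print((tapered != 0).mean())  # fraction of nonzero entries
```

The elementwise product of two positive definite covariance matrices is positive definite, so the tapered matrix remains a valid covariance. Its zeros beyond distance theta are what sparse solvers exploit.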
3:00-4:00pm, September 1, 2017, 1441, 25 Park Place
Assistant Professor Tuo Zhao,
School of Industrial and Systems Engineering,
Georgia Institute of Technology,
Compute Faster and Learn Better: Machine Learning via Nonconvex Model-based Optimization
Nonconvex optimization naturally arises in many machine learning problems.
Machine learning researchers exploit various nonconvex formulations to gain
modeling flexibility, estimation robustness, adaptivity, and computational scalability.
Although classical computational complexity theory has shown that solving nonconvex optimization problems
is NP-hard in the worst case, practitioners have proposed numerous heuristic optimization
algorithms that achieve outstanding empirical performance in real-world applications.
To bridge this gap between practice and theory, we propose a new generation of model-based optimization
algorithms and theories, which incorporate statistical thinking into modern optimization. Specifically,
when designing practical computational algorithms, we take the underlying statistical models into consideration.
Our novel algorithms exploit hidden geometric structures behind many nonconvex optimization problems, and
can obtain global optima with the desired statistical properties in polynomial time with high probability.
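A well-known instance of this model-based view is phase retrieval by Wirtinger flow. It combines a spectral initialization that uses the statistical model of the measurements with plain gradient descent on a nonconvex least squares loss. The Python sketch below is a real-valued version written for illustration under assumed Gaussian measurement vectors. It is not the speaker's algorithm, and the problem size, step size mu, and iteration count are arbitrary choices.

```python
import numpy as np


def spectral_init(A, y):
    """Leading eigenvector of (1/m) sum_i y_i a_i a_i', scaled by sqrt(mean(y)), which is close to ||x||."""
    m = len(y)
    Y = (A.T * y) @ A / m
    _, vecs = np.linalg.eigh(Y)  # eigenvalues in ascending order
    return np.sqrt(y.mean()) * vecs[:, -1]


def wirtinger_flow(A, y, n_iter=500, mu=0.1):
    """Gradient descent on f(z) = (1/4m) sum_i ((a_i'z)^2 - y_i)^2 from a spectral start."""
    m = len(y)
    z = spectral_init(A, y)
    step = mu / (z @ z)
    for _ in range(n_iter):
        Az = A @ z
        grad = A.T @ ((Az**2 - y) * Az) / m
        z = z - step * grad
    return z


rng = np.random.default_rng(0)
n, m = 20, 300
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = (A @ x) ** 2  # phaseless measurements: the sign of a_i'x is lost
z = wirtinger_flow(A, y)
err = min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)
print(err)
```

The signal can only be recovered up to a global sign, since y depends on x only through (a_i'x)^2. That is why err compares z against both x and -x.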