Statistics Seminar at Georgia State University

Fall 2017-Spring 2018, Fridays 3:00-4:00pm, Seminar room (1441), 25 Park Place

Organizer: Yichuan Zhao

If you would like to give a talk in Statistics Seminar, please send an email to Yichuan Zhao at

2:00-3:00pm, April 13, 2018, 1441, 25 Park Place, Peng Zeng, Ph.D., Associate Professor of Statistics, Department of Mathematics and Statistics, Auburn University,
Degrees of Freedom in Regression Models

Abstract: The number of degrees of freedom is a measure of the effective number of parameters used to fit a regression model. It has been used in various information criteria for model selection. For the least squares fit of a linear regression model, the number of degrees of freedom is exactly the number of parameters. Geometrically, it is also the dimension of the column space of the design matrix. Many recent studies have tried to extend these results beyond the least squares fit. In this talk, we will review some existing results and techniques on degrees of freedom, mainly within the frameworks of Stein (1981), Ye (1998), and Efron (2004). We will also discuss some recent developments on degrees of freedom in linear regression with linear constraints.
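For the least squares fit, the claim that the degrees of freedom equal the number of parameters can be checked numerically as the trace of the hat matrix H = X (X'X)^{-1} X'. The plain-Python sketch below is only an illustration, not code from the talk. The ridge variant, trace of X (X'X + lam*I)^{-1} X', is an assumed extra example of a fit whose effective degrees of freedom drop below the parameter count; the design matrix and all function names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def degrees_of_freedom(X, lam=0.0):
    """trace(X (X'X + lam*I)^{-1} X'); lam = 0 gives ordinary least squares."""
    p = len(X[0])
    Xt = transpose(X)
    A = matmul(Xt, X)
    for j in range(p):
        A[j][j] += lam          # hypothetical ridge penalty (also shrinks the intercept here)
    Z = solve(A, Xt)            # p x n matrix (X'X + lam*I)^{-1} X'
    # trace(X Z) without forming the full n x n hat matrix
    return sum(X[i][k] * Z[k][i] for i in range(len(X)) for k in range(p))

# Hypothetical design: intercept, x, x^2 at x = 0, ..., 5
X = [[1.0, x, x * x] for x in range(6)]
print(degrees_of_freedom(X))        # approximately 3
print(degrees_of_freedom(X, 1.0))   # strictly between 0 and 3
```

The trace identity trace(X (X'X)^{-1} X') = trace((X'X)^{-1} X'X) = p is why the least squares value comes out as the column count whenever X has full column rank.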

2:00-3:00pm, April 06, 2018, 1441, 25 Park Place, Regina Liu, PhD, Distinguished Professor of Statistics, Rutgers University,
Distinguished Lecture: iFusion: Individualized Fusion Learning

Abstract: Inferences from different data sources can often be fused together to yield more powerful findings than those from individual sources alone. We present a new fusion learning approach, called ‘iFusion’, for drawing efficient individualized inference by fusing the learnings from relevant data sources. iFusion is robust to heterogeneity arising from diverse data sources and is ideally suited for goal-directed applications such as precision medicine. Specifically, iFusion summarizes individual inferences as confidence distributions (CDs), adaptively forms a clique of individuals that bear relevance to the target individual, and then suitably combines the CDs from those relevant individuals in the clique to draw inference for the target individual. In essence, iFusion strategically ‘borrows strength’ from relevant individuals to improve efficiency while retaining its inference validity. Computationally, iFusion is parallel in nature and scales up well in comparison with competing approaches. The performance of the approach is demonstrated by simulations and real data applications. This is joint work with Jieli Shen (Goldman Sachs) and Minge Xie (Rutgers University).
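The three steps described above (summarize each individual as a CD, form a clique, combine the clique's CDs) can be sketched under strong simplifying assumptions: every individual's CD is taken to be normal, N(estimate, se^2), the clique rule (keep individual j if its estimate lies within c combined standard errors of the target's) is hypothetical, and the combination is plain inverse-variance weighting of normal CDs. This is not the iFusion procedure itself, whose clique formation and combination rules are not given in the abstract.

```python
from statistics import NormalDist

def combine_normal_cds(estimates, ses):
    """Inverse-variance combination of normal CDs N(est_i, se_i^2)."""
    w = [1.0 / s ** 2 for s in ses]
    est = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    se = sum(w) ** -0.5
    return est, se

def fused_inference(estimates, ses, target, c=2.0, level=0.95):
    """Hypothetical clique rule: keep individual j if its estimate is within
    c combined standard errors of the target individual's estimate."""
    t_est, t_se = estimates[target], ses[target]
    clique = [j for j in range(len(estimates))
              if abs(estimates[j] - t_est) <= c * (ses[j] ** 2 + t_se ** 2) ** 0.5]
    est, se = combine_normal_cds([estimates[j] for j in clique],
                                 [ses[j] for j in clique])
    cd = NormalDist(est, se)                 # the combined (normal) confidence distribution
    alpha = 1.0 - level
    ci = (cd.inv_cdf(alpha / 2), cd.inv_cdf(1 - alpha / 2))
    return clique, est, se, ci

clique, est, se, ci = fused_inference([1.0, 1.1, 0.9, 5.0], [0.5] * 4, target=0)
print(clique)   # [0, 1, 2]: the outlying individual 3 is left out
```

Excluding individual 3 is what keeps the fused interval valid for the target, while pooling individuals 0 to 2 shrinks the standard error from 0.5 to 0.5/sqrt(3).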

2:00-3:00pm, February 23, 2018, 1441, 25 Park Place, Frazier Bindele, PhD, Assistant Professor of Statistics, University of South Alabama,
Rank-based estimating equation with non-ignorable missing responses via empirical likelihood

Abstract: In this talk, we consider a general regression model with responses missing not at random. A rank-based estimator of the regression parameter is derived from a rank-based estimating equation. Based on the asymptotic normality of this estimator, a consistent sandwich estimator of its asymptotic covariance matrix is obtained. To overcome the over-coverage issue of the normal approximation procedure, an empirical likelihood based on the rank-based gradient function is defined, and its asymptotic distribution is established. Extensive simulation experiments are carried out under different error distributions and different response probabilities; the results show that the proposed empirical likelihood approach outperforms both the normal approximation approach and its least-squares counterpart in terms of coverage probability and average length of the confidence intervals for the regression parameters. A data example is provided to illustrate the proposed methods.
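A much-simplified sketch of the two ingredients, assuming complete data (no missingness model), a single slope with no intercept, and Wilcoxon scores a(i) = sqrt(12) (i/(n+1) - 1/2). Splitting the rank-based gradient S(beta) = sum (x_i - xbar) a(R_i) into per-observation terms g_i is a naive plug-in chosen for illustration (the g_i are not independent, since ranks depend on the whole sample), and the function names are hypothetical. The empirical likelihood part is the standard scalar construction -2 log R = 2 sum log(1 + lambda g_i), with lambda solving sum g_i / (1 + lambda g_i) = 0.

```python
import math

def wilcoxon_scores(residuals):
    """a(R_i) = sqrt(12) * (R_i/(n+1) - 1/2); ties are broken by position."""
    n = len(residuals)
    ranks = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: residuals[i]), start=1):
        ranks[i] = r
    return [math.sqrt(12) * (ranks[i] / (n + 1) - 0.5) for i in range(n)]

def rank_gradient_terms(x, y, beta):
    """Per-observation terms (x_i - xbar) * a(R(y_i - beta * x_i)) of the rank gradient."""
    xbar = sum(x) / len(x)
    scores = wilcoxon_scores([yi - beta * xi for xi, yi in zip(x, y)])
    return [(xi - xbar) * ai for xi, ai in zip(x, scores)]

def el_statistic(g, iters=200):
    """-2 log empirical likelihood ratio for H0: E[g] = 0 with scalar g."""
    if max(g) <= 0 or min(g) >= 0:
        return math.inf                      # 0 lies outside the convex hull of the g_i
    lo, hi = -1.0 / max(g), -1.0 / min(g)    # keeps every 1 + lambda * g_i positive
    score = lambda lam: sum(gi / (1 + lam * gi) for gi in g)   # decreasing in lam
    for _ in range(iters):                   # bisection for the root of score(lam) = 0
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    return 2 * sum(math.log(1 + lam * gi) for gi in g)

x = [1.0, 2.0, 3.0, 4.0, 5.0]                # hypothetical data, slope near 2
y = [2.1, 3.9, 6.3, 7.8, 10.05]
print(el_statistic(rank_gradient_terms(x, y, 2.0)))    # finite
print(el_statistic(rank_gradient_terms(x, y, 10.0)))   # inf: beta = 10 is implausible
```

A confidence set for beta would collect the values whose statistic stays below a chi-squared quantile; the talk's version additionally handles responses missing not at random, which this sketch does not attempt.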

3:00-4:00pm, February 16, 2018, 1441, 25 Park Place, Daniel Pimentel-Alarcón, Ph. D., Assistant Professor of Computer Science, Georgia State University,
Sexy (Nonlinear) Matrix Completion

Abstract: Low-rank matrix completion (LRMC) has received tremendous attention in recent years. The problem is tantamount to identifying a linear variety (subspace) that explains an incomplete data matrix. The reason for the great success of LRMC is that data often lies near a linear variety. Now think about this: if "lines" are good models for explaining data, "polynomials" can only be better! In this talk I will discuss the challenges of inferring algebraic varieties (polynomial structures) from incomplete data. I will explain the main ideas behind our theory and algorithms. These include precise deterministic sampling conditions (specifying which entries we need to observe), and computationally efficient algorithms guaranteed to correctly complete matrices even in cases where traditional LRMC is guaranteed to fail. I'll present some experimental results showing that the new approach significantly outperforms existing state-of-the-art methods in many situations, including standard LRMC and the popular Subspace Clustering problem.
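For reference, the baseline that the talk starts from, standard low-rank matrix completion, can be sketched in its simplest rank-one form by alternating least squares over the observed entries. This is only a sketch of linear LRMC, assuming every row and column has at least one observed entry; it does not implement the algebraic-variety (nonlinear) methods of the talk, and the function name and data are hypothetical.

```python
def complete_rank1(M, iters=1000):
    """Rank-one matrix completion by alternating least squares.
    M is a list of rows; missing entries are None."""
    m, n = len(M), len(M[0])
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        for i in range(m):      # best u[i] given v, using row i's observed entries
            obs = [j for j in range(n) if M[i][j] is not None]
            u[i] = sum(M[i][j] * v[j] for j in obs) / sum(v[j] ** 2 for j in obs)
        for j in range(n):      # best v[j] given u, using column j's observed entries
            obs = [i for i in range(m) if M[i][j] is not None]
            v[j] = sum(M[i][j] * u[i] for i in obs) / sum(u[i] ** 2 for i in obs)
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]

# Hypothetical rank-one matrix a b^T with a = (1, 2, 3), b = (1, 2, 3, 4); two entries hidden
M = [[1, 2, 3, None],
     [2, 4, 6, 8],
     [None, 6, 9, 12]]
completed = complete_rank1(M)
print(completed[0][3], completed[2][0])   # close to 4 and 3
```

The same observation pattern only works because the hidden entries are pinned down by the rank-one structure; data lying on a nonlinear variety would not be recovered by this linear model, which is the gap the talk addresses.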

3:00-4:00pm, February 9, 2018, 1441, 25 Park Place, Stephen Shore, Ph. D., R. Means Davis Professor of Risk Management and Insurance, Robinson College of Business, Georgia State University,
Responses to Saving Commitments: Evidence from Mortgage Run-offs

Abstract: We study how individuals respond to the removal of a saving constraint. Mortgage run-offs predictably relax a saving constraint for borrowers who chose mortgage contracts that committed them to effectively save by paying down mortgage principal. Using data on the universe of the Danish population, we identify individuals whose mortgages were on track to run off between 1995 and 2014. We use mortgage run-offs to understand the importance of relaxing a saving constraint for wealth, leisure, consumption, saving, and investment decisions, as well as the mechanisms individuals use to circumvent the saving constraint. We find that, on average, borrowers use 39 percent of the resources previously devoted to mortgage payments to decrease labor income and 53 percent to pay down other debts. The labor supply response is limited to those without substantial assets or debts prior to the run-off, while the debt reduction response is limited to (and one-for-one among) those without substantial assets but with other debt prior to the run-off. We find no statistically significant results for wealth accumulation in bank deposits, stocks, or bonds.

3:00-4:00pm, January 26, 2018, 1441, 25 Park Place, Yubao (Robert) Wu, Ph. D., Assistant Professor of Computer Science, Georgia State University,
Tensor Decompositions and Applications

Abstract: Matrices have been widely used to model real-world data, such as subject-feature matrices and the adjacency matrices of graphs. Matrix low-rank decompositions give good sketches of the original data. They have been well studied and have provided elegant mathematical foundations for many classic machine learning and data mining methods. Singular value decomposition (SVD) and nonnegative matrix factorization (NMF) are two widely used examples. In many real-world applications, the data have more than two dimensions, so matrices cannot model them. Higher-order matrices (tensors) are used instead. Tensor analysis is generally harder than matrix analysis. Even though tensors have been studied for many years, there is still no unique definition of eigenvalues for tensors. Similar to matrix low-rank decompositions, tensor low-rank decompositions have also been studied and applied in various real-world applications. In this talk, we will give a tutorial on matrix and tensor decompositions and their real-world applications. We will also discuss new ideas about tensor decompositions and try to establish collaborations with researchers from the math department.
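As a small illustration of a tensor low-rank decomposition, the sketch below computes a rank-one CP approximation lam * (u o v o w) of a 3-way tensor with the higher-order power method, which alternates between updating one factor and renormalizing. It is a generic textbook technique chosen for illustration, not material from the talk; the tensor and function names are hypothetical, and the method can stop at a local optimum for general tensors.

```python
import math

def normalize(x):
    s = math.sqrt(sum(t * t for t in x))
    return [t / s for t in x]

def rank1_cp(T, iters=50):
    """Rank-one CP approximation lam * (u o v o w) of a 3-way tensor T[i][j][k]
    by the higher-order power method (alternating factor updates)."""
    I, J, K = len(T), len(T[0]), len(T[0][0])
    v = normalize([1.0] * J)
    w = normalize([1.0] * K)
    for _ in range(iters):
        u = normalize([sum(T[i][j][k] * v[j] * w[k] for j in range(J) for k in range(K))
                       for i in range(I)])
        v = normalize([sum(T[i][j][k] * u[i] * w[k] for i in range(I) for k in range(K))
                       for j in range(J)])
        w = normalize([sum(T[i][j][k] * u[i] * v[j] for i in range(I) for j in range(J))
                       for k in range(K)])
    lam = sum(T[i][j][k] * u[i] * v[j] * w[k]
              for i in range(I) for j in range(J) for k in range(K))
    return lam, u, v, w

# Hypothetical exactly rank-one tensor built from a = (1, 2), b = (1, 0, 1), c = (2, 1)
a, b, c = [1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0]
T = [[[a[i] * b[j] * c[k] for k in range(2)] for j in range(3)] for i in range(2)]
lam, u, v, w = rank1_cp(T)
print(lam)   # equals |a| |b| |c| = 5 * sqrt(2) for this exactly rank-one tensor
```

Repeated rank-one fits with deflation, or alternating least squares over all R factors at once, are the usual ways to go beyond rank one.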

2:00-3:00pm, January 26, 2018, 1441, 25 Park Place, Dr. John Rosen, Adjunct Associate Professor of Math. and Stat., Georgia State University,
Math, Data, and Science: Statistical Myths

Abstract: The talk will explore the relations among mathematical statistics, actual data, and the scientific fields that collect data. The talk should be accessible to all of the department's graduate students. The most complicated mathematical concept will be the definition of a random variable (I will summarize it). The talk will present my ideas (apparently they're new) about the relations among math, data, and science. The bottom line: The usual books have it wrong.

2:00-3:00pm, November 17, 2017, 1441, 25 Park Place, Professor Robert Lund, NSF and Department of Mathematical Sciences, Clemson University,
Distinguished Lecture: Bayesian Multiple Breakpoint Detection: Mixing Documented and Undocumented Changepoints

Abstract: This talk presents methods to estimate the number of changepoints and their locations in time-ordered data sequences when prior information is known about some of the changepoint times. A Bayesian version of a penalized likelihood objective function is developed from minimum description length (MDL) information theory principles. Optimizing the objective function yields estimates of the number of changepoints and their locations. Our MDL penalty depends on where the changepoints lie, not solely on their total number (as classical AIC and BIC penalties do). Specifically, configurations with changepoints that occur relatively close to one another are penalized more heavily than sparsely arranged changepoints. The techniques allow for autocorrelation in the observations and mean shifts at each changepoint time. This scenario arises in climate time series where a "metadata" record exists documenting some, but not necessarily all, of the station move times and instrumentation changes. Applications to climate time series are presented throughout.
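The general idea of optimizing a penalized likelihood over all changepoint configurations can be sketched with exact dynamic programming (optimal partitioning) for Gaussian mean shifts with unit variance and independent errors. The penalty here is a hypothetical stand-in: a constant cost per changepoint, lowered at times listed in a metadata record. It is not the MDL penalty of the talk, which depends on changepoint spacing and allows autocorrelation; all names are illustrative.

```python
import math

def segment_changepoints(y, penalty, documented=(), documented_penalty=None):
    """Exact penalized least-squares segmentation by dynamic programming.
    Returns sorted changepoint indices tau (a new mean starts at y[tau]).
    Each changepoint costs `penalty`, except those at indices in `documented`,
    which cost `documented_penalty` (a hypothetical metadata rule)."""
    n = len(y)
    if documented_penalty is None:
        documented_penalty = penalty
    documented = set(documented)
    # Prefix sums give each segment's sum of squared deviations in O(1).
    s, s2 = [0.0], [0.0]
    for v in y:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(a, b):
        """Sum of squared deviations of y[a:b] about its own mean."""
        return (s2[b] - s2[a]) - (s[b] - s[a]) ** 2 / (b - a)

    best = [0.0] + [math.inf] * n   # best[t]: optimal penalized cost of y[:t]
    prev = [0] * (n + 1)            # start of the last segment in that optimum
    for t in range(1, n + 1):
        for a in range(t):
            cost = 0.0 if a == 0 else (documented_penalty if a in documented else penalty)
            c = best[a] + sse(a, t) + cost
            if c < best[t]:
                best[t], prev[t] = c, a
    cps, t = [], n                  # backtrack through the stored segment starts
    while t > 0:
        if prev[t] > 0:
            cps.append(prev[t])
        t = prev[t]
    return sorted(cps)

pen = 2 * math.log(20)                      # BIC-like constant penalty (assumption)
small_shift = [0.0] * 10 + [1.0] * 10       # hypothetical series with a small shift
print(segment_changepoints(small_shift, pen))                                         # []
print(segment_changepoints(small_shift, pen, documented=[10], documented_penalty=1.0))  # [10]
```

The second call shows the mechanism the abstract motivates: a shift too small to justify an undocumented changepoint is accepted once metadata make it cheaper.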

2:00-3:00pm, November 10, 2017, 1441, 25 Park Place, Dr. Dungang Liu, Assistant Professor, Carl H Lindner College of Business, University of Cincinnati


3:00-4:00pm, November 3, 2017, 1441, 25 Park Place, Dr. Roshan Vengazhiyil, Professor, Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology
Deterministic Sampling: An Alternative to Monte Carlo

Abstract: Monte Carlo techniques are widely used in all fields of science and engineering. However, when the system under study is expensive or time-consuming to evaluate, we can use carefully chosen deterministic samples to get faster results. In this talk, I will focus on the applications of deterministic sampling in Bayesian computational problems and uncertainty quantification. I will discuss two techniques, known as minimum energy designs and support points, and explain how they can be used for sampling from arbitrary probability distributions with few function evaluations.
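The basic contrast between random and deterministic sampling can be seen in one dimension, where an evenly spaced grid of probabilities pushed through the inverse CDF already gives a deterministic sample. This is a far simpler construction than minimum energy designs or support points and is shown only as a hypothetical illustration; the exponential target, seed, and function names are assumptions.

```python
import math
import random

def inverse_cdf_points(inv_cdf, n):
    """Deterministic sample: push the midpoints (i + 0.5)/n of [0, 1] through F^{-1}."""
    return [inv_cdf((i + 0.5) / n) for i in range(n)]

def exponential_inv_cdf(u, rate=1.0):
    return -math.log(1.0 - u) / rate

n = 1000
det_points = inverse_cdf_points(exponential_inv_cdf, n)
det_estimate = sum(det_points) / n          # estimates E[X] = 1

rng = random.Random(2017)
mc_points = [rng.expovariate(1.0) for _ in range(n)]
mc_estimate = sum(mc_points) / n            # Monte Carlo estimate of the same mean

print(f"deterministic: {det_estimate:.5f}   Monte Carlo: {mc_estimate:.5f}")
```

With n = 1000 the deterministic estimate lands within about 0.001 of the true mean, whereas the Monte Carlo standard error is about 1/sqrt(n), roughly 0.03. In several dimensions a simple grid no longer works well, which is where more careful designs become necessary.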

3:00-4:00pm, October 13, 2017, 1441, 25 Park Place, Dr. Mayya Zhilova, Assistant Professor, School of Mathematics, Georgia Institute of Technology
Higher-order Berry-Esseen inequalities and accuracy of the bootstrap

Abstract: In this talk, we study the higher-order accuracy of a bootstrap procedure for approximating the distribution of a smooth function of a sample average in a high-dimensional, non-asymptotic framework. Our approach is based on Berry-Esseen-type inequalities that extend the classical normal approximation bounds. These results justify, in a non-asymptotic setting, that bootstrapping can outperform the Gaussian (or chi-squared) approximation in accuracy with respect to both dimension and sample size. In addition, the presented results lead to improved accuracy of a weighted bootstrap procedure for general log-likelihood ratio statistics (under certain regularity conditions). The theoretical results are illustrated with numerical experiments on simulated data.
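The object of study, the bootstrap distribution of a smooth function of a sample average, can be sketched with Efron's nonparametric bootstrap and a percentile interval. This one-dimensional sketch is only an assumed illustration: it uses plain resampling rather than the weighted bootstrap mentioned above, and the data, the choice g = exp, and the function names are hypothetical.

```python
import math
import random

def bootstrap_percentile_ci(x, g, B=2000, level=0.95, seed=0):
    """Nonparametric bootstrap for g(sample mean): returns the plug-in
    estimate and a percentile confidence interval."""
    rng = random.Random(seed)
    n = len(x)
    estimate = g(sum(x) / n)
    # Each replicate resamples n points with replacement and re-applies g to the mean.
    reps = sorted(g(sum(rng.choices(x, k=n)) / n) for _ in range(B))
    alpha = 1.0 - level
    lower = reps[int(math.floor(alpha / 2 * B))]
    upper = reps[int(math.ceil((1 - alpha / 2) * B)) - 1]
    return estimate, lower, upper

data = [0.1 * i for i in range(-10, 11)]    # hypothetical sample
est, lo, hi = bootstrap_percentile_ci(data, math.exp)
print(est, lo, hi)
```

The Gaussian alternative would instead use the delta method, g(xbar) plus or minus 1.96 * |g'(xbar)| * sd / sqrt(n); the talk's results concern when the bootstrap version is the more accurate of the two.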

2:00-3:00pm, September 29, 2017, 1441, 25 Park Place, Dr. Ray Bai, Assistant Professor, Department of Statistics, University of Georgia
An invitation to the long memory phenomenon

Abstract: We provide an introduction to the long memory phenomenon in time series analysis. Some recent research progress on the topic will also be touched upon. The talk should be accessible to a general audience including graduate students.
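A standard way to see what long memory means is to compare autocorrelations. The fractionally integrated ARFIMA(0, d, 0) process with 0 < d < 1/2 has rho(k) = Gamma(k + d) Gamma(1 - d) / (Gamma(k - d + 1) Gamma(d)), which decays like k^(2d - 1) (so the autocorrelations are not summable), while an AR(1) decays like phi^k. The sketch below is a textbook illustration chosen here, not material from the talk; d = 0.3 and the function names are assumptions.

```python
import math

def arfima_acf(d, max_lag):
    """Autocorrelations rho(0..max_lag) of ARFIMA(0, d, 0) with 0 < d < 1/2,
    via the recursion rho(k) = rho(k-1) * (k - 1 + d) / (k - d)."""
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho

def ar1_acf(phi, max_lag):
    """Autocorrelations phi**k of a stationary AR(1): exponential decay."""
    return [phi ** k for k in range(max_lag + 1)]

d = 0.3
long_mem = arfima_acf(d, 100)
short_mem = ar1_acf(long_mem[1], 100)       # AR(1) matched at lag 1
for k in (1, 10, 100):
    print(k, long_mem[k], short_mem[k])
```

Both processes agree at lag 1 by construction, but by lag 100 the AR(1) correlation is negligible while the long memory correlation is still a few percent, reflecting hyperbolic rather than exponential decay.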

2:00-3:00pm, September 8, 2017, 1441, 25 Park Place, Professor Hao Zhang, Department of Statistics, Purdue University
The Big Data Issues in Spatial Statistics

Abstract: Environmental and climate studies are one area where big data are collected. Global Circulation Models and Regional Circulation Models can generate huge amounts of data in space and time. Data collected through remote sensing or sensor networks are also huge. All these data are correlated spatially and temporally, so one has to deal with a huge covariance matrix in traditional likelihood-based or Bayesian inference. When the dimension is very large, inverting the matrix becomes numerically infeasible and also unstable because the matrix is ill-conditioned. The issue becomes more complex when multivariate data, which can have different spatial scales, are observed. In this talk, I will discuss some recent developments in the theory and methods for dealing with big spatial data, and in particular highlight some numerical algorithms that are applicable to large multivariate spatial data possibly of different scales.
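One established way to make a huge spatial covariance matrix manageable is covariance tapering: multiply the covariance elementwise by a compactly supported correlation function so that most entries become exactly zero and sparse linear algebra applies. The abstract does not say which algorithms the talk covers, so this is only an assumed illustration, with an exponential covariance, the Wendland taper (1 - r)^4 (1 + 4r), a hypothetical 10 x 10 grid, and hypothetical function names. The taper is positive definite in up to three dimensions, so by the Schur product theorem the tapered matrix stays positive definite.

```python
import math

def exponential_cov(h, sill=1.0, scale=1.0):
    return sill * math.exp(-h / scale)

def wendland_taper(h, gamma):
    """Wendland taper (1 - r)^4 (1 + 4r) for r = h/gamma < 1, zero beyond gamma."""
    r = h / gamma
    return (1.0 - r) ** 4 * (1.0 + 4.0 * r) if r < 1.0 else 0.0

def tapered_covariance(coords, gamma):
    """Sparse {(i, j): value} storage of the tapered covariance matrix."""
    C = {}
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):      # O(n^2) scan; a real code would use a spatial index
            h = math.dist(a, b)
            t = wendland_taper(h, gamma)
            if t > 0.0:
                C[(i, j)] = exponential_cov(h) * t
    return C

coords = [(x, y) for x in range(10) for y in range(10)]   # hypothetical 10 x 10 grid
C = tapered_covariance(coords, gamma=1.5)
print(len(C), "nonzeros out of", len(coords) ** 2)
```

With a taper range of 1.5 grid units, each site keeps only itself and its eight neighbours, so 784 of the 10,000 entries survive; the price is a controlled bias from discarding long-range covariance.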

3:00-4:00pm, September 1, 2017, 1441, 25 Park Place, Assistant Professor Tuo Zhao, School of Industrial and Systems Engineering, Georgia Institute of Technology,
Compute Faster and Learn Better: Machine Learning via Nonconvex Model-based Optimization

Abstract: Nonconvex optimization naturally arises in many machine learning problems. Machine learning researchers exploit various nonconvex formulations to gain modeling flexibility, estimation robustness, adaptivity, and computational scalability. Although classical computational complexity theory has shown that nonconvex optimization is generally NP-hard in the worst case, practitioners have proposed numerous heuristic optimization algorithms that achieve outstanding empirical performance in real-world applications. To bridge this gap between practice and theory, we propose a new generation of model-based optimization algorithms and theories that incorporate statistical thinking into modern optimization. Specifically, when designing practical computational algorithms, we take the underlying statistical models into consideration. Our novel algorithms exploit hidden geometric structures behind many nonconvex optimization problems and can obtain global optima with the desired statistical properties in polynomial time with high probability.
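A toy instance of the gap between worst-case hardness and practice: symmetric rank-one matrix factorization, minimizing f(u) = (1/4) ||u u^T - M||_F^2, is nonconvex (u* and -u* are both global minima), yet plain gradient descent from a reasonable starting point reaches a global optimum. This example, its starting point, and its function names are assumptions for illustration; the talk's model-based algorithms are more general and are not reproduced here.

```python
def gradient_descent_rank1(M, u0, step=0.02, iters=2000):
    """Minimize f(u) = (1/4) * ||u u^T - M||_F^2 for symmetric M by gradient descent.
    The gradient is (u u^T - M) u = ||u||^2 u - M u."""
    u = [float(t) for t in u0]
    n = len(u)
    for _ in range(iters):
        sq_norm = sum(t * t for t in u)
        Mu = [sum(M[i][j] * u[j] for j in range(n)) for i in range(n)]
        u = [u[i] - step * (sq_norm * u[i] - Mu[i]) for i in range(n)]
    return u

u_star = [1.0, 2.0, 2.0]
M = [[a * b for b in u_star] for a in u_star]            # rank-one "signal" u* u*^T
u_hat = gradient_descent_rank1(M, u0=[1.3, 1.5, 2.4])    # hypothetical starting point
print(u_hat)   # converges to u_star
```

Here the hidden geometric structure is simple: the only stationary points are the origin (not a local minimum) and plus or minus u*, so any start with positive inner product with u* and a small enough step is drawn to u*.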