Statistics Seminar at Georgia State University
Fall 2015-Spring 2016, Fridays 3:00-4:00pm, Paul Erdos Conference room (796)
Organizer: Yichuan Zhao
If you would like to give a talk in the Statistics Seminar, please send an email to
Yichuan Zhao at
2:00-3:00pm, April 8, 2016, 796 COE
Dr. Min Zhang,
Department of Statistics,
Statistical Methods for Integrative Genome-Wide Analysis
We develop a variable selection framework that integrates pathway information into genome-wide association analysis.
Unlike Bayesian variable selection methods that rely on computation-intensive Markov chain Monte Carlo algorithms,
we propose an iterated conditional modes/medians (ICM/M) algorithm to implement empirical Bayes variable selection. Iterated conditional modes are first used to optimize the hyperparameters and thus implement the empirical Bayes step; iterated conditional medians are then used to estimate the model parameters and thereby perform the variable selection. In addition to the advantages of Bayesian inference, the proposed method enjoys efficient computation, increased statistical power, and improved estimation of the model parameters. Extensive simulation studies show the superior performance of the proposed approach, and the method has been applied to real data from genome-wide association studies.
This is a joint work with Vitara Pungpapong and Dabao Zhang.
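The talk's ICM/M algorithm itself is not reproduced here, but the reason conditional medians perform variable selection can be seen in a toy sketch: under a spike-and-slab prior, the posterior median of a coefficient is exactly zero whenever the spike carries enough posterior mass. The normal-means model, prior settings, and function name below are illustrative assumptions, not the authors' implementation.

```python
from statistics import NormalDist

def posterior_median(y, pi0=0.5, tau=3.0):
    """Posterior median of beta in the toy model
    y | beta ~ N(beta, 1),  beta ~ pi0 * delta_0 + (1 - pi0) * N(0, tau^2)."""
    sign, a = (1.0, y) if y >= 0 else (-1.0, -y)
    # posterior probability of the spike (beta = 0)
    m_spike = NormalDist().pdf(a)                            # N(y; 0, 1)
    m_slab = NormalDist(0.0, (1 + tau**2) ** 0.5).pdf(a)     # N(y; 0, 1 + tau^2)
    p_spike = pi0 * m_spike / (pi0 * m_spike + (1 - pi0) * m_slab)
    # slab part of the posterior: beta | y, slab ~ N(s2 * y, s2)
    s2 = tau**2 / (1 + tau**2)
    slab = NormalDist(s2 * a, s2 ** 0.5)
    # if the mixture CDF just above 0 already reaches 1/2, the median is the atom
    if p_spike + (1 - p_spike) * slab.cdf(0.0) >= 0.5:
        return 0.0
    # otherwise the median lies in the slab part of the mixture
    return sign * slab.inv_cdf((0.5 - p_spike) / (1 - p_spike))
```

Small observations are thresholded exactly to zero while large ones are only slightly shrunk, which is what lets a conditional-median step act as a selection operator inside an iterative algorithm.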
3:00-4:00pm, March 11, 2016, 796 COE
Dr. Zhongjian Lin,
Assistant Professor of Economics,
Identification and Estimation of Hierarchy Effects in Social Interactions
This paper studies status-based heterogeneous peer effects in a large network. We extend
the large network game of Xu (2011) to allow for heterogeneous peer effects. In particular,
we measure an individual's social status by the number of friendship nominations received in the network,
which determines the strength of peer effects. Hierarchy effects are characterized by
the differences in peer effects from friends of different social status. To ease the computational
burden when the data come from a single large network, we extend the nested pseudo likelihood estimation (NPLE) of Aguirregabiria
and Mira (2007) to the large network game
model. We illustrate our method through both Monte Carlo experiments and an empirical
study of high school students' college attendance decisions using the Add Health dataset.
3:00-4:00pm, February 19, 796 COE
Dr. Benjamin Haaland, Assistant Professor,
Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
Computationally efficient approximation of large-scale and high-dimensional simulations
Simulations are used by scientists and engineers to study complex real systems such as material micro-structure or
passenger flows through a new airport design. Frequently, these complex simulations are too expensive to allow full
exploration of the unknown relationship, much less optimization. A common solution is to build a computationally efficient
approximate simulation, or emulator. Here, we discuss several aspects of building an accurate and efficient emulator in the
context of large-scale and high-dimensional simulations. Specifically, we examine sources of inaccuracy related to data
collection and present two techniques well-adapted to large-scale and high-dimensional simulations, local Gaussian process
fitting and multi-resolution functional ANOVA modeling.
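As a rough illustration of the first technique, a local Gaussian process fits a small GP to only the design points nearest the prediction site, so each prediction costs O(k^3) rather than O(n^3). This is a minimal sketch with a squared-exponential kernel and hand-picked hyperparameters; the function name and defaults are assumptions, not the speaker's implementation.

```python
import numpy as np

def local_gp_predict(X, y, x_star, k=20, length=0.5, noise=1e-6):
    """Predict y(x_star) with a GP fit only to the k nearest design points."""
    d = np.linalg.norm(X - x_star, axis=1)
    idx = np.argsort(d)[:k]                 # indices of the k nearest neighbors
    Xl, yl = X[idx], y[idx]

    def kern(A, B):                         # squared-exponential kernel
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2 * length**2))

    K = kern(Xl, Xl) + noise * np.eye(k)    # local covariance plus nugget
    k_star = kern(Xl, x_star[None, :])[:, 0]
    alpha = np.linalg.solve(K, yl)
    return k_star @ alpha                   # GP posterior mean at x_star
```

Fitting only on neighbors keeps the linear solve tiny regardless of the total design size, at the cost of discarding information from distant points.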
3:00-4:00pm, February 12, 796 COE
Dr. Norou Diawara,
Department of Mathematics and Statistics, Old Dominion University
Statistical Pattern Recognition using Gaussian Copula
Abstract: Statistical pattern recognition has attracted great interest due to its applicability
and to advances in technology and computing. Significant research has been done in areas such as automatic character
recognition, medical diagnostics, and data mining. The classical discrimination rule for pattern recognition assumes normality,
but in real life this assumption is often questionable. In some situations, the pattern vector is a mixture of discrete and continuous random variables.
In this talk, we use copula densities to model the class-conditional distributions for pattern recognition with the Bayes decision rule. These densities are useful when the marginal densities of a pattern vector are not normally distributed, and the models also accommodate mixed pattern vectors. We present simulation results comparing the performance of the copula-based classifier with the classical normal-distribution-based model and the independence-based model. An application to real data is presented.
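To make the construction concrete, here is a minimal sketch of a two-class, two-feature Bayes classifier whose class-conditional densities are built from a bivariate Gaussian copula combined with arbitrary marginals. The function names and the particular marginals are illustrative assumptions, not the speaker's code.

```python
import math
from statistics import NormalDist

def gaussian_copula_logdensity(u1, u2, rho):
    """Log density of a bivariate Gaussian copula with correlation rho,
    evaluated at (u1, u2) in (0, 1)^2."""
    z1 = NormalDist().inv_cdf(u1)
    z2 = NormalDist().inv_cdf(u2)
    return (-0.5 * math.log(1 - rho**2)
            - (rho**2 * (z1**2 + z2**2) - 2 * rho * z1 * z2) / (2 * (1 - rho**2)))

def classify(x, classes):
    """Bayes rule: argmax over classes of prior * copula * marginal densities.
    Each class maps to (prior, rho, (cdf1, pdf1), (cdf2, pdf2))."""
    best, best_lp = None, -math.inf
    for label, (prior, rho, (cdf1, pdf1), (cdf2, pdf2)) in classes.items():
        lp = (math.log(prior)
              + gaussian_copula_logdensity(cdf1(x[0]), cdf2(x[1]), rho)
              + math.log(pdf1(x[0])) + math.log(pdf2(x[1])))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

Because the copula factors out the dependence from the marginals, the same machinery works for non-normal or mixed marginal distributions by swapping in the appropriate cdf/pdf pairs.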
2:00-3:00pm, January 29, Petite Science Center (PSC) 124
(Distinguished Lecture in Statistics)
Dr. C. F. Jeff Wu, Professor and Coca-Cola Chair in Engineering Statistics,
School of Industrial
and Systems Engineering, Georgia Institute of Technology
From real world problems to esoteric research: examples and personal experience
Abstract: Young (and some not-so-young) researchers often wonder how to extract good research
ideas and develop
useful methodologies from solving real-world problems. The path is rarely straightforward, and its success depends
on circumstances, tenacity, and luck. I will use three examples to illustrate how I trod the path. The first
involved an attempt to find optimal growth conditions for nanostructures. It led to the development of a new method,
"sequential minimum energy design" (SMED), which exploits an analogy to the potential energy of charged particles.
After a few years of frustrating effort and relentless pursuit, we realized that SMED is more suitable for
generating samples adaptively to mimic an arbitrary distribution than for optimization. The main objective
of the second example was to build an efficient statistical emulator based on finite element simulation results with
two mesh densities in cast foundry operations. It eventually led to the development of a class of nonstationary Gaussian
process models that can be used to connect simulation data of different precisions and speeds. The third example is about
sequential design that works well for small samples in sensitivity testing. I will describe three major papers in a span of
30 years and how each paper had one new idea that inspired the next paper. In each example, the developed methodology has
broader applications beyond the original problem. I will explain the thought process in each example. Finally, I will share
some secrets about a "path to innovation".
3:00-4:00pm, December 4, 796 COE
Professor Betty Sao-Hou Lai,
Division of Epidemiology and Biostatistics,
Georgia State University
Children's Reactions to Trauma: Modeling Posttraumatic Stress Symptoms After
Hurricanes Andrew and Katrina
Abstract: Approximately 100 million children are exposed to disasters each year. Disaster exposure can lead to posttraumatic stress, anxiety, and depression symptoms in children.
However, how and why children differ in their reactions to disasters is poorly understood. This talk focuses on two applied examples of growth mixture
modeling, examining children's varying reactions after Hurricanes Andrew and Katrina.
2:00-3:00pm, November 20, 796 COE
Dr. Yi Li,
Professor of Biostatistics and
Director of Kidney Epidemiology and Cost Center, University of Michigan
Modeling Complex Large-scale Time-to-event Data: An Efficient Quasi-Newton Approach
Nonproportional hazards models often arise in modern biomedical studies, as
evidenced by a recent national kidney transplant study.
During follow-up, the effects of
baseline risk factors, such as patients' comorbidity conditions collected at
transplantation, may vary, resulting in a weakening or
strengthening of associations over time. Time-varying effects survival models have
emerged as a powerful means of modeling such dynamic changes in covariate effects.
Traditional methods of fitting time-varying effects survival models
rely on an expansion of the original dataset in a repeated
measurement format, which, even with a moderate sample size, leads to an
extremely large working dataset. Consequently, the computational
burden increases quickly as the sample size grows, and analyses of a large
dataset such as our motivating example defy any existing statistical
methods and software. We propose a novel application of the quasi-Newton
iteration method, via a refined line search procedure, to model the dynamic
changes of covariates' effects in survival analysis. We show that the
algorithm converges superlinearly and is computationally efficient
for large-scale datasets. We apply the proposed methods to analyze the
national kidney transplant data and study the impact of potential risk factors on post-transplant outcomes.
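The talk's refined line search and its application to time-varying effects are not reproduced here, but the quasi-Newton iteration it builds on can be sketched generically: BFGS with a backtracking (Armijo) line search, which achieves superlinear local convergence without forming the Hessian. The function names and the test problem are illustrative.

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Generic BFGS quasi-Newton iteration with a backtracking line search."""
    x = np.asarray(x0, float)
    H = np.eye(len(x))                  # inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                      # quasi-Newton search direction
        t, c = 1.0, 1e-4                # backtracking (Armijo) line search
        while f(x + t * p) > f(x) + c * t * (g @ p):
            t *= 0.5
        s = t * p
        x_new = x + s
        g_new = grad(x_new)
        yv = g_new - g
        sy = s @ yv
        if sy > 1e-12:                  # curvature condition: BFGS update of H
            rho = 1.0 / sy
            I = np.eye(len(x))
            H = (I - rho * np.outer(s, yv)) @ H @ (I - rho * np.outer(yv, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x
```

The inverse-Hessian update reuses only gradient differences, which is what keeps each iteration cheap enough to scale to large working datasets.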
3:00-4:00pm, November 6, 796 COE
Professor Tao Zha,
School of Economics,
Emory University and Federal Reserve Bank of Atlanta
Dynamic Striated Metropolis-Hastings Sampler for High-Dimensional Models
Having efficient and accurate samplers for simulating the posterior distribution
is crucial for Bayesian analysis. We develop a generic posterior simulator called the "dynamic
striated Metropolis-Hastings (DSMH)" sampler. Grounded in the Metropolis-Hastings algorithm,
it pools the strengths of the equi-energy and sequential Monte Carlo samplers
while avoiding the weaknesses of the standard Metropolis-Hastings algorithm and those of
importance sampling. In particular, the DSMH sampler possesses the capacity to cope
with extremely irregular distributions that contain winding ridges and multiple peaks; and
it is robust to how the sampling procedure progresses across stages. The high-dimensional
application studied in this paper provides a natural platform for testing any generic sampler.
3:00-4:00pm, October 23, 796 COE
Professor Enlu Zhou,
ISYE, Georgia Institute of Technology
Gradient-based Adaptive Stochastic Search (GASS)
Gradient-based adaptive stochastic search (GASS) is an algorithm for solving general optimization problems with little structure.
GASS iteratively finds high quality solutions by randomly sampling candidate solutions from a parameterized distribution model over
the solution space. The basic idea is to convert the original (possibly non-continuous, non-differentiable) problem into a
differentiable optimization problem on the parameter space of the parameterized distribution, and then use a direct gradient search method
to find improved distributions. Thus, GASS combines the robustness of stochastic search, which maintains a population of
candidate solutions, with the relatively fast convergence of classical gradient methods. The performance of the algorithm is
illustrated on a number of benchmark problems and a resource allocation problem in communication networks. If time permits,
I will also talk about the extension of GASS to simulation optimization problems, where the objective function can only
be evaluated through a stochastic simulation model.
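The flavor of the method can be sketched in a few lines: sample candidates from a Gaussian over the solution space, then do gradient ascent on the distribution's parameters using the score-function (likelihood-ratio) gradient estimator. This is a simplified illustration of the idea, not the exact GASS updates from the talk; the function name and defaults are assumptions.

```python
import numpy as np

def gass_sketch(f, dim, iters=300, pop=50, lr=0.2, seed=0):
    """Minimize f by gradient ascent on the parameters (mean, log-std) of a
    Gaussian sampling distribution, via the score-function gradient estimator."""
    rng = np.random.default_rng(seed)
    mu, log_sigma = np.zeros(dim), np.zeros(dim)
    for _ in range(iters):
        sigma = np.exp(log_sigma)
        z = rng.standard_normal((pop, dim))
        x = mu + sigma * z                     # candidate solutions
        fx = np.array([f(xi) for xi in x])
        # standardized "fitness": lower objective value -> higher weight
        w = -(fx - fx.mean()) / (fx.std() + 1e-12)
        # score-function gradient steps for the mean and log-std parameters
        mu += lr * sigma * (w[:, None] * z).mean(axis=0)
        log_sigma += 0.5 * lr * (w[:, None] * (z * z - 1.0)).mean(axis=0)
    return mu
```

Note that f is only ever sampled, never differentiated, which is why the scheme applies to non-continuous and non-differentiable objectives: the gradient lives on the parameter space of the sampling distribution.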
2:00-3:00pm, September 11, 796 COE
Dr. Tao Wang, Professor and Dean,
School of Mathematics,
Yunnan Normal University, China
The Estimation and Exact Lower Confidence Limit of the Conditional Reliability for
Weibull Distribution in the Life Tests with Fixed Stopping Time
In this talk, a new method for calculating the lower confidence limit of
the conditional reliability for the Weibull distribution in life tests
with fixed stopping time is presented. For data obtained from tests
with fixed stopping time, obtaining an accurate lower confidence limit of the
conditional reliability is a difficult problem. Based on the theory of the ordering method
in the sample space, for a prearranged confidence level and an arbitrary sample size (not less than 2),
we give the accurate lower confidence limit for the conditional reliability as well as
an efficient method for computing it. Software is also presented.
This is joint work with Jiading Chen, School of Mathematical Sciences, Peking University.
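The exact sample-space-ordering construction from the talk is not reproduced here, but the setting can be sketched: under a life test with fixed stopping time tau (type-I censoring), the Weibull parameters are estimated by maximum likelihood via the one-dimensional profile equation for the shape, and the conditional reliability follows from the fitted parameters. A lower confidence limit could then be approximated by, e.g., a parametric bootstrap. The function names below are assumptions, not the authors' software.

```python
import numpy as np

def weibull_fit_type1(times, tau):
    """MLE of Weibull (shape k, scale eta) from a life test with fixed
    stopping time tau: entries below tau are failures, the rest are censored."""
    t = np.asarray(times, float)
    fail = t[t < tau]                        # observed failure times
    r = len(fail)
    w = np.minimum(t, tau)                   # exposure times (failures + censoring)

    def g(k):                                # profile score equation for the shape
        wk = w ** k
        return 1.0 / k + np.log(fail).mean() - (wk * np.log(w)).sum() / wk.sum()

    lo, hi = 1e-3, 100.0                     # bisection: g decreases through 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    eta = ((w ** k).sum() / r) ** (1.0 / k)  # profiled scale given the shape
    return k, eta

def conditional_reliability(k, eta, t0, dt):
    """P(T > t0 + dt | T > t0) for a Weibull(k, eta) lifetime."""
    return np.exp((t0 / eta) ** k - ((t0 + dt) / eta) ** k)
```

This gives only the point estimate; the talk's contribution is an exact lower confidence limit, which this sketch does not attempt.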