Statistics Seminar at Georgia State University

Fall 2018-Spring 2019, Fridays 3:00-4:00pm, Seminar room (1441), 25 Park Place

Organizer: Yichuan Zhao

If you would like to give a talk in the Statistics Seminar, please send an email to Yichuan Zhao at

2:00-3:00pm, March 15, 2019, 1441, 25 Park Place, Dr. Brani Vidakovic, Professor, H. Milton Stewart School of Industrial and Systems Engineering, and The Wallace H. Coulter Department of Biomedical Engineering, GaTech


3:00-4:00pm, March 8, 2019, 1441, 25 Park Place, Dr. Houping Xiao, Assistant Professor, Institute for Insight, Robinson College of Business, Georgia State University,
Exploring the Power of Source Reliability in Information Integration

Abstract: In the era of Big Data, data entries, even those describing the same objects or events, can come from a variety of sources. Some sources typically provide accurate information, but, due to reasons such as recording errors, device malfunction, background noise, or even intent to manipulate the data, other sources may contain noisy or erroneous information. Therefore, during information integration, it is critical to identify reliable sources that more often provide accurate information. Unfortunately, there is no oracle telling us a priori which information source is more reliable. In this talk, novel information integration methods are developed that incorporate the estimation of source reliability into both data-level and model-level information integration. In both works, we establish desirable theoretical properties of the proposed approaches via theoretical analysis and demonstrate their impact on real applications such as indoor floorplan construction and crowdsourced question answering.
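
The iterative idea behind source-reliability estimation can be illustrated with a small truth-discovery loop: alternate between estimating truths as reliability-weighted averages and re-weighting each source by its distance to the current truths. This is a generic sketch in the spirit of CRH-style truth discovery, not the speaker's algorithms; the log-based weighting scheme and iteration count are illustrative assumptions.

```python
import numpy as np

def truth_discovery(claims, n_iter=20):
    """Toy truth-discovery loop on a (sources x objects) matrix of numeric claims.

    Alternates between (a) estimating the truths as reliability-weighted
    averages across sources and (b) updating each source's weight from its
    squared error against the current truth estimates.
    """
    n_sources, _ = claims.shape
    w = np.ones(n_sources)                      # start with equal reliability
    for _ in range(n_iter):
        truth = w @ claims / w.sum()            # reliability-weighted average
        err = ((claims - truth) ** 2).sum(axis=1)
        w = -np.log(err / err.sum() + 1e-12)    # small error -> large weight
        w = np.maximum(w, 1e-12)                # keep weights positive
    return truth, w / w.sum()
```

With four near-accurate sources and one heavily biased one, the loop quickly drives the bad source's weight toward zero and recovers the shared values.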

2:00-3:00pm, March 1, 2019, 1441, 25 Park Place, Dr. Zifeng Zhao, Assistant Professor, Department of IT, Analytics & Operations, University of Notre Dame,
A Composite Likelihood-based Approach for Change-point Detection in Spatiotemporal Process

Abstract: This paper develops a unified, accurate, and computationally efficient method for change-point inference in non-stationary spatiotemporal processes. By modeling a non-stationary spatiotemporal process as a piecewise stationary spatiotemporal process, we consider simultaneous estimation of the number and locations of change-points and the model parameters in each segment. A composite likelihood-based criterion is developed for parameter estimation under the minimum description length principle. Asymptotic theory, including consistency and the distribution of the estimators, is derived under mild conditions. A computationally efficient pruned dynamic programming algorithm is developed for the challenging criterion optimization problem. Simulation studies and an application to U.S. precipitation data are provided to demonstrate the effectiveness and practicality of the proposed method.
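
The pruned dynamic programming idea can be illustrated on the simplest setting, mean shifts in a univariate series with a squared-error segment cost. This is a generic PELT-style sketch under a per-change-point penalty `beta`, not the composite-likelihood MDL criterion of the talk.

```python
import numpy as np

def pelt_mean(x, beta):
    """Pruned dynamic programming (PELT-style) search for mean change-points.

    Minimizes total within-segment sum of squares plus a penalty `beta`
    per change-point; pruning discards candidate split points that can
    never be optimal again, keeping the search close to linear time.
    """
    n = len(x)
    c1 = np.concatenate(([0.0], np.cumsum(x)))
    c2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def seg_cost(s, t):  # SSE of x[s:t] around its own mean
        sx, sxx = c1[t] - c1[s], c2[t] - c2[s]
        return sxx - sx * sx / (t - s)

    F = np.full(n + 1, np.inf)
    F[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    cands = [0]
    for t in range(1, n + 1):
        vals = [F[s] + seg_cost(s, t) + beta for s in cands]
        i = int(np.argmin(vals))
        F[t], last[t] = vals[i], cands[i]
        # Prune: s is hopeless for all later t' if F[s] + cost(s, t) > F[t]
        cands = [s for s, v in zip(cands, vals) if v - beta <= F[t]]
        cands.append(t)
    cps, t = [], n          # backtrack the optimal segmentation
    while last[t] > 0:
        cps.append(last[t])
        t = last[t]
    return sorted(cps)
```

On a series with a single jump in mean, the optimizer recovers the change location exactly.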

3:00-4:00pm, February 22, 2019, 1441, 25 Park Place, Jun Kong, PhD, Associate Professor, Department of Mathematics and Statistics, Georgia State University,
High Performance Computing for Quantitative Analysis of Multi-Dimensional Tumor Micro-Environment with Microscopy Image Data

Abstract: In biomedical research, the availability of an increasing array of high-throughput and high-resolution instruments has given rise to large datasets of imaging data. These datasets provide highly detailed views of tissue structures at the cellular level and present a strong potential to revolutionize biomedical translational research. However, traditional human-based tissue review is not feasible for obtaining this wealth of imaging information due to the overwhelming data scale and unacceptable inter- and intra-observer variability. In this talk, I will first describe how to efficiently process Two-Dimensional (2D) digital microscopy images for highly discriminating phenotypic information through the development of microscopy image analysis algorithms and Computer-Aided Diagnosis (CAD) systems for processing and managing massive in-situ micro-anatomical imaging features with high performance computing. Additionally, I will present novel algorithms to support Three-Dimensional (3D), molecular, and time-lapse microscopy image analysis with HPC. Specifically, I will demonstrate an on-demand registration method within a dynamic multi-resolution transformation mapping and an iterative transformation propagation framework. This allows us to efficiently scrutinize volumes of interest on demand in a single 3D space. For segmentation, I will present a scalable segmentation framework for histopathological structures with two steps: 1) initialization with joint information drawn from spatial connectivity, edge map, and shape analysis, and 2) variational level-set based contour deformation with data-driven sparse shape priors. For 3D reconstruction, I will present a novel cross-section association method leveraging Integer Programming, Markov chain based posterior probability modeling, and Bayesian Maximum A Posteriori (MAP) estimation for 3D vessel reconstruction.
I will also present new methods for multi-stain image registration, biomarker detection, and 3D spatial density estimation for molecular imaging data integration. For time-lapse microscopy images, I will present a new 3D cell segmentation method with gradient partitioning and local structure enhancement by eigenvalue analysis of the Hessian matrix. A derived tracking method will also be presented that combines Bayesian filters with a sequential Monte Carlo method, making joint use of location, velocity, 3D morphology features, and intensity profile signatures. Our proposed methods, covering 2D, 3D, molecular, and time-lapse microscopy image analysis, will help researchers and clinicians extract accurate histopathology features, integrate spatially mapped pathophysiological biomarkers, and model disease progression dynamics at high cellular resolution. Therefore, they are essential for improving clinical decisions, enhancing prognostic predictions, inspiring new research hypotheses, and realizing personalized medicine.

2:00-3:00pm, February 15, 2019, 1441, 25 Park Place, Dr. Hulin Wu, The Betty Wheless Trotter Professor & Chair, Department of Biostatistics & Data Science, School of Public Health, University of Texas Health Science Center at Houston,
Distinguished Lecture: Real World EHR Big Data: Challenges and Opportunities

Abstract: Real world EHR and health care Big Data may revolutionize how therapeutic treatments are evaluated in a real world setting. Big EHR data may also allow us to identify specific patient populations for a specific treatment, so that the concept of personalized treatment can be implemented and deployed directly on the EHR system. However, it is quite challenging to use real world data in treatment assessment and disease prediction for various reasons. In this talk, I will share our experiences with EHR and health care Big Data research. In particular, I will discuss the basic infrastructure and multi-disciplinary team that need to be established to deal with Big Data. Then I will demonstrate how to identify meaningful clinical questions and develop analytic pipelines to address a class of clinical questions based on Big Health Care Data. Examples from disease-disease interaction network modeling, heart failure prediction, and vasopressor treatment evaluation for subarachnoid hemorrhage (SAH) patients based on EHR data will be used to illustrate the novel concepts, challenges, and opportunities.

3:00-4:00pm, January 18, 2019, 1441, 25 Park Place, Dr. Naitee Ting, Director, Biostatistics and Data Science, Boehringer-Ingelheim Pharmaceuticals Inc.,
Career Developments for Graduate Students

Abstract: In order to help students build successful careers, we will organize a Career Development session for graduate students. During the meeting, Dr. Ting will discuss how to build a successful career in industry and introduce opportunities for the recruiting of statisticians from among our students.

2:00-3:00pm, January 18, 2019, 1441, 25 Park Place, Dr. Naitee Ting, Director, Biostatistics and Data Science, Boehringer-Ingelheim Pharmaceuticals Inc.,
Subgroup Analysis

Abstract: In clinical development of new medicinal products, it is common to perform subgroup analysis after trial results are available. There are many reasons for performing subgroup analyses. However, most of these analyses are post hoc in nature and hence clinical findings simply from these analyses may not be thought of as confirmatory evidence. In this presentation, a few case studies are used to clarify some of the considerations in performing subgroup analysis.

2:00-3:00pm, November 9, 2018, 1441, 25 Park Place, Assistant Professor Sixia Chen, Department of Biostatistics and Epidemiology, The University of Oklahoma Health Sciences Center,
A unified pseudo population bootstrap approach for complex survey data

Abstract: Inference with complex survey data is challenging due to complex sampling features in the design. Commonly used inference tools include Taylor linearization, the Jackknife, Balanced Half-samples, and bootstrap approaches. The Taylor linearization approach needs tedious calculations for different parameters. The Jackknife approach only works for estimating smooth parameters such as totals or means. The Balanced Half-samples approach only works with two primary sampling units per sampling stratum. The bootstrap approach can be regarded as a good compromise that balances these considerations. In this paper, we propose a unified pseudo population bootstrap approach, which works for general-purpose inference with designs such as one-stage or multi-stage stratified probability proportional to size sampling designs. Our proposed approach can also naturally incorporate weighting processes such as nonresponse adjustment, imputation, and calibration. Simulation results show the benefits of our proposed approach compared with some existing approaches.
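
The pseudo-population idea can be sketched in its simplest form, single-stage sampling with integer-valued weights: replicate each sampled unit according to its weight to rebuild a pseudo population, then redraw samples of the original size by simple random sampling. The weight rounding and the SRS redraw are simplifying assumptions for illustration, not the unified procedure of the paper.

```python
import numpy as np

def pseudo_pop_bootstrap(y, w, n_boot=500, seed=0):
    """Pseudo-population bootstrap variance of a Horvitz-Thompson total.

    Each sampled unit i is replicated round(w_i) times to form a pseudo
    population; bootstrap samples of the original size n are then drawn
    without replacement, mimicking a single-stage SRS design.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    pseudo = np.repeat(y, np.rint(w).astype(int))   # pseudo-population
    N = len(pseudo)
    totals = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.choice(N, size=n, replace=False)  # redraw with the design
        totals[b] = (N / n) * pseudo[idx].sum()     # expansion estimator
    return totals.var(ddof=1)
```

For an SRS sample with constant weights, the bootstrap variance should track the textbook without-replacement variance formula for the estimated total.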

3:00-4:00pm, November 2, 2018, 1441, 25 Park Place, Dr. Alireza Aghasi, Assistant Professor, Institute for Insight, Georgia State University
Making Deep Neural Networks Sparse: Theory and Application

Abstract: We give an overview of Net-Trim, a tool to sparsify deep neural networks. Our algorithm prunes a trained network layer-wise, removing connections at each layer by solving a convex problem. We present variants of the algorithm, namely the parallel and cascade frameworks, along with a mathematical analysis of the consistency between the initial network and the retrained model. In terms of sample complexity, we present a general result that holds for any layer within a network using rectified linear units as the activation. With mild assumptions about the network architecture, for a layer with $N$ inputs and a potential weight of cardinality $s$, we are able to present the optimal $O(s \log(N/s))$ sample complexity for our framework, which is identical to the sample complexity of LASSO.

3:00-4:00pm, October 19, 2018, 1441, 25 Park Place, Judith Parkinson, Department of Mathematics, University of Salzburg, Austria,
Combined multiple testing of multivariate data sets by censored empirical likelihood

Abstract: The choice of a fitting test or the optimal weight function depends heavily on the true shape of the hazard ratio under the alternative. For univariate survival times, several methods have been proposed to address this problem; however, not many options exist for multivariate data. In this talk I will present a method to combine multiple constraints, formulated as linear functionals of the cumulative hazard function, to obtain better power for multivariate, censored survival time data. By considering the conditional hazards, this method can take the correlation structure into account when testing.

2:00-3:00pm, September 21, 2018, 1441, 25 Park Place, Associate Professor Wei Zheng, Department of Business Analytics and Statistics, University of Tennessee, Knoxville,
Design-based incomplete U-statistics

Abstract: Many statistical and machine learning problems are formulated as U-statistics. The statistical properties of variants of U-statistics have been studied extensively. Meanwhile, their computational cost is prohibitively expensive, especially with the increasing size of data in modern applications. Even moderately sized data can be large enough to present a huge challenge for computing. In this talk, I will share some insight on how design methods can help alleviate the computational issue while maintaining the minimum variance property of U-statistics.
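
The computational trade-off can be seen with the order-2 kernel whose U-statistic is the unbiased sample variance: the full statistic averages over all n-choose-2 pairs, while an incomplete version averages only m of them. Uniform random pair selection below is a simple stand-in for the balanced combinatorial designs discussed in the talk, which pick the m pairs more carefully to reduce the added variance.

```python
import itertools
import numpy as np

def u_stat_full(x, h):
    """Full order-2 U-statistic: kernel h averaged over all n-choose-2 pairs."""
    return float(np.mean([h(x[i], x[j])
                          for i, j in itertools.combinations(range(len(x)), 2)]))

def u_stat_incomplete(x, h, m, seed=0):
    """Incomplete U-statistic: kernel h averaged over m randomly chosen pairs.

    A design-based version would select the m pairs via a balanced design
    rather than uniform subsampling; randomness here is illustrative only.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    vals = [h(*x[rng.choice(n, size=2, replace=False)]) for _ in range(m)]
    return float(np.mean(vals))

# Kernel whose order-2 U-statistic is exactly the unbiased sample variance.
def h_var(a, b):
    return 0.5 * (a - b) ** 2
```

With n = 200, the full statistic touches 19,900 pairs while the incomplete one touches only m = 2,000, at the price of a small extra variance.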

3:00-4:00pm, September 7, 2018, 1441, 25 Park Place, Assistant Professor Pavel Skums, Department of Computer Science, Georgia State University,
Bayesian inference of epidemiological and evolutionary histories of highly mutable viruses

Abstract: Highly mutable viruses, such as HIV and HCV, are major public health threats. The aim of computational molecular epidemiology is to extract valuable information from viral genomic data provided by next-generation sequencing technologies. Two fundamental problems in this area of research are (a) inference of dynamics of epidemics and outbreaks ("who infected whom") and (b) inference of fitness landscapes of intra-host viral populations. We will present our recent research results, which address these problems by a synthetic approach combining methods of computational statistics, graph theory and discrete optimization.