Statistics Seminar at Georgia State University

Fall 2014-Spring 2015, Fridays 3:00-4:00pm, Paul Erdos Conference Room (796 COE)

Organizer: Yichuan Zhao

If you would like to give a talk in Statistics Seminar, please send an email to Yichuan Zhao at yichuan@gsu.edu



3:00-4:00pm, April 24, 796 COE, Professor Katherine Masyn, Division of Epidemiology and Biostatistics, Georgia State University
Rewriting Event History Analysis with Latent Variables

Abstract: Modeling not only the "if" but the "when" of events, such as age at first major depressive episode or time-to-relapse in alcohol treatment programs, has become increasingly common in psychological science. This talk focuses on the exciting modeling extensions enabled by integrating survival and event history analysis into a broader latent variable modeling framework, allowing researchers to directly relate individual variability in time-to-event processes to other concurrent and sequential longitudinal outcomes.

2:00-3:00pm, April 10, 150 COE, Professor Hanxiang Peng, Department of Mathematical Sciences, Indiana University-Purdue University at Indianapolis (IUPUI)
An Easy Empirical Likelihood Approach of Efficient Estimation in M-estimation With Side Information

Abstract: Suppose side information is available. In this talk, we discuss how to use it to improve efficiency through an easy empirical likelihood approach. We construct maximum empirical likelihood estimates (MELEs) and show, by comparing computational complexities, that they are less expensive to compute than the usual MELEs. We study four examples of side information. We report two simulation studies on computing time and efficiency gains, and illustrate how these easy MELEs can be obtained using existing software.
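To fix ideas, the sketch below shows the core empirical likelihood calculation with a single piece of side information, namely a known mean, so that the constraint reduces to solving for one Lagrange multiplier. It is an illustrative toy only, not Professor Peng's estimator or software; the simulated data and the moment function g are hypothetical.

# Minimal empirical-likelihood sketch with one side-information constraint.
# Illustrative only; not the estimator discussed in the talk.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)

# Suppose the side information is a known mean, E[X] = 2, so g(x) = x - 2.
g = x - 2.0

# Profile out the Lagrange multiplier: weights p_i = 1 / (n * (1 + lam * g_i)),
# where lam solves sum_i g_i / (1 + lam * g_i) = 0.
def score(lam):
    return np.sum(g / (1.0 + lam * g))

# Bracket lam so that all weights stay positive: 1 + lam * g_i > 0 for every i.
eps = 1e-8
lo = -1.0 / g.max() + eps
hi = -1.0 / g.min() - eps
lam = brentq(score, lo, hi)

p = 1.0 / (len(x) * (1.0 + lam * g))
print("weights sum to", p.sum())          # ~1
print("constrained mean", np.sum(p * x))  # ~2, the side information is enforced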

12:45-1:45pm, March 3, 796 COE, Professor Jing Zhang, Department of Mathematics and Statistics, Georgia State University
Bayesian inference and application

Abstract: In this talk I would like to promote Bayesian inference and some of its recent high-profile applications. I will start from several remarks quoted in New Scientist and Nature about Bayesian inference in modern high-tech science (machine learning, artificial intelligence), and then present some well-known Bayesian applications, including Bayesian predictions of the 2008 and 2012 US elections and IBM's use of big data to invent creative recipes. I will then work through a simple example of Bayesian inference suitable for math undergraduates, and introduce Bayesian networks and their use in the next generation of pocket medical diagnostic tools (which are expected to replace Dr. Google). If time permits, I will introduce my own research on using Bayesian inference in HIV drug resistance and genetic disease association.
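As a taste of the "simple example for math undergraduates" mentioned above, the following toy calculation applies Bayes' rule to a diagnostic-test setting; the prevalence, sensitivity, and specificity values are hypothetical.

# Toy Bayes' rule calculation; all numbers are hypothetical.
prevalence = 0.01        # P(disease)
sensitivity = 0.95       # P(positive test | disease)
specificity = 0.90       # P(negative test | no disease)

p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disease_given_pos = sensitivity * prevalence / p_pos

print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # about 0.088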

2:00-3:00pm, February 27, 796 COE, Dr. Chi Song, Department of Statistics, Yale University
Detection of Chromosome Copy Number Variations in Multiple Samples

Abstract: A chromosome copy number variation (CNV) is the deviation of a genomic region from its normal copy number state, which may be associated with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNVs and diseases. CNVs can be called by detecting change-points in the means of sequences of noisy measurements. Although multiple samples are of interest, the majority of available CNV calling methods are single-sample based. Only a few multiple-sample methods have been proposed; they all use scan statistics similar to the circular binary segmentation (CBS) algorithm, which is computationally expensive, and are designed to detect either common or rare change-points. In this talk, I will introduce a novel multiple-sample method that adaptively combines the scan statistics of the screening and ranking algorithm (SaRa), which is computationally efficient and able to detect both common and rare change-points. I will also show that asymptotically this method finds the true change-points with certainty, and will demonstrate in theory that multiple-sample methods are superior to single-sample methods when shared change-points are of interest. Additionally, I will give extensive simulation results and a real data example to illustrate the performance of the proposed method.
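For readers new to SaRa, the screening step rests on a local diagnostic: at each position, compare the average of a window of points on the left with the average of a window on the right, and flag positions where the difference is large. The sketch below illustrates only that single-sample screening idea, not the adaptive multi-sample combination proposed in the talk; the simulated data and window size are hypothetical.

# Minimal single-sample sketch of a SaRa-style local diagnostic:
# for each position t, compare the mean of the h points on the left with the
# mean of the h points on the right; large |D(t)| suggests a change-point.
import numpy as np

def local_diagnostic(y, h):
    n = len(y)
    cs = np.concatenate(([0.0], np.cumsum(y)))
    d = np.full(n, np.nan)
    for t in range(h, n - h):
        left = (cs[t] - cs[t - h]) / h
        right = (cs[t + h] - cs[t]) / h
        d[t] = right - left
    return d

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 300)])
d = local_diagnostic(y, h=30)
print("estimated change-point near", np.nanargmax(np.abs(d)))  # ~300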

2:00-3:00pm, February 25, 796 COE, Dr. Ningtao Wang, Pennsylvania State University, University Park
A block mixture model to map eQTLs for gene clustering

Abstract: An expression quantitative trait locus (eQTL) is a genomic region associated with variation in gene expression among individuals. A single polymorphism associated with expression variation across a cluster of genes indicates an eQTL hotspot with potentially pleiotropic effects. Here we propose a block mixture model approach to detect eQTL hotspots for both DNA microarray and RNA sequencing data. We integrate unsupervised gene expression pattern discovery, interval mapping, and eQTL hotspot detection into a single framework. A maximum composite likelihood approach, implemented with a two-layer EM algorithm, is developed to estimate eQTL hotspot positions. More importantly, the block mixture model allows for detecting different types of eQTL hotspots, including hotspots of global expression level, hotspots of sub-pattern expression level, and hotspots of sub-pattern interactions. Simulations and a real data analysis demonstrate the power of our method.
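As background, the sketch below shows a plain one-layer EM for a two-component Gaussian mixture, only to illustrate the E-step/M-step machinery that the two-layer EM mentioned above builds on; it is not the block mixture model itself, and the simulated data and starting values are hypothetical.

# One-layer EM for a two-component univariate Gaussian mixture (background
# illustration only; not the block mixture model from the talk).
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

pi, mu1, mu2, sd = 0.5, -1.0, 1.0, 1.0
for _ in range(100):
    # E-step: posterior probability that each point belongs to component 1
    # (the common normalizing constant of the two densities cancels).
    d1 = pi * np.exp(-0.5 * ((x - mu1) / sd) ** 2)
    d2 = (1 - pi) * np.exp(-0.5 * ((x - mu2) / sd) ** 2)
    r = d1 / (d1 + d2)
    # M-step: update mixing proportion, means, and common standard deviation.
    pi = r.mean()
    mu1 = np.sum(r * x) / np.sum(r)
    mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
    var = np.sum(r * (x - mu1) ** 2 + (1 - r) * (x - mu2) ** 2) / len(x)
    sd = np.sqrt(var)

print(pi, mu1, mu2, sd)  # roughly 0.5, -2, 2, 1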

2:00-3:00pm, February 19, 796 COE, Dr. Weining Shen, The University of Texas MD Anderson Cancer Center
HCC risk assessment for patients with Hepatitis C: an outcome model-free scoring system approach

Abstract:

11:00-12:00pm, February 16, 796 COE, Yi Yang, University of Minnesota, Twin Cities
A Fast Unified Algorithm for Solving Group Lasso Penalized Learning Problems

Abstract:

3:00-4:00pm, January 23, 796 COE, Director Yimin Yang, Protiviti, New York
Statistical analysis in Anti-Money Laundering monitoring process

Abstract: The presentation will introduce banks' Anti-Money Laundering systems and the relevant regulatory requirements. We will discuss common issues and statistical methods for setting monitoring thresholds on financial transactions.
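As a toy illustration of threshold setting (not Protiviti's methodology), one simple baseline is to flag transactions above a high percentile of the historical amount distribution; the simulated amounts and the 99.5th-percentile cutoff below are hypothetical.

# Toy percentile-based threshold tuning for transaction monitoring;
# hypothetical simulated data, not a production AML methodology.
import numpy as np

rng = np.random.default_rng(3)
amounts = rng.lognormal(mean=6.0, sigma=1.2, size=100_000)  # simulated wire amounts

threshold = np.percentile(amounts, 99.5)   # flag the top 0.5% of amounts
alert_rate = np.mean(amounts > threshold)
print(f"threshold = {threshold:,.0f}, alert rate = {alert_rate:.3%}")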

2:00-3:00pm, November 7, 796 COE, Professor Jiming Jiang, Department of Statistics, University of California, Davis
Consistency of MLE in GLMM: Answer to an Open Problem and Beyond

Abstract: We give an answer to an open problem regarding consistency of the maximum likelihood estimators (MLEs) in generalized linear mixed models (GLMMs) involving crossed random effects. The solution to the open problem introduces an interesting, nonstandard approach to proving consistency of the MLEs in cases of dependent observations. Using the new technique, we extend the results to MLEs under a general GLMM. An example is used to further illustrate the technique.

3:00-4:00pm, Oct. 24, 796 COE, Professor Matt Hayat, Division of Epidemiology and Biostatistics, Georgia State University
Covariance Modeling with a Modified Cholesky Decomposition using Bayesian Inference

Abstract: Heterogeneity of variance and serial correlation are often present in measurements taken over time. The most popular parametric dependence models for serial correlation are stationary autoregressive models and other second-order stationary models. In these models, variances are constant over time and correlations between measurements equidistant in time are equal. These assumptions may not be reasonable. This work considers the class of nonstationary models of Pourahmadi (1999) that allow for heterogeneity of variance and serial correlation. Full Bayesian inference is implemented. The methodology is applied to a study of late-deafened adults receiving cochlear implants.
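For background, the modified Cholesky decomposition referred to above factors a covariance matrix Sigma as T Sigma T' = D, where T is unit lower triangular with below-diagonal entries equal to negated autoregressive coefficients and D is diagonal with innovation variances. The sketch below computes this factorization for a known Sigma; it is illustrative only, since the talk instead places priors on these parameters.

# Sketch of the modified Cholesky decomposition T * Sigma * T' = D
# (Pourahmadi, 1999). Illustrative only: the talk models these parameters
# within a Bayesian framework rather than computing them from a known Sigma.
import numpy as np

def modified_cholesky(sigma):
    p = sigma.shape[0]
    T = np.eye(p)
    D = np.zeros(p)
    D[0] = sigma[0, 0]
    for j in range(1, p):
        phi = np.linalg.solve(sigma[:j, :j], sigma[:j, j])  # autoregressive coefficients
        T[j, :j] = -phi
        D[j] = sigma[j, j] - sigma[:j, j] @ phi             # innovation variance
    return T, np.diag(D)

# Example: AR(1)-type covariance with rho = 0.7 (hypothetical).
p, rho = 5, 0.7
sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
T, D = modified_cholesky(sigma)
print(np.allclose(T @ sigma @ T.T, D))  # True: T Sigma T' is diagonal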

3:00-4:00pm, Oct. 17, 796 COE, Professor Christian Houdre, School of Mathematics, Georgia Tech
On the Limiting Distribution of the Length of the Longest Common Subsequence

Abstract: Let (X_k)_{k \geq 1} and (Y_k)_{k \geq 1} be two independent sequences of independent identically distributed random variables having the same law and taking their values in a finite alphabet \mathcal{A}_m. Let LC_n be the length of the longest common subsequence of the random words X_1 \cdots X_n and Y_1 \cdots Y_n. Under assumptions on the distribution of X_1, LC_n is shown to satisfy a central limit theorem. This is in contrast to the Bernoulli matching problem or to the random permutations case, where the limiting law is the Tracy-Widom one. (Joint work with Umit Islak)
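For readers unfamiliar with LC_n, it is computed by the standard longest-common-subsequence dynamic program; the sketch below evaluates it for two simulated random words over a 4-letter alphabet (the talk concerns the limiting distribution of this quantity, not its computation).

# Standard dynamic program for LC_n, the length of the longest common
# subsequence of two words; the random words below are simulated.
import numpy as np

def lcs_length(x, y):
    n, m = len(x), len(y)
    L = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i, j] = L[i - 1, j - 1] + 1
            else:
                L[i, j] = max(L[i - 1, j], L[i, j - 1])
    return L[n, m]

rng = np.random.default_rng(4)
alphabet_size, n = 4, 500
x = rng.integers(alphabet_size, size=n)
y = rng.integers(alphabet_size, size=n)
print("LC_n =", lcs_length(x, y))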

3:00-4:00pm, Oct. 3, 796 COE, Professor Howard Chang, Department of Biostatistics and Bioinformatics, Emory University
Assessment of critical exposure and outcome windows in time-to-event analysis with application to air pollution and preterm birth study

Abstract: In reproductive epidemiology, there is growing interest in examining associations between air pollution exposure during pregnancy and the risk of preterm birth (PTB), the leading cause of neonatal mortality. One important research objective is to identify critical periods of exposure and estimate the associated effects at different stages of pregnancy. However, population studies have reported inconsistent findings. This may be due to limitations of the standard analytic approach of treating PTB as a binary outcome without considering time-varying exposures over the course of pregnancy. To address this research gap, we present a Bayesian hierarchical model for conducting a comprehensive examination of gestational air pollution exposure by estimating the joint effects of weekly exposures during different vulnerable periods. Our model also treats PTB as a time-to-event outcome to address the challenge of different exposure lengths among ongoing pregnancies. The proposed model is applied to a dataset of geocoded birth records in the Atlanta metropolitan area between 1999 and 2005 to examine the risk of PTB associated with gestational exposure to ambient fine particulate matter less than 2.5 micrometers in aerodynamic diameter (PM2.5). We find positive associations between PM2.5 exposure during early and mid-pregnancy and the risk of PTB, and evidence that the associations are stronger for preterm births occurring around week 30.
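A much-simplified, non-Bayesian analogue of the analysis described above can be sketched by expanding each pregnancy into person-week records and fitting a discrete-time hazard (logistic) model with a lagged exposure summary. The sketch below is illustrative only: the simulated exposures, the chosen exposure window, and all coefficients are hypothetical, and it is not the Bayesian hierarchical model from the talk.

# Simplified discrete-time hazard sketch with a lagged weekly exposure term.
# Everything here is simulated and hypothetical; it is not the model in the talk.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n_women, last_week = 2000, 42
pm25 = rng.normal(10, 3, size=(n_women, last_week))   # simulated weekly exposures

records = []  # one row per woman-week at risk: (delivered, week, early_exposure)
for i in range(n_women):
    early = pm25[i, 5:15].mean()   # exposure averaged over a hypothetical early window
    for week in range(20, last_week):
        # hypothetical discrete-time hazard of delivery in this week
        logit = -4.0 + 0.25 * (week - 38) + 0.03 * early
        delivered = rng.random() < 1.0 / (1.0 + np.exp(-logit))
        records.append((delivered, week, early))
        if delivered:
            break

y = np.array([r[0] for r in records], dtype=float)
X = sm.add_constant(np.array([r[1:] for r in records], dtype=float))
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.params)   # intercept, week trend, early-exposure effect (roughly 0.03)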