2:00-3:00pm, February 27, 796 COE,
Dr. Chi Song,
Department of Statistics, Yale University
Detection of Chromosome Copy Number Variations in Multiple Samples
Abstract: Chromosome copy number variation (CNV) is the deviation of genomic regions
from their normal copy number states, which may be associated with many human diseases. Current genetic studies
usually collect hundreds to thousands of samples
to study the association between CNV and diseases. CNVs can be called by detecting the change-points in
the means from sequences of measurements with noise. Although multiple samples are of interest, the majority
of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed.
They all use scan statistics similar to those of the circular binary segmentation (CBS) algorithm, which is computationally expensive,
and are designed to detect either common or rare change-points. In this talk,
I will introduce a novel multiple sample method by adaptively combining the scan statistic
of the screening and ranking algorithm (SaRa), which is computationally efficient and able to detect
both common and rare change-points. I will also show that asymptotically this method can find the true change-points with certainty, and will demonstrate in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, I will give extensive simulation results and a real data example to illustrate the performance of our proposed method.
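As a rough illustration of calling change-points in the mean of a noisy sequence, the following minimal single-sequence sketch scans a local mean-difference statistic and keeps its local maxima. It is not the multiple sample method of the talk: the function names, the window half-width h, and the threshold are assumptions chosen only for illustration.

```python
import random


def local_mean_difference(y, h):
    """Absolute difference between the mean of the h points at and after
    position x and the mean of the h points before it, for each x."""
    n = len(y)
    d = [0.0] * n
    for x in range(h, n - h + 1):
        left = sum(y[x - h:x]) / h
        right = sum(y[x:x + h]) / h
        d[x] = abs(right - left)
    return d


def candidate_change_points(y, h, threshold):
    """Positions where the statistic exceeds the threshold (an assumed
    tuning value) and is the largest value within h positions either side."""
    d = local_mean_difference(y, h)
    candidates = []
    for x in range(h, len(y) - h + 1):
        window = d[max(0, x - h):x + h + 1]
        if d[x] >= threshold and d[x] == max(window):
            candidates.append(x)
    return candidates


# Simulated sequence: the mean jumps from 0 to 2 at position 100.
random.seed(0)
y = [random.gauss(0.0, 0.3) for _ in range(100)]
y += [random.gauss(2.0, 0.3) for _ in range(100)]
print(candidate_change_points(y, h=10, threshold=1.0))
```

Each position costs one pass over a window of fixed width, so the scan is linear in the sequence length for fixed h.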
2:00-3:00pm, February 25, 796 COE,
Dr. Ningtao Wang,
Pennsylvania State University, University Park
A block mixture model to map eQTLs for gene clustering
Abstract: An expression quantitative trait locus (eQTL) is
a genetic region associated with variation in gene expression among individuals. A single polymorphism associated with expression variation for clusters of genes indicates an eQTL hotspot with potentially pleiotropic effects. Here we propose a block mixture model based approach to detect eQTL hotspots for both DNA microarray and RNA sequencing data. We integrate unsupervised gene expression pattern discovery, interval mapping, and eQTL hotspot detection into a single framework. A maximum composite likelihood approach, implemented with a two-layer EM algorithm, is developed to provide estimates of eQTL hotspot positions. More importantly, the block mixture model allows for detecting several types of eQTL hotspots, including the eQTL hotspot of global expression level, the eQTL hotspot of sub-pattern expression level, and the eQTL hotspot of sub-pattern interactions. Simulations and real data analysis demonstrate the power of our method.
2:00-3:00pm, February 19, 796 COE,
Dr. Weining Shen,
The University of Texas MD Anderson Cancer Center
HCC risk assessment for patients with Hepatitis C: an outcome model-free scoring system approach
Abstract:
11:00am-12:00pm, February 16, 796 COE,
Yi Yang,
University of Minnesota, Twin Cities
A Fast Unified Algorithm for Solving Group Lasso Penalized Learning Problems
Abstract:
3:00-4:00pm, January 23, 796 COE,
Director Yimin Yang,
Protiviti, New York
Statistical analysis in Anti-Money Laundering monitoring process
Abstract: The presentation will introduce banks' Anti-Money
Laundering systems and regulatory requirements.
We will discuss common issues and statistical methods regarding monitoring thresholds of financial transactions.
2:00-3:00pm, November 7, 796 COE,
Professor Jiming Jiang,
Department of Statistics, University of California, Davis
Consistency of MLE in GLMM: Answer to an Open Problem and Beyond
Abstract: We give an answer to an open problem regarding consistency of the
maximum likelihood estimators (MLEs) in generalized linear mixed models
(GLMMs) involving crossed random effects. The solution to the open problem
introduces an interesting, nonstandard approach to proving consistency of
the MLEs in cases of dependent observations. Using the new technique, we
extend the results to MLEs under a general GLMM. An example is used to
further illustrate the technique.
3:00-4:00pm, Oct. 24, 796 COE,
Professor Matt Hayat,
Division of Epidemiology and Biostatistics, Georgia State University
Covariance Modeling with a Modified Cholesky Decomposition using Bayesian Inference
Abstract:
Heterogeneity of variance and serial correlation are often present in measurements taken
over time. The most popular parametric dependence models for serial correlation are stationary
autoregressive models and other second-order stationary models. In these models, variances are
constant over time and correlations between measurements equidistant in time are equal.
These assumptions may not be reasonable. This work considers the class of nonstationary models of
Pourahmadi (1999) that allow for heterogeneity of variance and serial correlation. Full Bayesian inference
is implemented. The methodology is applied to a study of
late-deafened adults receiving cochlear implants.
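For readers unfamiliar with the modified Cholesky decomposition named in the title, the following minimal sketch factors a covariance matrix Sigma so that T Sigma T' = D, with T unit lower triangular and D diagonal. The negatives of the below-diagonal entries of T can be read as autoregressive coefficients and the diagonal of D as innovation variances. This shows only the decomposition step, not the Bayesian inference of the talk; the function name and the AR(1) example are assumptions for illustration.

```python
def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and d the diagonal of D,
    such that T * sigma * T' = diag(d). sigma is a list of lists and is
    assumed to be symmetric positive definite."""
    n = len(sigma)
    # Step 1: LDL' factorisation, sigma = L * diag(d) * L'.
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = sigma[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (sigma[i][j] - s) / d[j]
    # Step 2: T is the inverse of L, computed by forward substitution.
    T = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i):
            T[i][j] = -sum(L[i][k] * T[k][j] for k in range(j, i))
    return T, d


# Stationary AR(1) covariance with coefficient rho and unit innovation
# variance: sigma[i][j] = rho**|i-j| / (1 - rho**2).
rho = 0.5
sigma = [[rho ** abs(i - j) / (1 - rho ** 2) for j in range(4)]
         for i in range(4)]
T, d = modified_cholesky(sigma)
for row in T:
    print([round(v, 3) for v in row])
print([round(v, 3) for v in d])
```

In this stationary example every row of T carries -rho immediately left of the diagonal and the innovation variances are constant after the first; nonstationary models let these entries vary with time.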
3:00-4:00pm, Oct. 17, 796 COE,
Professor Christian Houdre,
School of Mathematics, Georgia Tech
On the Limiting Distribution of the Length of the Longest Common Subsequence
Abstract: Let (X_k)_{k \geq 1} and (Y_k)_{k\geq1} be two independent sequences
of independent identically distributed random variables having the
same law and taking their values in a finite alphabet \mathcal{A}_m.
Let LC_n be the length of the longest common subsequence of the random
words X_1\cdots X_n and Y_1\cdots Y_n. Under assumptions on the
distribution of X_1, LC_n is shown to satisfy a central limit theorem.
This is in contrast to the Bernoulli matching problem or to the random
permutations case, where the limiting law is the Tracy-Widom one.
(Joint with Umit Islak)
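The quantity LC_n defined above can be computed by standard dynamic programming. The sketch below is only an illustration: the function names are hypothetical, and the uniform distribution on a four-letter alphabet is an assumption that need not satisfy the conditions of the theorem.

```python
import random


def lcs_length(x, y):
    """Length of the longest common subsequence of sequences x and y,
    using the classical O(len(x) * len(y)) dynamic program."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simulate_lc_n(n, alphabet="ABCD", rng=random):
    """Draw two independent i.i.d. words of length n (uniform letters,
    an assumption for illustration) and return LC_n."""
    x = [rng.choice(alphabet) for _ in range(n)]
    y = [rng.choice(alphabet) for _ in range(n)]
    return lcs_length(x, y)


random.seed(1)
samples = [simulate_lc_n(200) for _ in range(50)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(mean, var)
```

Repeating the simulation for growing n and plotting a histogram of the centered and scaled values is one informal way to look at the shape of the limiting law.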
3:00-4:00pm, Oct. 3, 796 COE,
Professor Howard Chang,
Department of Biostatistics and Bioinformatics, Emory University
Assessment of critical exposure and outcome windows in time-to-event analysis with application to air pollution and preterm birth study
Abstract:
In reproductive epidemiology, there is a growing interest in examining associations between air
pollution exposures during pregnancy and the risk of preterm birth (PTB), the leading cause of neonatal
mortality. One important research objective is to identify critical periods of exposure and estimate
the associated effects at different stages of pregnancy. However, population studies have reported inconsistent
findings. This may be due to limitations from the standard analytic approach of treating PTB as a binary
outcome without considering time-varying exposures together over the course of pregnancy.
To address this research gap, we present a Bayesian hierarchical model for conducting a comprehensive
examination of gestational air pollution exposure by estimating the joint effects of weekly exposures
during different vulnerable periods. Our model also treats PTB as a time-to-event outcome to address
the challenge of different exposure lengths among ongoing pregnancies. The proposed model is applied to a
dataset of geocoded birth records in the Atlanta metropolitan area between 1999 and 2005 to examine the risk
of PTB associated with gestational exposure to ambient fine particulate matter less than 2.5 µm in
aerodynamic diameter (PM2.5). We find positive associations between PTB and PM2.5 exposure during early and mid-pregnancy,
and evidence that associations are stronger for preterm births occurring around week 30.