## Statistics Seminar at Georgia State University

### If you would like to give a talk in Statistics Seminar, please send an email to Yichuan Zhao at yichuan@gsu.edu

2:00-3:00pm, April 3, 2020, in 1441, 25 Park Place, Associate Professor Pengsheng Ji, Department of Statistics, University of Georgia,
Collaborative Spectral Clustering in Attributed Networks

Abstract: We propose a novel spectral clustering algorithm for attributed networks, where $n$ nodes split into $R$ non-overlapping communities and each node carries a $p$-dimensional meta covariate drawn from various formats such as text, image, or speech. The connectivity matrix $W_{n \times n}$ is constructed from the adjacency matrix $A_{n \times n}$ and the covariate matrix $X_{n \times p}$ as $W = (1-\alpha)A + \alpha K(X,X')$, where $\alpha \in [0,1]$ is a tuning parameter and $K$ is a kernel measuring covariate similarities. We then perform the classical $k$-means algorithm on the element-wise ratio matrix of the first $R$ leading eigenvectors of $W$. Theoretical and simulation studies show consistent performance under both the Stochastic Block Model (SBM) and the Degree-Corrected Block Model (DCBM), especially in imbalanced networks where most community detection algorithms fail.
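As a rough illustration of the pipeline the abstract describes (combine adjacency and covariate kernel, take leading eigenvectors, form element-wise ratios, run $k$-means), here is a minimal sketch. The function name, the RBF kernel choice, and the numerical floor on the division are our own assumptions, not details from the talk:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def collaborative_spectral_clustering(A, X, R, alpha=0.5):
    """Sketch: cluster n nodes into R communities using a convex
    combination of the adjacency matrix A and a covariate kernel K(X, X')."""
    K = rbf_kernel(X)                       # covariate similarity kernel
    W = (1 - alpha) * A + alpha * K         # combined connectivity matrix
    # leading R eigenvectors of the symmetric matrix W
    _, vecs = np.linalg.eigh(W)
    U = vecs[:, -R:]                        # columns: top-R eigenvectors
    # element-wise ratios against the leading eigenvector
    # (small floor guards against division by ~0)
    lead = U[:, -1]
    safe = np.where(np.abs(lead) < 1e-12, 1e-12, lead)
    ratios = U[:, :-1] / safe[:, None]
    return KMeans(n_clusters=R, n_init=10).fit_predict(ratios)
```

The ratio step is what makes the method robust under the DCBM: dividing by the leading eigenvector cancels node-specific degree parameters, leaving vectors that are roughly constant within each community.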

3:00-4:00pm, March 27, 2020, in 1441, 25 Park Place, Assistant Professor Bin Zou, Department of Mathematics, University of Connecticut,
Optimal Bookmaking

Abstract: We introduce a general framework for continuous-time betting markets, in which a bookmaker can dynamically control the prices of bets on outcomes of random events. In turn, the prices set by the bookmaker affect the rate or intensity of bets placed by gamblers. The bookmaker seeks a price process that maximizes his expected (utility of) terminal wealth. We obtain explicit solutions or characterizations to the bookmaker’s optimal bookmaking problem in various interesting models. Joint work with Matthew Lorig and Zhou Zhou.

3:00-4:00pm, March 20, 2020, in 1441, 25 Park Place, Assistant Professor Yanqing Wang, Institute for Insight, Georgia State University,
Mediation analysis for zero-inflated mediators with applications to microbiome data

Abstract: We propose a novel mediation analysis approach under the potential-outcomes framework to model mediators with zero-inflated distributions. The approach allows a mixture of true zero-value data points and false zeros arising from the data collection procedure. For continuous outcomes, our method decomposes the mediation effect into two components inherent to zero-inflated mediators: one attributable to the jump from the zero to the non-zero state, and the other attributable to the numeric change on the continuous scale. The mediation effect is thus the total of these two components, and both the individual components and the total can be estimated and tested. Since no existing mediation approaches target zero-inflated mediators, we conducted a simulation study to assess our approach; it shows superior performance compared with the standard practice in causal mediation analysis of simply treating zero-inflated mediators as continuous variables. Two real data applications will be presented.

3:00-4:00pm, March 13, 2020, in 1441, 25 Park Place, Distinguished University Professor N. Balakrishnan, Department of Mathematics and Statistics, McMaster University, Canada,
Multivariate Stochastic Orderings with Applications in Actuarial Science

Abstract: In this talk, I will first introduce the notions of univariate stochastic orderings, a technique by which two random variables can be compared. Next, I will describe some multivariate notions of orderings. I will then consider the total claim amount from two portfolios in an actuarial setup and apply these univariate and multivariate orderings to present some comparison results. These results will then be generalized to the general families of proportional hazards and scale models. Finally, I will present a multivariate stochastic ordering result for the whole set of order statistics drawn from a distribution. Several examples will be presented throughout to illustrate the results obtained.

2:00-3:00pm, March 06, 2020, in 1441, 25 Park Place, Assistant Professor Ganggang Xu, Department of Management Science, University of Miami,
Semi-parametric Learning of Structured Temporal Point Processes

Abstract: We propose a general framework of using multi-level log-Gaussian Cox processes to model repeatedly observed point processes with complex structures; such data have become increasingly available in various areas including medical research, social sciences, economics and finance due to technological advancement. A novel nonparametric approach is developed to efficiently and consistently estimate the covariance kernels of the latent Gaussian processes at all levels. To predict the functional principal component scores, we propose a consistent estimation procedure by maximizing the conditional likelihoods of superpositions of point processes. We further extend our procedure to the bivariate point process case, in which potential correlations between the processes can be assessed. Asymptotic properties of the proposed estimators are investigated, and the effectiveness of our procedures is illustrated through a simulation study and an application to a stock trading dataset.
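For readers unfamiliar with log-Gaussian Cox processes, the basic single-level construction (intensity = exponential of a latent Gaussian process) can be simulated in a few lines. This toy sketch is our own and is not part of the multi-level methodology in the talk; the squared-exponential covariance and all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_lgcp_1d(T=10.0, n_grid=200, var=0.5, length=1.0, mu=1.0):
    """Toy sketch: simulate a log-Gaussian Cox process on [0, T] by
    sampling a Gaussian process on a grid, exponentiating it to get a
    random intensity, and thinning a homogeneous Poisson process."""
    t = np.linspace(0, T, n_grid)
    # squared-exponential covariance of the latent Gaussian process
    C = var * np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / length ** 2)
    g = rng.multivariate_normal(mu * np.ones(n_grid), C + 1e-8 * np.eye(n_grid))
    lam = np.exp(g)                       # random intensity function
    lam_max = lam.max()
    # homogeneous Poisson candidates, thinned with prob lam(t)/lam_max
    n_cand = rng.poisson(lam_max * T)
    cand = rng.uniform(0, T, n_cand)
    keep = rng.uniform(0, 1, n_cand) < np.interp(cand, t, lam) / lam_max
    return np.sort(cand[keep])
```

Repeated observations of such a process, with latent Gaussian components at several levels, are the kind of data the proposed framework models.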

3:00-4:00pm, February 28, 2020, in 1441, 25 Park Place, Dr. Tzu-Jung Huang, Data Scientist at Videa, COX Enterprise,
Novel Marginal Screening Methods for High-dimensional Survival Data

Abstract: This talk introduces two novel methods of screening high-dimensional predictors for survival outcomes. Motivated by high-throughput genomic data for diffuse large-B-cell lymphoma, the first approach introduces a marginal screening test to detect the presence of significant predictors for a right-censored survival outcome, named the Adaptive Resampling Test for Survival data (ARTS). This approach is designed under a high-dimensional accelerated failure time (AFT) model, adopts a test statistic based on the maximally selected estimator from a marginal AFT working model, and applies a regularized bootstrap method to calibrate the test. This testing procedure is more powerful and less conservative than both a Bonferroni correction of the marginal tests and other competing methods. The second screening approach is based on a more efficient estimator than the one used in ARTS. This proposal circumvents the computationally expensive bootstrap resampling required for ARTS, which enables it to address screening problems with ultrahigh-dimensional predictors. This is joint work with Alex Luedtke (Department of Statistics, University of Washington) and Ian W. McKeague (Department of Biostatistics, Columbia University).

3:00-4:00pm, January 31, 2020, in 1441, 25 Park Place, Assistant Professor Kai Zhao, Institute for Insight, Georgia State University,
Time Series Prediction: Approaching the Limits of Predictability

Abstract: Time series prediction has wide applications ranging from stock price prediction and product demand estimation to economic forecasting. In this paper, we treat the taxi and Uber demand in each location as a time series and reduce the taxi and Uber demand prediction problem to a time series prediction problem. We answer two key questions in this area. First, time series differ in temporal regularity: some are easy to predict and others are not. Given a predictive algorithm such as LSTM (deep learning) or ARIMA (time series), what is the maximum prediction accuracy it can reach if it captures all the temporal patterns of that time series? Second, given the maximum predictability, which algorithm can approach this upper bound in terms of prediction accuracy? To answer these two questions, we use temporal-correlated entropy to measure time series regularity and obtain the maximum predictability. Testing with 14 million data samples, we find that the deep learning algorithm is not always the best algorithm for prediction. When the time series has high predictability, a simple Markov prediction algorithm (training time 0.5s) can outperform a deep learning algorithm (training time 6 hours). The predictability can help determine which predictor to use in terms of accuracy and computational cost. We also find that the Uber demand is easier to predict than the taxi demand due to different cruising strategies, as the former is demand-driven with higher temporal regularity.
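The "simple Markov prediction algorithm" contrasted with deep learning above can be sketched in a few lines: discretize the demand series into states and, for each state, predict the most frequent next state seen in training. This minimal version is our own construction, not the paper's exact algorithm:

```python
import numpy as np
from collections import Counter, defaultdict

def fit_markov(series):
    """Sketch of a first-order Markov predictor: for each observed state,
    predict the most frequent successor state seen in training."""
    trans = defaultdict(Counter)
    for a, b in zip(series[:-1], series[1:]):
        trans[a][b] += 1
    fallback = Counter(series).most_common(1)[0][0]  # for unseen states

    def predict(state):
        return trans[state].most_common(1)[0][0] if trans[state] else fallback
    return predict

# usage: a highly regular (high-predictability) discretized demand series
series = [0, 1, 2] * 70
predict = fit_markov(series[:150])
acc = np.mean([predict(a) == b
               for a, b in zip(series[150:-1], series[151:])])
```

On a perfectly regular series like this, the Markov predictor reaches the predictability bound (accuracy 1.0) with essentially zero training cost, which is the abstract's point about matching predictors to series regularity.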

2:00-3:00pm, November 08, 2019, in 1441, 25 Park Place, Associate Professor Zhigang Li, Biostatistics Department, University of Florida,
Mediation analysis for zero-inflated mediators with applications to microbiome data

Abstract: We propose a novel mediation analysis approach under the potential-outcomes framework to model mediators with zero-inflated distributions. The approach allows a mixture of true zero-value data points and false zeros arising from the data collection procedure. For continuous outcomes, our method decomposes the mediation effect into two components inherent to zero-inflated mediators: one attributable to the jump from the zero to the non-zero state, and the other attributable to the numeric change on the continuous scale. The mediation effect is thus the total of these two components, and both the individual components and the total can be estimated and tested. Since no existing mediation approaches target zero-inflated mediators, we conducted a simulation study to assess our approach; it shows superior performance compared with the standard practice in causal mediation analysis of simply treating zero-inflated mediators as continuous variables. Two real data applications will be presented.

2:00-3:00pm, November 01, 2019, in 1441, 25 Park Place, Assistant Professor Qian Xiao, Department of Statistics, University of Georgia,
A Novel Bayesian Optimization Approach for Both Quantitative and Sequence Inputs

Abstract: Drug combinations have been widely applied in disease treatment, especially chemotherapy for cancer. Traditionally, researchers have focused only on optimizing drugs' dosages. Yet, some recent studies show that the order in which drug components are added is also important to the efficacy of drug combinations. In practice, experiments enumerating all possible sequences with different drug doses are usually not affordable. Thus, statistical tools are required that can identify optimal drug therapies consisting of both quantitative and sequence inputs within a few runs. Such problems are also encountered in both computer and physical experiments in the fields of engineering, chemistry, physics, management, food science, etc. Due to the complexity of the data, there is very limited existing literature on this problem. In this paper, we propose a novel Bayesian optimization approach, which includes an innovative Mapping-based additive Gaussian process (MaGP) model for both quantitative and sequence (QS) inputs, a new class of optimal experimental designs, and an improved evolutionary global optimization algorithm. The proposed method can identify optimal settings within a few runs, provide accurate predictions for response surfaces, and give clear interpretations of model structure. It can also be generalized to further include qualitative inputs, e.g., blocking. We illustrate the superiority of the proposed method via a real drug experiment on lymphoma, two single-machine scheduling tasks, and a traveling salesman problem.

3:00-4:00pm, October 25, 2019, in 1441, 25 Park Place, Alexander Kirpich, PhD, Assistant Professor, Department of Population Health Sciences School of Public Health at Georgia State University,
Assessing the impact of a community intervention targeting HIV transmission among PWID in multiple sites in India: insights from transmission models

Abstract: People who inject drugs (PWID) are at high risk of HIV acquisition and may play a central role in ongoing transmission in some populations. The integration of multiple interventions has been proposed as an effective strategy for controlling HIV in PWID. To explore the impact of multiple, integrated interventions to reduce HIV in PWID, we examined data from a cluster randomized clinical trial conducted in multiple locations in India. In this trial, integrated care centers (ICC) that aggregated counseling and treatment for both HIV and drug addiction were placed in intervention clusters. In control clusters, services remained separated among multiple locations. Using these data, the relative importance of transmission interventions (needle exchange programs, opioid replacement therapy, counseling) and treatment (antiretroviral therapy) was evaluated with the help of transmission models.

3:00-4:00pm, October 4, 2019, in 1441, 25 Park Place, Associate Professor Jonathan Ji, Computer Science, Georgia State University,
Neural Plasticity Networks: A Unified Framework for Network Sparsification and Expansion

Abstract: Deep Neural Networks (DNNs) have achieved great success in a broad range of predictive tasks. Along with this success is a paradigm shift from feature engineering to architecture design. Latest DNN architectures, such as ResNet, DenseNet and Wide-ResNet, incorporate hundreds of millions of parameters to achieve state-of-the-art predictive performance. However, the expanding number of parameters not only increases the risk of overfitting, but also leads to high computational costs. A practical solution to this problem is network sparsification, by which weights, neurons, or channels can be pruned significantly with minor accuracy losses. A less explored alternative is network expansion, by which weights, neurons or channels can be gradually added to the network to improve its predictive accuracy. In this talk, I will present our latest work on neural plasticity network (NPN) that unifies network sparsification and network expansion into an end-to-end training pipeline modulated by a simple parameter. We demonstrate that our framework can sparsify or expand a network as needed to solve a learning task. The performance of NPNs will be demonstrated via web demos and iPhone apps.

3:00-4:00pm, September 27, 2019, in 1441, 25 Park Place, Assistant Professor Wenjing Liao, School of Mathematics, Georgia Institute of Technology,
Exploring low-dimensional structures in data science

Abstract: Many data sets in image analysis and signal processing are in a high-dimensional space but exhibit a low-dimensional structure. For example, data can be modeled as point clouds in a high-dimensional space but concentrated on a low-dimensional manifold. I will present two ways of building efficient representations of data or functions on data. The first one gives a multiscale low-dimensional empirical approximation to the manifold. We prove that the mean squared error of the approximation of the manifold converges, as the number of training samples increases, at a rate depending on the intrinsic dimension of the manifold instead of the ambient dimension of the space. Moreover, our approximations can adapt to the regularity even when it varies at different scales or locations. The second part of my talk is about efficient approximations of deep ReLU networks for functions supported on low-dimensional manifolds. We constructed a ReLU network for such function approximation where the size of the network grows exponentially with respect to the intrinsic dimension of the manifold. This is joint work with Mauro Maggioni (Johns Hopkins University), Stefano Vigogna (University of Genova), and Minshuo Chen, Haoming Jiang, and Tuo Zhao (Georgia Institute of Technology).

9:30-10:30am, September 13, 2019, in Student Center West, 462, Professor Leslie McClure, Chair of the Epidemiology and Biostatistics Department, Drexel University, Dornsife School of Public Health,
Sample Size and Re-Estimation in Clinical Trials: What Happens in Real Life

Abstract: