

This project belongs to the field of mathematical statistics within the discipline of mathematics and applied mathematics. In recent years, with advances in science and technology and the rapid development of data collection techniques, the data encountered in scientific research have become increasingly complex. High-dimensional data, missing data, and complex designs pose great challenges to statistical analysis, and they have become an active and difficult topic in international statistical research. The main contribution of this project is to propose original statistical methods of theoretical and practical importance for these types of complex data, to establish their theoretical framework, and to lay a solid foundation for their widespread application. The main innovations are as follows.

First, a sparse linear discriminant analysis method based on thresholding was constructed for the first time, together with its optimality theory. For high-dimensional linear models with fixed design matrices, a thresholded ridge regression method was established and its convergence proved. The theory shows that this method attains a faster convergence rate than classical methods.

Second, theoretical error bounds for high-dimensional nonlinear models were established for the first time using sufficient dimension reduction. A forward regression algorithm for high-dimensional nonlinear models was also constructed for the first time, and the algorithm was proved to have variable screening consistency under the weakest conditions. A general theoretical framework for partial sufficient dimension reduction was established, and reliable, efficient, and simple sample estimators were provided for the index parameters of partially linear multi-index models.
Third, the problems of model sensitivity and model selection for parametric models in the instrumental variable method for nonignorable missing data were solved for the first time, providing a new tool for handling nonignorable missing data. On the one hand, the propensity function was extended from a parametric form to a semiparametric form, which greatly reduces sensitivity to model specification. On the other hand, a method was proposed for selecting instrumental variables and parametric models under nonignorable missing data, and the consistency of this model selection was proved.

Fourth, the conservativeness of the t-test under the generalized linear model framework in adaptive designs was proved for the first time, and a valid test based on the bootstrap was proposed. This finding overturns a long-held belief and provides important guidance for experimental design and hypothesis testing in many fields, such as medicine and sociology.

The eight representative papers of this project were all published in top international statistics journals, such as the Journal of the American Statistical Association, The Annals of Statistics, Biometrika, and Biometrics. These eight papers have been cited by other researchers 140 times, including 121 SCI citations. The citing journals include many top international journals in statistics and machine learning, such as JASA, AOS, Biometrika, and JMLR, and the papers are also widely cited in international journals in fields such as medicine, sociology, and engineering. The results have led more than 10 research groups at home and abroad to carry out follow-up research. The methods proposed by the project for modern complex data, such as high-dimensional and missing data, include sparse discriminant analysis, trace pursuit for sufficient dimension reduction, and the semiparametric propensity function. They are of great significance to the ideas, methods, theory, and applications of statistics.
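
As an illustration of the thresholded ridge regression idea named in the first innovation, the following minimal sketch computes a ridge estimate and then sets small coefficients to zero. It is a sketch under stated assumptions and not the project's published estimator: the function name `thresholded_ridge`, the penalty `lam`, the cutoff `tau`, and the use of hard thresholding are all hypothetical choices made here for illustration.

```python
import numpy as np


def thresholded_ridge(X, y, lam, tau):
    """Ridge estimate followed by hard thresholding (illustrative sketch only).

    X   : (n, p) fixed design matrix
    y   : (n,) response vector
    lam : ridge penalty, assumed > 0 so the linear system is solvable even if p > n
    tau : cutoff; coefficients with absolute value <= tau are set to zero
    """
    _, p = X.shape
    # Ridge solution: solve (X'X + lam * I) beta = X'y.
    beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    # Hard thresholding: keep only coefficients whose magnitude exceeds tau.
    return np.where(np.abs(beta_ridge) > tau, beta_ridge, 0.0)


# Toy example with an identity design, chosen so the result is easy to check by hand.
X = np.eye(3)
y = np.array([3.0, 0.1, -2.0])
beta_hat = thresholded_ridge(X, y, lam=1.0, tau=0.5)
print(beta_hat)
```

In this toy example the ridge step shrinks each coefficient by half, and the thresholding step then removes the small middle coefficient, which is how sparsity arises in this kind of estimator. In practice `lam` and `tau` would need to be chosen according to the sample size and dimension, which this sketch does not attempt.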

