Research on Statistical Analysis Methods of Several Types of Complex Data
Novel statistical analysis methods for complex data that address the difficulties of high-dimensional and missing data, with both theoretical and applied value.
Type
Statistical methods
Tags
Other resource gains
Sufficient dimension reduction
Mathematical statistics (other subjects)
Missing data
High-dimensional data
Model selection
Solution maturity
Mass promotion / Mass production
Cooperation methods
Joint venture cooperation
Applicable industry
Scientific research and technology services
Applications
Data science
Key innovations
This project proposes several statistical analysis methods, each the first of its kind internationally, for complex data such as high-dimensional data and missing data.
Potential economic benefits
This is a mathematical statistics research project rather than a green technology product, so its potential economic benefits as such a product cannot be assessed. Its methodological innovations mainly improve the accuracy and efficiency of complex data analysis in support of scientific and industrial decision-making.
Potential climate benefits
This statistical analysis method can improve data insight and prediction accuracy in fields such as energy management, climate modeling, and intelligent manufacturing by optimizing the processing of complex data. This can promote more efficient resource allocation, reduce energy waste, and optimize production processes, thereby indirectly supporting carbon emission reduction.
Solution supplier
East China Normal University
East China Normal University, a national key normal university, cultivates high-quality talents and serves social development with excellent education and scientific research.
China
Solution details

This project belongs to the field of mathematical statistics within the discipline of mathematics and applied mathematics. In recent years, with advances in science and technology and the rapid development of data collection, the data encountered in scientific work have become increasingly complex. High-dimensional data, missing data, and complex designs pose major challenges for statistical analysis, and these challenges have become an active and difficult topic in international statistical research. The main contribution of this project is to propose statistical analysis methods for these types of complex data that are the first of their kind internationally and of both theoretical and practical importance, to establish their theoretical framework, and to lay a solid foundation for their wide application. The main innovations are as follows.
First, a sparse linear discriminant analysis method based on thresholding was constructed for the first time, together with its optimality theory. For high-dimensional linear models with fixed design matrices, a threshold ridge regression method was established and its convergence proved; the theory shows that this method converges faster than the classical method.
Second, a theoretical error bound for high-dimensional nonlinear models was established for the first time using sufficient dimension reduction. A forward regression algorithm for high-dimensional nonlinear models was also constructed for the first time, and it was proved that the algorithm achieves variable screening consistency under the weakest conditions. A general theoretical framework for partial sufficient dimension reduction was established, and reliable, efficient, and simple sample estimates were provided for the index parameters of partially linear multi-index models.
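The threshold ridge regression idea mentioned above can be illustrated with a short sketch. The project's exact estimator and tuning rules are not given on this page, so the following assumes the simplest version: compute a ridge estimate, then set coefficients whose magnitude does not exceed a cutoff to zero. The function name `thresholded_ridge` and the parameters `lam` (ridge penalty) and `tau` (cutoff) are hypothetical.

```python
import numpy as np

def thresholded_ridge(X, y, lam, tau):
    """Ridge estimate followed by hard thresholding (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Dual form of ridge: beta = X^T (X X^T + lam I_n)^{-1} y.
    # It equals (X^T X + lam I_p)^{-1} X^T y but solves an n x n system,
    # which is cheaper when the dimension p exceeds the sample size n.
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    beta_ridge = X.T @ alpha
    # Hard threshold: keep only coefficients with magnitude above tau (assumed rule).
    return np.where(np.abs(beta_ridge) > tau, beta_ridge, 0.0)

# Hypothetical usage on simulated data with p > n and three nonzero coefficients.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [4.0, -3.0, 2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = thresholded_ridge(X, y, lam=1.0, tau=0.5)
print("selected indices:", np.flatnonzero(beta_hat))
```

In practice both `lam` and `tau` would need to be chosen from the data or from theory; the values here are arbitrary.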
Third, for the first time, the problems of sensitivity to the parametric model and of model selection in the instrumental variable method for nonignorable missing data are solved, providing a new tool for handling nonignorable missing data. On the one hand, the propensity function is extended from a parametric form to a semiparametric form, which greatly reduces sensitivity to model specification. On the other hand, a method is proposed for selecting instrumental variables and parametric models under nonignorable missing data, and the consistency of this model selection is proved.
Fourth, the conservatism of the t-test under the generalized linear model framework in adaptive designs is proved for the first time, and an effective test based on the bootstrap is proposed. This finding overturns a long-held belief and offers important guidance for experimental design and testing in fields such as medicine and sociology.
The eight representative papers of this project were all published in top international statistics journals, such as Journal of the American Statistical Association, The Annals of Statistics, Biometrika, and Biometrics. They have been cited 140 times by other authors, with 121 SCI citations. The citing journals include top international journals in statistics and machine learning such as JASA, AOS, Biometrika, and JMLR, as well as international journals in fields such as medicine, sociology, and engineering. The results have prompted follow-up research by more than 10 research groups in China and abroad. The methods proposed by the project for modern complex data, such as sparse discriminant analysis, trace pursuit sufficient dimension reduction, and the semiparametric propensity function, are of great significance to the ideas, methods, theory, and applications of statistics.
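The fourth innovation above concerns a bootstrap-based test under adaptive designs. The project's actual procedure is not described on this page, so the following is only a generic sketch under stated assumptions: a two-arm trial balanced within covariate strata, a difference in means as the effect estimate, resampling within each stratum-by-arm cell to obtain a bootstrap standard error, and a normal-approximation two-sided p-value. The function name `stratified_bootstrap_test` and all of its parameters are hypothetical.

```python
import math
import numpy as np

def stratified_bootstrap_test(y, treat, stratum, n_boot=500, seed=0):
    """Two-sided test of zero treatment effect with a within-cell bootstrap SE.

    Illustrative sketch only; not the project's published procedure.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat)
    stratum = np.asarray(stratum)
    diff = y[treat == 1].mean() - y[treat == 0].mean()
    # Index sets for each non-empty (stratum, arm) cell.
    cells = [np.flatnonzero((stratum == s) & (treat == a))
             for s in np.unique(stratum) for a in (0, 1)]
    cells = [c for c in cells if c.size > 0]
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Resample with replacement inside each cell so every bootstrap
        # sample keeps the arm-by-stratum balance of the original design.
        idx = np.concatenate([rng.choice(c, size=c.size, replace=True) for c in cells])
        yb, tb = y[idx], treat[idx]
        boot[b] = yb[tb == 1].mean() - yb[tb == 0].mean()
    se = boot.std(ddof=1)
    z = diff / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, se, p_value
```

Resampling within cells excludes between-stratum variation from the standard error, which is one simple way to reflect a design that balances arms on covariates; a pooled two-sample standard error would include that variation and tend to be too large.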

Last updated
07:11:37, Nov 05, 2025