Gothatamang Patrick Nthoiwa, Ramasaymy Sivasamy, 2024. "AI-Driven Predictive Maintenance in HVAC Systems: Strategies for Improving Efficiency and Reducing System Downtime" ESP International Journal of Advancements in Science & Technology (ESP-IJAST) Volume 2, Issue 3: 20-28.
Principal Component Analysis (PCA) is a powerful tool for understanding the underlying structure and relationships within multivariate datasets, which in ecology are often collected through extensive field surveys and monitoring programs. This study explores best practices for performing PCA on incomplete datasets with missing values, focusing on the importance of sophisticated imputation techniques or resilient missing-data strategies for maintaining the analytical value of ecological datasets. The study proposes leveraging Singular Value Decomposition (SVD) and the inherent low-rank structure of ecological data. This offers a robust framework for analyzing complex ecological systems: it enables the identification of latent ecological factors, the prediction of missing observations, and, ultimately, a deeper understanding of the dynamics governing these systems. PCA results on a simulated dataset illustrate a performance comparison between two different methods for handling missing data in PCA. The NIPALS method, while offering an alternative standardization approach, should be used with caution because it can significantly alter PCA outcomes. Regularized SVD demonstrated the most consistent performance across all levels of missingness, indicating its robustness in handling missing data. Future research should explore alternative etiologies and their effects on PCA outcomes, as well as sensitivity analyses to determine optimal regularization parameters.
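To make the idea of regularized SVD on incomplete data concrete, the following is a minimal sketch and not the study's implementation. It assumes a soft-impute style scheme: missing entries start at column means, then are repeatedly replaced by a rank-truncated SVD reconstruction whose singular values are shrunk by a penalty. The function names (`regularized_svd_impute`, `pca_scores`), the parameters `rank`, `lam`, `n_iter`, and `tol`, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def regularized_svd_impute(X, rank=2, lam=0.1, n_iter=200, tol=1e-6):
    """Fill NaN entries of X by iterating a shrunken, rank-truncated SVD.

    Assumption: every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means computed from the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        # Regularization step: shrink the leading singular values toward zero.
        s_shrunk = np.maximum(s[:rank] - lam, 0.0)
        low_rank = (U[:, :rank] * s_shrunk) @ Vt[:rank] + mu
        # Observed entries are kept; only missing entries are updated.
        new_filled = np.where(missing, low_rank, X)
        change = np.linalg.norm(new_filled - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = new_filled
        if change < tol:
            break
    return filled


def pca_scores(X_complete, n_components=2):
    """PCA via SVD of the column-centered, completed matrix."""
    centered = X_complete - X_complete.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, Vt[:n_components], explained[:n_components]


if __name__ == "__main__":
    # Hypothetical simulated data: a noisy rank-2 matrix with about 20% missing entries.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 6))
    X += 0.05 * rng.normal(size=X.shape)
    X_missing = X.copy()
    X_missing[rng.random(X.shape) < 0.2] = np.nan

    X_hat = regularized_svd_impute(X_missing, rank=2, lam=0.1)
    scores, loadings, explained = pca_scores(X_hat)
    print("explained variance ratio:", explained)
```

The penalty `lam` and the retained `rank` are the regularization parameters that a sensitivity analysis would vary. In this sketch they are fixed by hand for illustration only.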
Incomplete Datasets, Low-Rank, Matrix Completion, Principal Component Analysis (PCA), Regularization, Singular Value Decomposition (SVD).