Data Preparation and Feature Engineering in High-Dimensional Predictive Modeling

Authors

  • B M Taslimul Haque, Master of Science in Information Systems, Central Michigan University, Mt Pleasant, Michigan, USA
  • Md. Arifur Rahman, Master of Science (M.S.) in Information Studies, Trine University, Indiana, USA

DOI:

https://doi.org/10.63125/5txtr530

Keywords:

Data Preprocessing, Feature Engineering, Predictive Modeling, Machine Learning, Data Analytics

Abstract

Data preprocessing and feature engineering are critical determinants of predictive modeling effectiveness, particularly in large-scale environments characterized by inconsistencies, missing values, and heterogeneous variable structures. This study examined the impact of structured preprocessing and feature engineering strategies on machine learning performance using a quantitative experimental design applied to a dataset of 12,500 observations and 48 predictor variables. Multiple preprocessing techniques, including missing value imputation, normalization, categorical encoding, feature construction, and feature selection, were evaluated across five supervised learning algorithms using repeated 10-fold cross-validation. Results demonstrated substantial performance gains over the baseline model, with average classification accuracy improving from 71.4% to 84.7%, F1-score increasing from 0.69 to 0.86, and AUC-ROC rising from 0.74 to 0.91. Statistical testing confirmed significant improvements at the 0.05 level, with moderate to large effect sizes observed for feature engineering and selection interventions. These findings provide empirical evidence that comprehensive preprocessing pipelines meaningfully enhance predictive accuracy, model robustness, and analytical reliability, underscoring their importance as a foundational component of predictive analytics workflows.
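
As a rough illustration of the evaluation protocol named in the abstract, repeated 10-fold cross-validation can be sketched with the Python standard library alone. This is not the authors' code: the helper name `repeated_kfold_indices`, the number of repeats (3), and the random seed are assumptions, since the abstract does not report them. Only the fold count (10) and the dataset size (12,500 observations) come from the text.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    Hypothetical helper: n_repeats and seed are illustrative assumptions,
    not values reported in the study.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # draw a fresh random partition for each repeat
        # Spread any remainder so that fold sizes differ by at most one.
        fold_sizes = [
            n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
            for i in range(n_splits)
        ]
        start = 0
        for size in fold_sizes:
            test_idx = indices[start:start + size]
            train_idx = indices[:start] + indices[start + size:]
            yield train_idx, test_idx
            start += size


# 12,500 observations, as reported in the abstract.
splits = list(repeated_kfold_indices(12_500))
```

Each repeat reshuffles the data before partitioning, so every observation is held out exactly once per repeat. To avoid leakage, preprocessing steps such as imputation and normalization would normally be fitted on `train_idx` only and then applied to `test_idx` within each split.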

Published

2024-12-28

How to Cite

B M Taslimul Haque, & Md. Arifur Rahman. (2024). Data Preparation and Feature Engineering in High-Dimensional Predictive Modeling. Journal of Sustainable Development and Policy, 3(04), 205-244. https://doi.org/10.63125/5txtr530
