This study advances understanding of the oceanic biological carbon pump by integrating satellite (MODIS and SeaWiFS) and in-situ O₂/Ar data with machine learning to improve Net Community Production (NCP) modeling. Using a decade-long dataset free from coastal biases, the research applies Genetic Programming (GP) alongside Feedforward Neural Networks, ensemble learning models (Random Forests, XGBoost, AdaBoost), and Support Vector Machines. The methodology, in particular a data-splitting scheme designed to preserve spatial and temporal integrity, addresses spatial autocorrelation and makes the NCP estimates more robust. A key finding is that the Random Forest model performed best, achieving the lowest Mean Absolute Error and showing resilience to overfitting (R² of 91.71%). By improving insight into ocean carbon sequestration, this interdisciplinary approach has implications for climate change mitigation policy, offers a more detailed picture of oceanic carbon dynamics, and challenges traditional methodologies. The study illustrates the value of combining big data analytics and machine learning to address environmental challenges.
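The abstract does not spell out how the data were split, so the following is only a minimal sketch of one common way to limit leakage from spatial autocorrelation: assigning whole latitude/longitude blocks to either the training or the test set. The function names, the `lat`/`lon` field names, the 5-degree block size, and the 20% test fraction are all illustrative assumptions, not details taken from the study. A small Mean Absolute Error helper is included because MAE is the accuracy metric the study reports.

```python
import math
import random
from collections import defaultdict


def spatial_block_split(samples, block_deg=5.0, test_fraction=0.2, seed=0):
    """Split sample indices into train/test sets by whole lat/lon blocks.

    `samples` is a list of dicts with 'lat' and 'lon' keys (hypothetical
    schema). Every sample in a given block lands on the same side of the
    split, so nearby, autocorrelated points do not straddle train and test.
    """
    # Group sample indices by the grid cell they fall into.
    blocks = defaultdict(list)
    for i, s in enumerate(samples):
        key = (math.floor(s["lat"] / block_deg),
               math.floor(s["lon"] / block_deg))
        blocks[key].append(i)

    # Shuffle the blocks (not the samples) reproducibly.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)

    # Hold out a fraction of the blocks, at least one.
    n_test = max(1, round(len(keys) * test_fraction))
    test_idx = [i for k in keys[:n_test] for i in blocks[k]]
    train_idx = [i for k in keys[n_test:] for i in blocks[k]]
    return sorted(train_idx), sorted(test_idx)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

The same idea extends to temporal integrity by adding a time bin (for example, the cruise year) to the block key, so that a held-out block is separated from the training data in both space and time. Note that the test fraction here applies to blocks rather than samples, so the resulting sample proportions depend on how unevenly the observations are distributed.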