Modern society depends heavily on technologies like GPS navigation, satellite communications, and power grids, all of which can be disrupted by solar activity such as solar flares and geomagnetic storms. These events, driven by interactions between the Sun and Earth's ionosphere, pose risks to aviation safety, satellite operations, and critical infrastructure, making accurate forecasting essential. The paper addresses this by introducing a curated, open-access dataset designed to support machine learning models for ionospheric forecasting, aiming to enhance predictions that protect everyday technologies from space weather impacts.
The researchers developed a comprehensive dataset that integrates diverse data sources, including solar wind measurements, geomagnetic activity indices, and total electron content (TEC) maps from global navigation satellite systems. This dataset aligns heterogeneous information, such as solar irradiance data from the Solar Dynamics Observatory and geomagnetic indices like Kp and SYM-H, into a single, machine learning-ready structure. By harmonizing temporal and spatial scales, the dataset enables models to learn from multiple modalities, addressing gaps in current operational frameworks that rely on sparse or inconsistent data.
To build this dataset, the team combined data from sources like NASA's OMNI dataset for solar wind parameters, JPL's dense TEC maps, and crowdsourced Android smartphone measurements, all aligned to a common timeline from 2010 to 2024. They handled issues like missing values and varying cadences by standardizing gaps as NaNs and using forward-filling, where short data interruptions are filled with the most recent valid sample. This approach keeps the dataset modular and supports both physical and data-driven modeling, with code publicly available on GitHub for further customization and use.
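The gap handling described above can be illustrated with a small sketch. This is not the authors' code: the function names `align_to_timeline` and `forward_fill`, the 15-minute cadence, and the `max_gap` limit are all assumptions chosen for illustration. It places sparse samples on a common timeline, marks missing steps as NaN, and fills only short interruptions with the most recent valid value.

```python
import math

def align_to_timeline(samples, timeline):
    """Place {timestamp: value} samples on a common timeline; missing steps become NaN."""
    return [samples.get(t, math.nan) for t in timeline]

def forward_fill(values, max_gap):
    """Fill NaNs with the most recent valid sample, for at most max_gap steps per gap."""
    filled = []
    last_valid = math.nan
    steps_since_valid = 0
    for v in values:
        if not math.isnan(v):
            # A valid sample resets the gap counter.
            last_valid = v
            steps_since_valid = 0
            filled.append(v)
        else:
            steps_since_valid += 1
            # Only the first max_gap steps of a gap are filled; the rest stay NaN.
            if steps_since_valid <= max_gap and not math.isnan(last_valid):
                filled.append(last_valid)
            else:
                filled.append(math.nan)
    return filled

# Hypothetical example: a 15-minute timeline (in minutes) and a sparse index series.
timeline = list(range(0, 120, 15))
samples = {0: 2.0, 15: 3.0, 90: 6.0}

aligned = align_to_timeline(samples, timeline)
filled = forward_fill(aligned, max_gap=2)
```

Capping the fill length is a design choice in this sketch: a long outage stays visible as NaN after the first few steps, so a model or a masking step downstream can treat it as missing rather than as a flat signal.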
The dataset has already enabled the training of models like IonCast, which includes LSTM, Spherical Neural Operator, and GraphCast architectures, showing promising results in forecasting TEC at lead times of up to 12 hours. These models outperform baseline persistence forecasts, particularly under geomagnetic storm conditions classified by events like G2H6, where accuracy is critical for operational applications. The data product, hosted on Google Cloud, includes an event catalog that identifies storm periods to prevent data leakage, ensuring robust model validation and reliable predictions for both quiet and active space weather scenarios.
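To make the two evaluation ideas above concrete, here is a minimal sketch, assuming a one-dimensional TEC series and an event catalog expressed as index windows. The names `persistence_forecast` and `split_by_events`, the sample values, and the window `(3, 6)` are hypothetical and are not taken from the paper or its code. A persistence forecast predicts that the value at a future step equals the most recent observation, and an event-based split keeps every step of a held-out storm in the test set so that no part of the same storm appears in training.

```python
import math

def persistence_forecast(series, lead_steps):
    """Predict the value at t + lead_steps as the value observed at t."""
    # The returned list lines up with the targets series[lead_steps:].
    return series[:-lead_steps]

def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def split_by_events(n_steps, held_out_events):
    """Send every step inside a held-out event window [start, end) to the test set."""
    test = set()
    for start, end in held_out_events:
        test.update(range(start, end))
    train = [i for i in range(n_steps) if i not in test]
    return train, sorted(test)

# Hypothetical TEC values with a storm-like rise and recovery.
tec = [10.0, 11.0, 13.0, 18.0, 25.0, 22.0, 16.0, 12.0]
lead = 2

pred = persistence_forecast(tec, lead)
target = tec[lead:]
baseline_rmse = rmse(pred, target)

# Assumed catalog entry: one storm covering steps 3, 4, and 5.
train_idx, test_idx = split_by_events(len(tec), held_out_events=[(3, 6)])
```

Any learned model would need to beat `baseline_rmse` on the held-out steps to count as an improvement over persistence. Splitting by whole events rather than by random time steps avoids the case where neighboring samples from one storm land on both sides of the split.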
This work matters because it provides a foundation for improving space weather forecasts, which can mitigate disruptions to GPS accuracy, communication systems, and satellite operations that affect daily life. By making the dataset open-access, the researchers support broader scientific inquiry into Sun-Earth interactions, potentially leading to more resilient infrastructure and safer aviation. However, limitations include dependencies on the availability of underlying data sources and the need for ongoing updates to maintain relevance as new observations emerge.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.