Enterprise AI Analysis: A dataset for machine learning model to convective initiation detection and nowcasting over southeastern China

Data Publication

A dataset for machine learning model to convective initiation detection and nowcasting over southeastern China

This study introduces CIDS, a high-quality, high spatio-temporal resolution dataset designed for AI models to detect and nowcast Convective Initiation (CI) over southeastern China. Spanning 2018-2023, it includes radar mosaic products and FY-4A satellite radiance data, providing both features and CI labels (Developing/Declining). The dataset contains 136,728 samples, identifying over 4.1 million CIs, including 1.7 million developing CIs. Its detailed data aims to enhance AI model performance for severe weather forecasting.

Executive Impact: Elevating Severe Weather Preparedness

This research delivers a foundational dataset, CIDS, crucial for developing advanced AI models to predict Convective Initiation (CI) – the precursor to severe weather. By integrating high-resolution radar and satellite data across southeastern China, CIDS offers unparalleled granularity in identifying and classifying CI events. Enterprises leveraging weather-dependent operations can utilize this dataset to build highly accurate nowcasting systems, significantly improving preparedness, mitigating risks, and optimizing logistics.

For industries impacted by severe weather, such as logistics, agriculture, energy, and event management, accurate CI nowcasting can translate directly into operational resilience and cost savings. CIDS enables the development of predictive AI tools that offer earlier warnings than traditional methods, leading to optimized resource allocation, reduced damages from storms, and enhanced safety protocols. This dataset provides the 'ground truth' and rich feature set necessary to train state-of-the-art machine learning models, offering a competitive edge in weather-sensitive decision-making.

Over 4.1 million Total CIs Identified
1.7 million Developing CIs for Enhanced Prediction
10-Minute Resolution for Nowcasting
829 Regional Severe Convective Events Sampled

Deep Analysis & Enterprise Applications

The Imperative for Convective Initiation Forecasting

Convective Initiation (CI) is the initial formation of convective clouds, crucial for early severe weather warnings. Traditionally, CI is defined as the first observation of a locally generated convective cell with a radar reflectivity factor exceeding 35 dBZ. Accurate forecasting of CI timing and location is vital for preparedness and mitigating socio-economic impacts.

Traditional empirical methods and numerical weather prediction (NWP) models struggle with CI forecasting. However, Artificial Intelligence (AI) and Deep Learning (DL) present promising alternatives. AI models use radar observations as ground truth (labels) and pre-CI satellite/other data as features, learning relationships to produce forecasting models. The quality and size of training datasets are paramount for model generalization, especially given that severe convective weather is a rare event. This study addresses these challenges by developing CIDS, a high-quality, high spatiotemporal resolution dataset tailored for AI-based CI forecasting.

Convective Initiation Identification Framework

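1. Start from radar mosaic composite reflectivity (CR) images.
2. Locate CSs (CR ≥ 35 dBZ).
3. Identify the first occurrence of a CS.
4. Estimate the area the CS occupied 10 minutes earlier. Was no CS present there? If yes: probable CI.
5. Estimate the area the CS will occupy 10 minutes later. Does a CS exist there? If yes: CI.
6. Does the radar echo decay over the next 30 minutes? If no: Developing CI.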
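
The decision steps above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the function and label names are hypothetical, cells are represented as sets of (row, col) grid pixels, the motion vector is assumed to be a whole-pixel displacement per 10-minute step, and a decaying echo is assumed to map to the Declining label.

```python
from enum import Enum


class Label(Enum):
    NOT_CI = "not CI"
    TRANSIENT = "probable CI, not confirmed"
    DECLINING_CI = "declining CI"
    DEVELOPING_CI = "developing CI"


def shift(pixels, d_row, d_col):
    """Translate a set of (row, col) pixels by a whole-pixel displacement."""
    return {(r + d_row, c + d_col) for r, c in pixels}


def classify_cell(cell, prev_pixels, next_pixels, motion, decays_within_30min):
    """Walk the flowchart for one cell seen at the current time step.

    cell                -- pixels of the cell at the current time
    prev_pixels         -- all >= 35 dBZ pixels 10 minutes earlier
    next_pixels         -- all >= 35 dBZ pixels 10 minutes later
    motion              -- (d_row, d_col) displacement per 10-minute step (assumed)
    decays_within_30min -- True if the echo decays over the next 30 minutes
    """
    d_row, d_col = motion
    # Step 4: back-project the cell; any overlap means it already existed.
    if shift(cell, -d_row, -d_col) & prev_pixels:
        return Label.NOT_CI
    # Step 5: forward-project the cell; no overlap means a transient echo.
    if not (shift(cell, d_row, d_col) & next_pixels):
        return Label.TRANSIENT
    # Step 6: split confirmed CIs by their later evolution.
    return Label.DECLINING_CI if decays_within_30min else Label.DEVELOPING_CI
```

Projecting the cell along its motion vector, rather than searching a fixed radius, is what lets a new cell next to an existing moving system be told apart from that system's own advance.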

Comprehensive Data Sources and Sampling

CIDS integrates ground-based meteorological observations, weather radar data, and satellite data, all sourced from the National Meteorological Information Center, China Meteorological Administration. The dataset covers southeastern China (104-125°E, 20-40°N) from March through September of each year from 2018 to 2023, the months dominated by the East Asian monsoon, when most severe weather events occur.

Severe Convective Events (SCE) were sampled from 1,008 national-level surface weather stations, identifying periods of heavy precipitation (≥20 mm/60 min), thunderstorm winds (≥17 m/s with lightning), and hail (≥2 mm). To maximize the number of samples, SCE-S periods were extended and then merged into Regional Severe Convective Events (SCE-R). From 2018 to 2023, 829 SCE-R events generated 136,728 samples at 10-minute intervals.

Radar data undergo rigorous quality control, including noise filtering, radial interference recognition, and echo elimination. The dataset provides ten radar features (Composite Reflectivity, Hybrid Scan Reflectivity, CAPPI Reflectivity at 2–7 km, Echo Top, Vertical Integrated Liquid) and nine FY-4A satellite spectral band radiances/reflectances.
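
The merge-then-sample step can be illustrated with a short Python sketch. The summary does not say how periods are extended or how stations are grouped into regions, so this is only a generic interval merge under stated assumptions: periods are (start, end) pairs, overlapping or touching periods are joined, and samples are taken every 10 minutes with both endpoints included. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def merge_periods(periods):
    """Merge overlapping or touching (start, end) periods into longer events."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous event: extend its end time if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def sample_times(start, end, step=timedelta(minutes=10)):
    """List sample timestamps from start to end (inclusive) at a fixed step."""
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += step
    return times


# Illustrative periods only; these are not taken from the dataset.
day = datetime(2020, 6, 1)
periods = [
    (day.replace(hour=14), day.replace(hour=15)),
    (day.replace(hour=14, minute=30), day.replace(hour=16)),
    (day.replace(hour=18), day.replace(hour=18, minute=20)),
]
events = merge_periods(periods)
samples = [t for start, end in events for t in sample_times(start, end)]
```

Merging before sampling avoids producing duplicate timestamps when several stations report the same storm.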

0.01° Radar Spatial Resolution (approx. 1 km)

FY-4A Satellite Feature Data in CIDS

No.  Channel  Wavelength (µm)  Spatial Resolution
1    VIS      0.65             0.005°
2    SWIR     1.61             0.02°
3    MWIR     3.75             0.02°
4    WV       6.25             0.04°
5    WV       7.1              0.04°
6    LWIR     8.5              0.04°
7    LWIR     10.8             0.04°
8    LWIR     12               0.04°
9    LWIR     13.5             0.04°

Robust CI Identification and Classification

The CI identification algorithm is based on pioneering radar-based methods, adapted to use optical flow tracking for dynamic cell motion instead of fixed spatial search. This allows for more accurate identification of CIs, especially along cloud cluster edges or within complex systems.

The process begins by identifying convective cells (CS) on radar composite reflectivity maps (≥35 dBZ, 0.01°×0.01° resolution, minimum 16 km² area). Optical flow is used to estimate a velocity vector for each CS. A CS is labeled a CI if no CS in the preceding observation overlaps its estimated prior position. Potential CIs are then verified against subsequent observations to eliminate 'transient echoes.' Manual inspection of the CI results was also performed to remove non-precipitation echoes and anomalies.
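
The cell-identification step can be approximated as thresholding followed by connected-component labeling. The sketch below is a pure-Python illustration, not the study's code: it assumes the reflectivity map is a list of rows, uses 4-connectivity (the summary does not state the connectivity rule), and treats each 0.01° pixel as roughly 1 km², so the 16 km² minimum becomes 16 pixels. The name `find_cells` is hypothetical.

```python
from collections import deque

DBZ_THRESHOLD = 35.0   # reflectivity threshold from the text
MIN_AREA_PIXELS = 16   # 16 km² at roughly 1 km² per pixel (approximation)


def find_cells(cr, threshold=DBZ_THRESHOLD, min_pixels=MIN_AREA_PIXELS):
    """Return each convective cell as a set of (row, col) pixels.

    cr is a 2-D list of composite reflectivity values in dBZ.
    """
    rows, cols = len(cr), len(cr[0])
    seen = [[False] * cols for _ in range(rows)]
    cells = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or cr[r0][c0] < threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            cell = set()
            while queue:
                r, c = queue.popleft()
                cell.add((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and cr[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Discard echoes below the minimum area.
            if len(cell) >= min_pixels:
                cells.append(cell)
    return cells
```

On full-size radar mosaics an array library with a built-in labeling routine would be much faster; the loop form is used here only to keep the logic visible.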

Identified CIs are further classified as 'Developing' or 'Declining' based on their future evolution over the next 30 minutes. Developing CIs show enhancements in area and average echo intensity, while Declining CIs do not. This classification provides crucial labels for forecasting not just CI occurrence, but also its potential for intensification.
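
A minimal sketch of this labeling rule follows. The exact criterion is not given in this summary, so the comparison used here is an assumption: a CI is called Developing only if both its area and its mean reflectivity at +30 minutes exceed their values at initiation. The function name and input layout are hypothetical.

```python
def classify_evolution(areas_km2, mean_dbz):
    """Label a CI from its tracked area and mean reflectivity.

    Both sequences start at the CI time and continue at 10-minute steps
    up to +30 minutes, so each is expected to hold four values.
    """
    if len(areas_km2) != len(mean_dbz) or len(areas_km2) < 2:
        raise ValueError("need matching series with at least two time steps")
    # Assumed rule: compare the last tracked value with the value at initiation.
    grew = areas_km2[-1] > areas_km2[0]
    intensified = mean_dbz[-1] > mean_dbz[0]
    return "Developing" if grew and intensified else "Declining"


# Illustrative values only; these are not taken from the dataset.
label = classify_evolution([20, 28, 35, 50], [37.0, 39.0, 41.0, 44.0])
```

Requiring both quantities to increase keeps the Developing label conservative, in line with the statement that Developing CIs show enhancements in area and average echo intensity.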

Validation includes analyzing CI spatial and diurnal distributions, which largely align with precipitation patterns and known convective activity. Anomalous high-frequency CI zones, often due to radar quality issues, are largely filtered out when considering only Developing CIs, indicating the superior quality of the 'Developing CI' labels.

Case Study: Identifying CI in Complex Convective Systems

Figure 5 illustrates CI identification across two cases, demonstrating the algorithm's robustness. In the first case (Fig. 5a), 38 convective cells were observed over six time steps, with 16 identified as CI (12 Developing, 4 Declining). These were isolated cells with newly formed strong reflectivity that subsequently developed into robust cells.

The second case (Fig. 5b) shows CI forming at the periphery of an existing thunderstorm system during a severe weather episode. Radar imagery revealed an extensive linear mesoscale convective system, with most identified CIs forming along the system's periphery rather than resulting from storm movement. After formation, these CIs merged and expanded. Such peripheral CIs near strong echoes are challenging for fixed spatial threshold methods but are effectively captured by the dynamic, optical-flow-based approach. In this case, 223 CS cells were identified over six time steps, with 52 CIs (29 Developing, 23 Declining) successfully pinpointed. This highlights the algorithm's ability to handle both isolated and embedded CI events.

AI Integration Roadmap for CI Nowcasting

A typical phased approach to deploying robust AI-powered Convective Initiation forecasting within your enterprise.

Phase 1: Discovery & Data Assessment

Understand current weather impact, assess existing data infrastructure, and identify specific business objectives for CI nowcasting. Evaluate CIDS integration feasibility.

Phase 2: Model Prototyping & Customization

Develop initial AI models using CIDS and other relevant data. Customize algorithms for specific geographic regions and forecast lead times pertinent to your operations.

Phase 3: System Integration & Validation

Integrate the prototype model into existing operational systems. Conduct rigorous validation against historical data and real-time observations, refining accuracy and reliability.

Phase 4: Deployment & Continuous Optimization

Deploy the CI nowcasting system in full. Establish feedback loops for continuous model retraining and performance optimization, ensuring long-term value and adaptability.

Ready to Transform Your Weather Resilience?

Harness the power of AI to predict severe weather initiation with unprecedented accuracy. Schedule a consultation to explore how CIDS can drive operational efficiency and safety for your enterprise.
