CaSPAr Data team archive for weather

Juliane Mai, a post-doctoral fellow, and Bryan Tolson, an associate professor, were part of the team that created the Canadian Surface Prediction Archive.

Waterloo engineers have created the first publicly accessible archive of Environment Canada’s operational weather forecasts – a game-changing tool for researchers who use weather data.

Bryan Tolson, an associate professor in the Department of Civil and Environmental Engineering, and Juliane Mai, a post-doctoral fellow in the same department, teamed up with other researchers in the NSERC Canadian FloodNet Strategic Network grant to create the Canadian Surface Prediction Archive (CaSPAr). The team created CaSPAr when they realized how inefficient it was for flood forecast researchers to access the forecast data they needed.

Anyone who depends on numerical weather data to do research – like flood forecasters, electricity grid managers, health and air quality forecasters, or travel forecasters – can make use of CaSPAr.

To make a flood prediction, for example, forecasters use a predictive model and a database of forecasted weather information, like temperature and precipitation data. The forecasted weather data is entered into the models, and forecasters run simulations. Predictions are made with the outcomes of these simulations. This allows forecasters to issue accurate flood warnings to the public.

CaSPAr contains the data entered into these simulations. More importantly, it contains the data used to test them.

“Forecasting systems need to be tested before forecasters can use them to create predictions,” says Tolson. “Testing the system will make sure it works – that it can accurately predict previous events using historical numerical data. Without an archive like CaSPAr, you can’t do that.”

A new source for weather data

For years, researchers in weather-related forecasting relied on homegrown, cumbersome archives to access historical weather data. Environment Canada provides temporary access to operational forecasts, and researchers would download this data daily to populate their own archives.

However, they were forced to download global- or North-American-scale information with hundreds of variables they didn’t need. After downloading, they would parse through the data to find the numbers specific to their interests.

“With CaSPAr it’s different,” says Mai. “You specify the variables, location, and dates you need, and the system provides you with exactly that information. It saves you from downloading unnecessary data, and it saves the effort of post-processing because you only download what you need. Plus, the data is in a format where you can put it directly into your models.”
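The kind of request Mai describes, selecting by variable, location, and date, can be sketched in a few lines of Python. This is only an illustration of the idea and not CaSPAr's actual interface: the `ForecastRecord` class, the `subset` function, the variable names, and all of the sample values and dates are hypothetical and made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ForecastRecord:
    """One hypothetical forecast value at a point (not a CaSPAr data type)."""
    variable: str
    lat: float
    lon: float
    issued: date
    value: float


def subset(records, variables, lat_range, lon_range, date_range):
    """Keep only records matching the requested variables, bounding box, and dates."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    start, end = date_range
    return [
        r for r in records
        if r.variable in variables
        and lat_min <= r.lat <= lat_max
        and lon_min <= r.lon <= lon_max
        and start <= r.issued <= end
    ]


# Made-up sample data standing in for a large continental-scale download.
records = [
    ForecastRecord("temperature", 43.47, -80.54, date(2018, 6, 1), 21.5),
    ForecastRecord("precipitation", 43.47, -80.54, date(2018, 6, 1), 3.2),
    ForecastRecord("wind_speed", 43.47, -80.54, date(2018, 6, 1), 12.0),
    ForecastRecord("temperature", 49.28, -123.12, date(2018, 6, 1), 17.0),
    ForecastRecord("precipitation", 43.47, -80.54, date(2018, 7, 15), 0.0),
]

# Request two variables, a small bounding box, and one month of forecasts.
selected = subset(
    records,
    variables={"temperature", "precipitation"},
    lat_range=(43.0, 44.0),
    lon_range=(-81.0, -80.0),
    date_range=(date(2018, 6, 1), date(2018, 6, 30)),
)

for r in selected:
    print(r.variable, r.issued, r.value)
```

In the older workflow, this filtering happened on the researcher's own machine after the full data set had been downloaded; with an archive that accepts such a request directly, only the matching records need to be transferred.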

Using CaSPAr, researchers can save hours of time and terabytes of storage space.

Over 170TB of data – and counting

CaSPAr provides global, North American, and Canadian numerical weather forecasts issued by Environment and Climate Change Canada. After 16 months of building the database, CaSPAr contains 171 terabytes of archived data, and 8.5 terabytes are added each month.

“Every single night, CaSPAr downloads numerical forecast data,” explains Mai. “Since June, researchers have been able to log in and request the data that they want.”

Anybody can register to access CaSPAr and start making requests for information. Researchers can access step-by-step instructions to help them get started on CaSPAr, and register for the tool at no cost. So far, 50 researchers are registered with the system, and they have submitted more than 100 requests since June. Tolson and Mai hope to expand CaSPAr to include more data and reach more users.

“We want to make CaSPAr a more robust and efficient data management tool,” says Tolson.

Currently, the archive includes the most popular data sets for weather and climate forecasters, but the team hopes to add more information and features in the future. Researchers who are new to the tool are encouraged to provide feedback about other data they would like to see in the archive.

“I hope that it becomes a platform for any kind of data that you want to get from Environment Canada,” says Mai. “I want to make it easier to get exactly the data you need.”

Data files are available in NetCDF format, which makes them easier for researchers to use. Forecast data are available on CaSPAr approximately seven days after being released by Environment Canada.

CaSPAr was created out of a partnership between the Natural Sciences and Engineering Research Council (NSERC) FloodNet, Environment and Climate Change Canada, Esri Canada, the University of Waterloo, and McMaster University. Storage and processing are performed on Compute Canada’s Graham cluster (supported by Compute Canada’s 2017 Resource Allocation Competition) and wouldn’t have been possible without the support of Compute Canada staff.

For any further questions, contact the CaSPAr development team via caspar.data@uwaterloo.ca.