Weather Data and Machine Learning: The Ultimate Pairing In Predictive Modeling

As machine learning capabilities keep improving, weather forecasting has gained some interesting new techniques.  Traditional weather forecasting relies on physics-based models, but with enough data, machine learning can make remarkably accurate predictions in many areas.  Not long ago, only large organizations had access to raw weather data, and most of us relied on weather reports from the news, websites, or apps.  Today, anyone can use a direct API feed to get raw data for countless locations.

On the other side of this equation, machine learning techniques, software, and ease of use have all dramatically improved such that anyone can gain access to the software, tutorials, and sample programs to learn how to build a machine learning model, all for free.

Given these two near-miracles, it would be interesting to see whether you could pull historical data for your own community and use it to build and train an accurate, hyperlocal weather forecasting tool using only the weather data available through an API.  Doing so would allow you to complete the initial training, then keep honing the model each day by comparing the weather it predicted with the weather that actually occurred.

This is fascinating because it would allow anyone to completely disregard the impossibly complex physics-based weather models and use a sizable data set dedicated to a local community to train, improve, and publish a weather forecasting model potentially even better than the local news.

Weather has long been especially hard to predict given the complexity of weather systems, but the more detailed data we can capture for a given area, the better we can use machine learning models not only to improve the forecast, but also perhaps to improve the physics-based models we’ve relied on for so long.  Let’s dive into the data capture part of this equation: pulling a large amount of data for a single location to create a training set for a predictive machine learning model.

Pulling Weather Data for Use In Machine Learning Prediction

In this tutorial, we will show you how to pull a variety of weather data points that, combined, give a comprehensive snapshot of the weather system for a given area.  If you capture this data continuously, you can use it to set up a machine learning time series algorithm that predicts the next day’s weather.  Selecting a solid machine learning algorithm warrants its own tutorial, so this one focuses on pulling the weather data you need, and to do that you will need to know how to use a weather API.
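
To make the time series framing concrete, here is a minimal sketch of how a captured daily series could be turned into training pairs, where each input is the previous few days and each target is the following day.  The function name, the three-day window, and the sample values are all made up for illustration; they are not part of any particular library or of the API.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a daily series into (inputs, targets) pairs for next-day prediction."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # Each input holds the n_lags days before day i; the target is day i itself.
    for i in range(n_lags, len(series)):
        inputs.append(list(series[i - n_lags:i]))
        targets.append(series[i])
    return inputs, targets


# Made-up daily high temperatures, purely for illustration.
daily_highs = [61.0, 63.5, 60.2, 58.9, 64.1]
X, y = make_windows(daily_highs, n_lags=3)
```

With five days of data and a three-day window, this yields two training pairs.  A real training set would use many more days and more than one field per day.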

Introduction

The setup for this tutorial points to the Tomorrow.io API, as that is where we will pull all the necessary data.

API Key

To get an API key, you need to first sign up with the platform, then log in and get your key.
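
Once you have a key, one common way to keep it out of your source code is to read it from an environment variable.  The sketch below assumes a variable named TOMORROW_API_KEY, which is our own choice and not something the platform requires.

```python
import os


def load_api_key(env_var: str = "TOMORROW_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```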

Select The Location

To select the location, you can use as input either a latitude/longitude pair or a predefined location referenced by its locationId.
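
As a rough sketch of the latitude/longitude option, the helper below validates a coordinate pair and formats it as a single "lat,lon" string.  We are assuming that the API accepts the location in that comma-separated form, so check the current API reference before relying on it; the helper name and the sample coordinates are ours.

```python
def format_location(lat: float, lon: float) -> str:
    """Format a coordinate pair as 'lat,lon' (assumed query format)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    return f"{lat},{lon}"


# Example coordinates chosen for illustration only.
location = format_location(40.758, -73.9855)
```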

Select Fields of Interest

Because machine learning can take vast amounts of possibly related data and determine how much weight each input deserves, we want to select a large list of fields to pull.  You can review the full list of available fields to see what other data points you might want to include.
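
A field selection might look something like the sketch below.  The field names are the ones we believe the API exposes, but treat them as assumptions and confirm each one against the current field list.  The helper, which is our own, removes duplicates and joins the names into one comma-separated string.

```python
from typing import Iterable

# Assumed field names; verify each against the API's field list.
FIELDS = [
    "temperature",
    "temperatureApparent",
    "humidity",
    "windSpeed",
    "windGust",
    "windDirection",
    "precipitationIntensity",
    "precipitationType",
    "cloudCover",
    "weatherCode",
]


def fields_param(fields: Iterable[str]) -> str:
    """Join field names into one comma-separated string, dropping duplicates."""
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return ",".join(dict.fromkeys(fields))
```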

Select Unit System

Simple: select either imperial or metric.

Set Up Timesteps

For the timesteps, you can request data points as current conditions, in hourly increments, or in daily increments.  The type of machine learning algorithm you choose will dictate which timestep you need.
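
The three options can be mapped to request values as in the sketch below.  The labels "current", "1h", and "1d" are what we understand the API to expect, but they are an assumption here, and the friendly names on the left are our own.

```python
from typing import Iterable

# Friendly name -> assumed API timestep label.
TIMESTEP_LABELS = {"current": "current", "hourly": "1h", "daily": "1d"}


def timesteps_param(choices: Iterable[str]) -> str:
    """Translate friendly timestep names into a comma-separated label string."""
    labels = []
    for choice in choices:
        if choice not in TIMESTEP_LABELS:
            raise ValueError(f"unknown timestep: {choice!r}")
        labels.append(TIMESTEP_LABELS[choice])
    return ",".join(labels)
```

For a next-day model, daily values are the natural target, while hourly values can supply extra detail for the inputs.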

Configure The Time Frame and Time Zone

If you want to pull a large amount of historical data to boost your training data set, you can expand the time frame to reach back into the past.  The time zone can be set to anything, but it probably makes the most sense to use the time zone of the location whose weather you are pulling.
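
One way to express the time frame is as a pair of ISO 8601 timestamps in UTC, as sketched below.  The helper name, the seven-day lookback, and the time zone value are illustrative choices of ours.  How far back you can actually reach depends on your plan and the API's limits, which this sketch does not check.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def time_window(now: datetime, days_back: int, days_ahead: int) -> Tuple[str, str]:
    """Return (start, end) as ISO 8601 UTC strings around a reference time."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = (now - timedelta(days=days_back)).astimezone(timezone.utc)
    end = (now + timedelta(days=days_ahead)).astimezone(timezone.utc)
    return start.strftime(fmt), end.strftime(fmt)


# Illustrative values: one week of history plus the next day.
reference = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
start_time, end_time = time_window(reference, days_back=7, days_ahead=1)

# Time zone name for the location being pulled (an example value).
time_zone = "America/New_York"
```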

Request the Timelines Data

Now that you’ve set up the parameters, you can send the request to get the data set you need.  After retrieving the data, review the fields, the values, and the labels to ensure everything was pulled correctly.
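
Putting the pieces together, a request could be sketched as follows using only the Python standard library.  The endpoint URL, the query parameter names, and the shape of the response (timelines containing intervals, each with a values object) reflect our understanding of the Timelines API and should be treated as assumptions to verify against the current API reference.  The function names are our own.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm against the API reference.
TIMELINES_URL = "https://api.tomorrow.io/v4/timelines"


def build_url(apikey: str, location: str, fields: List[str], units: str,
              timesteps: List[str], start_time: str, end_time: str,
              timezone: str) -> str:
    """Assemble the request URL from the settings chosen in the steps above."""
    params = {
        "apikey": apikey,
        "location": location,
        "fields": ",".join(fields),
        "units": units,
        "timesteps": ",".join(timesteps),
        "startTime": start_time,
        "endTime": end_time,
        "timezone": timezone,
    }
    return f"{TIMELINES_URL}?{urlencode(params)}"


def fetch_timelines(url: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the GET request and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten_intervals(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the assumed response shape into one row per interval."""
    rows = []
    for timeline in payload.get("data", {}).get("timelines", []):
        for interval in timeline.get("intervals", []):
            row = {
                "timestep": timeline.get("timestep"),
                "startTime": interval.get("startTime"),
            }
            row.update(interval.get("values", {}))
            rows.append(row)
    return rows
```

Calling fetch_timelines(build_url(...)) and passing the result to flatten_intervals gives a list of rows that is easy to inspect by eye or load into a table, which makes the review step much simpler.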

Final Thoughts

Now that you have your weather data, you can keep collecting new sets of values to feed your training set and continue improving your machine learning model.  You can use the past data to help train the model, and each new day you can compare the model’s predicted outcome with the actual outcome.  Keep fine-tuning your data, and you may quickly develop the most accurate weather model for your community!
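
For the daily comparison of predicted and actual values, one simple score is the mean absolute error, sketched below.  This is just one reasonable metric that we picked for illustration, and the sample temperatures are made up.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average size of the gap between predicted and observed values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up predicted and observed daily highs.
error = mean_absolute_error([60.0, 65.0], [62.0, 64.0])
```

Tracking this number over time shows whether the extra data you keep collecting is actually making the model better.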