Monday, August 24, 2015

Column: Forecasts Have Improved in the 10 Years Since Katrina, And We Hope Messaging Has Too

Dr. Peter P. Neilley, Senior Vice President of Global Forecasting Services at The Weather Company
Published: August 24, 2015

On Aug. 14, 2005 — 10 years ago, as I write this — a little swirl of a disturbance in the central Atlantic Ocean garnered enough strength and organization to be officially classified as Tropical Depression 10 of the 2005 Atlantic hurricane season. This little ragged group of clouds and showers got little attention and was largely only a sideshow in the remarkable 2005 hurricane season that had so far produced eight named storms, including four hurricanes and two major hurricanes (Dennis and Emily).
While Tropical Depression 10 never became anything significant in itself, it did end up helping form Tropical Depression 12 over the Bahamas some eight days later, which rapidly intensified into Hurricane Katrina by Aug. 25, hence setting the stage for one of the most significant weather events to ever impact the U.S. However, on August 14, little did we know how impactful those incipient clouds of Tropical Depression 10 over the central Atlantic would end up being.
Hurricanes have long plagued the U.S., and our history is riddled with these storms and their impacts. The personal and economic devastation of hurricanes can be so enormous that our nation has invested a tremendous amount of resources to understand and predict these storms. While we likely will never be able to prevent hurricanes from hitting our shores, better prediction can lead to better preparation, which can significantly reduce the impact the storm has upon landfall.
So how are we doing in hurricane forecasting accuracy in general? This graph shows the 45-year trend in the accuracy of forecasts issued by the NOAA National Hurricane Center (NHC) on where tropical storms and hurricanes will go once they form. It shows how far off, on average, our forecasts of the storms' locations have been for each year since 1970. There is a separate line for each forecast lead time, with errors in the location of one-day-ahead forecasts shown in red, two-day-ahead forecasts in green and three-day-ahead forecasts in yellow. Forecasts out four (brown) and five days ahead (blue) only began in the early 2000s, so those trend lines are shorter.
Figure 1. Trends in the errors of Atlantic tropical storm and hurricane location forecasts. (Courtesy NOAA, National Hurricane Center)
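For readers who want to see what "how far off, on average" means in practice, here is a minimal sketch of an average track-error calculation. It is not NHC's verification code; the function names are hypothetical, the distance is the standard great-circle (haversine) formula, and the result is in statute miles by assumption.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in statute miles


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in statute miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def mean_track_error(pairs):
    """Average distance between forecast and observed storm positions.

    pairs: list of ((forecast_lat, forecast_lon), (observed_lat, observed_lon)),
    one entry per verified forecast at a single lead time (e.g. all 3-day forecasts).
    """
    errors = [great_circle_miles(*forecast, *observed) for forecast, observed in pairs]
    return sum(errors) / len(errors)
```

Each point on a trend line like those in Figure 1 would correspond to one such average, computed over a year's worth of forecasts at one lead time.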
Overall, hurricane forecasts are clearly getting better as the trends in the lines are all downward toward less error. If you look closely at the error in the three-day-ahead forecasts (yellow) in 2014, it is about the same (roughly 100 miles) as what the two-day-ahead error was in 2005 and what the one-day-ahead error was in 1995 (or perhaps 1993). That is, the forecasts of the location of these storms three days in advance made today are as good as the two-day-ahead forecasts were just a decade ago and as good as the one-day-ahead forecasts were two decades ago. This trend of forecasts gaining roughly a day of lead time every decade mirrors the improvements seen for many other types of day-to-day weather forecasts.
Knowing where a hurricane is going is one thing, but it’s also important to know how strong it will be when it gets there. Unfortunately, up until recently, that part of the story was not so good. This figure shows the trends in NHC's ability to predict the strength of the winds near the center of a hurricane. It shows that between 1989 and 2010, there was very little improvement in the ability to predict the strength of the storms and nowhere near the rate of improvement in track forecasts. The two-day-ahead hurricane intensity forecasts over that period improved only about 15 percent and hardly made any gains against the one-day-ahead forecasts.  
Figure 2. Trends in the errors of Atlantic tropical storm and hurricane intensity forecasts. (Courtesy NOAA, National Hurricane Center)
This relative lack of progress in predicting hurricane intensity was one of the key factors that led to the formation of the national Hurricane Forecast Improvement Project (HFIP) in 2009. HFIP has amongst its goals a 20 percent improvement in hurricane intensity forecasts by 2014 and a 50 percent improvement by 2019. The project brings together many of the world’s best hurricane and forecasting scientists to work together toward these goals. Improvements to a specialized computer forecasting model known as the HWRF (the Hurricane Weather Research and Forecasting model), which is designed specifically for hurricane forecasting, have been one of the primary foci of the program.
And it appears to be paying off. Since 2010, hurricane intensity forecasts have gotten dramatically better (as seen in Figure 2). There is some concern that these recent dramatic improvements evident in that graph may be a bit of a red herring since many of the recent hurricane seasons have been somewhat atypical. But at the very minimum, the results are quite encouraging.
So, back to Katrina. In 2005, the ability of forecasters to make a useful prediction of where a hurricane was likely to go was limited to about three days, on average. For now, I define a “useful” forecast to mean when the average error in the forecast of its location is less than the typical size of the core of a storm (about 150 miles), but I’ll come back to this.
In the Katrina forecast that the NHC issued at 11 a.m. CDT on August 26, three days before landfall near New Orleans (Figure 3), New Orleans was within the possible path of Katrina but clearly on the edge of the cone, with the focus of the forecast more on the Florida Panhandle. Arguably, this forecast was useful to New Orleans as it indicated they were in the threat zone, but being on the edge of the cone most certainly took some of the urgency off the forecast for the city. New Orleans was on alert, but actions to prepare were probably more passive as many people were merely thinking about preparations, rather than actually making them.
Figure 3. Official NHC forecast track and cone for Hurricane Katrina issued 11 a.m. CDT on Friday, August 26th.
By the time NHC issued that forecast, computer models were already starting to indicate that the likely path of Katrina was probably much further west, with New Orleans, perhaps, squarely in its sights. Meteorologists were getting very nervous. By that evening, a little over two days before Katrina began to seriously impact the city, the National Hurricane Center issued a revised forecast (Figure 4).
Figure 4. Official NHC forecast track and cone for Hurricane Katrina issued 11 p.m. CDT on Friday, August 26th.
Uh oh.
The new forecast called for a near worst-case scenario for New Orleans. A major hurricane bearing down on the city along a track that would be close to the most damaging and impactful for the region, with the potential for a huge storm surge from eastern Louisiana to Mississippi and Alabama and possible inundation of New Orleans.
Precisely what happened.
From nearly every perspective, this forecast should have epitomized the definition of a useful forecast. It outlined nearly exactly what was about to unfold. But was it useful? While the forecast was nearly perfect, it was not used, or at least not used as optimally and as quickly as hindsight tells us it should have been. Evacuation orders for New Orleans were not issued for another day and a half after that forecast and only after a specific plea from the head of the National Hurricane Center to the mayor. The delay in issuing those evacuation orders almost certainly led to some loss of life.
While social scientists and politicians can debate why such critical decisions were delayed, a likely contributing factor was that society’s perspective on forecast accuracy lagged behind the true gains that our science had made up until that point. We all have a built-in confidence test we apply to information we receive, and in this case, the confidence that the city’s decision makers attributed to this hurricane forecast was probably based on years or decades of prior experiences with weather forecast accuracy. But because our science was undergoing rapid improvements in the accuracy of our hurricane track forecasts, there was a gap between the perceived accuracy of the forecast and the real accuracy. New Orleans decision makers should have applied more confidence to the dire forecast, but their historically-tuned “smell tests” probably inhibited them from doing so.
Part of the blame goes to the meteorologists. While we knew better than anyone how much our forecasts had gotten better, not only for hurricanes but for all types of weather, we were not good at articulating it. While we pleaded our case that this was “The Big One,” and we knew it internally, our pleas fell somewhat on ears deafened by our past cry-wolf failures. And we didn’t know how to overcome that legacy other than crying out louder. Our Katrina forecast was absolutely useful, but we didn’t know how to say it in a way that made it as useful as it should have been.
So, what is a useful forecast? Is it a forecast upon which a decision can be made or a forecast upon which a decision is made? This is where it can get tricky, since every decision by every one of us is different and based on different criteria. There can be no one definition of usefulness of a weather forecast since the use of every forecast is unique to the individual and circumstance. Decisions are often made based on the significance of the decision, the loss incurred by indecision, the probabilities that the forecasts will be right or wrong and countless other factors that are unique to each situation. There’s a whole mathematical theory around this, referred to as “Cost-Loss Theory,” but despite the rigor of that math, the bottom line is that most day-to-day life decisions that every one of us makes are based on an internal calculation using the information we have, our confidence in that information and the significance of the decision.
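The core of the simplest cost-loss model fits in a few lines. What follows is a minimal sketch, assuming the textbook setup in which protecting costs a fixed amount C, an unprotected hit costs L, and protection prevents the whole loss; the function names and the dollar figures are hypothetical and chosen only for illustration.

```python
def action_threshold(cost, loss):
    """Probability above which protecting is cheaper, on average, than not protecting."""
    if cost < 0 or loss <= 0:
        raise ValueError("cost must be non-negative and loss must be positive")
    return cost / loss


def should_protect(probability, cost, loss):
    """Protect when the expected loss (probability * loss) exceeds the cost of protecting."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability > action_threshold(cost, loss)


# Two hypothetical households hear the same 30 percent forecast of a damaging hit.
# Household A can protect cheaply; for household B, protecting is expensive.
forecast_probability = 0.30
household_a = should_protect(forecast_probability, cost=500, loss=10_000)
household_b = should_protect(forecast_probability, cost=4_000, loss=10_000)
print(household_a, household_b)
```

The same forecast leads to different rational choices because the ratio C/L differs from one decision maker to the next, which is why a single definition of "useful" is so hard to pin down.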
So can we universally define a useful forecast? For one specific decision by one specific individual (e.g. issue an evacuation order), the answer is yes. But for a collection of similar decisions by a common group of individuals (e.g. “Should I evacuate?”), the answer gets murky. It’s murky because the factors each individual faces start coming into play. Do I have a place to go? Can I bring my pets? What about my parents in the nursing home? At the individual level, usefulness becomes complex, even for the seemingly same decision that many people are considering. Hence, the certainty and significance of the forecast, and the thresholds which trigger decisions, are absolutely personal. And then when you start to think of all the possible decisions that need to be made, you can see how it becomes very difficult to define the usefulness of a forecast universally.
Therefore, meteorologists generally don’t try to measure their forecasts with metrics of usefulness or utility, even though this is how society may very well be measuring us. Rather, we generally characterize our forecasts using statements of mathematical error (as in Figures 1 and 2) and leave it to the individual to decide if that information has utility.  
At The Weather Company, we strive to provide the most accurate weather information available anywhere so that the most people in the most situations can trust our forecasts to guide their decisions best. And statistics from third parties show that we are achieving this goal. Figure 5 shows forecast accuracy for all of 2015 (so far) based on data available from an independent, third-party accuracy watchdog organization. It shows that The Weather Company’s forecast accuracy is materially better than any other provider, including the National Weather Service. And the results are identical if you look at longer-range forecasts, or forecasts for other parts of the world, or other metrics of accuracy. As a rule, our forecasts are simply the most accurate, hence they can be relied upon in the everyday decisions that we all make. In that sense, they are the most useful.
Figure 5. 2015 Day 1-3 forecast accuracy, defined as the percent of forecasts deemed correct. Forecasts were analyzed for temperature and precipitation across the U.S. and are considered correct if the occurrence of precipitation was forecast correctly and the temperatures were within 3°F.
Are our forecasts perfect? Certainly not, and forecasts will never be 100 percent perfect. But we’re working on it and making gains toward that. Further, one thing that we are particularly focused on doing is getting better at articulating the errors in our forecasts. The expected error in a forecast is not a static number, but something that changes with the weather, with some days being much more predictable than others. For example, Superstorm Sandy was exceptionally predictable, as some forecasts as many as ten days in advance ended up being nearly perfect.
What’s even more interesting is that not only does the predictability of the weather change day to day, we can actually forecast that predictability. That is, not only can we tell you what the weather is most likely going to be, we can also tell you with reasonable confidence what the chances are that this particular forecast will be correct, and if it's incorrect, what the most likely alternative outcome(s) might be.
While we know how to do these sorts of “predictions of predictability,” we have not found the secret sauce in articulating this information well, and that was part of the issue with Katrina. However, we are getting better at this. Today, most people are familiar with spaghetti plots of tropical storm track forecasts showing all the possible tracks that a storm might take. When those tracks look like cooked spaghetti thrown against the wall, we know the forecast is more likely to have significant error and change a lot with each update. But when those tracks all line up like uncooked spaghetti still in the box, the forecast is inherently more accurate and reliable. Hence the spaghetti plots are a technique that meteorologists use to convey the certainty or uncertainty in the forecast. Knowing both the forecast and the probabilities of something else happening can be used to enable better decisions. So rather than applying a personal smell test to a forecast based on one’s experiences with past forecasts, a decision can and should be made using confidence information that comes along with the forecast. Perhaps if the decision makers in New Orleans had been shown a spaghetti plot several days in advance, which indicated that nearly every possible outcome had their city in its gun sights, earlier and better decisions might have been made.
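The cooked-versus-uncooked spaghetti idea can be put into a single number. Here is a rough sketch, not any forecast center's actual method: it takes the positions that each model run predicts for the same future time and measures how far, on average, they sit from their common center. The function name is hypothetical, the sample positions are invented for illustration, and the distance uses a flat-earth approximation that is reasonable only for points within a few hundred miles of each other and away from the poles and the date line.

```python
import math

MILES_PER_DEGREE = 69.1  # approximate statute miles per degree of latitude


def ensemble_spread_miles(positions):
    """Mean distance (miles) of ensemble members from their average position.

    positions: list of (lat, lon) in degrees, one per model run, all valid
    at the same forecast time.
    """
    n = len(positions)
    mean_lat = sum(lat for lat, _ in positions) / n
    mean_lon = sum(lon for _, lon in positions) / n
    # A degree of longitude shrinks with latitude, so scale east-west offsets.
    lon_scale = math.cos(math.radians(mean_lat))
    distances = [
        math.hypot((lat - mean_lat) * MILES_PER_DEGREE,
                   (lon - mean_lon) * lon_scale * MILES_PER_DEGREE)
        for lat, lon in positions
    ]
    return sum(distances) / n


# Invented three-day-ahead positions from four model runs (illustration only).
uncooked = [(29.9, -90.1), (30.0, -89.9), (29.8, -90.0), (30.1, -90.2)]  # tight bundle
cooked = [(29.9, -90.1), (30.3, -87.0), (28.5, -93.5), (30.4, -85.5)]    # scattered
print(round(ensemble_spread_miles(uncooked)), round(ensemble_spread_miles(cooked)))
```

A small spread is the uncooked box of spaghetti; a large one is the plate thrown against the wall, and it is a signal to expect the forecast to shift with later updates.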
So, did we know 10 years ago on Aug. 14, 2005 that the little swirl of clouds in the central Atlantic just deemed Tropical Depression 10 would unfold into the epic event known as Katrina? Certainly not, or, certainly not with any form of confidence. But what we do know is that if another Katrina occurs today, the forecast for that storm would almost certainly be even better than it was for Katrina in 2005 and that it would be enriched with predictions of confidence. And chances are The Weather Company is where you would find the most accurate forecasts of it, the level of confidence behind it and the context to help you understand and use the forecast.
