Proper data science practices and reliable models fill out a small historical dataset on wildfire hazard
Welcome to the third installment of our four-part series on wildfire hazard and risk management, where we will explore CoreLogic’s wildfire risk models and their optimal use within the insurance industry ecosystem. We recommend you read parts one and two prior to reading this piece. Watch for part four, which will be released in the coming weeks.
The California Department of Insurance (CDI) enacted several regulatory measures to address homeowners' and business owners' insurance-related concerns regarding wildfires. Citing a lack of transparency in wildfire models, the CDI requires that models satisfy a set of conditions before insurers can use them in ratemaking and underwriting.
One CDI stipulation is that risk score models, including the CoreLogic® Wildfire Risk Score, must be validated against historical wildfire events and losses before they can be considered approved tools for insurers.
To fully support its clients, CoreLogic has aligned its suite of wildfire offerings with all CDI regulatory requirements.
However, it is crucial to consider both the current and the long-term views of wildfire risk. Understanding both requires following best practices for model development and evaluation, with particular attention to why historical losses alone cannot be used to develop a model.
How To Train a Model: Reducing Biases by Combining Data and Wildfire Science
Some CDI-approved models validate well against historical losses because they were developed and trained using only historical data. However, this raises a problem when it comes to their validity in modeling today's wildfire hazard and its accompanying risk.
A model trained on historical events will naturally validate well against historical losses, because its hazard component accurately reflects the hazard at that time. This looks good on paper, but it builds a clear bias into the model and prevents it from adapting to the dynamic nature of wildfire hazard and risk.
A historical bias skews estimated risk toward previously burned areas, and because such a model is static, it does not account for how the risk has changed since those past events.
Additionally, the historical record alone does not capture the full range of potential wildfire events that could occur today, let alone those that might happen in the future. To be considered realistic and effective, models should be trained on a variety of datasets, both past and present, and pass all historical and current validation tests.
Most importantly, a model should not use the same dataset for development and validation. Using the same dataset for both steps could bias the validation process and fail to provide a realistic review of the long-term efficacy of the model’s output.
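To make this separation concrete, here is a minimal Python sketch of one common safeguard, a temporal holdout, in which records before a cutoff year are used for development and later records are reserved for validation. The `FireRecord` type, the `temporal_split` function, the cutoff year, and the sample parcels are all hypothetical illustrations, not part of the CoreLogic Wildfire Risk Score or its validation process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FireRecord:
    """Hypothetical parcel-year observation (illustration only)."""
    parcel_id: str
    year: int
    burned: bool


def temporal_split(records, cutoff_year):
    """Split records so that development data strictly precede validation data.

    Records from years before `cutoff_year` go to the development set;
    records from `cutoff_year` onward are held out for validation, so no
    observation is used in both steps.
    """
    development = [r for r in records if r.year < cutoff_year]
    validation = [r for r in records if r.year >= cutoff_year]
    return development, validation


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    records = [
        FireRecord("A", 2015, True),
        FireRecord("B", 2017, False),
        FireRecord("A", 2020, False),
        FireRecord("C", 2021, True),
    ]
    dev, val = temporal_split(records, cutoff_year=2019)
    # The two sets never share an observation.
    assert not set(dev) & set(val)
    print(len(dev), len(val))  # 2 2
```

The same principle applies to other partitions (by region, by event, or at random); the essential property is that no observation appears in both the development and validation sets.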
Uncover Insights Into Areas With High Wildfire Risk
The CoreLogic Wildfire Risk Score incorporates data from many sources (see part one of our blog series) that are not directly linked to previous wildfire activity. There is one minor caveat: the composition data input layer accounts for the propensity of certain vegetation types to grow in post-wildfire soil chemistry, so past fires influence that layer indirectly.
The granularity, accuracy, and scientific validity of the data inputs are what make the CoreLogic Wildfire Risk Score a constantly evolving and accurate representation of wildfire hazard today.
This approach validates well against historical losses when the Wildfire Risk Score in effect at the time of each fire is used for the comparison. Likewise, today's Wildfire Risk Score accurately represents 2023 wildfire activity when compared with current wildfire footprints. The risk score methodology remains unchanged, but updated input data layers (fuels, composition, etc.) are incorporated in each annual update so that the score reflects wildfire risk today. When considering wildfire risk past, present, and future, this approach is a much more effective way to model wildfire hazard.
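One way to picture a point-in-time comparison is to score each burned or unburned parcel with the score vintage that was in effect in the year of the fire, rather than with today's score. The Python sketch below assumes annual score vintages keyed by release year and treats a vintage as in effect from its release year onward; the function names, data layout, and sample values are hypothetical and do not describe CoreLogic's actual validation procedure.

```python
from bisect import bisect_right


def score_at_time(vintages, fire_year):
    """Return the score from the latest vintage released on or before fire_year.

    `vintages` is a list of (release_year, score) tuples sorted by release_year.
    Returns None when no vintage predates the fire.
    """
    release_years = [year for year, _ in vintages]
    idx = bisect_right(release_years, fire_year)
    return vintages[idx - 1][1] if idx else None


def _mean(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


def mean_score_by_outcome(parcels, fire_year):
    """Average point-in-time score for burned versus unburned parcels.

    `parcels` maps parcel_id -> (vintages, burned_flag). Parcels with no
    vintage in effect at fire_year are skipped.
    """
    burned, unburned = [], []
    for vintages, was_burned in parcels.values():
        score = score_at_time(vintages, fire_year)
        if score is None:
            continue
        (burned if was_burned else unburned).append(score)
    return _mean(burned), _mean(unburned)


if __name__ == "__main__":
    # Hypothetical parcels with 2016 and 2019 score vintages.
    parcels = {
        "A": ([(2016, 80), (2019, 85)], True),
        "B": ([(2016, 20), (2019, 30)], False),
        "C": ([(2016, 60), (2019, 90)], True),
    }
    # For a 2018 fire, the 2016 vintage is the one in effect.
    print(mean_score_by_outcome(parcels, fire_year=2018))  # (70.0, 20.0)
```

Comparing group means is only the simplest possible check; the sketch is meant to show the time alignment between score vintage and fire, not a complete validation.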
How To: A Real Risk Evaluation
Proper model development and validation are challenging, but they are fundamentally necessary to ensure accuracy, especially when models are used to make major financial decisions.
In the case of wildfire, modelers need to account for a reference timeframe, because it is easy to build a model that is biased toward historical hazard data when the required validation tests are based on historical losses. Even if a model validates under these conditions, it may not accurately reflect today's risk and may either under- or overestimate wildfire risk, which can have severe consequences.
©2023 CoreLogic, Inc. All rights reserved. The CoreLogic statements and information in this report may not be reproduced or used in any form without express written permission. While all the CoreLogic statements and information are believed to be accurate, CoreLogic makes no representation or warranty as to the completeness or accuracy of the statements and information and assumes no responsibility whatsoever for the information and statements or any reliance thereon. CoreLogic® is the registered trademark of CoreLogic, Inc. and/or its subsidiaries.
Contact: Please email [email protected] with any questions regarding your wildfire risk, the CoreLogic Wildfire Risk Score, or any CoreLogic wildfire products.
Please visit www.HazardHQ.com for up-to-date information on current natural catastrophe activity across the globe.