At CoreLogic, we have a long history of applying artificial intelligence and data analytics to our data to derive predictive products and solutions for our customers. These data analytics models are typically developed to solve specific customer use cases, which limits the applications to certain business units.
With the advent of big data and computing technology that enables the capture, processing and storage of huge volumes of data, the challenge for enterprises is applying data science not only in high-data-volume environments, but also across significant data variety and at high data velocity.
Together with advanced machine learning methods, the deployment of the CoreLogic Smart Data Platform™, which addresses these challenges, has made the application of data science at the enterprise level not only feasible, but highly advantageous. This powerful combination makes solutions derived from machine learning available to all products and business units.
Methods
There are many ways to apply data science at the enterprise level that would benefit multiple business units, but we will focus on three applications to illustrate the corporate advantage:
- Data Mastering: Data mastering is a key function for data management in a multi-sourced data supply chain environment. In addition to using traditional methods, such as rule-based master record identification, we also apply sophisticated machine learning techniques to create a set of trusted master records. Key to our effective property data mastering is the assignment of a unique identifier, the CoreLogic Integrated Property Number™, to each unique property data record.
- Entity Matching: There are several techniques used for entity matching in our enterprise data repository. In addition to employing rule-based and deterministic methods for matching data records, a multi-tier entity matching engine, including machine learning algorithms, is used for matching data at the enterprise level.
- Data Enrichment: CoreLogic has acquired an impressive array of data assets to support our various businesses, which introduces a unique opportunity to leverage such diverse data sets. Working with a strong foundation of data mastering and entity matching capabilities, we can accurately combine disparate data sets from multiple sources. By applying data science methods at an enterprise level, we can improve and enrich property data records to a higher level of completeness and accuracy.
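As a rough illustration of the multi-tier idea described under Entity Matching, the sketch below tries a deterministic match first and falls back to a similarity score only when that fails. It is a minimal sketch, not CoreLogic's matching engine: the function names, the `P-0001` style identifiers, the sample addresses and the 0.85 threshold are all hypothetical, and a plain string-similarity ratio from Python's standard library stands in for the machine learning tier.

```python
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in address.lower()
    )
    return " ".join(cleaned.split())


def match_record(address: str, masters: dict[str, str], threshold: float = 0.85):
    """Return (master_id, tier) for the best match, or (None, "unmatched").

    `masters` maps a hypothetical master record ID to its address string.
    """
    key = normalize(address)

    # Tier 1: deterministic match on the normalized address.
    for master_id, master_address in masters.items():
        if normalize(master_address) == key:
            return master_id, "deterministic"

    # Tier 2: similarity score, an assumed stand-in for a trained model.
    best_id, best_score = None, 0.0
    for master_id, master_address in masters.items():
        score = SequenceMatcher(None, key, normalize(master_address)).ratio()
        if score > best_score:
            best_id, best_score = master_id, score

    if best_score >= threshold:
        return best_id, "fuzzy"
    return None, "unmatched"


# Made-up master records, for illustration only.
masters = {
    "P-0001": "100 Example Ave, Sampletown, CA 90000",
    "P-0002": "12 Main Street, Otherville, IL 60000",
}

exact = match_record("100 EXAMPLE AVE., Sampletown CA 90000", masters)
fuzzy = match_record("100 Example Avenue, Sampletown, CA 90000", masters)
missing = match_record("999 Nowhere Blvd, Elsewhere, TX 70000", masters)
```

Running the cheap deterministic tier first means the more expensive scoring step only sees records that rules could not resolve, and returning the tier alongside the identifier makes it easy to audit how each match was made.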
Advantages
For the CoreLogic Smart Data Platform to succeed, we need to acquire the right data to analyze. To support efficient data operations and deliver insights to our customers, our data science and artificial intelligence techniques allow us to draw fully on a vast number of structured and unstructured data sources. By combining data sets that may seem unrelated to a human observer, we can now detect patterns and behaviors that could not be observed within data silos or without machine learning methods.
With our Smart Data Platform, we have the opportunity to acquire, blend, integrate, and converge all kinds of data, regardless of source and format. By performing data science upstream at the enterprise level, we increase data connectivity, completeness and accuracy, enabling more effective data-driven insights. As a result, this newly created and enhanced dataset can be used more effectively by applications and products downstream.
Staying Focused
There are many benefits to performing data science at the enterprise level, but there are also risks and challenges. One challenge is keeping research and development aligned with business drivers. With a plethora of diverse data, experimentation can easily turn into an endless loop.
- Rigorously Define Scope: We have found that one key to success is working closely with business stakeholders at the outset to identify the specific business problem we are trying to solve, and then rigorously defining the scope and deliverables for a Proof of Concept (POC). Once the POC gets underway, we engage in a disciplined technical peer review process to ensure that the machine learning algorithm and/or data science approach selected for the solution is viable.
- Plan for Model Production: We work with our business partners to establish a clear Minimum Viable Product (MVP) definition to avoid any misalignment of expectations. Engaging early with our engineering team has also proven to be important to ensure that the right technical resources are aware of Model Deployment and Model Training requirements and can plan accordingly.
To realize the benefits of data science for any company, it is important to establish an effective process to bring experimentation all the way to the finish line – model production. Sophisticated machine learning algorithms that never get deployed are just good ideas.
Conclusion
Ultimately, the biggest challenge for data science to succeed, regardless of where it is applied, is to get the greatest value out of all the available data. Applying data science at the enterprise level maximizes the benefits for the entire organization, as opposed to one specific product or solution. As more machine learning applications are deployed and used, more artificial intelligence can be introduced into the data supply chain to increase insights and predictive power. The holy grail is to achieve prescriptive solutions enabled by systems that continuously learn from the data and adapt in real time to a dynamically changing environment of incoming data.
By Stanley Wu, Senior Leader, Data Science & Architect