Author’s note: this article is a follow-up to The Coronavirus: Forecasting During Periods of Structural Change.
Never before did we attach such importance to a stable internet connection, did everyone so unanimously agree on the great value of the healthcare industry, or was a trip to the supermarket the adventure of the week. The impact of COVID-19 has completely infiltrated our society, and we have been forced to face this new situation and adjust. This holds true for analytical models too. How can we evaluate which models should be adjusted, and which results can be trusted afterwards?
Black box models
Many models are so-called black box models: models whose outputs cannot be explained or interpreted by humans, because their inner workings are not visible or understandable to us. David Cox, IBM director of the MIT-IBM Watson AI Lab, explains the dangers these black box models entail, specifically when facing changing circumstances: “It’s very dangerous if you don’t understand what’s going on internally within a model in which you shovel data on one end to get a result on the other end. The model is supposed to embody the structure of the world, but there is no guarantee that it will keep working if the world changes” (Leprince-Ringuet, 2020). By definition, a model represents, or models, some part of the world around us. To be valuable, the model’s structure needs to resemble the structure of the reality it represents; this is achieved by basing the model’s structure on data that describes this reality.
In a changing world, the structure of reality evolves. Models based on ‘old’ data, that is, data collected before (some of) the changes, will most likely not resemble the new reality closely enough. Some models will be rendered useless by the changing circumstances, but as black box models do not readily show what their results are based on, it can be hard to figure out which models are impacted, and to what extent.
Even once it is known which models no longer produce satisfactory results, these models still need to be adjusted to the new situation. More questions arise here: which changes will most likely improve the models, and which new models will optimise results? Again, this is hard to know, since these new models cannot be interpreted by humans either, and often cannot yet be sufficiently tested against historical data.
Piyanka Jain, a highly regarded industry thought leader in Data Science, stresses the importance of intuition in Data Science: “… we are all intuitive beings, and if we marry data to that, we can really optimize our decisions” and “… putting data to work is all about marrying data to intuition that you already have” (Eremenko, 2020). However, in order to apply intuition to a model’s results, one needs to understand the model’s inner workings.
This is where white box, or explainable, models come into play. Explainable models can be understood by humans, or even expressed in a human-readable format. With such models, it becomes much more feasible to check their behaviour against intuition.
For example, when looking at an explainable model built to forecast air pollution, it might become clear that this model bases its results partly on traffic data. Intuitively, geographical regions currently in lockdown (due to the Coronavirus, for example) are likely to experience a decrease in traffic. The model might never have been trained on such low traffic numbers, and could therefore produce inaccurate results when exposed to this data. Thanks to the model’s explainability, intuition can indicate that this model probably needs to be adjusted. It goes even further: this intuition can also reveal which part(s) of the model need to be adjusted, namely the model’s decision-making process (structure) when confronted with low traffic rates.
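To make "human-readable format" concrete, here is a minimal sketch using ordinary least-squares linear regression, one of the simplest explainable models, in plain Python. It fits a coefficient for each named input and prints the model as a formula a person can read and sanity-check. The helper names `fit_linear` and `describe`, the variable names and the toy data are hypothetical; the article does not prescribe any particular model type.

```python
from typing import Dict, List


def fit_linear(columns: Dict[str, List[float]], target: List[float]) -> Dict[str, float]:
    """Fit ordinary least squares by solving the normal equations (X^T X) b = X^T y.

    Returns a mapping from term name to coefficient, including "intercept".
    """
    names = ["intercept"] + list(columns)
    # Design matrix rows: a leading 1.0 for the intercept, then each input value.
    rows = [[1.0] + [columns[c][i] for c in columns] for i in range(len(target))]
    k = len(names)
    # Augmented system [X^T X | X^T y].
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * y for r, y in zip(rows, target))]
         for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * v for x, v in zip(a[r], a[col])]
    # After elimination each row holds one diagonal entry and the right-hand side.
    return {name: a[i][k] / a[i][i] for i, name in enumerate(names)}


def describe(target_name: str, coefs: Dict[str, float]) -> str:
    """Render the fitted model as a human-readable formula."""
    parts = [f"{coefs['intercept']:.2f}"]
    for name, value in coefs.items():
        if name == "intercept":
            continue
        sign = "-" if value < 0 else "+"
        parts.append(f"{sign} {abs(value):.2f} * {name}")
    return f"{target_name} = " + " ".join(parts)


traffic = [100.0, 80.0, 60.0, 40.0, 20.0, 90.0]
temperature = [10.0, 15.0, 12.0, 20.0, 25.0, 5.0]
# Toy target generated from a known rule so the fitted formula can be checked by eye.
pollution = [10.0 + 0.5 * t - 0.2 * c for t, c in zip(traffic, temperature)]

coefs = fit_linear({"traffic": traffic, "temperature": temperature}, pollution)
print(describe("pollution", coefs))
```

Because every coefficient is attached to a named input, a reader can see at once that this toy model expects pollution to rise with traffic, which is exactly the kind of statement intuition can confirm or challenge.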
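That intuition can be backed by a simple check. The sketch below assumes we know which input variables the model relies on (which is what explainability provides) and compares recent values with the range seen in the training data, flagging variables whose recent values mostly fall outside it. The function names, the 20% threshold and all figures are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def out_of_range_share(train: Dict[str, List[float]],
                       recent: Dict[str, List[float]]) -> Dict[str, float]:
    """For every variable the model uses, return the share of recent values
    that lie outside the [min, max] range seen during training."""
    shares = {}
    for name, history in train.items():
        lo, hi = min(history), max(history)
        new_values = recent.get(name, [])
        outside = sum(1 for v in new_values if v < lo or v > hi)
        shares[name] = outside / len(new_values) if new_values else 0.0
    return shares


def variables_to_review(shares: Dict[str, float], threshold: float = 0.2) -> List[str]:
    """Variables whose out-of-range share exceeds the (arbitrary) threshold."""
    return sorted(name for name, share in shares.items() if share > threshold)


# Hypothetical daily traffic counts and temperatures before and during a lockdown.
train = {"traffic": [820, 910, 760, 880, 950, 700],
         "temperature": [4.0, 7.5, 12.0, 15.5, 9.0, 6.0]}
recent = {"traffic": [310, 290, 450, 720, 260],
          "temperature": [8.0, 10.5, 13.0, 11.0, 9.5]}

shares = out_of_range_share(train, recent)
print(shares)
print(variables_to_review(shares))
```

A range check only catches values that are individually extreme; shifts that stay within the historical range, or unusual combinations of values, need other drift measures.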
Model Features and Visualisation
Tangent Works’ model building engine, TIM (Tangent Information Modeller), generates forecasting and anomaly detection models based on time series data, and presents these models in a human-readable format. TIM accomplishes this by automating the feature engineering process. First, TIM goes through all input variables and determines which subset will contribute to the final result. Second, TIM generates many additional, artificially created features based on the original variables (expansion). Then, this newly created set of features is narrowed down to a smaller, useful subset (reduction) to achieve model stability and prevent overfitting. A user can then understand the model through the features it uses and the extent to which each feature contributes to the final result.
A final step in profiting from explainable models lies in model visualisation, which makes the model fully explainable by allowing users to interpret it at a glance. One such visualisation is a tree map, which graphically shows the proportional importance of not only the input variables, but also all features derived from them. An example tree map can be found below. Alternatively, sunburst diagrams can do a nice job of visualising explainable models too. An example of such a sunburst diagram can also be found below.
In an evolving world and ever-changing society, one can never be sure what the future will bring. However, entire species survive or perish by their ability to adapt. On their own level, so do businesses. There’s power in explainability.
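The expansion and reduction idea can be illustrated with a deliberately simple sketch. This is not TIM's algorithm, which the article does not describe in detail. Here, expansion adds lagged copies and a trailing moving average of each input, and reduction greedily keeps the candidates most correlated with the target while skipping any that are nearly redundant with a feature already selected; the separate input-selection step is folded into this reduction. The lag set, window length, thresholds, function names and data are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

Series = List[Optional[float]]


def expand_features(inputs: Dict[str, List[float]],
                    lags: Sequence[int] = (1, 2), window: int = 3) -> Dict[str, Series]:
    """Expansion: keep each input and add lagged copies plus a trailing moving average.

    Positions without enough history are filled with None.
    """
    features: Dict[str, Series] = {}
    for name, values in inputs.items():
        features[name] = list(values)
        for lag in lags:
            features[f"{name}_lag{lag}"] = [None] * lag + list(values[:-lag])
        features[f"{name}_ma{window}"] = [
            None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
            for i in range(len(values))
        ]
    return features


def pearson(x: Series, y: Series) -> float:
    """Pearson correlation over the positions where both series have a value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a, _ in pairs))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for _, b in pairs))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def reduce_features(features: Dict[str, Series], target: Series,
                    max_features: int = 3, redundancy: float = 0.95) -> List[str]:
    """Reduction: rank candidates by |correlation with the target| and greedily
    keep the strongest ones, skipping near-duplicates of features already kept."""
    ranked = sorted(features, key=lambda f: abs(pearson(features[f], target)), reverse=True)
    selected: List[str] = []
    for name in ranked:
        if len(selected) == max_features:
            break
        if all(abs(pearson(features[name], features[s])) < redundancy for s in selected):
            selected.append(name)
    return selected


traffic = [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0]
# Hypothetical target that simply follows the previous period's traffic.
target = [6.0] + traffic[:-1]

candidates = expand_features({"traffic": traffic})
print(sorted(candidates))
print(reduce_features(candidates, target))
```

Because the toy target is a one-step delay of traffic, the lag-1 feature should be picked first; the names of the surviving features are what make the reduced model readable.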
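Behind a tree map or sunburst diagram sits a simple two-level hierarchy: each derived feature's share of the total importance, grouped under the input variable it came from. A minimal sketch, assuming per-feature importance scores are already available and that each feature name records its parent variable before a double underscore; that naming convention, the function name and the scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def importance_hierarchy(scores: Dict[str, float]) -> Dict[str, Tuple[float, List[Tuple[str, float]]]]:
    """Group feature importances by parent variable and normalise them to shares.

    Returns {variable: (variable_share, [(feature, feature_share), ...])}, where
    every share is a fraction of the grand total, so each rectangle in a tree map
    (or each ring segment in a sunburst) is sized in proportion to its share.
    """
    total = sum(abs(v) for v in scores.values())
    if total == 0:
        raise ValueError("at least one importance score must be non-zero")
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for feature, score in scores.items():
        parent = feature.split("__", 1)[0]
        groups[parent].append((feature, abs(score) / total))
    return {
        parent: (sum(share for _, share in children),
                 sorted(children, key=lambda c: c[1], reverse=True))
        for parent, children in groups.items()
    }


scores = {"traffic__lag1": 4.0, "traffic__ma24": 2.0,
          "temperature__raw": 1.5, "wind__raw": 2.5}
for variable, (share, children) in importance_hierarchy(scores).items():
    print(f"{variable}: {share:.0%}")
    for feature, child_share in children:
        print(f"  {feature}: {child_share:.0%}")
```

Plotting libraries with tree map or sunburst support, such as Plotly's `treemap` and `sunburst` charts, typically take this kind of parent, child and value data as input.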
About the author
Elke Van Santvliet is a Product Manager at Tangent Works. She focuses on bringing TIM’s capabilities to business users, by exposing the underlying functionality through various platforms and tools. This includes Tangent Works’ own web interface TIM Studio, as well as a range of data-related products such as Alteryx, Power BI and Qlik Sense.
Elke is passionate about data in all its aspects, and is always open to discussing the newest trends in AI or diving deep into a specific data science use case.
References
Leprince-Ringuet, D. (2020, May 14). Artificial intelligence is struggling to cope with how the world has changed. Retrieved May 19, 2020, from https://www.zdnet.com/article/ai-models-are-struggling-to-cope-with-our-rapidly-changing-world/
Eremenko, K. (Producer). (2020, May 6). SDS 363: Intuition, Frameworks, and Unlocking the Power of Data [Audio podcast]. https://www.superdatascience.com/podcast/intuition-frameworks-and-unlocking-the-power-of-data