Using big data to reduce risk in the workplace

I recently had the privilege to join a panel discussion on the use of big data for predicting and reducing occupational risk. The Q&A below is a wrap-up of the most important points.

Q: How can big data be leveraged to enhance organisations' proactive response to potential risks?

A: In simple terms, “big data” refers to datasets with many rows (units of analysis) and many columns (attributes of those units), where the number of rows often grows continuously. In practice, this means we know a great deal (attributes) about a lot of people, environments, products and business units, measured repeatedly over a long period of time. When the dataset also contains columns that represent the outcomes or events we want to predict and prevent, we can use supervised machine learning methods. Analysing big data for risk prediction and prevention then comes down to using algorithms to find and quantify the attributes and patterns in the data that are associated with elevated risk. These attributes may point to specific job roles, times of day, weeks of the year, geographical locations, or demographic characteristics. Sometimes the dataset doesn’t include outcome data. In that case, we may still use machine learning to recognise risky situations by deploying a pre-trained model. For instance, we can use computer vision models to analyse video feeds on construction sites and watch for unauthorised persons entering a high-risk zone.
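To make the idea concrete, here is a minimal sketch of finding attributes associated with elevated risk. The incident records, attribute names and the 50% threshold are all illustrative assumptions, not a real dataset or method prescribed here; in practice one would use a proper supervised learning library on far more data.

```python
from collections import defaultdict

# Hypothetical incident log: each record is (job_role, shift, incident_occurred).
# All names and values are illustrative only.
records = [
    ("scaffolder", "night", 1), ("scaffolder", "night", 1),
    ("scaffolder", "day",   0), ("scaffolder", "day",   1),
    ("electrician", "day",  0), ("electrician", "day",  0),
    ("electrician", "night", 1), ("electrician", "day", 0),
]

def incident_rates(records, attribute_index):
    """Incident rate per attribute value (e.g. per job role or per shift)."""
    counts = defaultdict(lambda: [0, 0])  # value -> [incidents, total observations]
    for rec in records:
        counts[rec[attribute_index]][0] += rec[2]
        counts[rec[attribute_index]][1] += 1
    return {k: inc / total for k, (inc, total) in counts.items()}

def elevated_risk(records, attribute_index, threshold=0.5):
    """Attribute values whose incident rate exceeds the (assumed) threshold."""
    return {k for k, rate in incident_rates(records, attribute_index).items()
            if rate > threshold}

print(incident_rates(records, 1))  # incident rate per shift
print(elevated_risk(records, 1))   # → {'night'}: night shifts exceed the threshold
```

The same grouping logic extends to any attribute column; a real model would additionally quantify uncertainty and control for interactions between attributes.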

Q: How can organisations effectively employ predictive analytics to reduce workplace accidents and injuries?

A: The first step is to get clarity from the organisation’s leadership on the objectives, the targets, and the levers available for changing the environment and processes that the risk originates from. Once this context is clear, it’s a matter of using the organisation’s data assets and infrastructure (and potentially improving them in the process) to develop fit-for-purpose datasets. The third step is to apply the appropriate methods to build the models. Which method is best will depend entirely on the precise question and objective at hand, and the available data. Lastly, the results of the models must be turned into action. Some actions are simple and can be automated. For instance, a computer vision model that detects an unauthorised person entering a high-risk zone can send an automatic signal that sounds an alarm and stops the machine, thus neutralising the risk. Sometimes the actions are more complex and multi-layered, and require active buy-in from a range of stakeholders. In that case, the model results must be turned into understandable, relatable, engaging “data stories” that feature interactive graphs of forecasts and “what-if scenarios”. These data stories can help the organisation’s leadership and all of the stakeholders understand what must be done to avert future accidents and injuries. Importantly, preventative analytics is not a once-off initiative. It’s a continuous process of learning, improving the quality of the source data, retraining the models, and updating the insights and data stories.
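The automated-action example can be sketched in a few lines. The event structure, field names and confidence threshold below are hypothetical assumptions for illustration; a real deployment would integrate with the site's actual camera and machine-control systems.

```python
# Assumed confidence cut-off for acting on a detection; tuned per deployment.
ALERT_THRESHOLD = 0.8

def respond_to_detection(event):
    """Map a (hypothetical) computer-vision detection event to actions:
    sound the alarm and stop the machine when an unauthorised person
    is detected in a high-risk zone with sufficient confidence."""
    actions = []
    if (event["label"] == "unauthorised_person"
            and event["zone"] == "high_risk"
            and event["confidence"] >= ALERT_THRESHOLD):
        actions.append("sound_alarm")
        actions.append("stop_machine")
    return actions

# A high-confidence detection in a high-risk zone triggers both actions.
print(respond_to_detection(
    {"label": "unauthorised_person", "zone": "high_risk", "confidence": 0.93}))
# → ['sound_alarm', 'stop_machine']
```

Keeping the decision rule in one small, auditable function makes it easy for safety officers to review exactly when the system will intervene.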

Q: What key factors ensure the reliability and accuracy of risk prediction data?

A: Risk predictions are only as good as the data they are based on. To produce accurate and reliable risk predictions, one needs:

  • Sufficient data. Small datasets produce predictions with a lot of residual uncertainty.
  • Complete data. When certain sections of the organisation are underrepresented, this can lead to biased and misguided predictions.
  • Timely data. If things are evolving, it is crucial to have a view of the most recent situation and how it has changed from before.
  • Accurate data. Measurements (e.g. from sensors or self-reported behaviour) that are imprecise and/or skewed produce poor-quality risk predictions.

For this reason, initiatives to predict risk should include not only the risk models themselves, but also analysis of metadata. The organisation must have a real-time view of the quality of the data that went into the risk model. This drives efforts to address and improve data quality and, ultimately, the value of the risk predictions.
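A minimal sketch of such metadata analysis, assuming a hypothetical set of sensor readings, might compute two of the qualities listed above per business unit: completeness (share of non-missing values) and timeliness (days since the latest reading). All names and dates are illustrative.

```python
from datetime import date

# Hypothetical sensor readings: (business_unit, reading_date, value).
# None marks a missing measurement.
readings = [
    ("plant_a", date(2024, 6, 1), 7.2),
    ("plant_a", date(2024, 6, 2), None),
    ("plant_a", date(2024, 6, 3), 6.9),
    ("plant_b", date(2024, 5, 1), 8.1),
]

def quality_metadata(readings, today):
    """Per-unit data-quality metrics: completeness and days since last reading."""
    meta = {}
    for unit in {u for u, _, _ in readings}:
        rows = [(d, v) for u, d, v in readings if u == unit]
        completeness = sum(1 for _, v in rows if v is not None) / len(rows)
        days_since_last = (today - max(d for d, _ in rows)).days
        meta[unit] = {"completeness": completeness,
                      "days_since_last": days_since_last}
    return meta

print(quality_metadata(readings, today=date(2024, 6, 4)))
```

Feeding metrics like these into a dashboard alongside the risk predictions themselves gives leadership the "real-time view" of data quality described above.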

Q: If successful application of predictive models leads to the absence of unwanted events, how does one prove that the models are effective and how does one make the business case for continued investment in preventative analytics of big data?

A: To get a sense of the effectiveness of risk-reducing models in production, we can develop an additional layer of scenario models on top of the predictive machine learning models. In scenario modelling, we construct a counterfactual scenario, i.e. what we believe would have happened if we hadn’t implemented the predictive models. We also construct a factual scenario that aligns with what we actually observed. By comparing the two scenarios and quantifying the difference, we get a quantitative estimate of the impact the predictive models have made on safety and health.
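In its simplest form, the comparison above can be sketched as follows. The incident counts are invented for illustration, and the counterfactual here is a deliberately naive assumption (the pre-deployment average rate continuing unchanged); real scenario models would account for trends, seasonality and confounders.

```python
# Illustrative monthly incident counts (not real data).
incidents_before = [12, 11, 13, 12]  # before the predictive models went live
incidents_after = [9, 7, 6, 5]       # observed after deployment

def estimated_impact(before, after):
    """Incidents averted under the naive counterfactual that the
    pre-deployment average incident rate would have continued unchanged."""
    counterfactual = [sum(before) / len(before)] * len(after)
    return sum(counterfactual) - sum(after)

print(estimated_impact(incidents_before, incidents_after))  # → 21.0
```

Here the counterfactual projects 12 incidents per month for four months (48 in total) against 27 observed, suggesting roughly 21 incidents averted; the quality of such an estimate rests entirely on how defensible the counterfactual scenario is.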

Besides scenario modelling, we can also make the business case for continued investment in preventative analytics of big data by reframing the objectives. Instead of merely aiming to avert undesirable outcomes such as injury, death or depression, we can pursue the more positive and holistic outcome of a thriving workforce. Why stop at the absence of injury and disease when we could set our sights on models that contribute to higher job satisfaction, stronger social cohesion, better mental and physical health, and productivity gains?
