By now, just about every organisation that is serious about its mission and (growth) targets has recognised that data science and AI have an important role to play as accelerators and facilitators. For both high-level strategic decisions and lower-level operational decisions, leaders who are supported by advanced analytics and other AI tools make better decisions, faster.

To get there, however, a Data First mindset is needed: the realisation that an organisation’s internal data is a unique raw ingredient that must be harvested in order to create priceless assets. The first big speed bump in the road is fragmented data. Over the years, a wide range of platforms and tools has been adopted across departments. But because each of the organisation’s digital solutions was designed and used for a specific purpose, the data that is generated and stored by each platform and tool is siloed. This typically leads to data that is difficult to integrate, analyse, learn from and act upon.

Common situations include:

  • Data that logically belongs to the same data subject (e.g. a customer or an employee) but cannot easily be joined, i.e. connected, because the datasets don’t share common unique identifiers for the data subjects;
  • Data that is duplicated, resulting in wasteful use of storage space and computing resources;
  • A slow analysis process, which costs the organisation more time and money than necessary;
  • Superficial or even flawed insights from the data, because important features, i.e. variables, are missing from the analysis.
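
The first two situations can be made concrete with a small sketch. It assumes two hypothetical siloed sources, a CRM export and a billing export, that share no identifier, and it uses a normalised email address as a stand-in join key. All record layouts and field names are invented for illustration, and exact matching on email is only a heuristic: real data often needs fuzzier matching and manual review.

```python
# Hypothetical records from two siloed tools; every field name is an assumption.
crm_records = [
    {"crm_id": "C-001", "email": "Ana.Silva@Example.org ", "segment": "retail"},
    {"crm_id": "C-002", "email": "ben.okoro@example.org", "segment": "wholesale"},
    {"crm_id": "C-003", "email": "ana.silva@example.org", "segment": "retail"},
]
billing_records = [
    {"account_no": 9001, "contact_email": "ana.silva@example.org", "total_spend": 1250.0},
    {"account_no": 9002, "contact_email": "chen.li@example.org", "total_spend": 310.0},
]


def normalise_email(raw):
    """Build a comparable key by trimming whitespace and lower-casing."""
    return raw.strip().lower()


def deduplicate(records, email_field):
    """Keep the first record per normalised email.

    Returns a dict of unique records keyed by normalised email,
    plus a list of the records that were flagged as duplicates.
    """
    unique = {}
    duplicates = []
    for record in records:
        key = normalise_email(record[email_field])
        if key in unique:
            duplicates.append(record)
        else:
            unique[key] = record
    return unique, duplicates


def join_on_email(crm, billing):
    """Inner-join two keyed dicts and report the keys left unmatched on each side."""
    matched = [{**crm[key], **billing[key]} for key in crm if key in billing]
    crm_only = [key for key in crm if key not in billing]
    billing_only = [key for key in billing if key not in crm]
    return matched, crm_only, billing_only


crm_unique, crm_dupes = deduplicate(crm_records, "email")
billing_unique, billing_dupes = deduplicate(billing_records, "contact_email")
matched, crm_only, billing_only = join_on_email(crm_unique, billing_unique)

print("joined records:", matched)
print("duplicate CRM records:", crm_dupes)
print("in CRM only:", crm_only)
print("in billing only:", billing_only)
```

Reporting the unmatched keys alongside the joined records matters as much as the join itself: the records that fail to match are the ones whose features would otherwise go missing from the analysis without anyone noticing.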

An entire industry has emerged to solve the problem of fragmented data. The upside is that there is almost certainly a solution that meets your organisation’s needs. The downside, of course, is that it is difficult to decide with confidence which solution to pursue. Solutions vary greatly in configuration costs and monthly subscription fees, in the volume, velocity and variety of data they are best suited to handle, in user-friendliness, and in their ability to adapt to changes in data sources and in the applications that the integrated data will be used for.

Hence, designing and implementing a fit-for-purpose solution starts with a detailed understanding of the organisation’s current and future data ecosystem and the questions that the integrated data is expected to answer. In our experience, while this business analysis is essential, it is also guaranteed to be incomplete. As an organisation gains control over its own data and learns what it does and does not have available, new data collection tools may be commissioned, old ones may be retired, and new use cases for the integrated data will likely emerge. Therefore, it is key to build an adaptable, modular data architecture that is able to grow with the organisation.

We have found that producing value fast is important to get organisation-wide buy-in and traction. We typically start with a few datasets and focus on completing the “golden thread”: from ingesting the raw data and curating the relevant data tables to producing meaningful insights and communicating them to the relevant stakeholders. With all stakeholders committed and engaged, data integration becomes an exciting, iterative journey on which the client organisation and we, as data science consultants, co-create an evolving and maturing solution that can solve increasingly large and complex problems and unlock more and more opportunities for growth and impact.
