Five steps that can save your data analytics – and help you save face
By SAS Insights staff
Of everything awkward I’ve had the misfortune to sit through, nothing’s been as bad as an analyst unable to defend his results. It’s usually not because he lacks skill or smarts. Instead, it’s normally because he didn’t think about his data value chain.
Being put on the spot is a great learning experience, even if it’s like extracting teeth. No one likes being hammered. The nasty thing is that it’s so easy to fall into the trap. Getting from data to insight is rarely a straight path. And the more complex the process, the harder it becomes to debug and explain.
While it’s probably a sad indictment of my SQL skills, I rarely get what I want the first time. After a few queries I’m usually close enough to start doing some real analysis. Unfortunately, that’s when I realise the data I need is the data I thought I didn’t. So, it’s back to the warehouse for another round of SQL fun.
After joining, transforming, modeling, re-modeling, and banging my head against a wall for a few hours, I’ve usually ended up with something I can use. Ideally, it’s something I’ll use more than once. So, I’ll automate the process.
And (insert giant flag here), this is where people shoot themselves in the foot.
To make things as efficient as possible, they’ll take everything they did and wrap it up in a single job. Data comes in one end, magic happens, and answers pop out the other.
It’s efficient and works perfectly. It’s a marvel of engineering. It’s something that data scientists will talk about in hushed tones for decades, usually only after extensive initiation rituals.
Unfortunately, it’s only perfect until it suddenly isn’t. In the immortal words of Jeff Goldblum, “Life, uh, finds a way.” In the also immortal words of Murphy, “Anything that can go wrong, will.”
At some stage, things break. And, without being able to track changes through the underlying data value chain, it’s absolutely impossible to determine why or where things are going wrong.
A classic example involved an organisation I knew that saw their customer base shift dramatically. On Monday, they serviced mainly young professionals. By Friday, their models were telling them that they serviced octogenarians almost exclusively.
This obviously made no sense. Neither their churn nor acquisition rates suggested that anything like this was possible. Short of discovering a hitherto unknown time machine, it seemed more than a little likely to be a mistake. Needless to say, it also did little to build the analytical marketing team’s credibility.
After more than a few late nights and sour stomachs, they eventually worked out that one of their data sources had stopped updating. This issue cascaded and eventually ruined their segmentation model.
Sadly, they only got to the answer after they’d re-validated their model, reviewed all their imputations, and double-checked all their reports for accuracy. While they eventually fixed the problem, they could have avoided it entirely had they preserved lineage through their data value chain.
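A stalled feed like that one might be caught early with a freshness check on every source before the model runs. Here is a minimal sketch, assuming each source records the timestamp of its last load. The source names, the one-day `max_age` threshold, and the `find_stale_sources` helper are all hypothetical, not anything that team actually used.

```python
from datetime import datetime, timedelta


def find_stale_sources(last_loaded, now, max_age=timedelta(days=1)):
    """Return the names of sources whose latest load is older than max_age."""
    return sorted(
        name
        for name, loaded_at in last_loaded.items()
        if now - loaded_at > max_age
    )


# Hypothetical load timestamps for two feeds, checked on a Friday morning.
now = datetime(2024, 1, 5, 9, 0)
last_loaded = {
    "crm_customers": datetime(2024, 1, 5, 2, 0),  # refreshed overnight
    "demographics": datetime(2024, 1, 1, 2, 0),   # quietly stopped on Monday
}

stale = find_stale_sources(last_loaded, now)
print(stale)  # ['demographics']
```

Failing the job (or at least raising an alert) when the list is not empty points you at the broken source before anyone starts re-validating the model.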
Every analytical process goes through the same five steps. The easiest way to quickly identify and fix issues with analytical data management is to make sure every step (where possible) is isolated and preserved. These five milestones are:
- Sourcing raw data.
- Cleansing and imputing data.
- Calculating transformations and derivations.
- Calculating value-added data.
- Translating data into a consumable form.
Isolating and preserving the data associated with each of these steps makes it easier for teams to maintain, optimise, and debug their day-to-day work. Not only does this reduce risk, but it also drives efficiency and, in the long run, reduces ramp-up times. While it carries additional storage costs, the non-technical benefits and maintenance efficiencies greatly outweigh the incremental investment needed to cover storage.
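To make the idea concrete, here is a rough sketch of a job that keeps the five steps separate and writes a snapshot after each one. Everything in it is an assumption for illustration: the toy customer records, the stage functions, the segmentation rule, and the JSON files stand in for whatever warehouse tables and models a real team would use.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def source_raw(_):
    # Step 1: stand-in for a warehouse query (hypothetical records).
    return [
        {"id": 1, "age": 34, "spend": 120.0},
        {"id": 2, "age": None, "spend": 80.0},
        {"id": 3, "age": 29, "spend": 200.0},
    ]


def cleanse(rows):
    # Step 2: impute missing ages with the mean of the known ages.
    known = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(known) / len(known)
    return [{**r, "age": mean_age if r["age"] is None else r["age"]} for r in rows]


def derive(rows):
    # Step 3: add a derived column.
    return [{**r, "age_band": "under_30" if r["age"] < 30 else "30_plus"} for r in rows]


def add_value(rows):
    # Step 4: a toy segmentation rule standing in for a real model.
    return [{**r, "segment": "high_value" if r["spend"] >= 100 else "standard"} for r in rows]


def to_consumable(rows):
    # Step 5: summarise into something a report can show.
    return dict(Counter(r["segment"] for r in rows))


STAGES = [
    ("1_raw", source_raw),
    ("2_cleansed", cleanse),
    ("3_derived", derive),
    ("4_value_added", add_value),
    ("5_consumable", to_consumable),
]


def run_pipeline(out_dir):
    """Run each stage in order, preserving its output as a JSON snapshot."""
    out_dir = Path(out_dir)
    data = None
    for name, step in STAGES:
        data = step(data)
        (out_dir / f"{name}.json").write_text(json.dumps(data, indent=2))
    return data


with tempfile.TemporaryDirectory() as tmp:
    result = run_pipeline(tmp)
    snapshots = sorted(p.name for p in Path(tmp).iterdir())

print(result)     # {'high_value': 2, 'standard': 1}
print(snapshots)  # one file per step, from 1_raw.json to 5_consumable.json
```

When the final numbers look wrong, you can open the snapshots in order and find the first one that looks off, rather than re-running and re-validating the whole job.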
It’s a simple step, but it’s a lifesaver when it comes to driving incremental efficiencies. Everything eventually breaks and time spent fixing things is time taken away from creating new value-generating processes and assets.
And, taking too long to fix things is embarrassing. We’re supposed to be experts, right?