The DevOps approach to software delivery manages risk by applying change in small packages instead of big releases. Increasing release frequency lowers overall delivery risk because working capabilities ship more often. The catch is that it can amplify problems with your data: you squeeze the risk out of one aspect of your delivery only to reintroduce it in another. Thin cloning attacks that risk, amplifying the value of DevOps by reducing the data risk inherent in your architecture.
Data Delivery
How is there risk in your architecture? Embracing Agile and DevOps doesn't mean your architecture can support them. For example, one customer I spoke with needed a 3-week infrastructure plan alongside every 2-week agile sprint, because that's how long it took to get their data backed up, transmitted, restored, and ready for use. So, sure, the developers were a lot more efficient. But the cost in infrastructure resources, and the corresponding Total Cost of Data, remained very high for each sprint. And if a failure occurred during data movement, the result would be catastrophic to the Agile cycle.
Data Currency and Fidelity
Another common tradeoff is the hidden cost of using stale data in development. The cost is hidden (at least from the developer's viewpoint) because it shows up as a late breakage event. For example, one customer described their data as evolving so fast that a query developed against stale data might work just fine in development but fail to handle several cases present in more recent production data. Another customer had a piece of code, tested against a subset of data, slow to a crawl two months later during production-like testing. Had they not caught it, it would have caused a full outage.
I contend that the impact of these problems is chronically underestimated because we place too much emphasis on the number of errors and not enough on how early they are detected. Being able to remediate errors sooner is significantly more important than reducing the overall error count. Why? First, because the cost of an error rises dramatically as a project proceeds. Second, because remediating faster avoids the secondary and tertiary effects: time wasted chasing ghost errors and root-causing issues that simply would not be problems if we fixed things faster and operated on fresher data.
Thought Experiment
To test this, I ran a simple thought experiment comparing two scenarios. In both, time is measured by 20 milestones and the cost of an error rises exponentially from "10" at milestone 7 to "1000" at milestone 20. In Scenario A, I hold the number of errors constant and force remediation to occur in 10% less time. In Scenario B, I leave the remediation time constant and shrink the total number of errors by 10%.
Scenario A: Defects Held Constant; Remediation Time Reduced by 10%
Scenario B: Remediation Time Held Constant; Defects Reduced by 10%
In each graph, the blue curve represents the before state and the green curve the after state. In both scenarios, the before-state total cost of errors was $2.922M. Comparing the two graphs, shrinking total remediation time by 10% saved $939k, while shrinking the total number of errors by 10% saved $415k. In other words, even though the curves barely moved, the dollar value of the change was significant when time to remediate was the focus: reducing remediation time by 10% was worth more than twice as much as reducing the number of defects by 10%. In this thought experiment, TIME is the factor driving the cost companies pay for quality; the sooner and faster something gets fixed, the less it costs. In other words, shifting left saves money. And it doesn't have to be a major shift left to produce a big increase in savings.
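If you want to reproduce the shape of this result yourself, here is a minimal sketch of the thought experiment in Python. The exponential cost curve (10 at milestone 7, 1000 at milestone 20) comes from the text above; the defect distribution, the flat cost assumed before milestone 7, and the mechanics of the 10% time shift are my own assumptions, so the absolute figures will not match the $2.922M baseline, only the relative advantage of shifting left.

```python
# Rough sketch of the thought experiment. The cost curve (10 at milestone 7,
# 1000 at milestone 20) is from the text; defect counts, the flat cost before
# milestone 7, and the handling of the 10% shift are assumptions.

GROWTH = (1000.0 / 10.0) ** (1.0 / (20 - 7))  # per-milestone exponential growth factor

def cost_at(milestone):
    """Cost of remediating one error at a given (possibly fractional) milestone."""
    if milestone <= 7:
        return 10.0                              # assumed flat before milestone 7
    return 10.0 * GROWTH ** (milestone - 7)

def total_cost(defects, shift_fraction=0.0):
    """Total remediation cost for {milestone: defect_count}, optionally fixing
    each defect earlier by shift_fraction of the 20-milestone timeline."""
    return sum(
        count * cost_at(max(1.0, m - shift_fraction * 20))
        for m, count in defects.items()
    )

# Hypothetical defect profile: 50 defects at each of the late milestones.
baseline = {m: 50 for m in range(10, 21)}

before     = total_cost(baseline)
scenario_a = total_cost(baseline, shift_fraction=0.10)              # fix 10% sooner
scenario_b = total_cost({m: c * 0.9 for m, c in baseline.items()})  # 10% fewer defects

print(f"Before:      {before:12,.0f}")
print(f"Scenario A:  {scenario_a:12,.0f}   savings: {before - scenario_a:12,.0f}")
print(f"Scenario B:  {scenario_b:12,.0f}   savings: {before - scenario_b:12,.0f}")
```

Even with a different defect profile the pattern holds: on an exponential cost curve, moving a defect a couple of milestones earlier removes far more cost than removing 10% of the defects outright.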
The Promise of Thin Cloning
The power of thin cloning is that it addresses both key aspects of data freshness: currency and timeliness. Currency measures how stale a copy is compared to its source [see Segev, ICDE '90]; timeliness measures how old the data is since its creation or last update at the source [see Wang, JMIS '96]. These two concepts capture the real architectural issue in most organizations. There is a single point of truth somewhere with the best data (high timeliness), but it is very difficult to keep every copy of that data in fidelity with that source (currency), and the difficulty rises with the size of the dataset and the frequency with which the target copy needs to be current. Yet DevOps pushes you squarely in that direction.
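As a concrete illustration of the two measures (my own simple framing, not a formula from the cited papers), the sketch below treats currency as the time since the copy was last refreshed from the source and timeliness as the time since the data itself last changed at the source:

```python
from datetime import datetime, timezone

def freshness_report(source_updated_at, copy_refreshed_at, now=None):
    """Report the two freshness measures for a single dataset copy.

    currency:   how far the copy lags its source (time since the last refresh)
    timeliness: how old the data itself is (time since it changed at the source)
    """
    now = now or datetime.now(timezone.utc)
    return {
        "currency": now - copy_refreshed_at,
        "timeliness": now - source_updated_at,
    }

# Example: a nightly-refreshed dev copy of data the source updated an hour ago.
report = freshness_report(
    source_updated_at=datetime(2014, 6, 2, 11, 0, tzinfo=timezone.utc),
    copy_refreshed_at=datetime(2014, 6, 1, 23, 30, tzinfo=timezone.utc),
    now=datetime(2014, 6, 2, 12, 0, tzinfo=timezone.utc),
)
print("currency:  ", report["currency"])    # 12:30:00, the copy lags by half a day
print("timeliness:", report["timeliness"])  # 1:00:00, the data changed an hour ago
```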
Today, most people accept the consequences of low fidelity and poor currency because the benefits of a DevOps approach are so large. That is, they accept that some code will fail because it's not tested on full-size data, that they will miss cases because the data is evolving too quickly, and that they will chase down ghost errors caused by old or poor data.
But with thin cloning solutions like Delphix, this issue simply goes away. Large, even very large, databases can be fully refreshed in minutes. That means full-size datasets with minutes-old timeliness and minutes-old currency.
So what?
Even in shops that are state of the art, with the finest minds and the best processes, the results of thin cloning can be dramatic. One very large customer was struggling to close their books each quarter: the close period ran over 20 days, with more than 20 major errors requiring remediation. With Delphix, that close now takes 2 days, and the errors have dropped to the point of being undetectable. Across a large swath of customers, we're seeing an average reduction of 20-30% in the overall development cycle. With Delphix, you're DevOps ready, prepared for short iterations, and capable of delivering a smooth data supply at much lower risk.
Shifting your quality curve left saves money. Data quality through fresh data is the key to shifting that curve left. Delphix is the engine that delivers high-quality, fresh data to the right person in a fraction of the time it takes today.