Data Control

kyle Hailey
Sep 9, 2013
8 min read

Video of webinar given on Data Control by myself and James Murtagh of Red Gate

Businesses today face a difficult conundrum: industry giants such as Google, Facebook, and others have taught people to expect service that’s reliable and fast, yet budgets are getting tighter by the fiscal year. Instead of being able to provide quality, on-time applications that are bug free and high performance, struggling IT departments are forced to cut back on systems and resources for developers.

The most vital resource for applications can also be one of the most cumbersome: the database. Data is the backbone to our most important applications, and as such it is stockpiled and maintained in a robust database engine—yet the contents can become quite large and unruly. This makes data difficult to manage, particularly when mobility is required. A lack of data mobility can cause severe issues in the lifecycle of a product as it impacts good QA and development.

This problem has become so widespread in fact that development teams have turned to new measures in order to complete their work despite bottlenecks in data management. These measures are collectively summed up as the DevOps movement.

From Wikipedia: DevOps (a portmanteau of development and operations) is a software development method that stresses communication, collaboration and integration between software developers and information technology (IT) professionals. DevOps is a response to the interdependence of software development and IT operations. It aims to help an organization rapidly produce software products and services.

One of the most important aspects of DevOps is release management. Standardized development environments are becoming more common and tools like git and SVN are almost universally usable to ensure a quality release and deployment process. This facilitates the goals of fast development, quality assurance, and rapid feature/maintenance release schedules. There’s only one problem with this: the database doesn’t play along. Yet.

Databases have no release-based data control like developers have for code. With code it is easy to have a trunk with branches, merge in changes, and preserve integrity with a check-out and check-in process. In fact, most databases used for development purposes are not only woefully behind production in terms of data freshness, but also are used for development across multiple releases without any possibility of identifying how the data looked or should look at the time of release. The database is just there—a giant glob of data that doesn’t follow the same rules as the code and often gets in the way of an otherwise efficient process.

In the database world, it’s common to hear the idiom “Cheap, fast, or good – pick two.” Some might even say you can only pick one . In all, the outlook is very bleak.

Here is what we at Delphix are seeing in the industry:

Inefficient QA ends up costing companies thousands or millions as bugs make their way into production.
QA delays mean extra work for project managers, QA teams, and IT teams as people scramble to find and fix issues and manage project deadlines.
Developers and QA teams sharing databases causes bottlenecks that result in difficult coding and testing practices and slower time to delivery.
Creating development or QA databases from subsets of production data (usually due to size) results in bugs due to impractical data volumes.
Slow provisioning and infrastructure rollout means delays that impact an application from the very foundation on up and result in massive schedule overruns.

Let’s look at these issues in a bit more detail.

1. Inefficient QA

One Delphix customer we were talking to spent 94% of every QA cycle building the QA environment and only 6% actually running the QA testing suite. This cycle was repeated every single time a set of features required testing and had to happen for every QA suite. The long pole in the environment build was provisioning the database the code had to run on. The burden was so overwhelming it completely monopolized their allotted time.

2. QA Delays make fixing bugs more expensive

When building a QA environment takes a long time the integrity of the application suffers. In the above case, if only 6% of QA time is spent on actual testing it is very easy for bugs to go unnoticed. When this happens there can only be two possible outcomes: either the bug finally does get caught in QA but due to the delay in bug capture it requires more code or testing to roll back and fix, or the bug doesn’t go noticed and makes its way into production where it can cost customer confidence or large sums of revenue. The longer it takes for a problem to be found and fixed, the more cost goes up.

3. Using subset databases causes bugs

For most companies it is impossible to give all developers their own full size copy of the production database (at some companies, they can’t even provide one full size copy). As such, they will create subset copies of production for development and QA purposes. This takes a huge amount of time to write and maintain scripts to create data subsets and verify data coherency and referential integrity; in fact, one customer of ours estimates that 50% of their DBA time is dedicated to creating subset database clones. And because the data is not representative of production data, it often leads to the introduction of bugs. When the QA environment is similarly crippled, it may be impossible to isolate and verify a fix to the bug once it is caught. And with a subset, there will be bugs—another customer of ours estimated that 20% of their production bugs were due to using subsets.

Queries that run fine on subsets either run into new anomalies on production data, or hit the wall as performance of a query tanks when run on a full dataset size.

4. Development teams sharing on full copy leads to bottlenecks

An alternative to subsets (or sadly, sometimes used in combination) is having the development team share a single copy of the production database. Sharing a full copy requires that any changes to the database schema, content, or metadata pass through code review to ensure that the changes won’t break the work of other developers. Instead of developers being able to perform their own work quickly and efficiently and merge it into the final release, they are forced to work in a synchronous manner that wastes time. A large online auction site we talked to said this review cycle took one to two weeks before a developer could get changes into a shared database copy. In today’s fast paced world that is simply unacceptable.

Because so many people are using the shared resource and because that resource is so big and time consuming to build, it can be a very long time between refreshes of the data. This leads to old unrepresentative data, which incurs its own set of risks every bit as dangerous as data subsets.

5. Slow environment builds and the culture of “NO”

A huge IT hurdle to tackle is creating development environments in the first place. Data is growing faster than our infrastructures are improving, and the size of our databases means that cloning or refreshing environments can take days or even weeks to accomplish. Because development environments are incredibly dependent on real-time data, this results in lost productivity and reliability. One large global bank told us that it generally took 6 months to build their environments out. Another customer who used NetApp storage snapshots said it still took them 2-4 days due to their system of entering tickets for requests and then trying to schedule time and resources from the storage admin, system admin, and DBA teams. All of this hassle leads to a culture of “NO”, where developers and other front-line product creators are simply told that meeting their needs is not possible. Sadly, many developers are used to the idea that if they need a copy of the data they will be told “NO” in most cases.

Can I get a copy ? No ! Result: Developer motivation

Data is the Problem

Getting the right data to the right people at the right time and in the right form is the problem. The problem is data agility—or more to the point, a lack thereof.

A lot of people think they understand the extent of their data problems. But let’s go through what we’ve learned from our customers in the industry:

96% of QA time is spent on building environments instead of running QA tests
90% of lost development days attributed to waiting for development environments
50% of DBA time spent making database copies
20% of bugs because of using subset data instead of full copies

Budget cuts put pressure on IT teams which limits system resources, storage, and personnel. These limits reduce the number of systems available and increase the time to provision these resources. These limits impact developers and contractors that work on these systems leading to project delays and reduced quality.

The Total Cost of Data

Data is expensive. In most calculations, the total cost of data only includes storage costs; however, storage costs are just the tip of the iceberg. For every GB or TB of storage there are teams of storage admins, system admins, and DBAs who have to manage the storage, move data, backup data, and provision clones for development and other needs. The more data there is to move, the longer it takes and the more expensive it is due to the personnel involved, schedule overruns, and time spent discussing the requirements and tasks in meetings. The total cost of data goes far beyond simple storage costs.

Delphix Solves the Data Problem

I’ve written extensively on Delphix elsewhere. You can see some of the more pertinent cases here:

In a nutshell, Delphix makes creating clone copies so fast that it is nearly unimaginable to even the most modern of IT teams. These clones can be provisioned in only a few minutes no matter what the source database size and require almost no storage overhead because duplicate blocks are shared across all clone copies. Because the clones are fast and almost free they can be provisioned for all the developers; in fact, a developer can easily be given multiple copies of the full production database. The clones can also be made fresh, just a few minutes behind production, in a matter of minutes.

When this technology was explained to one company we work with, the overwhelming response was “No way.” When the first clone of a 7TB database was created in 7 minutes and a second clone (from a time period in the past, no less) was created in an additional 10 minutes the response was, “No freaking way.” The implications of having two full-sized fresh clones to any point in time in a matter of minutes with no additional storage overhead were almost impossible to comprehend.

Database Clone Copies with Delphix: Full , Fresh, Fast, Free

What the Industry can accomplish with Delphix

Efficient QA: Low cost, high utilization
Quick QA: Fast bug fixes and problem resolution
Developer Agility: Everyone gets a database for parallelized development
Full DBs: Less bugs in development, QA, and production
Fast builds: Fast development and the culture of Yes

1.Efficient QA

With Delphix, QA environments can be spun up in minutes and branched directly from the development database.

2.Quick QA

Because QA environments can be spun up quickly, QA can test immediately at the end of a sprint to find bugs quicker and reduce the time and work needed to fix those bugs.

3.Developer Agility

With Delphix, each developer can have one or more full copies of a database and can refresh with new data in minutes. Developers will no longer be blockers to each other because of a single point of wait.

4.Full DBs

With Delphix, the development databases are full copies so there are no longer bugs due to small and inadequate representations of real data.

5.Fast Builds

With Delphix, developers can provision their own clone copies in minutes without having to coordinate with storage admins, system admins and DBAs.

Dev Ops Management with Delphix

A big question is: If each developer has their own database, how do we manage the merging of their changes into a release? This is where Red Gate Schema Compare comes into play. For example, consider this process:

Delphix: Create 3 Virtual Databases
DEV1
DEV2
TRUNK
Have two developers perform their work
Developer 1 works on DEV1
Developer 2 works on DEV2
Red Gate Schema Compare performs merges
DEV1 changes move into TRUNK
DEV2 changes move into TRUNK

This process can be easily summed up in a visual flow:

From a timeline point of view, the process is as simple as it is agile:

Quality Assurance can easily be a part of this process. The steps are more robust, but with near instant cloning it becomes trivial while remaining thorough: