What I see over and over again is development and QA teams using subsets of data: a full data set is not used for testing until near the end of the release cycle. Once full data finally comes into play, testing flushes out more bugs than can be fixed in the time remaining, forcing release dates to slip or releases to ship with bugs.
Why do people use subsets? Here is one story. At one customer, full copies of production data for developers and QA were driving excessive storage usage, so to cut costs the application teams decided to move development and QA onto subsets of production. Here is how it played out:
Data kept growing and storage costs ran too high, so the teams decided to roll out subsetting
App teams and IT Ops teams had to coordinate and manage the complexity of the shift to subsets in dev/test
Scripts had to be written to extract correct, coherent data, e.g. pulling the right date ranges while respecting referential integrity (see the first sketch after this list)
It’s difficult to build a subset with 50% of the data but 100% of the skew; a naive 50% sample keeps only about 50% of each value’s rows, so the rare values that drive real-world behavior all but vanish (see the second sketch after this list)
Scripts were constantly breaking as production data evolved, requiring ongoing rework of the subsetting code
QA teams had to rewrite automated test scripts to run correctly on subsets
Time lost across the ADLC/SDLC to make subsets work (converting a CapEx saving on storage into higher OpEx) put pressure on release schedules
Errors were caught late in UAT, performance, and integration testing, creating “integration or testing hell” at the end of development cycles
Major incidents occurred post-deployment, forcing more detailed root cause analysis (RCA) tracking
Some 20-40% of the production bugs causing downtime were traced to non-representative data sets and volumes
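To make the script-maintenance pain concrete, here is a minimal sketch, in Python with SQLite and hypothetical tables (customers, orders, order_items), of the kind of extraction logic these subsetting scripts had to implement: anchor on a date range, then walk the foreign keys in both directions so nothing dangles.

```python
# Minimal sketch of a referential-integrity-preserving subset extraction.
# Table names, columns, and data are hypothetical, for illustration only.
import sqlite3

src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         order_date TEXT);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY,
                              order_id INTEGER REFERENCES orders(id),
                              sku TEXT);
    INSERT INTO customers VALUES (1,'acme'),(2,'globex');
    INSERT INTO orders VALUES (10,1,'2023-11-02'),(11,2,'2024-02-14');
    INSERT INTO order_items VALUES (100,10,'A-1'),(101,11,'B-7');
""")

# 1. Anchor the subset on a coherent date range of the driving table.
orders = src.execute(
    "SELECT id, customer_id, order_date FROM orders WHERE order_date >= ?",
    ("2024-01-01",),
).fetchall()
order_ids = {o[0] for o in orders}
customer_ids = {o[1] for o in orders}

# 2. Pull referenced parents and dependent children so no FK dangles.
customers = [c for c in src.execute("SELECT id, name FROM customers")
             if c[0] in customer_ids]
items = [i for i in src.execute("SELECT id, order_id, sku FROM order_items")
         if i[1] in order_ids]

print("orders:", orders)        # only the 2024 order survives...
print("customers:", customers)  # ...with its customer...
print("items:", items)          # ...and its line items.

# 3. In real life this walk must cover every table and FK in the schema,
#    which is why these scripts broke whenever production evolved.
```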
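And a small demonstration of the skew point: uniformly sampling 50% of rows roughly halves every value's absolute count, so the rare values that exercise indexes, optimizer plans, and edge-case code paths are the first to fade from the subset. The country codes below are made up for illustration.

```python
# Demonstration of skew loss under naive 50% sampling.
import random
from collections import Counter

random.seed(42)

# Hypothetical skewed column: 98% "US" plus a long tail of rare values.
rows = ["US"] * 9800 + ["LU"] * 120 + ["IS"] * 60 + ["MT"] * 20
random.shuffle(rows)

subset = random.sample(rows, len(rows) // 2)  # naive 50% subset

print("full:", Counter(rows))
print("50% :", Counter(subset))
# The tail values shrink to a handful of rows, so the queries and code
# paths that depend on them go untested until full-volume testing at
# the end of the cycle.
```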
Moral of the story: if you roll out subsetting, hold the teams accountable and track the total cost and impact across teams and release cycles. What is the real cost of going to subsets? How much extra time goes into building and maintaining them, and more importantly, what does it cost to let bugs slip into production because of them? If, on the other hand, you can test on full-size data sets, you will flush bugs out early, where they can be fixed fast and cheaply.
A robust, efficient, and cost-saving alternative is database virtualization. With database virtualization, database copies take up almost no space, can be made in minutes, and all the overhead and complexity listed above goes away (see the sketch after the list below). In addition, database virtualization reduces CapEx/OpEx in many other areas, such as:
Provisioning operational reporting environments
Providing controlled backup/restore for DBAs
Full-scale test environments
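To illustrate why virtual copies are nearly free, here is a hedged sketch of the copy-on-write mechanics that database virtualization builds on, driven from Python via ZFS snapshots and clones. The dataset name tank/prod-db is hypothetical, and a commercial platform wraps these primitives in automation, masking, and self-service; treat this purely as an illustration of the storage math (it requires root and a ZFS pool to actually run).

```python
# Copy-on-write clones: many "full" databases, roughly one database of storage.
import subprocess

def zfs(*args: str) -> None:
    # Thin wrapper; check=True raises if the zfs command fails.
    subprocess.run(["zfs", *args], check=True)

# 1. Freeze a point-in-time image of the production copy (instant, and it
#    consumes no extra space until blocks diverge).
zfs("snapshot", "tank/prod-db@release-42")

# 2. Hand each developer and QA suite a writable clone of that image.
#    Clones share unmodified blocks with the snapshot, so ten "full"
#    databases cost about one database of storage plus their deltas.
for env in ("dev-alice", "dev-bob", "qa-suite-1"):
    zfs("clone", "tank/prod-db@release-42", f"tank/{env}")

# 3. Show actual space usage: USED is the per-clone delta, REFER is the
#    full logical size each environment sees.
subprocess.run(["zfs", "list", "-o", "name,used,refer"], check=True)
```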
Subsets also lack the data control features that database virtualization provides to accelerate application projects (analytics, MDM, ADLC/SDLC, etc.). Our customers repeatedly see roughly 50% acceleration in project timelines and cost, savings that generally dwarf the CapEx and OpEx storage line items, thanks to the features we make available in our virtual environments (a sketch of how these operations map onto copy-on-write primitives follows the list):
Fast data refresh
Integration
Branching (split a copy of a dev database off for use in QA in minutes)
Automated secure branches (masked data for dev)
Bookmarks for version control or compliance preservation
Sharing (pass errors plus the data environment from test to dev: if QA finds a bug, they can hand a copy of the database back to dev for investigation)
Reset/rollback (recover to pre-test state or pre-error state)
Parallelization of all steps (run QA suites against multiple QA databases at once; give every developer their own copy of the database so no one blocks anyone else)
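As a rough sketch of how the features above map onto copy-on-write primitives (again using ZFS from Python, with hypothetical dataset names): each operation is seconds of metadata work rather than hours of copying, which is what makes branching, bookmarking, sharing, and resetting cheap enough to use constantly.

```python
# Branch / bookmark / share / reset / parallelize as snapshot-and-clone
# operations. Dataset names are hypothetical; a real database
# virtualization platform exposes these as one-click or API operations.
import subprocess

def zfs(*args: str) -> None:
    subprocess.run(["zfs", *args], check=True)

# Bookmark: preserve a named, immutable point in time (version control,
# compliance evidence).
zfs("snapshot", "tank/dev-db@before-schema-change")

# Branch: split a QA copy off the dev database in minutes.
zfs("snapshot", "tank/dev-db@handoff")
zfs("clone", "tank/dev-db@handoff", "tank/qa-db")

# Reset/rollback: mark the pre-test state, then return to it after the
# suite runs so the next run starts clean.
zfs("snapshot", "tank/qa-db@pre-test")
# ... QA suite runs and mutates tank/qa-db ...
zfs("rollback", "tank/qa-db@pre-test")

# Share: QA hits a bug, freezes the failing state, and dev clones it for
# investigation, error and data environment together.
zfs("snapshot", "tank/qa-db@bug-1234")
zfs("clone", "tank/qa-db@bug-1234", "tank/dev-repro-1234")

# Parallelize: every developer gets a private full-size copy.
for dev in ("alice", "bob", "carol"):
    zfs("clone", "tank/dev-db@handoff", f"tank/dev-{dev}")
```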