Theory of Constraints: the rise of NoSQL
Why did NoSQL arise? The rise of NoSQL makes no sense if you read an article like “Why You Should Never Use MongoDB” and the ycombinator replies . Then I read something like this “Why You Should Use MongoDB” and don’t find any reasons just ranting about why the first article was shoddy.
One theory proposed by Martin Fowler is that NoSQL arose as a way to get around the constraint of the DBA bottleneck. When developers want to change the schema or get copies of the source database the DBAs are just seen as an obstacle. The obstacle of provisioning environment is told in nail biting prose in Gene Kim’s “The Phoenix Project”
Long before NoSQL and The Phoenix project, about 13 years ago, I was the DBA at a startup, back in the first dot com bubble. As a startup we were trying to get our project to market ASAP. The development team was working crazy hours and the schema was changing all the time. As the DBA, it was my responsibility to validate all the schema changes and apply the schema changes to the development database. Such validation and application could take a couple hours considering all the other work that was on my plate. This couple hour delay was considered such a hinderance to developers that they unilaterly decided to dump the structured schema and go to an EAV data store. EAV stands for “Entity Attribute Values” model. From wikipedia
There are certain cases where an EAV schematic is an optimal approach to data modelling for a problem domain. However, in many cases where data can be modelled in statically relational terms an EAV based approach is an anti-pattern which can lead to longer development times, poor use of database resources and more complex queries when compared to a relationally-modelled data schema.
The EAV “schema” (more like schema-less) meant that perfromance was dreadful and the SQL was impossible to debug as it was completely obfuscated. An example of an EAV query might look ike
Select f.symbol from strings_eav eav1, integer_eav eav2, strings_eav eav3, fundmstr f where f.id = eav1.id and eav1.id = eav2.id and eav1.id = eav3.id and eav1.attr_num = (select attar_num from attributes where attr_id = :"SYS_B_00") and eav1.attr_value=:"SYS_B_01" and eav1.act_date <= SYSDATE and eav1.end_date > SYSDATE and eav2.attr_num = (select attar_num from attributes where attr_id = :"SYS_B_02") and eav2.attr_value=:"SYS_B_03" and eav2.act_date <= SYSDATE and eav2.end_date > SYSDATE and ((f.def_attr_templ = :"SYS_B_04" and eav3.attr_num=(select attar_num from attributes where attr_id = :"SYS_B_05") and eav3.attr_value >= to_date(:"SYS_B_06") and eav3.act_date <= SYSDATE and eav3.end_date > SYSDATE and exists ( select null from integer_eav eav4 where f.id = eav4.id and eav4.id in ( select attr_num from attributes where attar_id = :"SYS_B_07" or attar_id = :"SYS_B_08") and eav4.attr_value = :"SYS_B_09" and eav4.act_date <= SYSDATE and eav4.end_date > SYSDATE) and not exists ( select null from integer_eav eav5 where f.id = eav5.id and eav5.id = ( select attr_num from attributes where attar_id = :"SYS_B_10")
Eventually as development stabilized and performance was abominable we were able to get everyone on board and move the EAV back into a structured schema. Once we went to a structured relational schema the load on the database fell from completely saturated to almost idle.
Moral of the story
The ironic thing is that many of the NoSQL setups are basically EAVs and/or databases that allow users to make “schema” type changes with out the intervention of a DBA. For example in Mongo or Couchdb, since they are JSON based, you can just add new fields to the JSON and the database deals with it. If you are using a database like Riak, it’s just a key value store, which is basically an EAV.
EAV can be good for quick development but for production it’s completely unreadable thus impossible to debug and the performance is atrocious when it comes to relational type database operations of selecting all the fields in a row and joining rows from different “tables” together.
Instead of using EAV one could give each developer a virtual relational database, like using Delphix with Oracle, and let them work as fast as they can and then let them test out merging there changes in virtual databases before merging them in to trunk which is itself a virtual database that can be branched from and merged into.
Virtual databases take up almost no storage and can be made in minutes with a developer self service interface.