The lost art of the database

As tools simplify and replace the relational database model, we risk a generation of junior developers missing a key skill.

Matt Burgess

--

When I was first starting as a developer nearly 20 years ago, my mentor gave me some of the best advice I’ve ever had.

60% of your application design is in your database.

The point was that the frontend validation and backend logic should merely enforce what the database already contains. If the entries should be unique, constrain the database with a UNIQUE index. If that field is required, it should be NOT NULL. If your data has relationships, they should be a foreign key. Wherever possible, the database should enforce its own logic and integrity.

The intervening years haven’t changed my view on this. If anything, the percentage is low. There are paradigms, technologies, and approaches now that weren’t common then, and we build new classes of application. Yet the vast majority of what we do really comes down to writing interfaces to a database.

It’s partially because of this that I find arguments about how “active record is an anti-pattern” to be misguided.

I used to be very fussy about my database structure. I made sure the order of the fields was always “correct” (id first, other ids next, short text fields next, longer text fields, and finally any kind of status). I maintained that order even after altering tables — harder than it sounds with some tools. I made sure I had indexes (or indices if you like) on relevant fields. I carefully assigned appropriate field types and lengths. I normalized tables strictly and efficiently.

But one thing that has defined my career is not what I’ve started doing, but what I’ve stopped doing. When using frameworks we must accept a loss of “control” and optimisation for a gain in overall productivity. When I started using Laravel I changed a few of my personal conventions (but honestly surprisingly few) and accepted its own. I also accepted its migration patterns and approaches as an ideal solution to the fairly challenging problem of version controlling a database. I acknowledged that the order of fields doesn’t really matter, and got on with being more effective and productive with the software.

But I also got slack.

Laravel doesn’t need foreign keys to join tables, as long as you follow the conventions I already used. And you don’t really have to put in the field lengths. And it works fine without indexing.

In development and the small applications I was building, none of this would really make a difference to performance or reliability.

But recently I worked with a junior developer to help him structure his own Laravel application’s data, and I realised that he had literally no knowledge of normalization or database design. He had a category with a 1:1 relationship joined via an association table (which he called a pivot because Laravel). He had fields with different naming conventions. He had fields named as the entity, ie product rather than product_id. He had fields with the wrong type, too long, or too short. He had repeated data for category listed in the table as strings instead of out in a lookup table. He was setting no defaults, and didn’t have fields appropriately flagged as NOT NULL. He had no index other than (thankfully) an auto-incrementing primary key.

None of this is his fault. He’s self-taught, like many of us, and was just never taught this skill. Increasingly, though… is anyone?

Look at the market now. Stacks and frameworks like MEAN and Meteor promote and encourage MongoDB, a schema-less NoSQL database as their persistence layer. Mongo and similar tools are used and recommended fairly uncritically as a “modern” storage solution.

I’ve spoken quite unkindly about Mongo before, but to dispense with that, at very least, Mongo is a technology with limited uses that should be chosen advisedly with a full knowledge of the alternatives. Mongo is great for certain usages, especially when data structure is fluid or variable.

But I have also seen a number of people use Mongo inappropriately for highly relational data simply because that’s all they know.

This should be concerning.

I should also clarify that this isn’t intended to be a criticism of ORMs. I wouldn’t even consider not using an ORM for any significant project now. Whether a DataMapper or an Active Record one. They are too valuable an abstraction. But using them well requires a decent knowledge of what that abstraction is written over, and that includes database optimization and normalization.

Tools like Laravel’s Eloquent make starting and migrating databases trivially easy but, like any other tool that lowers the barrier to entry, they can amplify a lack of knowledge in this area.

Knowledge of RDBMS is too valuable a skill to let fade out. We need to encourage the art and science of quality database design. If you’re developer who has started off by learning the MEAN stack and then just stuck with Mongo as a generic solution I strongly urge you to look at something like Postgres and gain the knowledge to make informed decisions for your storage platform.

To help out I’ve written a moderately extensive article covering database normalisation, what it means, and what the benefits are.

--

--

Matt Burgess

Senior Web Developer based in Bangkok, Thailand. Javascript, Web and Blockchain Developer.