In Defence of ORMs
I’ve seen a strange amount of bickering about this particular pattern, and it honestly surprises me that there are two sides of this. Let me explain why one side’s opinion is so very wrong.
For a start let me get one thing very clear. ORMs are a tool that I support. That doesn’t mean I think that all ORMs are good by default. You can make a dogshit version of an excellent pattern. Stating that UnicornTearsORM is hot garbage doesn’t make ORMs hot garbage in general, it makes UnicornTears a bad library.
I say this as a user and supporter of ORMs who thinks Sequelize is a punishment for my varied, and enthusiastic sins.
One thing that seems to be abundantly clear in the debate is that people fundamentally misunderstand what ORMs are for. They are an abstraction. I have written about abstractions previously and in fact ORMs were an example I provided of them.
The issue people seem to be mostly missing is that ORMs are an abstraction, and specifically what they are an abstraction of.
A database can store two things. The first is obvious: data. The second is less obvious, but equally important: the relationships between data.
It is that second part that is hard to abstract. It is that second part that is hard to understand. It is that second part that ORMs are designed to help you deal with.
This post was spawned mostly by two articles. The first is Why You Should Avoid ORMs. The second is Objection To ORM Hatred . Largely a response to the first, which counters some points but, IMO, comes to the wrong conclusions.
These aren’t the only places I’ve seen this discussion, though. It’s come up more than once on Reddit threads I can’t find or be bothered to search for. Many of these brought up the same - IMO invalid - talking points.
I’m going to talk about ORMs here in the context of Laravel specifically. The reason for that is simple: Laravel’s ORM, called Eloquent, is exceptionally good and very well documented. The comments I’m making mostly apply to any ORM, but specifically to any Active Record ORM.
Active Record is a design pattern popularised by Rails, whose own ORM is simply called Active Record for that reason. The general model has been adopted by a huge number of ORMs since. The overall pattern is one of simply mapping one database table to an entity type, such that one row is equal to one object instance. Built-in methods help with things like updating, creating, and relationship configuration.
It’s not a complex pattern, and in fact it’s intuitive enough that some of the first PHP software I ever wrote I reinvented the pattern for myself in custom code, before later rediscovering it in ORMs.
Active Record isn’t the final word in ORMs. There are more advanced solutions like Data Mapper patterns that some of these comments simply don’t apply to. Just assume I’m referring to “ORMs” as a shorthand for Active Record-Based ORMs.
What Are ORMs For (and why do people use them)?
ORMs are intended to make the process of accessing and storing data in a database more flexible and manageable. They are intended not to *avoid* the complexity of SQL queries, but to isolate and codify it. Using an ORM you define the shape of your data and more critically the shape of your relationships.
What ORMs are abstracting isn’t “string generation”. It’s the entities, and the relationships between them. Using an ORM, and using it well, isolates the nature of the relationship between the entities into one fixed place.
Take something like a post, with comments. The comments belongTo the post. The post hasMany comments. That’s not going to change. No other relationship really makes sense. Explicitly defining the relationship between the two in the ORM model might look something like this.
public function comments() {
return $this->hasMany(Comment::class);
}
And then you never have to think about how they connect again.
The above is written in Eloquent code, by the way.
More importantly you never have to think about the fact that they are a join. The way the data is retrieved… not your problem. For better or for worse and we’ll get back to that.
This has a lot of benefits. For a start, you can get your post and comments really easily to display, again, in Eloquent.
$posts = Post::with('comments')->get();
But selecting data is the easy part. It seems like everyone forgets that you need to do more than select. You also need to add or edit data, create associations, etc.
Post->findOrFail($id)->addComment(new Comment())
We could do the same thing in Sequelize, too. Sequelize is the dominant SQL-based ORM for NodeJS.
(await Post.findByPk(id)).addComment({whateverData})
Which is the easier update query?
const post = Post.getById(12);
post.title = 'ORMs are cool'
await post.save();
Or this?
UPDATE posts SET title="ORMs are cool" WHERE id = 12;
Let’s be honest, it’s pretty much a wash, right? Neither is really much harder or easier and they’re both quite clear. The SQL is one line, but I’ve skipped the code required to actually execute it.
So what about this?
$post = Post.findOrFail(12);
$post->update(request->body);
$post.save();
We’ll get a bit more clear and put in the actual code required to run the query this time. To do more of a like-for-like this is also in PHP, using PDO with.
//PDO example$query = "UPDATE posts SET title = :title, content = :content, summary = :summary, category_id = :category_id WHERE id = :id";
$statement = $database->prepare($query);$statement->execute($data_array);
I should also note that you’d damn well better hope your $data_array
has a value for summary
or you’re going to wipe it. The above ORM request won’t. So actually this would be more like this.
$query = "UPDATE posts SET "
if($data_array['title']) $query .= "title = :title, ";
if($data_array['content']) $query .= "content = :content, ";
if($data_array['summary']) $query .= "summary = :summary, ";
if($data_array['category_id']) $query .= "category_id = : category_id ";
$query .= " WHERE id = :id";
Of course, if you don’t pass in a category ID you’ll have =:summary, WHERE
, which will error. So actually you’d need to loop over $data_array
and add to an array, and then join them with a ,
. Assuming, of course, that your data array is clean and doesn’t include anything like a CSRF token in which case you might have to remove that.
The point is, managing queries as strings, and doing it well, is actually hard. It’s certainly harder than flinging it at an ORM and letting it deal with the implications.
ORMs let you think about your data, your entities, as objects, and act on them accordingly. I don’t want to insert a row containing the following foreign key relationship. I want to add a product to the order. I don’t want to update the task’s Owner ID field, I want to assign it. I want to “add a tag”, not “insert an association table row with the current post id and the desired tag id“.
Another point that is important to note is that ORMs assemble data into meaningful structures. If you do an SQL query to request four posts that each have five comments, the query will return you 20 rows. It will then be up to you to re-assemble that data into a format that makes sense to display or iterate. Want to also include the post tags? Good luck with that, as it multiplies out the rows.
By contrast, the ORM will return an array or collection of four Post objects, each of which has a comments array, and a tags array.
While I’m talking about benefits, ORMs also don’t exist in isolation. Laravel, for example, doesn’t actually return arrays. It returns collections, an extended form of iterable providing a lot of additional functionality and an easier, more functional-style API. Additionally, Laravel allows you to directly bind an object type in the route definition.
Route::get('comments/{comment}', function (Comment $comment) {
return $comment;
});
This sort of binding doesn’t seem like a huge improvement, but it’s just a few lines of code that simply aren’t needed. Note also that there’s no need to handle the JSON serialization process for this entity - it’s built into the framework and ORM as well.
My own personal experience has been one that includes exactly why you need this kind of tool, and why I’m always horrified to hear people dismissing them in favour of some kind of “just manage the string” utility function.
I built a complex system a number of years ago. It was a big job, a large application for managing car sales between dealerships. And it had a moderately complicated taxonomy. Makes had models, models had series, (optionally) and the model potentially had a badge, as well, but only occasionally.
We built the queries manually, and we had some tools in place to help manage the bulk of the selections, the bulk of the joins. Nevertheless, the maintenance burden was extreme. A specific subset of data wouldn’t be available in this version of the query because that table hadn’t been joined. The “currently available” list would have slightly different fields to what the “your recent purchases” list would have. The structure of the data and even the naming of fields was also often different, because of potential conflicts requiring aliases. Bearing in mind that all the data came from SQL queries it was just a flat row of data, so it required a ton of rebuilding.
The point is that I know from first hand experience what NOT to do. And I know the solution is to use an ORM.
What is wrong with ORMs
Now we know why people use an ORM, let’s talk about why other people say not to. These are the same arguments that come up again and again.
They make inefficient code.
This is absolutely true… ish. The fact is it can be true. You can make inefficient queries. Our Post with comments request above can potentially fall prey to an N+1 issue, where it does a single query and gets all the ids, then has to run a query on each id. But the fact is it doesn’t have to do that, and I can very easily improve its performance with eager loading.
This sort of inefficiency is simply an encouragement to learn to use an ORM well, not a criticism of them in general, and eager loading is a fundamental part of the knowledge of any ORM.
To contrast, honestly the worst performing set of queries I’ve ever seen were not written by an ORM but by hand. It was a contact list with a standard contacts table. But then the fields and values were “abstracted” out, so there was a field_id and a value_id. There were no joins, and every piece of data required a query. For about 50 records it ran 470 queries.
(If you’re curious I made it run a join on each row of data, got it down to 51 queries and got on with my life.)
Anyway the point is that bad code doesn’t necessarily mean that the language or pattern is bad. You can SELECT *
in an SQL query just as easily as you can in an ORM. You just might not notice. Which brings me to my next point.
Do you actually care?
I know. Look, yes, I know. I know that we are supposed to care. But premature optimisation is the root of all evil. If your code is doing an N+1 query and you didn’t even notice that it was slow… maybe it’s not such a big deal? If your inefficient query gets you to market, great. You can worry about optimising later when you start to find you have a bit of a cumulative delay.
Most ORM inefficiencies of this kind are simply solved by eager loading. This means that you tell the ORM in advance what data it WILL need, and it will use what it knows about the relationships to do queries to fetch that data.
Many times the issue isn’t that you go from a complex query to a less efficient one, it’s that you go from one complex query to two simple ones. The performance hit on that isn’t a dealbreaker, and can actually be more efficient.
By all means do the bare minimum and throw an “eager” here and there, but is the SELECT *
actually (not hypothetically, but actually currently) causing you a memory issue or over-large data set? If not… who gives a shit?
It should be noted that good ORMs have systems for optimising their queries, from eager loading to explicit field selection. You can use them later if you actually have the need, or just don’t bother.
It’s undoubtedly true that ORMs can be a performance trade-off. But the performance loss isn’t necessarily significant, and the gain can be enormous - not just in terms of simpler and more maintainable code, but having data that is structured in a way that makes sense.
You’re learning the ORM not SQL, and it’s not transferable
This is bullshit in two ways. First of all, I’ve been doing SQL for nearly 20 years. I’m not by any means an expert, but I can certainly get the job done in an RMDBS. It’s not my priority in a project to learn SQL. It shouldn’t be yours either.
As a developer it is your responsibility to prioritise the quality, performance, and maintainability of the code, and to deliver features and value, not to learn what’s most convenient. That’s a side-effect.
To clarify, SQL is an enormously powerful thing to learn. One of the ways I’ve seen this phrased is that people learned an ORM to avoid learning SQL. For the love of God do not do that.
I’m absolutely not dismissing the value of SQL. I genuinely think it’s an art people are losing and have written quite a bit about how to do SQL and database design well. But is it reasonable to suggest that the business/project priority is you learning core technologies instead of making the best possible software?
Secondly, the suggestion that the skills are not transferable is also completely bullshit. I knew Laravel’s Eloquent well and it took very little time to skill up on the Lucid ORM used by AdonisJS. Nor was Sequelize any real challenge (except for being a garbage fire). Or TypeORM. Or the semi-ORM used for data access in Ember. Or even Mongoose.
There are hasMany and belongsTo relationships, there are migrations and scopes, there are models defining the data structure and relationships. These are extended with additional methods, lifecycle hooks, etc.
The syntax may differ a little, but the core principles and approach are more or less identical. Googling “relationships sequelize” found the section on “sequelize model associations”. Which… whatever, same thing.
An ORM Can’t Do Everything
Well, that’s possibly true. But doesn’t it matter? First of all, it’s not actually true. There are occasions when ORMs don’t offer a good solution, or it’s not that intuitive. Subselect queries are examples. Aggregates are another.
But the “ORMs suck” article misunderstands the point of an ORM. It isn’t to recreate every query verbatim. The query listed in the article I’m talking about is a good example. Do you really need to do this?
SELECT *
FROM item
WHERE
id NOT IN
(SELECT item_id FROM ingredient WHERE dish_id = 2)
AND id IN
(SELECT item_id FROM ingredient WHERE dish_id = 1);
If you wanted to do this with an ORM you don’t need to recreate the query. You need to think about what the query is doing and what you actually want.
The fact is, I can’t tell from this query what the relationship is between an item and an ingredient. The point of an ORM is to obviate and codify the nature of the relationship, so the fact that I have literally no clue what the relationship does here is surely a point for ORMs.
“I know that I have already purchased all the ingredients for Dish 2 in our database, however, I still need to purchase whatever ingredients are needed for Dish 1”
Do I actually need a subquery? Given that this is a subquery it’s likely to be slow, so in fact doing two simple queries might be simpler, both conceptually and in terms of actual execution.
If I think of this is ORM terms I actually probably want to do this.
$missing_ingredients = Dish->find(1)->ingredients->all()
->diff(Dish->find(2)->ingredients)->all();
This might actually be a little inefficient. But I could do it like this:
$ingredients = Ingredient::where('dish_id', 1)->get();
$ingredients2 = Ingredient::where('dish_id', 2)->get();$missing_ingredients = $ingredients->diff($ingredients2)->all();
That should be only two queries, and given a lack of subqueries both should be extremely fast. I can’t guarantee it’s faster than the original version, but I can guarantee that it’s probably fine.
None of this takes into account what the hell the item
table has, but given the $missing_ingredients
array is already full of the data we’d need it should be pretty easy to query that table.
There is a point to this. ORMs aren’t intended to provide a means to transliterate every single possible query. They are intended to provide a comprehensible and flexible way to access data and the relationships inherent in them.
A good ORM works by encouraging you to think in terms of objects and their relationships, and not in terms of inner and outer joins.
Artificial queries aside, do you actually need to query for complex exclusions the vast majority of the time? 99% of the time you want list of dishes. Or a dish and a list of ingredients. Or a list of dishes containing a specific ingredient. Or a list of the dishes in a category.
These sorts of things are trivial with an ORM.
// all the dishes
Dish::all();// the dish with all the ingredients
Dish::find(15)->with('ingredients')->get()// all the recipes this ingredient can make
Ingredient::find(123)->recipes()->get()
Why not optimise your development process around the stuff you’re actually going to do? Tell me honestly these are less readable stripped of “left inner joins” and association tables.
Even if I conceded that you couldn’t handle a subquery with the ORM — and you actually can — then I would still think that the solution that handles 99% of the query requirements was still solid.
Especially if (as above) there are workarounds for the few exceptions. And I have to be honest, I’m not conceding at all that the exact query couldn’t be replicated in an ORM like Eloquent, I just don’t understand the entities enough to do so. To be frank I’m also out of practise with Eloquent. It’s got select subquery support now that it didn’t used to have, though it always supported it as where clauses.
There is a bigger reason this isn’t valid, though. For a start, an ORM like Laravel or Sequelize will let you inject or execute random chunks of raw SQL query. Either an entire query, or just a chunk as a discrete section, such as a subquery.
Additionally, though Sequelize doesn’t support this well (as far as I can find) Eloquent actually is a query builder, or more specifically extends one. This means that you can use either the query builder or the ORM to do whatever you need.
$posts_query = Post::with('comments')->where(id, 123);
$posts_query2 = DB::table('post')
->join('comments', 'comment.post_id', '=', 'post.id')
->where('id', 123);
These should return the same query if you run ->toSql()
on them.
To give an example, you could do this:
$dish1_ids = Ingredient::where('dish_id', 1)
->pluck('item_id')->get();
$dish2_ids =Ingredient::where('dish_id', 2)
->pluck('item_id')->get();$users = DB::table('item')
->whereIn('id', $dish1_ids)
->whereNotIn('id', $dish2_ids)
->get();
This would actually cover the above query completely as far as I can tell.
The Other Article’s Points
I wanted to move onto the second discussion, though, the tepid defence of ORMs. Though it doesn’t necessarily come to a conclusion I necessarily agree with, it’s making a much better point.
Objection is the barrow this particular writer is pushing. I specifically need to point out that the writer of that article, Sami Koskimäki is the developer of Objection. This doesn’t make him biased by any means. It means he has found a solution that he likes, and is defending his view, which I 100% support. But it does also mean that he has an intimate knowledge of the capabilities and syntax, which are not necessarily available to everyone.
SELECT persons.id as parent_id, pets.name as pet_name, children.*
FROM persons
LEFT JOIN children_pivot ON children_pivot.parent_id = persons.id
LEFT JOIN persons as children ON children_pivot.child_id = children.id
LEFT JOIN pets ON pets.owner_id = persons.id
WHERE persons.first_name = 'Arnold'
AND pets.name = 'Fluffy'
Unlike the more contrived items and recipes and ingredients, this is pretty comprehensible. People (persons…? weird) have a pivot that relates back to the parent, and then we’re also getting their pets. I’m pretty confident we’re not also tracking the idea that the children might have pets. That would really complicate things.
This is the Objection approach.
Person
.query()
.select('persons.id as parent_id', 'pets.name as pet_name', 'children.*')
.loadEager({
children: true,
pets: true
})
.where('first_name', 'Arnold')
.andWhere('pets.name', 'Fluffy')
It’s reasonable. It’s doing what you’d expect it to do. But note that we still have to think in terms of a “left join”. Why do I honestly care whether it’s a left join, a right join, or a burlap sack filled with maple syrup? I just want a person who has the name Arnold, and a pet called Fluffy. I don’t care what the SQL has to do, or how the tables have to connect. What I have to care about is what the relationships are.
$fluffyOwningArnolds = Person::with('children', 'pets')
->where([
['first_name', 'Arnold'],
['pets.name', 'Fluffy']
])->get();
Note that nothing in this suggests that it even runs on a database. It’s just… stuff. It’s asking for data. The implementation details don’t have to flow in here.
It should be noted that we’re actually getting more data here than the above query, which would have omitted a number of fields from the “pets” table.
It should also be noted that… you know. I don’t care. I can fix it if I want. But for now, I don’t really care.
Conceding a Point
There’s something I need to admit. Hunter’s article is specifically Why You Should Avoid ORMs (With examples in Node.JS). If he’d said Why You Should Avoid ORMs in NodeJS he might have had a stronger point.
NodeJS is in an interesting place in terms of ORMs. Because it lacks a dominant batteries-included web framework like Rails, Laravel or Django, it also lacks their accompanying ORMs. Node is sort of idiomatically used with Express and MongoDB, whose ORM Mongoose is perfectly adequate for that. But on the SQL side of things the story is not as clear.
The dominant option is Sequelize. I’ve alluded to this earlier, but I’m not a fan, so I’ll go into a bit more details. IMO Sequelize is not well documented. Certain things that are needed - such as querying via relatively deeply nested criteria, like getting the comments where the employer of the writer of the post the comment is on is a given id (for example) - are spottily documented if at all. Typescript support is rough and only documented in a single section.
It’s also extremely verbose, and constantly duplicated. Because Sequelize supports syncing, it requires the model file to also include the full create details for the entity.
This means the following migration
await queryInterface.createTable(
"users",
{
id: {
allowNull: false,
primaryKey: true,
},
email: {
allowNull: false,
type: Sequelize.STRING,
},
name: {
allowNull: false,
type: Sequelize.STRING,
},
password: {
allowNull: false,
type: Sequelize.STRING,
},
salt: {
allowNull: true,
type: Sequelize.STRING,
},
createdAt: {
allowNull: false,
type: Sequelize.DATE,
},
updatedAt: {
allowNull: false,
type: Sequelize.DATE,
},
deletedAt: {
allowNull: true,
type: Sequelize.DATE,
},
},
{
uniqueKeys: {
unique_tag: {
customIndex: true,
fields: ["email"],
},
},
}
);
Might need the following model.
class User extends Model {
public id!: string;
public name!: string;
public email!: string;
public password!: string;
public salt!: string;public readonly createdAt!: Date;
public readonly updatedAt!: Date;
public readonly deletedAt!: Date;
}User.init(
{
name: {
type: DataTypes.STRING,
allowNull: false,
},
email: {
type: new DataTypes.STRING(),
allowNull: false,
},
password: {
type: new DataTypes.STRING(),
allowNull: false,
},
salt: {
type: new DataTypes.STRING(),
allowNull: false,
},
createdAt: {
type: new DataTypes.DATE(),
allowNull: false,
defaultValue: NOW,
},
updatedAt: {
type: new DataTypes.DATE(),
allowNull: false,
defaultValue: NOW,
},
deletedAt: {
type: new DataTypes.DATE(),
allowNull: true,
defaultValue: null,
},
},
{
tableName: "users",
sequelize,
paranoid: true
}
);
(Note that the model is TypeScript, but the migration isn’t.)
Anyway, if that looks like it’s heavily duplicating my above migration, you’re not alone in thinking that. I’ve seen this criticised on Sequelize issues, but the criticism was dismissed as invalid - because one set of data is needed for the migration and one set for the model. This isn’t a reasonable point. Default values and allow null are constraints of the database tables, not the model object, and in either case it’s clearly heavily duplicated.
Moreover, it looks like the sort of data that could simply be gathered by the ORM itself from the table structure if it was needed. Additionally, it’s extremely error-prone. Is the salt field allowed to be null above? Duplicated code provides ambiguity.
The syntax used is to provide for the syncing feature, imo an anti-pattern, yet not one I can opt out of. All of these details are mandatory.
Note that this would be the equivalent Eloquent model file. The migration file would be roughly as above (PHP syntax of course) but the Model is exceptionally different.
class User extends Model
{
use SoftDeletes;
}
Yes, that’s it. Because I’m following the conventions, there’s no need to add anything else. Eloquent knows the fields in the table, knows their defaults, etc.
Another clear example is the use of “scopes”.
Scopes are a way of limiting data in a way that’s more readable in the query. You could repeatedly do WHERE age > 18
, but you could create an isAdult
scope and reuse that.
This will provide an Eloquent model a developer experience like this:
$adult_users = User->isAdult()->get();
The Sequelize experience differs only marginally.
const adultUsers = await User.scope('adult').findAll({});
Not so different. But it’s when you need to pass an argument to them that things change so drastically.
$adult_users = User->agedOver(18)->get();
Vs
const adultUsers = await User.scope({method: ['agedOver', 18]}).findAll({});
Honestly this is just awful. Really horrible developer experience by comparison to the Eloquent equivalent. The same carries through much of Sequelize. Clumsy, awkward, and not well documented beyond very simple data. How do I do something with two arguments? What if I need multiple scopes with arguments?
Eager loading is another good example. Let’s say I want the posts with all comments.
// Eloquent
$posts = Post->with('comments')->get();// Sequelize
const posts = await Post.findAll({include: "comments"});
Not so different. Posts with category, and comments and the comment authors?
// Eloquent
$posts = Post->with('category', 'comments', 'comments.author')->get();
Sequelize takes a dark turn. You need to nest the includes. I found this to work.
const posts = await Post.findAll({include: [
{ model: Category, as: "category" },
{
model: Comment,
as: "comments",
include: [{
model: Writer,
as: "author"
}]
}
]});
Sometimes simpler includes work, but sometimes they don’t. I’d be unsurprised if this works:
const posts = await Post.findAll({include: ["category", {
model: Comment,
as: "comments",
include: "author"
}]});
It’s hard to tell what will and won’t, to be honest. Simple includes by the relationship name seem to work sometimes, but never if they’re nested from, only if they’re the “end branch”.
In any case, none of this is relevant to the topic, it’s just to make the point that just because an ORM is bad, doesn’t mean ORMs are bad.
Further to that, and just to get back on track, just because ORMs can be used badly doesn’t mean ORMs are bad. Looking at Reddit threads and similar communities finds a lot of examples of people sharing anecdotes of othere people’s badly built ORM-based queries that slowed applications to a crawl.
These anecdotes are of course valid, but the solution isn’t to delete the ORM and write bespoke SQL queries. The solution is to use them better.