Stephen Haberman

10 posts by Stephen Haberman

Dual Release Channels

Jan 23, 2026

Joist now supports dual release channels, allowing you to choose between a stable and a bleeding-edge version of the ORM.

Bringing Back Unnest

Jan 11, 2026

Stephen Haberman

The latest Joist release brings back leveraging the Postgres unnest function (see the PR), for a nice 9% bump on our alpha latency-oriented benchmarks (joist_v2 is the merged PR):

What is `unnest`?

unnest is a Postgres function that takes an array and returns a set of rows, one for each element in the array.

The simplest example is converting one array and turning it into a set of rows:

-- Pass in 1 array, get back 3 rows
select unnest(array[1, 2, 3]) as i;
 i
---
 1
 2
 3
(3 rows)

But you can also use multiple unnest statements to get multiple columns on those rows:

select
  unnest(array[1, 2, 3]) as id,
  unnest(array['foo', 'bar', 'zaz']) as first_name;
 id | first_name
----+------------
  1 | foo
  2 | bar
  3 | zaz
(3 rows)

Why `unnest` is useful

unnest is great for bulk SQL statements, such as inserting 10 authors in one INSERT; without INSERT you might have 4 columns * 10 authors = 40 query parameters:

INSERT INTO authors (first_name, last_name, publisher_id, favorite_color)
  VALUES (?, ?, ?, ?). -- #a1
    (?, ?, ?, ?), -- #a2
    (?, ?, ?, ?), -- #a3
    (?, ?, ?, ?), -- #a4
    (?, ?, ?, ?), -- #a5
    (?, ?, ?, ?), -- #a6
    (?, ?, ?, ?), -- #a7
    (?, ?, ?, ?), -- #a8
    (?, ?, ?, ?), -- #a9
    (?, ?, ?, ?) -- #a10

Where we have 10 first_name parameters, 10 last_name parameters, etc., but with unnest we can instead send up “just 1 array per column”:

INSERT INTO authors (first_name, last_name, publisher_id, favorite_color)
  SELECT * FROM unnest(
    $1::varchar[], -- first_names
    $2::varchar[], -- last_names
    $3::int[],     -- publisher_ids
    $3::varchar[] -- colors
  )

The benefits of fewer query parameters are:

Smaller SQL statements going over the wire (one of our benchmarks saw 41kb of SQL without unnest, and 350 bytes with unnest)
“Stable” SQL statement that don’t change for each “number of authors being updated”, so will have better prepared statement cache hit rates
Observability tools like Datadog will also better group “stable” SQL statements with a fixed number of parameters

This is generally a well-known approach, i.e. TimeScale had a blog post highlighting a 2x performance increase, albeit you have to get to fairly large update sizes to have this much impact.

Bringing it back?

Joist had previously used unnest in our INSERTs and UPDATEs, but we’d removed it because it turns out unnest is finicky with array columns—it “overflattens” and requires reactangular arrays.

(I.e. array columns like varchar[] for storing multiple values favorite_colors = ['red', 'blue'] in a single column.)

The 1st unfortunate thing with unnest is that it “overflattens”, i.e. if we want to update two author’s favorite_colors columns using unnest, we’d intuitively think “let’s just pass an array of arrays”, one array for each author:

-- Pass two arrays, with two elements each
-- We expect to get back two rows of {red,blue} and {green,purple}
select * from unnest(array[array['red','blue'],array['green','purple']]);
 unnest
--------
 red
 blue
 green
 purple
(4 rows)

…wait, we got 4 rows instead.

Unfortunately this is just how unnest works—when given 2-dimensional arrays (like a matrix), it creates a row per each value/cell in matrix.

Another unfortunate wrinkle with unnest is that our intuitive “array of arrays” creates fundamentally invalid arrays if the authors have a different number of favorite colors:

-- Try to create 1 array of {red,blue} and 1 array of {purple}
select * from unnest(array[array['red','blue'],array['purple']]);
-- ERROR:  multidimensional arrays must have array expressions with matching dimensions

Our error is treating the varchar[][] as “an array of arrays”, when fundamentally Postgres treats it as “a single array, of two dimensions”, like mathematical n-dimensional arrays or matrices: they must be “rectangular” i.e. every row of our m x n matrix must be the same length (we’ve been trying to create “jagged” multidimensional arrays, which is not supported).

One final wrinkle is, not only must all rows be the same length, but think about nullable columns—how could we set a1 favorite_colors='red', 'blue'] but then set a2 favorite_colors=null? With unnests strict array limitations we cannot.

The combination of these issues is why we’d previously removed unnest usage, but now have introducing our own unnest_arrays custom function that solves each of these problems.

`unnest_arrays` Custom Function

Our custom unnest_arrays function works around unnests limitations by coordinating with the Joist runtime to create 2-dimensional arrays that satisfy Postgres’s requirements, but still produce the desired values:

When updating favorite_colors for multiple authors with different number of colors, we pad trailing nulls to the end of each author’s colors array, until the array is rectangular
When updating favorite_colors to null, we also pad a single leading null to indicate the desired nullness (and pad a “not-null marker” for other rows).

This is simpler to see with an example, of updating three authors:

Author 1 should update favorite_colors=red,green,blue
Author 2 should update favorite_colors=green
Author 3 should update favorite_colors=null

We are able to issue a SQL UPDATE like:

WITH data AS (
  SELECT
    unnest($1) as id
    unnest_arrays($2) as favorite_colors,
)
UPDATE authors SET favorite_colors = data.favorite_colors
FROM data WHERE authors.id = data.id

And our favorite_colors array looks like:

-- Created by the Joist runtime by reading the Author's favoriteColors
-- property and then adding padding as needed to a rectangular 2D array
array[
  array['', 'red', 'green', 'blue'], -- a:1
  array['', 'green', null, null], -- a:2
  array[null, null, null, null]] -- a:3
]

This array is passed our unnest_arrays custom function that knows about each of these conventions:

CREATE OR REPLACE FUNCTION unnest_arrays(arr ANYARRAY, nullable BOOLEAN = false, OUT a ANYARRAY)
  RETURNS SETOF ANYARRAY
  LANGUAGE plpgsql IMMUTABLE STRICT AS
$func$
BEGIN
  FOREACH a SLICE 1 IN ARRAY arr LOOP
    -- When invoked for nullable columns, watch for the is-null/is-not-null marker
    IF nullable THEN
      -- If we should be null, drop all values and  return null
      IF a[1] IS NULL THEN a := NULL;
      -- Otherwise drop the is-not-null  marker
      ELSE a := a[2:array_length(a, 1)];
      END IF;
    END IF;
    -- Drop all remaining/trailing nulls
    a := array_remove(a, NULL);
    RETURN NEXT;
  END LOOP;
END

And that’s it; we get out the other side our desired rows:

select unnest_arrays(array[
  array['', 'red', 'green', 'blue'], -- a:1
  array['', 'green', null, null], -- a:2
  array[null, null, null, null]] -- a:3
, true);
  unnest_arrays
------------------
  {red,green,blue}
  {green}

(3 rows)

Pros/Cons

Our solution has a few pros/cons:

Pro: We’ve restored our ability to use unnest for all of our batched SELECTs, UPDATEs, and INSERTs 🎉
Con: Joist users with array columns in their schemas will need to create the unnest_arrays function
- This is a one-time migration so seems reasonable
Con: With all the null padding tricks, we’re giving up the ability to have null values within our array values
- I.e. we cannot have favorite_colors=[red, null, blue]
- For our domain modeling purposes, this is a fine/acceptable tradeoff, b/c we’ve always modeled varchar[] columns as string[] and not Array<string | null> — we actively don’t want nulls in our varchar[] columns anyway

So far these pros/cons are worth it 🚀; but, as always, we’ll continue adjusting our approach as we learn more from real-world use cases & usage.

Using CTEs and Query Rewriting for Versioning

Dec 5, 2025

Stephen Haberman

Joist is an ORM primarily developed for Homebound’s GraphQL majestic monolith, and we recently shipped a long-awaited Joist feature, SQL query rewriting via an ORM plugin API, to deliver a key component of our domain model: aggregate level versioning.

We’ll get into the nuanced details below, but “aggregate level versioning” is a fancy name for providing this minor “it’s just a dropdown, right? 😰” feature of a version selector across several major subcomponents of our application:

Where the user can:

“Time travel” back to a previous version of what they’re working on ⌛,
Draft new changes (collaboratively with other users) that are not seen until they click “Publish” to make those changes active ✅

And have the whole UI “just work” while they flip between the two.

As a teaser, after some fits & painful spurts, we achieved having our entire UI (and background processes) load historical “as of” values with just a few lines of setup/config per endpoint—and literally no other code changes. 🎉

Read on to learn about our approach!

Aggregate What Now?

Besides just “versioning”, I called this “aggregate versioning”—what is that?

It’s different from traditional database-wide, system time-based versioning, that auditing solutions like cyanaudit or temporal FOR SYSTEM_TIME AS OF queries provide (although we do use cyanaudit for our audit trail & really like it!).

Let’s back up and start with the term “aggregate”. An aggregate is a cluster of ~2-10+ “related entities” in your domain model (or “related tables” in your database schema). The cluster of course depends on your specific domain—examples might be “an author & their books”, or “a customer & their bank accounts & profile information”.

Typically there is “an aggregate parent” (called the “aggregate root”, since it sits at the root of the aggregate’s subgraph) that naturally “owns” the related children within the aggregate; i.e. the Author aggregate root owns the Book and BookReview children; the Customer aggregate root owns the CustomerBankAccount and CustomerProfile entities.

Historically, Aggregate Roots are a pattern from Domain Driven Design, and mostly theoretically useful—they serve as a logical grouping, which is nice, but don’t always manifest as specific outcomes/details in the implementation (at least from what I’ve seen).

Why Version An Aggregate?

Normally Joist blog posts don’t focus on specific domains or verticals, but for the purposes of this post, it helps to know the problem we are solving.

At Homebound, we’re a construction company that builds residential homes; our primary domain model supports the planning and execution of our procurement & construction ops field teams.

The domain model is large (currently 500+ tables), but two key components are:

The architectural plans for a specific model of home (called a PlanPackage), and
The design scheme for products that go into a group of similar plans (called a DesignPackage)—i.e. a “Modern Farmhouse” design package might be shared across ~4-5 separate, but similarly-sized/laid out, architectural plans

Both of these PlanPackages and DesignPackages are aggregate roots that encompass many child PlanPackage... or DesignPackage... entities within them:

What rooms are in the plan? PlanPackageRooms
What materials & labor are required (bricks, lumber, quantities)? PlanPackageScopeLines
What structural/floor plan options do we offer to customers? PlanPackageOptions
- How do these options change the plan’s materials & labor? A m2m between PlanPackageScopeLines and PlanPackageOptions
What are the appliances in the kitchen? DesignPackageProducts
What spec levels, of Essential/Premium/etc, do we offer? DesignPackageOptions
- How do these spec levels change the home’s products? A m2m between DesignPackageProducts and DesignPackageOptions

This web of interconnected data can all be modeled successfully (albeit somewhat tediously)—but we also want it versioned! 😬

Change management is extremely important in construction—what was v10 of the PlanPackage last week? What is v15 of the PlanPackage this week? What changed in each version between v10 and v15? Are there new options available to homebuyers? What scope changed? Do we need new bids? Etc.

And, pertintent to this blog post, we want each of the PlanPackages and DesignPackages respective “web of interconnected data” (the aggregate root & its children) versioned together as a group with application-level versioning: multiple users collaborate on the data (making simultaneous edits across the PlanPackage or DesignPackage), co-creating the next draft, and then we “Publish” the next active version with a changelog of changes.

After users draft & publish the plans, and potentially start working on the next draft, our application needs to be able to load a complete “aggregate-wide snapshot” for “The Cabin Plan” PlanPackage v10 (that was maybe published yesterday) and a complete “aggregate-wide snapshot” of “Modern Design Scheme” DesignPackage v8 (that was maybe published a few days earlier), and glue them together in a complete home.

Hopefully this gives you an idea of the problem—it’s basically like users are all collaborating on “the next version of the plan document”, but instead of it being a single Google Doc-type artifact that gets copy/pasted to create “v1” then “v2” then “v3”, the collaborative/versioned artifact is our deep, rich domain model (relational database tables) of construction data.

Schema Approach

After a few cough “prototypes” cough of database schemas for versioning in the earlier days of our app, we settled on a database schema we like: two tables per entity, a main “identity” table, and a “versions” table to store snapshots, i.e. something like:

authors table: stores the “current” author data (first_name, last_name, etc), with 1 row per author (we call this the “identity” table)
author_versions table: stores snapshots of the author over time (author_id, version_id, first_name, last_name), with 1 row per version of each author (we call this the “version” table)

This is an extremely common approach for versioning schemas, i.e. it’s effectively the same schema suggested by PostgreQL’s SQL2011Temporal docs, albeit technically they’re tracking system time, like an audit trail, and not our user-driven versioning.

This leads to a few differences: the SQL2011Temporal history tables use a _system_time daterange column to present “when each row was applicable in time” (tracking system time), while we use two FKs that also form “an effective range”, but the range is not time-based, it’s version-based: “this row was applicable starting at first=v5 until final=v10”.

So if we had three versions of an Author in our author_versions table, it would look like:

id=10 author_id=1 first_id=v10 final_id=v15 first_name=bob last_name=smith
- From versions v10 to v15, the author:1 had a firstName=bob
id=11 author_id=1 first_id=v15 final_id=v20 first_name=fred last_name=smith
- From versions v15 to v20, the author:1 had a firstName=fred
id=11 author_id=1 first_id=v20 final_id=null first_name=fred last_name=brown
- From versions v20 to now, the author:1 had a lastName=brown

We found this _versions table strikes a good balance of tradeoffs:

It only stores changed rows—if v20 of the aggregate root (i.e. our PlanPackage) only changed the Authors and two of its Books, there will only be 1 author_versions and two book_versions, even if the Author has 100s of books (i.e. we avoid making full copies of the aggregate root on every version)
When a row does change, we snapshot the entire row, instead of tracking only specific columns changes (storing the whole row takes more space, but makes historical/versioned queries much easier)
- Technically we only include mutable columns in the _versions table, i.e. our entities often have immutable columns (like type flags or parent references) and we don’t bother copying these into the _versions tables.
We only store 1 version row “per version”—i.e. if, while making PlanPackage v20, an Authors first name changes multiple times, we only keep the latest value in the draft author_versionss row.
- This is different from auditing systems like SQL2011Temporal, where _history rows are immutable, and every change must create a new row—we did not need/want this level of granularity for our application-level versioning.

With this approach, we can reconstruct historical versions by finding the singular “effective version” for each entity, with queries like:

select * from author_versions av
join authors a on av.author_id = a.id
where a.id in ($1)
  and av.first_id <= $2
  and (av.final_id is null or av.final_id < $3)

Where:

$1 is whatever authors we’re looking for (here a simple “id matches”, but it could be an arbitrarily complex WHERE condition)
$2 is the version of the aggregate root we’re “as of”, i.e. v10
$3 is also the same version “as of”, i.e. v10

The condition of first_id <= v10 < final_id finds the singular author_versions row that is “effective” or “active” in v10, even if the version itself was created in v5 (and either never replaced, or not replaced until “some version after v10” like v15).

…but what is “Current”?

The authors and author_versions schema is hopefully fairly obvious/intuitive, but I left out a wrinkle: what data is stored in the authors table itself?

Obviously it should be “the data for the author”…but which version of the author? The latest published data? Or latest/WIP draft data?

Auditing solutions always put “latest draft” in authors, but that’s because the application itself is always reading/writing data from authors, and often doesn’t even know that the auditing author_versions tables exist—but in our app, we need workflows & UIs to regularly read “the published data” & ignore the WIP draft changes.

So we considered the two options:

The authors table stores the latest draft version (as in SQL2011Temporal), or
The authors table stores the latest published version,

We sussed out pros/cons in a design doc:

Option 1. Store “latest draft” data in authors
- Pro: Writes with business logic “just read the Author and other entities” for validation
- Con: All readers, even those that just want the published data, must use the _versions table to “reconstruct the published/active author”
Option 2. Store the “latest published” data in authors
- Pro: The safest b/c readers that “just read from authors” will not accidentally read draft/unpublished data (a big concern for us)
- Pro: Any reader that wants to “read latest” doesn’t have to know about versioning at all, and can just read the authors table as-is (also a big win)
- Con: Writes that have business logic must first “reconstruct the draft author” (and its draft books and its draft book reviews if apply cross-entity business rules/invariants) from the _versions tables

By “reconstruct the [published or draft] author”, what we mean is that instead of “boring CRUD code” that just SELECTs from the authors, books, etc. tables, version-aware code must:

a) actually read from the author_verisons, book_versions tables,
b) know “which version” it should use to find those authors/books
- …and we might be looking for “PlanPackage v10” but “DesignPackage v8” so track multiple, contextual versions within a single request, not just a single “as of” timestamp
c) do version-aware graph traversal, i.e. “boring walk the graph” code like for (const book in author.books) that is ubiquitous in our codebase needs the author.books relation to use the version-aware book_versions.author_id instead of the books.author_id, b/c the Book might have changed its Author over time.
- So nearly every graph navigation needs checked for “is this relation actually versioned?”, and if so, opt-into the more complex, version-aware codepath.

This “reconstruction” problem seemed very intricate/complicated, and we did not want to update our legacy callers (which were mostly reads) to “do all this crazy version resolution”, so after the usual design doc review, group comments, etc., we decide to store latest published in authors.

The key rationale being:

We really did not want to instrument our legacy readers to know about _versions tables to avoid seeing draft data (we were worried about draft data leaking into existing UIs/reporting)
We thought “teaching the writes to ‘reconstruct’ the draft subgraph” when applying validation rules would be the lesser evil.

(You can tell I’m foreshadowing this was maybe not the best choice.)

Initial Approach: Not so great

With our versioning scheme in place, we started a large project for our first versioning-based feature: allowing home buyers to deeply personalize the products (sinks, cabinet hardware, wall paint colors, etc) in the home their buying (i.e. in v10 of Plan 1, we offered sinks 1/2/3, but in v11 of Plan 1, we now offer sinks 3/4/5).

And it was a disaster.

This new feature set was legitimately complicated, but “modeling complicated domain models” is supposed to be Joist’s sweet spot—what was happening?

We kept having bugs, regressions, and accidental complexity—that all boiled down to the write path (mutations) having to constantly “reconstruct the drafts” that it was reading & writing.

Normally in Joist, a saveAuthor mutation just “loads the author, sets some fields, and saves”. Easy!

But with this versioning scheme:

the saveAuthor mutation has to first “reconstruct the latest-draft Author from the author_versions (if it exists)
any writes cannot be “just update the Author row” (that would immediately publish the change), they need to be staged into the draft AuthorVersion
any Joist hooks or validation rules also need to do the same thing:
- validation rules have to “reconstruct the draft” view of the author + books + book reviews subgraph of data they’re validating,
- hooks that want to make reactive changes must also “stage the change” into a draft BookVersion or BookReviewVersion instead of “just setting some Book fields”.

We had helper methods for most of these incantations—but it was still terrible.

After a few months of this, we were commiserating about “why does this suck so much?” and finally realized—well, duh, we were wrong:

We had chosen to “optimize reads” (they could “just read from authors” table).

But, in doing so, we threw writes under the bus—they needed to “read & write from drafts”—and it was actually our write paths that are the most complicated part of our application—validation rules, side effects, business process, all happen on the write path.

We needed the write path to be easy.

New Approach: Write to Identities

We were fairly ecstatic about reversing direction, and storing drafts/writes directly in authors, books, etc.

This would drastically simplify all of our write paths (GraphQL mutations) back to “boring CRUD” code:

// Look, no versioning code!
const author = await em.load(Author, "a:1");
if (author.shouldBeAllowedToUpdateFoo) {
  author.foo = 1;
}
await em.flush();

…but what about those reads? Wouldn’t moving the super-complicated “reconstruct the draft” code out of the writes (yay!), over into “reconstruct the published” reads (oh wait), be just as bad, or worse (which was the rationale for our original decision)?

We wanted to avoid making the same mistake twice, and just “hot potato-ing” the write path disaster over to the read path.

Making Reads not Suck

We spent awhile brainstorming “how to make our reads not suck”, specifically avoiding manually updating all our endpoints’ SQL queries/business logic to do tedious/error-prone “remember to (maybe) do version-aware reads”.

If you remember back to that screenshot from the beginning, we need our whole app/UI, at the flip of a dropdown, to automatically change every SELECT query from:

-- simple CRUD query
SELECT * FROM authors WHERE id IN (?) -- or whatever WHERE clause

To a “version-aware replacement”:

-- find the right versions row
SELECT * FROM author_versions av
WHERE av.author_id in (?) -- same WHERE clause
  AND av.first <= v10 -- and versioning
  AND (av.final IS NULL or av.final > v10)

And not only for top-level SELECTs but anytime authors is used in a query, i.e. in JOINs:

-- simple CRUD query that joins into `authors`
SELECT b.* FROM books
  JOIN authors a on b.author_id = a.id
  WHERE a.first_name = 'bob';

-- version-aware replacement, uses books_versions *and* author_versions
-- for the join & where clause
SELECT bv.* FROM book_versions bv
  -- get the right author version
  JOIN author_versions av ON
    bv.author_id = av.author_id AND (av.first <= v10 AND (av.final IS NULL or a.final > v10))
  -- get the right book version
  WHERE bv.first <= v10 AND (bv.final IS NULL or bv.final > v10)
  -- predicate should use the version table
  AND av.first_name = 'bob';

1. Initial Idea: Using CTEs

When it’s articulated like “we want every table access to routed to the versions table”, a potential solution starts to emerge…

Ideally we want to magically “swap out” authors with “a virtual authors table” that automatically has the right version-aware values. How could we do this, as easily as possible?

It turns out a CTE is a great way of structuring this:

-- create the fake/version-aware authors table
WITH _authors (
  SELECT
    -- make the columns look exactly like the regular table
    av.author_id as id,
    av.first_name as first_name
    -- but read the data from author_versions behind the scenes
  FROM author_versions
  WHERE (...version matches...)
)
-- now the rest of the application's SQL query as normal, but we swap out
-- the `authors` table with our `_authors` CTE
SELECT * FROM _authors a
WHERE a.first_name = 'bob'

And that’s (almost) it!

If anytime our app wants to “read authors”, regardless of the SQL query it’s making, we swapped authors-the-table to _authors-the-CTE, the SQL query would “for free” be using/returning the right version-aware values.

So far this is just a prototype; we have three things left to do:

Add the request-specific versioning config to the query
Inject this rewriting as seamlessly as possible
Evaluate the performance impact

2. Adding Request Config via CTEs

We have a good start, in terms of a hard-coded prototype SQL query, but now we need to get the first <= v10 AND ... pseudo code in the previous SQL snippets actually working.

Instead of a hard-coded v10, we need queries to use:

Dynamic request-specific versioning (i.e. the user is currently looking at, or “pinning to”, PlanPackage v10), and
Support pinning multiple different aggregate roots in the same request (i.e. the user is looking at PlanPackage v10 but DesignPackage v15)

CTEs are our new hammer 🔨—let’s add another for this, calling it _versions and using the VALUES syntax to synthesize a table:

WITH _versions (
  SELECT plan_id, version_id FROM
   -- parameters added to the query, one "row" per pinned aggregate
   -- i.e. `[1, 10, 2, 15]` means this request wants `plan1=v10,plan2=v15`
  VALUES (?, ?), (?, ?)
), _authors (
  -- the CTE from before but joins into _versions for versioning config
  SELECT
    av.author_id as id,
    av.first_name as first_name
  FROM authors a
  JOIN _versions v ON a.plan_id = v.plan_id
  -- now we know:
  -- a) what plan the author belongs to (a.plan_id, i.e. its aggregate root)
  -- b) what version of the plan we're pinned to (v.version_id)
  -- so we can use them in our JOIN clause
  JOIN author_versions av ON (
    av.author_id = a.id AND
    av.first_id >= v.version_id AND (
      av.final_id IS NULL OR av.final_id < v.version_id
    )
  )
)
SELECT * FROM _authors a WHERE a.first_name = 'bob'

Now our application can “pass in the config” (i.e. for this request, plan:1 uses v10, plan:2 uses v15) as extra query parameters into the query, they’ll be added as rows to the _versions CTE table, and then the rest of the query will resolve versions using that data.

Getting closer!

One issue with the query so far is that we must ahead-of-time “pin” every plan we want to read (by adding it to our _versions config table), b/c if a plan doesn’t have a row in the _versions CTE, then the INNER JOINs will not find any p.version_id available, and none of its data will match.

Ideally, any plan that is not explicitly pinned should have all its child entities read their active/published data (which will actually be in their author_versions tables, and not the primary authors tables); which we can do with (…wait for it…) another CTE:

WITH _versions (
  -- the injected config _versions stays the same as before
  SELECT plan_id, version_id FROM VALUES (?, ?), (?, ?)
), _plan_versions (
   -- we add an additional CTE that defaults all plans to active unless pinned in _versions
  SELECT
    plans.id as plan_id,
    -- prefer `v.version_id` but fallback on `active_version_id`
    COALESCE(v.version_id, plans.active_version_id) as version_id,
  FROM plans
  LEFT OUTER JOIN _versions v ON v.plan_id = plans.id
)
_authors (
  -- this now joins on _plan_versions instead _versions directly
  SELECT
    av.author_id as id,
    av.first_name as first_name
  FROM authors a
  JOIN _plan_versions pv ON a.plan_id = pv.id
  JOIN author_versions av ON (
    av.author_id = a.id AND
    av.first_id >= pv.version_id AND (
      av.final_id IS NULL OR av.final_id < pv.version_id
    )
  )
)
-- automatically returns & filters against the versioned data
SELECT * FROM _authors a WHERE a.first_name = 'bob'

We’ve basically got it, in terms of a working prototype—now we just need to drop it into our application code, ideally as easily as possible.

3. Injecting the Rewrite Automatically

We’ve discovered a scheme to make reads automatically version-aware—now we want our application to use it, basically all the time, without us messing up or forgetting the rewrite incantation.

Given that a) we completely messed this up the 1st time around 😕, and b) this seems like a very mechanical translation 🤖, what if we could automate it? For every read?

Our application already does all reads through Joist (of course 😅), as EntityManager calls:

// Becomes SELECT * FROM authors WHERE id = 1
const a = em.load(Author, "a:1");
// Becomes SELECT * FROM authors WHERE first_name = ?
const as = em.find(Author, { firstName: "bob" });
// Also becomes SELECT * FROM authors WHERE id = ?
const a = await book.author.load();

It would be really great if these em.load and em.find SQL queries were all magically rewritten—so that’s what we did. 🪄

We built a Joist plugin that intercepts all em.load or em.find queries (in a new plugin hook called beforeFind), and rewrites the query’s ASTs to be version-aware, before they are turned into SQL and sent to the database.

So now what our endpoint/GraphQL query code does is:

function planPackageScopeLinesQuery(args) {
  // At the start of any version-aware endpoint, setup our VersioningPlugin...
  const plugin = new VersioningPlugin();
  // Read the REST/GQL params to know which versions to pin
  const { packageId, versionId } = args;
  plugin.pin(packageId, versionId);
  // Install the plugin into the EM, for all future em.load/find calls
  em.addPlugin(plugin);

  // Now just use the em as normal and all operations will automatically
  // be tranlated into the `...from _versions...` SQL
  const { filter } = args;
  return em.find(PlanPackageScopeLine, {
    // apply filter logic as normal, pseudo-code...
    quantity: filter.quantity,
  });
}

How does this work?

Internally, Joist parses the arguments of em.find into an AST, ParsedFindQuery, that is a very simplified AST/ADT version of a SQL query, i.e. the shape looks like:

// An AST/ADT for an `em.find` call that will become a SQL `SELECT`
type ParsedFindQuery = {
  // I.e. arrays of `a.*` or `a.first_name`
  selects: ParsedSelect[];
  // I.e. the `authors` table, with its alias, & any inner/outer joins
  tables: ParsedTable[];
  // I.e. `WHERE` clauses
  condition?: ParsedExpressionFilter;
  groupBys?: ParsedGroupBy[];
  orderBys: ParsedOrderBy[];
  ctes?: ParsedCteClause[];
};

After Joist takes the user’s “fluent DSL” input to em.find and parses it into this ParsedFindQuery AST, plugins can now inspect & modify the query:

class VersioningPlugin {
  // List of pinned plan=version tuples, populated per request
  #versions = []

  // Called for any em.load or em.find call
  beforeFind(meta: EntityMetadata, query: ParsedFindQuery): void {
    let didRewrite = false;
    for (const table of [...query.tables]) {
      // Only rewrite tables that are versioned
      if (hasVersionsTable(table)) {
        this.#rewriteTable(query, table);
        didRewrite = true;
      }
    }
    // Only need to add these once/query
    if (didRewrite) {
      this.#addCTEs(query);
    }
  }

  #rewriteTable(query, table) {
    // this `table` will initially be the `FROM authors AS a` from a query;
    // leave the alias as-is, but swap the table from "just `authors`" to
    //  the `_authors` CTE we'll add later
    table.table = `_${table.table}`;

    // If `table.table=author`, get the AuthorMetadata that knows the columns
    const meta = getMetadatFromTableName(table.table);

    // Now inject the `_authors` CTE that to be our virtual table
    query.ctes.push({
      alias: `_${table.table}`,
      columns: [...],
      query: {
        kind: "raw",
        // put our "read the right author_versions row" query here; in this blog
        // post it is hard-coded to `authors` but in the real code would get
        // dynamically created based on the `meta` metadta
        sql: `
          SELECT
            av.author_id as id,
            av.first_name as first_name,
            -- all other author columns...
          FROM authors a
          JOIN _plan_versions pv ON a.plan_id = pv.id
          JOIN author_versions av ON (
            av.author_id = a.id AND
            av.first_id >= pv.version_id AND (
              av.final_id IS NULL OR av.final_id < pv.version_id
            )
          )
        `
      }
    })
  }

  // Inject the _versions and _plan_versions CTEs
  #addCTEs(query) {
    query.ctes.push( {
      alias: "_versions",
      columns: [
        { columnName: "plan_id", dbType: "int" },
        { columnName: "version_id", dbType: "int" },
      ],
      query: {
        kind: "raw",
        bindings: this.#versions.flatMap(([pId, versionId]) => unsafeDeTagIds([pId, versionId])),
        sql: `SELECT * FROM (VALUES ${this.#versions.map(() => "(?::int,?::int)").join(",")}) AS t (plan_id, version_id)`,
      },
    },
    // Do the same thing for _plan_versions
    query.ctes.push(...);
  }
}

This is very high-level pseudo-code but the gist is:

We’ve mutated the query to use our _authors CTE instead of the authors table
We’ve injected the _authors CTE, creating it dynamically based on the AuthorMetadata
We’ve injected our _versions and _plan_versions config tables

After the plugin’s beforeFind is finished, Joist takes the updated query and turns it into SQL, just like it would any em.find query, but now the SQL it generates automatically reads the right versioned values.

4. Performance Evaluation

Now that we have everything working, how was the performance?

It was surprisingly good, but not perfect—we unfortunately saw a regression for reads “going through the CTE”, particularly when doing filtering, like:

WITH _versions (...),
 _plan_versions (...),
 _authors (...),
)
-- this first_name is evaluating against the CTE results
SELECT * FROM _authors a WHERE a.first_name = 'bob'

We were disappointed b/c we thought since our _authors CTE is “only used once” in the SQL query, that ideally the PG query planner would essentially “inline” the CTE and pretend it was not even there, for planning & indexing purposes.

Contrast this with a CTE that is “used twice” in a SQL query, which our understanding is that then Postgres executes it once, and materializes it in memory (basically caches it, instead of executing it twice). This materialization would be fine for smaller CTEs like _versions or _plan_versions, but on a potentially huge table like authors or plan_package_scope_lines, we definitely don’t want those entire tables sequentially scanned and creating “versioned copies” materialized in-memory before any WHERE clauses were applied.

So we thought our “only used once” _authors CTE rewrite would be performance neutral, but it was not—we assume because many of the CTE’s columns are not straight mappings, but due to some nuances with handling drafts, ended up being non-trivial CASE statements that look like:

-- example of the rewritten select clauses in the `_authors` CTE
SELECT
  a.id as id,
  a.created_at as created_at,
  -- immutable columns
  a.type_id as type_id,
  -- versioned columns
  (CASE WHEN _a_version.id IS NULL THEN a.first_name) ELSE _a_version.first_name END) as first_name,
  (CASE WHEN _a_version.id IS NULL THEN a.last_name) ELSE _a_version.last_name END) as last_name,

And we suspected these CASE statements were not easy/possible for the query planner to “see through” and push filtering & indexing statistics through the top-level WHERE clause.

So, while so far our approach has been “add yet another CTE”, for this last stretch, we had to remove the _authors CTE and start “hard mode” rewriting the query by adding JOINs directly to the query itself, i.e. we’d go from a non-versioned query like:

-- previously we'd "just swap" the books & authors tables to
-- versioned _books & _authors CTEs
SELECT b.* FROM books
  JOIN authors a ON b.author_id = a.id
  WHERE a.first_name = 'bob';

To:

SELECT
  -- rewrite the top-level select
  b.id as id,
  -- any versioned columns need `CASE` statements
  (CASE WHEN bv.id IS NULL THEN b.title ELSE bv.title) AS title,
  (CASE WHEN bv.id IS NULL THEN b.notes ELSE bv.notes) AS notes,
  -- ...repeat for each versioned column...
  -- ...also special for updated_at...
FROM books
  -- keep the `books` table & add a `book_versions` JOIN directly to the query
  JOIN book_versions bv ON (bv.book_id = b.id AND bv.first_id >= pv.version_id AND ...)
  -- rewrite the `ON` to use `bv.author_id` (b/c books can change authors)
  JOIN authors a ON bv.author_id = a.id
  -- add a author_version join directly to the query
  JOIN author_versions av ON (av.author_id = a.id AND av.first_id >= pv.version_id AND ...)
  -- rewrite the condition from `a.first_name` to `av.first_name`
  WHERE av.first_name = 'bob';

This is a lot more rewriting!

While the CTE approach let us just “swap the table”, and leave the rest of the query “reading from the alias a” & generally being none-the-wiser, now we have to find every b alias usage or a alias usage, and evaluate if the SELECT or JOIN ON or WHERE clause is touching a versioned column, and if so rewrite that usage to the bv or av respective versioned column.

There are pros/cons to this approach:

Con: Obviously the query is much trickier to rewrite
Pro: But since our rewriting algorithm is isolated to the VersioningPlugin file, we were able to “refactor our versioned query logic” just once and have it apply everywhere which was amazing 🎉
Pro: The “more complicated to us” CTE-less query is actually “simpler to the Postgres query planner” b/c there isn’t a CTE “sitting in the way” and so all the usual indexing/filtering performance optimizations kicked in, and got us back to baseline performance

Removing the _authors / _books CTEs and doing “inline rewriting” (basically what we’d hoped Postgres would do for us with the “used once” CTEs, but now we’re doing by hand) gave us a ~10x performance increase, and returned us to baseline performance, actually beating the performance of our original “write to drafts” approach. 🏃

Skipping Some Details

It would make the post even longer, so I’m skipping some of the nitty-gritty details like:

Soft deletions—should entities “disappear” if they were added in v10 and the user pins to v8? Or “disappear” if they were deleted in v15 and the user is looking at v20?
- Initially we had our VersionPlugin plugin auto-filter these rows, but in practice this was too strict for some of our legacy code paths, so in both “not yet added” and “previously deleted” scenarios, we return rows anyway & then defer to application-level filtering
Versioning m2m collections, both in the database (store full copies or incremental diffs?), and teaching the plugin to rewrite m2m joins/filters accordingly.
Reading updated_at from the right identity table vs. the versions table to avoid oplock errors when drafts issue UPDATEs using plugin-loaded data
Ensuring endpoints make the pin and addPlugin calls without accidentally loading “not versioned” copies of the data they want to read into the EntityManager, which would cache the non-versioned data, & prevent future “should be versioned” reads for working as expected.
Migrating our codebase from the previous “by hand” / “write to drafts” initial versioning approach, to the new plugin + “write to identities” approach, which honestly was a lot of fun—lots of red code that was deleted & simplified by the new approach. 🔪

Thankfully we were able to solve each of these, and none turned into dealbreakers that compromised the overall approach. 😅

Wrapping Up

This was definitely a long-form post, as we explored the Homebound problem space that drove our solution, rather than just a shorter announcement post of “btw Joist now has a plugin API”.

Which, yes, Joist does now have a plugin API for query rewriting 🎉, but we think it’s important to show how/why it was useful to us, and potentially inspire ideas for how it might be useful to others as well (i.e. an auth plugin that does ORM/data layer auth 🔑 is also on our todo list).

That said, we anticipate readers wondering “wow this solution seems too complex” (and, yes, our production VersionPlugin code is much more complicated than the pseudocode we’ve used in this post), “why didn’t you hand-write the queries”, etc 😰.

We can only report that we tried “just hand-write your versioning queries”, in the spirit of KISS & moving quickly while building our initial set of version-aware features, for about 6-9 months, and it was terrible. 😢

Today, we have versioning implemented as “a cross-cutting concern” (anyone remember Aspect Oriented Programming? 👴), primarily isolated to a single file/plugin, and the rest of our code went back to “boring CRUD” with “boring reads” and “boring writes”.

Our velocity has increased, bugs have decreased, and overall DX/developer happiness is back to our usual “this is a pleasant codebase” levels. 🎉

If you have any questions, feel free to drop by our Discord to chat.

Thanks

Thanks to the Homebound engineers who worked on this project: Arvin, for bearing the brunt of the tears & suffering, fixing bugs during our pre-plugin “write to drafts” approach (mea cupla! 😅), ZachG for owning the rewriting plugin, both Joist’s new plugin API & our internal VersioningPlugin implementation 🚀, and Roberth, Allan, and ZachO for all pitching in to get our refactoring landed in the limited, time-boxed window we had for the initiative ⏰🎉.

Lazy Fields for 30x speedup without Decorators or Transforms

Oct 6, 2025

Stephen Haberman

Joist leverages JavaScript’s prototypes to blend great developer ergonomics/DX with performance.

Avoiding ORM Decorators

Jul 27, 2025

Stephen Haberman

Entity-based ORMs, going back to Java’s Hibernate & earlier, often use decorators or annotations like @PrimaryKey to define the domain model; Joist pushes back on this pattern, and prefers schema-driven code generation.

Pipelining for 3-6x Faster Commits

Apr 20, 2025

Stephen Haberman

Pipelining INSERTs and UPDATEs statements to make commits 3-6x faster

Evolution of Defaults

Feb 27, 2025

Stephen Haberman

Joist’s mission is to model your application’s business logic, with first-class support for domain modeling features & concepts.

A great example of this is Joist’s support for something as simple as default values: for example, the Author.status field should default to Active.

Joist’s default values support grew from “the simplest thing possible” (requiring adhoc patterns that engineers would copy/paste around) to a robust, first-class feature (an explicit setDefault API that “just works”).

This is a microcosm of Joist’s goal to identify the repeated patterns and pain points involved in “building a domain model”, and provide elegant features with a great DX.

Version 1. Schema Defaults

Joist’s initial defaults support was purposefully “as simple as possible”, and limited to DEFAULTs declared in the database schema, i.e. an is_archived field that defaults to FALSE, or a status_id that defaults to DRAFT:

CREATE TABLE example_table (
    id SERIAL PRIMARY KEY,
    is_archived BOOL DEFAULT false,
    status_id INTEGER DEFAULT 1,
);

Joist’s codegen would recognize these, and “apply them immediately” when creating an entity:

const a = em.create(Author, {});
expect(a.status).toBe(AuthorStatus.Draft); // already Draft
expect(a.isArchived).toBe(false); // already false

This was super-simple, and had a few pros:

Pro: The status is immediately within the em.create
- I.e. you don’t have to wait for an em.flush to “see the database default”
- Any business logic can immediately start using the default
Pro: No duplication of “draft is the default” between the database schema & TypeScript code
Con: Only supports static, hard-coded values
- Ideally we’d like to write lambdas to calculate defaults, based on business logic

Version 2. beforeCreate hooks

Being limited to static DEFAULT values is not great, so the first way of implementing more complicated “dynamic defaults” was using Joist’s beforeCreate hooks:

/** Any author created w/non-zero amount of books defaults to Published. */
authorConfig.beforeCreate("books", a => {
  if (a.status === undefined) {
    a.status = a.books.get.length > 0 ? AuthorStatus.Published : AuthorStatus.Draft;
  }
})

This was a quick-win b/c Joist already supported beforeCreate hooks, but had a few cons:

Pro: Supports arbitrary business logic
- The load hint easily enables cross-entity calculations
Con: The default logic isn’t ran until em.flush
- Harder for business logic to rely on
- Creates inconsistency between “hard-coded defaults” (applied immediately in em.create) and “dynamic defaults” (applied during flush)
Con: Susceptible to hook ordering issues
- If our default’s value depends on other defaults, it is hard to ensure the other “runs first”
Con: Boilerplate/imperative (not really a first-class feature)
- The code has to 1st check if a.status is already set (not a huge deal, but boilerplate)
- There is nothing in the code/API that identifies “this is a default”, instead we just have an adhoc pattern of “this is how our app sets defaults”
Con: Caused duplication with test factories
- Our test factories often wanted “the same defaults” applied, but Joist’s factories are synchronous, which meant any logic that was “set in beforeCreate” wouldn’t be seen right away.
- To work around this, we often “wrote twice” default logic across our entities & test factories—not great!

Version 3: Adding setDefault

We lived with the Version 1 & 2 options for several years, because they were “good enough”, but for the 3rd version, we wanted to start “setting defaults” on the road to being “more than just good enough”.

Specifically, we wanted a first-class, idiomatic way to “declaratively specify a field’s default value” instead of the previous “manually check the field in a beforeCreate hook”.

So we added config.setDefault, which accepts the field name, it’s dependencies (if any), and a lambda that would calculate the default value:

/** Calculate the Author.status default, based on number of books. */
authorConfig.setDefault("status", "books", (a) => {
  // a.books.get is available, but a.firstName is not, b/c it's not listed as a dependency
  return a.books.get.length > 0 ? AuthorStatus.Published : AuthorStatus.Draft;
})

This was a great start, but we pushed it out knowingly half-baked:

Pro: Provided scaffolding of a better future
- Gave an idiomatic way to “declare defaults”
Pro: The type system enforces that the lambda only calls fields explicitly listed in the dependency param
- This reused our ReactiveField infra and is great for ensuring dependencies aren’t missed
Con: The dependencies weren’t actually used yet
- “…ship early!”
Con: setDefault lambdas were still not invoked until em.flush
- So we still had the “write defaults twice” problem with test factories

Version 4: Dependency Aware

After having the setDefault API in production for a few months, the next improvement was to capitalize on “knowing our dependencies” and allow defaults to depend on other defaults.

For example, maybe our Author.status default needs to know whether any of the books are published (which itself is a default):

// In `Author.ts`
authorConfig.setDefault("status", { books: "status" }, a => {
  const anyBookPublished = a.books.get.some(b => b.status === BookStatus.Published);
  return anyBookPublished ? AuthorStatus.Published : AuthorStatus.Draft;
})

// In `Book.ts`
bookConfig.setDefault("status", {}, b => {
  // Some business logic that dynamically determines the status
  return BookStatus.Published;
});

Now, if both a Book and an Author are created at the same time, em.flush will ensure that the Book.status is calculated before invoking the Author.status default—we’ve solved our ordering issue!

This was a major accomplishment—cross-entity defaults had been a thorn in our side for years.

(Fwiw we readily admit this is a rare/obscure need—in our domain model of 100s of entities, we have only ~2-3 of these “cross-entity defaults”, so we want to be clear this is not necessarily a “must have” feature—but, when you need it, it’s extremely nice to have!)

Pro: Finally unlocked cross-entity defaults
Con: Still have the “write defaults twice” problem with factories

Version 5: Teaching Factories!

The next DX iteration was solving the duplication of “factories want the defaults too!”.

Looking more closely at this issue, Joist’s test factories are synchronous, which means we can create test data easily without any awaits:

// Given an author
const a = newAuthor(em);
// And a book
const b = newBook(em, { author: a });
// And setup something else using b.title
// ...if there is "default title logic", it will not have ran yet, which
// can be confusing for tests/other logic expecting that behavior
console.log(b.title);

The lack of awaits is very nice! But it does mean, if we really wanted b.title to immediately reflect its production default, we had recode the default logic into the newBook factory:

export function newBook(em: EntityManager): DeepNew<Book> {
  return newTestInstance(em, Book, {
    title: "recode the Book default logic here",
  });
}

As before, for a while this was “good enough”—but finally in this iteration, we taught the factories to leverage their “each test’s data is already in memory” advantage and just invoke the defaults immediately during the newTestInstance calls.

This works even for setDefaults that use load hints, like “author status depends on its books”:

// In `Author.ts`
authorConfig.setDefault("status", { books: "status" }, a => {
  const anyBookPublished = a.books.get.some(b => b.status === BookStatus.Published);
  return anyBookPublished ? AuthorStatus.Published : AuthorStatus.Draft;
})

In production, Joist can’t assume “the author’s books are already in-memory”, so em.flush would first load / await for the a.books to be loaded, and then invoke the lambda.

However, because our tests know that a.books is already in memory, they can skip this await, and immediately invoke the lambda.

Pro: We finally can remove the factory’s “write it twice” defaults

Version Next: findOrCreates

Always looking ahead, the next itch we have is that, currently, default lambdas that call async methods like em.find or em.findOrCreate are still skipped during newTestInstance and only run during em.flush.

Which means, for these defaults, we still have remnants of the “write it twice” defaults anti-pattern—albeit very few of them!

We should be able to lift this restriction as well, with a little bit of work (…maybe :thinking:, the newBook call is fundamentally synchronous, so maybe not).

Slow Grind to Perfection

Wrapping up, besides a “walk down memory lane”, the larger point of this post is highlighting Joist’s journey of continually grinding on DX polish—we’re about five years into Joel’s Good Software Takes 10 Years, so only another 5 to go! :smile:

Of course, it’d be great for this evolution to happen more quickly—i.e. if we had a dependency-aware, factory-aware, amazing setDefault API from day one.

But, often times jumping to an abstraction can be premature, and result in a rushed design—so sometimes it doesn’t hurt to “sit with the itch” for a little while, evolve it through multiple iterations of “good enough”, until finally a pleasant/robust solution emerges.

And, perhaps most pragmatically, small iterations helps spread the implementation out over enough hack days that it can actually get shipped. :ship:

Recursive Relations

Jul 20, 2024

Stephen Haberman

Joist’s development is currently very incremental, and doesn’t have “big release” milestones & release notes, but we recently released a notable new feature: recursive relations. Check them out! :tada:

Is Joist the Best ORM, Ever?

May 11, 2024

Stephen Haberman

I’ve been working on the Joist docs lately, specifically a Why Joist? page, which ended up focusing more on “why Domain Models?” than a feature-by-feature description of Joist.

Which is fine, but a good friend (and early Joist user) proofread it, and afterward challenged me that I was being too humble, and I should be more assertive about Joist being “THE BEST ORM FOR TYPESCRIPT AND POSTGRES” (his words), as he listed off his own personal highlights:

If it compiles, it works. “If you love TypeScript, you’ll love Joist.”
It’s “really effing fast” (no N+1s, ever).
We solve many common problems for you (auto-batching updates, handling the insertion order of related entities, and have many patterns for enums, polymorphic relations, etc.)
Factories make testing amazing.

All of these are true.

But in thinking about his challenge, of pitching Joist specifically as “the best ORM for TypeScript & Postgres”, I actually think I can be even more bullish and assert Joist is, currently, the best ORM, in any language, ever, TypeScript or otherwise.

Which is crazy, right? How could I possibly assert this?

I have three reasons; admittedly the first two are not technically unique to Joist, but both foundational to its design and implementation, and the third that is one of Joist’s “special sauces”:

JavaScript’s ability to solve N+1s via the event loop, and
TypeScript’s ability to model loaded-ness in its type system.
Joist’s “backend reactivity”

No N+1s: JavaScript’s Event Loop

I’ve used many ORMs over the years, going back to Java’s Hibernate, Ruby’s ActiveRecord, and a few bespoke ones in between.

Invariably, they all suffer from N+1s.

I don’t want to repeat Joist’s existing Avoiding N+1s docs, but basically “entities are objects with fields/methods that incrementally lazy-load their relations from the database” is almost “too ergonomic”, and tempts programmers into using the abstraction when they shouldn’t (i.e. in a loop), at which point N+1s are inevitable.

Again as described in “Avoiding N+1s”, JavaScript’s event loop forcing all I/O calls to “wait just a sec”, until the end of the event loop tick, gives Joist an amazing opportunity, of course via dataloader, to de-dupe all the N+1s into a single SQL call.

For everything.

This works so well, that personally I don’t know that I ever want to work in a programming language/tech stack that cannot use this trick (at least to build backend/line-of-business applications).

Granted, JavaScript is not the only language with an event loop—async Rust is a thing, Python has asyncio, and even Vert.x on the JVM provides it (I prototyped “dataloader ported to Vert.x” several years ago), and either Rust or the JVM (Scala!) would be pretty tempting just in terms of “faster than JavaScript” performance.

But the event loop is only part of the story—another critical part is TypeScript’s type system.

Other TypeScript ORMs like Prisma & Drizzle “solve” N+1s by just not modeling your domain as entities (with lazy-loaded relations), and instead force/assume a single/large up-front query that returns an immutable tree of POJOs.

This does remove the most obvious N+1 footgun (lazy-loaded relations), but it also fundamentally restricts your ability to decompose business logic into smaller/reusable methods, because now any logic that touches the database must be done “in bulk” directly by your code, and often crafted in SQL specifically to how each individual endpoint is accessing the data.

(Concretely, if you had a saveAuthor endpoint with logic/queries to validate “this author is valid”, and now write a batch saveAuthors endpoint, you could not reuse the “written for one entity” logic without rewriting it to work at the new endpoint’s grouped/batch level of granularity. Or similar for saveBook logic that you want to use within a saveAuthor that also upserts multiple children books.)

Instead, Joist’s auto-batching lets you ergonomically write code at the individual entity abstraction level (whether in a loop, or in per-entity validation rules or lifecycle hooks), but still get performant-by-default batched queries.

Loaded Subgraphs: TypeScript’s Type System

After solving N+1s with the event loop, the next biggest ergonomic problem in traditional, entity-based ORMs is tracking (or basically not tracking) loaded-ness in the type system.

Because you can’t have your entire relational database in memory, domain models must incrementally load their data from the database, as your business logic’s codepaths decide which parts they need to read.

This was another downfall of the Hibernate/ActiveRecord ORMs: there was no notion of “is this relation loaded yet?”, and so any random relation access could trigger the surprise of an expensive database I/O call, as that relation was lazy-loaded from the database.

Joist solves this by statically typing all relations as “unloaded” by default, i.e. accessing an Author’s books requires calling a1.books.load(), which returns a Promise (which is also key to the N+1 prevention above).

Which is great, I/O calls are now obvious, but “do an await for every relation access” would really suck (we tried that), so Joist goes further and uses TypeScript’s type system to not only track individual relation loaded-ness (like author1.books or book2.authors), but mark entire subgraphs of entities as populated/loaded relations and hence synchronously accessible:

// Load the Author plus the specific books + reviews subgrpah
const a1 = await em.load(Author, "a:1", {
  populate: { books: { reviews: "comments" } },
});

// a1 is typed as Loaded<Author, { books: { reviews: "comments" } }>
// Tada, no more await Promise.all
a1.books.get.forEach((book) => {
  book.reviews.get.forEach((review) => {
    console.log(review.comments.get.length);
  });
})

This combination of:

Explicit .load() / await calls for any I/O, but leveraging
Mapped types to allow compiler-checked synchronous access

For me, is also something that I never want to work without again. It’s just so nice.

Unlike JavaScript not having a monopoly on the event loop, for these mapped types I believe TypeScript effectively does have a lock on this capability, from a programming language/type system perspective.

Creating “new types” in other programming languages is generally handled by macros (Scala and Rust), or I suppose Haskell’s higher-kinded-types. But, as far as I know, none of them can combine TypeScript “mapped type + conditional type” features in a way that would allow this “take my user-defined type (Author)” and “this user-defined populate hint type” and fuse them together into a new type, that is “the author with this specific subgraph of fields marked as loaded”.

I’m happy to be corrected on this, but I think TypeScript is the only mainstream programming language that can really power Joist’s Loaded<Author, { books: "reviews" }>-style adhoc typing of subgraphs, or at least this easily.

Other TypeScript ORMs (Prisma, Drizzle, Kysley, etc.) also leverage TypeScript’s mapped types to create dynamic shapes of data, which is legitimately great.

However, they all have the fundamental approach of issuing “one-shot” queries that return immutable trees of POJOs, directly mapped from your SQL tables, and not subgraphs of entities that can have non-SQL abstractions & be further incrementally loaded as/if needed (see Why Joist for more on this).

You can generally see, for both issues covered so far (N+1s and statically-typed loaded-ness), most TypeScript ORMs have “solved” these issues by just removing the features all together, and restricting themselves to be “sophisticated query builders”.

Joist’s innovation is keeping the entity-based, incremental-loading mental model that is historically very popular/idiomatic for ORMs (particularly Ruby’s ActiveRecord), and just fundamentally fixing it to not suck.

Joist’s Backend Reactivity

This 3rd section is the first feature that is unique to Joist itself: Joist’s “backend reactivity”.

Many ORMs have lifecycle hooks (this entity was created, updated, or deleted—which Joist does as well), to organize side effects/business logic of “when X changes, do Y”.

But just lifecycle hooks by themselves can become tangled, complicated, and a well-known morass of complexity and “spooky action at a distance”.

This is because they’re basically “Web 1.0” imperative spaghetti code, where you have to manually instrument each mutation that might trigger a side effect.

(Concretely, lets say you have a rule that needs to look at both an author and its books. With raw lifecycle hooks, you must separately instrument both the “author update” and “book update” hooks to call your “make sure this author + books combination is still valid” logic. This can become tedious and error-prone, to get all the right hooks instrumented.)

Instead, Joist’s reactive fields and reactive validation rules take the lessons of “declarative reactivity” from the Mobx/Solid/reactivity-aware frontend world, and bring it to the backend: reactive rules & fields declare in one place what their “upstream dependencies” are, and Joist just handles wiring up the necessary cross-entity reactivity.

This brings a level of ease, specificity, and rigor to what are still effectively lifecycle hooks under the hood, that really makes them pleasant to work with.

The declarative nature of Joist’s domain model-wide reactivity graph is also very amenable to DX tooling & documentation generation, but we’ve not yet deeply explored/delivered any functionality that leverages it.

Conclusion: Best ORM Ever?

So, these three features are what back up my exaggerated “best ORM ever” assertion.

If tomorrow, I suddenly could not use Joist, and had to find another ORM to use (or, in general, build any sort of application backend on top of a relational database), in any current/mainstream programming language, without a doubt I would want:

Bullet-proof N+1 prevention,
Tracking loaded relation/subgraph state in the type system, and
Backend reactivity, for declarative cross-entity validation rules and reactive fields.

And Joist is the only ORM that does all three of these: two of which are uniquely enabled by the JavaScript/TypeScript stack, and the third just part of Joist’s own innovation.

Disclaimer 1: Uncomfortably Bold Claims

I usually don’t like making bold/absolutist claims, like “this or that framework is ‘the best’” or “technology x/y/z is terrible” or what not.

I did enough of that early in my career, and at this point I’m more interested in “what are the trade-offs?” and “what’s the best tool for this specific use case?”

So, I hold two somewhat incongruent thoughts in my head, as I am both:

Very confident that Joist is “the best” way to build application backends on top of a relational database, for a large majority of use cases/teams/codebases, but I also
Recognize it’s “framework” / entity approach (see Why Joist) might be either too opinionated or too much abstraction for some people’s tastes, and just in general choices & alternatives are always great to have.

My guess is if you tried Joist, you would quickly come to like it, but it’s also perfectly fine if not!

Disclaimer 2: Still a Lot To Do

Similar to the two incongruent thoughts above, another two semi-contradictory thoughts is the disclaimer that:

Joist’s core is very solid and vetted by 4+ years of production usage & continual iteration at Homebound, but also
There’s still a lot of work to do, obviously supporting other databases, but also the myriad fun, incremental improvement ideas we’re tracking in the issue tracker, and of course even more that we’ve not thought of yet.

Feedback

If you have thoughts, questions, or feedback, please let us know! Feel free to join the Joist discord, or file issues on the GitHub repo if you try Joist and run into any issues.

Despite all the hubris in this post, we are still a very small project & community, and so have a lot of growth and improvement ahead of us.

Thanks for the read!

New NextJS Sample App

Apr 21, 2024

Stephen Haberman

We’ve added a new NextJS + Joist sample app that shows how Joist can be used in a NextJS application, with several benefits:

Automatic N+1 Prevention
JSON Payload/Props Creation
Optional Join-based Preloading

This post gives a short overview; if you’d like to watch a video, we also have a YouTube video that walks through the sample app.

Two Render Tree Approaches

While building the sample app, we found two fundamental ways of structuring a NextJS app’s render tree:

Fewer RSCs (left side), that prop drill data to the Client Components
- table.tsx is a server component that loads all data for the tree
- author-rcc-card.tsx and book-rcc-preview.tsx are client components that accept prop-drilled data
Mostly RSCs (right side), with Client Components only at the bottom
- table.tsx is a server component but only loads what it needs
- author-rsc-card.tsx and book-rsc-preview.tsx are RSC and do their own data loading

The top-level Table / table.tsx component renders each of these side-by-side, so we can see the differences, and observe some pros/cons of each approach.

With mostly RSC components, it’s easy to decompose data loading away from the top-level component.

For example, the AuthorRscCard can make its own data loading calls, and even if it’s render many pages on the page, Joist will de-dupe across the N sibling AuthorRscCards, and batch into a single SQL call.

type AuthorCardProps = {
  /** RSCs can accept the domain model enities as a prop. */
  author: Author;
  addBook: (id: string) => Promise<void>;
};

/** The RSC version of AuthorCard can load it's own data. */
export async function AuthorRscCard({ author, addBook }: AuthorCardProps) {
  // This will be auto-batched if many cards render at once
  const books = await author.books.load();
  // Or if you wanted a tree of data, this will also be auto-batched
  const loaded = await author.populate({ books: { reviews: "ratings" } });
  return <div>...jsx</div>;
}

This is nice because it allows the AuthorRscCard to be more self-sufficient, and allow the parent table component to be unaware of its children loading details.

With mostly Client components, the opposite happens, and only the parent can make database / EntityManager calls, and so is responsible for loading all the data for its children, and passing it as JSON via props:
```
type AuthorCardProps = {
  /** RCCs must accept a POJO of `Author` + all nested data. */
  author: AuthorPayload;
  addBook: (id: string) => Promise<void>;
};

/** The RCC version of AuthorCard accepts the `AuthorPayload`. */
export function AuthorRccCard({ author, addBook }: AuthorCardProps) {
  // can only use data already available on `author`
}
```
Even though the up-front data load can become awkward, it does give more opportunities for optimizations; for example Joist can use join-based preloading to load a single tree of Author + Book + Review entities in a single SQL call, which is even better optimization than the “one query per layer” N+1 prevention of the RSC-based approach.

Automatic N+1 Prevention

In either approach, Joist’s N+1 prevention auto-batches database calls, even if they are made across separate component renders.

I.e. in the RSC components:

The top-level Table component makes 1 SQL call for all Author entities.
All 2nd-level AuthorRscCard cards each make their own author.books.load() (or author.populate(...)) call, but because they’re all rendered in the same event loop, Joist batches all the load calls into 1 SQL call
Any 3rd-level components would have their load calls batched as well.

In the React Client Component approach, this auto-batching is admittedly not as necessary, assuming a singular top-level component, like Table, loads all the data at once anyway (although, as mentioned later, Joist can optimize that as well).

See the Avoiding N+1s section of our docs for more information.

JSON Payload/Props Creation

Since the client components cannot make their own async data calls, the top-level Table components is responsible for loading all the data into a JSON payload, and passing it down to the children as props.

Joist entities have an easy way of doing this is, via a toJSON method that takes the shape of data to create:

// Define the shape of data to create
export const authorHint = {
  id: true,
  firstName: true,
  books: {
    id: true,
    title: true,
    reviews: ["id", "rating"],
  },
  customField: (a) => a.id + a.title,
} satisfies JsonHint<Author>;

// This typedef can be used in the client-side props, or to match any
// endpoint-based respones types like for REST/OpenAPI.
export type AuthorPayload = JsonPayload<Author, typeof authorHint>;

const payload = await a.toJSON(authorHint);

The toJSON implementation will:

Load any relations that are not yet loaded from the database
Output only the keys that are requested in the authorHint
Call any lambdas like customField to generate custom values

As with previous examples, all data loading is N+1 safe, and also potentially join-based preloaded.

See the toJSON docs for more information.

This recursive toJSON payload generation is a relatively new feature of Joist, so if you have feature ideas that would make it more useful, please let us know!

Join-Based Preloading

The last optimization that Joist can do is join-based preloading, which can be used in either the RSC or RCC approach.

This is also a newer feature that requires opt-ing in to, but in em.ts you can add a preloadPlugin:

/** Returns this request's `EntityManager` instance. */
export const getEm = cache(() => {
  // Opt-in to preloading
  const preloadPlugin = new JsonAggregatePreloader();
  return new EntityManager({}, { driver, preloadPlugin });
});

This will allow Joist to load a deep tree/subgraph of entities in a single SQL call.

For example, normally a Joist em.find a call like:

const a = await em.find(
  Author,
  { id: 1 },
  {populate: { books: "reviews" } },
);
// Now access all the data in memory
console.log(a.books.get[0].reviews.get[0].rating)

Will issue three SQL calls:

SELECT * FROM authors WHERE id = 1;
SELECT * FROM books WHERE author_id = 1;
SELECT * FROM reviews WHERE book_id IN (1, 2, 3, ...);

But with the preloadPlugin enabled, it will use a single SQL call that uses CROSS JOIN LATERAL and json_agg to return the author’s books, and the book’s reviews (omitted for brevity) in a single row:

select a.id, _b._ as _b from authors as a
  cross join lateral
    -- create a tuple for each book, and aggregate then into an array of books
    select json_agg(json_build_array(_b.id, _b.title, _b.foreword, _b.author_id) order by _b.id) as _
    from books _b
    where _b.author_id = a.id
  ) _b
  where a.id = ? limit ?

Joist’s join-based preloading is still a beta feature, so if you run into any issues, please let us know!

What about Complex Queries?

So far, our queries have focused on loading “just entities”, and then putting those on the wire (or rendering them to HTML).

This is because Joist’s focus is on building robust domain models, and specifically helping solve the “write-side” of your application’s business logic (running the correct validation rules, lifecycle hooks, reactive updates), and less so on the “read-side” of complex queries (i.e. that using aggregates using GROUP BY, multiple nested subqueries/projections/etc.).

As such, Joist does not yet have a sophisticated query builder that can create arbitrary SQL queries, like Kysley or Drizzle.

Instead, Joist encourages an approach that uses its robust write-side features to create materialized columns in the database, such that the majority of your pages/responses really can be served by “super simple SELECT statements”, instead of using complicated queries to calculate aggregates on-the-fly.

Although you can of course use both approaches, and just use a lower-level query builder where needed.

Sample App Feedback

Joist’s roots come from the GraphQL world, so this sample app was our first foray into using it for a NextJS application. If we’ve missed any key features that would make it easier to use Joist in a NextJS app, please let us know!

Stephen Haberman

What is unnest?

Why unnest is useful

Bringing it back?

unnest_arrays Custom Function

Pros/Cons

Aggregate What Now?

Why Version An Aggregate?

Schema Approach

…but what is “Current”?

Initial Approach: Not so great

New Approach: Write to Identities

Making Reads not Suck

1. Initial Idea: Using CTEs

2. Adding Request Config via CTEs

3. Injecting the Rewrite Automatically

4. Performance Evaluation

Skipping Some Details

Wrapping Up

Thanks

Version 1. Schema Defaults

Version 2. beforeCreate hooks

Version 3: Adding setDefault

Version 4: Dependency Aware

Version 5: Teaching Factories!

Version Next: findOrCreates

Slow Grind to Perfection

No N+1s: JavaScript’s Event Loop

Loaded Subgraphs: TypeScript’s Type System

Joist’s Backend Reactivity

Conclusion: Best ORM Ever?

Disclaimer 1: Uncomfortably Bold Claims

Disclaimer 2: Still a Lot To Do

Feedback

Two Render Tree Approaches

Automatic N+1 Prevention

JSON Payload/Props Creation

Join-Based Preloading

What about Complex Queries?

Sample App Feedback

What is `unnest`?

Why `unnest` is useful

`unnest_arrays` Custom Function