Initial Pipelining Benchmark

I’ve known about Postgres’s pipeline mode for a while, and have finally done some prototyping of pipelining in general, as well as alpha builds of Joist running with pipeline mode (coming soon!).

This post is an intro to pipelining, using postgres.js and mitata to benchmark some examples.

Pipelining, as a term in networking, allows clients to send multiple requests, immediately one after another, without first waiting for the server to respond.

Using NodeJS talking to Postgres for illustration, the default flow of SQL statements, without pipelining, involves a full round-trip network request for each SQL statement:

Without Pipelining

  • Send an INSERT authors
  • …wait several millis for work & response…
  • Send an INSERT books
  • …wait several millis for work & response…
  • Send an INSERT reviews
  • …wait several millis for work & response…

Note that we have to wait for both:

  1. The server to “complete the work” (maybe 1ms), and
  2. The network to deliver the response back to us (maybe 2ms)

before we can continue sending the next request.

This results in a lot of “wait time” for both the client & server, while each waits for the other’s messages to transfer over the wire.

Pipelining allows us to remove this “extra wait time” by sending all the requests at once, and then waiting for all responses:

With Pipelining

  • Send INSERT authors
  • Send INSERT books
  • Send INSERT reviews
  • …wait several millis for all 3 requests to complete…

The upshot is that we’re not waiting on the network before sending the server more work to do.

Not only does this let our client “send work” sooner, but it lets the server have “work to do” sooner as well—i.e. as soon as the server finishes INSERT authors, it can immediately start working on INSERT books.
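To build intuition, here’s a back-of-the-envelope model of the two flows, where `workMs` is the server’s per-statement time and `rttMs` is the network round trip (both names are made up for illustration; real timings overlap more messily than this):

```typescript
// Hypothetical model: without pipelining, every statement pays its own
// round trip; with pipelining, we pay roughly one round trip total.
function sequentialMs(statements: number, workMs: number, rttMs: number): number {
  // Each statement waits for its own server work + network round trip
  return statements * (workMs + rttMs);
}

function pipelinedMs(statements: number, workMs: number, rttMs: number): number {
  // One round trip, while the server still does per-statement work
  return rttMs + statements * workMs;
}

console.log(sequentialMs(10, 1, 2)); // 30
console.log(pipelinedMs(10, 1, 2)); // 12
```

Note how the model predicts the benefit grows with both the number of statements and the latency, which matches the benchmarks below.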

One wrinkle with pipelining is that if one SQL statement fails (e.g. the INSERT authors statement), all requests that follow it in the pipeline are also aborted.

This is because Postgres assumes the later statements in the pipeline relied on the earlier statements succeeding, so once earlier statements fail, the later statements are considered no longer valid.

This generally means pipelining is only useful when executing multi-statement database transactions, where you’re executing a BEGIN + some number of INSERT, UPDATE, and DELETE statements + COMMIT, and already expect them all to atomically commit.

Serendipitously, this model of “this group of statements all need to work or abort” is exactly what we want anyway for a single backend request that is committing its work, by atomically saving its work to the database in a transaction—and is exactly what Joist’s em.flush does. :-)

Per above, network latency between your machine & the database is the biggest factor in pipelining’s performance impact.

This can make benchmarking difficult and potentially misleading, because benchmarks often have the “web backend” and “the database” on the same physical machine, which means there is effectively zero network latency.

Thankfully, we can use solutions like Shopify’s toxiproxy to introduce an artificial, deterministic amount of latency to the network requests between our Node process and the Postgres database.

toxiproxy is particularly neat in that it’s easy to run as a docker container, and control the latency via POST commands to a minimal REST API it exposes:

docker-compose.yml

services:
  toxiproxy:
    image: ghcr.io/shopify/toxiproxy:2.12.0
    depends_on:
      db:
        condition: service_healthy
    ports:
      - "5432:5432"
      - "8474:8474"
    volumes:
      - ./toxiproxy.json:/config/toxiproxy.json
    command: "-host=0.0.0.0 -config=/config/toxiproxy.json"

toxiproxy.json

[
  {
    "name": "postgres",
    "listen": "0.0.0.0:5432",
    "upstream": "db:5432",
    "enabled": true
  }
]

toxi-init.sh

curl -X POST http://localhost:8474/reset
curl -X POST http://localhost:8474/proxies/postgres/toxics -d '{
  "name": "latency_downstream",
  "type": "latency",
  "stream": "downstream",
  "attributes": { "latency": 2 }
}'

This is all we need to control exactly how much latency toxiproxy injects between every Node.js database call & our docker-hosted Postgres instance.

We’ll look at Joist’s pipeline performance in a future post, but for now we’ll stay closer to the metal and use postgres.js to directly execute SQL statements in a few benchmarks.

We’re using postgres.js, instead of the venerable node-pg, because postgres.js implements pipelining, while node-pg does not yet.

postgres.js also has an extremely seamless way to use pipelining—any statements issued in parallel (i.e. a Promise.all) within a sql.begin are automatically pipelined for us.

Very neat!

We’ll use mitata for timing info—it is technically focused on CPU micro-benchmarks, but its warmup & other infra make it suitable for our async, I/O-oriented benchmarks as well.

For SQL statements, we’ll test inserting tag rows into a single-column table—for these tests, the complexity/cost of the statement itself is not that important, and a simple insert will do.

We have a few configuration parameters that can be tweaked across runs:

  • numStatements: the number of tags to insert
  • toxiLatencyInMillis: the latency in millis that toxiproxy should delay each statement

As we’ll see, both of these affect the results—the higher each becomes (the more statements, or the more latency), the more performance benefits we get from pipelining.

As a baseline benchmark, we execute numStatements inserts sequentially, with individual awaits on each INSERT:

bench("sequential", async () => {
  await sql.begin(async (sql) => {
    for (let i = 0; i < numStatements; i++) {
      await sql`INSERT INTO tag (name) VALUES (${`value-${nextTag++}`})`;
    }
  });
});

We expect this to be the slowest, because it is purposefully defeating pipelining by waiting for each INSERT to finish before executing the next one.

This is postgres.js’s canonical way of invoking pipelining, returning a string[] of SQL statements from the sql.begin lambda:

bench("pipeline string[]", async () => {
  await sql.begin((sql) => {
    const statements = [];
    for (let i = 0; i < numStatements; i++) {
      statements.push(sql`INSERT INTO tag (name) VALUES (${`value-${nextTag++}`})`);
    }
    return statements;
  });
});

We expect this to be fast, because of pipelining.

This last example also uses postgres.js’s pipelining, but by invoking the statements from within a Promise.all:

bench("pipeline Promise.all", async () => {
  await sql.begin(async (sql) => {
    const statements = [];
    for (let i = 0; i < numStatements; i++) {
      statements.push(sql`INSERT INTO tag (name) VALUES (${`value-${nextTag++}`})`);
    }
    await Promise.all(statements);
  });
});

This is particularly important for Joist, because even within a single em.flush() call, we’ll execute a single BEGIN/COMMIT database transaction, but potentially might have to make several “waves” of SQL updates (technically only when ReactiveQueryFields are involved), and so can’t always return a single string[] of SQL statements to execute.

We expect this to be fast as well.
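As an aside, the “waves” idea from the previous paragraph can be sketched as plain Promise orchestration. This `runWaves` helper is a hypothetical illustration (not a postgres.js or Joist API): statements within a wave run in parallel (and so can be pipelined), while each wave waits for the previous one:

```typescript
// Hypothetical sketch: run waves of statements, pipelining within each wave,
// but waiting between waves (e.g. when later SQL depends on earlier results).
async function runWaves<T>(waves: Array<Array<() => Promise<T>>>): Promise<T[]> {
  const results: T[] = [];
  for (const wave of waves) {
    // All statements in this wave are issued together (pipeline-friendly)
    results.push(...(await Promise.all(wave.map((fn) => fn()))));
  }
  return results;
}
```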

I’ve run the benchmark with a series of latencies & statement counts.

1ms latency, 10 statements:

toxiproxy configured with 1ms latency
numStatements 10
clk: ~4.37 GHz
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
runtime: node 23.10.0 (x64-linux)
benchmark avg (min … max)
-------------------------------------------
sequential 15.80 ms/iter
pipeline string[] 4.16 ms/iter
pipeline Promise.all 4.21 ms/iter
summary
pipeline string[]
1.01x faster than pipeline Promise.all
3.8x faster than sequential

1ms latency, 20 statements:

toxiproxy configured with 1ms latency
numStatements 20
clk: ~4.52 GHz
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
runtime: node 23.10.0 (x64-linux)
benchmark avg (min … max)
-------------------------------------------
sequential 30.43 ms/iter
pipeline string[] 4.55 ms/iter
pipeline Promise.all 4.51 ms/iter
summary
pipeline Promise.all
1.01x faster than pipeline string[]
6.74x faster than sequential

2ms latency, 10 statements:

toxiproxy configured with 2ms latency
numStatements 10
clk: ~4.53 GHz
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
runtime: node 23.10.0 (x64-linux)
benchmark avg (min … max)
-------------------------------------------
sequential 28.85 ms/iter
pipeline string[] 7.27 ms/iter
pipeline Promise.all 7.54 ms/iter
summary
pipeline string[]
1.04x faster than pipeline Promise.all
3.97x faster than sequential

2ms latency, 20 statements:

toxiproxy configured with 2ms latency
numStatements 20
clk: ~4.48 GHz
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
runtime: node 23.10.0 (x64-linux)
benchmark avg (min … max)
-------------------------------------------
sequential 55.05 ms/iter
pipeline string[] 9.17 ms/iter
pipeline Promise.all 10.13 ms/iter
summary
pipeline string[]
1.1x faster than pipeline Promise.all
6x faster than sequential

So, in these benchmarks, pipelining makes our inserts (and ideally future Joist em.flush calls!) 3x to 6x faster.

A few notes on these numbers:

  • I think 1-2ms is a generally correct/generous latency, based on what our production app sees between an Amazon ECS container and an RDS Aurora instance.

    (Although if you’re using edge-based compute this can be as high as 200ms :-O)

  • 10 statements per em.flush seems like a lot, but if you think about “each table that is touched”, whether due to an INSERT or UPDATE or DELETE, and include many-to-many tables, I think it’s reasonable for 10 tables to be a not-uncommon number.

    Note that we assume your SQL statements are already batched-per-table, i.e. if you have 10 author rows to UPDATE, you should be issuing a single UPDATE authors that batch-updates all 10 rows. If you’re using Joist, it already does this for you.
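To illustrate what “batched-per-table” means, here’s a hypothetical sketch that folds N row updates into a single statement via Postgres’s `UPDATE ... FROM (VALUES ...)` form. The table/column names and the naive quoting are illustrative only (this is not Joist’s actual SQL generation; real code should use parameterized queries):

```typescript
// Hypothetical sketch: collapse N per-row updates into one batched UPDATE.
function batchUpdateAuthors(rows: { id: number; firstName: string }[]): string {
  const values = rows
    // Naive escaping for illustration only; use bind parameters in real code
    .map((r) => `(${r.id}, '${r.firstName.replace(/'/g, "''")}')`)
    .join(", ");
  return (
    `UPDATE authors SET first_name = data.first_name ` +
    `FROM (VALUES ${values}) AS data (id, first_name) ` +
    `WHERE authors.id = data.id`
  );
}
```

With this shape, 10 changed authors cost one statement (and, pipelined, share one round trip with the other tables’ statements).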

I wanted to focus on this raw SQL benchmark, just to better understand pipelining’s performance impact, and I think it’s an obvious win: 3-6x speedups in multi-statement transactions merely from making sure our driver supports pipelining.

In the Postgres pipelining docs, they make a valid point that pipelining requires async behavior, which in traditional blocking languages like Java & C is a significant enough complexity increase that pipelining may not be worth the trade-off.

But JavaScript is already async & non-blocking, so pipelining is a mere Promise.all away.

In a future/next post, we’ll swap these raw SQL benchmarks out for higher-level ORM benchmarks, to see pipelining’s impact in hopefully more realistic scenarios.

Evolution of Defaults

Joist’s mission is to model your application’s business logic, with first-class support for domain modeling features & concepts.

A great example of this is Joist’s support for something as simple as default values: for example, the Author.status field should default to Active.

Joist’s default values support grew from “the simplest thing possible” (requiring adhoc patterns that engineers would copy/paste around) to a robust, first-class feature (an explicit setDefault API that “just works”).

This is a microcosm of Joist’s goal to identify the repeated patterns and pain points involved in “building a domain model”, and provide elegant features with a great DX.

Joist’s initial defaults support was purposefully “as simple as possible”, and limited to DEFAULTs declared in the database schema, i.e. an is_archived field that defaults to FALSE, or a status_id that defaults to DRAFT:

CREATE TABLE example_table (
  id SERIAL PRIMARY KEY,
  is_archived BOOL DEFAULT false,
  status_id INTEGER DEFAULT 1
);

Joist’s codegen would recognize these, and “apply them immediately” when creating an entity:

const a = em.create(Author, {});
expect(a.status).toBe(AuthorStatus.Draft); // already Draft
expect(a.isArchived).toBe(false); // already false

This was super-simple, and had a few pros:

  • Pro: The status is immediately within the em.create
    • I.e. you don’t have to wait for an em.flush to “see the database default”
    • Any business logic can immediately start using the default
  • Pro: No duplication of “draft is the default” between the database schema & TypeScript code
  • Con: Only supports static, hard-coded values
    • Ideally we’d like to write lambdas to calculate defaults, based on business logic
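As a minimal sketch of the “applied immediately” behavior (assumed shape, not Joist’s actual codegen), the schema’s DEFAULT clauses become a plain object that is spread into the entity at creation time:

```typescript
// Hypothetical sketch: codegen emits these from the schema's DEFAULT clauses
const authorDefaults = { status: "Draft", isArchived: false };

type AuthorOpts = { status?: string; isArchived?: boolean };

function createAuthor(opts: AuthorOpts) {
  // Caller-provided opts win; defaults fill in the rest, synchronously
  return { ...authorDefaults, ...opts };
}

const a = createAuthor({});
// a.status is already "Draft", with no em.flush required
```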

Being limited to static DEFAULT values is not great, so the first way of implementing more complicated “dynamic defaults” was using Joist’s beforeCreate hooks:

/** Any author created w/non-zero amount of books defaults to Published. */
authorConfig.beforeCreate("books", (a) => {
  if (a.status === undefined) {
    a.status = a.books.get.length > 0 ? AuthorStatus.Published : AuthorStatus.Draft;
  }
});

This was a quick-win b/c Joist already supported beforeCreate hooks, but had a few cons:

  • Pro: Supports arbitrary business logic
    • The load hint easily enables cross-entity calculations
  • Con: The default logic isn’t run until em.flush
    • Harder for business logic to rely on
    • Creates inconsistency between “hard-coded defaults” (applied immediately in em.create) and “dynamic defaults” (applied during flush)
  • Con: Susceptible to hook ordering issues
    • If our default’s value depends on other defaults, it is hard to ensure the other “runs first”
  • Con: Boilerplate/imperative (not really a first-class feature)
    • The code has to 1st check if a.status is already set (not a huge deal, but boilerplate)
    • There is nothing in the code/API that identifies “this is a default”, instead we just have an adhoc pattern of “this is how our app sets defaults”
  • Con: Caused duplication with test factories
    • Our test factories often wanted “the same defaults” applied, but Joist’s factories are synchronous, which meant any logic that was “set in beforeCreate” wouldn’t be seen right away.
    • To work around this, we often “wrote twice” default logic across our entities & test factories—not great!

We lived with the Version 1 & 2 options for several years, because they were “good enough”, but for the 3rd version, we wanted to start “setting defaults” on the road to being “more than just good enough”.

Specifically, we wanted a first-class, idiomatic way to “declaratively specify a field’s default value” instead of the previous “manually check the field in a beforeCreate hook”.

So we added config.setDefault, which accepts the field name, its dependencies (if any), and a lambda that calculates the default value:

/** Calculate the Author.status default, based on number of books. */
authorConfig.setDefault("status", "books", (a) => {
  // a.books.get is available, but a.firstName is not, b/c it's not listed as a dependency
  return a.books.get.length > 0 ? AuthorStatus.Published : AuthorStatus.Draft;
});

This was a great start, but we pushed it out knowingly half-baked:

  • Pro: Provided scaffolding of a better future
    • Gave an idiomatic way to “declare defaults”
  • Pro: The type system enforces that the lambda only calls fields explicitly listed in the dependency param
    • This reused our ReactiveField infra and is great for ensuring dependencies aren’t missed
  • Con: The dependencies weren’t actually used yet
    • “…ship early!”
  • Con: setDefault lambdas were still not invoked until em.flush
    • So we still had the “write defaults twice” problem with test factories

After having the setDefault API in production for a few months, the next improvement was to capitalize on “knowing our dependencies” and allow defaults to depend on other defaults.

For example, maybe our Author.status default needs to know whether any of the books are published (which itself is a default):

// In `Author.ts`
authorConfig.setDefault("status", { books: "status" }, (a) => {
  const anyBookPublished = a.books.get.some((b) => b.status === BookStatus.Published);
  return anyBookPublished ? AuthorStatus.Published : AuthorStatus.Draft;
});

// In `Book.ts`
bookConfig.setDefault("status", {}, (b) => {
  // Some business logic that dynamically determines the status
  return BookStatus.Published;
});

Now, if both a Book and an Author are created at the same time, em.flush will ensure that the Book.status is calculated before invoking the Author.status default—we’ve solved our ordering issue!
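Under the hood, this amounts to ordering defaults by their declared dependencies. A tiny topological-sort sketch of the idea (hypothetical code, not Joist’s real implementation, and without cycle detection):

```typescript
// Hypothetical sketch: given "field -> fields it depends on", return an
// order where each field's dependencies are calculated before it.
function orderDefaults(deps: Record<string, string[]>): string[] {
  const done = new Set<string>();
  const out: string[] = [];
  const visit = (field: string) => {
    if (done.has(field)) return;
    done.add(field);
    // Visit a field's dependencies before the field itself
    (deps[field] ?? []).forEach(visit);
    out.push(field);
  };
  Object.keys(deps).forEach(visit);
  return out;
}

console.log(orderDefaults({ "Author.status": ["Book.status"], "Book.status": [] }));
// ["Book.status", "Author.status"]
```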

This was a major accomplishment—cross-entity defaults had been a thorn in our side for years.

(Fwiw we readily admit this is a rare/obscure need—in our domain model of 100s of entities, we have only ~2-3 of these “cross-entity defaults”, so we want to be clear this is not necessarily a “must have” feature—but, when you need it, it’s extremely nice to have!)

  • Pro: Finally unlocked cross-entity defaults
  • Con: Still have the “write defaults twice” problem with factories

The next DX iteration was solving the duplication of “factories want the defaults too!”.

Looking more closely at this issue, Joist’s test factories are synchronous, which means we can create test data easily without any awaits:

// Given an author
const a = newAuthor(em);
// And a book
const b = newBook(em, { author: a });
// And setup something else using b.title
// ...if there is "default title logic", it will not have ran yet, which
// can be confusing for tests/other logic expecting that behavior
console.log(b.title);

The lack of awaits is very nice! But it does mean that, if we really wanted b.title to immediately reflect its production default, we had to recode the default logic into the newBook factory:

export function newBook(em: EntityManager): DeepNew<Book> {
  return newTestInstance(em, Book, {
    title: "recode the Book default logic here",
  });
}

As before, for a while this was “good enough”—but finally in this iteration, we taught the factories to leverage their “each test’s data is already in memory” advantage and just invoke the defaults immediately during the newTestInstance calls.

This works even for setDefaults that use load hints, like “author status depends on its books”:

// In `Author.ts`
authorConfig.setDefault("status", { books: "status" }, (a) => {
  const anyBookPublished = a.books.get.some((b) => b.status === BookStatus.Published);
  return anyBookPublished ? AuthorStatus.Published : AuthorStatus.Draft;
});

In production, Joist can’t assume “the author’s books are already in-memory”, so em.flush would first load / await for the a.books to be loaded, and then invoke the lambda.

However, because our tests know that a.books is already in memory, they can skip this await, and immediately invoke the lambda.
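That “skip the await when everything is in memory” decision can be sketched like this (a hypothetical helper, not Joist’s actual API):

```typescript
// Hypothetical sketch: invoke a default lambda synchronously when its
// dependencies are already in memory (the factory case); otherwise signal
// that the caller must load + await first (the production em.flush case).
function invokeDefault<T, R>(
  entity: T,
  depsLoaded: (e: T) => boolean,
  lambda: (e: T) => R,
): { ran: true; value: R } | { ran: false } {
  // Factories know each test's data is already in memory, so depsLoaded
  // is always true there, and the default applies immediately
  return depsLoaded(entity) ? { ran: true, value: lambda(entity) } : { ran: false };
}
```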

  • Pro: We finally can remove the factory’s “write it twice” defaults

Always looking ahead, the next itch we have is that, currently, default lambdas that call async methods like em.find or em.findOrCreate are still skipped during newTestInstance and only run during em.flush.

Which means, for these defaults, we still have remnants of the “write it twice” defaults anti-pattern—albeit very few of them!

We should be able to lift this restriction as well, with a little bit of work (…maybe :thinking:, the newBook call is fundamentally synchronous, so maybe not).

Wrapping up, besides a “walk down memory lane”, the larger point of this post is highlighting Joist’s journey of continually grinding on DX polish—we’re about five years into Joel’s Good Software Takes 10 Years, so only another 5 to go! :smile:

Of course, it’d be great for this evolution to happen more quickly—i.e. if we had a dependency-aware, factory-aware, amazing setDefault API from day one.

But, often times jumping to an abstraction can be premature, and result in a rushed design—so sometimes it doesn’t hurt to “sit with the itch” for a little while, evolve it through multiple iterations of “good enough”, until finally a pleasant/robust solution emerges.

And, perhaps most pragmatically, small iterations helps spread the implementation out over enough hack days that it can actually get shipped. :ship:

Is Joist the Best ORM, Ever?

I’ve been working on the Joist docs lately, specifically a Why Joist? page, which ended up focusing more on “why Domain Models?” than a feature-by-feature description of Joist.

Which is fine, but a good friend (and early Joist user) proofread it, and afterward challenged me that I was being too humble, and I should be more assertive about Joist being “THE BEST ORM FOR TYPESCRIPT AND POSTGRES” (his words), as he listed off his own personal highlights:

  1. If it compiles, it works. “If you love TypeScript, you’ll love Joist.”
  2. It’s “really effing fast” (no N+1s, ever).
  3. We solve many common problems for you (auto-batching updates, handling the insertion order of related entities, and have many patterns for enums, polymorphic relations, etc.)
  4. Factories make testing amazing.

All of these are true.

But in thinking about his challenge, of pitching Joist specifically as “the best ORM for TypeScript & Postgres”, I actually think I can be even more bullish and assert Joist is, currently, the best ORM, in any language, ever, TypeScript or otherwise.

Which is crazy, right? How could I possibly assert this?

I have three reasons; admittedly the first two are not technically unique to Joist, but both are foundational to its design and implementation, while the third is one of Joist’s “special sauces”:

  1. JavaScript’s ability to solve N+1s via the event loop, and
  2. TypeScript’s ability to model loaded-ness in its type system.
  3. Joist’s “backend reactivity”

I’ve used many ORMs over the years, going back to Java’s Hibernate, Ruby’s ActiveRecord, and a few bespoke ones in between.

Invariably, they all suffer from N+1s.

I don’t want to repeat Joist’s existing Avoiding N+1s docs, but basically “entities are objects with fields/methods that incrementally lazy-load their relations from the database” is almost “too ergonomic”, and tempts programmers into using the abstraction when they shouldn’t (i.e. in a loop), at which point N+1s are inevitable.

Again as described in “Avoiding N+1s”, JavaScript’s event loop forcing all I/O calls to “wait just a sec”, until the end of the event loop tick, gives Joist an amazing opportunity, of course via dataloader, to de-dupe all the N+1s into a single SQL call.

For everything.

This works so well, that personally I don’t know that I ever want to work in a programming language/tech stack that cannot use this trick (at least to build backend/line-of-business applications).
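The core of the trick fits in a few lines. Here is a toy version of dataloader’s batching (Joist uses the real dataloader library; this sketch just shows why “same tick” loads collapse into one query): all `load` calls made in the same event-loop tick queue up, and one flush runs on the microtask queue:

```typescript
// Toy dataloader: collect all load() calls made in the current tick,
// then resolve them with a single batched call.
class TinyLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  private batchFn: (keys: K[]) => Promise<V[]>;

  constructor(batchFn: (keys: K[]) => Promise<V[]>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      if (this.queue.length === 0) {
        // Schedule one flush after the current tick's loads have queued up
        queueMicrotask(() => this.flush());
      }
      this.queue.push({ key, resolve });
    });
  }

  private async flush() {
    const batch = this.queue;
    this.queue = [];
    // One batched call (e.g. one SQL query) for all queued keys
    const values = await this.batchFn(batch.map((b) => b.key));
    batch.forEach((b, i) => b.resolve(values[i]));
  }
}
```

So even N+1-shaped code (a `loader.load()` inside a loop) ends up issuing one batched call per tick.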

Granted, JavaScript is not the only language with an event loop—async Rust is a thing, Python has asyncio, and even Vert.x on the JVM provides it (I prototyped “dataloader ported to Vert.x” several years ago), and either Rust or the JVM (Scala!) would be pretty tempting just in terms of “faster than JavaScript” performance.

But the event loop is only part of the story—another critical part is TypeScript’s type system.

Other TypeScript ORMs like Prisma & Drizzle “solve” N+1s by just not modeling your domain as entities (with lazy-loaded relations), and instead force/assume a single/large up-front query that returns an immutable tree of POJOs.

This does remove the most obvious N+1 footgun (lazy-loaded relations), but it also fundamentally restricts your ability to decompose business logic into smaller/reusable methods, because now any logic that touches the database must be done “in bulk” directly by your code, and often crafted in SQL specifically to how each individual endpoint is accessing the data.

(Concretely, if you had a saveAuthor endpoint with logic/queries to validate “this author is valid”, and now write a batch saveAuthors endpoint, you could not reuse the “written for one entity” logic without rewriting it to work at the new endpoint’s grouped/batch level of granularity. Or similar for saveBook logic that you want to use within a saveAuthor that also upserts multiple children books.)

Instead, Joist’s auto-batching lets you ergonomically write code at the individual entity abstraction level (whether in a loop, or in per-entity validation rules or lifecycle hooks), but still get performant-by-default batched queries.

Loaded Subgraphs: TypeScript’s Type System

Section titled “Loaded Subgraphs: TypeScript’s Type System”

After solving N+1s with the event loop, the next biggest ergonomic problem in traditional, entity-based ORMs is tracking (or basically not tracking) loaded-ness in the type system.

Because you can’t have your entire relational database in memory, domain models must incrementally load their data from the database, as your business logic’s codepaths decide which parts they need to read.

This was another downfall of the Hibernate/ActiveRecord ORMs: there was no notion of “is this relation loaded yet?”, and so any random relation access could trigger the surprise of an expensive database I/O call, as that relation was lazy-loaded from the database.

Joist solves this by statically typing all relations as “unloaded” by default, i.e. accessing an Author’s books requires calling a1.books.load(), which returns a Promise (which is also key to the N+1 prevention above).

Which is great, I/O calls are now obvious, but “do an await for every relation access” would really suck (we tried that), so Joist goes further and uses TypeScript’s type system to not only track individual relation loaded-ness (like author1.books or book2.authors), but mark entire subgraphs of entities as populated/loaded relations and hence synchronously accessible:

// Load the Author plus the specific books + reviews subgraph
const a1 = await em.load(Author, "a:1", {
  populate: { books: { reviews: "comments" } },
});
// a1 is typed as Loaded<Author, { books: { reviews: "comments" } }>
// Tada, no more await Promise.all
a1.books.get.forEach((book) => {
  book.reviews.get.forEach((review) => {
    console.log(review.comments.get.length);
  });
});

This combination of:

  • Explicit .load() / await calls for any I/O, but leveraging
  • Mapped types to allow compiler-checked synchronous access

For me, is also something that I never want to work without again. It’s just so nice.

Unlike the event loop, where JavaScript has no monopoly, for these mapped types I believe TypeScript effectively does have a lock on this capability, from a programming language/type system perspective.

Creating “new types” in other programming languages is generally handled by macros (Scala and Rust), or I suppose Haskell’s higher-kinded types. But, as far as I know, none of them can combine TypeScript’s “mapped type + conditional type” features in a way that would allow taking “my user-defined type (Author)” and “this user-defined populate hint type” and fusing them together into a new type that is “the author with this specific subgraph of fields marked as loaded”.

I’m happy to be corrected on this, but I think TypeScript is the only mainstream programming language that can really power Joist’s Loaded<Author, { books: "reviews" }>-style adhoc typing of subgraphs, or at least this easily.
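To make the idea concrete, here’s a toy, vastly simplified version of the mapped + conditional type trick (not Joist’s actual `Loaded` type, and the `Relation` shape here is invented for illustration):

```typescript
// A relation that must normally be loaded asynchronously
type Relation<U> = { load(): Promise<U[]> };
// A relation that has been loaded and can be read synchronously
type LoadedRelation<U> = Relation<U> & { get: U[] };

// Fuse a user-defined type T with a hint H: relations named in H gain
// a synchronous `.get`, every other field is left untouched.
type MarkLoaded<T, H extends keyof T> = {
  [K in keyof T]: K extends H
    ? T[K] extends Relation<infer U> ? LoadedRelation<U> : T[K]
    : T[K];
};

interface Author {
  firstName: string;
  books: Relation<string>;
}

// `books` now has `.get`; `firstName` is unchanged
type AuthorWithBooks = MarkLoaded<Author, "books">;

const a: AuthorWithBooks = {
  firstName: "Anne",
  books: { load: async () => ["b1"], get: ["b1"] },
};
```

Joist’s real `Loaded` type goes much further (nested hints, conditional recursion through the subgraph), but this is the underlying mechanism.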

Other TypeScript ORMs (Prisma, Drizzle, Kysely, etc.) also leverage TypeScript’s mapped types to create dynamic shapes of data, which is legitimately great.

However, they all have the fundamental approach of issuing “one-shot” queries that return immutable trees of POJOs, directly mapped from your SQL tables, and not subgraphs of entities that can have non-SQL abstractions & be further incrementally loaded as/if needed (see Why Joist for more on this).

You can generally see, for both issues covered so far (N+1s and statically-typed loaded-ness), most TypeScript ORMs have “solved” these issues by just removing the features all together, and restricting themselves to be “sophisticated query builders”.

Joist’s innovation is keeping the entity-based, incremental-loading mental model that is historically very popular/idiomatic for ORMs (particularly Ruby’s ActiveRecord), and just fundamentally fixing it to not suck.

This 3rd section covers the first feature that is unique to Joist itself: Joist’s “backend reactivity”.

Many ORMs have lifecycle hooks (this entity was created, updated, or deleted—which Joist does as well), to organize side effects/business logic of “when X changes, do Y”.

But just lifecycle hooks by themselves can become tangled, complicated, and a well-known morass of complexity and “spooky action at a distance”.

This is because they’re basically “Web 1.0” imperative spaghetti code, where you have to manually instrument each mutation that might trigger a side effect.

(Concretely, let’s say you have a rule that needs to look at both an author and its books. With raw lifecycle hooks, you must separately instrument both the “author update” and “book update” hooks to call your “make sure this author + books combination is still valid” logic. This can become tedious and error-prone, to get all the right hooks instrumented.)

Instead, Joist’s reactive fields and reactive validation rules take the lessons of “declarative reactivity” from the Mobx/Solid/reactivity-aware frontend world, and bring it to the backend: reactive rules & fields declare in one place what their “upstream dependencies” are, and Joist just handles wiring up the necessary cross-entity reactivity.
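A minimal sketch of the declarative idea (hypothetical shapes, not Joist’s actual API): each rule declares its upstream fields once, and the framework computes which rules a given change should re-run:

```typescript
// Hypothetical sketch: a rule declares its dependencies declaratively,
// and returns an error message (or undefined if valid).
type Rule<T> = { deps: (keyof T)[]; run: (e: T) => string | undefined };

function rulesToRun<T>(rules: Rule<T>[], changed: (keyof T)[]): Rule<T>[] {
  // Only re-run rules whose declared deps intersect the changed fields;
  // no per-mutation hook wiring required
  return rules.filter((r) => r.deps.some((d) => changed.includes(d)));
}
```

Joist’s real reactivity also walks cross-entity relations (e.g. a Book change re-running an Author rule), but the "declare deps once, framework triggers" shape is the same.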

This brings a level of ease, specificity, and rigor to what are still effectively lifecycle hooks under the hood, that really makes them pleasant to work with.

The declarative nature of Joist’s domain model-wide reactivity graph is also very amenable to DX tooling & documentation generation, but we’ve not yet deeply explored/delivered any functionality that leverages it.

So, these three features are what back up my exaggerated “best ORM ever” assertion.

If tomorrow, I suddenly could not use Joist, and had to find another ORM to use (or, in general, build any sort of application backend on top of a relational database), in any current/mainstream programming language, without a doubt I would want:

  1. Bullet-proof N+1 prevention,
  2. Tracking loaded relation/subgraph state in the type system, and
  3. Backend reactivity, for declarative cross-entity validation rules and reactive fields.

And Joist is the only ORM that does all three of these: two of which are uniquely enabled by the JavaScript/TypeScript stack, and the third just part of Joist’s own innovation.

I usually don’t like making bold/absolutist claims, like “this or that framework is ‘the best’” or “technology x/y/z is terrible” or what not.

I did enough of that early in my career, and at this point I’m more interested in “what are the trade-offs?” and “what’s the best tool for this specific use case?”

So, I hold two somewhat incongruent thoughts in my head, as I am both:

  • Very confident that Joist is “the best” way to build application backends on top of a relational database, for a large majority of use cases/teams/codebases, but I also
  • Recognize its “framework” / entity approach (see Why Joist) might be either too opinionated or too much abstraction for some people’s tastes, and that, in general, choices & alternatives are always great to have.

My guess is if you tried Joist, you would quickly come to like it, but it’s also perfectly fine if not!

Similar to the two incongruent thoughts above, I hold another two semi-contradictory thoughts as a disclaimer:

  • Joist’s core is very solid and vetted by 4+ years of production usage & continual iteration at Homebound, but also
  • There’s still a lot of work to do, obviously supporting other databases, but also the myriad fun, incremental improvement ideas we’re tracking in the issue tracker, and of course even more that we’ve not thought of yet.

If you have thoughts, questions, or feedback, please let us know! Feel free to join the Joist discord, or file issues on the GitHub repo if you try Joist and run into any issues.

Despite all the hubris in this post, we are still a very small project & community, and so have a lot of growth and improvement ahead of us.

Thanks for the read!

New NextJS Sample App

We’ve added a new NextJS + Joist sample app that shows how Joist can be used in a NextJS application, with several benefits:

  • Automatic N+1 Prevention
  • JSON Payload/Props Creation
  • Optional Join-based Preloading

This post gives a short overview; if you’d like to watch a video, we also have a YouTube video that walks through the sample app.

While building the sample app, we found two fundamental ways of structuring a NextJS app’s render tree:

  1. Fewer RSCs (left side), that prop drill data to the Client Components
    • table.tsx is a server component that loads all data for the tree
    • author-rcc-card.tsx and book-rcc-preview.tsx are client components that accept prop-drilled data
  2. Mostly RSCs (right side), with Client Components only at the bottom
    • table.tsx is a server component but only loads what it needs
    • author-rsc-card.tsx and book-rsc-preview.tsx are RSC and do their own data loading

The top-level Table / table.tsx component renders each of these side-by-side, so we can see the differences, and observe some pros/cons of each approach.

  • With mostly RSC components, it’s easy to decompose data loading away from the top-level component.

    For example, the AuthorRscCard can make its own data loading calls, and even if it’s rendered many times on the page, Joist will de-dupe across the N sibling AuthorRscCards, and batch into a single SQL call.

    type AuthorCardProps = {
      /** RSCs can accept the domain model entities as a prop. */
      author: Author;
      addBook: (id: string) => Promise<void>;
    };

    /** The RSC version of AuthorCard can load its own data. */
    export async function AuthorRscCard({ author, addBook }: AuthorCardProps) {
      // This will be auto-batched if many cards render at once
      const books = await author.books.load();
      // Or if you wanted a tree of data, this will also be auto-batched
      const loaded = await author.populate({ books: { reviews: "ratings" } });
      return <div>...jsx</div>;
    }

    This is nice because it allows the AuthorRscCard to be more self-sufficient, and allows the parent table component to be unaware of its children’s loading details.

  • With mostly Client components, the opposite happens, and only the parent can make database / EntityManager calls, and so is responsible for loading all the data for its children, and passing it as JSON via props:

    type AuthorCardProps = {
      /** RCCs must accept a POJO of `Author` + all nested data. */
      author: AuthorPayload;
      addBook: (id: string) => Promise<void>;
    };

    /** The RCC version of AuthorCard accepts the `AuthorPayload`. */
    export function AuthorRccCard({ author, addBook }: AuthorCardProps) {
      // can only use data already available on `author`
    }

    Even though the up-front data load can become awkward, it does give more opportunities for optimization; for example, Joist can use join-based preloading to load a single tree of Author + Book + Review entities in a single SQL call, an even better optimization than the “one query per layer” N+1 prevention of the RSC-based approach.

In either approach, Joist’s N+1 prevention auto-batches database calls, even if they are made across separate component renders.

I.e. in the RSC components:

  • The top-level Table component makes 1 SQL call for all Author entities.
  • All 2nd-level AuthorRscCard cards each make their own author.books.load() (or author.populate(...)) call, but because they’re all rendered in the same event loop, Joist batches all the load calls into 1 SQL call.
  • Any 3rd-level components would have their load calls batched as well.
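
The batching above can be sketched with a DataLoader-style batcher: a minimal illustration of event-loop batching (not Joist’s actual implementation, and all names here are hypothetical) that collects every load call made in the same tick and flushes them as one query:

```typescript
// A minimal sketch of event-loop batching, illustrating the idea behind
// Joist's N+1 prevention; the class and result shapes are hypothetical.
class BookBatcher {
  private pending: { authorId: number; resolve: (books: string[]) => void }[] = [];
  private scheduled = false;
  public sqlCalls = 0; // how many batched "SQL calls" were issued

  load(authorId: number): Promise<string[]> {
    return new Promise((resolve) => {
      this.pending.push({ authorId, resolve });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush after the current tick, once all sibling renders have queued
        queueMicrotask(() => this.flush());
      }
    });
  }

  private flush(): void {
    const batch = this.pending;
    this.pending = [];
    this.scheduled = false;
    // One `SELECT * FROM books WHERE author_id IN (...)` for the whole batch
    this.sqlCalls++;
    for (const { authorId, resolve } of batch) {
      resolve([`book-of-author-${authorId}`]);
    }
  }
}

// Three "cards" loading in the same tick share a single flush:
const batcher = new BookBatcher();
Promise.all([batcher.load(1), batcher.load(2), batcher.load(3)]).then(() => {
  console.log(batcher.sqlCalls); // 1
});
```

The key trick is `queueMicrotask`: because all sibling component renders queue their loads synchronously, a single flush scheduled for the end of the tick sees every request at once.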

In the React Client Component approach, this auto-batching is admittedly not as necessary, assuming a singular top-level component, like Table, loads all the data at once anyway (although, as mentioned later, Joist can optimize that as well).

See the Avoiding N+1s section of our docs for more information.

Since the client components cannot make their own async data calls, the top-level Table component is responsible for loading all the data into a JSON payload, and passing it down to the children as props.

Joist entities have an easy way of doing this, via a toJSON method that takes the shape of data to create:

// Define the shape of data to create
export const authorHint = {
  id: true,
  firstName: true,
  books: {
    id: true,
    title: true,
    reviews: ["id", "rating"],
  },
  customField: (a) => a.id + a.title,
} satisfies JsonHint<Author>;

// This typedef can be used in the client-side props, or to match any
// endpoint-based response types like for REST/OpenAPI.
export type AuthorPayload = JsonPayload<Author, typeof authorHint>;

const payload = await a.toJSON(authorHint);

The toJSON implementation will:

  • Load any relations that are not yet loaded from the database
  • Output only the keys that are requested in the authorHint
  • Call any lambdas like customField to generate custom values
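
To make the “output only the requested keys” behavior concrete, here is a minimal sketch of hint-driven projection over plain objects; it only illustrates the idea, and unlike Joist’s real toJSON it does not load unloaded relations from the database:

```typescript
// A minimal, hypothetical sketch of hint-driven JSON projection.
type Hint = {
  [key: string]: true | string[] | Hint | ((obj: any) => unknown);
};

function project(obj: Record<string, any>, hint: Hint): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, spec] of Object.entries(hint)) {
    if (spec === true) {
      out[key] = obj[key]; // copy the requested scalar
    } else if (typeof spec === "function") {
      out[key] = spec(obj); // lambdas generate custom values
    } else if (Array.isArray(spec)) {
      // shorthand: pick a list of scalar keys from each child
      out[key] = (obj[key] as any[]).map((c) =>
        Object.fromEntries(spec.map((k) => [k, c[k]])),
      );
    } else {
      out[key] = (obj[key] as any[]).map((c) => project(c, spec)); // recurse
    }
  }
  return out;
}

const author = {
  id: "a:1",
  firstName: "Jane",
  lastName: "not requested",
  books: [{ id: "b:1", title: "T1", reviews: [{ id: "br:1", rating: 5, body: "not requested" }] }],
};
const payload = project(author, {
  id: true,
  firstName: true,
  books: { id: true, title: true, reviews: ["id", "rating"] },
});
// `payload` contains only the hinted keys, e.g. no `lastName` or review `body`
```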

As with previous examples, all data loading is N+1 safe, and can also potentially use join-based preloading.

See the toJSON docs for more information.

This recursive toJSON payload generation is a relatively new feature of Joist, so if you have feature ideas that would make it more useful, please let us know!

The last optimization that Joist can do is join-based preloading, which can be used in either the RSC or RCC approach.

This is also a newer feature that requires opting in; in em.ts you can add a preloadPlugin:

/** Returns this request's `EntityManager` instance. */
export const getEm = cache(() => {
  // Opt-in to preloading
  const preloadPlugin = new JsonAggregatePreloader();
  return new EntityManager({}, { driver, preloadPlugin });
});

This will allow Joist to load a deep tree/subgraph of entities in a single SQL call.

For example, normally a Joist em.find call like:

const [a] = await em.find(
  Author,
  { id: 1 },
  { populate: { books: "reviews" } },
);
// Now access all the data in memory
console.log(a.books.get[0].reviews.get[0].rating);

Will issue three SQL calls:

SELECT * FROM authors WHERE id = 1;
SELECT * FROM books WHERE author_id = 1;
SELECT * FROM reviews WHERE book_id IN (1, 2, 3, ...);

But with the preloadPlugin enabled, it will use a single SQL call that uses CROSS JOIN LATERAL and json_agg to return the author’s books, and the book’s reviews (omitted for brevity) in a single row:

select a.id, _b._ as _b from authors as a
cross join lateral (
  -- create a tuple for each book, and aggregate them into an array of books
  select json_agg(json_build_array(_b.id, _b.title, _b.foreword, _b.author_id) order by _b.id) as _
  from books _b
  where _b.author_id = a.id
) _b
where a.id = ? limit ?

Joist’s join-based preloading is still a beta feature, so if you run into any issues, please let us know!

So far, our queries have focused on loading “just entities”, and then putting those on the wire (or rendering them to HTML).

This is because Joist’s focus is on building robust domain models, and specifically helping solve the “write-side” of your application’s business logic (running the correct validation rules, lifecycle hooks, reactive updates), and less so the “read-side” of complex queries (i.e. queries that use aggregates with GROUP BY, multiple nested subqueries/projections, etc.).

As such, Joist does not yet have a sophisticated query builder that can create arbitrary SQL queries, like Kysely or Drizzle.

Instead, Joist encourages an approach that uses its robust write-side features to create materialized columns in the database, such that the majority of your pages/responses really can be served by “super simple SELECT statements”, instead of using complicated queries to calculate aggregates on-the-fly.
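
As a sketch of this materialized-column pattern (the names below are hypothetical, and the “tables” are modeled in memory rather than SQL), the write path does the extra work of keeping a derived count current, so the read path never needs an aggregate:

```typescript
// Hypothetical sketch: maintain a derived column on writes so reads stay simple.
type BookRow = { id: number; title: string; numberOfReviews: number };

const books = new Map<number, BookRow>();

function insertBook(id: number, title: string): void {
  books.set(id, { id, title, numberOfReviews: 0 });
}

function insertReview(bookId: number): void {
  // The write path keeps the derived `numberOfReviews` column current...
  const book = books.get(bookId);
  if (!book) throw new Error(`unknown book ${bookId}`);
  book.numberOfReviews++;
}

function popularBooks(min: number): BookRow[] {
  // ...so the read path is the equivalent of a "super simple SELECT", e.g.
  // SELECT * FROM books WHERE number_of_reviews >= ?  (no GROUP BY needed)
  return [...books.values()].filter((b) => b.numberOfReviews >= min);
}
```

In Joist, this bookkeeping would live in the domain model’s reactive/write-side machinery rather than hand-written functions, but the trade-off is the same: slightly more work on each write in exchange for trivially cheap reads.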

That said, you can of course use both approaches, and reach for a lower-level query builder where needed.

Joist’s roots come from the GraphQL world, so this sample app was our first foray into using it for a NextJS application. If we’ve missed any key features that would make it easier to use Joist in a NextJS app, please let us know!