Pipelining for 3-6x Faster Commits
Pipelining INSERT and UPDATE statements to make commits 3-6x faster
Joist’s mission is to model your application’s business logic, with first-class support for domain modeling features & concepts.
A great example of this is Joist’s support for something as simple as default values: for example, the `Author.status` field should default to `Active`.

Joist’s default values support grew from “the simplest thing possible” (requiring adhoc patterns that engineers would copy/paste around) to a robust, first-class feature (an explicit `setDefault` API that “just works”).
This is a microcosm of Joist’s goal to identify the repeated patterns and pain points involved in “building a domain model”, and provide elegant features with a great DX.
Joist’s initial defaults support was purposefully “as simple as possible”, and limited to `DEFAULT`s declared in the database schema, i.e. an `is_archived` field that defaults to `FALSE`, or a `status_id` that defaults to `DRAFT`:
```sql
CREATE TABLE example_table (
  id SERIAL PRIMARY KEY,
  is_archived BOOL DEFAULT false,
  status_id INTEGER DEFAULT 1
);
```
Joist’s codegen would recognize these, and “apply them immediately” when creating an entity:
```ts
const a = em.create(Author, {});
expect(a.status).toBe(AuthorStatus.Draft); // already Draft
expect(a.isArchived).toBe(false); // already false
```
This was super-simple, and had a few pros:

- `status` is immediately set within the `em.create` call
- there’s no need to `em.flush` to “see the database default”

Being limited to static `DEFAULT` values is not great, so the first way of implementing more complicated “dynamic defaults” was using Joist’s `beforeCreate` hooks:
```ts
/** Any author created w/non-zero amount of books defaults to Published. */
authorConfig.beforeCreate("books", a => {
  if (a.status === undefined) {
    a.status = a.books.get.length > 0 ? AuthorStatus.Published : AuthorStatus.Draft;
  }
});
```
This was a quick win b/c Joist already supported `beforeCreate` hooks, but had a few cons:

- defaults were not applied until `em.flush`
- we now had two kinds of defaults: “static defaults” (applied during `em.create`) and “dynamic defaults” (applied during `flush`)
- each hook had to manually check whether `a.status` is already set (not a huge deal, but boilerplate)
- values defaulted “during `beforeCreate`” wouldn’t be seen right away

We lived with the Version 1 & 2 options for several years, because they were “good enough”, but for the 3rd version, we wanted to start “setting defaults” on the road to being “more than just good enough”.
Specifically, we wanted a first-class, idiomatic way to “declaratively specify a field’s default value”, instead of the previous “manually check the field in a `beforeCreate` hook”.
So we added `config.setDefault`, which accepts the field name, its dependencies (if any), and a lambda that calculates the default value:
```ts
/** Calculate the Author.status default, based on number of books. */
authorConfig.setDefault("status", "books", (a) => {
  // a.books.get is available, but a.firstName is not, b/c it's not listed as a dependency
  return a.books.get.length > 0 ? AuthorStatus.Published : AuthorStatus.Draft;
});
```
This was a great start, but we pushed it out knowingly half-baked:

- the dependency hint reuses Joist’s `ReactiveField` infra and is great for ensuring dependencies aren’t missed
- but the `setDefault` lambdas were still not invoked until `em.flush` (sketched below)
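For example, a test like this (a minimal sketch, assuming the `Author.status` default above) would still not see the default until flush:

```ts
const a = em.create(Author, {});
expect(a.status).toBeUndefined(); // default not applied yet
await em.flush();
expect(a.status).toBe(AuthorStatus.Draft); // applied during flush (no books yet)
```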
After having the `setDefault` API in production for a few months, the next improvement was to capitalize on “knowing our dependencies” and allow defaults to depend on other defaults.
For example, maybe our `Author.status` default needs to know whether any of the books are published (which itself is a default):
```ts
// In `Author.ts`
authorConfig.setDefault("status", { books: "status" }, a => {
  const anyBookPublished = a.books.get.some(b => b.status === BookStatus.Published);
  return anyBookPublished ? AuthorStatus.Published : AuthorStatus.Draft;
});
```
```ts
// In `Book.ts`
bookConfig.setDefault("status", {}, b => {
  // Some business logic that dynamically determines the status
  return BookStatus.Published;
});
```
Now, if both a `Book` and an `Author` are created at the same time, `em.flush` will ensure that the `Book.status` is calculated before invoking the `Author.status` default—we’ve solved our ordering issue!
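Concretely, a sketch of what this enables (assuming the two `setDefault`s above):

```ts
// Both entities are created in the same unit of work
const a = em.create(Author, { firstName: "a1" });
const b = em.create(Book, { title: "b1", author: a });
await em.flush();
// Book.status defaulted to Published first, so the Author default saw it
expect(a.status).toBe(AuthorStatus.Published);
```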
This was a major accomplishment—cross-entity defaults had been a thorn in our side for years.
(Fwiw we readily admit this is a rare/obscure need—in our domain model of 100s of entities, we have only ~2-3 of these “cross-entity defaults”, so we want to be clear this is not necessarily a “must have” feature—but, when you need it, it’s extremely nice to have!)
The next DX iteration was solving the duplication of “factories want the defaults too!”.
Looking more closely at this issue, Joist’s test factories are synchronous, which means we can create test data easily without any `await`s:
```ts
// Given an author
const a = newAuthor(em);
// And a book
const b = newBook(em, { author: a });
// And setup something else using b.title
// ...if there is "default title logic", it will not have run yet, which
// can be confusing for tests/other logic expecting that behavior
console.log(b.title);
```
The lack of `await`s is very nice! But it does mean that, if we really wanted `b.title` to immediately reflect its production default, we had to recode the default logic into the `newBook` factory:
```ts
export function newBook(em: EntityManager): DeepNew<Book> {
  return newTestInstance(em, Book, {
    title: "recode the Book default logic here",
  });
}
```
As before, for a while this was “good enough”—but finally in this iteration, we taught the factories to leverage their “each test’s data is already in memory” advantage and just invoke the defaults immediately during the `newTestInstance` calls.
This works even for `setDefault`s that use load hints, like “author status depends on its books”:
```ts
// In `Author.ts`
authorConfig.setDefault("status", { books: "status" }, a => {
  const anyBookPublished = a.books.get.some(b => b.status === BookStatus.Published);
  return anyBookPublished ? AuthorStatus.Published : AuthorStatus.Draft;
});
```
In production, Joist can’t assume “the author’s books are already in-memory”, so `em.flush` would first load / `await` the `a.books` relation, and then invoke the lambda. However, because our tests know that `a.books` is already in memory, they can skip this `await` and immediately invoke the lambda.
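So in a test, this now looks something like (a sketch, assuming the `Author.status` default above):

```ts
// The factory runs the setDefault lambda synchronously; no em.flush required
const a = newAuthor(em);
expect(a.status).toBe(AuthorStatus.Draft); // no books yet, so Draft, and already set
```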
Always looking ahead, the next itch we have is that, currently, default lambdas that call async methods like `em.find` or `em.findOrCreate` are still skipped during `newTestInstance` and only run during `em.flush`.
Which means, for these defaults, we still have remnants of the “write it twice” defaults anti-pattern—albeit very few of them!
We should be able to lift this restriction as well, with a little bit of work (…maybe :thinking:, the `newBook` call is fundamentally synchronous, so maybe not).
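For reference, this is the kind of default we mean: a hypothetical sketch of an async default (the `publisher` field, and how the lambda reaches the `EntityManager`, are assumptions here, not necessarily Joist’s exact API):

```ts
// Hypothetical: an async default that looks up (or creates) a related entity;
// today this would only run during em.flush, not during newAuthor/newTestInstance
authorConfig.setDefault("publisher", {}, async (a) => {
  // assumes the entity exposes its EntityManager as `a.em`
  return a.em.findOrCreate(Publisher, { name: "Default Publisher" }, {});
});
```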
Wrapping up, besides a “walk down memory lane”, the larger point of this post is highlighting Joist’s journey of continually grinding on DX polish—we’re about five years into Joel’s Good Software Takes 10 Years, so only another 5 to go! :smile:
Of course, it’d be great for this evolution to happen more quickly—i.e. if we had a dependency-aware, factory-aware, amazing `setDefault` API from day one.
But oftentimes jumping to an abstraction can be premature, and result in a rushed design—so sometimes it doesn’t hurt to “sit with the itch” for a little while, and evolve it through multiple iterations of “good enough”, until finally a pleasant/robust solution emerges.
And, perhaps most pragmatically, small iterations help spread the implementation out over enough hack days that it can actually get shipped. :ship:
Joist’s development is currently very incremental, and doesn’t have “big release” milestones & release notes, but we recently released a notable new feature: recursive relations. Check them out! :tada:
I’ve been working on the Joist docs lately, specifically a Why Joist? page, which ended up focusing more on “why Domain Models?” than a feature-by-feature description of Joist.
Which is fine, but a good friend (and early Joist user) proofread it, and afterward challenged me that I was being too humble, and I should be more assertive about Joist being “THE BEST ORM FOR TYPESCRIPT AND POSTGRES” (his words), as he listed off his own personal highlights.
All of these are true.
But in thinking about his challenge, of pitching Joist specifically as “the best ORM for TypeScript & Postgres”, I actually think I can be even more bullish and assert Joist is, currently, the best ORM, in any language, ever, TypeScript or otherwise.
Which is crazy, right? How could I possibly assert this?
I have three reasons; admittedly the first two are not technically unique to Joist, but both are foundational to its design and implementation, and the third is one of Joist’s “special sauces”:
I’ve used many ORMs over the years, going back to Java’s Hibernate, Ruby’s ActiveRecord, and a few bespoke ones in between.
Invariably, they all suffer from N+1s.
I don’t want to repeat Joist’s existing Avoiding N+1s docs, but basically “entities are objects with fields/methods that incrementally lazy-load their relations from the database” is almost “too ergonomic”, and tempts programmers into using the abstraction when they shouldn’t (i.e. in a loop), at which point N+1s are inevitable.
Again as described in “Avoiding N+1s”, JavaScript’s event loop forcing all I/O calls to “wait just a sec”, until the end of the event loop tick, gives Joist an amazing opportunity, of course via dataloader, to de-dupe all the N+1s into a single SQL call.
For everything.
This works so well, that personally I don’t know that I ever want to work in a programming language/tech stack that cannot use this trick (at least to build backend/line-of-business applications).
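As a quick sketch of what that means in practice (using the post’s `Author` / `Book` entities), naive per-entity code in a loop still collapses into one query per table:

```ts
// Each iteration calls a.books.load(), but dataloader batches the N calls issued
// within the same event loop tick into a single `SELECT ... WHERE author_id IN (...)`
const authors = await em.find(Author, {});
await Promise.all(
  authors.map(async (a) => {
    const books = await a.books.load();
    console.log(a.firstName, books.length);
  }),
);
```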
Granted, JavaScript is not the only language with an event loop—async Rust is a thing, Python has asyncio, and even Vert.x on the JVM provides it (I prototyped “dataloader ported to Vert.x” several years ago), and either Rust or the JVM (Scala!) would be pretty tempting just in terms of “faster than JavaScript” performance.
But the event loop is only part of the story—another critical part is TypeScript’s type system.
Other TypeScript ORMs like Prisma & Drizzle “solve” N+1s by just not modeling your domain as entities (with lazy-loaded relations), and instead force/assume a single/large up-front query that returns an immutable tree of POJOs.
This does remove the most obvious N+1 footgun (lazy-loaded relations), but it also fundamentally restricts your ability to decompose business logic into smaller/reusable methods, because now any logic that touches the database must be done “in bulk” directly by your code, and often crafted in SQL specifically to how each individual endpoint is accessing the data.
(Concretely, if you had a `saveAuthor` endpoint with logic/queries to validate “this author is valid”, and then wrote a batch `saveAuthors` endpoint, you could not reuse the “written for one entity” logic without rewriting it to work at the new endpoint’s grouped/batch level of granularity. Or similarly for `saveBook` logic that you want to reuse within a `saveAuthor` that also upserts multiple child books.)
Instead, Joist’s auto-batching lets you ergonomically write code at the individual entity abstraction level (whether in a loop, or in per-entity validation rules or lifecycle hooks), but still get performant-by-default batched queries.
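By contrast, a sketch of how that reuse looks with Joist (the `validateAuthor` / `saveAuthors` names are hypothetical; the point is that the per-entity function stays per-entity and the batching happens underneath):

```ts
// Per-entity logic, written once
async function validateAuthor(a: Author): Promise<void> {
  const books = await a.books.load(); // auto-batched across all authors below
  if (books.length === 0) {
    throw new Error("An author must have at least one book");
  }
}

// A batch endpoint that reuses it; the N `a.books.load()` calls become one query
async function saveAuthors(em: EntityManager, ids: string[]): Promise<void> {
  const authors = await Promise.all(ids.map((id) => em.load(Author, id)));
  await Promise.all(authors.map(validateAuthor));
  await em.flush();
}
```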
After solving N+1s with the event loop, the next biggest ergonomic problem in traditional, entity-based ORMs is tracking (or basically not tracking) loaded-ness in the type system.
Because you can’t have your entire relational database in memory, domain models must incrementally load their data from the database, as your business logic’s codepaths decide which parts they need to read.
This was another downfall of the Hibernate/ActiveRecord ORMs: there was no notion of “is this relation loaded yet?”, and so any random relation access could trigger the surprise of an expensive database I/O call, as that relation was lazy-loaded from the database.
Joist solves this by statically typing all relations as “unloaded” by default, i.e. accessing an Author’s books requires calling `a1.books.load()`, which returns a `Promise` (which is also key to the N+1 prevention above).
Which is great, I/O calls are now obvious, but “doing an `await` for every relation access” would really suck (we tried that), so Joist goes further and uses TypeScript’s type system to not only track individual relation loaded-ness (like `author1.books` or `book2.authors`), but also mark entire subgraphs of entities as populated/loaded relations, and hence synchronously accessible:
```ts
// Load the Author plus the specific books + reviews subgraph
const a1 = await em.load(Author, "a:1", {
  populate: { books: { reviews: "comments" } },
});
```
```ts
// a1 is typed as Loaded<Author, { books: { reviews: "comments" } }>
// Tada, no more await Promise.all
a1.books.get.forEach((book) => {
  book.reviews.get.forEach((review) => {
    console.log(review.comments.get.length);
  });
});
```
This combination of:

- requiring explicit `.load()` / `await` calls for any I/O, but
- leveraging the type system to mark entire subgraphs as loaded and synchronously accessible

is, for me, also something that I never want to work without again. It’s just so nice.
And unlike the event loop, which JavaScript doesn’t have a monopoly on, I believe TypeScript effectively does have a lock on this mapped-type capability, from a programming language/type system perspective.
Creating “new types” in other programming languages is generally handled by macros (Scala and Rust), or I suppose Haskell’s higher-kinded types. But, as far as I know, none of them can combine TypeScript’s “mapped type + conditional type” features in a way that lets you take “my user-defined type (`Author`)” and “this user-defined populate hint type” and fuse them together into a new type that is “the author with this specific subgraph of fields marked as loaded”.
I’m happy to be corrected on this, but I think TypeScript is the only mainstream programming language that can really power Joist’s `Loaded<Author, { books: "reviews" }>`-style adhoc typing of subgraphs, or at least this easily.
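To make that concrete, here’s a small sketch of how these `Loaded` types get used in application code (the `countReviews` helper is hypothetical):

```ts
// The parameter type requires an Author whose books *and* their reviews are
// already loaded, so the `.get` accesses below are checked at compile time
function countReviews(a: Loaded<Author, { books: "reviews" }>): number {
  return a.books.get.reduce((sum, b) => sum + b.reviews.get.length, 0);
}

// Callers prove loaded-ness via a populate hint, e.g.:
const a1 = await em.load(Author, "a:1", { populate: { books: "reviews" } });
console.log(countReviews(a1));
```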
Other TypeScript ORMs (Prisma, Drizzle, Kysely, etc.) also leverage TypeScript’s mapped types to create dynamic shapes of data, which is legitimately great.
However, they all have the fundamental approach of issuing “one-shot” queries that return immutable trees of POJOs, directly mapped from your SQL tables, and not subgraphs of entities that can have non-SQL abstractions & be further incrementally loaded as/if needed (see Why Joist for more on this).
You can generally see that, for both issues covered so far (N+1s and statically-typed loaded-ness), most TypeScript ORMs have “solved” these issues by just removing the features altogether, and restricting themselves to being “sophisticated query builders”.
Joist’s innovation is keeping the entity-based, incremental-loading mental model that is historically very popular/idiomatic for ORMs (particularly Ruby’s ActiveRecord), and just fundamentally fixing it to not suck.
This 3rd section is the first feature that is unique to Joist itself: Joist’s “backend reactivity”.
Many ORMs have lifecycle hooks (this entity was created, updated, or deleted—which Joist does as well), to organize side effects/business logic of “when X changes, do Y”.
But just lifecycle hooks by themselves can become tangled, complicated, and a well-known morass of complexity and “spooky action at a distance”.
This is because they’re basically “Web 1.0” imperative spaghetti code, where you have to manually instrument each mutation that might trigger a side effect.
(Concretely, let’s say you have a rule that needs to look at both an author and its books. With raw lifecycle hooks, you must separately instrument both the “author update” and “book update” hooks to call your “make sure this author + books combination is still valid” logic. This can become tedious and error-prone, to get all the right hooks instrumented.)
Instead, Joist’s reactive fields and reactive validation rules take the lessons of “declarative reactivity” from the MobX/Solid/reactivity-aware frontend world, and bring them to the backend: reactive rules & fields declare in one place what their “upstream dependencies” are, and Joist just handles wiring up the necessary cross-entity reactivity.
This brings a level of ease, specificity, and rigor to what are still effectively lifecycle hooks under the hood, that really makes them pleasant to work with.
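As a sketch of what this looks like in practice (using Joist’s `config.addRule` with a reactive hint; treat the specific rule as illustrative):

```ts
// Declared once on Author, but automatically re-runs whenever a matching
// Book's title changes, or books are added/removed; no manual hook wiring
authorConfig.addRule({ books: "title" }, (a) => {
  const titles = a.books.get.map((b) => b.title);
  if (new Set(titles).size !== titles.length) {
    return "An author's books must have unique titles";
  }
});
```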
The declarative nature of Joist’s domain model-wide reactivity graph is also very amenable to DX tooling & documentation generation, but we’ve not yet deeply explored/delivered any functionality that leverages it.
So, these three features are what back up my exaggerated “best ORM ever” assertion.
If tomorrow, I suddenly could not use Joist, and had to find another ORM to use (or, in general, build any sort of application backend on top of a relational database), in any current/mainstream programming language, without a doubt I would want:

- auto-batched, N+1-safe data access (via the event loop & dataloader)
- statically-typed loaded-ness of entity subgraphs
- declarative, dependency-aware backend reactivity

And Joist is the only ORM that does all three of these: two of which are uniquely enabled by the JavaScript/TypeScript stack, and the third just part of Joist’s own innovation.
I usually don’t like making bold/absolutist claims, like “this or that framework is ‘the best’” or “technology x/y/z is terrible” or what not.
I did enough of that early in my career, and at this point I’m more interested in “what are the trade-offs?” and “what’s the best tool for this specific use case?”
So, I hold two somewhat incongruent thoughts in my head, as I am both:

- generally wary of absolutist “this is the best” claims, preferring to talk about trade-offs, and yet
- genuinely convinced that, for building backends on top of a relational database, Joist is the best ORM I’ve used
My guess is if you tried Joist, you would quickly come to like it, but it’s also perfectly fine if not!
Similar to the two incongruent thoughts above, I’ll close with another two semi-contradictory thoughts as a disclaimer:
If you have thoughts, questions, or feedback, please let us know! Feel free to join the Joist discord, or file issues on the GitHub repo if you try Joist and run into any issues.
Despite all the hubris in this post, we are still a very small project & community, and so have a lot of growth and improvement ahead of us.
Thanks for the read!
We’ve added a new NextJS + Joist sample app that shows how Joist can be used in a NextJS application, with several benefits.
This post gives a short overview; if you’d like to watch a video, we also have a YouTube video that walks through the sample app.
While building the sample app, we found two fundamental ways of structuring a NextJS app’s render tree:

- Mostly React Client Components (RCC): `table.tsx` is a server component that loads all data for the tree, while `author-rcc-card.tsx` and `book-rcc-preview.tsx` are client components that accept prop-drilled data
- Mostly React Server Components (RSC): `table.tsx` is a server component but only loads what it needs, while `author-rsc-card.tsx` and `book-rsc-preview.tsx` are RSCs and do their own data loading

The top-level `Table` / `table.tsx` component renders each of these side-by-side, so we can see the differences, and observe some pros/cons of each approach.
With mostly RSC components, it’s easy to decompose data loading away from the top-level component. For example, the `AuthorRscCard` can make its own data loading calls, and even if it’s rendered many times on the page, Joist will de-dupe across the N sibling `AuthorRscCard`s and batch into a single SQL call.
```ts
type AuthorCardProps = {
  /** RSCs can accept the domain model entities as a prop. */
  author: Author;
  addBook: (id: string) => Promise<void>;
};
```
```tsx
/** The RSC version of AuthorCard can load its own data. */
export async function AuthorRscCard({ author, addBook }: AuthorCardProps) {
  // This will be auto-batched if many cards render at once
  const books = await author.books.load();
  // Or if you wanted a tree of data, this will also be auto-batched
  const loaded = await author.populate({ books: { reviews: "ratings" } });
  return <div>...jsx</div>;
}
```
This is nice because it allows the `AuthorRscCard` to be more self-sufficient, and allows the parent table component to be unaware of its children’s loading details.
With mostly Client components, the opposite happens: only the parent can make database / `EntityManager` calls, and so it is responsible for loading all the data for its children, and passing it as JSON via props:
```ts
type AuthorCardProps = {
  /** RCCs must accept a POJO of `Author` + all nested data. */
  author: AuthorPayload;
  addBook: (id: string) => Promise<void>;
};
```
```tsx
/** The RCC version of AuthorCard accepts the `AuthorPayload`. */
export function AuthorRccCard({ author, addBook }: AuthorCardProps) {
  // can only use data already available on `author`
}
```
Even though the up-front data load can become awkward, it does give more opportunities for optimization; for example, Joist can use join-based preloading to load a single tree of `Author` + `Book` + `Review` entities in a single SQL call, which is an even better optimization than the “one query per layer” N+1 prevention of the RSC-based approach.
In either approach, Joist’s N+1 prevention auto-batches database calls, even if they are made across separate component renders.
I.e. in the RSC components:

- The `Table` component makes 1 SQL call for all `Author` entities.
- The `AuthorRscCard` cards each make their own `author.books.load()` (or `author.populate(...)`) call, but because they’re all rendered in the same event loop, Joist batches all the `load` calls into 1 SQL call
- Any deeper components (like the book previews) get their `load` calls batched as well.

In the React Client Component approach, this auto-batching is admittedly not as necessary, assuming a singular top-level component, like `Table`, loads all the data at once anyway (although, as mentioned later, Joist can optimize that as well).
See the Avoiding N+1s section of our docs for more information.
Since the client components cannot make their own async data calls, the top-level `Table` component is responsible for loading all the data into a JSON payload, and passing it down to the children as props. Joist entities have an easy way of doing this, via a `toJSON` method that takes the shape of data to create:
```ts
// Define the shape of data to create
export const authorHint = {
  id: true,
  firstName: true,
  books: {
    id: true,
    title: true,
    reviews: ["id", "rating"],
  },
  customField: (a) => a.id + a.title,
} satisfies JsonHint<Author>;
```
```ts
// This typedef can be used in the client-side props, or to match any
// endpoint-based response types like for REST/OpenAPI.
export type AuthorPayload = JsonPayload<Author, typeof authorHint>;
```
```ts
const payload = await a.toJSON(authorHint);
```
The `toJSON` implementation will:

- load any relations listed in the `authorHint` that aren’t already in memory
- invoke lambdas like `customField` to generate custom values

As with the previous examples, all data loading is N+1 safe, and also potentially join-based preloaded.
See the toJSON docs for more information.
This recursive `toJSON` payload generation is a relatively new feature of Joist, so if you have feature ideas that would make it more useful, please let us know!
The last optimization that Joist can do is join-based preloading, which can be used in either the RSC or RCC approach.
This is also a newer feature that requires opting in to, but in `em.ts` you can add a `preloadPlugin`:
```ts
/** Returns this request's `EntityManager` instance. */
export const getEm = cache(() => {
  // Opt-in to preloading
  const preloadPlugin = new JsonAggregatePreloader();
  return new EntityManager({}, { driver, preloadPlugin });
});
```
This will allow Joist to load a deep tree/subgraph of entities in a single SQL call.
For example, normally a Joist `em.find` call like:
```ts
const [a] = await em.find(
  Author,
  { id: 1 },
  { populate: { books: "reviews" } },
);
// Now access all the data in memory
console.log(a.books.get[0].reviews.get[0].rating);
```
Will issue three SQL calls:
```sql
SELECT * FROM authors WHERE id = 1;
SELECT * FROM books WHERE author_id = 1;
SELECT * FROM reviews WHERE book_id IN (1, 2, 3, ...);
```
But with the `preloadPlugin` enabled, it will use a single SQL call that uses `CROSS JOIN LATERAL` and `json_agg` to return the author’s books, and the books’ reviews (omitted for brevity), in a single row:
```sql
select a.id, _b._ as _b
from authors as a
cross join lateral (
  -- create a tuple for each book, and aggregate them into an array of books
  select json_agg(json_build_array(_b.id, _b.title, _b.foreword, _b.author_id) order by _b.id) as _
  from books _b
  where _b.author_id = a.id
) _b
where a.id = ?
limit ?
```
Joist’s join-based preloading is still a beta feature, so if you run into any issues, please let us know!
So far, our queries have focused on loading “just entities”, and then putting those on the wire (or rendering them to HTML).
This is because Joist’s focus is on building robust domain models, and specifically helping solve the “write-side” of your application’s business logic (running the correct validation rules, lifecycle hooks, and reactive updates), and less so the “read-side” of complex queries (i.e. queries using aggregates with `GROUP BY`, multiple nested subqueries/projections, etc.).
As such, Joist does not yet have a sophisticated query builder that can create arbitrary SQL queries, like Kysely or Drizzle.
Instead, Joist encourages an approach that uses its robust write-side features to create materialized columns in the database, such that the majority of your pages/responses really can be served by “super simple `SELECT` statements”, instead of using complicated queries to calculate aggregates on-the-fly.
Although you can of course use both approaches, and just use a lower-level query builder where needed.
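For example, a sketch of what such a materialized/derived column can look like, using Joist’s `ReactiveField` (the `numberOfBooks` field here is illustrative):

```ts
// A derived column that Joist recalculates (and persists) whenever the author's
// books change, so the read side is just `SELECT number_of_books FROM authors`
export class Author extends AuthorCodegen {
  readonly numberOfBooks: ReactiveField<Author, number> = hasReactiveField(
    "numberOfBooks",
    "books",
    (a) => a.books.get.length,
  );
}
```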
Joist’s roots come from the GraphQL world, so this sample app was our first foray into using it for a NextJS application. If we’ve missed any key features that would make it easier to use Joist in a NextJS app, please let us know!