Simplest Possible Closure Tables
Joist is a TypeScript ORM that lets you declaratively implement your business logic, which we explore in this post by examining this snippet of code:
class Employee { /** Create an employee -> all transitive managers closure/m2m table. */ readonly managersClosure: ReactiveManyToMany<Employee, Employee> = hasReactiveManyToMany( "managersRecursive", (e) => [e, ...e.managersRecursive.get], );}Which we assert is the simplest possible implementation of closure tables. 💥
It achieves this by combining two of Joist’s primitives:
- Recursive collections (the
managersRecursiverelation), and - Reactive infrastructure like
hasReactiveManyToMany
We’ll look at each of these, and also explain closure tables while we’re at it.
Closure Tables
Section titled “Closure Tables”Closure tables are a niche but well known way to store hierarchical data (trees, or arbitrarily nested parent/child data) in a relational database that makes it easy to query “up or down multiple levels of the tree” with a single regular/fast join.
For example, if we have an employees table with a manager_id column, some employees might have “just one manager” (they report directly to the CEO), while other employees might have “their manager, their manager’s manager, their manager’s manager’s manager”, etc. up to the CEO.
Historically, given the “static” nature of SQL queries (i.e. no for loops), queries for “all the transitive managers for this employee” were difficult/impossible to query using just the manager_id column by itself.
Closure tables are a pattern of storing data that make these “find all transitive managers” queries easy.
They do this by, at write-time when manager_id is changed, duplicating each hop from an employee to every single transitive manager as derived data—they don’t replace the manager_id column, they’re calculated from it.
For an example like Fred reports to Bob reports to Jill, the closure table approach stores:
fred.managersClosure=[fred, bob, jill]bob.managersClosure=[bob, jill]jill.managersClosure=[jill]
I’m purposefully representing these as “each employee has a collection of ‘all their managers’” because I think that’s the easiest to mental model to reason about.
In contrast, most closure table articles jump to “what does the managers_closure table look like in the database”, which is something like:
Which is correct, but for me was hard to initially reason about.
Instead, I think it’s most intuitive to articulate the “extra table” (the “closure table”) we’re adding as “each employee has a m2m between themselves and all their managers”—because that’s really what it is. 😅
Now with this managers_closure table we can “find all managers” with a regular/boring SQL query with a single INNER JOIN:
SELECT e.*FROM managers_closure mcJOIN employees e ON e.id = mc.ancestor_idWHERE mc.descendant_id = :employee_idAND mc.ancestor_id <> mc.descendant_id;When we execute this query with :employee_id = fred, it returns two rows:
id=2 descendant_id=fred ancestor_id=bobid=3 descendant_id=fred ancestor_id=jill
And successfully returns “all of Fred’s managers” in a simple, fast query, regardless of “how many levels up” we had to go.
…but Recursive CTEs!
Section titled “…but Recursive CTEs!”Most readers will probably have pointed out that “PostgreSQL solved this” by adding WITH RECURSIVE CTEs.
Which is true!
In modern PostgreSQL, we can often skip the managers_closure table with the following updated query, where the WITH RECURSIVE syntax basically acts as “the missing for loop” for us to walk up a variable number of levels, using only the manager_id column:
WITH RECURSIVE managers AS ( -- anchor: the employee's direct manager SELECT manager_id FROM employees WHERE id = :employee_id
UNION ALL
-- recursive: step up to each manager's manager SELECT e.manager_id FROM employees e JOIN managers m ON e.id = m.manager_id)SELECT manager_idFROM managersWHERE manager_id IS NOT NULL;I’ll defer to other blog posts to explain the syntax, but this is really great!
Joist already has first-class support for RECURSIVE CTEs, in that anytime joist-codegen sees a “self-referential foreign-key” (i.e. the manager_id column that points back at its same employees table), we automatically create a ...Recursive relation that is powered by a RECURSIVE CTE under the hood, i.e.:
const fred = await em.load(Employee, "e:1");// the regular m2o relation returns bobconsole.log(await fred.manager.load());// the recursive relation returns [bob, jill]console.log(await fred.managersRecursive.load());Where the managersRecursive.load() method will issue a SQL statement exactly like the WITH RECURSIVE CTE above, and get us all their transitive managers.
And in stereotypical “never N+1” fashion, Joist will auto-batch the SQL so that we could load 100 or 1000 employees’ worth of managesRecursive and it would still be a single SQL query. 🎉
Joist’s built-in support puts the power of RECURSIVE CTEs at your fingertips—just a load / populate call away. 😀
Why Still Use Closure Tables?
Section titled “Why Still Use Closure Tables?”So, with this great PostgreSQL feature, are closure tables still applicable?
For us, the answer is usually no—unless there is a very high-performance query to optimize. 🚀
For example, internally we’re prototyping an RBAC-based auth system where permission grants “inherit” down a stack of “permission buckets”—like if Jill is the CEO, she can read the salary data of Bob and Fred, Bob can read the salary data of Fred, etc.
In this setup, we’ll issue “what are your transitive auth permissions?” queries basically all the time, and need these queries to be extremely fast, i.e. easy for the Postgres query planner to optimize & execute, which it is still better at doing for the “dumb/single join” of closure table queries, over recursive CTE queries.
So, for this use case, we think it’s worth the write-time cost of generating an old-school permission_bucket_closures table, particularly if Joist can help us implement it as easily as possible.
Closure Table Maintenance
Section titled “Closure Table Maintenance”We skipped over one of the biggest cons of closure tables—keeping them up to date.
If we have our same chain of Fred is managed by Bob is managed Jill (that resulted in the 6 managers_closure rows in the image above), but now we add Jan as a layer of management between Bob and Jill, we have a lot of bookkeeping to update.
Specifically we need to add four rows:
Which can be intuitively explained as:
- All managers above Jan need Jan as a new report
- All employees below Jan need Jan as a new boss,
This doesn’t seem too bad—but what if the hierarchy had more than just a single bob.manager = jan mutation? What if several relationships changed at once?
It starts to get complicated to make it “just work”—until we lean into Joist’s reactivity.
Joist Reactivity for the Win
Section titled “Joist Reactivity for the Win”Let’s look at the two updates we need for “Jan is a new layer of management”:
- All managers above Jan need Jan as a new report
- All employees below Jan need Jan as a new boss,
The first important insight is to lean into our mental model of “closure tables are just fancy m2m collections”, such that:
- All managers above Jan need Jan as a new report —>
- This is really saying set
jan.managersClosure=[jan, jill] - I.e. calcing the m2m for Jan herself will “update the managers above her” because it is “Jan’s m2m collection” that fundamentally controls “who are all of her transitive managers”?
- This is really saying set
- All employees below Jan need Jan as a new boss —>
- This is really saying recalc
[fred, bob].managersClosure=[..., jan] - I.e. re-calcing the m2m for “Jan’s transitive reports” to let them see “now Jan is also a manager of me”
- This is really saying recalc
And the second insight is that “recalculating the m2m all of Jan’s downstream reports” is exactly what Joist’s reactivity does when it sees a employee.manager relationship change—it knows bob.manager changed, hence bob.managersRecursive changed, hence anyone “watching bob.managersRecursive” which requires “looking down instead of up” needs to be told about the change.
All of this is achieved by our managersClosure declaration:
readonly managersClosure: ReactiveManyToMany<Employee, Employee> = hasReactiveManyToMany( "managersRecursive", (e) => [e, ...e.managersRecursive.get], );When the jan.manager = jill and bob.manager = jan graph mutation happens, Joist does roughly:
- Setting
jan.manager = jilltriggers reactivity on “who is watching this write?”- We see
managersClosureis watchingjan.managersRecursive, and “reverse the hint” to find employee’s below Jan that need to know about this change - This queues
[jan, bob, fred]to have theirmanagersClosurerecalc-d
- We see
- Setting
bob.manager = janalso triggers “who is watching this write?”- We again see
managersClosureis watchingbob.managersRecursive, and “reverse the hint” to find employee’s below Bob - This queues
[bob, fred]to have theirmanagersClosurerecalc-d- …technically this is a noop b/c they’re already queued
- We again see
- We recalc the
[jan, bob, fred].managersClosurerelations all at once- Joist resolves the
managersRecursivereactive hint for all three entities- If any relations are already loaded, we use the in-memory results
- If any relations are not loaded, we issue a single batched CTE query for their data
- When the batched SQL resolves, the
managersClosurelambda is invoked for each of[jan, bob, fred] - This sets
jan.managersClosure = [jan, jill] - And sets
bob.managersClosure = [bob, jan, jill] - And sets
fred.managersClosure = [fred, bob, jan, jill]
- Joist resolves the
- We call
em.flush()- The three mutated
managerClosuresm2m collections are diffed - We issue a single
INSERT INTO managers_closurewith the new rows
- The three mutated
This is admittedly a lot 😵, but thankfully it’s all driven by Joist’s internal infra keeping the hasReactiveManyToMany m2m up-to-date, by respecting the managersRecursive reactive hint we declared in the code.
And the key outcome is that the closure table was completely updated for all affected employees, and we only issued 3 SQL calls, 2 reads and 1 write:
- A single “down the tree”
SELECTto find the employees to recalc - A single “up the tree”
SELECTto load the data for each lambda - A single
INSERTthe new m2m rows.
And because each of these operations is fundamentally batched, it will be the same 3 SQL calls if we’re changing 1 employee/manager relation or 100 employee/manager relationships. 🤯
A pretty amazing result for a 1-liner—can you imagine trying to achieve this with an adhoc, hand-coded implementation? It’s not something I would readily sign up for. 😬
Closing Thoughts
Section titled “Closing Thoughts”That is our deep-dive of how two Joist primitives, recursive CTE support & reactive m2m relations, can combine to drive a really powerful result—closure tables implemented as a 1-liner.
The neatest aspect, to us, is that when we started our internal RBAC auth system, we did not need to “build closure tables as a new feature of Joist”—instead the solution emerged elegantly as just a combination of our existing features.
To be clear, we’re not expecting readers to rush out and implement closure tables in their application—it is still a niche pattern, and, as we’ve seen, Postgres’s recursive CTEs provide a great modern alternative (which Joist’s recursive relations also put at your fingertips).
Instead our purpose is illustrating the power of Joist’s “declarative business logic” and “reactive calculations”, and encouraging readers to think about how these features could make their lives easier in their own work.