Hacker News

MSc Thesis – The Limits of Generalized Sync

19 points by bebraw ago | 9 comments
One of the main challenges in web development is maintaining state across the client and the server, and most web applications have to solve it somehow. For this reason, so-called sync engines have emerged, since they can take over a large part of data synchronization.

In his MSc thesis, my student Mikael Siidorow looked into the space to find out the limits of generalized sync. He combined several methods: a literature review, interviews, and a case study. In the end he came up with a taxonomy showing where generalized sync breaks down and what you have to keep in mind when implementing these solutions.

This is not to say sync engines are useless, but there are clear tradeoffs to consider when introducing them to your codebase, especially if you have to deal with an offline requirement.

andersmurphy |next [-]

> Generalization breaks down for offline-capable applications. Offline writes require conflict resolution, create authorization edge cases, and demand coordinated schema management across server and client replicas.

> ...These constraints are structural; engineering effort cannot remove them...

> The trade-off analysis shows that three sync engine vendors converged independently on this conclusion from different starting positions.

This is the big irony: the vendors all converged on the fact that sync engines only really "work" when you remove the offline part. But at that point they are a complicated/over engineered cache or, worse, they introduce hard distributed computer science problems unnecessarily.
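
To illustrate the quoted claim that offline writes require conflict resolution, here is a minimal TypeScript sketch. It assumes a field-level last-writer-wins policy, which is only one of several possible strategies and is not taken from the thesis; `Task`, `OfflineWrite`, and `mergeLastWriterWins` are hypothetical names.

```typescript
// Hypothetical types: a record and an edit made while a client was offline.
type Task = { title: string; assignee: string };
type OfflineWrite = { clientId: string; timestamp: number; patch: Partial<Task> };

// Field-level last-writer-wins (an assumed policy): replay the writes in
// timestamp order so that, per field, the latest write overwrites earlier ones.
function mergeLastWriterWins(base: Task, writes: OfflineWrite[]): Task {
  const ordered = [...writes].sort((a, b) => a.timestamp - b.timestamp);
  let result: Task = { ...base };
  for (const w of ordered) {
    result = { ...result, ...w.patch };
  }
  return result;
}

const base: Task = { title: "Draft", assignee: "alice" };

// Two devices edited the same record while offline and reconnect later.
const merged = mergeLastWriterWins(base, [
  { clientId: "laptop", timestamp: 200, patch: { title: "Final" } },
  { clientId: "phone", timestamp: 100, patch: { title: "Draft v2", assignee: "bob" } },
]);

// The phone's title edit is silently dropped because the laptop wrote later;
// its assignee edit survives because no other write touched that field.
console.log(merged);
```

Whether silently dropping the phone's title edit is acceptable depends on the application, which is one way to read the point that a generic engine cannot make this decision for you.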

aboodman |root |parent [-]

> sync engines only really "work" when you remove the offline part

I don't see the gotcha here. I don't care about the offline part. I mean I accept that some people do, but that's not where the value comes from for many major synced products like Linear, Notion, Superhuman.

> complicated/over engineered cache

Sync engines are nothing at all like a cache - that's the point.

They are a replica. Caches are by nature inconsistent. Every entry in the cache is from a different point in time.

Sync engines replicate a consistent subset of your database to the client as an atomic unit. This enables things caches can't do:

  * New, fresh queries can be returned instantly from local state
  * Mutations can apply locally (optimistically) automatically, without custom code
  * The UI updates automatically to reflect server changes

Whether these things are necessary depends on your application. But basically all productivity applications of any complexity keep re-implementing sync engines by hand, so I guess most apps do in fact find them necessary.
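
The three properties in the list above can be sketched roughly as follows. This is a toy in-memory replica under stated assumptions, not the API of any real sync engine: `LocalReplica`, `PendingMutation`, `applyServerSnapshot`, and the idea that the server reports the id of the last mutation it has applied are all hypothetical.

```typescript
type Issue = { id: string; title: string; done: boolean };
type PendingMutation = { id: number; issueId: string; patch: Partial<Issue> };
type ChangeListener = (issues: Issue[]) => void;

class LocalReplica {
  private confirmed = new Map<string, Issue>(); // last consistent server snapshot
  private pending: PendingMutation[] = [];      // optimistic writes not yet acknowledged
  private listeners: ChangeListener[] = [];
  private nextId = 1;

  // Any query, including one never run before, is answered from local state.
  query(filter: (i: Issue) => boolean = () => true): Issue[] {
    const view = new Map<string, Issue>();
    this.confirmed.forEach((issue, id) => view.set(id, { ...issue }));
    // Layer pending mutations on top of the confirmed snapshot.
    for (const m of this.pending) {
      const target = view.get(m.issueId);
      if (target) view.set(m.issueId, { ...target, ...m.patch });
    }
    return Array.from(view.values()).filter(filter);
  }

  // Apply a mutation locally right away and keep it queued for the server.
  mutate(issueId: string, patch: Partial<Issue>): PendingMutation {
    const m: PendingMutation = { id: this.nextId++, issueId, patch };
    this.pending.push(m);
    this.notify();
    return m;
  }

  // Assumption: the server sends a whole consistent snapshot plus the id of
  // the last mutation from this client that the snapshot already includes.
  applyServerSnapshot(issues: Issue[], lastAppliedMutationId: number): void {
    const next = new Map<string, Issue>();
    for (const i of issues) next.set(i.id, i);
    this.confirmed = next;
    this.pending = this.pending.filter((m) => m.id > lastAppliedMutationId);
    this.notify();
  }

  // UI code subscribes once and is called on both local and server changes.
  subscribe(listener: ChangeListener): void {
    this.listeners.push(listener);
  }

  private notify(): void {
    const snapshot = this.query();
    for (const l of this.listeners) l(snapshot);
  }
}

const replica = new LocalReplica();
let renders = 0;
replica.subscribe(() => { renders++; });

replica.applyServerSnapshot([{ id: "a", title: "Write thesis", done: false }], 0);
replica.mutate("a", { done: true });            // visible locally before any server reply
const openIssues = replica.query((i) => !i.done); // fresh query, no network round trip
```

Unlike a cache with per-entry fetch times, the confirmed state here is swapped in as one unit, so every query sees the same point in time plus the client's own pending writes.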

Whether sync engines generalize is an open question. Is it hard? Yes. Is it a distributed systems problem? Yes! Is it worth doing? I think it is. Web applications often suck and sync engines are an important part of many of the ones that don't. I want to enable that experience for more apps without teams having to build it themselves.

drnick1 |next |previous [-]

Since when are master's theses published on HN? Not even Ph.D. work at a top school typically qualifies because it is too narrow to be of general interest.

WorkerBee28474 |root |parent |next [-]

Well, I have seen "A Symbolic Analysis of Relay and Switching Circuits" here quite a few times, but I suspect this post will get far less engagement.

bebraw |root |parent |previous [-]

I brought this up because of the exceptional quality of the work. While limited to the web, I believe the findings are interesting enough to be shared with a broader public.

Overall that's a good point, though, and for this reason I do this with only a select few of my students. :)

andersmurphy |next |previous [-]

Yeah, it's an intellectually intoxicating idea but incredibly hard to get right.

For me the problem is that in practice it only fits really well with quite a specific subset of problems, but we desperately want it to be a general solution that can apply to all the things (or at least it's often marketed that way).

BobbyTables2 |next |previous [-]

Given that the URL returns a “Forbidden” response, I’d say there are some other limits he didn’t consider…

bebraw |root |parent |previous [-]

Let me see if we can get this on ResearchGate or some other platform. Maybe Aalto servers didn't like us.
