Hacker News

Show HN: SQL-tap – Real-time SQL traffic viewer for PostgreSQL and MySQL

217 points by mickamy | 40 comments
sql-tap is a transparent proxy that captures SQL queries by parsing the PostgreSQL/MySQL wire protocol and displays them in a terminal UI. You can run EXPLAIN on any captured query. No application code changes needed — just change the port.
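
A hypothetical before/after sketch of the "just change the port" claim, assuming the proxy listens on 5433 in front of a Postgres on 5432 (the ports, user, and table names here are illustrative, not sql-tap's documented defaults):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // any Postgres driver works; the proxy speaks the wire protocol
)

func main() {
	// Before: port=5432, straight to Postgres.
	// After: port=5433, through the proxy, which forwards to the real
	// server and captures every query along the way.
	dsn := "host=localhost port=5433 user=app dbname=shop sslmode=disable"
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var n int
	if err := db.QueryRow("SELECT count(*) FROM orders").Scan(&n); err != nil {
		log.Fatal(err)
	}
	fmt.Println("orders:", n) // this query now shows up in the terminal UI
}
```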

qxxx |next [-]

Just tried it out. Works fine. Love it! I tried it with a WordPress site and it's showing hundreds of SQL queries in one request (that's probably why that WordPress site is so slow lol)

What I would love to see here is:

- some kind of sorting, e.g. by execution time or arrival order, so I can see the slowest queries.

- a search/filter feature.

- faster scrolling with PgUp/PgDn keys.

- maybe a count of how often the same query was executed, so I could check the code and optimize those queries (rough sketch of what I mean below).
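
A minimal sketch of that last idea: normalizing literals out of each captured query so duplicates collapse into one counted fingerprint. The regexes are illustrative; real fingerprinting (pg_stat_statements style) works on the parse tree, not on text:

```go
package main

import (
	"fmt"
	"regexp"
)

// Crude literal stripping: numbers and quoted strings become "?",
// so queries that differ only in parameters share a fingerprint.
var (
	numLit = regexp.MustCompile(`\b\d+\b`)
	strLit = regexp.MustCompile(`'[^']*'`)
)

func fingerprint(q string) string {
	return numLit.ReplaceAllString(strLit.ReplaceAllString(q, "?"), "?")
}

func main() {
	counts := map[string]int{}
	for _, q := range []string{
		"SELECT * FROM wp_posts WHERE ID = 17",
		"SELECT * FROM wp_posts WHERE ID = 42",
		"SELECT option_value FROM wp_options WHERE option_name = 'siteurl'",
	} {
		counts[fingerprint(q)]++
	}
	for f, n := range counts {
		fmt.Printf("%4d  %s\n", n, f)
	}
}
```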

buremba |next |previous [-]

This is very neat! IMO, inspecting the queries the agents run against the database is a better way to understand how the code works than reviewing the code itself.

I just tried it and it works smoothly. For those who don't want to plug their agents into their database directly, I built a similar tool, https://dbfor.dev, for exactly this purpose: it embeds PGlite and implements the PG wire protocol to spin up quick PG databases with a traffic viewer included.

ahachete |next |previous [-]

IMO transparent proxies for observability are not the best pattern. And I speak from experience: we developed the Postgres plugin for Envoy [1], [2] and we use it in StackGres [3], among others, for this very same reason, observability.

There are two main problems with such proxies:

* Latency. Yes, yes, yes, they add "microseconds" versus the "milliseconds" queries take, and that's true, but it's only part of the story. There's an extra hop. There are two extra sets of TCP layers being traversed. If the hop is local (say a sidecar, as we do in StackGres), it adds deployment and management complexity (something we solved with automation, but it was an extra problem to solve) and consumes resources. If it's a network hop, it adds milliseconds, not microseconds.

* Performance. It's not that hard to write a functioning PG wire proxy (it's not trivial either). But it is extremely hard to make one perform well under high load. Most of the proxies I have seen break down under moderate to high load.

What's the solution, then? The Postgres extension model to capture the metrics (we also experimented with eBPF, but it causes too many kernel/user-space context switches when you can do the same in an extension without them), and a small sidecar to push the metrics out via a standardized protocol like OTEL.
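
For context, a bare-bones sketch of that sidecar idea: poll pg_stat_statements and print what a real exporter would push over OTLP. The DSN is made up, the column names are from Postgres 13+, and the extension has to be in shared_preload_libraries:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres",
		"host=localhost user=monitor dbname=app sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	for {
		rows, err := db.Query(`
			SELECT query, calls, mean_exec_time
			FROM pg_stat_statements
			ORDER BY mean_exec_time DESC
			LIMIT 10`)
		if err != nil {
			log.Fatal(err)
		}
		for rows.Next() {
			var q string
			var calls int64
			var mean float64
			if err := rows.Scan(&q, &calls, &mean); err != nil {
				log.Fatal(err)
			}
			// A real sidecar would convert these rows into OTEL
			// metrics here instead of printing them.
			fmt.Printf("%8.2f ms  x%-6d  %.60s\n", mean, calls, q)
		}
		rows.Close()
		time.Sleep(15 * time.Second)
	}
}
```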

[1]: https://www.envoyproxy.io/docs/envoy/latest/configuration/li...

[2]: https://www.cncf.io/blog/2020/08/13/envoy-1-15-introduces-a-...

[3]: https://stackgres.io

Edit: formatting

Quarrel |root |parent |next [-]

I get what you're saying about a proxy like this: latency and performance would suffer, however minor the hit, and in production DB land this really matters.

I'm just not sure it's much of a slight on such proxies.

You don't need to always run this inline in production to get amazingly useful results. Yes, there are lots of production insight solutions out there, but lots of modern stacks are complex enough that just getting a quick handle on how the page you're debugging talks to your DBs can be incredibly useful, which is where I love the idea of a solution like this.

Sure, it is mytop / pgtop, but offered at a different layer and with a modern interface. Seems useful to me.

tudorg |root |parent |previous [-]

> The Postgres extension model to capture the metrics (we also experimented with eBPF, but it causes too many kernel/user-space context switches when you can do the same in an extension without them), and a small sidecar to push the metrics out via a standardized protocol like OTEL.

The extension model is great, but it doesn't work with existing Postgres providers (RDS, Aurora, etc.), unless one such extension becomes standard enough that all providers support it. That would be ideal, IMO.

To be clear, I don't mean pg_stat_statements (that is standard enough), but an extension that pushes the actual queries in real time.

> If it's a network hop, it adds milliseconds, not microseconds.

Are you talking about connection establishment time or per-query delay? I think it should normally be under a millisecond for the latter.

ahachete |root |parent [-]

> The extension model is great, but it doesn't work with existing Postgres providers (RDS, Aurora, etc.), unless one such extension becomes standard enough that all providers support it. That would be ideal, IMO.

That's true, but that's a problem for the PGaaS providers to fix by providing the best functionality available. I'm planning on following this route on a pure OSS basis.

> Are you talking about connection establishment time or per-query delay? I think it should normally be under a millisecond for the latter.

Network round trip. If the proxy is not co-located with the database but a network hop away, that usually adds at least 1 ms, possibly more.

lasgawe |next |previous [-]

I run two production RDS instances by myself. I never thought I really needed something like this. What I usually did was just check the details on the AWS dashboard. Thanks for building this. I love it.

DavidKarlas |next |previous [-]

Could I put this into the default docker-compose for developers, so that when they work on a project with microservices they can quickly inspect SQL queries if something weird is happening? How would this UI work in that scenario? It feels like a React frontend would serve that purpose better.

nullsanity |root |parent [-]

I was with you until you said React. Just export to existing metrics software like Prometheus, or do anything other than use an entire JavaScript framework for a simple UI. I swear, JS-brain is as terminal as microservice brain and cloud brain.

camel_gopher |next |previous [-]

Why do you need a proxy? Pull the queries off the network. You’re adding latency to every query!

https://github.com/circonus-labs/wirelatency

Sentinel-gate |root |parent |next [-]

The proxy vs packet capture debate is a bit of a non-debate in practice — the moment TLS is on (and it should always be on), packet capture sees nothing useful. eBPF is interesting for observability but it works at the network/syscall level — doing actual SQL-level inspection or blocking through eBPF would mean reassembling TCP streams and parsing the Postgres wire protocol in kernel space, which is not really practical.

I've been building a Postgres wire protocol proxy in Go, and the latency concern is the thing people always bring up first, but it's the wrong thing to worry about. A proxy adds microseconds; your queries take milliseconds. Nobody will ever notice.

The actual hard part, the thing that will eat weeks of your life, is implementing the wire protocol correctly. Everyone starts with simple query messages and thinks they're 80% done. Then you hit the extended query protocol (Parse/Bind/Execute), prepared statements, COPY, notifications, and you realize the simple path was maybe 20% of what Postgres actually does.

Once you get through that, though, monitoring becomes almost a side effect. You're already parsing every query, so you can filter queries, enforce policies, do tenant-level isolation, rotate credentials: things that are fundamentally impossible with any passive approach.
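
For the curious, the framing itself is the easy part. After the startup message, every frontend message is a one-byte type, a four-byte big-endian length (which counts itself but not the type byte), then the payload. Here's a stripped-down reader that pulls the SQL out of simple ('Q') and Parse ('P') messages and ignores everything listed above as the hard 80% (a sketch; it assumes well-formed input):

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"log"
	"os"
)

// readMessage returns the type byte and payload of one frontend message.
func readMessage(r *bufio.Reader) (byte, []byte, error) {
	typ, err := r.ReadByte()
	if err != nil {
		return 0, nil, err
	}
	var length uint32
	if err := binary.Read(r, binary.BigEndian, &length); err != nil {
		return 0, nil, err
	}
	payload := make([]byte, length-4) // length includes its own 4 bytes
	_, err = io.ReadFull(r, payload)
	return typ, payload, err
}

// cstring splits a NUL-terminated string off the front of b.
func cstring(b []byte) (string, []byte) {
	i := bytes.IndexByte(b, 0)
	return string(b[:i]), b[i+1:]
}

func inspect(r *bufio.Reader) error {
	for {
		typ, payload, err := readMessage(r)
		if err != nil {
			return err
		}
		switch typ {
		case 'Q': // simple query: the payload is one SQL string
			q, _ := cstring(payload)
			fmt.Println("simple:", q)
		case 'P': // Parse: statement name first, then the SQL string
			_, rest := cstring(payload)
			q, _ := cstring(rest)
			fmt.Println("extended:", q)
			// Bind ('B'), Execute ('E'), Describe ('D'), COPY and
			// friends are where the remaining weeks go.
		}
	}
}

func main() {
	// Feed it a captured client-side byte stream. In a real proxy the
	// reader wraps the client net.Conn, after the untyped startup
	// message (and any TLS negotiation) has been handled.
	if err := inspect(bufio.NewReader(os.Stdin)); err != nil && err != io.EOF {
		log.Fatal(err)
	}
}
```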

PunchyHamster |root |parent |next [-]

You can decode TLS traffic with a little bit of effort, though you have to control the endpoints, which makes it a bit moot: if you control them, you can just... enable query logging.

Sentinel-gate |root |parent [-]

True, but logging tells you what happened; a proxy lets you decide what's allowed to happen before it hits the database. Policy enforcement, tenant isolation, that kind of thing. They're complementary, really.
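
In code terms the difference is roughly this: a log line is an after-the-fact record, while a proxy can run a predicate before forwarding anything. The rules below are toy examples, not a real policy engine:

```go
package main

import (
	"fmt"
	"strings"
)

// allow is the kind of hook a proxy can run before a query is forwarded;
// a log file has no equivalent decision point.
func allow(tenant, query string) bool {
	q := strings.ToUpper(strings.TrimSpace(query))
	if tenant == "readonly" && !strings.HasPrefix(q, "SELECT") {
		return false // read-only tenants never get past the proxy
	}
	return !strings.Contains(q, "DROP TABLE")
}

func main() {
	fmt.Println(allow("readonly", "DELETE FROM users")) // false: blocked before the DB sees it
	fmt.Println(allow("app", "SELECT * FROM users"))    // true
}
```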

singpolyma3 |root |parent |next |previous [-]

TLS for your database? Are you connecting outside of the local machine or VPN?

Sentinel-gate |root |parent [-]

Yeah, more and more. Zero-trust is pushing TLS everywhere, even inside VPNs — lateral movement is a real thing. And several compliance frameworks now expect encryption in transit regardless of network topology. With connection pooling the overhead is basically zero anyway.

freedomben |root |parent [-]

Indeed, if you're running the db in production and aren't using TLS, you're doing it wrong nowadays. Nearly every compliance framework will require it, and it's a very good idea anyway even if you don't care about compliance.
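
For Postgres clients that's often just two connection parameters. A hypothetical lib/pq DSN (the host and cert path are made up) that refuses plaintext and verifies the server's identity:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// sslmode=verify-full encrypts the connection and checks the server
	// certificate against the hostname; sslrootcert is the CA bundle.
	dsn := "host=db.internal.example port=5432 user=app dbname=app " +
		"sslmode=verify-full sslrootcert=/etc/ssl/certs/db-ca.pem"
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		log.Fatal(err)
	}
	if err := db.Ping(); err != nil {
		log.Fatal(err) // fails if the server can't prove its identity
	}
}
```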

debarshri |root |parent |previous [-]

Also, just to add to this: to get compile-once, run-anywhere (CO-RE), you need a BTF-enabled kernel.
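
The usual check, sketched: CO-RE loaders look for the kernel's type info at a well-known path, so you can test for it directly:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// CO-RE ("compile once, run everywhere") eBPF needs the kernel to
	// export BTF type info; loaders look for it here.
	if _, err := os.Stat("/sys/kernel/btf/vmlinux"); err != nil {
		fmt.Println("no BTF: CO-RE eBPF programs won't load on this kernel")
		return
	}
	fmt.Println("BTF present")
}
```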

Sentinel-gate |root |parent [-]

Exactly, and that's one more reason I went with a userspace proxy — no kernel deps, runs anywhere, way easier to debug.

nimrody |root |parent |previous [-]

Won't work for SSL-encrypted connections (but, yes, the proxy approach does add some latency)

tudorg |root |parent [-]

Even then, though, it needs to run on the server, so it's hard to guarantee it won't impact performance and availability. There are many Postgres/MySQL proxies used for connection pooling and the like, so at least we understand their impact pretty well (and it tends to be minimal).

debarshri |next |previous [-]

We do something similar in adaptive [1].

What you can also do is add a frontend and a backend user to the proxy, so agents never see the actual DB user and password. You can make the credentials throwaway, and just-in-time too, if you want.

Traditionally this was called database activity monitoring, which kind of fell out of fashion, but I think it is going to come back with the advent of agents.

[1] https://adaptive.live

stephenr |next |previous [-]

Can you explain how this is a better option than just enabling the general log for MySQL as needed?

kopirgan |root |parent |next [-]

Lol yeah.. Anyway, in any serious application queries fly past like crazy, create temporary tables, pull columns from several tables, and do stuff that's hard to interpret on the fly, especially in software you just use and didn't write.

Turning on logging is likely more useful. I have done that to understand the inner workings of some financial apps.

ahoka |root |parent |next |previous [-]

Or log_statement = 'all' in Postgres.
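
Which, for what it's worth, is a runtime toggle too. A sketch, assuming a superuser connection (the DSN is made up); pg_reload_conf() applies it without a restart:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres",
		"host=localhost user=postgres dbname=postgres sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	// Persist the setting, then reload; no server restart needed.
	if _, err := db.Exec(`ALTER SYSTEM SET log_statement = 'all'`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`SELECT pg_reload_conf()`); err != nil {
		log.Fatal(err)
	}
	log.Println("statement logging enabled; watch the server log")
}
```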

luckylion |root |parent |next |previous [-]

You don't need to access (or even have access to) the DB server itself (e.g. to read the query log); you can do everything just by pointing the client at a different host.

anonymous344 |root |parent |previous [-]

Yes, this was my first question.

Why would I inspect this data? Maybe to find the cause of a problem... are there any other reasons?

hankchinaski |next |previous [-]

What are the latency implications of going through this proxy? Looks good.

Spixel_ |next |previous [-]

Maybe consider renaming this since pgTAP [0] exists and has nothing to do with this.

[0]: https://pgtap.org/

altmanaltman |next |previous [-]

Looks really cool, will try it out soon

nwellinghoff |next |previous [-]

Nice. I like how you made it an easy to drop in proxy. Will definitely use this when debugging issues!

odiroot |next |previous [-]

Not to be confused with sqltap, which is also great for debugging queries: https://github.com/inconshreveable/sqltap

CodeWriter23 |next |previous [-]

Really been wanting something like this. Thanks!

ranger_danger |next |previous [-]

I prefer to use eBPF; no additional software, proxy or configuration needed.

https://eunomia.dev/tutorials/40-mysql/

stephenr |root |parent [-]

I prefer to just turn on query logging in the db server. I don't understand why you would use anything else.

ranger_danger |root |parent [-]

eBPF can work with multiple different database servers, and you don't have to flood your logs with every possible query if you don't want to; you can filter it however you want, and it doesn't save to disk unless you tell it to. Plus you don't have to toggle database configuration settings (like query logging) off and on... it's just easier to use, IMO, once you have a working script.

stephenr |root |parent [-]

It's a dedicated log file and it's a runtime change to enable or disable the logging.

Horses for courses. I'd prefer to just have a regular file I can tail or grep or whatever, out of the box.

sneak |next |previous [-]

Was AI used to build this? It looks a lot like the kind of scratch-an-itch projects I have been grinding out with AI lately, in size, timeline, code, and function. If not, you are a very very productive programmer.

If so, would you mind sharing which model(s) you used and what tooling?

jauntywundrkind |next |previous [-]

That's some sick observability, nice.
