Hacker News

Show HN: ctx – Search the coding agent history already on your machine

65 points by luca-ctx | 42 comments

Coding agents don't have long-term memory.

But you do have months of full-fidelity agent transcripts stored on your machine.

A simple solution that goes a long way: ingest those transcripts and logs into a structured SQLite database, then search them with ranked text match. Everything is fully local and doesn't require anything fancy like a graph database or hosted memory service.

This is the idea behind ctx, a Rust CLI that handles the ingestion and searching.
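For anyone who wants to see the shape of the idea, here is a minimal Python sketch of ingest-then-search. It is not how ctx itself is implemented (ctx is Rust). The record fields (`session_id`, `role`, `text`), the `messages` table, and the use of SQLite's FTS5 extension for BM25-ranked matching are all assumptions made for illustration.

```python
import json
import sqlite3


def ingest(conn: sqlite3.Connection, jsonl_lines) -> int:
    """Load transcript records (one JSON object per line) into an FTS5 table.

    Assumed record shape: {"session_id": ..., "role": ..., "text": ...}.
    Returns the number of records inserted.
    """
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(session_id UNINDEXED, role UNINDEXED, text)"
    )
    count = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        rec = json.loads(line)
        conn.execute(
            "INSERT INTO messages (session_id, role, text) VALUES (?, ?, ?)",
            (rec["session_id"], rec["role"], rec["text"]),
        )
        count += 1
    conn.commit()
    return count


def search(conn: sqlite3.Connection, query: str, limit: int = 5):
    """Return matching messages, best match first.

    FTS5's built-in `rank` column is BM25 by default, and ordering by it
    ascending puts the strongest matches at the top.
    """
    return conn.execute(
        "SELECT session_id, role, text FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Calling `search(conn, "disk full")` matches only messages that contain both words, which is usually what an agent wants when it is looking for a previously seen failure.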

We give our agents a skill that tells them to reference past sessions before working in an area. Usually we do this through an "Agent History Research Subagent" whose job is just to prepare a short brief covering any relevant history before the task begins.

A real example: sometimes our test suite runs would fail because disk was full on the runner. The correct approach was to run the cleanup runbook, but the root cause of the failure was not clear to the agents, so they would think it was a test regression and go down the wrong rabbit hole debugging. When the agent searched history, it realized this failure had been encountered before and found the right workaround immediately. That got the agent onto the right cleanup path, and later we improved the log output so the same failure would be clearer next time. It's a boring story, but it's real agent productivity.

Another nice use case is quickly generating session transcripts for sharing. You can exclude the noisy intermediate messages, so the transcript shows the important parts of the session more cleanly. Try attaching a session transcript to your next PR so your teammate and their agent can review the provenance and prompting behind the change.

If you're up for an additional challenge, ask your agent to "exhaustively review all agent history in this repo and find where the SDLC is struggling or isn't agent-native". Using past sessions to recursively improve the agentic SDLC is a loop that we're using a lot today.

If you try it out, please let us know what you think!

wrs |next [-]

I often tell Claude Code to look at previous sessions in ~/.claude and it’s happy to jq/grep its way through them with no special tool. But being more efficient is always good.

luca-ctx |root |parent |next [-]

Yes this is how we started as well!

At first I thought the main improvement would be that the search would be faster, but rg is already pretty freakin fast when the fs cache is warm.

What really ended up being the big efficiency improvement is the token efficiency. When you structure all of the transcripts in a SQL table, the agent can retrieve exactly what is needed (such as "print me the lite transcript, without the intermediate messages").
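As a rough illustration of the "lite transcript" query, here is a Python sketch against a plain SQLite table. The schema is hypothetical, not ctx's real one: in particular the `is_final` flag marking the assistant's last reply in a turn is an assumption made for this example.

```python
import sqlite3

# Hypothetical schema: one row per message, ordered by `seq` within a session.
# `is_final` is 1 for the assistant's closing reply in a turn, 0 otherwise.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS messages ("
    "session_id TEXT NOT NULL, "
    "seq INTEGER NOT NULL, "
    "role TEXT NOT NULL, "
    "is_final INTEGER NOT NULL DEFAULT 0, "
    "text TEXT NOT NULL, "
    "PRIMARY KEY (session_id, seq))"
)


def lite_transcript(conn: sqlite3.Connection, session_id: str) -> str:
    """Return user prompts plus final assistant replies for one session.

    Tool output and interim assistant messages are filtered out in SQL,
    so they never reach the caller (or the agent's context window).
    """
    rows = conn.execute(
        "SELECT role, text FROM messages "
        "WHERE session_id = ? "
        "AND (role = 'user' OR (role = 'assistant' AND is_final = 1)) "
        "ORDER BY seq",
        (session_id,),
    ).fetchall()
    return "\n".join(f"{role}: {text}" for role, text in rows)
```

The token savings come from the filter running in the database: the agent only ever reads the rows it asked for.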

CuriouslyC |root |parent |previous [-]

Claude has been heavily RL'd on using jq/grep. Even if the tool is more efficient, Claude using it incorrectly, or reading a book of examples in order to understand how to use it correctly, is going to end up underperforming.

luca-ctx |root |parent [-]

This is likely true, but ultimately this tool is just SQL, which I believe Claude and others must be heavily RL'd on. We try not to do anything "special" and keep it a boring SQL representation of past sessions.

beaugunderson |next |previous [-]

This is rapidly becoming the "todo list demo" of the LLM era... I say this as someone who also built one because I was tired of all the others! (https://github.com/beaugunderson/obliscence)

Terretta |root |parent |next [-]

First, OP's tool is cool, and worth trying. Yours as well, though when I see a vibed tool with no commits I wonder if the author is still using it themselves or achieved perfection. :-)

Meanwhile, the other todo list take, one I've undertaken as well, is to cross sync all the Claude Codes across all its instances on all your machines.

There are multiple projects that claim to do this. None do it fully. (They particularly have blind spots to tools that embed a Claude Code, such as the Xcode 26.5 and Xcode 27 beta.)

So: roll one's own, and in doing that, realize that it has first class tools to make back referencing transcripts normal.

Given those tools, you don't really need an extra layer.

beaugunderson |root |parent [-]

totally, I see that syncing workflow come up quite a bit as well... some of this stuff is going to get sherlocked into the harness proper (probably a net benefit) though of course people are still going to want the meta-version that works with all of the different harnesses they use... so much of being effective with this style of development is molding it to your specific workflows and sanding off the rough edges with context, then sharing those wins with the larger team (another layer where I see a lot of differing approaches, like skillshare for example)

luca-ctx |root |parent |previous [-]

Lol fair enough! Great project btw. Interesting choice to trigger incremental refresh on SessionStart hook, that's nice.

How have you enjoyed the semantic search?

beaugunderson |root |parent [-]

semantic search has been pretty good, it usually finds what it's looking for!

a couple of times I was certain that there was a session that contained some word but in reality it was in my personal claude.ai web account, so needed to add the import functionality there.

my favorite piece is the `corrections` command which surfaces all my frustrations/corrections in the last week for example... and I can then figure out if missing context would improve those scenarios going forward

luca-ctx |root |parent [-]

Nice, yea I typically spend about 1/3 of my sessions on finding ways to improve the agents' SDLC. Lots of random audits and things.

And yea on the import thing, there are quite a few instances when session records can live on other machines, like cloud agents, dev boxes, etc.

Do you have any interest in sharing some transcripts with team members? I'm trying to figure out the shape of this solution because often times people I work with want to see what I did or fork one of my sessions, but I also don't necessarily just want unlimited dumping because I'm sure I have personal details in there too.

beaugunderson |root |parent [-]

sometimes i'll share prompts but never a whole transcript (have not had a reason to)

if i do want to share context i'll use something like "give me a prompt $coworker can share with their claude to continue this work"

AM1010101 |next |previous [-]

I would love a service that I could upload these chats to (anonymously) so that those developing open models can have them as training data, not just the closed-model companies. My understanding is that this data is very valuable; look at what Cursor has managed to train. Obviously there would need to be some filtering so that only the chats or projects you want to share get shared.

luca-ctx |root |parent |next [-]

We have a private beta for a secure cloud version of the service, although it's more geared toward teams and enterprises that want to share their work internally, rather than donating to open model developers. But interesting idea! I'm not very knowledgeable about crypto things, but I believe this is what people have considered "microtransactions" to be useful for.

matheusmoreira |root |parent |next |previous [-]

I love this idea. I would totally upload my sessions if doing so helped train the next generation of open weight models. I'm using Claude to work on my free software projects so there aren't many secrets there anyway.

adamsmark |root |parent |previous [-]

The Chinese would love this.

Terretta |next |previous [-]

> Coding agents usually start from zero. They can inspect the current repo, but they often cannot recover the discussions, decisions, failed attempts, commands, and test results from earlier work.

Sure they can. Just ask them. Some (like Claude Code) even have built in tools for it that work a treat. It'll happily rebuild an entire edit history diff by diff.

luca-ctx |root |parent [-]

This is true, maybe we could reword it to be less absolute.

The bigger point is that when they do go spelunking in the old session logs, it is extremely token inefficient, and you can often fill up an entire context window and force a compaction just by trying to put together a transcript or summary.

The goal here is less about doing something previously impossible and more about making it so efficient and cheap that you can have agents do it very often, like before they start on every single task.

alex_hirner |next |previous [-]

I like the CLI interface, very comprehensive.

However, I'm puzzled by pi support: https://github.com/ctxrs/ctx/issues/40

luca-ctx |next |previous [-]

Building this made it obvious that there should be a standard format / specification for agent transcripts and logs (similar to ACP for runtime events). If you're interested in discussing this, please reach out!

scritty-dev |root |parent [-]

[flagged]

dang |root |parent [-]

Can you please not post AI-generated or AI-edited comments to HN? It's not allowed here - see https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079.

Of course, it's impossible to know for sure what was LLM processed or not, but some of your posts (like this one) have been getting classified that way.

scritty-dev |root |parent [-]

My posts are not AI generated... I apologize if my tone/vernacular comes off as generated, but that's just my own voice. Downvoting my comment based on unfounded assumptions is upsetting and discouraging to a new member such as myself.

meowface |root |parent [-]

You did write the above comment with an LLM, though. Why are you now denying it?

dang |root |parent |next [-]

We can't know for sure, so it's best not to attack like this. All we can really say is that the posts were getting classified that way.

There's also a large fuzzy area these days where people are using tools to edit, "polish", etc., but do not think of it as using an LLM to write. This is particularly the case with non-native English speakers.

A few recent cases where this sort of thing came up:

https://news.ycombinator.com/item?id=48467726

https://news.ycombinator.com/item?id=48416592

https://news.ycombinator.com/item?id=48405497

scritty-dev |root |parent [-]

I am rather upset, the above user gave me negative karma for what? Defending myself against a false accusation?

This is ridiculous: if I use proper grammar and punctuation, I get flagged as AI.

`if i talk like this and dont use proper syntax and convention then i come off as an unintelligent and fake version of myself i am watering down to appease whatever algorithm flagged me as ai`

https://preview.redd.it/ai-detector-flagged-a-passage-from-m...

...now I am going to speak like myself again -- AI sounding and all (oh, double dash is not em dash I've been using them for 2 decades)

This is just to highlight how ridiculous this all is and, honestly, how off-putting it is to a new member. Forget non-native English speakers: all AI algorithms do is flag polished posts. I don't want to water myself down and act dumb to avoid an algorithm and tip-toe around every post I make.

scritty-dev |root |parent |previous [-]

You're part of the problem with this unsubstantiated and unwarranted proclamation. I did not, and quite frankly your subjective opinion of whether I did or not does not change immutable reality; nor do I particularly care.

meowface |root |parent [-]

But just to be clear: you are absolutely, completely denying an LLM played any role in writing any of your comments on HN? You are certain? You're claiming you did not use an LLM in any way in the production of them? Not even to "edit"?

And for the record you were downvoted by other people long before I saw your reply.

scritty-dev |root |parent [-]

I don't use AI; only edits to my posts are made by auto-spell checker. I have exactly 1 downvote. Just yours....

Again with these demeaning comments "you are certain" -- who exactly are you -- the arbiter of what constitutes human vs AI generated content? Yes, I am certain.

EDIT: After testing my own content on GPTZero, I am curious, is that specific platform utilized to determine if my text is AI generated?

dang |root |parent |next [-]

> I don't use AI; only edits to my posts are made by auto-spell checker

I think you may find that this "auto spell checker" is making many more changes to your text than just spelling corrections. We've encountered this sort of thing in many cases already. This is the sort of thing I was describing in my comment upthread: https://news.ycombinator.com/item?id=48779752.

meowface |root |parent |previous [-]

I promise your post was already downvoted when I saw it. It is possible some people upvoted it afterwards, changing the net karma.

It is possible you were not intentionally choosing to use an LLM to write/modify your posts, but they largely read like LLM output. The tool you're using may use an LLM and may be rewriting significant portions of your text.

malandin |next |previous [-]

Very interesting project! I guess it could be even better if you didn't have to ingest the session data into a database but could just build an index on top. I have an idea for how to do it.

luca-ctx |root |parent [-]

Thanks and this is a very interesting idea!

We considered this, but the main thing you gain from this tradeoff is some disk space and cleaner retention semantics from not having to duplicate all of the searchable text.

But you still have to do the parsing and ingestion work to build the index in the first place, so CPU time does not go away.

And you still have to store the indexes and enough metadata to map results back to the raw session files, which bounds the benefit of not duplicating the data.

The main downside is flexibility (you would lose the ability to do arbitrary SQL queries, semantic search on top of a structured corpus, etc.).

But I would love to see if I can be proven wrong on this!
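To make the tradeoff concrete, here is a small Python sketch of the "index on top" variant. Everything in it is an assumption for illustration: the `postings` table, the whitespace tokenizer, and JSONL records with a `text` field. It stores only terms and byte offsets, then reads matching records back out of the raw file.

```python
import json
import sqlite3


def build_offset_index(conn: sqlite3.Connection, path: str) -> None:
    """Index a JSONL transcript by byte offset without copying message text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        "term TEXT NOT NULL, path TEXT NOT NULL, "
        "byte_offset INTEGER NOT NULL, byte_len INTEGER NOT NULL)"
    )
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            if raw.strip():
                # The file still has to be parsed once to build the index.
                rec = json.loads(raw)
                for term in set(rec["text"].lower().split()):
                    conn.execute(
                        "INSERT INTO postings VALUES (?, ?, ?, ?)",
                        (term, path, offset, len(raw)),
                    )
            offset += len(raw)  # advance past this line, blank or not
    conn.execute("CREATE INDEX IF NOT EXISTS postings_term ON postings(term)")
    conn.commit()


def lookup(conn: sqlite3.Connection, term: str) -> list:
    """Fetch matching records from the raw files using the stored offsets."""
    rows = conn.execute(
        "SELECT path, byte_offset, byte_len FROM postings "
        "WHERE term = ? ORDER BY path, byte_offset",
        (term.lower(),),
    ).fetchall()
    results = []
    for path, byte_offset, byte_len in rows:
        with open(path, "rb") as f:
            f.seek(byte_offset)
            results.append(json.loads(f.read(byte_len)))
    return results
```

Even in this toy version the parsing pass and the postings table are still there, and every query has to go back to the raw files, so anything beyond a term lookup needs extra machinery.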

mkornaukhov |root |parent [-]

We've been working on remote search indexing in our project and it works pretty well. Since we are building a Postgres-compatible database, everything is pure SQL. I'd say we could join forces if you're up for it.

luca-ctx |root |parent [-]

Very cool! Do you have any of it OSS? Or drop me an email: luca@ctx.rs

sinisha_djukic |next |previous [-]

Really nice idea - there is certainly gold in the history of agent actions. But how do you keep the ctx from trusting stale history?

luca-ctx |root |parent |next [-]

It's all in the skill / instruction that you give to the agent. The agent should treat the history as anthropology - a record of what happened, not necessarily the ground truth.

Creating ground truth is an orthogonal problem - I try to work hard to put it into specs and docs and regularly update those.

Searching history is closer to "super git blame" or like looking through logs. We should expect a lot of stuff went wrong in there.

uikdjsah |root |parent |previous [-]

[flagged]

meowface |root |parent [-]

Seek help.

meowface |next |previous [-]

The number of LLMs (and possibly very odd humans) replying to this thread is unusual.

zaptheimpaler |next |previous [-]

I've been running https://github.com/kenn-io/agentsview for this, works well.

indigodaddy |previous [-]

Or you could just use a great agent/harness like Shelley that already uses SQLite to store conversations, which are searchable from the UI as well.

https://github.com/boldsoftware/shelley

luca-ctx |root |parent |next [-]

We’re working on adding native support for Shelley!

The idea is that even with native recall from Shelley, ctx results are more accurate, ergonomic, and token efficient.

For example, search can retrieve a specific message and then a window of the N leading and trailing messages, in just a few hundred tokens.
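A hedged Python sketch of that hit-plus-window pattern follows. The `messages` table, its `seq` column, and the helper names are hypothetical, and the plain substring match stands in for real ranked search.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with a hypothetical one-row-per-message table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "session_id TEXT NOT NULL, seq INTEGER NOT NULL, "
        "role TEXT NOT NULL, text TEXT NOT NULL, "
        "PRIMARY KEY (session_id, seq))"
    )
    return conn


def find_hit(conn: sqlite3.Connection, needle: str):
    """Return (session_id, seq) of the first message containing `needle`, or None."""
    return conn.execute(
        "SELECT session_id, seq FROM messages WHERE instr(text, ?) > 0 "
        "ORDER BY session_id, seq LIMIT 1",
        (needle,),
    ).fetchone()


def context_window(conn: sqlite3.Connection, session_id: str, hit_seq: int, n: int = 2):
    """Return the hit message plus up to n messages before and after it."""
    return conn.execute(
        "SELECT seq, role, text FROM messages "
        "WHERE session_id = ? AND seq BETWEEN ? AND ? ORDER BY seq",
        (session_id, hit_seq - n, hit_seq + n),
    ).fetchall()
```

Because the window is a range scan on the primary key, the agent pays for at most 2n + 1 messages no matter how long the session is.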

gb2d_hn |root |parent |previous [-]

I worked on this problem with https://www.agentkanban.io - this is a human-in-the-loop integration for VS Code that stores the context in the kanban task so it lives with the task (can also be forked, split etc).