Hacker News
Intro to TLA+ for the LLM Era: Prompt Your Way to Victory
gobdovan
|next
[-]
If you want to 'prompt your way' with TLA+, this is a decent one for ChatGPT. But don't expect to do wonders and find a better abstraction than you and if you run it 'agentically' with TLC in the loop, it will bloat the spec in no time:
Write the smallest TLA+ model that captures only the safety-relevant state of [the thing you're modeling]. Do not model implementation data structures unless they affect the property. Use clear names, not abbreviations. Start with: constants, variables, TypeOK, Init, and 2–4 invariants. Use copious amounts of comments at the module level (use multi-line comments) and for each key variable, action, invariant, and temporal property (use one or more single-line comments). Prefer fewer variables and coarser atomic steps unless finer interleavings are essential. Keep the spec under 100 LoC (excluding comments) unless there is a specific reason not to. After writing it, list what was intentionally abstracted away.
jmorse3
|next
|previous
[-]
"Running leading LLMs across the eleven systems shows that LLMs are great at producing correct TLA+ syntax but struggle to ensure conformance and appropriate invariants."
zihotki
|root
|parent
|next
[-]
> It’s still your responsibility to understand your system and define what “correctness” means, and you need a high-level understanding of temporal logic.
nyrikki
|root
|parent
[-]
Using a TLC model checker to verify invariants or properties, finding a valid counterexample can scale non-elementarily.
In fact, in some situations it can end up being Ackermann-complete[0], but even recursively enumerable problems can be Tower-complete.
I would say that unless you focus on more detailed temporal logic pitfalls you may have issues.
So IMHO this will stay a use case specific solution, that you choose based on context.
Even a common solution like adding Circumscription causes counterintuitive changes [1][2].
IMHO, if you want to use TLA+ as a primary method, you will need some depth or be ready to abandon it by time boxing etc…
Remember that we know the open domain frame problem in [2] is equal to HALT, it will not universally apply.
It is just another tool that works well when it works well.
[0] https://arxiv.org/abs/2104.13866
baq
|root
|parent
|previous
[-]
Ran it on my tricky caching changes and it found bugs the standard agentic review flows missed on all frontier models. Good stuff.
AYBABTME
|next
|previous
[-]
I wrote about this a bit on my blog[1], different angle but along the same line. You explain TLA+ and model checking well which makes the case concrete.
I'm curious of you have thoughts on these other methods and tools like P, Lean, Dafny, etc?
nextaccountic
|root
|parent
[-]
They have a development pace very quick and can verify real world Rust code. There's a huge graveyard of Rust verification tools, but I hope this one gets broader usage
arpinum
|next
|previous
[-]
leoqa
|next
|previous
[-]
some_random
|next
|previous
[-]
raidicy
|root
|parent
[-]
I understand that this is not the tool for everything but scxml, state chart XML, gives you something like this but something that you can actually read without investing time learning syntax.
It also has some plugins on vs code that allow you to edit it visually in case you don't want to write a bazillion XML tags.
aerodexis
|next
|previous
[-]
You can also work backwards from the races it generates and ensure that they're real races.
sigbottle
|next
|previous
[-]
vbernat
|next
|previous
[-]
BB == b > 1 /\ b' = b - 2 /\ w' = w + 1 \* Picked 2 black
It should b > 2, otherwise you'll get in an invalid state.
Jtsummers
|root
|parent
[-]
EDIT: Actually, the original spec doesn't even say what is a valid state after initialization. It provides only a requirement on the initial state (that w + b > 0 and that w and b are both in Nat, or >= 0, when initialized). The correctly formulated rules ensure that w and b remain in Nat, but there's no other piece of the specification that seems to require it.
Once the author starts adding invariants (like NotEmpty) that adds a requirement that w + b > 0 remains true for ever, but since w is incremented and b is decremented, even BB results in NotEmpty remaining satisfied. Each of the steps does, since they all add one bean back.
UltraSane
|next
|previous
[-]
relativeadv
|root
|parent
[-]
nextos
|root
|parent
[-]
It could be the case that Tony Hoare was right, just too early.