Hacker News
Memory Safe Inline Assembly
rurban
Most stupid thing I've ever heard. If a safety violation is known at compile-time, you error at compile-time. You might never catch it in a test, and then the panic happens at the customer's site. They will be pleased.
mrgriffin
I would guess for the use-case of "I have a C project and I want to run it in Fil-C" the ability for this to be a warning + run-time panic is very helpful for quickly getting started. Reminds me of GHC's -fdefer-type-errors.
I agree that I wouldn't want to deploy a program where those panics are reachable*, but it's still handy for local development and/or maybe the developer knows they aren't reachable.
I haven't checked, but I'd guess there's a warning and a -Werror-style flag to opt in to a hard error for unsafe assembly?
* Obviously a panic is better than no check at all. But guaranteed safety is better than either of those.
flotzam
"Using runtime panics has the nice property that inline assembly in dead code doesn't get in the way of porting software to Fil-C. Also, it's consistent with how Fil-C usually reports errors."
Jweb_Guru
Against this background, crashing when inline assembly is determined to be doing something the author isn't sure how to deal with is pretty much par for the course -- it's a way to continue to claim that you can port over your old buggy C applications unchanged. You aren't supposed to actually use it.
pizlonator
What a bitter way to analyze a technology.
> under some circumstances Fil-C considers it legal to read a value from a totally different, possibly-inaccessible pointer when dereferencing an unrelated pointer
Fil-C does not consider such a thing to be legal.
You can only access capabilities that you legitimately loaded from the heap, from other capabilities you legitimately loaded, and so on. So, if you access a pointer, it's because you had access to the capability.
> You aren't supposed to actually use it.
Wat
layer8
There is no “instead” here. A C implementation defining behavior for certain UB still falls under what a C implementation is allowed to do for UB.
I don’t know if Fil-C is a fully conforming C implementation, but it could well be. In what way is it nonconforming that would warrant describing it as implementing “a related language” and not C?
achierius
Those two claims don't contradict each other. Many, many people use C code not because their application needs to be blazing fast, but simply because the programs are already written in C. Rewriting a program from C into another language is likely to introduce a lot of bugs, even if the rewrite does manage to achieve 'full memory safety' (which most don't, instead liberally utilizing unsafe blocks or the equivalent). So what are such users to do, simply accept that there are going to be bugs? Fil-C seems to address a reasonable need.
mananaysiempre
3a. rdpru (similar issues to cpuid) or rdpmc perhaps surrounded with lfence or cpuid inside the same assembly chunk
For obvious reasons, this is somewhat niche and may not even make it into production code, but it’s also important when you do need it. It’s also memory safe. I guess in such cases you’d use fast C rather than Fil-C though.
4a. rseq
Probably even less feasible than atomics TBH, as such blocks will usually also contain control flow (at least that implied by the nature of rseqs).
> Before the advent of AI, writing a parser for x86_64 assembly would have been such an annoying task that I might have never gotten around to implementing support for memory safe inline assembly [...].
It is annoying, but even before the advent of AI that didn’t stop the developers of TCC for instance.
With that said, given Fil-C is Clang/LLVM-based, shouldn’t an assembly parser, at least, be already available somewhere? I was under the impression that Clang (unlike GCC) actually parsed asm blocks.
lifthrasiir
mananaysiempre
lifthrasiir
mananaysiempre
We recently (finally) got __attribute__((musttail)) in GCC[1], I’ve just tried it between functions with mismatched __attribute__((target))s and it does work, so theoretically you could code your interpreter that way. But it seems like you’re still bound to keep loading and storing vector state from and to memory and VZEROUPPERing your registers after each bytecode, and that doesn’t sound like a particularly good time.
anitil
Edit to add: If I'm understanding this correctly, we should be able to run this against projects and detect asm violations. I feel like it would be very valuable to feed these back to maintainers.
jdw64
1. A developer identified the necessity of inline assembly.
2. Defined the safety boundaries for 'memory-safe' inline assembly.
3. Established strict policies for memory access.
4. Curated an allowlist of permissible instructions.
5. Set rigorous test criteria and 'done' conditions.
In short, with the overall guardrails in place, a sub-agent loop was run, and this level of code was produced. This raises a number of interesting points about how we should use AI. I haven't looked at all the code, but the idea of passing assembly through safe zones without memory access, and using that as a foundation to achieve this level of implementation through AI, is quite impressive.
throwaway27448
Anyway, this is also very useful for humans to use, so it's mostly a lovely coincidence this level of safety arrived with useful chatbots.
anitil
pizlonator
So currently most of those still have the hacks to go down the no-inlineasm path when building with Fil-C.
For the few where I reinstated the inline assembly, there were no bugs found.
It would be a good experiment to try to reinstate the inlineasm paths in all of the programs that had them. I suspect there’s a low chance of finding a bug if it’s in inline assembly that’s on the critical path.
IAmLiterallyAB
> This includes things like asm volatile("" : : : "memory"), which is an old-school way of saying atomic_signal_fence(memory_order_seq_cst).
Not quite. AIUI, the first is just a barrier for the compiler, while the second is also a CPU memory barrier. Godbolt seems to confirm that.
pizlonator
The quote uses atomic_signal_fence.
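For readers following the fence distinction, here is a minimal sketch of the three constructs side by side. The function names (`barrier_asm`, `barrier_signal`, `barrier_thread`, `publish`) are made up for illustration and are not taken from Fil-C. As I understand the C11 rules, `atomic_signal_fence` only constrains the compiler, just like the empty asm with a "memory" clobber, while `atomic_thread_fence` may additionally emit a hardware fence; what a given compiler actually emits is worth checking on Godbolt.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>

static int payload;
static volatile sig_atomic_t ready;

/* Old-school GNU C compiler barrier: no instruction is emitted, but the
 * compiler may not move memory accesses across it. */
static void barrier_asm(void) { __asm__ __volatile__("" : : : "memory"); }

/* C11 spelling of a compiler-only barrier (thread vs. its own signal handler). */
static void barrier_signal(void) { atomic_signal_fence(memory_order_seq_cst); }

/* Inter-thread fence: this is the one that may also cost a CPU instruction. */
static void barrier_thread(void) { atomic_thread_fence(memory_order_seq_cst); }

/* Hypothetical helper: store the payload, run the chosen barrier, then set
 * the flag, so the payload store cannot be reordered after the flag store
 * by the compiler. Returns the payload for easy checking. */
static int publish(int value, void (*barrier)(void)) {
    payload = value;
    barrier();
    ready = 1;
    return payload;
}
```

Calling `publish(7, barrier_asm)` and `publish(7, barrier_signal)` should compile to the same code on mainstream compilers; only the `barrier_thread` variant is expected to differ.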
If you find a way to bypass my checks, file a bug. I tried very hard to break it. My agent loops tried even harder.
jancsika
What happens if you ask to find the strings that will erroneously return True from validateSafeInlineAsm for disallowed asm? :)
pizlonator
Example of a bug found most recently was that sahf was allowed without a cc constraint.
Anyway, if you find bugs, file them. Would be fun to see if there’s a case me and my agents missed.
torginus
I mean, I'm not sure if LLVM parses the assembly (I strongly suspect it does; I remember GCC inline assembly allowing stuff like referencing variables in asm), but shouldn't LLVM figure out that the asm modifies things it's not supposed to?
If you clobber a register in asm the compiler stores something into, your code certainly won't work right.
petesergeant
Let's say I compile curl using Fil-C, and later an exploitable memory bug is found in curl. The implication here is that my fil-c-compiled curl will crash safely, rather than be able to be exploited? And the "cost" to me is that my curl executable will be slower than the standard one?
dataflow
pizlonator
There was some debugging thing where it embeds debug info using module level assembly that you have to disable.
dataflow
pizlonator
It’s module assembly.
They’re different.
dataflow
sureglymop
I mean one that infers as much context as possible and tries to help as much as possible.
This has to be assembler specific of course. For example, I use fasm which has higher level macros. An LSP could suggest struct fields and other stuff.
ozgrakkurt
Inline asm should take 10x or more effort compared to writing the surrounding C++ code and should be tested with protected pages at the edges if possible. It should always have assertions before/after that check invariants too.
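One way to do the "protected pages at the edges" testing is sketched below, assuming a POSIX system with `mmap` and `mprotect`. The helper name `alloc_before_guard` is hypothetical. It hands back a buffer whose last byte sits directly before an inaccessible page, so any routine that reads or writes even one byte past the end faults immediately instead of silently succeeding.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical test helper: returns `len` writable bytes that end exactly at
 * a page boundary, followed by a PROT_NONE guard page. Returns NULL on
 * failure. The mapping is deliberately never freed; this is test-only code. */
static void *alloc_before_guard(size_t len) {
    long page_l = sysconf(_SC_PAGESIZE);
    if (page_l <= 0 || len == 0) return NULL;
    size_t page = (size_t)page_l;
    size_t data_pages = (len + page - 1) / page;  /* round up to whole pages */
    size_t total = (data_pages + 1) * page;       /* plus one guard page */

    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;

    unsigned char *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {  /* make the edge trap */
        munmap(base, total);
        return NULL;
    }
    return guard - len;  /* buffer ends where the guard page begins */
}
```

A test would copy its input into such a buffer and run the asm routine on it; an overread then shows up as a crash at the offending instruction rather than as a latent bug.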
Also there are a lot of cases where this won’t work. One example is implementing strlen using AVX-512, where you want to align the address down to a multiple of 64 and run until the end of the page, so you can do SIMD while avoiding a segfault.
Another example is just handling loop remainders with masking in avx512.
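The address arithmetic behind the strlen example can be sketched in plain C. The helper names `align_down` and `same_page` are illustrative only. The point of the trick is that a 64-byte load from a 64-byte-aligned address can never straddle a page boundary, so it cannot fault at the hardware level even though it may touch bytes outside the string's own allocation, which is exactly what a strict bounds checker objects to.

```c
#include <assert.h>
#include <stdint.h>

/* Round `addr` down to a multiple of `alignment` (must be a power of two). */
static uintptr_t align_down(uintptr_t addr, uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}

/* Returns 1 if the `len` bytes starting at `addr` lie within a single page. */
static int same_page(uintptr_t addr, uintptr_t len, uintptr_t page_size) {
    return align_down(addr, page_size) ==
           align_down(addr + len - 1, page_size);
}
```

For instance, `align_down(0x1003, 64)` gives `0x1000`, and a 64-byte block starting at any 64-byte-aligned address always satisfies `same_page` for 4096-byte pages.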
Also it is pretty naive to think an LLM got this right.
Overall it seems like a huge waste of time.
If you are writing inline asm and want to make it better, just get as many LLMs or, even better, humans to review it. LLMs are really good at finding mistakes in inline asm, with a high false positive rate though, so you have to understand the concept.
For example, one bug I had was about not consuming the inputs before writing to the outputs. The compiler can assign the same register to an input and an output unless the output is marked with & (the early-clobber modifier). It was super frustrating to debug this until I asked an LLM and it found the problem.
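A minimal sketch of that pitfall in GNU C extended asm on x86-64 follows. The function name `add_early_clobber` is made up for illustration. The first instruction writes the output before the second input has been read, so the output needs the `&` modifier; without it the compiler would be free to place `out` and `b` in the same register, and the `mov` would destroy `b` before the `add` reads it.

```c
#include <assert.h>

/* Computes a + b with two instructions (AT&T syntax). On targets other than
 * x86-64 this falls back to plain C so the example still builds. */
static long add_early_clobber(long a, long b) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    long out;
    __asm__("mov %1, %0\n\t"   /* out = a: output written before b is read */
            "add %2, %0"       /* out += b: b must still be intact here */
            : "=&r"(out)       /* & = early clobber: never shares an input register */
            : "r"(a), "r"(b)
            : "cc");           /* add modifies the flags */
    return out;
#else
    return a + b;
#endif
}
```

Dropping the `&` may still appear to work in many builds, because the register allocator often happens to pick distinct registers, which is what makes this class of bug so hard to spot.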