Hacker News
Unix "find" expressions compiled to bytecode
drob518
|next
[-]
> I was later surprised all the real world find implementations I examined use tree-walk interpreters instead.
I’m not sure why this would be surprising. The find utility is totally dominated by disk IOPS. The interpretation performance of find conditions is totally swamped by reading stuff from disk. So, keep it simple and just use a tree-walk interpreter.
adrian_b
|root
|parent
|next
[-]
For instance, I normally compile big software projects in RAM disks (Linux tmpfs), because I typically use computers with no less than 64 GB of DRAM.
Such big software projects may have very great numbers of files and subdirectories and their building scripts may use "find".
In such a case there are no SSD or HDD I/O operations, everything is done in the main memory, so the intrinsic performance of "find" may matter.
Someone
|root
|parent
|next
|previous
[-]
Also, decreasing CPU usage many not speed up find (much), but it would leave more time for running other processes.
drob518
|root
|parent
|next
[-]
maxbond
|root
|parent
|previous
[-]
chubot
|root
|parent
|previous
[-]
And then I pointed to this article on databases: https://notes.eatonphil.com/2023-09-21-how-do-databases-exec...
Even MySQL, Duck DB, and Cockroach DB apparently use tree-walking to evaluate expressions, not bytecode!
Probably for the same reason - many parts are dominated by I/O, so the work on optimization goes elsewhere
And MySQL is a super-mature codebase
drob518
|root
|parent
[-]
Sounds like many DBs do some level of compilation for complex queries. I suspect this is because SQL has primitives that actually compute things (e.g. aggregations, sorts, etc.). But find does basically none of that. Find is completely IO-bound.
hxtk
|root
|parent
[-]
SQLite talks about the reasons for each variation here: https://sqlite.org/whybytecode.html
tasty_freeze
|next
|previous
[-]
nasretdinov
|root
|parent
|next
[-]
loeg
|root
|parent
[-]
NFS has readdirplus, but I don't think it ever made its way into Linux/POSIX. (Some filesystems could efficiently return dirents + stat information.)
burnt-resistor
|previous
[-]
For archiving, I also wrote a parallel walker and file hasher that only does one pass of data and stores results to a sqlite database. It's basically poor-man's IDS and bitrot detection.