Hacker News
Show HN: CLI tool for detecting non-exact code duplication with embedding models
rkochanowski
It finds similar-looking code using embeddings, so it detects more than copy-paste clones or clones with minor changes. The trade-off is that similar code is often not a clone worth refactoring. Initial results need to be verified, but coding agents can do this quickly. Example prompts are available at https://slopo.dev
Additionally, similar code distant in the codebase is ranked higher to focus on less obvious duplication.
The results differ a lot depending on the codebase. On some codebases most of the detected duplicates are false positives, but the remaining ones are strong candidates for refactoring, or even bugs. On others it reveals much more real duplication.
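To make the idea above concrete, here is a minimal sketch of embedding-based ranking with a boost for pairs that are far apart in the codebase. It is not the tool's actual implementation: the function names, the `threshold` and `distance_weight` values, the directory-step distance metric, and the additive scoring formula are all assumptions made for illustration, and the embeddings are expected to come from whatever model you use.

```python
import math
from itertools import combinations
from pathlib import PurePosixPath


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def path_distance(p1, p2):
    """Directory steps between two files: up from p1's folder, down to p2's.

    Hypothetical metric, chosen only to illustrate "distant in the codebase".
    """
    a = PurePosixPath(p1).parent.parts
    b = PurePosixPath(p2).parent.parts
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def rank_candidates(snippets, threshold=0.9, distance_weight=0.05):
    """Rank pairs of (path, embedding) snippets as duplication candidates.

    Pairs below the similarity threshold are dropped; the rest are scored as
    similarity plus a bonus proportional to how far apart the files are.
    """
    results = []
    for (p1, e1), (p2, e2) in combinations(snippets, 2):
        sim = cosine_similarity(e1, e2)
        if sim < threshold:
            continue
        score = sim + distance_weight * path_distance(p1, p2)
        results.append((score, p1, p2))
    results.sort(reverse=True)  # highest score first
    return results
```

With toy vectors, a near-identical pair in the same directory ends up below a slightly less similar pair that lives in unrelated directories, which matches the "less obvious duplication first" ordering. The candidates still need manual or agent review, since high similarity alone does not mean the code should be merged.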
forhadahmed
philajan
rkochanowski
If you are interested in data, you can check my article. The analysis was done with an earlier version of this tool, which excluded exact-copy duplicates. https://rkochanowski.com/article/analysis-code-duplication/
murats
hdz
SpyCoder77
rufius
If so - maintainability, testability. This is old software engineering best practice at this point.
You shouldn’t hyper-optimize for deduplication, but it’s usually worth considering. It also means fewer places to fix issues or make improvements.