Hacker News
Show HN: Create-LLM – Train your own LLM in 60 seconds
theaniketgiri
Didn’t go for massive models — more about making the whole setup process quick and reliable. You can actually train the nano one on CPU in a few minutes just to see it working.
theaniketgiri
Key differences:
nanoGPT:
- Minimal reference implementation (~300 lines)
- Educational code for understanding transformers
- Requires manual setup and configuration
- Great for learning the internals

create-llm:
- Production-ready scaffolding tool (like create-next-app)
- One command: npx create-llm → complete project ready
- Multiple templates (nano/tiny/small/base)
- Built-in validation (warns about overfitting, vocab mismatches)
- Includes tokenizer training, evaluation, deployment tools
- Auto-detects issues before you waste GPU time
Think of it as: nanoGPT is the reference, create-llm is the framework.
nanoGPT teaches you HOW it works. create-llm lets you BUILD with what you learned.
You can actually use nanoGPT's architecture in create-llm templates - they're complementary tools!
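A minimal sketch of the kind of pre-flight validation described above: warn when the tokenizer's vocabulary doesn't match the model's embedding table before any GPU time is spent. The function name and messages here are illustrative, not create-llm's actual API.

```python
def check_vocab(tokenizer_vocab_size: int, model_vocab_size: int) -> list[str]:
    """Return a list of warnings; an empty list means the sizes agree.

    Hypothetical sketch of a pre-training sanity check -- not the real
    create-llm implementation.
    """
    warnings = []
    if tokenizer_vocab_size > model_vocab_size:
        warnings.append(
            f"tokenizer emits {tokenizer_vocab_size} token ids but the model's "
            f"embedding table only has {model_vocab_size} rows: training will crash"
        )
    elif tokenizer_vocab_size < model_vocab_size:
        warnings.append(
            f"model has {model_vocab_size - tokenizer_vocab_size} embedding rows "
            "no token can ever reach: wasted parameters"
        )
    return warnings

print(check_vocab(50257, 50257))  # matching sizes: no warnings
print(check_vocab(50257, 32000))  # mismatch caught before any GPU time is spent
```

Catching this at scaffold time is the whole point: an index-out-of-range deep in a training loop is far more expensive to debug than a one-line warning up front.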
3abiton
Karpathy clearly said that it wasn't vibe coded. Apparently it was more time-consuming to fix GPT's bugs than to write it himself.
Grimblewald
As a side note, without looking it up, on your device, what is the process for typing an emdash?
freakynit
LLMs are just the next evolution of tools that assist you with coding tasks... similar to w3schools => blogs => Stack Overflow, and now => LLMs.
There's absolutely nothing wrong with using them. The problem is people who use them without reviewing their outputs.
darepublic
> I wanted to understand how these things work by building one myself.
Followed directly by this:
> What if training an LLM was as easy as npx create-next-app?
The second thought seems to be the opposite of the first: what if the entirety of training an LLM were abstracted behind a single command?
theaniketgiri
When I started, I wanted to understand LLMs deeply. But I hit a wall: tutorials were either "hello world" toys or "here's 500 lines of setup before you start."
What I needed was: "give me working code quickly, THEN let me modify and learn."
That's what create-llm does. It scaffolds the boilerplate (like create-next-app), so you can spend time learning the interesting parts:
- Why does vocab size matter? (adjust config, see results)
- What causes overfitting? (train on small data, see it happen)
- How do different architectures perform? (swap templates, compare)

It's "easy to start, deep to master." The abstraction gets you running in 60 seconds, then you dig into the code.
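The "train on small data, see overfitting happen" experiment can be boiled down to a toy sketch: a "model" with enough capacity to memorize its tiny training set scores perfectly there and falls apart on held-out data. Pure Python, no framework; everything here is illustrative, not create-llm code.

```python
def true_next(x):
    """The underlying next-token pattern we'd like a model to learn."""
    return (x + 1) % 10

train = [(x, true_next(x)) for x in range(5)]   # tiny dataset: only 5 examples
val = [(x, true_next(x)) for x in range(10)]    # held-out set includes unseen tokens

# A high-capacity "model" trained on too little data: it just memorizes
# every training pair instead of learning the pattern.
memorized = dict(train)

def accuracy(model, pairs):
    # Unseen inputs get a sentinel prediction that never matches.
    return sum(model.get(x, -1) == y for x, y in pairs) / len(pairs)

print(accuracy(memorized, train))  # 1.0 -- perfect on what it memorized
print(accuracy(memorized, val))    # 0.5 -- falls apart on anything unseen
```

The gap between the two numbers is the overfitting signal; in a real training run the same shape shows up as training loss dropping while validation loss stalls or climbs.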
joshribakoff
Or the meaningful commit message of “.”
And the commit editing 1,000s of lines of python code mislabeled as a docs change?
theaniketgiri
Docs / Markdown: AI handled repetitive stuff like READMEs and summaries.
Core logic / Python: fully written by me.
Commit messages: some minimal ones just for quick iterations — the real work is in the code.
AI helped with boilerplate so I could ship faster; all functionality is hand-crafted.
joshribakoff
The "meaningful commit messages" are, again, a single period as the message for a single commit covering the entire Python portion of the codebase.
My question was rhetorical. Whether the AI did it or a human did, it burns credibility to refer to things that don’t exist (like “meaningful commit messages”)
teruakohatu
Well done to the author for shipping code. I look forward to trying it out.
theaniketgiri
And yeah, the commit history is messy - I was learning and shipping fast. Not perfect, but the tool works and people are using it.
Let me know if you have any questions when you try it!
Grimblewald
If it was their work your point would hold.
theaniketgiri
What AI did:
- Generated README templates (boilerplate markdown)
- Suggested commit messages (I didn't always edit them)
- Helped with documentation structure

What I wrote:
- All Python training logic (train.py, trainer.py, callbacks)
- All model architectures (gpt.py, tiny.py, small.py, etc.)
- Tokenizer integration
- Data pipeline
- CLI scaffolding (
biinjo
It's here, it's happening. Try the project; if you like it, that's great, and if you don't, then move on.
And if you don't intend to try it for whatever reason, that's fine as well, but don't be salty to the OP for sharing their passion project.
Grimblewald
Your Medium article, your *.md files, and most of your code ALL look LLM-generated, which isn't as much a problem in my book, but lying about it is a huge problem.
Grimblewald
The only time it would be faster to iterate with your scripts hard-coded into your TS files would be if an LLM is doing your iterating for you.
Why would anyone invest time and effort in a project where the author lies through their teeth about provenance? Why use your project that contributes, as it appears, nothing a LLM can't just give me? Why use this when I could just use an LLM to get the same directly in python without dicking around with npm?