Hacker News
Making a vintage LLM from scratch
mg794613
I appreciate the honesty, but now there's no journey, and that's what I'm interested in. I can ask an LLM myself.
abetusk
There's a lot of pre-processing, experimentation and validation that went into this project. The training data collection and sanitization alone is a big undertaking.
As for the blog post itself, from the article:
> Note: This blog post is 100% written by me. No AI has been used whatsoever.
Put another way: you can ask the LLM to do this project yourself? Please do, and share your prompt. I'd like to see it.
tancop
I'm pretty sure it's real text in Welsh. There might be typos from OCR, but yeah, that's what the language really looks like. I don't speak it, but it's easy to recognize.
throw310822
"It will be easy for the knowledgeable to fix the few errors that remain [in the text]." ("Bydd yn rwydd iawn i'r cyfarwydd ddiwygio'r ychydig.")
Which is exactly what the OP is doing.
dennysora-main
I've spent a ton of time reading up on math, ML, and DL through books, open courses, and papers, while also studying all the major open-source LLM architectures.
Since I only have one DGX Spark machine to run experiments, I can't train a massive LLM from the get-go. Instead, I'm experimenting with an auto-scaling parameter mechanism, which has led me to create a pretty unconventional and fun architecture!
Why go through all this effort when modern LLMs can basically write simple LLMs themselves, and I clearly can't out-compute the big tech giants?
Honestly, it's because I'm obsessed with the core mechanics of LLMs. I want to build something exclusively for myself and hopefully discover some entirely new mechanisms along the way.
Just keeping a record and sharing my progress. Having fun with it is truly the biggest reward!
I'll share it when I get a chance!
skerit
And anyway, I think the most important thing is dataset quality. Dumping in whatever dataset you find on Hugging Face is a recipe for mediocrity, so I'm also spending a lot of time on that.
cyberge99
Thanks for the writeup. A more granular follow-up would be cool too.
croqaz
Do you mind expanding on this? More granular in what way? What would you like to know that is missing from the post?