Hacker News
CS336: Language Modeling from Scratch
164 points by kristianpaul
17 comments
meken
I have fond memories of cs224d [1], taught by richardsocher. It’s a bit dated at this point, since it was created in the pre-transformer era, but it was a very cool introduction to applying deep learning to NLP at the time.
skerit
> GPU compute for self-study
Those suggestions they make for a B200 start at $4.99 an hour.
Is that really required for starting out? I've been tinkering with my own from-scratch LLM, but in the early phases I don't need anything more than a 4090 on Vast.ai.
marcelroed
TA here. Definitely not! In fact we explicitly added sections in the first assignment to allow for scaling down to even local compute (M-series GPUs). For assignment 2 there are a few regions that require Triton support for your GPU, but everything can be adapted for much cheaper GPUs.
We were lucky enough to get Blackwell GPUs for Stanford students this year, which is why the writeups are written mostly around them.
sonabinu
I brought a group together to do this class using the YouTube videos and course materials available online. It is challenging but rewarding. We tackled it at one lecture video per week. We started with over 30 learners, and by the last session we were down to 8.
airstrike
I wonder whether people prefer to learn this on their own, or whether building a community around open learning is something others are interested in.
storus
Thanks for releasing this again! What changed this year compared to prior offerings?
dominotw
I recently started reading "build reasoning model from scratch", then realized that I'm not really interested in the building part and just want to understand the theory and practice behind it.
I want something like a casual, LessWrong-style explanation from the ground up.