Build a Large Language Model (From Scratch)
S**
So concise
This review may be premature because I’ve only made it through the first two chapters, but so far it is absolutely amazing. The language is perfect. So many concepts that I’ve struggled with for a while are laid out so clearly. I look forward to doing all the exercises and finishing this book, but I would just like to thank the author personally, because this is a game changer for my understanding of general ML and AI concepts I struggled with in the past.
A**R
Excellent book, with great code, a must read!
This is a very good book. I recommend doing the code exercises while reading. The author provides all the code, and it's easy to follow in notebooks to really see what is happening. You can modify the code easily and learn a lot. Imho this is a very good investment for anyone who wants to learn how LLMs work.
D**S
Excellent!
I'm still reading the book, and have completed coding everything in Chapter 2. So far, the approach of breaking down the concepts into fundamental parts and then showing how those parts are built into more complex implementations, which can then be better understood because of the author's presentation, is perfect for how I learn.
For the benefit of others with an NVIDIA GPU who are configuring CUDA:
1. Find the CUDA support level of your GPU. On Windows: NVIDIA Control Panel -> System Information (at the bottom) -> Components tab. What is listed there is the SUPPORT LEVEL of the installed driver software, not the actual CUDA software!
2. Install MS Visual Studio (2022). It is needed by the NVIDIA CUDA software.
3. Install the version of the NVIDIA CUDA software supported by both the info from step 1 AND PyTorch. For example, my HW supported CUDA up to 12.7, but PyTorch software support tops out at 12.4 (as of today, 1/13/2025), so I went with the 12.4 NVIDIA CUDA software.
4. During the driver custom install (not the default simplified install), deselect NVIDIA GeForce Experience. It caused errors for me.
5. Reboot after the NVIDIA CUDA software installation.
6. On the NVIDIA CUDA installation page there are deviceQuery and bandwidthTest exe's that will validate that the CUDA HW/SW interface is functioning.
7. Run the PyTorch installer. I use an Anaconda environment, so I ran the conda install command copied from the PyTorch installation web page (shown in the book) from a command line inside my conda target environment. Restart Anaconda afterwards. I use VS Code and restarted both when the install was completed.
8. The NVIDIA CUDA installation page states to install conda install cuda -c nvidia into the conda target environment. When the book says to run torch.cuda.is_available(), it should return True.
I don't consider this a defect of the book. There is already enough hand-holding by the author; imho some work still needs to be done by the reader!!
So far I am getting a great appreciation/comprehension of what is behind Large Language Models. Thank You!!
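For readers following the steps above, the final check in step 8 can be sketched as a small script. This is only an illustrative sketch, not code from the book: the function name cuda_report and the dictionary keys are hypothetical, while torch.cuda.is_available(), torch.version.cuda and torch.cuda.get_device_name() are existing PyTorch calls. It assumes nothing about which CUDA version is installed and falls back gracefully if PyTorch is missing.

```python
import importlib.util


def cuda_report():
    """Summarize whether PyTorch is installed and can see a CUDA GPU.

    The function name and dictionary keys are hypothetical, chosen
    for illustration only.
    """
    report = {
        "torch_installed": False,
        "cuda_available": False,
        "torch_cuda_version": None,
        "device_name": None,
    }
    # Avoid a hard failure if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    report["torch_cuda_version"] = torch.version.cuda
    # The check from step 8: should be True once driver, CUDA and PyTorch line up.
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_available comes back False while torch_cuda_version is set, a likely place to look is a mismatch between the driver support level from step 1 and the CUDA version chosen in step 3.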
I**E
Very Informative-- definite extra buy!
As an undergraduate in Intelligent Systems Engineering, I find this book amazing. It definitely had some good points not covered in classes!
W**N
I wish it was coloured printing
I appreciated the book for its thoroughness and attention to detail. However, I believe it would benefit from being printed in color, as many images on the O'Reilly website are more vibrant and clearer when viewed in color. Additionally, enhancing the resolution of some images would improve the overall experience. For these reasons, I would rate the book 4 out of 5. With these adjustments, I think it could easily earn a perfect score of 5 out of 5.
H**N
One of the best technical books I've ever purchased
I've bought tons of ML, DE, programming, cloud architecture books, etc. This book is absolutely fantastic, especially combined with the current YouTube series published by the author (March 2025). Sebastian's Packt books are also excellent, but I must say this book stands on its own. It is extremely well written and clear, and builds each component of the Transformer architecture piece by piece; it makes me feel like I can actually build an LLM on my own. At a minimum, this book will help you understand the Transformer architecture (attention mechanism, feed forward, layer norm, etc.) rather than importing models from Hugging Face and not really knowing what's going on in the background. If you are like me and are not satisfied with just building RAGs/LLM applications without understanding the model architecture, this book is for you! I'll keep buying from this author as long as the quality of his content is as good as this.
M**R
Excellent
This book shows step by step all the ingredients that are put together in order to build a GPT-2 model from scratch. All functions are explained explicitly in Python before the equivalent PyTorch functions are used. I really liked following the book to the end. There is also a discussion forum about the book on GitHub, where readers can ask questions, which are promptly answered by the author. That said, there remain many questions about WHY the method works, and why some steps are made. E.g. why use multi-head attention: to my understanding this completely scrambles the embedding vectors, and it is like a miracle that the method works so well. But there were page limits for the book, and going deeper into this kind of question would probably have doubled the size of the book.
S**G
Excellent book that teaches LLMs by building one
The best way to learn something is to build it for yourself, and that is exactly what this book does for LLMs. You can get explanations of how LLMs work from a lot of sites on the Internet. What this book does uniquely (as far as I know) is combine that information with a guide for you to implement it for yourself. If you finish the book and work through the code examples and exercises, you will have a solid and up-to-date understanding of how LLMs work under the hood.