> removed unflatten_index and am now rewriting the math ops with stride-aware iteration
> this makes ops work on views (slice/transpose) without copying and replaces slow div/mod loops with cheap stride math
> future ops (broadcasting, slicing, batching) will work without extra steps. https://t.co/FWGzebnioR
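For context, a minimal sketch of what stride-aware iteration can look like (function and variable names here are illustrative assumptions, not the repo's actual code): instead of unflattening a flat index with div/mod at every element, keep a multi-index plus a running offset, bump the offset by each dimension's stride, and undo it when a dimension wraps. The same loop then handles non-contiguous views like slices and transposes.

```cpp
#include <cstddef>
#include <vector>

// Visit every element of a (possibly non-contiguous) view by walking a
// multi-index and a running data offset; no div/mod per element.
template <typename F>
void for_each_strided(const std::vector<size_t>& shape,
                      const std::vector<long>& strides,   // in elements, per dimension
                      long base_offset,
                      F visit) {
    for (size_t s : shape) if (s == 0) return;             // empty tensor: nothing to do
    const size_t ndim = shape.size();
    std::vector<size_t> idx(ndim, 0);
    long offset = base_offset;
    while (true) {
        visit(offset);                                      // e.g. out[offset] = f(a[offset])
        if (ndim == 0) return;                              // scalar view: single element
        size_t d = ndim;
        while (d-- > 0) {
            offset += strides[d];                           // step the current dimension
            if (++idx[d] < shape[d]) break;                 // no carry: keep visiting
            offset -= static_cast<long>(shape[d]) * strides[d];  // wrap: undo this dim's steps
            idx[d] = 0;
            if (d == 0) return;                             // outermost dim wrapped: done
        }
    }
}
```

Calling it with shape {4, 3} and strides {1, 4}, for example, walks the transpose of a contiguous 3x4 buffer without any copy.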
> implemented 2D matmul with correct shape logic
> I also have to rewrite the math ops to be stride-aware so that tensors created by views (slice, transpose) can share the same memory without copying. If ops only use flat indexing, those views break on non-contiguous tensors. https://t.co/Pavkh5Jho2
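A rough sketch of how those two pieces can fit together, with assumed names (View2D, matmul2d) rather than the actual repo types: a naive 2D matmul that validates the inner dimensions and reads both operands through row/column strides, so a transposed view multiplies correctly without materializing a copy.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// A 2D view described by a base pointer plus row/column strides (in elements).
struct View2D {
    const float* data;
    size_t rows, cols;
    long row_stride, col_stride;
    float at(size_t i, size_t j) const {
        return data[static_cast<long>(i) * row_stride + static_cast<long>(j) * col_stride];
    }
};

// Naive matmul with the shape check: (M x K) * (K x N) -> contiguous (M x N).
std::vector<float> matmul2d(const View2D& a, const View2D& b) {
    if (a.cols != b.rows)
        throw std::invalid_argument("matmul: inner dimensions must match");
    std::vector<float> out(a.rows * b.cols, 0.0f);
    for (size_t i = 0; i < a.rows; ++i)
        for (size_t k = 0; k < a.cols; ++k) {
            float aik = a.at(i, k);                      // hoisted; reused across the inner loop
            for (size_t j = 0; j < b.cols; ++j)          // i-k-j order streams through a row of b and out
                out[i * b.cols + j] += aik * b.at(k, j);
        }
    return out;
}
```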
> added add, sub, mul, div for elementwise tensor math with full broadcasting support, so tensors of different but compatible shapes interact automatically.
> added shared memory for views/reshape, letting tensors share data without copying.
> framework can do basic math now. https://t.co/oj0VuMlvMp
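As a sketch of how NumPy-style broadcasting can be wired up (assumed helper names, and deliberately the simple div/mod indexing for clarity rather than the stride walk from the newer posts above): shapes are right-aligned, each dimension must either match or be 1, and a size-1 dimension is repeated by giving it stride 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Right-align the two shapes and compute the broadcast result shape.
std::vector<size_t> broadcast_shape(std::vector<size_t> a, std::vector<size_t> b) {
    size_t n = std::max(a.size(), b.size());
    a.insert(a.begin(), n - a.size(), 1);                // left-pad with 1s
    b.insert(b.begin(), n - b.size(), 1);
    std::vector<size_t> out(n);
    for (size_t d = 0; d < n; ++d) {
        if (a[d] != b[d] && a[d] != 1 && b[d] != 1)
            throw std::invalid_argument("shapes are not broadcastable");
        out[d] = std::max(a[d], b[d]);
    }
    return out;
}

// Apply op elementwise over the broadcast result of two contiguous tensors.
std::vector<float> elementwise(const std::vector<float>& a, std::vector<size_t> ashape,
                               const std::vector<float>& b, std::vector<size_t> bshape,
                               const std::function<float(float, float)>& op) {
    std::vector<size_t> oshape = broadcast_shape(ashape, bshape);
    size_t n = oshape.size();
    ashape.insert(ashape.begin(), n - ashape.size(), 1);
    bshape.insert(bshape.begin(), n - bshape.size(), 1);
    // contiguous strides, with 0 wherever a dimension is broadcast (size 1)
    std::vector<size_t> astr(n), bstr(n);
    size_t ar = 1, br = 1, total = 1;
    for (size_t d = n; d-- > 0;) {
        astr[d] = (ashape[d] == 1) ? 0 : ar;  ar *= ashape[d];
        bstr[d] = (bshape[d] == 1) ? 0 : br;  br *= bshape[d];
    }
    for (size_t s : oshape) total *= s;
    std::vector<float> out(total);
    for (size_t flat = 0; flat < total; ++flat) {
        size_t rem = flat, ai = 0, bi = 0;
        for (size_t d = n; d-- > 0;) {                   // unflatten the output index
            size_t idx = rem % oshape[d]; rem /= oshape[d];
            ai += idx * astr[d];
            bi += idx * bstr[d];
        }
        out[flat] = op(a[ai], b[bi]);
    }
    return out;
}
```

For example, `elementwise(a, {2, 3}, b, {3}, std::plus<float>())` adds the length-3 vector b to each row of the 2x3 tensor a.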
> added metadata (dtype, device, owns_data)
> added reshape to create views without copying memory.
> improved printing to show dtype and device.
> added flat and multidimensional indexing for direct + stride-based element access. https://t.co/hW8zxyUe7i
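A small sketch of the stride bookkeeping behind that indexing (names are illustrative): row-major strides come from a right-to-left running product of the shape, and a multi-index maps to a flat offset as the dot product of index and strides.

```cpp
#include <cstddef>
#include <vector>

// Row-major strides: the last dimension has stride 1, each earlier one is the
// product of all dimension sizes to its right.
std::vector<size_t> row_major_strides(const std::vector<size_t>& shape) {
    std::vector<size_t> strides(shape.size(), 1);
    for (size_t d = shape.size(); d-- > 1;)
        strides[d - 1] = strides[d] * shape[d];
    return strides;
}

// Multi-index (i0, i1, ..., ik) -> flat offset = sum_d i_d * stride_d.
size_t flat_offset(const std::vector<size_t>& index, const std::vector<size_t>& strides) {
    size_t off = 0;
    for (size_t d = 0; d < index.size(); ++d)
        off += index[d] * strides[d];
    return off;
}
```

For shape {2, 3, 4} this gives strides {12, 4, 1}, so element (1, 2, 3) lives at flat offset 1*12 + 2*4 + 3*1 = 23.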
Implemented the first step and built the core skeleton of a tensor:
> checks that the data matches the shape
> computes strides
> keeps track of grad storage (0 for now)
> doesn't do any math yet
> prints the tensor in a readable format. https://t.co/gmwHtqs1Zt
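Roughly what that skeleton amounts to, as a hedged sketch with assumed member names rather than the actual class: validate the flat data against the shape, derive row-major strides, allocate zeroed grad storage, and print; no math ops yet.

```cpp
#include <iostream>
#include <stdexcept>
#include <vector>

struct Tensor {
    std::vector<float> data;
    std::vector<size_t> shape;
    std::vector<size_t> strides;
    std::vector<float> grad;                              // same size as data, all zeros for now

    Tensor(std::vector<float> d, std::vector<size_t> s)
        : data(std::move(d)), shape(std::move(s)) {
        size_t expected = 1;
        for (size_t dim : shape) expected *= dim;
        if (data.size() != expected)                      // valid data vs shape check
            throw std::invalid_argument("data size does not match shape");
        strides.assign(shape.size(), 1);                  // row-major strides
        for (size_t k = shape.size(); k-- > 1;)
            strides[k - 1] = strides[k] * shape[k];
        grad.assign(data.size(), 0.0f);                   // grad storage, zeroed
    }

    void print() const {                                  // readable one-line representation
        std::cout << "Tensor(shape=[";
        for (size_t i = 0; i < shape.size(); ++i)
            std::cout << shape[i] << (i + 1 < shape.size() ? ", " : "");
        std::cout << "], data=[";
        for (size_t i = 0; i < data.size(); ++i)
            std::cout << data[i] << (i + 1 < data.size() ? ", " : "");
        std::cout << "])\n";
    }
};
```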
used a different approach and it worked.
the output for the equation gives correct forward and grad results.
tried it on 2 equations and both gave correct output.
the code can be made way better, but I'm still learning and it's fine, made a working micrograd in C++ https://t.co/xDRHmfrgZL
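For readers following along, here is a compressed illustrative sketch of a micrograd-style scalar autograd in C++ (assumed names and structure, not the code in the screenshot): each Value keeps its data, grad, parents, and a closure that pushes gradient to those parents; backward() topologically sorts the graph and replays the closures in reverse.

```cpp
#include <functional>
#include <iostream>
#include <memory>
#include <unordered_set>
#include <vector>

struct Value;
using ValuePtr = std::shared_ptr<Value>;

struct Value {
    double data = 0.0, grad = 0.0;
    std::vector<ValuePtr> parents;
    std::function<void()> backward_fn = [] {};            // leaves do nothing
    explicit Value(double d) : data(d) {}
};

ValuePtr val(double d) { return std::make_shared<Value>(d); }

ValuePtr add(const ValuePtr& a, const ValuePtr& b) {
    auto out = val(a->data + b->data);
    out->parents = {a, b};
    out->backward_fn = [a, b, outw = std::weak_ptr<Value>(out)] {
        auto o = outw.lock();                              // weak_ptr avoids a shared_ptr cycle
        a->grad += o->grad;
        b->grad += o->grad;
    };
    return out;
}

ValuePtr mul(const ValuePtr& a, const ValuePtr& b) {
    auto out = val(a->data * b->data);
    out->parents = {a, b};
    out->backward_fn = [a, b, outw = std::weak_ptr<Value>(out)] {
        auto o = outw.lock();
        a->grad += b->data * o->grad;
        b->grad += a->data * o->grad;
    };
    return out;
}

// Reverse-mode pass: topo-sort the graph, then run each node's backward_fn
// from the output back to the leaves.
void backward(const ValuePtr& root) {
    std::vector<ValuePtr> topo;
    std::unordered_set<Value*> seen;
    std::function<void(const ValuePtr&)> build = [&](const ValuePtr& v) {
        if (!seen.insert(v.get()).second) return;
        for (const auto& p : v->parents) build(p);
        topo.push_back(v);
    };
    build(root);
    root->grad = 1.0;
    for (auto it = topo.rbegin(); it != topo.rend(); ++it) (*it)->backward_fn();
}

int main() {
    // z = (x + y) * x with x = 2, y = 3  =>  z = 10, dz/dx = 2x + y = 7, dz/dy = x = 2
    auto x = val(2.0), y = val(3.0);
    auto z = mul(add(x, y), x);
    backward(z);
    std::cout << z->data << " " << x->grad << " " << y->grad << "\n";  // prints: 10 7 2
}
```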
I tried to convert Andrej Karpathy's micrograd Python code to C++
I would say 95% is working but there is still a small issue where it's giving the wrong grad output, been on this for HOURS trying to fix https://t.co/8BoDRPsl7D
for hours I was thinking there was something wrong with my code and that's why it wasn't giving the gradient output, but my dumbass didn't even print grad, i wanna kms 😭
Introducing Pluto:
ML tool that lets you train models on any dataset in minutes right in your browser, with no coding required.
Just upload a dataset, pick a target column, and run multiple models in one go, then get results right away with plots.
more models, hyper-parameters coming soon.
🧵Thread on How Activation Functions Power Neural Networks:
Activation functions are the core of what makes neural networks powerful.
We'll break it down step by step:
why stacking only linear layers fails, how activation functions add non-linearity, how that changes what a network can learn, and more.
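One way to preview the first point (stated here as a standard identity, not content from the thread itself): two stacked affine layers with no activation collapse into a single affine layer,

$$W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2),$$

so depth alone adds nothing; inserting a non-linearity, as in $W_2\,\sigma(W_1 x + b_1) + b_2$, is what breaks the collapse and lets the network represent non-linear functions.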
both are very outdated; it can only manage very simple answers since it can't handle complex things, and GPT-2 has a max context window of 1024 tokens, so it would forget earlier convos and get messy.