• Will @_brickner · 9 months ago

    wrote a paper: it lets you *train* in 1.58b! could use 97% less energy, 90% less weight memory. leads to a new model format which can store a 175B model in ~20mb. also, no backprop!

• Will @_brickner · 9 months ago

    what about bitnet? bitnet does inference in 1.58b, but training uses precision weights. basically they clamp weights to ternary {-1,0,1} in forward pass, and pretend they didn’t in backward pass.
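The clamp-then-pretend trick described here is the straight-through estimator. A minimal sketch in plain NumPy, not BitNet's actual code: the absmean scale and the toy shapes are assumptions, and a real layer would also rescale activations.

```python
import numpy as np

def ternarize(w, eps=1e-8):
    # Clamp latent weights to {-1, 0, +1} using an absmean scale,
    # in the spirit of BitNet b1.58's quantizer.
    gamma = np.mean(np.abs(w)) + eps
    return np.clip(np.round(w / gamma), -1.0, 1.0)

def forward(x, w_fp):
    # The forward pass sees only the ternary weights.
    return x @ ternarize(w_fp)

def grad_w_ste(x, grad_out):
    # The backward pass "pretends we didn't" quantize: the gradient is
    # computed as if the quantizer were the identity, and flows
    # straight through into the full-precision latent weights.
    return x.T @ grad_out

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))      # latent full-precision weights
x = rng.normal(size=(2, 4))

y = forward(x, w)
w -= 0.1 * grad_w_ste(x, np.ones_like(y))  # SGD updates the latent copy
```

This is exactly the tweet's complaint: the full-precision `w` has to be kept around for the whole of training, and only afterwards can it be dropped in favour of the ternary copy.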

• gfodor.id @gfodor · 9 months ago

    @_brickner Tried running the pdf, says it’s not an executable

• Taelin @VictorTaelin · 9 months ago

    @_brickner does this mean *every* part of the architecture could be implemented without floats? like we could train and infer a model on a chip without FP arithmetic at all?

• Eric Hartford @QuixiAI · 9 months ago

    @_brickner Can you please link the weights?

• Zy @ZyMazza · 9 months ago

    @_brickner Seems big if true

• Dimitri von Rütte @dvruette · 9 months ago

    @_brickner Are you planning on releasing any code with this?

• Mark Schmidt 🌐 @MarkSchmidty · 9 months ago

    @_brickner Drop code. GitHub doesn't have reviewers.

• shaun @rebelcrayon · 9 months ago

    Hello. I am Reviewer #2, destroyer of dreams. My assessment is that the presented work cannot yet be taken seriously. The best that can be said is that if the claims are true, then their current presentation does them a great disservice. It is likely that the author is not yet well-trained enough to understand what a rigorous demonstration of new techniques entails. This is not a question of gatekeeping but rather one of coherence and verifiability.

    The paper is too short to prove the striking claims being made about memory and energy. The given experimental results appear to be from a toy problem (an MLP applied to MNIST), with no implementation code available for inspection. It is not made clear how one is meant to compute the gradient sign without backpropagation. Nor is it clear how one can compute efficiently with a model that must be reconstructed from a random seed and perturbations at each step.

    The estimated memory footprint is announced as being made “with great hubris”, and the argument for correctness appears to be a naked claim that the ideas are “a priori” correct. This is not adequate: results that sound too good to be true are not to be regarded as true “a priori”, without argument or implementation. Indeed, this appears more in alignment with the style of an amateur attempted proof of the Riemann hypothesis than with legitimate scientific exposition. Other language used throughout is nonstandard or otherwise too fluid to be meaningful. I recommend the manuscript be rejected.
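For what it's worth, the "random seed and perturbations at each step" objection sketches out to something like zeroth-order / evolution-strategies training: each step draws a direction from a deterministic PRNG and keeps only the sign of a two-point loss probe, so the whole model compresses to a seed plus roughly one bit per step. This is one reading of the thread, not the paper's actual algorithm; `loss`, `train`, and `reconstruct` are hypothetical names and the quadratic loss is a stand-in.

```python
import numpy as np

def loss(w):
    # Toy quadratic objective standing in for a real training loss.
    return np.sum((w - 1.0) ** 2)

def train(n_params, steps, lr=0.05, master_seed=42):
    # Each step draws a random direction from a deterministic seed and
    # keeps only the sign of a two-point loss probe -- no backprop.
    w = np.zeros(n_params)
    signs = []
    for t in range(steps):
        d = np.random.default_rng((master_seed, t)).standard_normal(n_params)
        s = -np.sign(loss(w + lr * d) - loss(w - lr * d))
        w += lr * s * d
        signs.append(int(s))
    return w, signs

def reconstruct(n_params, signs, lr=0.05, master_seed=42):
    # Replay: the "model file" is just (master_seed, one sign per step).
    w = np.zeros(n_params)
    for t, s in enumerate(signs):
        d = np.random.default_rng((master_seed, t)).standard_normal(n_params)
        w += lr * s * d
    return w

w, signs = train(n_params=8, steps=200)
w2 = reconstruct(8, signs)   # exact replay from the seed and signs
```

The reviewer's efficiency complaint is visible even in this toy: training costs two extra loss evaluations per step, and recovering the weights means replaying every step of training.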

• anarki @basedanarki · 9 months ago

    @_brickner fact check please if not busy @teortaxesTex i would amp and the thread is funny

• sasuke⚡420 @sasuke___420 · 9 months ago

    @_brickner it's impressive that you're storing ~1k bits per bit there!
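The arithmetic behind the quip, assuming 1.58 bits per parameter for a ternary format and taking the original tweet's 175B-parameter / ~20 MB figures at face value:

```python
params = 175e9
ternary_bits = params * 1.58         # bits to store the model naively
claimed_bytes = 20e6                 # the "~20mb" claim
claimed_bits = claimed_bytes * 8
ratio = ternary_bits / claimed_bits  # model bits per stored bit
```

The ratio works out to roughly 1.7k model bits per stored bit, the order of magnitude sasuke is gesturing at.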

• Allen @zeroxBigBoss · 9 months ago

    @_brickner What can be unburdened by what has been.

• Thien Tran @gaunernst · 9 months ago

    @_brickner Do you have an implementation of this anywhere? Would love to try this with LLM training.

• HDP @HDPbilly · 9 months ago

    @_brickner Finna get out the hood with this one

• Thread Reader App @threadreaderapp · 9 months ago

    @_brickner Your thread is everybody's favorite! #TopUnroll threadreaderapp.com/thread/1871348… 🙏🏼@borsali24 for 🥇unroll

• Victor @victor_explore · 9 months ago

    @_brickner What do you mean by no backpropagation?

• TwStalker is not affiliated with X™. All Rights Reserved. 2024 www.instalker.org