Show HN: I built a toy TPU that can do inference and training on the XOR problem

tinytpu.com

103 points by evxxan 14 hours ago

We wanted to do something very challenging to prove to ourselves that we can do anything we put our minds to. The reasoning for why we chose to build a toy TPU specifically is fairly simple:

- Building a chip for ML workloads seemed cool
- There was no well-documented open source repo for an ML accelerator that performed both inference and training

None of us have real professional experience in hardware design, which, in a way, made the TPU even more appealing since we weren't able to estimate exactly how difficult it would be. As we worked on the initial stages of this project, we established a strict design philosophy: TO ALWAYS TRY THE HACKY WAY. This meant trying out the "dumb" ideas that came to our mind first BEFORE consulting external sources. This philosophy helped us make sure we weren't reverse engineering the TPU, but rather re-inventing it, which helped us derive many of the key mechanisms used in the TPU ourselves.

We also wanted to treat this project as an exercise in coding without relying on AI to write it for us, since we felt that our first instinct recently has been to reach for LLMs whenever we faced a slight struggle. We wanted to cultivate a certain style of thinking that we could take forward with us and use in any future endeavours to think through difficult problems.

Throughout this project we tried to learn as much as we could about the fundamentals of deep learning, hardware design, and creating algorithms. We found that the best way to learn about this stuff is to draw everything out and make that our first instinct. On tinytpu.com, you will see how our explanations were inspired by this philosophy.

Note that this is NOT a 1-to-1 replica of the TPU--it is our attempt at re-inventing a toy version of it ourselves.
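
For readers unfamiliar with the workload named in the title, here is a minimal software sketch of inference (forward pass) and training (backprop plus a gradient step) on XOR. It is not the project's hardware design. The 2-2-1 layer sizes, sigmoid activation, mean squared error loss, and every function name below are assumptions chosen for illustration.

```python
import math
import random

# The four XOR input pairs and their expected outputs.
XOR_INPUTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
XOR_TARGETS = [0.0, 1.0, 1.0, 0.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(seed=0, hidden=2):
    """Random weights in [-1, 1], zero biases (hypothetical 2-hidden-1 layout)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Inference: return hidden activations and the scalar output for one input."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["w1"], p["b1"])]
    y = sigmoid(sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"])
    return h, y


def loss(p):
    """Mean squared error over the four XOR examples."""
    n = len(XOR_INPUTS)
    return sum((forward(p, x)[1] - t) ** 2
               for x, t in zip(XOR_INPUTS, XOR_TARGETS)) / n


def backward(p):
    """Backprop: gradient of the MSE loss with respect to every parameter."""
    n = len(XOR_INPUTS)
    hidden = len(p["w2"])
    g = {"w1": [[0.0, 0.0] for _ in range(hidden)], "b1": [0.0] * hidden,
         "w2": [0.0] * hidden, "b2": 0.0}
    for x, t in zip(XOR_INPUTS, XOR_TARGETS):
        h, y = forward(p, x)
        # Chain rule through the loss and the output sigmoid.
        dz2 = (2.0 / n) * (y - t) * y * (1.0 - y)
        g["b2"] += dz2
        for j in range(hidden):
            g["w2"][j] += dz2 * h[j]
            # Chain rule through the output weight and the hidden sigmoid.
            dz1 = dz2 * p["w2"][j] * h[j] * (1.0 - h[j])
            g["b1"][j] += dz1
            for i in range(2):
                g["w1"][j][i] += dz1 * x[i]
    return g


def sgd_step(p, g, lr):
    """Training: one in-place gradient descent update."""
    for j in range(len(p["w2"])):
        p["w2"][j] -= lr * g["w2"][j]
        p["b1"][j] -= lr * g["b1"][j]
        for i in range(2):
            p["w1"][j][i] -= lr * g["w1"][j][i]
    p["b2"] -= lr * g["b2"]


def train(p, steps, lr):
    """Repeat full-batch gradient steps; convergence is not guaranteed."""
    for _ in range(steps):
        sgd_step(p, backward(p), lr)
    return loss(p)
```

Calling `train(init_params(), steps, lr)` repeats the update and returns the final loss. Whether a network this small actually learns XOR depends on the initial weights and the learning rate, so treat any particular run as an experiment rather than a guarantee.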

ganiszulfa 6 hours ago

Amazing project and amazing write-up. I especially like the animations. What's the end goal here? Putting these TPUs in consumers' hands or on edge devices?

jacquesm 12 hours ago

Sometimes it is the projects where you don't know that you really don't know what you are doing that are the most satisfying. Kudos, amazing work you have done.

  • evxxan 11 hours ago

    Thank you!

skybrian 12 hours ago

It's unclear to me what the end result is. Did you build real hardware or is it simulated somehow? If it's hardware, what kind and how did you make it?

  • jacquesm 12 hours ago

    Verilog spec by the looks of it. So you should be able to make it work on an FPGA or if you happen to have a chip fab in your garage you might want to make your own silicon ;) I'd go the FPGA route.

  • antognini 12 hours ago

    Based on the code in the repo, it looks like they designed the chip in Verilog and then ran it in a simulator. But if they have the Verilog code, in principle they could send it off to a fab and get real hardware back.

    • UncleOxidant 10 hours ago

      Next step would be to try it out in an FPGA.

  • zhainya 12 hours ago

    I feel like I missed a whole section somewhere. "Built a toy TPU". What does that mean? I have no idea what was actually "built" here.

    • evxxan 11 hours ago

      By "toy TPU", we mean that we simulated the forward pass + backprop on a minimal TPU-like accelerator.

  • evxxan 11 hours ago

    all in simulation :)

utopcell 6 hours ago

The Google team used Chisel instead of SystemVerilog. You could consider switching to that if it makes sense for your project.

  • FirmwareBurner 3 hours ago

    >The Google team used Chisel instead of SystemVerilog.

    Not sure blindly copying whatever Google is doing is always the right idea for small projects.

    They have unlimited ad money and some quirky hiring practices, so they can afford to have development practices that go against HW industry norms, just for shits and giggles, without worrying about the costs.

    • utopcell 2 hours ago

      Thank you, captain obvious.

UncleOxidant 10 hours ago

Have you tried it out in an FPGA?

  • evxxan 10 hours ago

    Not yet! But that's our next step.

    • utopcell 7 hours ago

      Tang Nano 20K. You can't find any cheaper FPGA board than this.

      • addaon 5 hours ago

        At a higher price point but with more capability, Digilent has a 20% sale on their FPGA boards this week. Some good options (Artix-7 and Spartan-7) within spitting distance of $100.