A step by step backpropagation example (2015) (mattmazur.com)
129 points by bryan0 on Nov 5, 2022 | 14 comments


I recommend watching Andrej Karpathy's recent videos on backprop and neural nets. They make it really simple and intuitive.

The spelled-out intro to neural networks and backpropagation: https://www.youtube.com/watch?v=VMj-3S1tku0

Becoming a backprop ninja: https://www.youtube.com/watch?v=q8SA3rM6ckI


Agreed, I learned it from his CS231n visual recognition YouTube lectures; they make it super simple.


I was just studying for a test where I had to do this. There are a bunch of articles out there on how to do it, but for my money this is the best calculation-focused one: https://www.anotsorandomwalk.com/backpropagation-example-wit...

And this is the best explanation-focused one: https://alexander-schiendorfer.github.io/2020/02/24/a-worked...


Those Python examples at the end are both simple and great, thanks.


The second one is great, I use it in class with my students.


To me, there's no better description of backpropagation than this 2016 post by Sanjeev Arora et al. [1].

[1] https://www.offconvex.org/2016/12/20/backprop/


As a noob on this subject, I have found the following two books very helpful for understanding ANNs:

1) Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern Recognition by Sandhya Samarasinghe. - A very nice, intuitive and comprehensible book with lots of illustrations.

2) Practical Neural Network Recipes in C++ by Timothy Masters - An old classic with step-by-step code you can follow.


The second link in TFA for the neural network visualization leads to a Heroku app 404 page. Is this a victim of Salesforce's recent policy changes taking away free-tier apps?

http://www.emergentmind.com/neural-network

I'm hopeful it's only a typo. The level of detail and thoroughness in this article is remarkable.


Turns out our brains are pretty good at math.


There's no conclusive evidence that brains do backpropagation.


Backprop is a different story, but I believe brains kind of do dot products.


The math going on in the brain is very different from all this stuff. Brains are dynamical systems.

Stochastic gradient descent is trivial compared to what the brain is doing.

Plus, there are thousands of different types of neurons, with very distinct behaviors.


There's an easier way of explaining this.

Consider this simple situation in a network that uses the identity function as the activation function for every neuron:

        input   output
      0.2 o─────────o  = (0.2 * 0.2) = 0.04
           w1 = 0.2    
Now let's say the expected output was 0.06 instead of 0.04. What change would you make to w1? The answer is that you would make w1 = 0.3, because (0.2 * 0.3) = 0.06.
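In Python, this single-weight case can be solved directly; a minimal sketch (the variable names are just illustrative):

    # Single weight, identity activation: output = w1 * input.
    x, target = 0.2, 0.06
    w1 = 0.2
    print(w1 * x)      # roughly 0.04, too low
    w1 = target / x    # solve w1 * x = target directly
    print(w1, w1 * x)  # roughly 0.3 and 0.06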

Now consider a slightly more complicated situation:

        input    hidden      output
      0.2 o─────────o─────────o  = (0.2 * 0.2 * 0.4) = 0.016
           w1 = 0.2   w2 = 0.4
What changes would you make to w1 and w2 to get 0.06 as a result? You have multiple options here...

- w1 = 0.3 and w2 = 1.0

- w1 = 1.0 and w2 = 0.3

- w1 = 0.5 and w2 = 0.6

- w1 = 0.6 and w2 = 0.5

- many others

As you can see, you have multiple ways of arriving at the same result. How do you go about finding these numbers?

Let's say you try to find w1 and w2 via brute force, that is, trying every number until you get the right result. How many attempts would you need? A lot, considering you have to try different floating-point numbers and there are a lot of possible values.
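Here's a rough sketch of that brute force in Python, checking weight pairs on a coarse grid between 0 and 1 (the grid resolution and the tolerance are arbitrary choices here):

    x, target = 0.2, 0.06
    attempts = 0
    solutions = []
    for i in range(101):                  # w1 from 0.00 to 1.00 in steps of 0.01
        for j in range(101):              # w2 from 0.00 to 1.00 in steps of 0.01
            w1, w2 = i / 100, j / 100
            attempts += 1
            if abs(w1 * w2 * x - target) < 1e-9:
                solutions.append((w1, w2))
    print(attempts, solutions)            # 10201 attempts for just this coarse grid

Even at this coarse resolution that's over ten thousand attempts for two weights, and the count multiplies with every weight you add.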

And for a slightly larger network (that's still tiny), it would simply take too many attempts:

        input    hidden      output
          o────────┬o────────┬o
          o────────┼o────────┼o
          o────────┼o────────┼o
          o────────┼o────────┼o
          o────────┴o────────┴o
So what can you do to do better than brute force?

You know so far:

- how far you are from the result (the difference between the output and expected value)

- the sign of that difference

Therefore you know whether you have to go higher or lower. Now you can create a loop where you make adjustments to the weights so that you go higher or lower depending on what your difference was... great.

It's a little bit like playing golf. You know how hard you have to hit the ball and in which direction. And as you get closer to the hole, you start hitting softer.
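A sketch of that loop in Python for the single-weight case (this is just gradient descent on the squared error; the 0.5 step size and the iteration count are arbitrary choices here):

    x, target = 0.2, 0.06
    w1 = 0.2
    for step in range(1000):
        output = w1 * x
        error = target - output    # sign: which way to go; size: how hard to hit
        w1 += 0.5 * error * x      # nudge w1; the nudges shrink as the error shrinks
    print(w1, w1 * x)              # w1 ends up close to 0.3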

But how do you do this when you have multiple layers? Well, you have to compute how much each node contributed to the difference from the expected result, using the values of the weights. Once you do this, you can adjust the weights up or down depending on where you have to go. And you do this for each layer, propagating the blame for the difference in the result backwards ("backpropagation").

This would be a little bit like playing pool instead of golf, where one ball hits another and then another.

The actual situation is more complicated since you have multiple inputs and expected outputs and you have to work with the same weights for all of those.
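For the two-weight chain above, "propagating the blame backwards" is the chain rule: output = w2 * (w1 * x), so w2's share of the blame scales with the hidden value, and w1's share is passed back through w2. A minimal sketch in Python (identity activations, squared error, and an arbitrary 0.5 step size):

    x, target = 0.2, 0.06
    w1, w2 = 0.2, 0.4
    for step in range(5000):
        h = w1 * x                  # forward pass
        output = w2 * h
        error = target - output
        grad_w2 = -error * h        # blame assigned to w2
        grad_w1 = -error * w2 * x   # blame passed back through w2 to w1
        w1 -= 0.5 * grad_w1
        w2 -= 0.5 * grad_w2
    print(w1, w2, w1 * w2 * x)      # the output ends up close to 0.06

With these starting values the loop happens to land near w1 ≈ 0.5 and w2 ≈ 0.6, one of the valid pairs listed above.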


(2015)



