I've read your slides from lecture 1b ("Deep neural network are our friends"). On the slide "Gradient are our friends", which explains arg min C(w, b), you set w0, b0 = 2, 2, giving C(w0, b0) = 68. That part is correct. But after that, I don't understand why the results of the expression sum(-2*(ŷ - y)*x) are 8, -40, -72. I think -8, 40, 72 are correct.
By the way, I implemented this simple network, but when I trained it for 100 epochs, the cost function did not converge. Here is my code:
and here is the result:
Epoch: 0 , cost: 68
Epoch: 10 , cost: 1.1268304493e+19
Epoch: 20 , cost: 3.00027905999e+36
Epoch: 30 , cost: 7.98849058743e+53
Epoch: 40 , cost: 2.12700154184e+71
Epoch: 50 , cost: 5.66331713039e+88
Epoch: 60 , cost: 1.50790492101e+106
Epoch: 70 , cost: 4.01492128811e+123
Epoch: 80 , cost: 1.06900592505e+141
Epoch: 90 , cost: 2.84631649237e+158
Epoch: 100 , cost: 7.57855254577e+175
Please explain this to me. Thanks in advance!
First: your gradient calculation is off. When you define the cost as (y - out)**2, the derivative with respect to w is -2*(y - out)*x, not -2*(out - y)*x, so it seems you just mixed up the sign there. The same issue applies to your gradient with respect to b.
Second: a diverging cost is usually a sign of a too-high learning rate. Try something lower; go down in steps of dividing by 10.
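To make both points concrete, here is a minimal sketch of the kind of training loop I mean. The (x, y) data points and the learning rate below are my own placeholders, not the values from the slides, so substitute yours; the point is the sign of the gradient and the size of the step:

```python
# Minimal single-neuron model: out = w*x + b, cost C(w, b) = sum((y - out)^2).
# The data points here are made up for illustration; use the pairs from the slides.

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # hypothetical targets (here y = 2x)

w, b = 2.0, 2.0        # initial parameters, as in the slides
lr = 0.01              # learning rate; if the cost diverges, divide by 10 and retry

for epoch in range(101):
    # forward pass and cost
    cost = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))

    # gradients by the chain rule:
    # dC/dw = sum(-2 * (y - out) * x),  dC/db = sum(-2 * (y - out))
    grad_w = sum(-2.0 * (y - (w * x + b)) * x for x, y in zip(xs, ys))
    grad_b = sum(-2.0 * (y - (w * x + b)) for x, y in zip(xs, ys))

    # gradient descent step
    w -= lr * grad_w
    b -= lr * grad_b

    if epoch % 10 == 0:
        print("Epoch:", epoch, ", cost:", cost)
```

With these placeholder numbers the cost decreases steadily at lr = 0.01, while something like lr = 0.1 blows up in exactly the way your log shows.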