CIS 421/521 - Neural Networks and Neural Language Models - 2020-12-01 - Shared screen with speaker view
What do the derivatives represent at the original nodes?
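(In backpropagation, the derivative stored at each node is the gradient of the final output with respect to that node's value. A minimal hand-worked sketch, not from the lecture, with a made-up two-node graph:)

```python
# Tiny computation graph: f(x, y) = (x + y) * y
# Forward pass computes and stores intermediate node values;
# backward pass propagates df/d(node) via the chain rule.
x, y = 2.0, 3.0
a = x + y          # intermediate node a
f = a * y          # output node

# Backward pass: derivative of f with respect to each node
df_df = 1.0
df_da = y * df_df                  # df/da = y = 3
df_dx = 1.0 * df_da                # da/dx = 1, so df/dx = 3
df_dy = 1.0 * df_da + a * df_df    # y feeds f twice: through a and directly
print(df_dx, df_dy)  # 3.0 8.0
```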
Wait, so we'll have to do both the forward pass and the backward pass together?
What if we have an activation function that is not differentiable at some point?
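(For reference: in practice a subgradient is chosen at the non-differentiable point. ReLU, for example, is not differentiable at 0, and most frameworks simply define its derivative there to be 0. A minimal NumPy sketch of that convention:)

```python
import numpy as np

def relu(x):
    # ReLU is not differentiable at x = 0
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Pick a subgradient at 0; the common convention is relu'(0) = 0
    return (x > 0).astype(float)

print(relu_grad(np.array([-1.0, 0.0, 2.0])))  # [0. 0. 1.]
```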
Is that related to BERT?
So the NN is basically learning which words are similar to each other, to use for generalization?
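(Roughly, yes: words with similar distributions end up with nearby embedding vectors, so evidence about one word transfers to its neighbors. A toy sketch with made-up 3-d vectors, purely for illustration:)

```python
import numpy as np

# Hypothetical embeddings (invented for this example); in a trained
# model, words used in similar contexts get nearby vectors.
emb = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
}

def cosine(u, v):
    # Cosine similarity: 1.0 means the vectors point the same way
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "cat" is closer to "dog" than to "car", so what the model learns
# about contexts of "dog" generalizes to "cat".
print(cosine(emb["cat"], emb["dog"]) > cosine(emb["cat"], emb["car"]))  # True
```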