Category Statistics

115. Jacobian Matrix

The Jacobian matrix is a matrix that stores all the partial derivatives of multiple functions. For example, let's consider ex1 (top left). F(x) is a function of one variable. If you calculate the derivative of F(x), it is 2x. Now,…
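A minimal NumPy sketch of the idea: for a vector-valued function, each row of the Jacobian holds the partial derivatives of one output with respect to every input. The function f and its analytic Jacobian below are hypothetical examples, checked against finite differences.

```python
import numpy as np

# Hypothetical example: f(x, y) = [x^2 * y, 5x + sin(y)]
def f(v):
    x, y = v
    return np.array([x**2 * y, 5 * x + np.sin(y)])

def jacobian_analytic(v):
    # Row i = partial derivatives of output i w.r.t. (x, y)
    x, y = v
    return np.array([[2 * x * y, x**2],
                     [5.0,       np.cos(y)]])

def jacobian_numeric(func, v, eps=1e-6):
    # Central finite differences: one column per input variable
    v = np.asarray(v, dtype=float)
    cols = []
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = eps
        cols.append((func(v + step) - func(v - step)) / (2 * eps))
    return np.stack(cols, axis=1)

point = np.array([1.0, 2.0])
J = jacobian_analytic(point)
```

The numeric Jacobian agrees with the analytic one, which is a handy sanity check whenever you derive partials by hand.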

89. Max-Norm Regularization

Another useful regularization technique is called Max-Norm Regularization. Implementation: layer = keras.layers.Dense(100, activation="selu", kernel_initializer="lecun_normal", kernel_constraint=keras.constraints.max_norm(1.)) By setting the hyperparameter r you can set a max value for the weights to prevent overfitting. References: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow,…
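To see what the constraint does without pulling in Keras, here is a small NumPy sketch of the rescaling that keras.constraints.max_norm applies after each update: any weight column whose L2 norm exceeds r gets scaled back down to r (the function name and matrix here are illustrative, not library code).

```python
import numpy as np

def max_norm_constraint(W, r=1.0, axis=0):
    # Rescale any column whose L2 norm exceeds r; columns already
    # within the limit are left (essentially) unchanged
    norms = np.sqrt((W**2).sum(axis=axis, keepdims=True))
    desired = np.clip(norms, 0, r)
    return W * desired / (1e-7 + norms)

rng = np.random.default_rng(0)
W = rng.normal(scale=3.0, size=(4, 5))   # toy weight matrix
W_clipped = max_norm_constraint(W, r=1.0)
```

After applying the constraint, every column norm is at most r, which is exactly the cap the hyperparameter controls.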

88. Monte Carlo Dropout

Dropout is one of the most popular regularization techniques for deep neural networks. Monte Carlo Dropout may boost a dropout model even further. Full implementation: ys = np.stack([model(X_test, training=True) for sample in range(100)]) y = ys.mean(axis=0) The predict() method returns…
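A self-contained NumPy sketch of the same averaging trick, using a toy linear "model" in place of a real network (everything here is illustrative): dropout stays active at inference time, and many stochastic forward passes are averaged.

```python
import numpy as np

rng = np.random.default_rng(42)

def predict_with_dropout(x, W, drop_rate=0.5):
    # Dropout kept active at prediction time, like training=True in Keras;
    # inverted scaling keeps the expected output unchanged
    mask = rng.random(W.shape) >= drop_rate
    return x @ (W * mask) / (1.0 - drop_rate)

x = rng.normal(size=(3, 8))   # 3 samples, 8 features
W = rng.normal(size=(8, 2))   # toy linear "model"

# Monte Carlo Dropout: stack many stochastic passes, then average
ys = np.stack([predict_with_dropout(x, W) for _ in range(100)])
y = ys.mean(axis=0)
```

The spread across the 100 stacked predictions also gives a rough uncertainty estimate for each output, which is the main appeal of MC Dropout.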

69. Maximum Likelihood

Maximum Likelihood, as the name suggests, means maximizing the likelihood of your prediction. Let's say you have an input x and you want to predict y. In this scenario, you want to maximize the probability of Y given x and the bias…
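A minimal worked example, assuming Gaussian data: the maximum-likelihood estimates of the mean and standard deviation are just the sample mean and sample standard deviation, and any other parameter value gives a lower log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

def gaussian_log_likelihood(mu, sigma, x):
    # Sum of log N(x | mu, sigma^2) over all data points
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

# For a Gaussian the MLE has a closed form: sample mean and std
mu_hat = data.mean()
sigma_hat = data.std()
```

Shifting mu away from mu_hat strictly decreases the log-likelihood, which is exactly what "maximizing the likelihood" means.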

67. Partition Function

Unnormalized probability distributions are guaranteed to be nonnegative, but not guaranteed to sum or integrate to 1. We need a partition function to obtain a valid probability distribution. But how do you calculate it? Since it is most often intractable,…
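For a small discrete state space the partition function is tractable and easy to see in code; this sketch uses a hypothetical energy vector to build an unnormalized distribution and then normalizes it.

```python
import numpy as np

# Unnormalized distribution over a tiny discrete state space:
# p_tilde(x) = exp(-E(x)) is nonnegative but does not sum to 1
energies = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
p_tilde = np.exp(-energies)

# The partition function Z is the sum over all states;
# dividing by it yields a valid probability distribution
Z = p_tilde.sum()
p = p_tilde / Z
```

With only five states, Z is a trivial sum; the difficulty in practice is that real models have exponentially many states, which is why Z is usually intractable.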

66. Gibbs Sampling

Gibbs Sampling is used when the distribution you are trying to sample from has more than two dimensions, AND when it is difficult to sample from that joint distribution directly. For example, we want to sample from a joint…
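A classic small sketch: sampling from a correlated bivariate Gaussian by alternately drawing each variable from its conditional distribution (the conditionals of a bivariate Gaussian with correlation rho are N(rho*other, 1-rho^2)). The chain length and burn-in below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                       # target correlation of the joint
n_samples, burn_in = 20_000, 1_000

# Gibbs sampling: alternate conditional draws
# x | y ~ N(rho*y, 1 - rho^2),  y | x ~ N(rho*x, 1 - rho^2)
x, y = 0.0, 0.0
samples = []
for i in range(n_samples + burn_in):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    if i >= burn_in:            # discard early, non-converged draws
        samples.append((x, y))
samples = np.array(samples)
```

The empirical correlation of the chain's samples matches the target rho, even though we never sampled from the joint distribution directly.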

65. Markov Chains

The central limit theorem states that the distribution of the mean of independent samples approaches a normal distribution as the number of samples increases. Markov chain theory, on the other hand, shows that even dependent samples will converge to a certain state…

64. Energy-Based Models

When training a model, we usually use a cost function to calculate how far the predictions are from the actual results. When that cost function is replaced with a function called an energy function, the model is called an energy-based model. What…
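A toy NumPy sketch of the idea: instead of a loss comparing a prediction to a target, an energy function E(x, y) scores how compatible an input and a candidate output are (low energy = good match), and inference picks the lowest-energy candidate. The linear energy and identity parameters here are purely illustrative.

```python
import numpy as np

def energy(x, y, W):
    # Hypothetical linear energy: negative dot-product compatibility,
    # so well-matched (x, y) pairs get low energy
    return -x @ W @ y

W = np.eye(3)                    # toy parameters: identity pairing
x = np.array([1.0, 0.0, 0.0])    # input
candidates = np.eye(3)           # three one-hot candidate outputs

# Inference in an energy-based model: choose the y with lowest energy
energies = np.array([energy(x, y, W) for y in candidates])
best = int(energies.argmin())
```

Training an EBM then amounts to shaping E so that observed (x, y) pairs get lower energy than everything else, rather than minimizing a prediction error directly.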