Category Statistics

396. Topological Data Analysis

▮ Data The volume of data has grown exponentially, especially in the past few years. The plot below, by Statista, shows that data volume this year (2023) has nearly doubled compared to 2020. However, despite the abundance of data…

389. Basic Error Analysis

▮ Error Analysis Error analysis is the process of examining the dev set examples that your ML model misclassified to understand the underlying causes of error. This can help you decide what to prioritize and the direction the project should…
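
As a minimal sketch of that workflow, the snippet below tallies misclassified dev-set examples by error category so the most frequent causes surface first; the records and category names are hypothetical placeholders, not from the post.

```python
from collections import Counter

# Hypothetical misclassified dev-set examples, each manually tagged
# with a suspected cause of the error.
misclassified = [
    {"id": 1, "category": "blurry image"},
    {"id": 2, "category": "mislabeled"},
    {"id": 3, "category": "blurry image"},
    {"id": 4, "category": "rare class"},
]

# Count how often each error category appears so the most common
# causes can be prioritized first.
counts = Counter(example["category"] for example in misclassified)
for category, n in counts.most_common():
    print(f"{category}: {n}/{len(misclassified)} ({n / len(misclassified):.0%})")
```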

349. Confounding Variables

What is it? Confounding variables are extra variables that influence both the independent variable and the outcome, distorting the results. Issues They can increase variance and introduce bias. Avoidance Methods to avoid this include controlling for the confounding variables, random assignment, and counterbalancing.
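
To make the bias concrete, here is a small simulation sketch using NumPy, with made-up parameters: a confounder drives both the "treatment" and the outcome, producing a spurious correlation that random assignment removes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: a confounder z drives both the treatment x and
# the outcome y, so x and y correlate even though x has no direct
# effect on y.
z = rng.normal(size=n)
x = z + rng.normal(scale=0.5, size=n)      # x influenced by z
y = 2 * z + rng.normal(scale=0.5, size=n)  # y influenced by z only

print("correlation with confounding:", np.corrcoef(x, y)[0, 1])

# Random assignment breaks the link between z and x, so the spurious
# correlation disappears (the estimate drops to roughly zero).
x_random = rng.normal(size=n)
print("correlation under random assignment:", np.corrcoef(x_random, y)[0, 1])
```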

338. Pearson VS Spearman Correlation

Pearson Correlation Evaluates the linear relationship between 2 variables. Ranges from -1 (one variable increases while the other decreases) to 1 (both variables increase together). Spearman Rank-Order Correlation Evaluates…
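
A quick illustration of the difference, assuming SciPy is available: for a monotonic but non-linear relationship, Pearson stays below 1 while Spearman, which works on ranks, hits exactly 1.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# A monotonic but non-linear relationship: y = x**3.
x = np.arange(1, 11, dtype=float)
y = x ** 3

# Pearson measures linear association, so it is high but below 1;
# Spearman compares ranks, so any perfectly monotonic relationship
# scores exactly 1.
r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)
print("Pearson: ", r)    # < 1.0
print("Spearman:", rho)  # == 1.0
```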

227. Polynomial Features

Adding Non-Linear Complexity When training a model, it is easy to see that straight lines alone cannot capture every pattern in the training data. Polynomial features are useful when you want to add more…
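
As a short sketch of the idea with scikit-learn's PolynomialFeatures: a degree-2 expansion turns each input x into [1, x, x²], which a plain linear model can then fit as a curve.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One feature, five samples.
X = np.arange(5).reshape(-1, 1)

# degree=2 expands each sample [x] into [1, x, x**2], letting a
# linear model capture a quadratic pattern.
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print(poly.get_feature_names_out())  # ['1' 'x0' 'x0^2']
print(X_poly)
```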

171. Bayes Theorem

Bayes Theorem Bayes' Theorem derives the posterior distribution from a prior distribution and the currently available data. Let's say we want to predict a man's occupation. Is this man a librarian or a farmer, given the following description? – He…
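
A worked sketch of the calculation, with all priors and likelihoods invented purely for illustration (they are not from the post):

```python
# Bayes' Theorem: P(occupation | description) is proportional to
# P(description | occupation) * P(occupation).

p_librarian = 0.01             # prior: librarians are rare (assumed)
p_farmer = 0.99                # prior: farmers are common (assumed)
p_desc_given_librarian = 0.40  # likelihood the description fits a librarian (assumed)
p_desc_given_farmer = 0.05     # likelihood it fits a farmer (assumed)

# Normalizing constant: total probability of seeing the description.
evidence = (p_desc_given_librarian * p_librarian
            + p_desc_given_farmer * p_farmer)

posterior_librarian = p_desc_given_librarian * p_librarian / evidence
print(f"P(librarian | description) = {posterior_librarian:.2f}")  # ~0.07

# Even though the description fits a librarian better, the large
# prior on farmers keeps the posterior for "librarian" small.
```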

152. KL Divergence

KL Divergence measures how much one distribution diverges from another (it is not symmetric, so it is not a true distance). It is useful for understanding Cross-Entropy and deep learning architectures such as the VAE. For example, let's say there is a coin which has a 50% chance of being HEADS and 50%…
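
A minimal sketch of the computation, assuming NumPy; the biased coin used for comparison is a made-up example:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

fair = [0.5, 0.5]    # the fair coin from the excerpt
biased = [0.9, 0.1]  # a hypothetical biased coin for comparison

# Note the asymmetry: KL(P || Q) != KL(Q || P) in general, which is
# why KL divergence is not a true distance metric.
print(kl_divergence(fair, biased))  # ~0.511
print(kl_divergence(biased, fair))  # ~0.368
```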

147. Why Squared Loss?

Why do we use squared loss instead of absolute loss? One reason is that squaring magnifies larger errors, which penalizes them more heavily and can help train the model. Another reason is that absolute loss is not differentiable when the error equals 0.…
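
A small numeric sketch of both points, assuming NumPy: the squared-loss gradient scales with the error and is smooth at zero, while the absolute-loss gradient is a constant ±1 that jumps at zero.

```python
import numpy as np

residuals = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Squared loss r**2: gradient 2*r shrinks smoothly toward 0 and
# magnifies large errors; it is differentiable everywhere.
print("squared-loss gradient: ", 2 * residuals)

# Absolute loss |r|: gradient is sign(r), a constant +/-1 that jumps
# at r = 0, where the derivative is undefined.
print("absolute-loss gradient:", np.sign(residuals))
```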