To fit this curve, we proceed much as in linear regression: start with random parameters ($K$, $L$, $x_0$) for the logistic function, calculate the error, and update the parameters of the function. This time, however, the error is not simply how far the label is from the prediction, so we can't use MSE or $R^2$. Instead we use Maximum Likelihood (ML).
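As a minimal sketch of the function being fitted, assume the usual parameterization in which $L$ is the curve's maximum value, $K$ its steepness, and $x_0$ its midpoint. The name `logistic` and the numbers below are illustrative only, not taken from any library or dataset.

```python
import math

def logistic(x, k=1.0, l=1.0, x0=0.0):
    """Logistic curve: l / (1 + exp(-k * (x - x0)))."""
    return l / (1.0 + math.exp(-k * (x - x0)))

# At the midpoint x0 the curve sits at half of its maximum l.
print(logistic(80.0, k=0.2, l=1.0, x0=80.0))  # 0.5

# To the right of the midpoint the value climbs towards l.
print(logistic(100.0, k=0.2, l=1.0, x0=80.0) > 0.5)  # True
```

When the output is read as a probability, $L$ is fixed at 1, which leaves $K$ and $x_0$ as the parameters to learn.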

What is Maximum Likelihood

You do not need to understand Maximum Likelihood (ML) completely; in a nutshell, we can understand it through a nice plot.

Check out the curve drawn above.

For each point in our training data, we can calculate the likelihood of its observed label (obese or non-obese). How do we do that? Use the curve! That curve is basically the probability as a function of the feature (in this example, the weight). You calculate the likelihoods of all the data points and multiply them together, and there you go: that's the likelihood of that curve fitting your data. That's what we are trying to maximize, hence the name maximum likelihood.
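To make this concrete, here is a small sketch with made-up weights and labels (1 = obese, 0 = non-obese) and hand-picked, hypothetical curve parameters. It assumes the curve gives the probability of being obese, so a non-obese point contributes one minus that value. The individual likelihoods are multiplied together; in practice one usually sums their logs instead, because a product of many small probabilities underflows.

```python
import math

def predict(weight, k, x0):
    """Probability of being obese, read off a logistic curve with maximum 1."""
    return 1.0 / (1.0 + math.exp(-k * (weight - x0)))

def likelihood(weights, labels, k, x0):
    """Product of the probabilities the curve assigns to the observed labels."""
    total = 1.0
    for w, y in zip(weights, labels):
        p = predict(w, k, x0)
        total *= p if y == 1 else 1.0 - p
    return total

# Made-up training data: weight in kg, label 1 = obese, 0 = non-obese.
weights = [55.0, 62.0, 70.0, 88.0, 95.0, 104.0]
labels = [0, 0, 0, 1, 1, 1]

good = likelihood(weights, labels, k=0.3, x0=80.0)   # midpoint between the two groups
bad = likelihood(weights, labels, k=0.3, x0=120.0)   # midpoint far to the right
print(good > bad)  # True
```

The curve whose midpoint separates the two groups gets the higher likelihood, which is exactly what the fitting procedure is searching for.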

Computationally speaking, all we need to change from linear regression is the error function, so now it will look like:

$$-\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(\hat{y}_i)+(1-y_i)\log(1-\hat{y}_i)\right]$$

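Don't be afraid of this lengthy equation. For each individual, it multiplies the true label $y_i$ (1 if obese, 0 if not) by the log of the predicted probability of being obese, $\log(\hat{y}_i)$, and adds its counterpart for observing a non-obese individual: $(1-y_i)$ times the log of the predicted probability of being non-obese, $\log(1-\hat{y}_i)$. For any given individual only one of the two terms is non-zero.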
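The error function above, and the idea of swapping it into an otherwise unchanged update loop, can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the data are made up, the weights are assumed to be already centered and scaled, and the curve is written with a slope `w` and intercept `b` (equivalent to $K$ and $x_0$ via $w = K$, $b = -Kx_0$) and fitted with plain gradient descent.

```python
import math

def sigmoid(z):
    """Logistic curve with maximum 1."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(labels, probs, eps=1e-12):
    """Average negative log-likelihood of the labels under the predictions."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

# Made-up data: centered, scaled weights; label 1 = obese, 0 = non-obese.
xs = [-2.5, -1.8, -1.0, 0.8, 1.5, 2.4]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0   # starting parameters (zeros here, for reproducibility)
lr = 0.1          # learning rate, chosen by hand

start = log_loss(ys, [sigmoid(w * x + b) for x in xs])
for _ in range(200):
    probs = [sigmoid(w * x + b) for x in xs]
    # Gradient of the loss with respect to w and b.
    grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / len(xs)
    grad_b = sum(p - y for p, y in zip(probs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b
end = log_loss(ys, [sigmoid(w * x + b) for x in xs])

print(end < start)  # True
```

Minimizing this loss is the same as maximizing the likelihood: the sum inside `log_loss` is the log of the product of the per-point likelihoods, and the minus sign turns the maximization into a minimization.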

More on maximum likelihood