## Hypothesis function

Since the labels in a classification problem are discrete categories rather than continuous values, we need a different hypothesis function from the one used in regression. Here we introduce a popular choice: logistic regression.

**Logistic regression** uses the **sigmoid function** as its hypothesis. The sigmoid maps any real number into a probability in the range $[0, 1]$, so its value expresses **how certain we are that the data belongs to a category**. The hypothesis is defined as

$h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T} x}}$

- Note that $y$ represents the label: $y=1$ is the positive (goal) label and $y=0$ is the other label, and the sigmoid function always expresses certainty about the positive label.
- In most cases, we take 0.5 as the probability threshold: if $h_{\theta}(x) \geq 0.5$, we predict label 1; if $h_{\theta}(x) < 0.5$, we predict label 0 (see the sketch after this list).
- The sigmoid function $h_{\theta}(x)$ represents the probability that $y=1$.
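
Below is a minimal sketch of the hypothesis and the threshold rule; the helper names `sigmoid` and `predict` and the sample numbers are our own illustrative choices, not from the original text:

```
import numpy as np

def sigmoid(z):
    # Squashes any real number into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

def predict(theta, X):
    # Apply the 0.5 threshold: h(x) >= 0.5 -> label 1, otherwise label 0
    probability = sigmoid(X @ theta)
    return (probability >= 0.5).astype(int)

# Toy example: intercept term plus two features
theta = np.array([-1.0, 2.0, 0.5])
X = np.array([[1.0, 0.2, 0.9],
              [1.0, 0.8, 0.1]])
print(sigmoid(X @ theta))   # probabilities that y = 1
print(predict(theta, X))    # hard 0/1 predictions
```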

## Cost function

We learnt about the cost function $J(\theta)$ in linear regression. The cost function represents the optimization objective: we define a cost function and minimize it so that we can develop an accurate model with minimum error.

If we try to use the cost function of linear regression, $J(\theta)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$, in logistic regression, it is of no use: because $h_{\theta}(x)$ is now the non-linear sigmoid function, the squared-error cost becomes a **non-convex** function with many local minima, which makes it very **difficult** to **minimize the cost value** and find the global minimum.

We will instead use another error function, called **cross-entropy** or **log loss**:

$\operatorname{cost}\left(h_{\theta}(x), y\right)=-y \times \log \left(h_{\theta}(x)\right)-(1-y) \times \log \left(1-h_{\theta}(x)\right)$

```
import numpy as np

def cost(theta, X, y):
    # Cross-entropy (log loss) averaged over all training examples;
    # relies on the sigmoid function defined above
    h = sigmoid(X @ theta)             # predicted probabilities
    first = -y * np.log(h)             # penalty when y = 1
    second = (1 - y) * np.log(1 - h)   # penalty when y = 0
    return np.sum(first - second) / len(X)
```
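
As a quick sanity check on a tiny hand-made dataset (the numbers are illustrative only):

```
theta = np.zeros(3)
X = np.array([[1.0, 0.2, 0.9],
              [1.0, 0.8, 0.1]])
y = np.array([1.0, 0.0])
print(cost(theta, X, y))  # 0.6931... = -log(0.5), since h(x) = 0.5 when theta = 0
```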

## Gradient descent

As in linear regression, we minimize $J(\theta)=\frac{1}{m} \sum_{i=1}^{m} \operatorname{cost}\left(h_{\theta}\left(x^{(i)}\right), y^{(i)}\right)$ by gradient descent, repeating the update $\theta_{j}:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$, where $\alpha$ is the learning rate.

**Proof:** since the sigmoid satisfies $\frac{\partial}{\partial \theta_{j}} h_{\theta}(x)=h_{\theta}(x)\left(1-h_{\theta}(x)\right) x_{j}$, differentiating a single cost term gives

$\frac{\partial}{\partial \theta_{j}} \operatorname{cost}\left(h_{\theta}(x), y\right)=\left(-\frac{y}{h_{\theta}(x)}+\frac{1-y}{1-h_{\theta}(x)}\right) h_{\theta}(x)\left(1-h_{\theta}(x)\right) x_{j}=\left(h_{\theta}(x)-y\right) x_{j}$

Thus,

$\frac{\partial}{\partial \theta_{j}} J(\theta)=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$

and the update rule becomes $\theta_{j}:=\theta_{j}-\frac{\alpha}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$, which has the same form as the update rule of linear regression, except that $h_{\theta}(x)$ is now the sigmoid function.
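
A minimal batch gradient descent sketch built on the `sigmoid` and `cost` functions above; the toy data, learning rate, and iteration count are illustrative assumptions:

```
# Toy data: intercept term plus two features
X = np.array([[1.0, 0.2, 0.9],
              [1.0, 0.8, 0.1],
              [1.0, 0.5, 0.5]])
y = np.array([1.0, 0.0, 1.0])

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    theta = np.zeros(X.shape[1])
    m = len(X)
    for _ in range(iterations):
        h = sigmoid(X @ theta)            # current predictions
        gradient = X.T @ (h - y) / m      # (1/m) * sum((h - y) * x_j)
        theta = theta - alpha * gradient  # simultaneous update of all theta_j
    return theta

theta = gradient_descent(X, y)
print(theta, cost(theta, X, y))           # the cost should have decreased
```

Because the cross-entropy cost is convex, this procedure converges toward the global minimum rather than getting stuck in a local one.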