Logistic Regression

2019/04/23 Machine_Learning

Hypothesis function

Since the labels in a Classification Problem are discrete categories rather than continuous values, the hypothesis we used for the Regression Problem no longer fits. Here we introduce a popular alternative: Logistic Regression.

Logistic Regression uses the sigmoid (logistic) function as its hypothesis. The sigmoid function maps real numbers into probabilities in the range $[0, 1]$, so its value indicates how certain we are that the data belongs to a category. The formula is

$h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T} x}}$

  • Note that $y$ represents the label: $y=1$ is the goal label and $y=0$ is the other label, and in the sigmoid hypothesis we always care about the goal label.
  • In most cases we take 0.5 as the probability threshold: if $h_{\theta}(x) \geq 0.5$ we predict that the data belongs to label 1, and if $h_{\theta}(x) < 0.5$ we predict label 0, as sketched in the code below.
  • Sigmoid hypothesis $h_{\theta}(x)$: represents the probability that $y=1$.
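A minimal sketch of the hypothesis and the 0.5-threshold decision rule in NumPy (the function names sigmoid and predict are illustrative, not from the original post):

import numpy as np

def sigmoid(z):
  # map real numbers into probabilities in the range [0, 1]
  return 1 / (1 + np.exp(-z))

def predict(theta, X):
  # h(x) >= 0.5 -> predict label 1, otherwise label 0
  probability = sigmoid(X @ theta)
  return (probability >= 0.5).astype(int)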

Cost function

We learnt about the cost function $J(\theta)$ in linear regression. The cost function represents the optimization objective: we define a cost function and minimize it so that we can develop an accurate model with minimum error.

If we tried to reuse the cost function of linear regression in Logistic Regression, it would be of no use: plugging the sigmoid hypothesis into the squared error produces a non-convex function with many local minima, which makes it very difficult to minimize the cost and find the global minimum.

We will use another error function called Cross-Entropy, also known as log loss. For a single training example,

$\operatorname{cost}\left(h_{\theta}(x), y\right)=-y \times \log \left(h_{\theta}(x)\right)-(1-y) \times \log \left(1-h_{\theta}(x)\right)$

and the overall cost $J(\theta)$ averages this over all $m$ training examples:

$J(\theta)=\frac{1}{m} \sum_{i=1}^{m} \operatorname{cost}\left(h_{\theta}\left(x^{(i)}\right), y^{(i)}\right)$

import numpy as np

def cost(theta, X, y):
  # cross-entropy (log loss) cost J(theta), averaged over all m examples
  theta = np.asarray(theta)
  X = np.asarray(X)
  y = np.asarray(y)
  h = sigmoid(X @ theta)  # hypothesis h_theta(x); sigmoid is defined above
  first = np.multiply(-y, np.log(h))
  second = np.multiply(1 - y, np.log(1 - h))
  return np.sum(first - second) / len(X)
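For example, with a tiny toy dataset (the numbers below are made up purely for illustration):

X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])  # first column is the bias term
y = np.array([0, 0, 1])
theta = np.zeros(2)
print(cost(theta, X, y))  # theta = 0 gives h(x) = 0.5 everywhere, so the cost is log(2) ≈ 0.693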

Gradient descent

To minimize $J(\theta)$ we use gradient descent, repeatedly moving every parameter $\theta_{j}$ in the direction of the negative gradient.

Proof: using $h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T} x}}$ and the fact that the derivative of the sigmoid satisfies $g^{\prime}(z)=g(z)(1-g(z))$, differentiating $J(\theta)$ with respect to $\theta_{j}$ and simplifying gives

$\frac{\partial J(\theta)}{\partial \theta_{j}}=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$

Thus, the update rule is

$\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$

which has the same form as the update rule of linear regression, except that the hypothesis $h_{\theta}(x)$ is now the sigmoid function.
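A minimal sketch of batch gradient descent for this cost (the learning rate alpha and the iteration count iters are illustrative defaults, not values from the original post):

def gradient_descent(theta, X, y, alpha=0.1, iters=1000):
  # batch gradient descent: theta_j := theta_j - alpha * dJ/dtheta_j
  m = len(X)
  for _ in range(iters):
    error = sigmoid(X @ theta) - y  # h_theta(x^(i)) - y^(i) for every example
    theta = theta - alpha * (X.T @ error) / m
  return theta

Starting from theta = np.zeros(X.shape[1]) and calling cost(theta, X, y) before and after the loop shows the cost decreasing.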
