Normal Equation

2019/04/24 Machine_Learning

$J\left(\theta_{0}, \theta_{1}, \ldots, \theta_{n}\right)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$, $\theta \in \mathbb{R}^{n+1}$

$\frac{\partial}{\partial \theta_{j}} J(\theta)=\cdots=0$ (for every $j$)

Solve for $\theta_{0}, \theta_{1}, \ldots, \theta_{n}$

$m$ examples $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$; $n$ features

$X=\begin{bmatrix} \left(x^{(1)}\right)^{T} \\ \left(x^{(2)}\right)^{T} \\ \vdots \\ \left(x^{(m)}\right)^{T} \end{bmatrix} \in \mathbb{R}^{m \times (n+1)}$,$\quad$ $y=\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} \in \mathbb{R}^{m}$,$\quad$ $x_{0}^{(i)}=1$
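
As a concrete sketch of these definitions (the data below is made up for illustration), $X$ and $y$ can be assembled with NumPy; each row of $X$ is one training example, with a leading 1 for the intercept term $x_{0}$:

import numpy as np

# Toy data: m = 4 examples, n = 1 feature (values are illustrative)
x1 = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Design matrix: prepend the intercept column x0 = 1 to every example
X = np.column_stack([np.ones_like(x1), x1])  # shape (4, 2), i.e. m x (n+1)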

$\theta=\left(X^{T} X\right)^{-1} X^{T} y$

Matlab code

pinv(X'*X)*X'*y

Python code

import numpy as np

def normalEqn(X, y):
    # pinv handles a singular X'X gracefully (matching the Matlab version above);
    # X.T @ X is equivalent to X.T.dot(X)
    theta = np.linalg.pinv(X.T @ X) @ X.T @ y
    return theta
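
A quick usage sketch, reusing normalEqn from above on synthetic data chosen so the answer is known in advance: with $y=1+2 x_{1}$ exactly, the recovered $\theta$ should be approximately $[1, 2]$.

import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x1), x1])  # intercept column x0 = 1
y = 1.0 + 2.0 * x1                           # exact line y = 1 + 2*x1

print(normalEqn(X, y))  # -> approximately [1. 2.]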

Proof

Claim: $\theta=\left(X^{T} X\right)^{-1} X^{T} y$

$J(\theta)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$, where $h_{\theta}(x)=\theta^{T} x=\theta_{0} x_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\ldots+\theta_{n} x_{n}$ (with $x_{0}=1$)

In matrix form, $J(\theta)=\frac{1}{2 m}(X \theta-y)^{T}(X \theta-y)$, where $X$: $m \times (n+1)$, $\theta$: $(n+1) \times 1$, $y$: $m \times 1$

As $\frac{\partial A \theta}{\partial \theta}=A^{T}$ and $\frac{\partial \theta^{T} A \theta}{\partial \theta}=2 A \theta$ for symmetric $A$ (and $X^{T} X$ is symmetric), expanding the cost gives $J(\theta)=\frac{1}{2 m}\left(\theta^{T} X^{T} X \theta-2 \theta^{T} X^{T} y+y^{T} y\right)$

Thus,

$\frac{\partial J(\theta)}{\partial \theta}=\frac{1}{2 m}\left(2 X^{T} X \theta-2 X^{T} y\right)=\frac{1}{m}\left(X^{T} X \theta-X^{T} y\right)$

Set $\frac{\partial J(\theta)}{\partial \theta}=0$, i.e. $X^{T} X \theta=X^{T} y$

Then, $\theta=\left(X^{T} X\right)^{-1} X^{T} y$, provided $X^{T} X$ is invertible (otherwise use the pseudoinverse, as pinv does in the code above)
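
As a numerical sanity check on this result (a sketch, not part of the original notes), the closed-form $\theta$ should match NumPy's built-in least-squares solver on random data:

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # m = 50, n = 2 features plus intercept
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.01 * rng.normal(size=50)               # small noise

theta_normal = np.linalg.pinv(X.T @ X) @ X.T @ y              # normal equation
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)           # reference solver
print(np.allclose(theta_normal, theta_lstsq))                 # True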
