2. Neural Networks / L2. Implementing Gradient Descent

Deep Learning

2. Neural Networks / L2. Implementing Gradient Descent - Implementing Gradient Descent

chrisysl 2018. 7. 9. 21:36

Implementing Gradient Descent

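- The weight-update rule is Δw_i = η δ x_i, where η is the learning rate, δ = (y - ŷ) f'(h) is the error term, and x_i is the input.

- So far this rule has only been applied as a single update. How should it be extended so that the network learns by applying many weight updates?

- As an example, let's build a program that uses the data from the earlier lesson to predict whether a student is admitted to a university.

- Start with a network that has a single output layer with a single unit.

- The output unit uses the sigmoid function as its activation.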

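- The data appears to have three input features (GRE score, GPA, and rank), but it has to be transformed before it can be used.

- rank is categorical: its numbers only label a category and do not encode any relative magnitude.

- So rank is encoded with dummy variables: it is split into four columns, each holding 0 or 1.

- As before, the GRE and GPA data also have to be standardized, that is, scaled to a mean of 0 and a standard deviation of 1.

- This is needed because the sigmoid function squashes very small and very large inputs, so the gradient for such inputs is close to 0.

- With large inputs such as the raw GRE and GPA values, the weights would therefore have to be initialized very carefully.

- Otherwise the gradient descent steps die off and the network cannot train properly.

- Standardizing the data makes it easy to initialize the weights appropriately.

- See data_prep.py for the data preparation steps.

- In the end, training starts with six input features (GRE, GPA, and the four rank dummy variables) and one label.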
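The preparation steps above can be sketched roughly as follows. This is not the content of data_prep.py, which is not reproduced in this post; it is a minimal NumPy sketch on made-up records, and the helper names `standardize` and `one_hot_rank` are hypothetical.

```python
import numpy as np

# Made-up records for illustration: columns are gre, gpa, rank (1 to 4).
raw = np.array([
    [380.0, 3.61, 3],
    [660.0, 3.67, 3],
    [800.0, 4.00, 1],
    [640.0, 3.19, 4],
    [520.0, 2.93, 2],
])


def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1."""
    return (column - column.mean()) / column.std()


def one_hot_rank(rank, n_ranks=4):
    """Turn ranks 1..n_ranks into n_ranks dummy columns of 0s and 1s."""
    dummies = np.zeros((rank.size, n_ranks))
    dummies[np.arange(rank.size), rank.astype(int) - 1] = 1.0
    return dummies


gre = standardize(raw[:, 0])
gpa = standardize(raw[:, 1])
rank_dummies = one_hot_rank(raw[:, 2])

# Six input features: gre, gpa, rank_1, rank_2, rank_3, rank_4
features = np.column_stack([gre, gpa, rank_dummies])
print(features.shape)
```

Each row ends up with exactly one of the four rank columns set to 1, so no ordering between ranks is implied.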

Mean Square Error

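- Let's change how the error is calculated.

- Instead of the sum of the squared errors (SSE), we use the mean of the squared errors (MSE).

- We are now working with much more data, and summing all the weight steps can produce very large updates that make gradient descent diverge. Compensating for that would require a much smaller learning rate than before.

- Instead, divide by the number of records, m, to take the average.

- This way, no matter how much data is used, the learning rate can typically stay in the range of 0.01 to 0.001.

- The MSE is therefore E = (1 / 2m) Σ_μ (y^μ - ŷ^μ)², where the sum runs over the m records. The key point is the averaging.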
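To illustrate why the averaging matters, here is a small sketch on made-up targets and predictions. The function names `sse` and `mse` are assumptions for illustration; the factor 1/2 follows the convention used in the formula for E.

```python
import numpy as np


def sse(y, y_hat):
    """Sum of squared errors, with the conventional factor 1/2."""
    return 0.5 * np.sum((y - y_hat) ** 2)


def mse(y, y_hat):
    """Mean of squared errors: the SSE divided by the number of records."""
    return sse(y, y_hat) / y.size


# Made-up targets and predictions, then the same records repeated 100 times.
y_small = np.array([1.0, 0.0, 1.0, 0.0])
y_hat_small = np.array([0.8, 0.3, 0.6, 0.1])
y_big = np.tile(y_small, 100)
y_hat_big = np.tile(y_hat_small, 100)

# The SSE grows with the size of the data set; the MSE does not.
print(sse(y_small, y_hat_small), sse(y_big, y_hat_big))
print(mse(y_small, y_hat_small), mse(y_big, y_hat_big))
```

Because the MSE does not scale with the number of records, the same learning rate remains usable as the data set grows.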

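· The gradient descent algorithm for updating the weights

- Set the weight step to zero: Δw_i = 0.

- For each record, compute the output ŷ = f(h) and the error term δ = (y - ŷ) f'(h), then add δ x_i to the weight step Δw_i.

- Update the weights with w_i = w_i + η Δw_i / m, and repeat for the chosen number of epochs.

- Instead of averaging the weight steps over all records, the weights can also be updated after each individual record.

- The activation function is the sigmoid, f(h) = 1 / (1 + e^(-h)).

- The gradient of the sigmoid is f'(h) = f(h)(1 - f(h)).

- Here h is the input to the output unit, h = Σ_i w_i x_i.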
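The per-record variant mentioned above could look roughly like this. It is a sketch on made-up data, not the course's code; `train_per_record` and `loss` are hypothetical names, and the learning rate and epoch count are arbitrary choices.

```python
import numpy as np


def sigmoid(h):
    return 1 / (1 + np.exp(-h))


def train_per_record(features, targets, weights, learnrate=0.1, epochs=100):
    """Update the weights after every record instead of once per epoch."""
    weights = weights.copy()
    for _ in range(epochs):
        for x, y in zip(features, targets):
            output = sigmoid(np.dot(x, weights))
            # delta = (y - y_hat) * f'(h), with f'(h) = f(h) * (1 - f(h))
            error_term = (y - output) * output * (1 - output)
            weights += learnrate * error_term * x
    return weights


def loss(features, targets, weights):
    """Mean squared error of the network output."""
    return np.mean((targets - sigmoid(features @ weights)) ** 2)


# Made-up data: the target is 1 when the first feature is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = (X[:, 0] > 0).astype(float)
w_start = rng.normal(scale=1 / 2 ** 0.5, size=2)

w_end = train_per_record(X, y, w_start)
print(loss(X, y, w_start), loss(X, y, w_end))
```

There is no division by the number of records here, because each step already uses a single record.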

Implementing with Numpy

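- First the weights are initialized. As the earlier sigmoid graph showed, they should be small, so that the input to the sigmoid lies in the roughly linear region near 0 instead of being squashed at the high and low ends.

- It is also important to initialize them randomly, so that every weight starts from a different value and the symmetry between them is broken.

- So the weights are drawn from a normal distribution centered at 0.

- A good value for the scale is 1/√n, where n is the number of input units.

- This scale keeps the input to the sigmoid small even as the number of input units increases.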
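A quick way to check the claim about the scale is to measure how much h = w · x spreads out for different numbers of input units. The sketch below assumes standardized inputs (mean 0, standard deviation 1); `init_weights` and `spread_of_h` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(42)


def init_weights(n_inputs):
    """Draw weights from a normal distribution centered at 0 with scale 1/sqrt(n)."""
    return rng.normal(loc=0.0, scale=1 / np.sqrt(n_inputs), size=n_inputs)


def spread_of_h(n_inputs, n_trials=2000):
    """Empirical standard deviation of h = w . x for standardized random inputs."""
    hs = [
        np.dot(init_weights(n_inputs), rng.normal(size=n_inputs))
        for _ in range(n_trials)
    ]
    return np.std(hs)


# The spread of h stays close to 1 regardless of the number of input units.
for n in (6, 60, 600):
    print(n, round(spread_of_h(n), 2))
```

With a fixed scale of 1 instead, the spread of h would grow like √n and push the sigmoid into its flat regions.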

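- In code, the weights are initialized with np.random.normal, passing 1/√n as scale and n as size, as in the script below.

- NumPy provides a function that calculates the dot product, np.dot, which makes h easy to compute.

- The dot product works element by element: the first element of array 1 is multiplied by the first element of array 2, and so on, and the products are then summed.

- Finally, Δw_i is added to w_i with an in-place update written as weights += ....

- As emphasized before, the derivative of the sigmoid f(h) is f'(h) = f(h)(1 - f(h)), so once f(h) is known, its derivative (the gradient) is easy to compute.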
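The two points above, np.dot for h and reusing f(h) for the derivative, can be checked on a tiny example. The input and weight values are made up for illustration.

```python
import numpy as np


def sigmoid(h):
    return 1 / (1 + np.exp(-h))


x = np.array([0.5, -1.0, 2.0])        # made-up inputs
weights = np.array([0.1, 0.4, -0.2])  # made-up weights

# np.dot multiplies element by element and sums the products.
h = np.dot(x, weights)
h_manual = sum(x_i * w_i for x_i, w_i in zip(x, weights))

# f'(h) = f(h) * (1 - f(h)) reuses the output, so np.exp is not called again.
output = sigmoid(h)
sigmoid_grad = output * (1 - output)

# Compare against a numerical central-difference estimate of the derivative.
eps = 1e-6
numeric_grad = (sigmoid(h + eps) - sigmoid(h - eps)) / (2 * eps)

print(h, h_manual)
print(sigmoid_grad, numeric_grad)
```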

import numpy as np
from data_prep import features, targets, features_test, targets_test


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Note: No separate sigmoid_prime function is defined here. Because
#       f'(h) = f(h) * (1 - f(h)), the derivative is computed inside
#       the training loop from the sigmoid output that is already
#       available.

# Use the same seed to make debugging easier
np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # Activation of the output unit
        #   Notice we multiply the inputs and the weights here 
        #   rather than storing h as a separate variable 
        output = sigmoid(np.dot(x, weights))

        # The error, the target minus the network output
        error = y - output

        # The error term
        #   Notice we calculate f'(h) here instead of defining a separate
        #   sigmoid_prime function. This just makes it faster because we
        #   can re-use the result of the sigmoid function stored in
        #   the output variable
        error_term = error * output * (1 - output)

        # The gradient descent step, the error times the gradient times the inputs
        del_w += error_term * x

    # Update the weights here. The learning rate times the 
    # change in weights, divided by the number of records to average
    weights += learnrate * del_w / n_records

    # Printing out the mean square error on the training set
    if e % (epochs // 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss is not None and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss


# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
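The inner loop over records in the script above can also be written with matrix operations. The following is a sketch under the assumption that the features are a plain NumPy array (for a pandas DataFrame, `features.values` would be passed instead); `train_loop` and `train_vectorized` are hypothetical names, and the data is made up.

```python
import numpy as np


def sigmoid(h):
    return 1 / (1 + np.exp(-h))


def train_loop(X, y, weights, learnrate=0.5, epochs=200):
    """Accumulate the weight step record by record, as in the script above."""
    weights = weights.copy()
    n_records = X.shape[0]
    for _ in range(epochs):
        del_w = np.zeros(weights.shape)
        for x_row, target in zip(X, y):
            output = sigmoid(np.dot(x_row, weights))
            del_w += (target - output) * output * (1 - output) * x_row
        weights += learnrate * del_w / n_records
    return weights


def train_vectorized(X, y, weights, learnrate=0.5, epochs=200):
    """The same averaged update, computed for all records at once."""
    weights = weights.copy()
    n_records = X.shape[0]
    for _ in range(epochs):
        output = sigmoid(X @ weights)  # shape (n_records,)
        error_term = (y - output) * output * (1 - output)
        # X.T @ error_term sums error_term * x over all records
        weights += learnrate * (X.T @ error_term) / n_records
    return weights


# Made-up data with six features
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 6))
y = (rng.random(40) > 0.5).astype(float)
w0 = rng.normal(scale=1 / 6 ** 0.5, size=6)

w_loop = train_loop(X, y, w0)
w_vec = train_vectorized(X, y, w0)
print(np.allclose(w_loop, w_vec))
```

Both functions apply the same update rule; the vectorized form only avoids the Python-level loop over records.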

binary.csv

data_prep.py

gradient.py
