Introduction to NLP with PyTorch 2024. 4. 3. 14:20

SUMMARY

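Problem: I tried to represent XOR with a perceptron, but it could not be represented.

Solution: I confirmed experimentally that a multilayer perceptron (MLP) can solve it.

The experiments confirm that the XOR problem cannot be solved with a single perceptron.

They also confirm that the XOR problem can be solved with a multilayer perceptron.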

XOR cannot be separated with a single straight line (AND and OR can).
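To see the "one line" claim concretely, here is a sketch using the classic perceptron learning rule (a step activation with the update w += lr * (y - pred) * x). This is not the PyTorch model used later in the post; it runs on the four 0/1 input points only. The perceptron convergence theorem guarantees that this rule finds a separating line whenever one exists, so AND and OR reach zero errors, while XOR never does. The 100-epoch limit is an arbitrary choice.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
TARGETS = {
    "AND": np.array([0, 0, 0, 1]),
    "OR":  np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Classic perceptron rule with a step activation.
    Returns True if an epoch with zero training errors is reached."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            pred = 1 if x_i @ w + b > 0 else 0
            update = lr * (y_i - pred)   # 0 when the prediction is correct
            w += update * x_i
            b += update
            errors += int(update != 0)
        if errors == 0:
            return True    # a separating line (w, b) was found
    return False           # no line classified all four points correctly

results = {name: train_perceptron(X, y) for name, y in TARGETS.items()}
print(results)  # {'AND': True, 'OR': True, 'XOR': False}
```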

==> Solution: a hidden layer maps the data into a hidden space where the points of each class are gathered on one side, so the classes become separable.
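A hand-built illustration of this idea (the weights are picked by hand, not learned, and a step activation is used instead of the Sigmoid used later in the post): the hidden layer computes OR and NAND of the inputs. In that hidden space both XOR=1 inputs land on the same point (1, 1), so the output unit needs only one straight line, h1 + h2 > 1.5.

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: column 0 computes OR, column 1 computes NAND (hand-picked weights)
W1 = np.array([[1.0, -1.0],
               [1.0, -1.0]])
b1 = np.array([-0.5, 1.5])
H = step(X @ W1 + b1)          # coordinates of each input in the hidden space

# Output layer: one straight line in the hidden space (an AND of the two hidden units)
w2 = np.array([1.0, 1.0])
b2 = -1.5
y = step(H @ w2 + b2)

print(H.tolist())  # [[0, 1], [1, 1], [1, 1], [1, 0]]
print(y.tolist())  # [0, 1, 1, 0]  -> XOR
```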

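(Figure) No matter how long a perceptron is trained, it cannot separate the circles from the stars.

(Figure)

Once the original data is mapped through the hidden layer, a single straight line can separate it.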

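Why use an activation function

-> Several linear transformations applied in a row are equivalent to a single linear transformation, so a nonlinearity has to be inserted between them for the stacked transformations to add anything.

-> Without activations, no matter how many layers you stack, an MLP is still equivalent to one linear transformation, so the MLP collapses into a perceptron.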

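XW1 = H1

H1W2 = H2

H2 = H1W2 = (XW1)W2 = X(W1W2) = XW'

X(W1W2) -> the product of the matrices W1 and W2 is itself just another matrix,

XW' -> so we can treat it as a single new matrix W' = W1W2.

(With bias terms the result is still a single affine layer: (XW1 + b1)W2 + b2 = X(W1W2) + (b1W2 + b2).)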
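A quick numerical check of this argument, sketched with PyTorch (the layer sizes and the random seed are arbitrary): two nn.Linear layers applied back to back with no activation in between give the same output as one nn.Linear whose weight is the product of the two weights. Note that nn.Linear stores its weight as (out_features, in_features) and computes x @ weight.T + bias, so the product is written layer2.weight @ layer1.weight here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(2, 3)   # H1 = X W1 (+ b1)
layer2 = nn.Linear(3, 4)   # H2 = H1 W2 (+ b2)

# Build the single equivalent layer W' (and b') by hand
merged = nn.Linear(2, 4)
with torch.no_grad():
    merged.weight.copy_(layer2.weight @ layer1.weight)            # (4, 3) @ (3, 2) -> (4, 2)
    merged.bias.copy_(layer2.weight @ layer1.bias + layer2.bias)  # b1 W2 + b2

x = torch.randn(5, 2)
stacked = layer2(layer1(x))
print(torch.allclose(stacked, merged(x), atol=1e-6))  # True: two linear layers = one
```

Putting torch.sigmoid between the two layers breaks this equivalence, which is exactly why the MLP needs its hidden activation.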


Perceptron vs. MLP

A perceptron can only learn cases that are linearly separable.

An MLP can also separate more complex data that no single line can split.
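The difference can be checked with a small, self-contained PyTorch sketch. It is a stand-in for the post's own experiment, not a copy of it: the data mirrors the XOR layout of CENTERS below (four Gaussian blobs, diagonal blobs sharing a label), and the MLP uses 8 Tanh hidden units instead of the post's 2 Sigmoid units so that it is less sensitive to the random seed. The step count and learning rate are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# XOR-style toy data (assumed layout): (-3,-3) and (3,3) -> class 0, (3,-3) and (-3,3) -> class 1
centers = torch.tensor([[-3.0, -3.0], [3.0, 3.0], [3.0, -3.0], [-3.0, 3.0]])
labels = torch.tensor([0, 0, 1, 1])
idx = torch.randint(0, 4, (1000,))
x = centers[idx] + torch.randn(1000, 2)   # unit-variance noise around each center
y = labels[idx]

def train(model, steps=500, lr=0.05):
    """Full-batch training with Adam + cross-entropy; returns training accuracy."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        accuracy = (model(x).argmax(dim=1) == y).float().mean().item()
    return accuracy

perceptron = nn.Linear(2, 2)                                       # no hidden layer
mlp = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 2))   # one hidden layer

perceptron_acc = train(perceptron)
mlp_acc = train(mlp)
print(f"perceptron: {perceptron_acc:.2f}, mlp: {mlp_acc:.2f}")
```

A single line can classify at most three of the four blobs correctly, so the perceptron's accuracy stays well below that of the MLP.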

--------------------------------------------------------------------------------------------------------------------------------------------------------------

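import numpy as np
import torch

LABELS = [0, 0, 1, 1]  # assumed: the two diagonal centers share a label (the XOR layout)
CENTERS = [(-3, -3), (3, 3), (3, -3), (-3, 3)]

def get_toy_data(batch_size):
    x_data = []
    y_targets = np.zeros(batch_size)
    n_centers = len(CENTERS)
    for batch_i in range(batch_size):
        center_idx = np.random.randint(0, n_centers)
        # pick one of the CENTERS at random and draw one point from a normal distribution around it
        x_data.append(np.random.normal(loc=CENTERS[center_idx]))
        y_targets[batch_i] = LABELS[center_idx]
    return torch.tensor(np.array(x_data), dtype=torch.float32), torch.tensor(y_targets, dtype=torch.int64)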

Initial data plot

seed = 24

torch.manual_seed(seed)

torch.cuda.manual_seed_all(seed)

np.random.seed(seed)

x_data, y_truth = get_toy_data(batch_size=1000) #batch size = 1000

print(x_data)

len(x_data)

A NumPy-style array (it looks similar to a Python list, but it is a different type):

[[ 2.771182  -3.1051946]
 [ 2.3904393  1.7719052]
 [-3.1335294 -1.5222008]
 ...
 [-4.7363553 -5.493645 ]
 [-3.752318   3.4247835]
 [-3.5867133  3.925775 ]]



Training the perceptron

input_size = 2
output_size = len(set(LABELS))
num_hidden_layers = 0  # 0 hidden layers = the same as a single perceptron
hidden_size = 2  # not actually used, but specified anyway

seed = 24
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)

mlp1 = MultilayerPerceptron(input_size=input_size,
                            hidden_size=hidden_size,
                            num_hidden_layers=num_hidden_layers,
                            output_size=output_size)
print(mlp1)

batch_size = 1000  # train with this batch size

# What one linear unit does to a batch (row i = sample i, weights w1 and w2):
#   (1_x_1, 1_x_2)                (1_x_1 * w1 + 1_x_2 * w2)
#   (2_x_1, 2_x_2)  *  (w1)   =   (2_x_1 * w1 + 2_x_2 * w2)
#   (3_x_1, 3_x_2)     (w2)       (3_x_1 * w1 + 3_x_2 * w2)
#   ... up to 1000 rows in the batch
# (3 x 2) matrix @ (2 x 1) matrix => (3 x 1) matrix
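A tiny worked example of the shape rule above, using made-up numbers chosen so the arithmetic is easy to follow: each row of the result is x_1 * w1 + x_2 * w2 for that sample.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])    # 3 samples x 2 features
w = np.array([[10.0],
              [100.0]])       # 2 inputs x 1 output (w1, w2)

out = X @ w                   # (3, 2) @ (2, 1) -> (3, 1)
print(out.shape)    # (3, 1)
print(out.ravel())  # [210. 430. 650.]  e.g. 1*10 + 2*100 = 210
```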

Result visualization function

import matplotlib.pyplot as plt
import numpy as np
import torch

def visualize_results(perceptron, x_data, y_truth, n_samples=1000, ax=None, epoch=None,
                      title='', levels=[0.3, 0.4, 0.5], linestyles=['--', '-', '--']):
    # predicted class = index of the largest softmax probability
    _, y_pred = perceptron(x_data, apply_softmax=True).max(dim=1)
    y_pred = y_pred.data.numpy()

    x_data = x_data.data.numpy()
    y_truth = y_truth.data.numpy()

    n_classes = len(set(LABELS))

    all_x = [[] for _ in range(n_classes)]
    all_colors = [[] for _ in range(n_classes)]

    colors = ['orange', 'green']
    markers = ['o', '*']
    edge_color = {'o': 'orange', '*': 'green'}

    for x_i, y_pred_i, y_true_i in zip(x_data, y_pred, y_truth):
        all_x[y_true_i].append(x_i)
        if y_pred_i == y_true_i:
            # correct prediction: white (hollow) marker
            all_colors[y_true_i].append('white')
        else:
            # wrong prediction: marker filled with the true class's color
            all_colors[y_true_i].append(colors[y_true_i])

    all_x = [np.stack(x_list) for x_list in all_x]

    if ax is None:
        _, ax = plt.subplots(1, 1, figsize=(10, 10))

    for x_list, color_list, marker in zip(all_x, all_colors, markers):
        ax.scatter(x_list[:, 0], x_list[:, 1], edgecolor=edge_color[marker], marker=marker, facecolor=color_list, s=100)

    xlim = (min([x_list[:, 0].min() for x_list in all_x]),
            max([x_list[:, 0].max() for x_list in all_x]))
    ylim = (min([x_list[:, 1].min() for x_list in all_x]),
            max([x_list[:, 1].max() for x_list in all_x]))

    # Hyperplane: evaluate the model on a 30 x 30 grid and draw probability contours per class
    xx = np.linspace(xlim[0], xlim[1], 30)
    yy = np.linspace(ylim[0], ylim[1], 30)
    YY, XX = np.meshgrid(yy, xx)
    xy = np.vstack([XX.ravel(), YY.ravel()]).T

    for i in range(n_classes):
        Z = perceptron(torch.tensor(xy, dtype=torch.float32), apply_softmax=True)
        Z = Z[:, i].data.numpy().reshape(XX.shape)
        ax.contour(XX, YY, Z, colors=colors[i], levels=levels, linestyles=linestyles)

    # Extra annotations
    plt.suptitle(title)
    if epoch is not None:
        plt.text(xlim[0], ylim[1], "Epoch = {}".format(str(epoch)))
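The hyperplane part of visualize_results evaluates the model on a grid and reshapes the result back to the grid shape for ax.contour. The sketch below isolates that bookkeeping with NumPy only; the plotting range is arbitrary and fake_prob is a hypothetical stand-in for the model's softmax output, used only to show that row k of xy lines up with the matching cell of Z after the reshape.

```python
import numpy as np

xlim, ylim = (-6.0, 6.0), (-6.0, 6.0)        # arbitrary plotting range
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)                 # both (30, 30); XX[i, j] = xx[i], YY[i, j] = yy[j]
xy = np.vstack([XX.ravel(), YY.ravel()]).T   # (900, 2): one (x, y) grid point per row

# Hypothetical "probability of class 1" (NOT the trained model), just to have values to contour
fake_prob = 1.0 / (1.0 + np.exp(-xy[:, 0] * xy[:, 1]))
Z = fake_prob.reshape(XX.shape)              # back to (30, 30), aligned with XX and YY

print(xy.shape, Z.shape)  # (900, 2) (30, 30)
```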


Training attempt

import torch.nn as nn
import torch.optim as optim

losses = []
batch_size = 10000
n_batches = 10
max_epochs = 10

loss_change = 1.0
last_loss = 10.0
change_threshold = 1e-3
epoch = 0
all_imagefiles = []

lr = 0.01
optimizer = optim.Adam(params=mlp1.parameters(), lr=lr)  # Adam optimizer
cross_ent_loss = nn.CrossEntropyLoss()  # cross-entropy over the output classes (2 classes here)

def early_termination(loss_change, change_threshold, epoch, max_epochs):  # decides whether to stop training
    terminate_for_loss_change = loss_change < change_threshold
    terminate_for_epochs = epoch > max_epochs  # the comparison itself evaluates to True or False
    # note: the loss-change criterion is computed but not returned, so only the epoch limit stops training
    return terminate_for_epochs  # these are local variables (not accessible outside the function)

while not early_termination(loss_change, change_threshold, epoch, max_epochs):  # keep training until early_termination returns True
    for _ in range(n_batches):
        # Step 0: fetch data
        x_data, y_target = get_toy_data(batch_size)
        # Step 1: reset gradients
        mlp1.zero_grad()
        # Step 2: forward pass
        y_pred = mlp1(x_data).squeeze()
        # Step 3: compute the loss
        loss = cross_ent_loss(y_pred, y_target.long())
        # Step 4: backward pass
        loss.backward()  # compute gradients
        # Step 5: optimizer step
        optimizer.step()  # update the weights
        # Bookkeeping
        loss_value = loss.item()
        losses.append(loss_value)
        loss_change = abs(last_loss - loss_value)
        last_loss = loss_value
    epoch += 1  # without this the epoch limit is never reached and the loop never ends
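The early_termination function above computes a loss-change criterion but returns only the epoch criterion. If you also wanted training to stop once the loss plateaus, one possible variant is sketched below. early_termination_either and the loss values are hypothetical: the numbers are made up only to show when the rule fires and are not results from this post.

```python
def early_termination_either(loss_change, change_threshold, epoch, max_epochs):
    """Hypothetical variant: stop when the loss stops changing OR the epoch limit is exceeded."""
    terminate_for_loss_change = loss_change < change_threshold
    terminate_for_epochs = epoch > max_epochs
    return terminate_for_loss_change or terminate_for_epochs

# Made-up per-epoch losses, for illustration only
fake_losses = [0.69, 0.52, 0.40, 0.3995, 0.3994]

last_loss, epoch = 10.0, 0
for loss_value in fake_losses:
    loss_change = abs(last_loss - loss_value)
    last_loss = loss_value
    epoch += 1
    if early_termination_either(loss_change, 1e-3, epoch, max_epochs=10):
        break

print(epoch)  # 4: the change 0.40 -> 0.3995 is below the 1e-3 threshold
```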


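Training a multilayer perceptron with two layers

input_size = 2
output_size = len(set(LABELS))
num_hidden_layers = 1  # one hidden layer; with 0 hidden layers (above) the model could not learn XOR, so this shows that stacking hidden layers increases the MLP's modeling power
hidden_size = 2

MLP created (state before training)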

Training a multilayer perceptron with three layers

input_size = 2
output_size = len(set(LABELS))
num_hidden_layers = 2  # two hidden layers
hidden_size = 2

Saving the result images above to files


Model definition

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultilayerPerceptron(nn.Module):
    """Multilayer perceptron: [Linear -> activation] x num_hidden_layers, then a final Linear."""

    def __init__(self, input_size, hidden_size=2, output_size=3,
                 num_hidden_layers=1, hidden_activation=nn.Sigmoid):
        """Initialize the weights

        Args:
            input_size (int): input size
            hidden_size (int): hidden layer size
            output_size (int): output size
            num_hidden_layers (int): number of hidden layers
            hidden_activation (torch.nn.*): activation function class
        """
        super(MultilayerPerceptron, self).__init__()
        self.module_list = nn.ModuleList()

        interim_input_size = input_size
        interim_output_size = hidden_size

        for _ in range(num_hidden_layers):
            # append nn.Linear and then hidden_activation to module_list,
            # so every hidden perceptron layer is followed by an activation
            self.module_list.append(nn.Linear(interim_input_size, interim_output_size))
            self.module_list.append(hidden_activation())
            interim_input_size = interim_output_size

        self.fc_final = nn.Linear(interim_input_size, output_size)

        self.last_forward_cache = []  # intermediate outputs of the most recent forward pass

    def forward(self, x, apply_softmax=False):
        """Forward pass of the MLP

        Args:
            x (torch.Tensor): input data tensor
                x.shape should be (batch, input_dim)
            apply_softmax (bool): flag for the softmax activation
                must be False when using the cross-entropy loss
        Returns:
            the resulting tensor; tensor.shape should be (batch, output_dim)
        """
        self.last_forward_cache = []
        self.last_forward_cache.append(x.to("cpu").data.numpy())

        for module in self.module_list:  # pass through each Linear / activation module in turn
            x = module(x)
            # after every module, append x to last_forward_cache; .to("cpu").data.numpy()
            # moves the tensor from the GPU to the CPU and converts it to a NumPy array
            # (the GPU is used for computation, the CPU just for inspecting the values)
            self.last_forward_cache.append(x.to("cpu").data.numpy())

        output = self.fc_final(x)
        self.last_forward_cache.append(output.to("cpu").data.numpy())

        if apply_softmax:
            output = F.softmax(output, dim=1)

        return output
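last_forward_cache stores intermediate outputs by hand inside forward. PyTorch forward hooks (Module.register_forward_hook) are a different technique that captures the same kind of intermediate representations without editing forward. A minimal sketch on a hypothetical nn.Sequential with the same 2 -> 2 -> 2 -> 2 layout, capturing the two Sigmoid outputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical model mirroring the post's layout (2 hidden layers of size 2, Sigmoid)
model = nn.Sequential(
    nn.Linear(2, 2), nn.Sigmoid(),
    nn.Linear(2, 2), nn.Sigmoid(),
    nn.Linear(2, 2),
)

captured = []

def save_output(module, inputs, output):
    # runs right after the module's forward; keep a detached CPU copy as NumPy
    captured.append(output.detach().cpu().numpy())

handles = [m.register_forward_hook(save_output) for m in model if isinstance(m, nn.Sigmoid)]

x = torch.randn(5, 2)
logits = model(x)
for h in handles:
    h.remove()  # stop capturing after this forward pass

print([a.shape for a in captured])  # [(5, 2), (5, 2)]
```

Each captured array is a 2-dimensional hidden representation per sample, which is what makes it possible to plot the hidden space directly.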

Representation analysis

Training a multilayer perceptron with three layers

input_size = 2  # 2, because the data is 2-dimensional
output_size = len(set(LABELS))  # also 2, because the model must predict either 0 or 1
num_hidden_layers = 2  # two hidden layers
hidden_size = 2

seed = 399
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)

mlp3 = MultilayerPerceptron(input_size=input_size,  # num_hidden_layers = 2
                            hidden_size=hidden_size,  # set to 2 for representation analysis: an 8-dimensional hidden space cannot be plotted, so the hidden layers are kept 2-dimensional
                            num_hidden_layers=num_hidden_layers,
                            output_size=output_size)
print(mlp3)

batch_size = 1000

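Passing through a single perceptron (one unit) gives a single value (a scalar).

Two units give two values; that number of values is the output dimension.

Dimension: how many values (axes) something has.

Vector: an ordered collection of values, such as (1, 2).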
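The scalar-versus-vector point can be checked directly with nn.Linear; the input values below are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0]])   # one sample: a vector holding 2 values (2-dimensional)
one_unit = nn.Linear(2, 1)       # a single perceptron
two_units = nn.Linear(2, 2)      # two perceptrons side by side

print(one_unit(x).shape)   # torch.Size([1, 1]) -> one value (a scalar) per sample
print(two_units(x).shape)  # torch.Size([1, 2]) -> two values = output dimension 2
```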
