CS234 Assignment 2-4 Solution and Walkthrough
1. [Coding] Implement get_q_values_op in q3_nature.py to build the deep Q-network described in the [mnih2015human] paper.
The rest of the code carries over unchanged from the linear approximation implementation.
Run it on CPU with python q3_nature.py; it should take roughly 1 to 2 minutes. [10 points]
sol) The "Model architecture" section of the Nature paper says:
The first hidden layer convolves 32 filters of 8 X 8 with stride 4 with the input image and applies a rectifier nonlinearity.
The second hidden layer convolves 64 filters of 4 X 4 with stride 2, again followed by a rectifier nonlinearity.
This is followed by a third convolutional layer that convolves 64 filters of 3 X 3 with stride 1 followed by a rectifier.
The final hidden layer is fully-connected and consists of 512 rectifier units.
The output layer is a fully-connected linear layer with a single output for each valid action.
Roughly translated:
the first convolutional layer has 32 filters, an 8*8 kernel, stride 4, and ReLU;
the second convolutional layer has 64 filters, a 4*4 kernel, stride 2, and ReLU;
the third convolutional layer has 64 filters, a 3*3 kernel, stride 1, and ReLU;
the final hidden layer is fully-connected with 512 ReLU outputs;
and the output layer is a fully-connected linear layer with a single output per valid action.
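As a sanity check on these layer specs, we can trace the spatial dimensions through the network by hand. The sketch below assumes an 80 x 80 x 4 input (the preprocessed frame stack used in this assignment; if your input size differs, the intermediate shapes change accordingly) and the 'valid' (no padding) convolution that tf.layers.conv2d uses by default.

```python
def conv_out(size, kernel, stride):
    # Output spatial size of a 'valid' (no padding) convolution.
    return (size - kernel) // stride + 1

h = 80                 # assumed input height/width (80x80x4 state)
h = conv_out(h, 8, 4)  # conv1: 32 filters, 8x8, stride 4 -> 19x19
h = conv_out(h, 4, 2)  # conv2: 64 filters, 4x4, stride 2 -> 8x8
h = conv_out(h, 3, 1)  # conv3: 64 filters, 3x3, stride 1 -> 6x6
flat = h * h * 64      # flattened feature size fed to the 512-unit layer
print(flat)            # 2304
```

So the 512-unit fully-connected layer sees a 2304-dimensional vector under these assumptions.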
Implementing this in code gives the following:
class NatureQN(Linear):
    """
    Implementing DeepMind's Nature paper. Here are the relevant urls.
    https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf
    https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
    """

    def get_q_values_op(self, state, scope, reuse=False):
        """
        Returns Q values for all actions

        Args:
            state: (tf tensor)
                shape = (batch_size, img height, img width, nchannels)
            scope: (string) scope name, that specifies if target network or not
            reuse: (bool) reuse of variables in the scope

        Returns:
            out: (tf tensor) of shape = (batch_size, num_actions)
        """
        # this information might be useful
        num_actions = self.env.action_space.n

        ##############################################################
        """
        TODO: implement the computation of Q values like in the paper
              https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf
              https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

              you may find the section "model architecture" of the appendix of the
              nature paper particularly useful.

              store your result in out of shape = (batch_size, num_actions)

        HINT:
            - You may find the following functions useful:
                - tf.layers.conv2d
                - tf.layers.flatten
                - tf.layers.dense

            - Make sure to also specify the scope and reuse
        """
        """
        The first hidden layer convolves 32 filters of 8 X 8 with stride 4 with the
        input image and applies a rectifier nonlinearity.
        The second hidden layer convolves 64 filters of 4 X 4 with stride 2, again
        followed by a rectifier nonlinearity.
        This is followed by a third convolutional layer that convolves 64 filters of
        3 X 3 with stride 1 followed by a rectifier.
        The final hidden layer is fully-connected and consists of 512 rectifier units.
        The output layer is a fully-connected linear layer with a single output for
        each valid action.
        """
        ##############################################################
        ################ YOUR CODE HERE - 10-15 lines ################
        input = state
        with tf.variable_scope(scope, reuse=reuse):
            conv1 = tf.layers.conv2d(input, 32, (8, 8), strides=4, activation=tf.nn.relu, name='conv1')
            conv2 = tf.layers.conv2d(conv1, 64, (4, 4), strides=2, activation=tf.nn.relu, name='conv2')
            conv3 = tf.layers.conv2d(conv2, 64, (3, 3), strides=1, activation=tf.nn.relu, name='conv3')
            flat = tf.layers.flatten(conv3, name='flatten')
            fc = tf.layers.dense(flat, 512, activation=tf.nn.relu, name='fully-connected')
            out = tf.layers.dense(fc, num_actions, name='out')
        ##############################################################
        ######################## END YOUR CODE #######################
        return out
(The class declaration is included for readability; the part I actually implemented is the roughly 10 lines at the bottom, starting from input = state.)
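To get a feel for the model's size, we can count its parameters by hand. This is a back-of-the-envelope sketch, assuming the same 80 x 80 x 4 input as above (so the third conv layer outputs 6 x 6 x 64 under 'valid' padding) and, purely for illustration, num_actions = 5; in the real code num_actions comes from self.env.action_space.n.

```python
def conv_params(k, in_ch, out_ch):
    # k*k*in_ch weights per filter, plus one bias per filter.
    return k * k * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    # Weight matrix plus biases.
    return n_in * n_out + n_out

num_actions = 5  # illustrative assumption; use self.env.action_space.n in practice

total = (
    conv_params(8, 4, 32)             # conv1:      8,224
    + conv_params(4, 32, 64)          # conv2:     32,832
    + conv_params(3, 64, 64)          # conv3:     36,928
    + dense_params(6 * 6 * 64, 512)   # fc:     1,180,160
    + dense_params(512, num_actions)  # out:        2,565
)
print(total)  # 1260709
```

Over a million parameters, nearly all of them in the first fully-connected layer, which helps explain why this network trains more slowly than the linear model.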
2. (written 5pts) Attach the plot of scores, scores.png, from the directory results/q3 nature to your writeup. Compare this model with linear approximation. How do the final performances compare? How about the training time?
sol) First of all, linear approximation starts to converge faster than the DQN.
Occasionally a DQN run would even finish training with the reward stuck at 4.0, never reaching 4.1.
This is presumably because DQN training is less stable than linear approximation.
The DQN also takes longer to train.
This shows that for a test environment as simple as this one, linear approximation can actually work better.
In other words, the best model depends on the environment.