のねのBlog

I write about PC problems, issues that come up during software development, and so on. Thanks for reading ^^

ResourceExhaustedError

With batch size = 256, training failed with an error.
With batch size = 128, it also errored.
With batch size = 64, only a warning appeared.
At batch size 64: progress epoch 7 step 1 image/sec 15.1 remaining 563m, so a bit over 9 hours.

ResourceExhaustedError (see above for traceback): 
OOM when allocating tensor with shape[256,512,31,31] 
and type float on /job:localhost/replica:0/task:0/device:GPU:0 
by allocator GPU_0_bfc
[[node real_discriminator/discriminator/layer_4/lrelu/Abs (defined at pix2pix_tf/pix2pix.py:128) ]]
Hint: If you want to see a list of allocated tensors when OOM happens,
 add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
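The shape in the OOM message already explains a lot of the memory pressure: a single float32 tensor of shape [256, 512, 31, 31] is close to half a gibibyte, before counting all the other activations and gradients alive at the same time. A quick back-of-the-envelope check:

```python
# Size of one float32 activation tensor of shape [256, 512, 31, 31],
# as reported in the OOM message above.
shape = (256, 512, 31, 31)
bytes_per_float32 = 4

num_elements = 1
for dim in shape:
    num_elements *= dim

size_bytes = num_elements * bytes_per_float32
print(f"{num_elements} elements, {size_bytes / 2**20:.1f} MiB")
# → 125960192 elements, 480.5 MiB
```

Since the leading dimension is the batch size, halving the batch roughly halves this tensor, which is why dropping from 256 to 64 helps so much.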
With batch size = 128:

2019-02-09 12:42:31.421918: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 8.52GiB
2019-02-09 12:42:31.421935: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
Limit:                 11276822119
InUse:                  9153840128
MaxInUse:              10018224640
NumAllocs:                    2861
MaxAllocSize:           2685838848
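The Stats block is in raw bytes, which is hard to read at a glance; converting to GiB makes it easier to compare against the "8.52GiB in-use chunks" line above. A small helper sketch:

```python
# Convert the raw byte counts from the bfc_allocator Stats block to GiB.
stats = {
    "Limit": 11276822119,
    "InUse": 9153840128,
    "MaxInUse": 10018224640,
    "MaxAllocSize": 2685838848,
}
for name, value in stats.items():
    print(f"{name}: {value / 2**30:.2f} GiB")
```

So the allocator's limit is about 10.5 GiB (an ~11 GB card), with over 8.5 GiB already in use and a peak near 9.3 GiB; there is simply no headroom left at this batch size.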
aspect_ratio = 1.0
batch_size = 256
beta1 = 0.5
checkpoint = None
display_freq = 200
flip = False
gan_weight = 1.0
input_dir = data/train
l1_weight = 100.0
lab_colorization = False
lr = 0.0002
max_epochs = 1000
max_steps = None
mode = train
ndf = 64
ngf = 64
output_dir = log_train/sub_20190209_0932
output_filetype = png
progress_freq = 50
save_freq = 1000
scale_size = 286
seed = 1281510929
separable_conv = False
summary_freq = 100
trace_freq = 0
which_direction = AtoB
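Assuming the option dump above corresponds one-to-one to command-line flags of `pix2pix_tf/pix2pix.py` (the names match the usual pix2pix-tensorflow argparse options, but I have not verified this particular script), a rerun at the batch size that worked would look something like:

```shell
# Hypothetical rerun at the smaller batch size; flag names are assumed
# to mirror the option dump above (batch_size -> --batch_size, etc.).
python pix2pix_tf/pix2pix.py \
  --mode train \
  --input_dir data/train \
  --output_dir log_train/sub_20190209_0932 \
  --max_epochs 1000 \
  --batch_size 64
```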