Getting hl_matrix_classification_error if using trainer_config settings.batch_size > 16 #44

F0REacH · 2016-09-06T19:03:02Z

Can't run train.sh if trainer_config.py settings batch_size > 16. Getting following error:
train.log:

./train.sh
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:144] commandline: /opt/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --save_dir=./model_output --job=train --use_gpu=true --trainer_count=1 --num_passes=100000 --log_period=15 --dot_period=1 --show_parameter_stats_period=100 --test_all_data_in_one_period=1 --saving_period=100 --test_period=100
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:113] Calling runInitFunctions
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/utils/Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-06 20:10:47,439 networks.py:1122] The input order is [input, label]
[INFO 2016-09-06 20:10:47,439 networks.py:1129] The output order is [cost_0]
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/trainer/Trainer.cpp:169] trainer mode: Normal
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/dataproviders/PyDataProvider2.cpp:219] loading dataprovider dataprovider::process
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:134] Initing parameters..
I /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/gserver/gradientmachines/GradientMachine.cpp:141] Init parameters done.
F /home/user/SOFT/BAIDU/PADDLE/Paddle/paddle/cuda/src/hl_cuda_matrix.cu:322] 0x933ba8[hl_matrix_classification_error] CUDA error: invalid configuration argument
/opt/paddle/bin/paddle: line 46: 10921 Aborted (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

I'm trying to solve clasification task with LSTM model. My dataset is 180 examples, each is roughly 5000 timesteps (variable length). Each timestep is len=24 float vector labeled with int label in range [0, 132].

settings.input_types = [
    dense_vector_sequence(settings.inputSize),
    integer_value_sequence(settings.vocabSize)]

Smaller size batches eg. 12 give no error, but my data is not very redundant, so gradients become unstable. My setup is 980ti (6Gb VRAM) memory usage for batch_size=12 is ~ 20%.
trainer_config.py:
settings( batch_size=24, learning_rate=0.001, learning_method=RMSPropOptimizer() ) stacked_lstm_net(input_dim=24, class_dim=133, hid_dim=24, stacked_num=7, is_predict=is_predict)

stacked_lstm_net
# simple sequential lstm

lstm_act = TanhActivation()
fc_act = LinearActivation()

data = data_layer("input", size=input_dim)

fc1 = fc_layer(input=data, size=hid_dim, act=fc_act)
lstm1 = lstmemory(input=fc1, act=lstm_act)

inputs = [fc1, lstm1]
for i in range(2, stacked_num + 1):
    fc = fc_layer(input=inputs, size=hid_dim, act=fc_act)
    lstm = lstmemory(input=fc, act=lstm_act)
    inputs = [fc, lstm]

output = fc_layer(input=[inputs[0], inputs[1]], size=class_dim,
                  act=SoftmaxActivation())

if is_predict:
    outputs(output)
else:
    outputs(classification_cost(input=output, label=data_layer('label', class_dim)))

Could you please explain this error or point me how to debug such issue?

The text was updated successfully, but these errors were encountered:

hedaoyuan · 2016-09-07T02:44:41Z

There is a bug in hl_matrix_classification_error when the number of samples to be classified more than 65536.The data each is roughly 5000 timesteps, and if batch_size = 16, than the number of samples passed into this API is 5000x16 (> 65536).
We will fix this bug as soon. Before this may be you can reduce the size of the batch_size to ensure the program runs correctly.

F0REacH · 2016-09-08T01:30:50Z

@hedaoyuan Thanks for clarifying. Looks like I've found another bug here: Issue #46 .

F0REacH · 2016-09-08T11:24:37Z

Looks like Pull #48 fixed this issue. Thanks for the quick fix

refine details of bmn model

* unifiy constant instruction to support N-D array * update template typename

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

* 1. add interface for fft; 2. add data type predicate; 3. fix paddle.roll. * add fft c2c cufft kernel * implement argument checking & op calling parts for fft_c2c and fftn_c2c * add operator and opmaker definitions * only register float and double for cpu. * add common code for implementing FFT, add pocketfft as a dependency * add fft c2c cufft kernel function * fix bugs in python interface * add support for c2r, r2c operators, op makers, kernels and kernel functors. * test and fix bugs * 1. fft_c2c function: add support for onesided=False; 2. add complex<float>, complex<double> support for concat and flip. * 1. fft: fix python api bugs; 2. shape_op: add support for complex data types. * fft c2c cufft kernel done with complie and link * fix shape_op, add mkl placeholder * remove mkl * complete fft c2c in gpu * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft; 2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation. * complete fft c2c on gpu in ND * complete fft c2c on gpu in ND * complete fft c2c backward in ND * fix MKL-based implementation * Add frame op and CPU/GPU kernels. * Add frame op forward unittest. * Add frame op forward unittest. * Remove axis parameter in FrameFunctor. * Add frame op grad CPU/GPU kernels and unittest. * Add frame op grad CPU/GPU kernels and unittest. * Update doc string. * Update after review and remove librosa requirement in unittest. * Update grad kernel. * add fft_c2r op * Remove data allocation in TransCompute function. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * last fft c2r functor * fix C2R and R2C for cufft, becase the direction is not an option in these cases. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * fix bugs in python APIs * fix fft_c2r grad kernal * fix bugs in python APIs * add cuda fft c2r grad kernal functor * clean code * fix fft_c2r python API * fill fft r2c result with conjugate symmetry (#19) fill fft r2c result with conjugate symmetry * add placeholder for unittests (#24) * simple parameterize test function by auto generate test case from parm list (#25) * miscellaneous fixes for python APIs (#26) * add placeholder for unittests * resize fft inputs before computation is n or s is provided. * add complex kernels for pad and pad_grad * simplify argument checking. * add type promotion * add int to float or complex promotion * fix output data type for static mode * fix fft's input dtype dispatch, import fft to paddle * fix typos in axes checking (#27) * fix typos in axes checking * fix argument checking (#28) * fix argument checking * Add C2R Python layer normal and abnormal use cases (#29) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30) * Documentation of the common interfaces of c2r and c2c (#31) * Documentation of the common interfaces of c2r and c2c * clean c++ code (#32) * clean code * Add numpy-based implementation of spectral ops (#33) * add numpy reference implementation of spectral ops * Add fft_c2r numpy based implementation for unittest. (#34) * add fft_c2r numpy implementation * Add deframe op and stft/istft api. (#23) * Add frame api * Add deframe op and kernels. * Add stft and istft apis. * Add deframe api. Update stft and istft apis. * Fix bug in frame_from_librosa function when input dims >= 3 * Rename deframe to overlap_add. * Update istft. * Update after code review. * Add overlap_add op and stft/istft api unittest (#35) * Add overlap_add op unittest. * Register complex kernels of squeeze/unsquuze op. * Add stft/istft api unittest. * Add unittest for fft helper functions (#36) * add unittests for fft helper functions. add complex kernel for roll op. * complete static graph unittest for all public api (#37) * Unittest of op with FFT C2C, C2R and r2c added (#38) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * Documentation of the common interfaces of c2r and c2c * Unittest of op with FFT C2C, C2R and r2c added Co-authored-by: lijiaqi <[email protected]> * add fft related options to CMakeLists.txt * fix typos and clean code (#39) * fix invisible character in mkl branch and fix error in error message * clean code: remove docstring from unittest for signal.py. * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40) * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41) 1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. 2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r; 3. fix unittest to catch UnImplementedError and RuntimeError; 4. fix compile error by avoid using thrust when cuda is not available. 5. fix sample code, use paddle.fft instead of paddle.tensor.fft * remove inclusion of thrust, add __all__ list for fft (#42) * Add api doc and update unittest. (#43) * Add doc strings. * Update overlap_add op unittest * fix MKL-based FFT implementation (#44) * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R * remove code for debug (#45) * use dynload for cufft (#46) * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms. * add complex support for fill_zeros_like * use dynload for cufft * Update doc and unittest. (#47) * Add doc of frame op and overlap_add op. * Update unittest. * use dynload for cufft (#48) 1. use dynload for cufft 2. fix unittest; 3. temporarily disable Rocm. * fix conflicts and merge upstream (#49) fix conflicts and merge upstream * fix compile error: only link dyload_cuda when cuda is available (#50) * fix compile error: only link dyload_cuda when cuda is available * fix dynload for cufft on windows (#51) 1. fix dynload for cufft on windows; 2. fix unittests. * add NOMINMAX to compile on windows (#52) add NOMINMAX to compile on windows * explicitly specify capture mode for lambdas (#55) explicitly specify capture mode for lambdas * fix fft sample (#53) * fix fft sample * update scipy and numpy version for unittests of fft (#56) update scipy and numpy version for unittests of fft * Add static graph unittests of frame and overlap_add api. (#57) * Remove cache of cuFFT & Disable ONEMKL (#59) 1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm 2. remove cache of cufft plans; 3. enhance error checking. 4. default WITH_ONEMKL to OFF Co-authored-by: jeff41404 <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: KP <[email protected]> Co-authored-by: lijiaqi <[email protected]> Co-authored-by: Xiaoxu Chen <[email protected]> Co-authored-by: lijiaqi0612 <[email protected]>

* 1. add interface for fft; 2. add data type predicate; 3. fix paddle.roll. * add fft c2c cufft kernel * implement argument checking & op calling parts for fft_c2c and fftn_c2c * add operator and opmaker definitions * only register float and double for cpu. * add common code for implementing FFT, add pocketfft as a dependency * add fft c2c cufft kernel function * fix bugs in python interface * add support for c2r, r2c operators, op makers, kernels and kernel functors. * test and fix bugs * 1. fft_c2c function: add support for onesided=False; 2. add complex<float>, complex<double> support for concat and flip. * 1. fft: fix python api bugs; 2. shape_op: add support for complex data types. * fft c2c cufft kernel done with complie and link * fix shape_op, add mkl placeholder * remove mkl * complete fft c2c in gpu * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft; 2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation. * complete fft c2c on gpu in ND * complete fft c2c on gpu in ND * complete fft c2c backward in ND * fix MKL-based implementation * Add frame op and CPU/GPU kernels. * Add frame op forward unittest. * Add frame op forward unittest. * Remove axis parameter in FrameFunctor. * Add frame op grad CPU/GPU kernels and unittest. * Add frame op grad CPU/GPU kernels and unittest. * Update doc string. * Update after review and remove librosa requirement in unittest. * Update grad kernel. * add fft_c2r op * Remove data allocation in TransCompute function. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * last fft c2r functor * fix C2R and R2C for cufft, becase the direction is not an option in these cases. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * fix bugs in python APIs * fix fft_c2r grad kernal * fix bugs in python APIs * add cuda fft c2r grad kernal functor * clean code * fix fft_c2r python API * fill fft r2c result with conjugate symmetry (#19) fill fft r2c result with conjugate symmetry * add placeholder for unittests (#24) * simple parameterize test function by auto generate test case from parm list (#25) * miscellaneous fixes for python APIs (#26) * add placeholder for unittests * resize fft inputs before computation is n or s is provided. * add complex kernels for pad and pad_grad * simplify argument checking. * add type promotion * add int to float or complex promotion * fix output data type for static mode * fix fft's input dtype dispatch, import fft to paddle * fix typos in axes checking (#27) * fix typos in axes checking * fix argument checking (#28) * fix argument checking * Add C2R Python layer normal and abnormal use cases (#29) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (PaddlePaddle#30) * Documentation of the common interfaces of c2r and c2c (PaddlePaddle#31) * Documentation of the common interfaces of c2r and c2c * clean c++ code (PaddlePaddle#32) * clean code * Add numpy-based implementation of spectral ops (PaddlePaddle#33) * add numpy reference implementation of spectral ops * Add fft_c2r numpy based implementation for unittest. (PaddlePaddle#34) * add fft_c2r numpy implementation * Add deframe op and stft/istft api. (#23) * Add frame api * Add deframe op and kernels. * Add stft and istft apis. * Add deframe api. Update stft and istft apis. * Fix bug in frame_from_librosa function when input dims >= 3 * Rename deframe to overlap_add. * Update istft. * Update after code review. * Add overlap_add op and stft/istft api unittest (PaddlePaddle#35) * Add overlap_add op unittest. * Register complex kernels of squeeze/unsquuze op. * Add stft/istft api unittest. * Add unittest for fft helper functions (PaddlePaddle#36) * add unittests for fft helper functions. add complex kernel for roll op. * complete static graph unittest for all public api (PaddlePaddle#37) * Unittest of op with FFT C2C, C2R and r2c added (PaddlePaddle#38) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * Documentation of the common interfaces of c2r and c2c * Unittest of op with FFT C2C, C2R and r2c added Co-authored-by: lijiaqi <[email protected]> * add fft related options to CMakeLists.txt * fix typos and clean code (PaddlePaddle#39) * fix invisible character in mkl branch and fix error in error message * clean code: remove docstring from unittest for signal.py. * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (PaddlePaddle#40) * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (PaddlePaddle#41) 1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. 2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r; 3. fix unittest to catch UnImplementedError and RuntimeError; 4. fix compile error by avoid using thrust when cuda is not available. 5. fix sample code, use paddle.fft instead of paddle.tensor.fft * remove inclusion of thrust, add __all__ list for fft (PaddlePaddle#42) * Add api doc and update unittest. (PaddlePaddle#43) * Add doc strings. * Update overlap_add op unittest * fix MKL-based FFT implementation (PaddlePaddle#44) * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R * remove code for debug (PaddlePaddle#45) * use dynload for cufft (PaddlePaddle#46) * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms. * add complex support for fill_zeros_like * use dynload for cufft * Update doc and unittest. (PaddlePaddle#47) * Add doc of frame op and overlap_add op. * Update unittest. * use dynload for cufft (PaddlePaddle#48) 1. use dynload for cufft 2. fix unittest; 3. temporarily disable Rocm. * fix conflicts and merge upstream (PaddlePaddle#49) fix conflicts and merge upstream * fix compile error: only link dyload_cuda when cuda is available (PaddlePaddle#50) * fix compile error: only link dyload_cuda when cuda is available * fix dynload for cufft on windows (PaddlePaddle#51) 1. fix dynload for cufft on windows; 2. fix unittests. * add NOMINMAX to compile on windows (PaddlePaddle#52) add NOMINMAX to compile on windows * explicitly specify capture mode for lambdas (PaddlePaddle#55) explicitly specify capture mode for lambdas * fix fft sample (PaddlePaddle#53) * fix fft sample * update scipy and numpy version for unittests of fft (PaddlePaddle#56) update scipy and numpy version for unittests of fft * Add static graph unittests of frame and overlap_add api. (PaddlePaddle#57) * Remove cache of cuFFT & Disable ONEMKL (PaddlePaddle#59) 1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm 2. remove cache of cufft plans; 3. enhance error checking. 4. default WITH_ONEMKL to OFF Co-authored-by: jeff41404 <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: KP <[email protected]> Co-authored-by: lijiaqi <[email protected]> Co-authored-by: Xiaoxu Chen <[email protected]> Co-authored-by: lijiaqi0612 <[email protected]>

add Bind to Tensor and Buffer

[DOC] Add CXX and Python API Doc

* fix config does not take effect; test=develop * remove useless code;test=develop

* [GPUPS]Fix psgpuwrapper initialization (PaddlePaddle#44468) * Update ps_gpu_wrapper.h * Update ps_gpu_wrapper.h * Update ps_gpu_wrapper.cc * remote Optimizer base Class * remove feature value * remove featurevalue base class * fix hbm_thread_pool&pull_thread_pool

* refine code * test * fix apps * update readme * rm unused code * fix apps output when input is image * clean code * update requirements.txt

* [GPUPS]Fix psgpuwrapper initialization (PaddlePaddle#44468) * Update ps_gpu_wrapper.h * Update ps_gpu_wrapper.h * Update ps_gpu_wrapper.cc * remote Optimizer base Class * remove feature value * remove featurevalue base class * fix hbm_thread_pool&pull_thread_pool

1. add trade weight support 2. remove pull push template struct 3. add credit support

[MTAI-484] fix(build): optimize new files for MUSA

rename StmtIter to StmtPtr

add accum_freqs&&base match

emailweixu assigned hedaoyuan Sep 7, 2016

hedaoyuan added the Bug label Sep 7, 2016

F0REacH mentioned this issue Sep 8, 2016

Floating point exception (overflow) #46

Closed

F0REacH closed this as completed Sep 8, 2016

F0REacH mentioned this issue Sep 10, 2016

New LSTM error! #56

Closed

dzhwinter mentioned this issue Dec 10, 2017

在运行例子时报错 Illegal instruction (core dumped), #6432

Closed

qingqing01 pushed a commit to qingqing01/Paddle that referenced this issue Apr 30, 2020

Merge pull request PaddlePaddle#44 from huangjun12/refine-bmn

c54980b

refine details of bmn model

DemoMoon mentioned this issue Mar 24, 2021

oneDNN 如何能提升DeepSpeech的语音处理性能 #31838

Closed

thisjiang pushed a commit to thisjiang/Paddle that referenced this issue Sep 3, 2021

unifiy constant instruction to support N-D array (PaddlePaddle#44)

e7f279f

* unifiy constant instruction to support N-D array * update template typename

KPatr1ck pushed a commit to KPatr1ck/Paddle that referenced this issue Sep 16, 2021

fix MKL-based FFT implementation (PaddlePaddle#44)

b3d5f13

* fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R

thisjiang pushed a commit to thisjiang/Paddle that referenced this issue Oct 28, 2021

Merge pull request PaddlePaddle#44 from Superjomn/refine/buffer

d9a1874

add Bind to Tensor and Buffer

gglin001 pushed a commit to graphcore/Paddle-fork that referenced this issue Dec 8, 2021

fix ipu_backend setcope with Paddle inference (PaddlePaddle#44)

ee3af7d

wangxicoding pushed a commit to wangxicoding/Paddle that referenced this issue Dec 9, 2021

fix save steps arg bug (PaddlePaddle#44)

dd98f7f

paddle-bot-old bot referenced this issue Feb 24, 2022

remove a fault.

b7f2705

zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this issue May 23, 2022

Merge pull request PaddlePaddle#44 from qili93/add_api

ab80d89

[DOC] Add CXX and Python API Doc

danleifeng added a commit to danleifeng/Paddle that referenced this issue Jun 22, 2022

fix config (PaddlePaddle#44)

d5011c7

* fix config does not take effect; test=develop * remove useless code;test=develop

AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this issue Sep 19, 2022

Refine code (PaddlePaddle#44)

3e4a59f

* refine code * test * fix apps * update readme * rm unused code * fix apps output when input is image * clean code * update requirements.txt

jack603047588 referenced this issue in jack603047588/Paddle Nov 9, 2022

Merge pull request #44 from qingshui/paddlebox

9a5cd90

1. add trade weight support 2. remove pull push template struct 3. add credit support

lizexu123 pushed a commit to lizexu123/Paddle that referenced this issue Feb 23, 2024

Remove redundant code in paddleslim/prune (PaddlePaddle#44)

d95da67

hanhaowen-mt pushed a commit to hanhaowen-mt/Paddle that referenced this issue Feb 29, 2024

Merge pull request PaddlePaddle#44 from mthreads/new_files

db90713

[MTAI-484] fix(build): optimize new files for MUSA

kivenyangming mentioned this issue Mar 5, 2024

根据官网教程在因特尔CPU以及昆仑2代芯片编译安装XPU出现 Built target extern_mkldnn 错误 #62398

Closed

feifei-111 pushed a commit to feifei-111/Paddle that referenced this issue Mar 10, 2024

Merge pull request PaddlePaddle#44 from tc20042008/xk-cinn-trivalop-fuse

6be51d0

rename StmtIter to StmtPtr

WAYKEN-TSE pushed a commit to WAYKEN-TSE/Paddle that referenced this issue Dec 6, 2024

Merge pull request PaddlePaddle#44 from zhiboniu/develop

03d5ec8

add accum_freqs&&base match

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting hl_matrix_classification_error if using trainer_config settings.batch_size > 16 #44

Getting hl_matrix_classification_error if using trainer_config settings.batch_size > 16 #44

F0REacH commented Sep 6, 2016

hedaoyuan commented Sep 7, 2016

F0REacH commented Sep 8, 2016

F0REacH commented Sep 8, 2016

Getting hl_matrix_classification_error if using trainer_config settings.batch_size > 16 #44

Getting hl_matrix_classification_error if using trainer_config settings.batch_size > 16 #44

Comments

F0REacH commented Sep 6, 2016

hedaoyuan commented Sep 7, 2016

F0REacH commented Sep 8, 2016

F0REacH commented Sep 8, 2016