
Connecting Conv1d layer with LSTM Layer. #1052

Open · svs11 opened this issue Aug 13, 2024 · 3 comments
svs11 commented Aug 13, 2024

I’m afraid that hls4ml is not properly flattening the tensor between a conv1d layer and an LSTM layer.
For the network generated from the following Keras code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Flatten, Dense, TimeDistributed

def create_model():
    model = Sequential()
    model.add(Conv1D(8, 3, padding='same', activation='relu', input_shape=(16, 1)))
    model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
    model.add(Conv1D(16, 3, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
    model.add(Conv1D(32, 3, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))

    model.add(LSTM(8, return_sequences=True))
    model.add(LSTM(8, return_sequences=True))

    model.add(Dense(1))

    model_json = model.to_json()
    with open("lstm_CNN_model.json", "w") as json_file:
        json_file.write(model_json)
    model.save("lstm_CNN.h5")
    print("Model saved to lstm_CNN_model.json and lstm_CNN.h5.")
    return model

# Create the model
model = create_model()
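
For reference, here is how I would annotate the expected per-layer output shapes (batch dimension omitted; derived from the layer parameters above rather than captured from a run):

model.summary()
# Expected output shapes (time steps, channels):
#   conv1d   -> (16, 8)    max_pooling1d   -> (8, 8)
#   conv1d_1 -> (8, 16)    max_pooling1d_1 -> (4, 16)
#   conv1d_2 -> (4, 32)    max_pooling1d_2 -> (2, 32)  <- feeds the first LSTM
#   lstm     -> (2, 8)     lstm_1          -> (2, 8)
#   dense    -> (2, 1)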

I see the following generated C++ code:

#include <iostream>

#include "network_64_4_64_2_32_2_32_ru.h"
#include "parameters.h"

void network_64_4_64_2_32_2_32_ru(
    hls::stream<input_t> &conv1d_input,
    hls::stream<result_t> &layer15_out
) {

    // hls-fpga-machine-learning insert IO
    #pragma HLS INTERFACE axis port=conv1d_input,layer15_out
    #pragma HLS DATAFLOW

#ifndef __SYNTHESIS__
    static bool loaded_weights = false;
    if (!loaded_weights) {
        // hls-fpga-machine-learning insert load weights
        nnet::load_weights_from_txt<model_default_t, 24>(w2, "w2.txt");
        nnet::load_weights_from_txt<model_default_t, 8>(b2, "b2.txt");
        nnet::load_weights_from_txt<model_default_t, 384>(w5, "w5.txt");
        nnet::load_weights_from_txt<model_default_t, 16>(b5, "b5.txt");
        nnet::load_weights_from_txt<model_default_t, 1536>(w8, "w8.txt");
        nnet::load_weights_from_txt<model_default_t, 32>(b8, "b8.txt");
        nnet::load_weights_from_txt<model_default_t, 1024>(w11, "w11.txt");
        nnet::load_weights_from_txt<model_default_t, 256>(wr11, "wr11.txt");
        nnet::load_weights_from_txt<model_default_t, 32>(b11, "b11.txt");
        nnet::load_weights_from_txt<model_default_t, 32>(br11, "br11.txt");
        nnet::load_weights_from_txt<model_default_t, 256>(w12, "w12.txt");
        nnet::load_weights_from_txt<model_default_t, 256>(wr12, "wr12.txt");
        nnet::load_weights_from_txt<model_default_t, 32>(b12, "b12.txt");
        nnet::load_weights_from_txt<model_default_t, 32>(br12, "br12.txt");
        nnet::load_weights_from_txt<model_default_t, 8>(w15, "w15.txt");
        nnet::load_weights_from_txt<model_default_t, 1>(b15, "b15.txt");
        loaded_weights = true;
    }
#endif

    // ****************************************
    // NETWORK INSTANTIATION
    // ****************************************

    // hls-fpga-machine-learning insert layers

    hls::stream<layer16_t> layer16_out("layer16_out");
    #pragma HLS STREAM variable=layer16_out depth=18
    nnet::zeropad1d_cl<input_t, layer16_t, config16>(conv1d_input, layer16_out); // zp1d_conv1d

    hls::stream<layer2_t> layer2_out("layer2_out");
    #pragma HLS STREAM variable=layer2_out depth=16
    nnet::conv_1d_cl<layer16_t, layer2_t, config2>(layer16_out, layer2_out, w2, b2); // conv1d

    hls::stream<layer3_t> layer3_out("layer3_out");
    #pragma HLS STREAM variable=layer3_out depth=16
    nnet::relu<layer2_t, layer3_t, relu_config3>(layer2_out, layer3_out); // conv1d_relu

    hls::stream<layer4_t> layer4_out("layer4_out");
    #pragma HLS STREAM variable=layer4_out depth=8
    nnet::pooling1d_cl<layer3_t, layer4_t, config4>(layer3_out, layer4_out); // max_pooling1d

    hls::stream<layer17_t> layer17_out("layer17_out");
    #pragma HLS STREAM variable=layer17_out depth=10
    nnet::zeropad1d_cl<layer4_t, layer17_t, config17>(layer4_out, layer17_out); // zp1d_conv1d_1

    hls::stream<layer5_t> layer5_out("layer5_out");
    #pragma HLS STREAM variable=layer5_out depth=8
    nnet::conv_1d_cl<layer17_t, layer5_t, config5>(layer17_out, layer5_out, w5, b5); // conv1d_1

    hls::stream<layer6_t> layer6_out("layer6_out");
    #pragma HLS STREAM variable=layer6_out depth=8
    nnet::relu<layer5_t, layer6_t, relu_config6>(layer5_out, layer6_out); // conv1d_1_relu

    hls::stream<layer7_t> layer7_out("layer7_out");
    #pragma HLS STREAM variable=layer7_out depth=4
    nnet::pooling1d_cl<layer6_t, layer7_t, config7>(layer6_out, layer7_out); // max_pooling1d_1

    hls::stream<layer18_t> layer18_out("layer18_out");
    #pragma HLS STREAM variable=layer18_out depth=6
    nnet::zeropad1d_cl<layer7_t, layer18_t, config18>(layer7_out, layer18_out); // zp1d_conv1d_2

    hls::stream<layer8_t> layer8_out("layer8_out");
    #pragma HLS STREAM variable=layer8_out depth=4
    nnet::conv_1d_cl<layer18_t, layer8_t, config8>(layer18_out, layer8_out, w8, b8); // conv1d_2

    hls::stream<layer9_t> layer9_out("layer9_out");
    #pragma HLS STREAM variable=layer9_out depth=4
    nnet::relu<layer8_t, layer9_t, relu_config9>(layer8_out, layer9_out); // conv1d_2_relu

    hls::stream<layer10_t> layer10_out("layer10_out");
    #pragma HLS STREAM variable=layer10_out depth=2
    nnet::pooling1d_cl<layer9_t, layer10_t, config10>(layer9_out, layer10_out); // max_pooling1d_2

    hls::stream<layer11_t> layer11_out("layer11_out");
    #pragma HLS STREAM variable=layer11_out depth=2
    nnet::lstm_stack<layer10_t, layer11_t, config11>(layer10_out, layer11_out, w11, wr11, b11, br11); // lstm

    hls::stream<layer12_t> layer12_out("layer12_out");
    #pragma HLS STREAM variable=layer12_out depth=2
    nnet::lstm_stack<layer11_t, layer12_t, config12>(layer11_out, layer12_out, w12, wr12, b12, br12); // lstm_1

    nnet::pointwise_conv_1d_cl<layer12_t, result_t, config15>(layer12_out, layer15_out, w15, b15); // dense

}

I expected the number of input-kernel weights in the first LSTM layer to cover all outputs of the last conv1d+pooling stage: 2 samples × 32 channels × 4 gates × 8 states = 2048. The generated C++ instead shows only 1024 (w11).

How should I connect a pooling layer to an LSTM layer to guarantee that all outputs are conveyed?

svs11 added the bug label Aug 13, 2024
JanFSchulte (Contributor) commented:

Hi!

I had a look at your model, and just printing the trainable weights for the first LSTM layer, I see

[<tf.Variable 'lstm/lstm_cell/kernel:0' shape=(32, 32) dtype=float32, numpy=
array([[ 0.01692209, -0.22750317, -0.17461008, ..., -0.1579345 ,
        -0.21596879, -0.18585742],
       [-0.00184596,  0.1575419 ,  0.20252898, ...,  0.14971459,
         0.17116585, -0.04111549],
       [-0.24567568,  0.01723912, -0.15928173, ..., -0.20553797,
         0.22376046, -0.24837291],
       ...,
       [-0.20332155,  0.06006312,  0.06557494, ...,  0.10808015,
        -0.2113491 , -0.05491558],
       [-0.27010858,  0.10658553, -0.13689941, ...,  0.2040728 ,
        -0.14297459,  0.2779071 ],
       [-0.29793295, -0.13058276,  0.01223576, ..., -0.02761602,
        -0.27836597,  0.1290856 ]], dtype=float32)>,

for the kernel weights, which have size 32 × 32 = 1024. So hls4ml is correctly inferring the size of the weight tensor. I think there is a misunderstanding of the expected size here: in Keras, the LSTM input kernel has shape (input_dim, 4 × units) = (32, 4 × 8) = (32, 32), so the number of samples (time steps) does not enter the weight shapes at all; it only determines how many times the cell equations are applied. See, for example, https://medium.com/analytics-vidhya/demystifying-lstm-weights-and-biases-dimensions-c47dbd39b30a.
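
If useful, the shapes can be checked directly from the saved model. A minimal sketch, assuming the default layer name 'lstm' and the lstm_CNN.h5 file written by the code above:

import tensorflow as tf

# Load the saved model (compile=False since no training config was saved)
# and inspect the first LSTM layer's trainable weights.
model = tf.keras.models.load_model("lstm_CNN.h5", compile=False)
kernel, recurrent_kernel, bias = model.get_layer("lstm").get_weights()

print(kernel.shape)            # (32, 32): input_dim = 32 channels, 4 gates * 8 units
print(recurrent_kernel.shape)  # (8, 32):  8 units, 4 gates * 8 units
print(bias.shape)              # (32,):    4 gates * 8 units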

svs11 (Author) commented Aug 14, 2024

Thank you for your reply!

After reading the page linked in your response, I see that my terminology might be incorrect. What I am assuming, or rather what I want, is for the CNN+pooling stack to provide a sequence length of 1 and an embedding dimension of samples × channels. We are building latency-constrained models and cannot afford to invoke the LSTM equations multiple times per forward pass. In other words, we want to flatten the tensor feeding the LSTM layer into a single embedding vector. Is this possible?

Thank you!
-Suyash

JanFSchulte (Contributor) commented:

Hi Suyash,

I don't think something like this is supported in hls4ml at the moment. AFAIK, our implementation keeps the structure of iterating over the time steps to calculate the results. I presume it would be possible to add an optional version that flattens the inputs (Flatten layers are supported) and processes the full calculation in one go. People who are more expert on the implementation of LSTM in hls4ml can correct me, but I think this would require some development.
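
If a single LSTM invocation is the goal, one possible workaround on the Keras side is to collapse the (2, 32) conv/pool output into one time step before conversion. This is a sketch only, under the assumption that hls4ml's streaming backend accepts a Reshape in this position, which I have not verified:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Reshape, LSTM, Dense

model = Sequential()
model.add(Conv1D(8, 3, padding='same', activation='relu', input_shape=(16, 1)))
model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
model.add(Conv1D(16, 3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
model.add(Conv1D(32, 3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
model.add(Reshape((1, 2 * 32)))            # one time step, embedding dimension 64
model.add(LSTM(8, return_sequences=True))  # cell equations run once; kernel shape (64, 32)
model.add(Dense(1))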
