QUESTION (or Issue in documentation): X and Y arrays for trainr function #29

Open
chaitanyabjoshi opened this issue Sep 18, 2017 · 7 comments

Comments

@chaitanyabjoshi

The documentation for the trainr function of the package says

Y array of output values, dim 1: samples (must be equal to dim 1 of X), dim 2: time (must be equal to dim 2 of X), dim 3: variables (could be 1 or more, if a matrix, will be coerce to array)
X array of input values, dim 1: samples, dim 2: time, dim 3: variables (could be 1 or more, if a matrix, will be coerce to array)

What exactly are samples, time, and variables in these dimensions? Is this layout intended for time series? How can I use my existing time series data for prediction?

@franciszmy

Same question here. The package is really confusing to use; could you give more examples of how to use it?

@faltinl

faltinl commented Dec 3, 2017

Here is an example which unfortunately does NOT work: calling

model <- trainr(Y=Y(a:dim1, 1), X=(train:dim1, 1, 2:dim3) ... seq_to_seq_unsync=T, ...)

(i.e. dim2=1 for both X and Y as required, and 0<train<<dim1=1000, dim3=20) produces the error message

Error in store[, 1:dim(X)[2], ] = X : incorrect number of subscripts

which I cannot explain at all.

Addendum: As can be seen, I have a data set of 1000 samples with 20 elements each in X and I want to train a 20-to-1 network on (1000-train+1) elements in order to classify the sets into 3 classes, defined by target values in Y.

@starmessage

Same question here.
I have a dataset in a dataframe with historical stock data. The dataframe columns are:
openPercent, highPercent, lowPercent, closePercent, volumeNormalized, buySignal, sellSignal.
The last two columns are to be used as outputs.
All my attempts to feed this data to the trainr function failed.
Please provide more information on the expected format for X and Y.

@faltinl

faltinl commented Jan 15, 2018

Hi,
In the meantime I have switched from the rnn package, which is not supported, to KerasR. It is an awfully tedious procedure, and specifying tensor dimensions within a network etc. is not at all straightforward, but it ultimately opens up much more flexibility and many more programming options, so I really recommend it. Perhaps the single most important detail is NOT to install KerasR from CRAN but from GitHub with the development tools; see https://keras.rstudio.com/ for more details. This will get you the most recent version. The official CRAN version I happened to install at first did not contain the most recent version of one of the sub-packages, which led to mysterious error messages...

@ulsmanikanta

Hi @faltinl,

Were you able to get your dataset of 1000 samples working with a 20-to-1 network using the rnn package?

@faltinl

faltinl commented May 12, 2018

No, sorry. I don't use this package any more, as I explained in my post above.

@DimitriF
Contributor

To compare with the documentation of keras (which I am now also using):
"
Input shapes

3D tensor with shape (batch_size, timesteps, input_dim), (Optional) 2D tensors with shape (batch_size, output_dim).
"

Similarly, for the rnn package you cannot train the model with a formula approach, i.e. x and y must be supplied and their dimensions must make sense: (sample, time steps, variable).

  • sample is the number of observations you have in your dataset, the same as for classical training with other algorithms in R
  • time step is the number of time steps in each observation; there is no ambiguity here
  • variable is input_dim in keras, i.e. the number of input/output units.

What is important to understand is that the network sees a 3D shape, not a 2D one as in classical modeling in R, so you must think in 3D. Being comfortable with the dimensions of your dataset, and with how they make sense for what you want your neural network to do, is mandatory for training it. As @faltinl mentioned, it is the same in keras, where you need to specify the tensor dimensions. In the rnn package, we tried to infer them from the inputs and to emit warnings when mismatches are found. It is still not perfect, though, and more documentation could help.
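To make the (sample, time step, variable) layout concrete, here is a minimal base-R sketch. It assumes a made-up univariate series cut into non-overlapping samples of 4 time steps each; the series, the window length and the variable names are illustrative only and do not come from the package.

```r
# Hypothetical univariate series of 12 observations
series <- 1:12

n_time    <- 4                        # time steps per sample (assumed)
n_samples <- length(series) / n_time  # 3 non-overlapping samples

# byrow = TRUE puts one sample per row and one time step per column
X_mat <- matrix(series, nrow = n_samples, ncol = n_time, byrow = TRUE)

# Add the third (variable) dimension explicitly: a single variable
X <- array(X_mat, dim = c(n_samples, n_time, 1))

dim(X)     # 3 4 1  -> (sample, time step, variable)
X[2, , 1]  # the time steps of the second sample: 5 6 7 8
```

Whether non-overlapping windows are the right way to cut a given series is a modelling choice; the point here is only how the three dimensions line up.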

In the case of @faltinl's example with dim2=1, there is only one time step, which is not what you want if you are using an RNN. This case is not caught by the package, hence the unhelpful error message.
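One thing worth checking in a call like that one is what the subsetting does to the array before it reaches trainr. The sketch below uses random stand-in data with the sizes from the report (1000 samples, 1 time step, 20 variables) and a hypothetical value for train. It only illustrates base-R subsetting behavior; it is not a confirmed diagnosis of the error above.

```r
set.seed(42)

# Hypothetical data with the reported sizes: (sample, time step, variable)
X <- array(rnorm(1000 * 1 * 20), dim = c(1000, 1, 20))

train <- 101  # hypothetical index of the first training row

# Default subsetting silently drops the time dimension of extent 1
X_dropped <- X[train:1000, 1, 2:20]
dim(X_dropped)  # 900 19 -> a plain matrix, no longer 3D

# drop = FALSE keeps all three dimensions
X_kept <- X[train:1000, 1, 2:20, drop = FALSE]
dim(X_kept)     # 900 1 19
```

Calling dim() on the exact objects passed as X and Y is a cheap way to confirm they still have three dimensions.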

In the case of @starmessage's dataset, I believe you have only one observation, with 7 variables in input and 2 in output. Assuming you have 1000 rows in that data frame, the X dimension will be c(1,1000,7) and the Y dimension c(1,1000,2). The functions array, aperm and dim are very useful for re-dimensioning. I am not entirely sure we tried this, though, and note that R drops dimensions of extent 1 when subsetting unless drop=FALSE is set...
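A rough sketch of that re-dimensioning with array and aperm follows. The data frame is a random stand-in that only reuses the column names from the question, and putting all 7 columns into X simply follows the dimensions given above; whether the two signal columns belong among the inputs is a modelling decision this sketch does not settle.

```r
set.seed(1)
n <- 1000

# Hypothetical stand-in for the data frame described in the question
df <- data.frame(
  openPercent = rnorm(n), highPercent = rnorm(n), lowPercent = rnorm(n),
  closePercent = rnorm(n), volumeNormalized = runif(n),
  buySignal = rbinom(n, 1, 0.1), sellSignal = rbinom(n, 1, 0.1)
)
out_cols <- c("buySignal", "sellSignal")

X_mat <- as.matrix(df)              # 1000 x 7: rows are time steps
Y_mat <- as.matrix(df[, out_cols])  # 1000 x 2

# Prepend a sample dimension of extent 1: (sample, time step, variable)
X <- array(X_mat, dim = c(1, nrow(X_mat), ncol(X_mat)))
Y <- array(Y_mat, dim = c(1, nrow(Y_mat), ncol(Y_mat)))
dim(X)  # 1 1000 7
dim(Y)  # 1 1000 2

# aperm swaps dimensions, e.g. to treat every row as its own sample
X_alt <- aperm(X, c(2, 1, 3))
dim(X_alt)  # 1000 1 7
```

Arrays built this way have the shapes described for the X and Y arguments of trainr; the remaining arguments of the call are not shown here.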
