Currently we only support features that are ndarrays: either a single feature of shape (batch,) or several features stacked together with shape (batch, n). If a user defines a model whose forward takes a list of tensors as one of its inputs, we can't support it.
E.g.
def forward(self, f1, f2, f3):  # f1 is a list of torch tensors; f2 and f3 are each a single tensor
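To make the signature above concrete, here is a minimal sketch of such a model. The class name `ListInputModel`, the layer sizes, and the way the features are unpacked with `*features` are all assumptions for illustration, not code from the project.

```python
import torch
from torch import nn


class ListInputModel(nn.Module):
    """Hypothetical model: f1 is a list of tensors, f2 and f3 are single tensors."""

    def __init__(self, num_list_features=3):
        super().__init__()
        # One input column per tensor in f1, plus one each for f2 and f3.
        self.linear = nn.Linear(num_list_features + 2, 1)

    def forward(self, f1, f2, f3):
        # f1: list of (batch,) tensors; f2, f3: (batch,) tensors.
        columns = list(f1) + [f2, f3]
        x = torch.stack(columns, dim=1)  # shape (batch, len(f1) + 2)
        return self.linear(x)


batch = 4
f1 = [torch.randn(batch) for _ in range(3)]
f2 = torch.randn(batch)
f3 = torch.randn(batch)

# The nested structure has to survive until the call: features[0] stays a list.
features = [f1, f2, f3]
model = ListInputModel()
output = model(*features)
```

If the features were flattened into five separate tensors before the call, `model(*features)` would fail with a positional-argument mismatch, which is the unsupported case described here.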
For this case, train_batch in training_operator.py would need to be modified.
Basically, the user's definition of the model forward and of their own dataset can be quite flexible (even when their code is not well written), and wherever there is flexibility, we will have trouble detecting the intended behavior.
The output has the same problem: user code may apply some postprocessing steps to it, e.g.
y1_pred = out[0].squeeze()
If the output is a list containing a single element and the user takes the first element in their own training loop, our code will treat it as multiple outputs, which is wrong.
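The ambiguity can be reproduced with a small sketch. The model `SingleListOutput` and the `isinstance` check are hypothetical stand-ins; the check only illustrates the kind of type-based detection that would misfire, not the project's actual logic.

```python
import torch
from torch import nn


class SingleListOutput(nn.Module):
    """Hypothetical model that wraps its single prediction in a list."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)

    def forward(self, x):
        return [self.linear(x)]  # one prediction, wrapped in a list


model = SingleListOutput()
out = model(torch.randn(8, 4))

# What the user's own training loop does with the output.
y1_pred = out[0].squeeze()  # shape (8,)

# A naive type-based check sees a list and concludes "multiple outputs",
# even though the list holds only one prediction.
looks_like_multi_output = isinstance(out, (list, tuple))
```

The output type alone does not say whether the user meant one prediction or several.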
The behavior for PyTorch Dataset and DataLoader is as follows:
If __getitem__ in Dataset returns a list of single features, then DataLoader will return a list of 1D torch tensors.
If __getitem__ in Dataset directly returns a 1D torch tensor that represents a list of single features, then DataLoader will return one 2D torch tensor.
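The two cases above can be checked with a short sketch that relies on DataLoader's default collation. The dataset classes and their values are made up for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ListFeatureDataset(Dataset):
    """__getitem__ returns a Python list of three scalar features."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return [float(idx), float(idx) * 2, float(idx) * 3]


class TensorFeatureDataset(Dataset):
    """__getitem__ returns the same three features as one 1D tensor."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor([float(idx), float(idx) * 2, float(idx) * 3])


# Default collation transposes a batch of lists into a list of batched tensors.
list_batch = next(iter(DataLoader(ListFeatureDataset(), batch_size=4)))

# Default collation stacks a batch of 1D tensors into one 2D tensor.
tensor_batch = next(iter(DataLoader(TensorFeatureDataset(), batch_size=4)))
```

Here `list_batch` is a list of three tensors of shape (4,), while `tensor_batch` is a single tensor of shape (4, 3).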
Probably one straightforward way to simulate this behavior is to support nested lists in feature_cols? If an entry of feature_cols is itself a list, then we return a list of ndarrays for that entry.
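A rough sketch of how the nested feature_cols idea might look, assuming the data is available as a mapping from column name to array. The helper `extract_features` and the column names are hypothetical and do not exist in the codebase.

```python
import numpy as np


def extract_features(columns, feature_cols):
    """Hypothetical helper: build model inputs from named columns.

    A plain entry yields one ndarray; a nested list entry yields a list of
    ndarrays, mirroring a DataLoader that returns a list of 1D tensors.
    """
    features = []
    for entry in feature_cols:
        if isinstance(entry, (list, tuple)):
            features.append([np.asarray(columns[name]) for name in entry])
        else:
            features.append(np.asarray(columns[entry]))
    return features


columns = {
    "a": np.arange(4.0),
    "b": np.arange(4.0) * 2,
    "c": np.arange(4.0) * 3,
    "d": np.arange(4.0) * 4,
}

# f1 is the nested entry ["a", "b"]; f2 and f3 are the plain entries "c" and "d".
f1, f2, f3 = extract_features(columns, [["a", "b"], "c", "d"])
```

With this layout, `f1` is a list of two arrays of shape (batch,), and `f2` and `f3` are single arrays of shape (batch,).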
I think we should take a list of ndarray as input (e.g., for xshards)? @yushan111 @sgwhat
Sorry, I may not have followed. xshards already supports a list of ndarrays as input, e.g. estimators take a dictionary of xshards as input: {'x': features, 'y': labels}, where features/labels can be a numpy array or a list of numpy arrays. In the above example, features could be [f1, f2, f3], where f1 is an ndarray of shape (batch, 10), and f2 and f3 are of shape (batch,).
These cases come from studying the vz-recommenders code.