A quick question about prediction #9

Open
HelloWorldLTY opened this issue Apr 17, 2024 · 1 comment

@HelloWorldLTY

Hi, thanks for your great work. After running your training step, I tried to reproduce the prediction process:

```
{'n_components_list': [18211], 'd_models_list': [128], 'batch_size': 32, 'data_file': 'de_train.parquet', 'id_map_file': 'id_map.csv', 'device': 'cuda', 'seed': None, 'models_dir': 'trained_models'}
      id  A1BG  A1BG-AS1  A2M  A2M-AS1  A2MP1  A4GALT  AAAS  AACS  AAGAB  AAK1  AAMDC  ...  ZSWIM8  ZSWIM9  ZUP1  ZW10  ZWILCH  ZWINT  ZXDA  ZXDB  ZXDC  ZYG11B  ZYX  ZZEF1
0      0   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0
1      1   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0
2      2   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0
3      3   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0
4      4   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0
..   ...   ...       ...  ...      ...    ...     ...   ...   ...    ...   ...    ...  ...     ...     ...   ...   ...     ...    ...   ...   ...   ...     ...  ...    ...
250  250   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0
251  251   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0
252  252   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0
253  253   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0
254  254   0.0       0.0  0.0      0.0    0.0     0.0   0.0   0.0    0.0   0.0    0.0  ...     0.0     0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0     0.0  0.0    0.0

[255 rows x 18212 columns]
(1, 31, 18211)
Traceback (most recent call last):
  File "predict.py", line 87, in <module>
    main()
  File "predict.py", line 83, in main
    predict_test(unseen_data, transformer_models, n_components_list, d_models_list, batch_size, device=device)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/openproblem/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "predict.py", line 33, in predict_test
    submission_df.insert(0, 'id', range(255))
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/openproblem/lib/python3.8/site-packages/pandas/core/frame.py", line 4776, in insert
    value = self._sanitize_column(value)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/openproblem/lib/python3.8/site-packages/pandas/core/frame.py", line 4870, in _sanitize_column
    com.require_length_match(value, self.index)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/openproblem/lib/python3.8/site-packages/pandas/core/common.py", line 576, in require_length_match
    raise ValueError(
ValueError: Length of values (255) does not match length of index (31)
```

However, I got the error above. It seems that the output combined_emb has only 31 rows, while the sample submission has 255. Is something wrong?
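A quick arithmetic check on the numbers in the log: with 255 test rows and `batch_size: 32`, iterating over `range(0, 255, 32)` yields 7 full batches of 32 plus a final batch of 31 rows (8 batches in total). A prediction of shape `(1, 31, 18211)` is therefore consistent with only the last batch being kept instead of all batches being stacked; this is an inference from the numbers, not something confirmed against the repository's original code. The plain-Python sketch below (no model involved; `batch_sizes` is a hypothetical helper) just computes those batch sizes:

```python
def batch_sizes(num_samples, batch_size):
    """Return the size of each slice produced by range(0, num_samples, batch_size)."""
    return [min(batch_size, num_samples - start)
            for start in range(0, num_samples, batch_size)]


sizes = batch_sizes(255, 32)
print(len(sizes))  # 8 batches
print(sizes[-1])   # 31 rows in the final batch
print(sum(sizes))  # 255 rows if every batch is kept
```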

My config file looks like:

```yaml
n_components_list: # targets dimension list
  - 18211
d_models_list:
  - 128
batch_size: 32
data_file: 'de_train.parquet'
id_map_file: 'id_map.csv'
device: cuda
seed: null
models_dir: 'trained_models'
```
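The prediction code looks up each trained model as `models[f'{n_components},{d_model}']`, so the two lists in this config expand to one model per `(n_components, d_model)` pair. A minimal sketch of that expansion, using a plain dict as a stand-in for the parsed YAML (loading the file itself is assumed and not shown; `model_keys` is a hypothetical helper name):

```python
from itertools import product

# Plain-dict stand-in for the parsed YAML config above.
config = {
    "n_components_list": [18211],
    "d_models_list": [128],
    "batch_size": 32,
}


def model_keys(cfg):
    """Build the 'n_components,d_model' keys used to index the trained models."""
    return [f"{n},{d}" for n, d in product(cfg["n_components_list"], cfg["d_models_list"])]


print(model_keys(config))  # ['18211,128']
```

With a single entry in each list, exactly one model is looked up and one `result_18211_128.csv` file would be written.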

Thanks a lot.

@HelloWorldLTY

Hi, I modified the test code as follows:

```python
import pandas as pd
import torch


@torch.no_grad()
def predict_test(data, models, n_components_list, d_list, batch_size, device='cuda'):
    num_samples = len(data)
    # The sample submission fixes the row ids and the gene column order.
    sample_submission = pd.read_csv("./sample_submission.csv")
    sample_columns = sample_submission.columns[1:]  # drop the leading 'id' column
    for n_components in n_components_list:
        for d_model in d_list:
            combined_outputs = []
            label_reducer, scaler, transformer_model = models[f'{n_components},{d_model}']
            transformer_model.eval()
            # `start` is the batch offset; it no longer shadows an outer loop variable.
            for start in range(0, num_samples, batch_size):
                batch_unseen_data = data[start:start + batch_size]
                transformed_data = transformer_model(batch_unseen_data)
                if scaler:
                    # Map outputs back through the label reducer, then the scaler.
                    transformed_data = torch.tensor(scaler.inverse_transform(
                        label_reducer.inverse_transform(transformed_data.cpu().detach().numpy()))).to(device)
                # Keep every batch, not just the last one.
                combined_outputs.append(transformed_data)

            # Stack all batches into one (num_samples, n_genes) array.
            combined_outputs = torch.vstack(combined_outputs).cpu().detach().numpy()
            print(combined_outputs.shape)
            submission_df = pd.DataFrame(combined_outputs, columns=sample_columns)
            # Take the ids from the sample submission instead of hard-coding 255.
            submission_df.insert(0, 'id', sample_submission['id'].values)
            submission_df.to_csv(f"result_{n_components}_{d_model}.csv", index=False)
```

With this change I get a matrix of shape 255 x 18211. Is that correct? Thanks.
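One way to check the shape mechanically is to compare the prediction against the sample submission before writing the CSV: the row count should equal the number of ids (255 here) and the column count should equal the number of gene columns (18212 columns minus `id`, i.e. 18211). A hedged sketch with a hypothetical `check_submission_shape` helper; it only validates the shape, not whether the predicted values themselves are sensible:

```python
def check_submission_shape(pred_shape, n_ids, n_gene_columns):
    """Raise ValueError if a (rows, cols) prediction shape does not fit the submission."""
    rows, cols = pred_shape
    if rows != n_ids:
        raise ValueError(f"expected {n_ids} rows (one per id), got {rows}")
    if cols != n_gene_columns:
        raise ValueError(f"expected {n_gene_columns} gene columns, got {cols}")


# Shape reported after the change: 255 ids x 18211 genes passes silently.
check_submission_shape((255, 18211), n_ids=255, n_gene_columns=18211)

# The shape from the first run (final batch only) is rejected.
try:
    check_submission_shape((31, 18211), n_ids=255, n_gene_columns=18211)
except ValueError as err:
    print(err)  # expected 255 rows (one per id), got 31
```

Inside `predict_test` it could be called as `check_submission_shape(combined_outputs.shape, len(sample_submission), len(sample_submission.columns) - 1)` just before building `submission_df`.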
