
Good work, thanks! When I run run_mgcd_demo.py on the ASSIST_0910 dataset, the code reports an error #19

Closed
zjj1333 opened this issue Dec 4, 2024 · 1 comment

Comments


zjj1333 commented Dec 4, 2024

1. When I run run_mgcd_demo.py on the ASSIST_0910 dataset, the code reports the error below.
2. The dataset for ASSIST_1213 is not provided, right?

2024-12-04 16:12:22[INFO]: ============================================================
/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/atom_op/mid2cache/common/remapid.py:49: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  v[col] = v[col].apply(lambda x: lbe.transform(x).tolist())
/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/atom_op/mid2cache/common/remapid.py:49: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  v[col] = v[col].apply(lambda x: lbe.transform(x).tolist())
/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/atom_op/mid2cache/common/remapid.py:47: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  v[col] = lbe.transform(v[col])
2024-12-04 16:12:28[INFO]: {'group_count': 42, 'exer_count': 16936, 'stu_count': 42, 'cpt_count': 122}
2024-12-04 16:12:28[INFO]: TrainTPL <class 'edustudio.traintpl.general_traintpl.GeneralTrainTPL'> Started!
2024-12-04 16:12:29[INFO]: ====== [FOLD ID]: 0 ======
2024-12-04 16:12:29[INFO]: [CALLBACK]-ModelCheckPoint has been registered!
2024-12-04 16:12:29[INFO]: [CALLBACK]-EarlyStopping has been registered!
2024-12-04 16:12:29[INFO]: [CALLBACK]-History has been registered!
2024-12-04 16:12:29[INFO]: [CALLBACK]-BaseLogger has been registered!
2024-12-04 16:12:29[INFO]: Start Training...
[EPOCH=001]: 100%|████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.80it/s]
[PREDICT]: 0%| | 0/5 [00:00<?, ?it/s]
2024-12-04 16:12:30[ERROR]: Traceback (most recent call last):
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/quickstart/quickstart.py", line 58, in run_edustudio
    traintpl.start()
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/gd_traintpl.py", line 79, in start
    metrics = self.one_fold_start(fold_id)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 53, in one_fold_start
    self.fit(train_loader=self.train_loader, valid_loader=self.valid_loader)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 108, in fit
    val_metrics = self.evaluate(valid_loader)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 126, in evaluate
    stu_id_list[idx] = batch_dict['stu_id']
KeyError: 'stu_id'

Traceback (most recent call last):
  File "/home/bfs/ZJJ_KETI2_JIAYOU/EduStudio/examples/single_model/run_mgcd_demo.py", line 9, in <module>
    run_edustudio(
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/quickstart/quickstart.py", line 72, in run_edustudio
    raise e
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/quickstart/quickstart.py", line 58, in run_edustudio
    traintpl.start()
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/gd_traintpl.py", line 79, in start
    metrics = self.one_fold_start(fold_id)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 53, in one_fold_start
    self.fit(train_loader=self.train_loader, valid_loader=self.valid_loader)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 108, in fit
    val_metrics = self.evaluate(valid_loader)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 126, in evaluate
    stu_id_list[idx] = batch_dict['stu_id']
KeyError: 'stu_id'

kervias (Contributor) commented Dec 5, 2024

Thank you for your feedback on EduStudio!

The runtime error

We found an error in GeneralTrainTPL's support for group-level cognitive diagnosis. Its evaluate() method reads batch_dict['stu_id'], but MGCD's group-level batches are keyed by 'group_id', which causes the KeyError in your traceback. We have therefore added a dedicated training template, GroupCDTrainTPL, to the latest code. To use it, install the latest version from source and replace GeneralTrainTPL with GroupCDTrainTPL in the cls key of traintpl_cfg_dict.
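To see the failure in isolation, here is a framework-free sketch of the lookup that breaks. It assumes that MGCD's group-level batches carry a 'group_id' key and no 'stu_id' key, as the custom template in this reply implies. The batch contents and the helper get_batch_ids are hypothetical and not part of EduStudio.

```python
# Hypothetical group-level batch. That MGCD batches are keyed by 'group_id'
# rather than 'stu_id' is an assumption inferred from the GroupCDTrainTPL override.
batch_dict = {'group_id': [0, 1, 2], 'label': [1.0, 0.0, 1.0]}


def get_batch_ids(batch, key):
    """Return the id column used to group predictions during evaluation (hypothetical helper)."""
    return batch[key]


# GeneralTrainTPL.evaluate effectively looks up 'stu_id', which this batch lacks.
try:
    get_batch_ids(batch_dict, 'stu_id')
except KeyError as err:
    print(f"KeyError: {err}")  # prints: KeyError: 'stu_id'

# GroupCDTrainTPL.evaluate looks up 'group_id' instead.
print(get_batch_ids(batch_dict, 'group_id'))  # prints: [0, 1, 2]
```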

Alternatively, if you prefer not to install from source, you can define the training template yourself in your script. The corresponding code is as follows:

import sys
import os

# sys.path.append(os.path.dirname(os.path.abspath(__file__)) + "/../../")
# os.chdir(os.path.dirname(os.path.abspath(__file__)))

from edustudio.quickstart import run_edustudio
from edustudio.traintpl import GeneralTrainTPL
from edustudio.utils.common import tensor2npy
import torch
from tqdm import tqdm


class GroupCDTrainTPL(GeneralTrainTPL):
  
    @torch.no_grad()
    def evaluate(self, loader):
        self.model.eval()
        stu_id_list = list(range(len(loader)))
        pd_list = list(range(len(loader)))
        gt_list = list(range(len(loader)))
        for idx, batch_dict in enumerate(tqdm(loader, ncols=self.frame_cfg['TQDM_NCOLS'], desc="[PREDICT]")):
            batch_dict = self.batch_dict2device(batch_dict)
            eval_dict = self.model.predict(**batch_dict)
            stu_id_list[idx] = batch_dict['group_id']  # group-level batches carry 'group_id', not 'stu_id'
            pd_list[idx] = eval_dict['y_pd']
            gt_list[idx] = eval_dict['y_gt'] if 'y_gt' in eval_dict else batch_dict['label']
        y_pd = torch.hstack(pd_list)
        y_gt = torch.hstack(gt_list)
        group_id = torch.hstack(stu_id_list)

        eval_data_dict = {
            'group_id': group_id,
            'y_pd': y_pd,
            'y_gt': y_gt,
        }
        if hasattr(self.model, 'get_stu_status'):
            stu_stats_list = []
            idx = torch.arange(0, self.datatpl_cfg['dt_info']['stu_count']).to(self.traintpl_cfg['device'])
            for i in range(0,self.datatpl_cfg['dt_info']['stu_count'], self.traintpl_cfg['eval_batch_size']):
                batch_stu_id = idx[i:i+self.traintpl_cfg['eval_batch_size']]
                batch = self.model.get_stu_status(batch_stu_id)
                stu_stats_list.append(batch)
            stu_stats = torch.vstack(stu_stats_list)
            eval_data_dict.update({
                'stu_stats': tensor2npy(stu_stats),
            })
        if hasattr(self.datatpl, 'Q_mat'):
            eval_data_dict.update({
                'Q_mat': tensor2npy(self.datatpl.Q_mat)
            })
        eval_result = {}
        for evaltpl in self.evaltpls:
            eval_result.update(
                evaltpl.eval(ignore_metrics=self.traintpl_cfg['ignore_metrics_in_train'], **eval_data_dict)
            )
        return eval_result

    @torch.no_grad()
    def inference(self, loader):
        self.model.eval()
        stu_id_list = list(range(len(loader)))
        pd_list = list(range(len(loader)))
        gt_list = list(range(len(loader)))
        for idx, batch_dict in enumerate(tqdm(loader, ncols=self.frame_cfg['TQDM_NCOLS'], desc="[PREDICT]")):
            batch_dict = self.batch_dict2device(batch_dict)
            eval_dict = self.model.predict(**batch_dict)
            stu_id_list[idx] = batch_dict['group_id']  # group-level batches carry 'group_id', not 'stu_id'
            pd_list[idx] = eval_dict['y_pd']
            gt_list[idx] = eval_dict['y_gt'] if 'y_gt' in eval_dict else batch_dict['label']
        y_pd = torch.hstack(pd_list)
        y_gt = torch.hstack(gt_list)
        group_id = torch.hstack(stu_id_list)

        eval_data_dict = {
            'group_id': group_id,
            'y_pd': y_pd,
            'y_gt': y_gt,
        }
        if hasattr(self.model, 'get_stu_status'):
            stu_stats_list = []
            idx = torch.arange(0, self.datatpl_cfg['dt_info']['stu_count']).to(self.traintpl_cfg['device'])
            for i in range(0,self.datatpl_cfg['dt_info']['stu_count'], self.traintpl_cfg['eval_batch_size']):
                batch_stu_id = idx[i:i+self.traintpl_cfg['eval_batch_size']]
                batch = self.model.get_stu_status(batch_stu_id)
                stu_stats_list.append(batch)
            stu_stats = torch.vstack(stu_stats_list)
            eval_data_dict.update({
                'stu_stats': tensor2npy(stu_stats),
            })
        if hasattr(self.datatpl, 'Q_mat'):
            eval_data_dict.update({
                'Q_mat': tensor2npy(self.datatpl.Q_mat)
            })
        eval_result = {}
        for evaltpl in self.evaltpls:
            eval_result.update(evaltpl.eval(**eval_data_dict))
        return eval_result


run_edustudio(
    dataset='ASSIST_0910',
    cfg_file_name=None,
    traintpl_cfg_dict={
        'cls': GroupCDTrainTPL,
        'early_stop_metrics': [('rmse','min')],
        'best_epoch_metric': 'rmse',
        'batch_size': 512
    },
    datatpl_cfg_dict={
        'cls': 'MGCDDataTPL',
    },
    modeltpl_cfg_dict={
        'cls': 'MGCD',
    },
    evaltpl_cfg_dict={
        'clses': ['PredictionEvalTPL'],
        'PredictionEvalTPL': {
            'use_metrics': ['auc', 'rmse']
        }
    }
)
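The core of the override is the prediction-collection loop in evaluate() and inference(). For each batch, it takes ids from 'group_id', takes 'y_gt' from the model output when present and otherwise from the batch's 'label', and then concatenates the results with torch.hstack. The sketch below mirrors that loop with plain Python lists standing in for 1-D tensors (an assumption made for illustration). collect_predictions and the sample batches are hypothetical.

```python
from typing import Dict, List, Sequence


def collect_predictions(batches: Sequence[Dict[str, list]],
                        outputs: Sequence[Dict[str, list]]) -> Dict[str, list]:
    """Concatenate per-batch group ids, predictions and ground truth.

    Lists stand in for 1-D tensors; list.extend plays the role of torch.hstack.
    """
    group_id: List = []
    y_pd: List = []
    y_gt: List = []
    for batch, out in zip(batches, outputs):
        group_id.extend(batch['group_id'])  # ids come from 'group_id', not 'stu_id'
        y_pd.extend(out['y_pd'])
        # Prefer the model's own ground truth; fall back to the batch label.
        y_gt.extend(out['y_gt'] if 'y_gt' in out else batch['label'])
    return {'group_id': group_id, 'y_pd': y_pd, 'y_gt': y_gt}


# Two hypothetical batches; the second model output has no 'y_gt'.
batches = [
    {'group_id': [0, 1], 'label': [1, 0]},
    {'group_id': [2], 'label': [1]},
]
outputs = [
    {'y_pd': [0.9, 0.2], 'y_gt': [1, 0]},
    {'y_pd': [0.7]},
]
result = collect_predictions(batches, outputs)
print(result)  # {'group_id': [0, 1, 2], 'y_pd': [0.9, 0.2, 0.7], 'y_gt': [1, 0, 1]}
```

Because evaluate() and inference() in the template share this loop almost line for line, factoring it into one helper like this would be a natural refactor. The template keeps them separate only so that evaluate() can pass ignore_metrics_in_train to the evaluators.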

The ASSIST_1213 dataset

EduStudio currently cannot automatically download the ASSIST_1213 dataset in its middle data format, but it can build that format from the raw data. The steps are as follows:

  1. Find the official download page for the ASSIST_1213 dataset in the EduStudio documentation.

  2. Download the file with gdown and place it in the directory data/ASSIST_1213/rawdata/.

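    pip install gdown
    mkdir -p data/ASSIST_1213/rawdata && cd data/ASSIST_1213/rawdata # Work inside the target directory
    gdown 1cU6Ft4R3hLqA7G1rIGArVfelSZvc6RxY # Download the file
    unzip 2012-2013-data-with-predictions-4-final.zip # Unzip the file
    cd ../../.. # Return to the project root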
  3. The resulting project structure:

    ├── [4.0K]  data
    │   └── [4.0K]  ASSIST_1213
    │       ├── [4.0K]  rawdata
    │       │   ├── [2.8G]  2012-2013-data-with-predictions-4-final.csv
    │       │   └── [550M]  2012-2013-data-with-predictions-4-final.zip
    ├── [ 779]  run_mgcd_demo.py
  4. Set dataset='ASSIST_1213' and modify the datatpl_cfg_dict configuration in run_mgcd_demo.py:
    (1) 'load_data_from': 'rawdata' makes EduStudio start processing from the raw data format.
    (2) 'raw2mid_op': 'R2M_ASSIST_1213' names the class that converts the raw data into EduStudio's standard middle data format.

    import sys
    import os
    
    # sys.path.append(os.path.dirname(os.path.abspath(__file__)) + "/../../")
    # os.chdir(os.path.dirname(os.path.abspath(__file__)))
    
    from edustudio.quickstart import run_edustudio
    from edustudio.traintpl import GeneralTrainTPL
    from edustudio.utils.common import tensor2npy
    import torch
    from tqdm import tqdm
    
    
    class GroupCDTrainTPL(GeneralTrainTPL):
      
        @torch.no_grad()
        def evaluate(self, loader):
            self.model.eval()
            stu_id_list = list(range(len(loader)))
            pd_list = list(range(len(loader)))
            gt_list = list(range(len(loader)))
            for idx, batch_dict in enumerate(tqdm(loader, ncols=self.frame_cfg['TQDM_NCOLS'], desc="[PREDICT]")):
                batch_dict = self.batch_dict2device(batch_dict)
                eval_dict = self.model.predict(**batch_dict)
                stu_id_list[idx] = batch_dict['group_id']  # group-level batches carry 'group_id', not 'stu_id'
                pd_list[idx] = eval_dict['y_pd']
                gt_list[idx] = eval_dict['y_gt'] if 'y_gt' in eval_dict else batch_dict['label']
            y_pd = torch.hstack(pd_list)
            y_gt = torch.hstack(gt_list)
            group_id = torch.hstack(stu_id_list)
    
            eval_data_dict = {
                'group_id': group_id,
                'y_pd': y_pd,
                'y_gt': y_gt,
            }
            if hasattr(self.model, 'get_stu_status'):
                stu_stats_list = []
                idx = torch.arange(0, self.datatpl_cfg['dt_info']['stu_count']).to(self.traintpl_cfg['device'])
                for i in range(0,self.datatpl_cfg['dt_info']['stu_count'], self.traintpl_cfg['eval_batch_size']):
                    batch_stu_id = idx[i:i+self.traintpl_cfg['eval_batch_size']]
                    batch = self.model.get_stu_status(batch_stu_id)
                    stu_stats_list.append(batch)
                stu_stats = torch.vstack(stu_stats_list)
                eval_data_dict.update({
                    'stu_stats': tensor2npy(stu_stats),
                })
            if hasattr(self.datatpl, 'Q_mat'):
                eval_data_dict.update({
                    'Q_mat': tensor2npy(self.datatpl.Q_mat)
                })
            eval_result = {}
            for evaltpl in self.evaltpls:
                eval_result.update(
                    evaltpl.eval(ignore_metrics=self.traintpl_cfg['ignore_metrics_in_train'], **eval_data_dict)
                )
            return eval_result
    
        @torch.no_grad()
        def inference(self, loader):
            self.model.eval()
            stu_id_list = list(range(len(loader)))
            pd_list = list(range(len(loader)))
            gt_list = list(range(len(loader)))
            for idx, batch_dict in enumerate(tqdm(loader, ncols=self.frame_cfg['TQDM_NCOLS'], desc="[PREDICT]")):
                batch_dict = self.batch_dict2device(batch_dict)
                eval_dict = self.model.predict(**batch_dict)
                stu_id_list[idx] = batch_dict['group_id']  # group-level batches carry 'group_id', not 'stu_id'
                pd_list[idx] = eval_dict['y_pd']
                gt_list[idx] = eval_dict['y_gt'] if 'y_gt' in eval_dict else batch_dict['label']
            y_pd = torch.hstack(pd_list)
            y_gt = torch.hstack(gt_list)
            group_id = torch.hstack(stu_id_list)
    
            eval_data_dict = {
                'group_id': group_id,
                'y_pd': y_pd,
                'y_gt': y_gt,
            }
            if hasattr(self.model, 'get_stu_status'):
                stu_stats_list = []
                idx = torch.arange(0, self.datatpl_cfg['dt_info']['stu_count']).to(self.traintpl_cfg['device'])
                for i in range(0,self.datatpl_cfg['dt_info']['stu_count'], self.traintpl_cfg['eval_batch_size']):
                    batch_stu_id = idx[i:i+self.traintpl_cfg['eval_batch_size']]
                    batch = self.model.get_stu_status(batch_stu_id)
                    stu_stats_list.append(batch)
                stu_stats = torch.vstack(stu_stats_list)
                eval_data_dict.update({
                    'stu_stats': tensor2npy(stu_stats),
                })
            if hasattr(self.datatpl, 'Q_mat'):
                eval_data_dict.update({
                    'Q_mat': tensor2npy(self.datatpl.Q_mat)
                })
            eval_result = {}
            for evaltpl in self.evaltpls:
                eval_result.update(evaltpl.eval(**eval_data_dict))
            return eval_result
    
    
    run_edustudio(
        dataset='ASSIST_1213',
        cfg_file_name=None,
        traintpl_cfg_dict={
            'cls': GroupCDTrainTPL,
            'early_stop_metrics': [('rmse','min')],
            'best_epoch_metric': 'rmse',
            'batch_size': 512
        },
        datatpl_cfg_dict={
            'cls': 'MGCDDataTPL',
            'load_data_from': 'rawdata',
            'raw2mid_op': 'R2M_ASSIST_1213'
        },
        modeltpl_cfg_dict={
            'cls': 'MGCD',
        },
        evaltpl_cfg_dict={
            'clses': ['PredictionEvalTPL'],
            'PredictionEvalTPL': {
                'use_metrics': ['auc', 'rmse']
            }
        }
    )
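You can also check steps 2 and 3 from Python before launching the run. The following is a minimal sketch that uses only the standard library (zipfile, pathlib). prepare_assist_1213 is a hypothetical helper, not part of EduStudio, and the file names are taken from the project tree in step 3. It assumes the CSV sits at the top level of the archive and leaves the download itself to gdown.

```python
import zipfile
from pathlib import Path

# File names from the project tree in step 3.
ZIP_NAME = "2012-2013-data-with-predictions-4-final.zip"
CSV_NAME = "2012-2013-data-with-predictions-4-final.csv"


def prepare_assist_1213(root: Path = Path("data")) -> Path:
    """Unzip the downloaded archive into <root>/ASSIST_1213/rawdata if needed.

    Hypothetical helper. Returns the path to the CSV, or raises
    FileNotFoundError if neither the CSV nor the archive is present.
    """
    rawdata = root / "ASSIST_1213" / "rawdata"
    csv_path = rawdata / CSV_NAME
    if csv_path.exists():
        return csv_path  # already extracted, nothing to do
    zip_path = rawdata / ZIP_NAME
    if not zip_path.exists():
        raise FileNotFoundError(f"expected {zip_path}; download it with gdown first")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(rawdata)  # assumes the CSV is at the archive's top level
    if not csv_path.exists():
        raise FileNotFoundError(f"{CSV_NAME} not found after extracting {zip_path}")
    return csv_path
```

If you call prepare_assist_1213() from the directory that contains data/ before run_edustudio(...), a misplaced archive produces a clear error message up front.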

kervias closed this as completed Dec 17, 2024