Good work, thanks! When I run run_mgcd_demo.py on the ASSIST_0910 dataset, the code reports an error #19

zjj1333 · 2024-12-04T08:17:40Z

1. When I run run_mgcd_demo.py on the ASSIST_0910 dataset, the code reports the error below.
2. The ASSIST_12 and 13 datasets are not provided, right?

2024-12-04 16:12:22[INFO]: ============================================================
/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/atom_op/mid2cache/common/remapid.py:49: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  v[col] = v[col].apply(lambda x: lbe.transform(x).tolist())
/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/atom_op/mid2cache/common/remapid.py:49: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  v[col] = v[col].apply(lambda x: lbe.transform(x).tolist())
/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/atom_op/mid2cache/common/remapid.py:47: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  v[col] = lbe.transform(v[col])
2024-12-04 16:12:28[INFO]: {'group_count': 42, 'exer_count': 16936, 'stu_count': 42, 'cpt_count': 122}
2024-12-04 16:12:28[INFO]: TrainTPL <class 'edustudio.traintpl.general_traintpl.GeneralTrainTPL'> Started!
2024-12-04 16:12:29[INFO]: ====== [FOLD ID]: 0 ======
2024-12-04 16:12:29[INFO]: [CALLBACK]-ModelCheckPoint has been registered!
2024-12-04 16:12:29[INFO]: [CALLBACK]-EarlyStopping has been registered!
2024-12-04 16:12:29[INFO]: [CALLBACK]-History has been registered!
2024-12-04 16:12:29[INFO]: [CALLBACK]-BaseLogger has been registered!
2024-12-04 16:12:29[INFO]: Start Training...
[EPOCH=001]: 100%|████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.80it/s]
[PREDICT]: 0%| | 0/5 [00:00<?, ?it/s]
2024-12-04 16:12:30[ERROR]: Traceback (most recent call last):
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/quickstart/quickstart.py", line 58, in run_edustudio
    traintpl.start()
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/gd_traintpl.py", line 79, in start
    metrics = self.one_fold_start(fold_id)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 53, in one_fold_start
    self.fit(train_loader=self.train_loader, valid_loader=self.valid_loader)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 108, in fit
    val_metrics = self.evaluate(valid_loader)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 126, in evaluate
    stu_id_list[idx] = batch_dict['stu_id']
KeyError: 'stu_id'

Traceback (most recent call last):
  File "/home/bfs/ZJJ_KETI2_JIAYOU/EduStudio/examples/single_model/run_mgcd_demo.py", line 9, in <module>
    run_edustudio(
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/quickstart/quickstart.py", line 72, in run_edustudio
    raise e
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/quickstart/quickstart.py", line 58, in run_edustudio
    traintpl.start()
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/gd_traintpl.py", line 79, in start
    metrics = self.one_fold_start(fold_id)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 53, in one_fold_start
    self.fit(train_loader=self.train_loader, valid_loader=self.valid_loader)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 108, in fit
    val_metrics = self.evaluate(valid_loader)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/bfs/miniconda3/envs/rag/lib/python3.9/site-packages/edustudio/traintpl/general_traintpl.py", line 126, in evaluate
    stu_id_list[idx] = batch_dict['stu_id']
KeyError: 'stu_id'

kervias · 2024-12-05T12:19:57Z

Thank you for your feedback on EduStudio!

Running Error

We found an error in GeneralTrainTPL's support for group-level cognitive diagnosis: its evaluate method reads batch_dict['stu_id'], but the batches produced for MGCD carry a group_id key instead, which raises the KeyError above. We have therefore added a dedicated training template, GroupCDTrainTPL, in the latest code. For now, you can install the latest version from source and replace GeneralTrainTPL with GroupCDTrainTPL in the cls key of traintpl_cfg_dict.
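With a source install, the switch should come down to one entry in traintpl_cfg_dict. The sketch below assumes that the source version lets you reference GroupCDTrainTPL by its class name as a string, the same way 'MGCDDataTPL' and 'MGCD' are referenced in the configs in this thread. If string lookup does not work in your version, import the class and pass it directly.

```python
# Assumption: after installing EduStudio from source, GroupCDTrainTPL can be
# referenced by name, like 'MGCDDataTPL' and 'MGCD' in datatpl/modeltpl configs.
traintpl_cfg_dict = {
    'cls': 'GroupCDTrainTPL',  # previously 'GeneralTrainTPL'
    'early_stop_metrics': [('rmse', 'min')],
    'best_epoch_metric': 'rmse',
    'batch_size': 512,
}
```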

Alternatively, instead of installing from source, you can define the training template yourself in the demo script. The corresponding code is as follows:

import sys
import os

# sys.path.append(os.path.dirname(os.path.abspath(__file__)) + "/../../")
# os.chdir(os.path.dirname(os.path.abspath(__file__)))

from edustudio.quickstart import run_edustudio
from edustudio.traintpl import GeneralTrainTPL
from edustudio.utils.common import tensor2npy
import torch
from tqdm import tqdm


class GroupCDTrainTPL(GeneralTrainTPL):
    """GeneralTrainTPL variant for group-level CD: batches carry 'group_id', not 'stu_id'."""

    @torch.no_grad()
    def evaluate(self, loader):
        self.model.eval()
        # One slot per batch; filled inside the loop and concatenated afterwards.
        group_id_list = list(range(len(loader)))
        pd_list = list(range(len(loader)))
        gt_list = list(range(len(loader)))
        for idx, batch_dict in enumerate(tqdm(loader, ncols=self.frame_cfg['TQDM_NCOLS'], desc="[PREDICT]")):
            batch_dict = self.batch_dict2device(batch_dict)
            eval_dict = self.model.predict(**batch_dict)
            group_id_list[idx] = batch_dict['group_id']  # GeneralTrainTPL reads 'stu_id' here -> KeyError
            pd_list[idx] = eval_dict['y_pd']
            gt_list[idx] = eval_dict['y_gt'] if 'y_gt' in eval_dict else batch_dict['label']
        y_pd = torch.hstack(pd_list)
        y_gt = torch.hstack(gt_list)
        group_id = torch.hstack(group_id_list)

        eval_data_dict = {
            'group_id': group_id,
            'y_pd': y_pd,
            'y_gt': y_gt,
        }
        if hasattr(self.model, 'get_stu_status'):
            stu_stats_list = []
            idx = torch.arange(0, self.datatpl_cfg['dt_info']['stu_count']).to(self.traintpl_cfg['device'])
            for i in range(0,self.datatpl_cfg['dt_info']['stu_count'], self.traintpl_cfg['eval_batch_size']):
                batch_stu_id = idx[i:i+self.traintpl_cfg['eval_batch_size']]
                batch = self.model.get_stu_status(batch_stu_id)
                stu_stats_list.append(batch)
            stu_stats = torch.vstack(stu_stats_list)
            eval_data_dict.update({
                'stu_stats': tensor2npy(stu_stats),
            })
        if hasattr(self.datatpl, 'Q_mat'):
            eval_data_dict.update({
                'Q_mat': tensor2npy(self.datatpl.Q_mat)
            })
        eval_result = {}
        for evaltpl in self.evaltpls:
            eval_result.update(
                evaltpl.eval(ignore_metrics=self.traintpl_cfg['ignore_metrics_in_train'], **eval_data_dict)
            )
        return eval_result

    @torch.no_grad()
    def inference(self, loader):
        self.model.eval()
        group_id_list = list(range(len(loader)))
        pd_list = list(range(len(loader)))
        gt_list = list(range(len(loader)))
        for idx, batch_dict in enumerate(tqdm(loader, ncols=self.frame_cfg['TQDM_NCOLS'], desc="[PREDICT]")):
            batch_dict = self.batch_dict2device(batch_dict)
            eval_dict = self.model.predict(**batch_dict)
            group_id_list[idx] = batch_dict['group_id']
            pd_list[idx] = eval_dict['y_pd']
            gt_list[idx] = eval_dict['y_gt'] if 'y_gt' in eval_dict else batch_dict['label']
        y_pd = torch.hstack(pd_list)
        y_gt = torch.hstack(gt_list)
        group_id = torch.hstack(group_id_list)

        eval_data_dict = {
            'group_id': group_id,
            'y_pd': y_pd,
            'y_gt': y_gt,
        }
        if hasattr(self.model, 'get_stu_status'):
            stu_stats_list = []
            idx = torch.arange(0, self.datatpl_cfg['dt_info']['stu_count']).to(self.traintpl_cfg['device'])
            for i in range(0,self.datatpl_cfg['dt_info']['stu_count'], self.traintpl_cfg['eval_batch_size']):
                batch_stu_id = idx[i:i+self.traintpl_cfg['eval_batch_size']]
                batch = self.model.get_stu_status(batch_stu_id)
                stu_stats_list.append(batch)
            stu_stats = torch.vstack(stu_stats_list)
            eval_data_dict.update({
                'stu_stats': tensor2npy(stu_stats),
            })
        if hasattr(self.datatpl, 'Q_mat'):
            eval_data_dict.update({
                'Q_mat': tensor2npy(self.datatpl.Q_mat)
            })
        eval_result = {}
        # Unlike evaluate(), inference() does not pass ignore_metrics.
        for evaltpl in self.evaltpls:
            eval_result.update(evaltpl.eval(**eval_data_dict))
        return eval_result


run_edustudio(
    dataset='ASSIST_0910',
    cfg_file_name=None,
    traintpl_cfg_dict={
        'cls': GroupCDTrainTPL,
        'early_stop_metrics': [('rmse','min')],
        'best_epoch_metric': 'rmse',
        'batch_size': 512
    },
    datatpl_cfg_dict={
        'cls': 'MGCDDataTPL',
    },
    modeltpl_cfg_dict={
        'cls': 'MGCD',
    },
    evaltpl_cfg_dict={
        'clses': ['PredictionEvalTPL'],
        'PredictionEvalTPL': {
            'use_metrics': ['auc', 'rmse']
        }
    }
)
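The key change in the template above is which id key the prediction loop reads. The pure-Python sketch below reproduces that failure mode in isolation. The batch dicts are made up and are not EduStudio's real loader output. collect_ids and pick_id_key are hypothetical helpers, and a flat list stands in for the tensors that torch.hstack concatenates.

```python
from typing import Dict, List, Sequence

Batch = Dict[str, List[int]]


def collect_ids(batches: Sequence[Batch], id_key: str) -> List[int]:
    """Gather the ids stored under id_key in every batch, in loader order.

    Follows the pattern of evaluate(): one slot per batch, filled inside the
    loop, then concatenated (a flat list stands in for torch.hstack).
    """
    per_batch: List[List[int]] = [[] for _ in batches]
    for idx, batch in enumerate(batches):
        per_batch[idx] = batch[id_key]  # KeyError if the batches lack id_key
    return [i for ids in per_batch for i in ids]


def pick_id_key(batch: Batch) -> str:
    """Hypothetical helper: prefer group ids when present, else student ids."""
    return 'group_id' if 'group_id' in batch else 'stu_id'


# Made-up group-level batches: like MGCD's, they have no 'stu_id' key.
group_batches = [
    {'group_id': [0, 1], 'label': [1, 0]},
    {'group_id': [2], 'label': [1]},
]

try:
    collect_ids(group_batches, 'stu_id')  # what a hard-coded 'stu_id' loop does
except KeyError as err:
    print('KeyError:', err)  # prints: KeyError: 'stu_id'

print(collect_ids(group_batches, pick_id_key(group_batches[0])))  # prints: [0, 1, 2]
```

Auto-detecting the key is only one possible generalization. The maintainers instead added a separate template, which leaves GeneralTrainTPL unchanged for student-level models.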

ASSIST_1213 Dataset

EduStudio does not currently provide an automatic download of the ASSIST_1213 dataset in the middle data format, but it can process the dataset from the raw data. The steps are as follows:

Find the official website of the ASSIST_1213 dataset in the EduStudio documentation.

Download the file with gdown and place it in the directory data/ASSIST_1213/rawdata/.

pip install gdown
gdown 1cU6Ft4R3hLqA7G1rIGArVfelSZvc6RxY # Download the file
unzip 2012-2013-data-with-predictions-4-final.zip # Unzip the file
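If unzip is not available (for example, on Windows), you can also do the extraction step from Python. This minimal sketch uses the standard-library zipfile module. extract_rawdata is a hypothetical helper name, and the sketch assumes the archive was downloaded to the project root shown in the structure below.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_rawdata(zip_path: str, rawdata_dir: str = 'data/ASSIST_1213/rawdata') -> List[Path]:
    """Unzip the downloaded archive into the rawdata directory and return the extracted paths."""
    target = Path(rawdata_dir)
    target.mkdir(parents=True, exist_ok=True)  # create data/ASSIST_1213/rawdata/ if needed
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(target)
    return [target / name for name in names]


# Usage, from the directory that contains data/ and run_mgcd_demo.py:
# for path in extract_rawdata('2012-2013-data-with-predictions-4-final.zip'):
#     print(path, path.exists())
```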

Project structure

├── [4.0K]  data
│   └── [4.0K]  ASSIST_1213
│       ├── [4.0K]  rawdata
│       │   ├── [2.8G]  2012-2013-data-with-predictions-4-final.csv
│       │   └── [550M]  2012-2013-data-with-predictions-4-final.zip
├── [ 779]  run_mgcd_demo.py

Modify the datatpl_cfg_dict configuration in run_mgcd_demo.py:
(1) 'load_data_from': 'rawdata': process the dataset starting from the raw data format.
(2) 'raw2mid_op': 'R2M_ASSIST_1213': the name of the class that converts the raw data into EduStudio's standard middle data format.

import sys
import os

# sys.path.append(os.path.dirname(os.path.abspath(__file__)) + "/../../")
# os.chdir(os.path.dirname(os.path.abspath(__file__)))

from edustudio.quickstart import run_edustudio
from edustudio.traintpl import GeneralTrainTPL
from edustudio.utils.common import tensor2npy
import torch
from tqdm import tqdm


class GroupCDTrainTPL(GeneralTrainTPL):
    """GeneralTrainTPL variant for group-level CD: batches carry 'group_id', not 'stu_id'."""

    @torch.no_grad()
    def evaluate(self, loader):
        self.model.eval()
        # One slot per batch; filled inside the loop and concatenated afterwards.
        group_id_list = list(range(len(loader)))
        pd_list = list(range(len(loader)))
        gt_list = list(range(len(loader)))
        for idx, batch_dict in enumerate(tqdm(loader, ncols=self.frame_cfg['TQDM_NCOLS'], desc="[PREDICT]")):
            batch_dict = self.batch_dict2device(batch_dict)
            eval_dict = self.model.predict(**batch_dict)
            group_id_list[idx] = batch_dict['group_id']  # GeneralTrainTPL reads 'stu_id' here -> KeyError
            pd_list[idx] = eval_dict['y_pd']
            gt_list[idx] = eval_dict['y_gt'] if 'y_gt' in eval_dict else batch_dict['label']
        y_pd = torch.hstack(pd_list)
        y_gt = torch.hstack(gt_list)
        group_id = torch.hstack(group_id_list)

        eval_data_dict = {
            'group_id': group_id,
            'y_pd': y_pd,
            'y_gt': y_gt,
        }
        if hasattr(self.model, 'get_stu_status'):
            stu_stats_list = []
            idx = torch.arange(0, self.datatpl_cfg['dt_info']['stu_count']).to(self.traintpl_cfg['device'])
            for i in range(0,self.datatpl_cfg['dt_info']['stu_count'], self.traintpl_cfg['eval_batch_size']):
                batch_stu_id = idx[i:i+self.traintpl_cfg['eval_batch_size']]
                batch = self.model.get_stu_status(batch_stu_id)
                stu_stats_list.append(batch)
            stu_stats = torch.vstack(stu_stats_list)
            eval_data_dict.update({
                'stu_stats': tensor2npy(stu_stats),
            })
        if hasattr(self.datatpl, 'Q_mat'):
            eval_data_dict.update({
                'Q_mat': tensor2npy(self.datatpl.Q_mat)
            })
        eval_result = {}
        for evaltpl in self.evaltpls:
            eval_result.update(
                evaltpl.eval(ignore_metrics=self.traintpl_cfg['ignore_metrics_in_train'], **eval_data_dict)
            )
        return eval_result

    @torch.no_grad()
    def inference(self, loader):
        self.model.eval()
        group_id_list = list(range(len(loader)))
        pd_list = list(range(len(loader)))
        gt_list = list(range(len(loader)))
        for idx, batch_dict in enumerate(tqdm(loader, ncols=self.frame_cfg['TQDM_NCOLS'], desc="[PREDICT]")):
            batch_dict = self.batch_dict2device(batch_dict)
            eval_dict = self.model.predict(**batch_dict)
            group_id_list[idx] = batch_dict['group_id']
            pd_list[idx] = eval_dict['y_pd']
            gt_list[idx] = eval_dict['y_gt'] if 'y_gt' in eval_dict else batch_dict['label']
        y_pd = torch.hstack(pd_list)
        y_gt = torch.hstack(gt_list)
        group_id = torch.hstack(group_id_list)

        eval_data_dict = {
            'group_id': group_id,
            'y_pd': y_pd,
            'y_gt': y_gt,
        }
        if hasattr(self.model, 'get_stu_status'):
            stu_stats_list = []
            idx = torch.arange(0, self.datatpl_cfg['dt_info']['stu_count']).to(self.traintpl_cfg['device'])
            for i in range(0,self.datatpl_cfg['dt_info']['stu_count'], self.traintpl_cfg['eval_batch_size']):
                batch_stu_id = idx[i:i+self.traintpl_cfg['eval_batch_size']]
                batch = self.model.get_stu_status(batch_stu_id)
                stu_stats_list.append(batch)
            stu_stats = torch.vstack(stu_stats_list)
            eval_data_dict.update({
                'stu_stats': tensor2npy(stu_stats),
            })
        if hasattr(self.datatpl, 'Q_mat'):
            eval_data_dict.update({
                'Q_mat': tensor2npy(self.datatpl.Q_mat)
            })
        eval_result = {}
        # Unlike evaluate(), inference() does not pass ignore_metrics.
        for evaltpl in self.evaltpls:
            eval_result.update(evaltpl.eval(**eval_data_dict))
        return eval_result


run_edustudio(
    dataset='ASSIST_1213',
    cfg_file_name=None,
    traintpl_cfg_dict={
        'cls': GroupCDTrainTPL,
        'early_stop_metrics': [('rmse','min')],
        'best_epoch_metric': 'rmse',
        'batch_size': 512
    },
    datatpl_cfg_dict={
        'cls': 'MGCDDataTPL',
        'load_data_from': 'rawdata',
        'raw2mid_op': 'R2M_ASSIST_1213'
    },
    modeltpl_cfg_dict={
        'cls': 'MGCD',
    },
    evaltpl_cfg_dict={
        'clses': ['PredictionEvalTPL'],
        'PredictionEvalTPL': {
            'use_metrics': ['auc', 'rmse']
        }
    }
)
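The only datatpl_cfg_dict difference between the ASSIST_0910 and ASSIST_1213 scripts is the two raw-data keys. If you switch between datasets often, a small helper can toggle them. make_mgcd_datatpl_cfg is a hypothetical name and is not part of EduStudio. It only builds the dictionary passed as datatpl_cfg_dict. With from_rawdata=False it leaves load_data_from unset, as in the ASSIST_0910 script.

```python
def make_mgcd_datatpl_cfg(from_rawdata: bool = False, raw2mid_op: str = 'R2M_ASSIST_1213') -> dict:
    """Build datatpl_cfg_dict for MGCD, optionally processing from raw data."""
    cfg = {'cls': 'MGCDDataTPL'}
    if from_rawdata:
        cfg['load_data_from'] = 'rawdata'  # (1) start from the raw CSV
        cfg['raw2mid_op'] = raw2mid_op     # (2) raw -> middle data converter class
    return cfg


print(make_mgcd_datatpl_cfg(from_rawdata=True))
# {'cls': 'MGCDDataTPL', 'load_data_from': 'rawdata', 'raw2mid_op': 'R2M_ASSIST_1213'}
```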

kervias closed this as completed Dec 17, 2024
