This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Trial details page empty #5793

Open
Interfish opened this issue Jun 17, 2024 · 1 comment

Comments

@Interfish

Describe the issue:
The Trial details page is empty, while the Overview page and the other pages seem fine. I am using the Edge browser. Below is a screenshot of the completely blank details page, taken after clicking the button:

[Screenshot: blank Trial details page]

The Overview page renders normally:

[Screenshot: Overview page]

Environment:

  • NNI version: 3.0
  • Training service (local|remote|pai|aml|etc): local
  • Client OS: Ubuntu 22.04 (Docker container)
  • Server OS (for remote mode only): N/A
  • Python version: 3.10.12
  • PyTorch/TensorFlow version: PyTorch 2.3.1+cu121
  • Is conda/virtualenv/venv used?: No
  • Is running in Docker?: Yes

Configuration:

  • Experiment config (remember to remove secrets!):
{
  "params": {
    "experimentType": "hpo",
    "trialCommand": "python dl_run.py --use_nni --config /life_changer/experiments/ws_related/train/nni/v25/default.yaml",
    "trialCodeDirectory": "/life_changer/experiments/ws_related/train",
    "trialConcurrency": 1,
    "maxTrialDuration": "1h",
    "useAnnotation": false,
    "debug": false,
    "logLevel": "info",
    "experimentWorkingDirectory": "/life_changer/experiments/ws_related/train/nni/experiments",
    "tuner": {
      "name": "GridSearch"
    },
    "trainingService": {
      "platform": "local",
      "trialCommand": "python dl_run.py --use_nni --config /life_changer/experiments/ws_related/train/nni/v25/default.yaml",
      "trialCodeDirectory": "/life_changer/experiments/ws_related/train",
      "debug": false,
      "maxTrialNumberPerGpu": 1,
      "reuseMode": false
    }
  },
  "execDuration": "1m 40s",
  "nextSequenceId": 2,
  "revision": 14
}
  • Search space:
{
  "model.embedding.mark_unknown_as_padding": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.embedding.init_weight_by_numerical_intensity": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.cross_layer.sub_net_dims": {
    "_type": "choice",
    "_value": [4, 16]
  },
  "model.cross_layer.score_fn": {
    "_type": "choice",
    "_value": [
      {
        "softplus": {"use": true, "beta": 5, "threshold": 100}
      },
      {
        "softmax": {"use": true}
      },
      {
        "silu": {"use": true}
      }
    ]
  },
  "model.cross_layer.use_lhuc": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.cross_layer.global_score_fn": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.output_layer.moe.topk": {
    "_type": "choice",
    "_value": [
      null,
      2
    ]
  },
  "model.output_layer.moe.num_experts": {
    "_type": "choice",
    "_value": [8, 32]
  },
  "model.output_layer.moe.experts_share_input": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.output_layer.moe.gating_input": {
    "_type": "choice",
    "_value": ["odds_team_ha_emb", "same_as_expert"]
  },
  "model.loss": {
    "_type": "choice",
    "_value": [
      {
        "focal": {"use": true, "gamma": 2}
      },
      {
        "ghmc": {"use": true, "bins": 10}
      },
      {
        "wdl_prob_rank": {"use": true, "top_k_frac": 0.1, "top_k_weight"": 10}
      },
      {
        "odds_weighted": {"use": true, "pow_of_n": 2}
      },
      {
        "pred_odds_topk": {
          "use": true,
          "top_k_frac": 0.1,
          "top_k_weight": 10,
          "weight_by_odds": false
        }
      },
      {
        "pred_odds_topk": {
          "use": true,
          "top_k_frac": 0.1,
          "top_k_weight": null,
          "weight_by_odds": true
        }
      },
      {
        "pred_odds_topk": {
          "use": true,
          "top_k_frac": 0.2,
          "top_k_weight": 5,
          "weight_by_odds": false
        }
      },
      {
        "preds_correct_odds": {
          "use": true,
          "threshold": 3,
          "weight": 10,
          "weight_by_odds": false
        }
      },
      {
        "preds_correct_odds": {
          "use": true,
          "threshold": 3,
          "weight": null,
          "weight_by_odds": true
        }
      }
    ]
  },
  "model.degree_1.last_n": {
    "_type": "choice",
    "_value": [1, 3]
  },
  "model.degree_1.use_lhuc": {
    "_type": "choice",
    "_value": [true, false]
  },
  "optimizer": {
    "_type": "choice",
    "_value": [
      {
        "_name": "RAdam",
        "weight_decay": {
          "_type": "choice",
          "_value": [0.0001, 0.001]
        }
      },
      {
        "_name": "Ranger",
        "weight_decay": {
          "_type": "choice",
          "_value": [0.0001, 0.001]
        }
      }
    ]
  }
}
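The author's later comment traces the blank page to "null" values. In the search space above, null appears in "model.output_layer.moe.topk" and in the "top_k_weight"/"weight" fields of two "model.loss" choices. A minimal Python sketch for listing every null in a search space, assuming it is available as JSON; the function name `find_nulls` is hypothetical and not part of NNI, and the embedded JSON is a trimmed excerpt of the space above:

```python
import json
from typing import Any, Iterator


def find_nulls(node: Any, path: str = "") -> Iterator[str]:
    """Yield a '/'-separated path for every JSON null (Python None) in `node`.

    '/' is used as the separator because the search-space keys themselves
    contain dots (e.g. "model.output_layer.moe.topk").
    """
    if node is None:
        yield path
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_nulls(value, f"{path}/{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_nulls(value, f"{path}/{index}")


# A trimmed excerpt of the search space above, as JSON text.
SEARCH_SPACE = json.loads("""
{
  "model.output_layer.moe.topk": {"_type": "choice", "_value": [null, 2]},
  "model.output_layer.moe.num_experts": {"_type": "choice", "_value": [8, 32]},
  "model.loss": {"_type": "choice", "_value": [
    {"focal": {"use": true, "gamma": 2}},
    {"preds_correct_odds": {"use": true, "threshold": 3, "weight": null, "weight_by_odds": true}}
  ]}
}
""")

if __name__ == "__main__":
    for null_path in find_nulls(SEARCH_SPACE):
        print(null_path)
```

Run against the full search space, this would flag every entry that needs a replacement value.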

Log message:

  • nnimanager.log:
[2024-06-17 01:58:52] INFO (main) Start NNI manager
[2024-06-17 01:58:52] INFO (RestServer) Starting REST server at port 8888, URL prefix: "/"
[2024-06-17 01:58:52] INFO (RestServer) REST server started.
[2024-06-17 01:58:53] INFO (NNIDataStore) Datastore initialization done
[2024-06-17 01:58:54] INFO (NNIManager) Starting experiment: l2yv4dn1
[2024-06-17 01:58:54] INFO (NNIManager) Setup training service...
[2024-06-17 01:58:54] INFO (NNIManager) Setup tuner...
[2024-06-17 01:58:54] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2024-06-17 01:58:54] INFO (NNIManager) Add event listeners
[2024-06-17 01:58:54] INFO (LocalV3.local) Start
[2024-06-17 01:58:54] INFO (NNIManager) NNIManager received command from dispatcher: ID, 
[2024-06-17 01:58:54] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.0001}}, "parameter_index": 0}
[2024-06-17 01:58:55] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 0,
  hyperParameters: {
    value: '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.0001}}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2024-06-17 01:58:55] INFO (LocalV3.local) Register directory trial_code = /life_changer/experiments/ws_related/train
[2024-06-17 01:58:55] INFO (LocalV3.local) Created trial FOUTA
[2024-06-17 01:58:58] INFO (LocalV3.local) Trial parameter: FOUTA {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.0001}}, "parameter_index": 0}
[2024-06-17 02:00:08] INFO (NNIManager) Trial job FOUTA status changed from RUNNING to SUCCEEDED
[2024-06-17 02:00:09] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.001}}, "parameter_index": 0}
[2024-06-17 02:00:09] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 1,
  hyperParameters: {
    value: '{"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.001}}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2024-06-17 02:00:09] INFO (LocalV3.local) Created trial sCcur
[2024-06-17 02:00:12] INFO (LocalV3.local) Trial parameter: sCcur {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.001}}, "parameter_index": 0}
[2024-06-17 02:01:20] INFO (NNIManager) Trial job sCcur status changed from RUNNING to SUCCEEDED
[2024-06-17 02:01:20] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "Ranger", "weight_decay": 0.0001}}, "parameter_index": 0}
[2024-06-17 02:01:20] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 2,
  hyperParameters: {
    value: '{"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "Ranger", "weight_decay": 0.0001}}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2024-06-17 02:01:20] INFO (LocalV3.local) Created trial y553t
[2024-06-17 02:01:23] INFO (LocalV3.local) Trial parameter: y553t {"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "Ranger", "weight_decay": 0.0001}}, "parameter_index": 0}
  • dispatcher.log:
[2024-06-17 01:58:54] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started
[2024-06-17 01:58:54] INFO (nni.runtime.msg_dispatcher/Thread-1 (command_queue_worker)) Initial search space: {'model.embedding.mark_unknown_as_padding': {'_type': 'choice', '_value': [True, False]}, 'model.embedding.init_weight_by_numerical_intensity': {'_type': 'choice', '_value': [True, False]}, 'model.cross_layer.sub_net_dims': {'_type': 'choice', '_value': [4, 16]}, 'model.cross_layer.score_fn': {'_type': 'choice', '_value': [{'softplus': {'use': True, 'beta': 5, 'threshold': 100}}, {'softmax': {'use': True}}, {'silu': {'use': True}}]}, 'model.cross_layer.use_lhuc': {'_type': 'choice', '_value': [True, False]}, 'model.cross_layer.global_score_fn': {'_type': 'choice', '_value': [True, False]}, 'model.output_layer.moe.topk': {'_type': 'choice', '_value': [None, 2]}, 'model.output_layer.moe.num_experts': {'_type': 'choice', '_value': [8, 32]}, 'model.output_layer.moe.experts_share_input': {'_type': 'choice', '_value': [True, False]}, 'model.output_layer.moe.gating_input': {'_type': 'choice', '_value': ['odds_team_ha_emb', 'same_as_expert']}, 'model.loss': {'_type': 'choice', '_value': [{'focal': {'use': True, 'gamma': 2}}, {'ghmc': {'use': True, 'bins': 10}}, {'wdl_prob_rank': {'use': True, 'top_k_frac': 0.1, 'top_k_weight"': 10}}, {'odds_weighted': {'use': True, 'pow_of_n': 2}}, {'pred_odds_topk': {'use': True, 'top_k_frac': 0.1, 'top_k_weight': 10, 'weight_by_odds': False}}, {'pred_odds_topk': {'use': True, 'top_k_frac': 0.1, 'top_k_weight': None, 'weight_by_odds': True}}, {'pred_odds_topk': {'use': True, 'top_k_frac': 0.2, 'top_k_weight': 5, 'weight_by_odds': False}}, {'preds_correct_odds': {'use': True, 'threshold': 3, 'weight': 10, 'weight_by_odds': False}}, {'preds_correct_odds': {'use': True, 'threshold': 3, 'weight': None, 'weight_by_odds': True}}]}, 'model.degree_1.last_n': {'_type': 'choice', '_value': [1, 3]}, 'model.degree_1.use_lhuc': {'_type': 'choice', '_value': [True, False]}, 'optimizer': {'_type': 'choice', '_value': [{'_name': 'RAdam', 'weight_decay': {'_type': 'choice', '_value': [0.0001, 0.001]}}, {'_name': 'Ranger', 'weight_decay': {'_type': 'choice', '_value': [0.0001, 0.001]}}]}}
[2024-06-17 01:58:54] INFO (nni.tuner.gridsearch/Thread-1 (command_queue_worker)) Grid initialized, size: (2×2×2×3×2×2×2×2×2×2×9×2×2×2×2×2) = 442368
  • nnictl stdout and stderr:
--------------------------------------------------------------------------------
Experiment l2yv4dn1 start: 2024-06-17 01:58:52.009267
--------------------------------------------------------------------------------
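While the Trial details page is blank, each trial's hyperparameters and final status can still be read from nnimanager.log, which logs a "Trial parameter:" line and a "status changed" line per trial. A minimal Python sketch, assuming the log lines keep exactly the format shown in the excerpt above; the regexes and the name `parse_trials` are hypothetical and not an NNI API:

```python
import json
import re

# Patterns match the line shapes in the nnimanager.log excerpt above
# (NNI 3.0, local training service); other versions may differ.
PARAM_RE = re.compile(r"Trial parameter: (\S+) (\{.*\})\s*$")
STATUS_RE = re.compile(r"Trial job (\S+) status changed from (\S+) to (\S+)")


def parse_trials(lines):
    """Map trial ID -> {"parameters": dict or None, "status": str or None}."""
    trials = {}
    for line in lines:
        if match := PARAM_RE.search(line):
            trial_id, payload = match.groups()
            entry = trials.setdefault(trial_id, {"parameters": None, "status": None})
            # The JSON payload wraps the sampled values under "parameters".
            entry["parameters"] = json.loads(payload)["parameters"]
        elif match := STATUS_RE.search(line):
            trial_id, _old_status, new_status = match.groups()
            entry = trials.setdefault(trial_id, {"parameters": None, "status": None})
            entry["status"] = new_status
    return trials


# Two lines from the log above, with the parameter payload trimmed for brevity.
SAMPLE_LOG = [
    '[2024-06-17 01:58:58] INFO (LocalV3.local) Trial parameter: FOUTA '
    '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": '
    '{"model.output_layer.moe.topk": null, "optimizer": {"_name": "RAdam", '
    '"weight_decay": 0.0001}}, "parameter_index": 0}',
    "[2024-06-17 02:00:08] INFO (NNIManager) Trial job FOUTA status changed "
    "from RUNNING to SUCCEEDED",
]

if __name__ == "__main__":
    # For a real run, pass an open file instead (path is an assumption), e.g.
    #   with open("nnimanager.log", encoding="utf-8") as f: parse_trials(f)
    for trial_id, info in parse_trials(SAMPLE_LOG).items():
        print(trial_id, info["status"], json.dumps(info["parameters"]))
```

The "submitTrialJob" and "received command from dispatcher" lines also carry the parameters, but matching only the "Trial parameter:" lines ties each payload to a trial ID.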

How to reproduce it?:
It may be hard to reproduce, because the exact environment on my machine is complicated.

@Interfish (author)

After some experiments of my own, I found that the root cause is the "null" values in my experiment config YAML file. If I replace them with other values such as "True" or "1", the details page displays normally. Since this repo is no longer maintained, I am leaving this here for the record, for anyone who is still using NNI.
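The workaround above replaces each null with a concrete value. If the training code still needs to receive None for options like "model.output_layer.moe.topk", one option (an assumption, not something verified in this issue) is to put a sentinel string in the search space and map it back in the trial script, assuming, as this finding suggests, that only null values trip the page. A minimal Python sketch; `NULL_SENTINEL`, `encode_nulls`, and `decode_nulls` are hypothetical names:

```python
from typing import Any

# Hypothetical sentinel: any string that is never a real choice value works.
NULL_SENTINEL = "__null__"


def encode_nulls(node: Any) -> Any:
    """Return a copy of a search space with every None replaced by NULL_SENTINEL."""
    if node is None:
        return NULL_SENTINEL
    if isinstance(node, dict):
        return {key: encode_nulls(value) for key, value in node.items()}
    if isinstance(node, list):
        return [encode_nulls(value) for value in node]
    return node


def decode_nulls(node: Any) -> Any:
    """Inverse of encode_nulls: turn NULL_SENTINEL back into None."""
    if isinstance(node, str) and node == NULL_SENTINEL:
        return None
    if isinstance(node, dict):
        return {key: decode_nulls(value) for key, value in node.items()}
    if isinstance(node, list):
        return [decode_nulls(value) for value in node]
    return node


if __name__ == "__main__":
    space = {"model.output_layer.moe.topk": {"_type": "choice", "_value": [None, 2]}}
    print(encode_nulls(space))
    # In the trial script (hypothetical wiring), decode what the tuner sends:
    #   params = decode_nulls(nni.get_next_parameter())
```

Unlike substituting "True" or "1", the sentinel keeps the null choice distinguishable from real values, at the cost of one decoding step in the trial code.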
