[ICME 2024] Official Datasets and example of LLM-SAP: Large Language Model Situational Awareness Based Planning

HanyangZhong/Situational_Planning_datasets

LLM-SAP: Large Language Model Situational Awareness Based Planning

This is the official open-source repository for the paper LLM-SAP: Large Language Model Situational Awareness Based Planning.

The paper is available at arXiv.

Web page Navigation

Dataset Introduction

This dataset centers on 24 hazardous home scenarios.
For each scenario it provides the scene, the scenario event, the planning complexity, a detailed scenario description, the corresponding image description, the best human-written finite state machine (FSM) demonstration, and an approximate visualization image of that FSM.
The detailed scene content can be viewed and used through the CSV or JSON files under dataset.
In addition, we have prepared around 600 high-quality multimodal situational awareness planning samples, generated with the SAP prompt method described in the paper. They can be used for fine-tuning and are available on GitHub or Hugging Face.
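
As a minimal sketch of reading the scenario files, the helper below loads records from either a CSV or a JSON file using only the Python standard library. The function name `load_scenarios` is hypothetical, and the file layout is an assumption (a CSV with a header row, or a JSON file holding a list of records or a single record); check the actual file and column names under dataset before relying on it.

```python
import csv
import json
from pathlib import Path


def load_scenarios(path):
    """Load scenario records from a CSV or JSON file as a list of dicts.

    Hypothetical helper: the file layout is assumed, not taken from the repo.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Each CSV row becomes a dict keyed by the header row.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: the top level is a list of records or a single record.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported file type: {suffix}")
```

Calling `load_scenarios` with the path to one of the files under dataset returns one dict per row or record, which can then be filtered by scene or planning complexity.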

Quick Start

The dataset folder contains the main dataset content, including the test set 24_Home_Hazard_Scenario and training set Household_ Safety.
The finite state machines generated in the tests described in the paper are stored in the generated_FSM folder, and their evaluation results are stored in the eval_result folder.
Both folders are explained in detail below.

Generated Results

Generate template

  1. baseline prompting template
  2. SAP prompting template
  3. Comparison eval prompting template
  4. Single round eval without feedback prompting template
  5. Second round eval with feedback prompting template
  6. Feedback prompting template
  7. Ablation study 1 normal prompting template
  8. Ablation study 1 SAP prompting template
  9. Ablation study 2 Zero_shot_COT prompting template
  10. Ablation study 2 EP05 prompting template
  11. Ablation study 2 EP09 prompting template

GPT-4 generated results

  1. GPT-4+SAP
  2. GPT-4

GPT-3.5 generated results

  1. GPT-3.5+SAP
    (The demo is shown here)
  2. GPT-3.5

Claude-2 generated results

  1. Claude2+SAP
  2. Claude2

Multi-agent test

  1. GPT-3.5+SAP
    GPT-3.5+SAP was selected as the test baseline.
  2. Claude-2 eval
    The GPT-3.5+SAP result would be evaluated against the best demo for the corresponding scene, and feedback would be generated.
  3. Regenerate with feedback
    New FSM results would be generated by GPT-3.5 according to feedback.
  4. Claude-2 eval new FSM
    GPT-3.5+SAP+feedback would be evaluated with GPT-3.5 based on the best demo.
    Or evaluated with GPT-4+SAP based on the GPT-3.5+SAP result.
    (The whole demo is shown here)
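
The four steps above can be sketched as a simple generate, evaluate, regenerate loop. This is only an illustration of the control flow, not the authors' implementation: `refine_with_feedback`, `generate`, and `evaluate` are hypothetical names, and the two callables stand in for whatever code calls the generator model (for example GPT-3.5 with the SAP prompt) and the evaluator model.

```python
def refine_with_feedback(generate, evaluate, scenario, rounds=1):
    """Generate an FSM plan, then revise it using evaluator feedback.

    `generate(scenario, feedback)` and `evaluate(scenario, plan)` are
    hypothetical callables wrapping the generator and evaluator models.
    Returns every plan produced, from the initial one to the last revision.
    """
    plan = generate(scenario, None)          # step 1: initial generation
    history = [plan]
    for _ in range(rounds):
        feedback = evaluate(scenario, plan)  # step 2: evaluator produces feedback
        plan = generate(scenario, feedback)  # step 3: regenerate with feedback
        history.append(plan)
    return history                           # step 4: caller evaluates the final plan
```

Returning the full history keeps the first-round and second-round plans side by side, so both can be passed to a separate ranking step.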

Ablation test1 formatting generation

The format prompt is shown in the prompt template.
The ablation tests are generated only by GPT-4.

  1. GPT-4 format with SAP
    Generation result here
  2. GPT-4 format without SAP
    Generation result here
  3. Vicuna13b format with SAP
    Only the generation from Vicuna was collected; it has not been evaluated. Generation result here
  4. Vicuna13b format without SAP
    Only the generation from Vicuna was collected; it has not been evaluated. Generation result here

Ablation test2 result

The results of ablation study 2. Only evaluated by GPT-4.

GPT-4+Zero_shot_COT Generation

The prompt of Zero_shot_COT is: Let's think step by step.
Generation result here

GPT-4+EP05 Generation

The prompt of EP05 is: Are you sure that’s your final answer? It might be worth taking another look.
Generation result here

GPT-4+EP09 Generation

The prompt of EP09 is: Stay focused and dedicated to your goals. Your consistent efforts will lead to outstanding achievements.
Generation result here
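
For illustration, the three prompt variants can be kept in one table and attached to a base prompt. The suffix strings are quoted from the descriptions above; everything else is an assumption, including the names `ABLATION_SUFFIXES` and `build_prompt` and the choice to append the suffix after the base prompt separated by a blank line. The prompt templates in this repository define the layout actually used.

```python
# Prompt suffixes quoted from the ablation study 2 descriptions.
ABLATION_SUFFIXES = {
    "Zero_shot_COT": "Let's think step by step.",
    "EP05": "Are you sure that’s your final answer? "
            "It might be worth taking another look.",
    "EP09": "Stay focused and dedicated to your goals. "
            "Your consistent efforts will lead to outstanding achievements.",
}


def build_prompt(base_prompt, variant):
    """Append the chosen suffix to a base prompt (hypothetical layout)."""
    # Assumption: the suffix follows the base prompt after a blank line.
    return f"{base_prompt.rstrip()}\n\n{ABLATION_SUFFIXES[variant]}"
```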

Eval Results

The generated FSMs are evaluated by GPT-4 and Claude-2. Please find the corresponding results in the folders.

Multi-agent SAP rank result

The initial result of GPT-3.5+SAP is used here in the first loop.
The second-loop result of GPT-3.5+SAP+feedback is shown here and here.

Group of 6 rank result

The result of the ranking test by GPT-4 here and by Claude-2 here.
In the result txt file, x_1 is GPT-4 with SAP, x_2 is GPT-4 without SAP, x_3 is GPT-3.5 with SAP, x_4 is GPT-3.5 without SAP, x_5 is Claude-2 with SAP, x_6 is Claude-2 without SAP.
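
When reading these result files, it can help to swap the `x_N` labels for readable model names. The sketch below does only that substitution and makes no assumption about the layout of the result txt files; the names `GROUP_OF_6_LABELS` and `expand_labels` are hypothetical, and the mapping is copied from the sentence above.

```python
import re

# Label mapping for the group-of-6 ranking, as stated in this README.
GROUP_OF_6_LABELS = {
    "x_1": "GPT-4 with SAP",
    "x_2": "GPT-4 without SAP",
    "x_3": "GPT-3.5 with SAP",
    "x_4": "GPT-3.5 without SAP",
    "x_5": "Claude-2 with SAP",
    "x_6": "Claude-2 without SAP",
}


def expand_labels(text, labels=GROUP_OF_6_LABELS):
    """Replace x_N placeholders in result text with readable model names."""
    # Labels missing from the mapping are left unchanged.
    return re.sub(r"\bx_\d+\b",
                  lambda m: labels.get(m.group(0), m.group(0)),
                  text)
```

Because the meaning of `x_3` and `x_4` differs between result folders, pass the mapping that matches the folder being read through the `labels` argument.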

Group of 4 rank result

GPT-4 & GPT-3.5

The result of the ranking test by GPT-4 here and by Claude-2 here.
In the result txt file, x_1 is GPT-4 with SAP, x_2 is GPT-4 without SAP, x_3 is GPT-3.5 with SAP, and x_4 is GPT-3.5 without SAP.

GPT-4 & Claude-2

The result of the ranking test by GPT-4 here and by Claude-2 here.
In the result txt file, x_1 is GPT-4 with SAP, x_2 is GPT-4 without SAP, x_3 is Claude-2 with SAP, and x_4 is Claude-2 without SAP.

Pairs rank result

Only recorded the evaluation of GPT-4.

Evaluate Claude-2

The result of the ranking test about Claude-2 here.
In the result txt file, x_3 is Claude-2 without SAP and x_4 is Claude-2 with SAP.

Evaluate GPT-3.5

The result of the ranking test about GPT-3.5 here.
In the result txt file, x_3 is GPT-3.5 without SAP and x_4 is GPT-3.5 with SAP.

Evaluate GPT-4

The result of the ranking test about GPT-4 here.
In the result txt file, x_3 is GPT-4 without SAP and x_4 is GPT-4 with SAP.

Ablation study 1 formatting result

The result of ablation study 1 evaluated by Claude-2 here.
In the result txt file, x_3 is GPT-4 without SAP and x_4 is GPT-4 with SAP.

Ablation study 2 result

Only evaluated by GPT-4.

GPT-4+SAP & GPT-4+Zero_shot_COT

The result of ablation study 2 here
In the result txt file, x_3 is GPT-4 with Zero_shot_COT and x_4 is GPT-4 with SAP.

GPT-4+SAP & GPT-4+EP05

The result of ablation study 2 here
In the result txt file, x_3 is GPT-4 with EP05 and x_4 is GPT-4 with SAP.

GPT-4+SAP & GPT-4+EP09

The result of ablation study 2 here
In the result txt file, x_3 is GPT-4 with EP09 and x_4 is GPT-4 with SAP.

Citation

@article{wang&zhong2024SAP_LLM,
      title={LLM-SAP: Large Language Model Situational Awareness Based Planning}, 
      author={Liman Wang and Hanyang Zhong},
      year={2024},
      eprint={2312.16127},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
