A8: Large Language Models

Instructions

For this assignment, you are welcome to use any modern AI Chatbot you want unless otherwise specified. Some of the main free options are:

Microsoft Copilot - Free in the Edge browser.
ChatGPT - Requires a free account for ChatGPT 3.5.
Google Gemini - Requires a free Google account.
Claude AI - Requires a free account and a phone number.

Important

For all tasks, always start a fresh conversation for each prompt.

If you change models between questions, please note which model you switched to.

Part 1: Prompt Engineering Techniques

Consider the following prompt:

You are a friendly administrative assistant who helps write important safety instructions at a magical university of sorcery. Instructions you write should be addressed to a bureaucratic council of safety wizards known as the “Esteemed Magical Fellows”. At least one of the instruction’s steps should be to cast a spell.

Here’s an example set of instructions.

Task: Extracting a Prondleby from a Skub.

Esteemed Magical Fellows,

Below are the safety instructions for extracting a Prondleby from a Skub. Above all else, attention must be paid to the emotional state of the Skub. Please do not allow your pro-skub or anti-skub attitudes to influence your duty to safety.

Step 1: Don your personal protective equipment.

Step 2: Bathe the Skub. Be sure to never fully submerge a Skub in water.

Step 3: Cast “Spin Pronds” in order to generate the heat needed to melt the seal of the Prondleby. If you see any smoke, you must stop immediately, and wait at least 3 months before rebathing the Skub.

Step 4: If you have completed all previous steps, the Skub should ask you if you want its Prondleby. Be sure to refuse the first two times it asks with, “Oh I really shouldn’t”. You may accept the Prondleby the third time the Skub asks.

Step 5: Store the Prondleby in a light-proof container. Put the Skub back into its colony, return your personal protective equipment, and you’re done! Congratulations, esteemed Magical Fellow.

Write safety instructions for the following Sorcerer task. First write a summary of the instructions and explain your reasoning step-by-step.

Task: Disposing of Rippleskip juice.

1.1 Try giving the prompt to the AI Chatbot of your choice. Paste its response here and note which model you used.

Paste response here

1.2 Highlight the following parts of the prompt/prompting strategies in its text above.

1.2.1 Make the part of the prompt which uses one-shot learning bold.

1.2.2 Make the part of the prompt which uses chain-of-thought prompting italic.

1.2.3 Make the part of the prompt which uses ~~role-giving crossed-through~~.

1.2.4 Make the part of the prompt which gives the task description formatted as code.

1.3 Consider the three parts of the prompt, besides the task description, which you highlighted above: one-shot learning, chain-of-thought prompting, and role-giving. For each of the three parts, give the prompt with that part excluded to the same AI Chatbot you used for the complete prompt. For instance, for 1.3.1 give the AI Chatbot the entire prompt except the parts which constitute one-shot learning. Be sure to start a fresh conversation each time.

[IMPORTANT] Make sure you don't copy and paste the special characters you added to format those parts of the prompt.
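One way to keep the three ablated prompts consistent is to assemble them programmatically. The following is a minimal sketch, not a required part of the assignment. The names `strip_formatting`, `ablate`, and the keys of `parts` are hypothetical, and the placeholder strings stand in for whichever text you identified in 1.2. It assumes the highlighting used Markdown markers (`**`, `*`, `~~`, and backticks) and that the prompt text itself contains none of those characters.

```python
def strip_formatting(text: str) -> str:
    """Remove the Markdown markers added in 1.2 (bold, italic, strikethrough, code)."""
    # "**" is removed before "*" so bold markers are not left half-stripped.
    for marker in ("**", "~~", "`", "*"):
        text = text.replace(marker, "")
    return text


def ablate(parts: dict, excluded: str) -> str:
    """Join every prompt part except `excluded`, in the original order."""
    if excluded not in parts:
        raise KeyError(f"unknown prompt part: {excluded}")
    kept = [strip_formatting(text) for name, text in parts.items() if name != excluded]
    return "\n\n".join(kept)


# Placeholders only: paste your own highlighted text for each part.
parts = {
    "role": "~~<role-giving text>~~",
    "one_shot": "**<one-shot example text>**",
    "chain_of_thought": "*<chain-of-thought text>*",
    "task": "`<task description>`",
}

for name in ("one_shot", "chain_of_thought", "role"):
    print(f"--- prompt without {name} ---")
    print(ablate(parts, name))
```

Each printed prompt would then be pasted into a fresh conversation. If the parts you identified are interleaved in the original prompt rather than contiguous, split them into more entries so the order is preserved.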

1.3.1 Response without one-shot learning:

Paste response here

1.3.2 Response without chain-of-thought prompting:

Paste response here

1.3.3 Response without role-giving:

Paste response here

1.4 What do you notice about the quality of the responses when each of the parts is removed? Which part of the prompt seemed to be most important for getting a good response?

Your response here

Part 2: Misinformation

2.1.1 Try using the following prompt, replacing the blank with a field you’re interested in. Paste your prompt and the response.

You are a helpful research scientist. Create a small report on the use of AI in ________. Include citations and a references section.

Paste your prompt here.

Paste the response here.

2.1.2 Check the citations generated by the AI. You don’t need to check every one, but look at 2 or 3. Are the papers real? Were they published in the year the AI citation claims? One way to double-check a source is to paste its information into an academic search engine like Google Scholar. Google Scholar is not 100% trustworthy either, but it is usually correct.
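If you want to speed up the manual check, the lookup can be partly scripted. The sketch below is optional and makes a few assumptions: the helper names `scholar_search_url` and `claimed_year` are hypothetical, the citation string is an invented placeholder rather than a real paper, and the year is taken to be the first four-digit number starting with 19 or 20. It only builds a search link and extracts the claimed year. You still have to open the link and compare the results yourself.

```python
import re
from urllib.parse import quote_plus


def scholar_search_url(title: str) -> str:
    """Build a Google Scholar search URL for an exact-phrase title search."""
    return "https://scholar.google.com/scholar?q=" + quote_plus(f'"{title}"')


def claimed_year(citation: str):
    """Return the first plausible publication year in a citation, or None."""
    match = re.search(r"\b(?:19|20)\d{2}\b", citation)
    return int(match.group()) if match else None


# Invented placeholder citation; replace with one from the AI's report.
citation = "Doe, J. (2021). An Example Title. Journal of Examples."
print(claimed_year(citation))
print(scholar_search_url("An Example Title"))
```

Compare the year returned by `claimed_year` against the year shown in the search results for the same title.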

Your response here.

2.2.1 Head to https://quiz.cord.com/ and try to get the AI to get the questions correct. Paste a screenshot of your score here.

Note

You will not be graded on the number of questions the AI gets correct, just that you completed the quiz.

Insert your screenshot here

2.2.2 Based on that quiz, what are two tasks LLMs struggle with?

Your response here.

2.2.3 For each struggle/task you mentioned above, create a prompt which an LLM would struggle to get correct.

Prompt for struggle 1 here.

Prompt for struggle 2 here.

Part 3: Comparing Models

3.1.1 Give the prompt to the AI Chatbot of your choice and paste its response.

Answer the following question in 3-5 sentences. When would somebody use milk for cooking versus using water?

Paste Response Here

Let's compare the performance of some models. Head over to the Chatbot Arena: https://chat.lmsys.org/.

3.1.2 One task LLMs can struggle with is negation: tasks such as adding "not" or turning something into its opposite. Let's see how different models compare on the negation task.

Take the milk vs. water response you pasted above and use it as the basis for a new prompt with the following format.

Create the opposite of the following paragraph, where each sentence has the same general structure but now means its opposite:

[PUT YOUR MILK VS. WATER RESPONSE HERE]
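The format above can also be filled in with a small script, which avoids copy-paste slips when you reuse the same prompt across several models. This is a sketch under stated assumptions: `build_negation_prompt` and `count_sentences` are hypothetical helper names, and the sentence counter is a rough heuristic (it splits on whitespace after `.`, `!`, or `?`) meant only to check that the response from 3.1.1 falls in the requested 3-5 sentence range.

```python
import re

NEGATION_TEMPLATE = (
    "Create the opposite of the following paragraph, where each sentence "
    "has the same general structure but now means its opposite:\n\n{response}"
)


def build_negation_prompt(response: str) -> str:
    """Insert the milk vs. water response into the negation prompt format."""
    return NEGATION_TEMPLATE.format(response=response.strip())


def count_sentences(text: str) -> int:
    """Roughly count sentences by splitting after ., !, or ? plus whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([piece for piece in pieces if piece])


# Placeholder response; paste the response you recorded in 3.1.1 instead.
response = "<first sentence>. <second sentence>. <third sentence>."
print(count_sentences(response))
print(build_negation_prompt(response))
```

Use the exact same generated prompt for every model so that differences in the outputs come from the models and not from the wording.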

Try this negation prompt in the Chatbot Arena for 4 models, or 2 runs total. It's okay if one of the models happens to repeat. Record the 4 models and their responses below. Remember to also vote for your preference on the website between each pair!

Model 1: [PUT MODEL NAME HERE]

Paste Response Here

Model 2: [PUT MODEL NAME HERE]

Paste Response Here

Model 3: [PUT MODEL NAME HERE]

Paste Response Here

Model 4: [PUT MODEL NAME HERE]

Paste Response Here

3.1.3 Which models succeeded at the task and which did not? Which model gave, in your opinion, the best response and why?

Your Response Here

Now see how models perform for each of the two prompts you created in question 2.2.3. Record the responses and give your thoughts below.

3.2.1 Prompt 1:

Model 1: [PUT MODEL NAME HERE]

Paste Response Here

Model 2: [PUT MODEL NAME HERE]

Paste Response Here

Model 3: [PUT MODEL NAME HERE]

Paste Response Here

Model 4: [PUT MODEL NAME HERE]

Paste Response Here

3.2.2 Which models succeeded at the task and which did not? Which model gave, in your opinion, the best response and why?

Your Response Here

3.3.1 Prompt 2:

Model 1: [PUT MODEL NAME HERE]

Paste Response Here

Model 2: [PUT MODEL NAME HERE]

Paste Response Here

Model 3: [PUT MODEL NAME HERE]

Paste Response Here

Model 4: [PUT MODEL NAME HERE]

Paste Response Here

3.3.2 Which models succeeded at the task and which did not? Which model gave, in your opinion, the best response and why?

Your Response Here
