
Model Evaluation


The IntelliNode module provides a comprehensive evaluation mechanism that enables quick comparisons between different large models. With this feature, you can evaluate and determine which model is better suited to your use cases.

LLM Evaluation

The LLM Evaluation feature compares different Large Language Models (LLMs) such as ChatGPT, LLaMA, and Cohere. The evaluation computes cosine similarity and Euclidean distance scores between each model's predicted responses and an array of target answers.
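
To give an intuition for the two metrics, here is a minimal, illustrative sketch (not IntelliNode's internal implementation) of how cosine similarity and Euclidean distance can be computed over two embedding vectors of equal length:

// illustrative only: compare two embedding vectors of the same length
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const normB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dot / (normA * normB); // closer to 1 means more similar
}

function euclideanDistance(a, b) {
  // lower values mean the two vectors are closer
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

The library embeds each prediction and each target answer with the chosen embedding provider and averages these scores across the target answers.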

Usage

Here is an example that demonstrates the usage of LLM Evaluation. In this case, we compare OpenAI's ChatGPT, Cohere's command completion model, Llama (13B chat) hosted on Replicate, and Google's Gemini chat model:

Step 1: Imports
First, import the required modules from the IntelliNode library. The main module needed here is LLMEvaluation, along with SupportedChatModels and SupportedLangModels.

const { LLMEvaluation, SupportedChatModels, SupportedLangModels } = require('intellinode');

Step 2: Create Providers Set
Create a list of the chat or language models you want to evaluate. Each entry specifies the provider's API key, the provider name, the model name, and the maximum number of tokens the model is allowed to generate in a single call.

const openaiChat = { 
    apiKey: openaiKey, 
    provider: SupportedChatModels.OPENAI, 
    type: 'chat', 
    model: 'gpt-3.5-turbo', 
    maxTokens: 100
};

const cohereCompletion = { 
    apiKey: cohereKey, 
    provider: SupportedLangModels.COHERE, 
    type: 'completion', 
    model: 'command', 
    maxTokens: 100 
};

const llamaChat = {
    apiKey: replicateKey, 
    provider: SupportedChatModels.REPLICATE,
    type: 'chat', 
    model: '13b-chat', 
    maxTokens: 100
};

const geminiChat = {
    apiKey: geminiKey, 
    provider: SupportedChatModels.GEMINI,
    type: 'chat', 
    model: 'gemini'
};

// the models set to compare
const providerSets = [openaiChat, cohereCompletion, llamaChat, geminiChat];

Step 3: Call LLMEvaluation
Create an instance of LLMEvaluation with the embedding provider used for the unified comparison, then call compareModels with the input, the target answers, and the models set:

// create the LLMEvaluation with the desired embedding algorithm for unified comparison.
const llmEvaluation = new LLMEvaluation(openaiKey, 'openai');

// the model input
const inputString = "Explain the process of photosynthesis in simple terms.";

// target answers to calculate the average distance
const targetAnswers = [
    "Photosynthesis is the process where green plants use sunlight to...",
    "Photosynthesis is how plants make their own food. They take in water..."
];

// execute the evaluation (call inside an async function when using CommonJS require)
const results = await llmEvaluation.compareModels(inputString, targetAnswers, providerSets);

The function returns an object containing the prediction from each model, along with its cosine similarity score and Euclidean distance score relative to the target answers.

{
  'openai/gpt-3.5-turbo': [
    {
      prediction: 'Photosynthesis is how plants make food for themselves....',
      score_cosine_similarity: 0.9566836802012463,
      score_euclidean_distance: 0.29175853870023755
    }
  ],
  'replicate/13b-chat': [
    {
      prediction: "Here's an explanation of photosynthesis in simple terms .....",
      score_cosine_similarity: 0.9096764395396765,
      score_euclidean_distance: 0.4248874961328429
    }
  ],
  ...
  lookup: {
    cosine_similarity: 'a value closer to 1 indicates a higher degree of similarity between two vectors',
    euclidean_distance: 'the lower the value, the closer the two points'
  }
}
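
You can post-process the returned object to rank the models for your use case. The following sketch assumes the result shape shown above (it skips the lookup entry and averages the cosine similarity scores per model):

// rank models by their average cosine similarity against the target answers
const ranked = Object.entries(results)
  .filter(([key]) => key !== 'lookup')
  .map(([model, predictions]) => {
    const avg = predictions.reduce((sum, p) => sum + p.score_cosine_similarity, 0) / predictions.length;
    return { model, avg };
  })
  .sort((a, b) => b.avg - a.avg);

console.log('Best match:', ranked[0].model);

A higher average cosine similarity (and a lower Euclidean distance) indicates the model's answers are closer to your target answers.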