From 11ad5beed5e9dd0ee63e21ab002ddf46e452c5d6 Mon Sep 17 00:00:00 2001
From: Aleksandar Tomasevic <atomashevic@gmail.com>
Date: Tue, 20 Aug 2024 15:18:05 +0200
Subject: [PATCH] Fix minor issues for CRAN submission; RAG added to the readme

---
 DESCRIPTION               |  4 ++--
 R/transformer_scores.R    |  2 +-
 README.md                 | 27 ++++++++++++++++++++++++++-
 man/transformer_scores.Rd |  2 +-
 4 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index d824dce..0111583 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -6,9 +6,9 @@ Authors@R: c(person("Alexander", "Christensen", email = "alexpaulchristensen@gma
         role = "aut", comment = c(ORCID = "0000-0002-9798-7037")),
         person("Hudson", "Golino", email = "hfg9s@virginia.edu", role = "aut",
         comment = c(ORCID = "0000-0002-1601-1447")),
-        person("Aleksandar", "Tomasevic", email = "atomashevic@gmail.com", role = c("aut", "cre"),
+        person("Aleksandar", "Tomašević", email = "atomashevic@gmail.com", role = c("aut", "cre"),
         comment = c(ORCID = "0000-0003-4863-6051")))
-Maintainer: Aleksandar Tomasevic <atomashevic@gmail.com>
+Maintainer: Aleksandar Tomašević <atomashevic@gmail.com>
 Description: Implements sentiment analysis using huggingface transformer zero-shot classification model pipelines for text and image data. The default text pipeline is Cross-Encoder's DistilRoBERTa and default image/video pipeline is Open AI's CLIP <https://huggingface.co/openai/clip-vit-base-patch32>. All other zero-shot classification model pipelines can be implemented using their model name from <https://huggingface.co/models?pipeline_tag=zero-shot-classification>.
 License: GPL (>= 3.0)
 Encoding: UTF-8

diff --git a/R/transformer_scores.R b/R/transformer_scores.R
index 2d41a50..5698d8a 100644
--- a/R/transformer_scores.R
+++ b/R/transformer_scores.R
@@ -33,7 +33,7 @@
 #' \href{https://huggingface.co/datasets/multi_nli}{MultiNLI} datasets. The DistilRoBERTa
 #' is intended to be a smaller, more lightweight version of \code{"cross-encoder-roberta"},
 #' that sacrifices some accuracy for much faster speed (see
-#' \href{https://www.sbert.net/docs/pretrained_cross-encoders.html#nli}{https://www.sbert.net/docs/pretrained_cross-encoders.html#nli})}
+#' \href{https://www.sbert.net/docs/cross_encoder/pretrained_models.html#nli}{https://www.sbert.net/docs/cross_encoder/pretrained_models.html#nli})}
 #'
 #' \item{\code{"facebook-bart"}}{Uses \href{https://huggingface.co/facebook/bart-large-mnli}{Facebook's BART Large}
 #' zero-shot classification model trained on the

diff --git a/README.md b/README.md
index 54cf5d9..d361719 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-### CRAN 0.1.4 | GitHub 0.1.5
+### CRAN 0.1.5 | GitHub 0.1.5
 [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
 [![R-CMD-check](https://github.com/atomashevic/transforEmotion/actions/workflows/r.yml/badge.svg)](https://github.com/atomashevic/transforEmotion/actions/workflows/r.yml)
 [![Downloads Total](https://cranlogs.r-pkg.org/badges/grand-total/transforEmotion?color=brightgreen)](https://cran.r-project.org/package=transforEmotion)
@@ -120,6 +120,31 @@ transformer_scores(
 )
 ```
 
+## RAG
+
+The `rag` function enhances text generation with Retrieval-Augmented Generation (RAG). Users can input text data or specify a path to local PDF files, which are then used to retrieve the documents most relevant to a query.
+
+The `rag` function supports several large language models (LLMs), including TinyLLAMA, LLAMA-2, Mistral-7B, Orca-2, and Phi-2, each offering a different trade-off between computational efficiency and output quality.
+The default model is TinyLLAMA, which is also the fastest.
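+
+Instead of passing text directly, retrieval can be pointed at a folder of local PDF files, and a non-default model can be requested. Below is a minimal sketch of such a call; the `path` and `transformer` argument names are assumptions inferred from the description above and the package's naming conventions, not a confirmed signature.
+
+```R
+# Hypothetical sketch -- `path` and `transformer` are assumed argument
+# names: `path` points retrieval at local PDF files and `transformer`
+# selects a non-default LLM.
+rag(
+  path = "./papers",
+  query = "Which emotions are studied across these papers?",
+  transformer = "LLAMA-2"
+)
+```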
+
+Here's an example based on the description of this package. First, we specify the text data.
+
+```R
+text <- "With `transforEmotion` you can use cutting-edge transformer models for zero-shot emotion classification of text, image, and video in R, *all without the need for a GPU, subscriptions, paid services, or using Python. Implements sentiment analysis using [huggingface](https://huggingface.co/) transformer zero-shot classification model pipelines. The default pipeline for text is [Cross-Encoder's DistilRoBERTa](https://huggingface.co/cross-encoder/nli-distilroberta-base) trained on the [Stanford Natural Language Inference](https://huggingface.co/datasets/snli) (SNLI) and [Multi-Genre Natural Language Inference](https://huggingface.co/datasets/multi_nli) (MultiNLI) datasets. Using similar models, zero-shot classification transformers have demonstrated superior performance relative to other natural language processing models (Yin, Hay, & Roth, [2019](https://arxiv.org/abs/1909.00161)). All other zero-shot classification model pipelines can be implemented using their model name from https://huggingface.co/models?pipeline_tag=zero-shot-classification."
+```
+
+Then we run the `rag` function with a query.
+
+```R
+rag(text, query = "What is the use case for transforEmotion package?")
+```
+
+This call produces output similar to the following.
+
+```
+The use case for transforEmotion package is to use cutting-edge transformer models for zero-shot emotion classification of text, image, and video in R, without the need for a GPU, subscriptions, paid services, or using Python. This package implements sentiment analysis using the Cross-Encoder's DistilRoBERTa model trained on the Stanford Natural Language Inference (SNLI) and MultiNLI datasets. Using similar models, zero-shot classification transformers have demonstrated superior performance relative to other natural language processing models (Yin, Hay, & Roth, [2019](https://arxiv.org/abs/1909.00161)). The transforEmotion package can be used to implement these models and other zero-shot classification model pipelines from the HuggingFace library.
+```
 
 ## Image Example
 
 For Facial Expression Recognition (FER) task from images we use Open AI's [CLIP](https://huggingface.co/openai/clip-vit-base-patch32) transformer model. Two input arguments are needed: the path to image and list of emotion labels.

diff --git a/man/transformer_scores.Rd b/man/transformer_scores.Rd
index 6497a7c..1892cee 100644
--- a/man/transformer_scores.Rd
+++ b/man/transformer_scores.Rd
@@ -47,7 +47,7 @@ zero-shot classification model trained on the
 \href{https://huggingface.co/datasets/multi_nli}{MultiNLI} datasets. The DistilRoBERTa
 is intended to be a smaller, more lightweight version of \code{"cross-encoder-roberta"},
 that sacrifices some accuracy for much faster speed (see
-\href{https://www.sbert.net/docs/pretrained_cross-encoders.html#nli}{https://www.sbert.net/docs/pretrained_cross-encoders.html#nli})}
+\href{https://www.sbert.net/docs/cross_encoder/pretrained_models.html#nli}{https://www.sbert.net/docs/cross_encoder/pretrained_models.html#nli})}
 
 \item{\code{"facebook-bart"}}{Uses \href{https://huggingface.co/facebook/bart-large-mnli}{Facebook's BART Large}
 zero-shot classification model trained on the
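
For the README's Facial Expression Recognition example above, the two inputs are the path to an image and a list of emotion labels. A minimal sketch of that call follows, assuming the function is `image_scores(image, classes)`; treat the function name and argument order as assumptions to check against the package documentation rather than a confirmed interface.

```R
# Illustrative FER call: an image path (or URL) plus a character
# vector of candidate emotion labels. Function name and argument
# order are assumed, not verified against the package docs.
image <- "face.jpg"  # path to a local image file
emotions <- c("happiness", "sadness", "anger", "fear", "surprise", "neutral")
image_scores(image, emotions)
```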