Fix minor issues for CRAN submission; RAG added to the readme

atomashevic · Aug 20, 2024 · 11ad5be
1 parent 41bb99f
Showing 4 changed files with 30 additions and 5 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -6,9 +6,9 @@ Authors@R: c(person("Alexander", "Christensen", email = "alexpaulchristensen@gma
               role = "aut", comment = c(ORCID = "0000-0002-9798-7037")),
 	           person("Hudson", "Golino", email = "[email protected]", role = "aut",
 	            comment = c(ORCID = "0000-0002-1601-1447")),
-	           person("Aleksandar", "Tomasevic", email = "[email protected]", role = c("aut", "cre"),
+	           person("Aleksandar", "Tomašević", email = "[email protected]", role = c("aut", "cre"),
 	           comment = c(ORCID = "0000-0003-4863-6051")))
-Maintainer: Aleksandar Tomasevic <[email protected]>
+Maintainer: Aleksandar Tomašević <[email protected]>
 Description: Implements sentiment analysis using huggingface <https://huggingface.co> transformer zero-shot classification model pipelines for text and image data. The default text pipeline is Cross-Encoder's DistilRoBERTa <https://huggingface.co/cross-encoder/nli-distilroberta-base> and default image/video pipeline is Open AI's CLIP  <https://huggingface.co/openai/clip-vit-base-patch32>. All other zero-shot classification model pipelines can be implemented using their model name from <https://huggingface.co/models?pipeline_tag=zero-shot-classification>.
 License: GPL (>= 3.0)
 Encoding: UTF-8

diff --git a/R/transformer_scores.R b/R/transformer_scores.R
@@ -33,7 +33,7 @@
 #' \href{https://huggingface.co/datasets/multi_nli}{MultiNLI} datasets. The DistilRoBERTa
 #' is intended to be a smaller, more lightweight version of \code{"cross-encoder-roberta"},
 #' that sacrifices some accuracy for much faster speed (see
-#' \href{https://www.sbert.net/docs/pretrained_cross-encoders.html#nli}{https://www.sbert.net/docs/pretrained_cross-encoders.html#nli})}
+#' \href{https://www.sbert.net/docs/cross_encoder/pretrained_models.html#nli}{https://www.sbert.net/docs/cross_encoder/pretrained_models.html#nli})}
 #'
 #' \item{\code{"facebook-bart"}}{Uses \href{https://huggingface.co/facebook/bart-large-mnli}{Facebook's BART Large}
 #' zero-shot classification model trained on the

diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-### CRAN 0.1.4 | GitHub 0.1.5
+### CRAN 0.1.5 | GitHub 0.1.5
 
 [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![R-CMD-check](https://github.com/atomashevic/transforEmotion/actions/workflows/r.yml/badge.svg)](https://github.com/atomashevic/transforEmotion/actions/workflows/r.yml) [![Downloads Total](https://cranlogs.r-pkg.org/badges/grand-total/transforEmotion?color=brightgreen)](https://cran.r-project.org/package=transforEmotion) 
 
@@ -120,6 +120,31 @@ transformer_scores(
 )
 ```
 
+## RAG 
+
+The `rag` function enhances text generation using Retrieval-Augmented Generation (RAG). Users can input text data or specify a path to local PDF files, which are then used to retrieve relevant documents.
+
+The `rag` function supports various large language models (LLMs), including TinyLLAMA, LLAMA-2, Mistral-7B, Orca-2, and Phi-2, each offering a different trade-off between computational efficiency and quality. The default model is TinyLLAMA, the fastest of these.
+
+Here's an example based on the description of this package. First, we specify the text data.
+
+```R
+text <- "With `transforEmotion` you can use cutting-edge transformer models for zero-shot emotion classification of text, image, and video in R, all without the need for a GPU, subscriptions, paid services, or using Python. Implements sentiment analysis using [huggingface](https://huggingface.co/) transformer zero-shot classification model pipelines. The default pipeline for text is [Cross-Encoder's DistilRoBERTa](https://huggingface.co/cross-encoder/nli-distilroberta-base) trained on the [Stanford Natural Language Inference](https://huggingface.co/datasets/snli) (SNLI) and [Multi-Genre Natural Language Inference](https://huggingface.co/datasets/multi_nli) (MultiNLI) datasets. Using similar models, zero-shot classification transformers have demonstrated superior performance relative to other natural language processing models (Yin, Hay, & Roth, [2019](https://arxiv.org/abs/1909.00161)). All other zero-shot classification model pipelines can be implemented using their model name from https://huggingface.co/models?pipeline_tag=zero-shot-classification."
+```
+
+And then we run the `rag` function.
+
+```R
+rag(text,
+    query = "What is the use case for transforEmotion package?")
+```
+
+This code produces output similar to the following.
+
+```
+The use case for transforEmotion package is to use cutting-edge transformer models for zero-shot emotion classification of text, image, and video in R, without the need for a GPU, subscriptions, paid services, or using Python. This package implements sentiment analysis using the Cross-Encoder's DistilRoBERTa model trained on the Stanford Natural Language Inference (SNLI) and MultiNLI datasets. Using similar models, zero-shot classification transformers have demonstrated superior performance relative to other natural language processing models (Yin, Hay, & Roth, [2019](https://arxiv.org/abs/1909.00161)). The transforEmotion package can be used to implement these models and other zero-shot classification model pipelines from the HuggingFace library.
+```
+
 ## Image Example
 
 For Facial Expression Recognition (FER) task from images we use Open AI's [CLIP](https://huggingface.co/openai/clip-vit-base-patch32) transformer model. Two input arguments are needed: the path to image and list of emotion labels.

diff --git a/man/transformer_scores.Rd b/man/transformer_scores.Rd