Annotate entities directly onto a PDF with automatic OCR for scanned PDFs

gradox2020/react-pdf-ner-annotator

 
 

react-pdf-ner-annotator

A React component to annotate named entities directly onto a PDF.

Live demo

Features

  • NER annotation
  • Area annotation
  • OCR on scanned PDFs

Installation

The package can be installed through NPM.

npm install react-pdf-ner-annotator

Usage

For a simple usage example, see example/src/App.tsx.

import Annotator from 'react-pdf-ner-annotator';
// import the css
import 'react-pdf-ner-annotator/lib/css/style.css';
// OR import the sass
import 'react-pdf-ner-annotator/lib/scss/style.scss';
<Annotator url={'http://example.pdf'} />

Properties

| Name | Type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| url | string | Either url or data is required | undefined | The URL of the PDF. |
| data | Uint8Array, BufferSource, or string | Either data or url is required | | |
| httpHeaders | { [key: string]: string } | no | undefined | Extra HTTP header fields, for example when authentication is needed. |
| initialScale | number | no | 1.5 | The initial scale at which to display the PDF. Must be between 1 and 2. |
| tokenizer | RegExp | no | | A regular expression used to tokenize the paragraphs. |
| disableOCR | boolean | no | false | Set this to true if your PDF doesn't have a text layer and you don't want the frontend to run OCR. |
| entity | Entity | no | undefined | The active Entity to annotate on the PDF. |
| initialTextmap | Array<TextLayer> | no | undefined | An array of TextLayer, if you want to provide your own text layers for the PDF instead of letting the frontend generate them. |
| defaultAnnotations | Array<Annotation> | no | [] | An array of Annotation to show on the PDF. |
| ref | ref | no | undefined | A ref to pass to the Annotator; it can be used to call removeAnnotation. |
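Two constraints from the properties above are easy to get wrong: either url or data must be set, and initialScale must lie between 1 and 2. The sketch below checks both and also shows how a global regular expression splits a paragraph into tokens. It is only an illustration: AnnotatorPropsSketch, checkProps, tokenize, and exampleTokenizer are hypothetical names, not part of the package, and the regular expression is not the component's default.

```typescript
// Hypothetical subset of the documented props; not the package's own type.
// BufferSource is left out of `data` so the sketch compiles without DOM typings.
interface AnnotatorPropsSketch {
  url?: string;
  data?: Uint8Array | string;
  httpHeaders?: { [key: string]: string };
  initialScale?: number;
  tokenizer?: RegExp;
  disableOCR?: boolean;
}

// Hypothetical helper: returns a list of problems found in the props.
function checkProps(props: AnnotatorPropsSketch): string[] {
  const problems: string[] = [];
  if (props.url === undefined && props.data === undefined) {
    problems.push('Either url or data is required.');
  }
  const scale = props.initialScale ?? 1.5; // 1.5 is the documented default
  if (scale < 1 || scale > 2) {
    problems.push('initialScale must be between 1 and 2.');
  }
  return problems;
}

// Illustrative tokenizer only (an assumption): words, or single punctuation marks.
const exampleTokenizer = /\w+|[^\w\s]/g;

// Assumes the regular expression carries the global flag, so match() returns all tokens.
function tokenize(paragraph: string, tokenizer: RegExp): string[] {
  return paragraph.match(tokenizer) ?? [];
}
```

With this tokenizer, tokenize('Dr. Smith, 2020', exampleTokenizer) yields the tokens Dr, ., Smith, ,, and 2020. Whether the component applies a custom tokenizer in exactly this way is not stated in this README.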

Callback methods

| Name | Parameters | Return type | Required | Description |
| --- | --- | --- | --- | --- |
| getAnnotations | annotations: Array<Annotation> | void | yes | A method that takes an array of Annotation as input. This can be, for example, the setter of a useState hook. It subscribes to changes to the annotations made on the PDF. |
| getTextMaps | maps: Array<TextLayer> | void | no | Same as getAnnotations, but for the text layers. It only returns the text layers of pages that have annotations on them. |
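The callback shape described above can be sketched without React. Here a small store stands in for a useState setter and simply remembers the latest array it received. createLatestStore and AnnotationLike are hypothetical names, and the id field is a placeholder; the real Annotation type is defined by the package.

```typescript
// Hypothetical stand-in for a React state setter: remembers the latest array it was given.
function createLatestStore<T>(): { set: (items: Array<T>) => void; get: () => Array<T> } {
  let latest: Array<T> = [];
  return {
    set: (items) => { latest = items; },
    get: () => latest,
  };
}

// Placeholder shape (an assumption); the real Annotation type comes from the package.
type AnnotationLike = { id: number };

const annotationStore = createLatestStore<AnnotationLike>();

// Matches the documented callback shape: takes an array of annotations, returns void.
const getAnnotations = (annotations: Array<AnnotationLike>): void => {
  annotationStore.set(annotations);
};
```

In a React component, the setter returned by useState could be passed as getAnnotations directly, as the description suggests, so the component re-renders whenever the annotations change.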

Local development

Contributors
