# OCR with PDF Embedded Text

[Document AI - Document OCR](https://cloud.google.com/document-ai/docs/document-ocr)

From [Release Notes](https://cloud.google.com/document-ai/docs/release-notes#December_19_2022)

The Document AI OCR Processor has the following new features:

- The OCR Processor now supports extracting embedded text from digital PDFs in public preview. A fallback to the optical OCR model is automatically triggered to extract text in the regions when the PDF being processed contains non-digital text. To opt into this feature, set [`process_options.ocr_config.enable_native_pdf_parsing=true`](https://cloud.google.com/document-ai/docs/reference/rest/v1/ProcessOptions#OcrConfig) in your API request to the OCR Processor.

Known issues with the digital PDF feature of the Document AI OCR Processor:

- On a small number of documents, the word ordering within lines of text as reported by native text extraction might be wrong.
- On certain documents, invisible text embedded in a native PDF may be reported.
- On certain Japanese documents, currency symbols such as Yen might be incorrectly extracted as `/`.
- On certain documents, apostrophe symbols may be missing in word/line results.
- On certain documents, native text extraction might report different word/line results than those obtained by image-based OCR on an identical document.

## Sample Document

- A sample document has been provided that demonstrates how the results can vary by using embedded text instead of OCR-detected text.
- [Declaration of Independence (Cursive)](DeclarationOfIndependence-Cursive.pdf)
- This document is the text of The Declaration of Independence in a cursive script created in Google Docs.
- Try this document with the sample code in [`main.py`](main.py) with `enable_native_pdf_parsing` set to `True` or `False` and compare the results.
- [Example Diff](https://www.diffchecker.com/Z4QIzt3H/) (`enable_native_pdf_parsing` set to `True` and `False` respectively)
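
The opt-in flag and the True/False comparison described above can be sketched with the standard library alone. This is a minimal illustration, not the contents of `main.py`: the helper names `build_process_request` and `diff_text` are hypothetical, the camelCase keys assume the usual REST JSON form of `process_options.ocr_config.enable_native_pdf_parsing`, and the strings being diffed are made up to stand in for the text returned by two real responses.

```python
import base64
import difflib
import json
from typing import List


def build_process_request(pdf_bytes: bytes, enable_native_pdf_parsing: bool) -> dict:
    """Build a JSON-serializable body for a REST `process` request (hypothetical helper)."""
    return {
        "rawDocument": {
            # The REST API carries file content as a base64-encoded string.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
            "mimeType": "application/pdf",
        },
        "processOptions": {
            # Assumed REST form of process_options.ocr_config.enable_native_pdf_parsing.
            "ocrConfig": {"enableNativePdfParsing": enable_native_pdf_parsing}
        },
    }


def diff_text(native_text: str, ocr_text: str) -> List[str]:
    """Return unified-diff lines between OCR text and embedded (native) text."""
    return list(
        difflib.unified_diff(
            ocr_text.splitlines(),
            native_text.splitlines(),
            fromfile="enable_native_pdf_parsing=False",
            tofile="enable_native_pdf_parsing=True",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Placeholder bytes; a real request would read the sample PDF from disk.
    body = build_process_request(b"%PDF placeholder", True)
    print(json.dumps(body["processOptions"], indent=2))

    # Made-up strings standing in for the text of two processor responses.
    for line in diff_text("When in the Course", "When in the Gourse"):
        print(line)
```

Sending the body to the processor and reading the extracted text out of the response are left out here; with real responses, passing the two text values to `diff_text` gives a local equivalent of the linked diff.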