For the official presentation in Romanian see: Official Presentation (RO)

General Presentation

UNpaper is the practical part (project implementation) of my Bachelor Thesis at Transilvania University of Brasov (Romania).

UNpaper is a Document Digitizer Cloud Solution, implemented using Azure Cloud Architecture, Microservices, Serverless components, Single Page Applications, Databases and Blob Storage, and other similar technologies.

The main purpose of UNpaper is to reduce the time that small and medium-sized businesses spend on redundant document processing activities, such as registering documents by hand. UNpaper automates the pre-processing of analog data (e.g. invoices, receipts, identification documents, etc.).

Thesis Abstract

Countless domains and professional fields lack proper digitalization and automation when it comes to redundant daily processes and tasks. Large numbers of documents are pre-processed by hand, with the data being manually entered into different administrative, financial, or managerial systems. These processes are highly demanding in terms of time and effort, and it would be more efficient to invest these resources in higher-level activities that require human involvement, such as analyzing, consulting, and managing. The purpose of this paper is to describe the existing problems and to explain the research, analysis, testing, and implementation of the proposed solution, a document digitizer cloud solution, which involves a cloud architecture, state-of-the-art cognition technologies, Single Page Applications, modern data storage and management techniques, and intuitive, user-friendly functionalities.

Objectives

The three main objectives of UNpaper are as follows:

Reducing the time employees spend on redundant paperwork activities, by automating the pre-processing step in the way the users define.
Being accessible to small and medium-sized businesses, and especially being user-friendly for people who have previously worked mostly with analog data and are now transitioning to a more digital environment.
Offering a high level of security and reliability, by using state-of-the-art architectures and technologies.

The extended objectives of UNpaper are as follows:

Upgrading intelligent user flows to recognize specific keywords, document structures, etc. and act on them (e.g. notifying the administrator of document due dates).
Providing users with more advanced tools for defining sandbox environments, in order to make the solution more generic and give users more control over their specific tasks and requests.
Implementing tools for more specific local paperwork and legislative requirements, like automatic generation of payment sheets.
Creating a more solid link between UNpaper and other databases, ERMs, and systems widely used in office work.

Technical Presentation

User Flow

The following diagram describes the user flow, as initially intended. The flow is split into three stages/levels, as follows:

User Level: The physical level of the analog data (i.e. documents which need to be processed).
Application Level: The main level, where UNpaper comes into play by digitizing and automating the process.
Post-processing Level: This level covers the future post-processing and storage functionalities wanted/needed by the user.

Functionalities

Layout Analysis

Quick analysis of a document's structure
Text extraction capabilities (i.e. OCR)
Table extraction in CSV format
Compatible with: PNG, PDF, JPG, TIFF, JSON
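
As an illustration of the table extraction step, here is a minimal sketch of how recognized table cells could be turned into CSV text. The `TableCell` shape and the function names are assumptions made for this example (loosely modelled on the row/column indexed cells that OCR layout results typically contain); they are not taken from the UNpaper code base.

```typescript
// Hypothetical cell shape: one recognized cell with its position in the table.
interface TableCell {
  rowIndex: number;
  columnIndex: number;
  text: string;
}

// Quote a value when it contains a comma, a double quote, or a line break.
function escapeCsv(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Place every cell into a dense grid, then join the grid into CSV text.
// Positions with no recognized cell are left as empty strings.
function tableToCsv(cells: TableCell[]): string {
  const rowCount = Math.max(-1, ...cells.map((c) => c.rowIndex)) + 1;
  const colCount = Math.max(-1, ...cells.map((c) => c.columnIndex)) + 1;
  const grid: string[][] = [];
  for (let r = 0; r < rowCount; r++) {
    const row: string[] = [];
    for (let c = 0; c < colCount; c++) {
      row.push("");
    }
    grid.push(row);
  }
  for (const cell of cells) {
    grid[cell.rowIndex][cell.columnIndex] = cell.text;
  }
  return grid.map((row) => row.map(escapeCsv).join(",")).join("\n");
}
```

Building the grid first keeps the output rectangular even when the OCR step misses a cell, which makes the CSV easier to import into spreadsheet tools.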

Prebuilt Analysis

Quick analysis of documents, based on pre-trained AI models (e.g. invoices, receipts, ID cards, etc.)
Extraction of the analyzed data in a structured format (CSV)
Compatible with: PNG, PDF, JPG, TIFF, JSON
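
A minimal sketch of how an upload could be checked against the formats listed above before it is sent for analysis. Matching by file extension, and the names `SUPPORTED_EXTENSIONS` and `isSupportedFile`, are assumptions of this example; the actual check in UNpaper may work differently (for instance by inspecting the content type).

```typescript
// Formats taken from the list above; checking by extension is an assumption.
const SUPPORTED_EXTENSIONS: readonly string[] = ["png", "pdf", "jpg", "tiff", "json"];

// Return true when the file name ends in one of the supported extensions.
// The comparison is case-insensitive, so "Invoice.PDF" is accepted.
function isSupportedFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0 || dot === fileName.length - 1) {
    return false; // no extension, or nothing before/after the dot
  }
  const extension = fileName.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(extension) !== -1;
}
```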

Custom labeled analysis & resources management

Custom user hierarchy builder based on organizations and batches (categories of documents)
Custom models creation and training, later used for extracting data from custom documents in a structured format
Simultaneous processing of multiple documents (in a queue)
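
The queue-based processing of documents grouped by organization and batch can be sketched as follows. This is only an illustration under stated assumptions: `DocumentJob`, `DocumentQueue`, and their members are hypothetical names, and the sketch handles jobs one at a time and synchronously for simplicity, whereas a cloud implementation would more likely process them asynchronously.

```typescript
// Hypothetical job shape: a document that belongs to a batch of an organization.
interface DocumentJob {
  organization: string;
  batch: string;
  fileName: string;
}

// A simple first-in, first-out queue of documents waiting to be analyzed.
class DocumentQueue {
  private readonly pending: DocumentJob[] = [];

  enqueue(job: DocumentJob): void {
    this.pending.push(job);
  }

  size(): number {
    return this.pending.length;
  }

  // Process jobs in arrival order. A failing job is recorded and skipped,
  // so one bad document does not block the rest of the queue.
  drain(process: (job: DocumentJob) => void): { done: string[]; failed: string[] } {
    const done: string[] = [];
    const failed: string[] = [];
    let job = this.pending.shift();
    while (job !== undefined) {
      try {
        process(job);
        done.push(job.fileName);
      } catch (error) {
        failed.push(job.fileName);
      }
      job = this.pending.shift();
    }
    return { done, failed };
  }
}
```

Recording failures instead of stopping lets the user review problematic documents afterwards while the remaining ones are still processed.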

Projects Structure

The UNpaper implementation is based on three main projects/components. Below are some diagrams showing how each project is structured.

1. UNpaper.WebUI

2. UNpaper.RegistryAPI

Structure	Description
API	Controllers and routing properties
Business	Business logic
Common	Common classes within the project
Data	Data logic classes (e.g. repositories, database configurations, etc.)
Interface	Interfaces
Model	Entity models
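
To show how these layers relate to each other, here is a minimal sketch of the same separation. It is written in TypeScript for brevity, while the API project itself presumably uses C# and ASP.NET. Every name in it (`Batch`, `IBatchRepository`, `InMemoryBatchRepository`, `BatchService`, `postBatch`) is hypothetical and chosen only for illustration.

```typescript
// Model: a hypothetical entity.
interface Batch {
  id: number;
  name: string;
}

// Interface: the contract that the business layer depends on.
interface IBatchRepository {
  getAll(): Batch[];
  add(batch: Batch): void;
}

// Data: an in-memory stand-in for a database-backed repository.
class InMemoryBatchRepository implements IBatchRepository {
  private readonly items: Batch[] = [];

  getAll(): Batch[] {
    return this.items.slice(); // return a copy so callers cannot mutate storage
  }

  add(batch: Batch): void {
    this.items.push(batch);
  }
}

// Business: validation and rules, unaware of how data is stored.
class BatchService {
  constructor(private readonly repository: IBatchRepository) {}

  create(name: string): Batch {
    const trimmed = name.trim();
    if (trimmed.length === 0) {
      throw new Error("Batch name must not be empty");
    }
    const batch: Batch = { id: this.repository.getAll().length + 1, name: trimmed };
    this.repository.add(batch);
    return batch;
  }
}

// API: a controller-like function that maps a request body to a status and a response.
function postBatch(
  service: BatchService,
  body: { name: string }
): { status: number; body: Batch | string } {
  try {
    return { status: 201, body: service.create(body.name) };
  } catch (error) {
    return { status: 400, body: (error as Error).message };
  }
}
```

Because the business layer only sees the interface, the in-memory repository can be swapped for a database-backed one without touching the service or the controller.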

Entity Models

Registry API Flow

3. UNpaper.AzureFunctions

Cloud Architecture

UNpaper is entirely developed using Azure Cloud tools and technologies. Below is a diagram showing how these components are organized and related to each other.

Blob Storage Structure

Technologies and dependencies

Some of the main tools and technologies used are listed below:

Languages:
- C#
- HTML, (S)CSS, JS
- TypeScript
Frameworks:
- ASP.NET
- Angular
- React
Package Managers:
- NuGet
- npm (Node Package Manager)
Azure Cloud Components:
- Azure App Service
- Azure Functions
- APIM (API Management Service)
- Form Recognizer Cognitive Service
- Azure AD B2C (Active Directory Business-to-Consumer)
- Azure SQL Server & SQL Database
- Azure Blob Storage
- Azure DevOps
Other:
- LINQ
- Entity Framework Core
- Postman & Swagger

Microsoft FOTT Integration

At the core of UNpaper and its functionalities lie two main components:

Azure Form Recognizer
Microsoft's OCR-Form-Tools (FOTT)

Integrating FOTT into UNpaper's WebUI SPA frontend was a complex and, at the beginning, uncertain process, because UNpaper WebUI is developed using Angular, while FOTT uses React.js. Below is a diagram showing all the options that were considered for this integration, together with their advantages, disadvantages, and the final choices/results.

Future Improvements

Bank Sheet Templates automatic generation
Employees Roadmaps automation
Improved post-processing output management
Automatic synchronization with Business Databases & ERMs
