MEGAT P&ID Extractor Streamlit App
A P&ID (Piping and Instrumentation Diagram, also called a Process and Instrumentation Diagram) is one of the most important drawings for chemical and process engineers. It is a non-proportional schematic, not drawn to scale, that shows the process flow of a piping system together with its connected instrumentation and equipment. The diagram acts as the master document for everything that will be built in the piping system, and it is used for many purposes, such as mass balance calculations, hydraulic calculations, risk assessment, hazard and operability (HAZOP) analysis and general troubleshooting. Currently, these drawings are read manually. I'm looking for ways to automate this reading and make things easier for me as a process engineer.
Extracting line numbers is the hardest part, so I'm tackling it first. There is no standard way to write a line number, so I searched online for the formats people use most often. This is what I found:
I used this to identify the most common line number pattern and turned it into a regular expression, which extracts the line numbers from the drawing text. If your drawings use other patterns, please contact me so I can adjust the code.
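A minimal sketch of how a line number pattern becomes a regular expression, assuming the common `<size>"-<fluid code>-<sequence number>-<piping class>[-<insulation code>]` layout (e.g. `6"-P-1001-A1A-H`). The exact pattern, the field widths and the helper `extract_line_numbers` are hypothetical illustrations, not the app's actual code. Named groups keep each field separate, so a new pattern usually means changing one line.

```python
from __future__ import annotations

import re

# Hypothetical pattern (an assumption, not necessarily the one the app uses):
#   <size>"-<fluid code>-<sequence number>-<piping class>[-<insulation code>]
LINE_NUMBER_RE = re.compile(
    r"""
    (?P<size>\d+\s\d+/\d+|\d+(?:/\d+)?)"   # nominal size in inches: 6, 3/4 or 1 1/2
    -(?P<fluid>[A-Z]{1,4})                 # fluid/service code, e.g. P, CW, STM
    -(?P<number>\d{3,5})                   # sequence number, e.g. 1001
    -(?P<pipe_class>[A-Z0-9]{2,6})         # piping class, e.g. A1A
    (?:-(?P<insulation>[A-Z]{1,2}))?       # optional insulation code, e.g. H
    """,
    re.VERBOSE,
)


def extract_line_numbers(text: str) -> list[dict]:
    """Return every line number found in text, split into its named fields."""
    return [match.groupdict() for match in LINE_NUMBER_RE.finditer(text)]
```

For example, `extract_line_numbers('FROM V-101 6"-P-1001-A1A-H')` returns one dict with `size` `"6"`, `fluid` `"P"`, `number` `"1001"`, `pipe_class` `"A1A"` and `insulation` `"H"`; the equipment tag `V-101` is skipped because it has no inch mark.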
I'm currently using the PyPDF2 library to extract the text. It has trouble with special characters such as the fractions 3/4, 1/2 or 5/3, returning blanks wherever they appear. I plan to update the code to use Tesseract or another OCR package.
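Until OCR is in place, one possible stopgap is to flag line numbers whose size seems to have been dropped, so they can be checked by hand. This is a sketch under two assumptions: the PyPDF2 3.x API (`PdfReader`, `reader.pages`, `page.extract_text()`), and that a blanked-out fraction leaves the inch mark directly in front of the fluid code, e.g. `"-CW-2045-B2` instead of `3/4"-CW-2045-B2`. The helpers `extract_pdf_text` and `find_missing_sizes` and the file name `drawing.pdf` are hypothetical.

```python
from __future__ import annotations

import re


def extract_pdf_text(path: str) -> list[str]:
    """Return the extracted text of each page, using the PyPDF2 3.x API."""
    from PyPDF2 import PdfReader  # imported here so the heuristic below works without PyPDF2

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


# Hypothetical heuristic: if the size (e.g. 3/4) came back blank, the inch mark is
# no longer preceded by a digit or "/", e.g. '"-CW-2045-B2'.
MISSING_SIZE_RE = re.compile(r'(?<![\d/])"-[A-Z]{1,4}-\d{3,5}-[A-Z0-9]{2,6}')


def find_missing_sizes(text: str) -> list[str]:
    """Return line numbers that look like they lost their pipe size during extraction."""
    return MISSING_SIZE_RE.findall(text)


# Usage (requires PyPDF2 and a real drawing):
# for page_no, text in enumerate(extract_pdf_text("drawing.pdf"), start=1):
#     for line_number in find_missing_sizes(text):
#         print(f"page {page_no}: check the size of {line_number}")
```

The heuristic only reports suspicious line numbers; it cannot recover the missing fraction, which is what the planned OCR step is for.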
I have a few other plans for this project:
- Extracting all process type information
- Extracting all equipment information
- Highlighting potential hazards/risks to support HAZOP and risk assessment
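For the equipment-extraction plan, one possible starting point is the same regular-expression approach applied to equipment tags. This is only a sketch of a planned feature, not something the app currently does: the tag form `<prefix>-<number>[suffix]` (e.g. `P-101A`) and the prefix-to-type map are assumptions, since each project defines its own in the P&ID legend, and `extract_equipment` is a hypothetical helper.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical prefix-to-type map; a real project takes this from its P&ID legend.
EQUIPMENT_TYPES = {
    "P": "Pump",
    "V": "Vessel",
    "E": "Heat exchanger",
    "T": "Tank",
    "C": "Column",
    "K": "Compressor",
}

# Assumed tag form: <prefix>-<3 or 4 digits>[optional letter], e.g. V-101 or P-201A.
# The look-arounds reject tags embedded in line numbers such as 6"-P-1001-A1A.
EQUIPMENT_RE = re.compile(
    r"(?<![\w-])(?P<prefix>[A-Z]{1,2})-(?P<number>\d{3,4})(?P<suffix>[A-Z]?)(?![\w-])"
)


def extract_equipment(text: str) -> dict[str, list[str]]:
    """Group recognised equipment tags by equipment type."""
    found = defaultdict(set)
    for match in EQUIPMENT_RE.finditer(text):
        kind = EQUIPMENT_TYPES.get(match.group("prefix"))
        if kind is not None:  # ignore prefixes that are not in the map
            found[kind].add(match.group(0))
    return {kind: sorted(tags) for kind, tags in found.items()}
```

Keeping the prefix map as plain data means it can be swapped per project without touching the regular expression.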