If you would like to get involved in improving invoice2data, this guide will help you get started quickly.
- Fork main repository (optional)
- Clone repository:
git clone https://github.com/m3nu/invoice2data
- Install as editable:
pip install -e invoice2data
Some little-used dependencies, such as pytesseract and pdfminer, are optional. Install them if needed.
Major folders in the invoice2data package and their purpose:
- input: Has modules for extracting plain text from files. Currently mostly PDF files.
- extract: Gets useful data from plain text using templates. The main class, BaseInvoiceTemplate, is in base_template. Other classes can inherit from it and add extra functions. E.g. LineInvoiceTemplate adds support for getting individual items.
- extract/templates: Keeps all supported template files. Add new templates here.
- output: Modules to output structured data. Currently CSV, JSON and XML are supported.
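To make the flow between these folders concrete, here is a minimal, self-contained sketch of the three stages (input, extract, output). It does not use the real invoice2data classes: ToyTemplate and ToyLineTemplate are hypothetical stand-ins that only mirror the inheritance idea described above, and the sample text and regular expressions are made up for illustration.

```python
import json
import re


class ToyTemplate:
    """Hypothetical stand-in for a base template: maps field names to regexes."""

    def __init__(self, fields):
        self.fields = fields

    def extract(self, text):
        # Apply each regex and keep the first capture group of every match.
        result = {}
        for name, pattern in self.fields.items():
            match = re.search(pattern, text)
            if match:
                result[name] = match.group(1)
        return result


class ToyLineTemplate(ToyTemplate):
    """Hypothetical subclass that also collects individual line items."""

    def __init__(self, fields, line_pattern):
        super().__init__(fields)
        self.line_pattern = line_pattern

    def extract(self, text):
        result = super().extract(text)
        # Each line item becomes a dict built from the pattern's named groups.
        result["lines"] = [
            m.groupdict()
            for m in re.finditer(self.line_pattern, text, re.MULTILINE)
        ]
        return result


# Stage 1 (input): pretend this text came from a PDF-to-text module.
text = "Invoice No: 4711\nWidget 2 9.50\nGadget 1 20.00\nTotal: 39.00\n"

# Stage 2 (extract): pull fields and line items out of the plain text.
template = ToyLineTemplate(
    fields={"invoice_number": r"Invoice No: (\d+)", "amount": r"Total: ([\d.]+)"},
    line_pattern=r"^(?P<name>[A-Za-z]+) (?P<qty>\d+) (?P<price>[\d.]+)$",
)
data = template.extract(text)

# Stage 3 (output): serialize the structured result, here as JSON.
print(json.dumps(data, indent=2))
```

The subclass reuses the parent's field extraction and only adds the line-item step, which is the same division of labour the guide describes for the base class and its subclasses.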
This project uses the numpydoc extension for Sphinx, so docstrings should follow the numpydoc format.
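As a rough illustration of the numpydoc layout, a docstring could look like the sketch below. The function parse_amount is a hypothetical helper invented for this example and is not part of invoice2data; only the section headings (Parameters, Returns, Raises, Examples) follow the numpydoc convention.

```python
def parse_amount(text, decimal_separator="."):
    """Convert an amount string to a float.

    Parameters
    ----------
    text : str
        Amount as it appears in the invoice, e.g. ``"1,234.50"``.
    decimal_separator : str, optional
        Character that separates the integer and fractional parts.
        Defaults to ``"."``.

    Returns
    -------
    float
        The parsed amount.

    Raises
    ------
    ValueError
        If ``text`` does not contain a valid number.

    Examples
    --------
    >>> parse_amount("1,234.50")
    1234.5
    >>> parse_amount("1.234,50", decimal_separator=",")
    1234.5
    """
    # Assume the thousands separator is whichever of "," and "." is not
    # used as the decimal separator (a simplification for this sketch).
    thousands_separator = "," if decimal_separator == "." else "."
    cleaned = text.replace(thousands_separator, "").replace(decimal_separator, ".")
    return float(cleaned)
```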
Every new feature should have a test to make sure it still works after modifications done by you or someone else in the future.
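A test for a new feature can be as small as the sketch below. The helper normalize_whitespace is hypothetical and exists only to give the tests something to check; in a real contribution the tests would import the function or class you added instead. pytest collects any function whose name starts with test_ and reports a failure when an assert does not hold.

```python
def normalize_whitespace(text):
    """Hypothetical helper: collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def test_collapses_repeated_spaces():
    assert normalize_whitespace("Invoice   No:  4711") == "Invoice No: 4711"


def test_strips_newlines_and_tabs():
    assert normalize_whitespace("Total:\t39.00\n") == "Total: 39.00"
```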
To run tests using the current Python version: pytest invoice2data
To run tests using all supported Python versions: tox (needs pyenv and the corresponding Python versions installed).
To test coverage, we recommend using pytest-cov:
pip install pytest-cov
pytest --cov-report html --cov invoice2data --verbose
[yourbrowser] htmlcov/index.html