Skip to content

shtratos/ms-uk-payslip-parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ms-uk-payslip-parser

Simple parser for payslips issued by MS UK.

Converts a series of your PDF payslips into a neat CSV table.

Installation

  • Install Python3 3.7+ and Virtualenv
  • Install dependencies
# create a virtualenv
mkvirtualenv payslip-parser
# switch to virtualenv
workon payslip-parser
# install dependencies
pip3 install -r requirements.txt
  • Or if you have pipenv installed:
pipenv install

Usage

  1. Download your payslips PDF files from the portal and put them in a directory e.g. ~/payslips

  2. Get into your virtualenv:

    workon payslip-parser

    or if you have pipenv

    pipenv shell
  3. First, convert PDF files to text:

    python3 to_text.py ~/payslips ./text_payslips

    Now you should see text files with your payslips content in text_payslips directory.

  4. Now you can parse the text files and produce CSV tables:

    python3 parser.py ./text_payslips

    After this you will see two CSV files in this directory:

    • payslips-month-columns.csv - each month's data is in a separate column
    • payslips-month-rows.csv - each month's data is in a separate row

    Every payslip item label has a short prefix identifying its payslip section:

    • .m - metadata item
    • .d.p - payments data item
    • .d.d - deductions data item
    • .d.t - totals data item
    • .d.et - employer totals data item
    • .d.ytd - year-to-date data item
  5. Open the CSV file in your spreadsheet editor of choice or Pandas.

Feedback

Create an issue if you encounter a problem or have a suggestion. Or ping me on Teams.

About

Parser for payslips

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages