Beneficial ownership data analysis tools

Data analysis tools to help analysts, journalists and anyone wanting to examine and dive into beneficial ownership data published in line with the Beneficial Ownership Data Standard.

Install

pip install git+https://github.com/openownership/bodsdata.git@main#egg=bodsdata

Usage

To get the initial flattened data you need to run the following, giving an output directory and source name of your choice.

import bodsdata

# Set output directory default to tmp folder if not set.
bodsdata.output_dir = 'my_dir'

# Have to name a source
source = 'my_data_source'

# Downloand individual file or zip file.
bodsdata.download_file('some_url', source=source)

# Flatten all the data into CSV files and an SQLite file.
# this step needs to happen before other functions can work
bodsdata.flatten(source)

After the above is run there are various choices to how you can use the data.

# Create dictionary of pandas dataframes
dataframes = bodsdata.pandas_dataframe(source)

# Create dictionary of polars dataframes
dataframes = bodsdata.polars_dataframe(source)

# Create parquet files for spark or other data tools
bodsdata.create_parquet(source)

# Create pg_dump for inserting into postgres database
bodsdata.create_pgdump(source)

# Zip CSV and create datapackage 
bodsdata.datapackage(source)

# zip or gzip sqlite files 
bodsdata.sqlite_zip(source)
bodsdata.sqlite_gzip(source)

Pipeline

A pipeline which runs all these stages is also available:

bodsdata.run_pipeline(source, title, description, download, upload, bucket)

Consistency Checks

After the input data has been downloaded (but before doing anything else) it is possible perform some basic consistency checks using:

# Perform consistency checks on source data
bodsdata.check_data_consistency(source)

By default this checks:

Check for missing require fields in each statement within input data
Check for duplicate statementIDs within input data
Check internal references within input data

These checks can be individually disabled with optional keyword arguments (e.g. check_missing_fields=False, check_statement_dups=False, check_statement_refs=False). Setting one of these arguments to an integer value will also allow the checks to pass if exactly that number of that type of errors are found (e.g. check_statement_refs=62 will pass if exactly 62 referencing errors are found).

An error_limit parameter (default=1000) controls the maximum number of consistency check errors which will be printed.

When running the pipeline, the argument checks=False will disable the consistency checks stage.

Development

To test website locally:

cd bodsdataweb
flask run

Tests can be run with:

pytest

Change web-site theme

Theme lives at bodsdataweb/bootswatch/dist/bodsdata/

Modify _bootswatch.scss for big changes and _variables.scss for changes to variables.

To build theme:

cd bodsdataweb/bootswatch
// for one of build
grunt swatch 
// to watch changes
grunt  watch

Deploy website

bodsdata.build_website(source)

You will need the following envirment variables to be set:

export AWS_ACCESS_KEY_ID=XXX
export AWS_SECRET_ACCESS_KEY=XXX
export AWS_DEFAULT_REGION=eu-west-1

Name		Name	Last commit message	Last commit date
Latest commit History 98 Commits
.github/workflows		.github/workflows
bodsdataweb		bodsdataweb
requirements		requirements
tests		tests
.gitignore		.gitignore
LICENCE.rst		LICENCE.rst
README.md		README.md
bodsdata.py		bodsdata.py
consistency_checks.py		consistency_checks.py
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Beneficial ownership data analysis tools

Install

Usage

Pipeline

Consistency Checks

Development

Change web-site theme

Deploy website

About

Releases

Packages

Contributors 4

Languages

License

openownership/bodsdata

Folders and files

Latest commit

History

Repository files navigation

Beneficial ownership data analysis tools

Install

Usage

Pipeline

Consistency Checks

Development

Change web-site theme

Deploy website

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages