GitHub Contributors
Contributions to this project are managed through the standard GitHub process for external contributions: fork this repository, make changes in your fork, and then create a pull request in the main repository to request that an administrator review and accept these changes. You must have a GitHub account to do this.
This GitHub guide to forking provides general instructions on how to make a fork, make changes, and create a pull request back in the original repository. We recommend cloning the repository to your local machine, editing the files there, and then committing and pushing your changes, because working in the GitHub editor can be difficult.
We further recommend working within a text or code editor (not Excel) to make changes to the CSV files, because Excel can alter special characters when it opens and saves the files, due to encoding issues. Once you create a pull request, please request that it be reviewed by a maintainer.
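As an illustration of the encoding concern, the sketch below appends a record to a CSV file with Python's standard csv module while keeping the encoding explicit. The file name, the column names, and the choice of UTF-8 are assumptions made for the example; they are not taken from the repository.

```python
import csv
import os
import tempfile


def append_record(csv_path, record, encoding="utf-8"):
    """Append one record to a CSV file, keeping the existing header order."""
    # newline="" lets the csv module control line endings itself
    with open(csv_path, newline="", encoding=encoding) as f:
        fieldnames = csv.DictReader(f).fieldnames
    with open(csv_path, "a", newline="", encoding=encoding) as f:
        csv.DictWriter(f, fieldnames=fieldnames).writerow(record)


def read_records(csv_path, encoding="utf-8"):
    """Read all records back as a list of dicts."""
    with open(csv_path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


# Hypothetical file containing special characters (Greek letters)
demo_path = os.path.join(tempfile.mkdtemp(), "DemoFlowables.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Flowable", "Unit"])
    writer.writeheader()
    writer.writerow({"Flowable": "α-Pinene", "Unit": "kg"})

append_record(demo_path, {"Flowable": "β-Pinene", "Unit": "kg"})
records = read_records(demo_path)
```

Because the encoding is passed explicitly on every open, the special characters survive the round trip regardless of the platform default.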
The two primary ways of contributing are adding to the flow list and creating or editing flow mapping files. Other forms of contribution are also welcome, such as improvements to the list building code, JSON-LD output writing, adding analysis functions or tests, etc.
- Python 3.6.5 or greater, 64-bit version - https://www.python.org/downloads/ The 64-bit version is required for use of the pyarrow dependency.
- The latest version of pip. Included as a default option with Python installation.
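A quick way to confirm that an interpreter satisfies the requirements above is sketched below. The helper name `meets_requirements` is hypothetical; it relies only on the standard `sys` module, where `sys.maxsize` exceeds 2**32 only on 64-bit builds.

```python
import sys


def meets_requirements(version_info, maxsize, minimum=(3, 6, 5)):
    """Return True for a 64-bit interpreter at or above the minimum version."""
    is_64bit = maxsize > 2**32  # sys.maxsize exceeds 2**32 only on 64-bit builds
    return is_64bit and tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    ok = meets_requirements(sys.version_info, sys.maxsize)
    print("Interpreter OK" if ok else "Need 64-bit Python 3.6.5 or greater")
```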
Install a release directly from GitHub using pip. From a command line interface, run:
pip install git+https://github.com/USEPA/[email protected]#egg=Federal-LCA-Commons-Elementary-Flow-List
where you can replace 'v1.0' with the version you wish to use under Releases.
This should install all the necessary dependencies for you.
To uninstall, use the command:
pip uninstall fedelemflowlist
fedelemflowlist has been developed in PyCharm on Windows, and instructions are provided for working with it in that environment.
These are in addition to those on Install.
- Create a new project in PyCharm from a version control system, and use the homepage 'Clone' link to get the URL to use for git.
- Open the project. You will be warned if any dependencies are missing. Install those and restart.
- Right-click on the 'fedelemflowlist' folder in the project view and Mark Directory as 'Sources Root' and restart.
The flow list can be revised by adding to the input files used to build the flow list. Edits to these files will update the flow list once it is rebuilt. This process has to be done through forking and a pull request. See Contributing.
We recommend these steps:
- Read the EPA Report describing how the flows and contexts are defined.
- Identify the class appropriate to add the flow to, and the appropriate nomenclature or term sources to use. For example, the EPA Substance Registry Services 'EPA Registry Name' is the first choice name for Chemicals.
- Identify the input files that need to be updated. For new flowables, find the Flowables and FlowablePrimaryContexts files for the pertinent class. Check very thoroughly to be sure that this flowable is not already in the list. Proceed only if you are certain that it is not present.
- Add the new flowable with all fields as a new record in the class Flowables file.
- Determine which primary contexts it should exist in, and add a record for each primary context to the class FlowablePrimaryContexts file.
- If the flow has an alternate unit, add that alternate unit with a conversion factor in FlowableAltUnits.
- Save these changes, and run the Python script test_input_files to be sure your edits result in valid input files.
- After verifying that the input files are valid with the new flowable added, create the pull request to be reviewed by one of the list maintainers. Please describe the reason for your change.
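The check for an existing flowable in the steps above can be partly automated. The following is a minimal sketch, assuming the Flowables file has a column named `Flowable`; the column name and the sample data are assumptions for illustration. An exact-name comparison will not catch synonyms, so a manual review is still needed.

```python
import csv
import io


def find_existing_flowable(csv_text, name, column="Flowable"):
    """Return rows whose flowable name matches `name`, ignoring case and outer whitespace."""
    wanted = name.strip().casefold()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column].strip().casefold() == wanted]


# Made-up sample data standing in for a class Flowables file
sample = "Flowable,Unit\nAmmonia,kg\nCarbon dioxide,kg\n"

matches = find_existing_flowable(sample, " carbon DIOXIDE ")
missing = find_existing_flowable(sample, "Methane")
```

An empty result, as for `missing` here, only means that no record has exactly that name.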
Please consult a maintainer if you would like to suggest other flow list changes.
The flow list mappings can be revised by adding to or editing the mapping files used to associate the flow list with other datasets. Edits to these files will update the flow list once it is rebuilt. This process must be done through forking and a pull request. See Contributing.
We recommend these steps:
- Read the EPA Report describing how the flows and contexts are defined.
- Identify the appropriate dataset for the flows you want to map or edit.
- Identify the mapping file that needs to be updated, or create a new mapping file if it is a new source. Identify the flow you would like to update in the mapping file. If that flow is not present in the mapping file but is associated with records in the dataset being mapped, you will need to add a new line for that flow.
- If adding a new flowable, enter information as a new record in the mapping file in the fields for which you have data. If you are editing an existing flow mapping, identify the record and edit the data accordingly.
- For all source flows, determine the TargetFlowName, TargetFlowContext, and TargetUnit from the FlowList, and add them to the mapping file. (see note below)
- Run the add_uuid_to_flowmapping script to automatically look up and add the TargetFlowUUID.
- Run test_flow_mapping to be sure your edits or new mapping file(s) are valid.
- After verifying that the mapping files are valid with the new flow mapping, create the pull request to be reviewed by one of the list maintainers. Please describe the reason for your change.
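Conceptually, the UUID lookup step matches each mapping record to a flow list record by name, context, and unit. The sketch below illustrates that idea with plain dictionaries; it is not the implementation of add_uuid_to_flowmapping. The flow list column names `Flowable`, `Context`, and `Unit`, the function name, and the sample records are assumptions.

```python
def add_target_uuids(mappings, flow_list):
    """Fill TargetFlowUUID in each mapping record from the matching flow list record.

    Returns the mapping records that could not be matched.
    """
    # Index the flow list by (name, context, unit) for direct lookup
    index = {
        (flow["Flowable"], flow["Context"], flow["Unit"]): flow["Flow UUID"]
        for flow in flow_list
    }
    unmatched = []
    for record in mappings:
        key = (record["TargetFlowName"], record["TargetFlowContext"], record["TargetUnit"])
        if key in index:
            record["TargetFlowUUID"] = index[key]
        else:
            unmatched.append(record)
    return unmatched


# Made-up records; the UUID is a placeholder, not a real flow identifier
flow_list = [
    {"Flowable": "Ammonia", "Context": "emission/air", "Unit": "kg",
     "Flow UUID": "00000000-0000-0000-0000-000000000001"},
]
mappings = [
    {"TargetFlowName": "Ammonia", "TargetFlowContext": "emission/air", "TargetUnit": "kg"},
    {"TargetFlowName": "Ammonia", "TargetFlowContext": "emission/water", "TargetUnit": "kg"},
]
unmatched = add_target_uuids(mappings, flow_list)
```

Records returned in `unmatched` point to a target name, context, or unit that does not agree with the flow list and should be corrected before running the tests.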
Please consult a maintainer if you would like to suggest other flow list changes.
These instructions are intended for maintainers who wish to rebuild the list after edits to the input files.
- Save a copy of the existing FedElemFlowListMaster.parquet.
- Check the flow list specs in globals.py: update list_version and confirm that the flow classes are correct.
- Verify that all files for the flow classes are in input, including those beginning with the flow class name and ending with '..Flowables.csv', '..FlowablePrimaryContexts.csv', and '..FlowableAltUnits.csv', along with the two context files, Contexts.csv and SecondaryPrimaryContexts.csv.
- Run the test script test_input_files.py, which runs a series of tests on these input files to make sure they are valid. Proceed only if all tests pass.
- Run the flowlist.py script. This will overwrite output/FedElemFlowListMaster.parquet.
- Run the test script test_flow_list.py to verify the integrity of the generated flow list.
- To get standard FlowList output of new flows created and expired flows, you can read the instructions and run compare_old_flowlist.py.
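The comparison of an old and a rebuilt flow list reduces to two set differences on the flow identifiers. The sketch below shows only that idea; it is not the code in compare_old_flowlist.py, and the function name and placeholder identifiers are assumptions.

```python
def compare_flow_ids(old_ids, new_ids):
    """Return (new, expired) flow UUIDs as sorted lists."""
    old_set, new_set = set(old_ids), set(new_ids)
    new_flows = sorted(new_set - old_set)      # present only in the rebuilt list
    expired_flows = sorted(old_set - new_set)  # present only in the saved copy
    return new_flows, expired_flows


# Placeholder identifiers standing in for Flow UUID values
old_ids = ["a", "b", "c"]
new_ids = ["b", "c", "d"]
new_flows, expired_flows = compare_flow_ids(old_ids, new_ids)
```

This is why the first step asks for a saved copy of the existing parquet file: without it there is nothing to compare the rebuilt list against.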
generated from v1.0-rc
DESCRIPTION Allows retrieval of the Federal LCA Commons flow list and mapping files in standard pandas dataframe formats, defined within format specs
PACKAGE CONTENTS contexts flowlist globals jsonld uuid_generators
FUNCTIONS
get_flowmapping(source=None)
Gets a flow mapping in standard format
Looks for a dataframe of the mapping file specific to the source
If a source list is provided, it returns only the desired mappings
Returns an error if specified source does not equal the source name
:param source: Name of source list in
:return: standard Flow Mapping dataframe
get_flows(preferred_only=None)
Gets a flow list in a standard format
Returns the full master flow list unless preferred_only is True
:param preferred_only:
:return: standard Flow List dataframe
write_jsonld(flows, path, mappings=None)
Writes a standard openLCA JSON-LD zip archive with elementary flows and optionally
flowmappings
:param flows: standard pd Flow List dataframe, generally from get_flows()
:param path: path and filename with .zip extension, e.g. 'c:users/mai/fedcommonsflows.zip'
:param mappings: standard pd Flow Mapping dataframe , generally from get_flowmapping()
:return: writes out .zip file
NAME fedelemflowlist.flowlist
DESCRIPTION Generate the elementary flow master list as a pandas dataframe from input files Write it to the output folder
NAME fedelemflowlist.analysis.flow_list_analysis
DESCRIPTION Functions to perform analysis of a flow list
NAME scripts.add_uuid_to_flowmapping
DESCRIPTION Gets Fed Commons Flow UUID from mapping list and adds it to mapping file(s). Mapping file must already conform to mapping format
NAME scripts.extract_flow_data_from_Excel
DESCRIPTION Flow data were originally developed in Excel. This script extracts those data and writes them to CSV files. WARNING: This will replace the existing input files The flow class files should match the class names in the flowlistspecs
NAME scripts.summarize_flowlist
DESCRIPTION Uses fedelemflowlist analysis functions to perform and export basic analysis
NAME scripts.write_preferred_flows
DESCRIPTION Writes preferred flows to a JSON-LD archive in the output folder
NAME tests.test_flow_list - Tests the stored flow list to provide quality assurance
NAME tests.test_flow_mappings - Tests the stored flow mappings to provide quality assurance
NAME tests.test_input_files - Tests for input files used to build flow list to provide quality assurance
NAME tests.test_writing_jsonld - Tests that flows and mappings are written as expected to JSON-LD archives
Example Python uses
### Get preferred flows only, write them to JSON-LD
import fedelemflowlist
# Get preferred flows
preferred_flows = fedelemflowlist.get_flows(preferred_only=True)
# Write out to JSON-LD zip. INFO messages will appear in console describing progress
fedelemflowlist.write_jsonld(preferred_flows, 'FedCommonsFlows_preferred_flows.zip')
### Get flows, mappings from NEI and TRI sources that use those flows, write the flows and mappings to JSON-LD
import fedelemflowlist
import pandas as pd
# Get the full flow list and the mappings for the two sources
all_flows = fedelemflowlist.get_flows()
sources = ['NEI', 'TRI']
mappings = fedelemflowlist.get_flowmapping(sources)
# Identify unique UUIDs for flows that NEI and TRI map to
flow_ids_in_mappings = list(pd.unique(mappings['TargetFlowUUID']))
# Subset flow list with those UUIDs
flows_in_mappings = all_flows[all_flows['Flow UUID'].isin(flow_ids_in_mappings)]
# Write out to JSON-LD zip
fedelemflowlist.write_jsonld(flows_in_mappings, 'FedCommonsFlows_with_NEI_TRI_mappings.zip', mappings)
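After writing an archive, it can be useful to check what it contains. The sketch below counts the entries of a zip file per top-level folder using the standard zipfile module. The helper name and the folder names in the stand-in archive are illustrative assumptions; the layout of an archive written by write_jsonld may differ.

```python
import os
import tempfile
import zipfile


def summarize_archive(zip_path):
    """Count the entries in a zip archive per top-level folder."""
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            folder = name.split("/", 1)[0] if "/" in name else ""
            counts[folder] = counts.get(folder, 0) + 1
    return counts


# Build a small stand-in archive; the folder names are illustrative only
demo_zip = os.path.join(tempfile.mkdtemp(), "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("flows/flow-1.json", "{}")
    archive.writestr("flows/flow-2.json", "{}")
    archive.writestr("categories/cat-1.json", "{}")

summary = summarize_archive(demo_zip)
```

To inspect a real archive, pass the same path that was given to write_jsonld.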
See the documentation of the main module and scripts in the Doc_Strings.