GitHub Contributors

hottleta edited this page Mar 10, 2020 · 10 revisions

Contributions to this project are managed through the standard GitHub process for external contributions: fork this repository, make changes in your fork, and then create a pull request in the main repository to request that an administrator review and accept these changes. You must have a GitHub account to do this.

This GitHub guide to forking provides general instructions on how to make a fork, make changes, and create a pull request back in the original repository. We recommend cloning the repository to your local machine, editing the files there, and then committing and pushing the changes, because working in the GitHub editor can be difficult.

We further recommend working within a text or code editor (not Excel) to make changes to the CSV files, because when Excel opens and saves the files it can result in changes to special characters due to encoding issues. Once you perform a pull request, please request that it be reviewed by a maintainer.
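If you prefer to script an edit instead of typing it in an editor, the sketch below shows one way to read and write a CSV file with an explicit UTF-8 encoding so that special characters survive the round trip. It is only an illustration: the file name, the column names, and the sample values are hypothetical, and the real input files have their own fields.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path):
    """Read a CSV file as a list of dicts, decoding it explicitly as UTF-8."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_rows(path, rows, fieldnames):
    """Write dict rows back to CSV as UTF-8 with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical two-column file containing commas and non-ASCII characters.
sample_rows = [
    {"Flowable": "1,2-Dichloroethane", "Note": "α-isomer, 25 °C"},
    {"Flowable": "Radon-222", "Note": "µBq"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv"
    write_rows(csv_path, sample_rows, ["Flowable", "Note"])
    round_tripped = read_rows(csv_path)

# The rows read back should be identical to the rows written.
print(round_tripped == sample_rows)
```

Passing `newline=""` lets the `csv` module handle line endings itself, and naming the encoding avoids depending on the platform default.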

The two primary ways of contributing are adding to the flow list and creating or editing flow mapping files. Other forms of contribution are also welcome, such as improvements to the list-building code or JSON-LD output writing, or new analysis functions or tests.

Install for Users

Prerequisites

  1. Python 3.6.5 or greater, 64-bit version - https://www.python.org/downloads/ The 64-bit version is required for use of the pyarrow dependency.
  2. The latest version of pip. Included as a default option with Python installation.
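As a quick check of the first prerequisite, the following sketch reports whether the running interpreter meets the stated minimum version and whether it is a 64-bit build. The helper names are illustrative and not part of fedelemflowlist.

```python
import struct
import sys

MIN_VERSION = (3, 6, 5)  # minimum version stated in the prerequisites


def meets_min_version(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor, micro) version is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


def is_64_bit():
    """Return True if pointers are 8 bytes wide, i.e. a 64-bit interpreter."""
    return struct.calcsize("P") * 8 == 64


print("Python version OK:", meets_min_version())
print("64-bit interpreter:", is_64_bit())
```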

Installation of Python module and dependencies

Install a release directly from GitHub using pip. From a command line interface, run:

pip install git+https://github.com/USEPA/Federal-LCA-Commons-Elementary-Flow-List.git@v1.0#egg=Federal-LCA-Commons-Elementary-Flow-List

where you can replace 'v1.0' with the version you wish to use under Releases.

This should install all the necessary dependencies for you.
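One way to confirm that the install succeeded, without importing the package and its dependencies, is to ask the import system whether it can locate the module. This is a minimal sketch; `is_installed` is an illustrative helper, not a fedelemflowlist function.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


print("fedelemflowlist installed:", is_installed("fedelemflowlist"))
```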

To uninstall, use the command:

pip uninstall fedelemflowlist

Install for Maintainers

fedelemflowlist has been developed in PyCharm on Windows, and instructions are provided for working with it in that environment.

Working with latest master version in PyCharm

Prerequisites

These are in addition to those detailed in Install for Users above.

  1. PyCharm
  2. git

Working with fedelemflowlist in PyCharm

  1. Create a new project in PyCharm from a version control system, using the URL from the 'Clone' link on the repository homepage for git.
  2. Open the project. You will be warned if any dependencies are missing. Install them and restart.
  3. Right-click on the 'fedelemflowlist' folder in the project view, select Mark Directory as 'Sources Root', and restart.

Flow List Additions

The flow list can be revised by adding to the input files used to build the flow list. Edits to these files will update the flow list once it is rebuilt. This process has to be done through forking and a pull request.

We recommend these steps:

  1. Read the EPA Report describing how the flows and contexts are defined.
  2. Identify the class appropriate to add the flow to, and the appropriate nomenclature or term sources to use. For example, the EPA Substance Registry Services 'EPA Registry Name' is the first choice name for Chemicals.
  3. Identify the input files that need to be updated. For new flowables, find the Flowables and FlowablePrimaryContexts files for the pertinent class. Check very thoroughly to be sure that this flowable is not already in the list. Proceed, if you are certain that it is not present.
  4. Add the new flowable with all fields as a new record in the class Flowables file.
  5. Determine which primary contexts it should exist in, and add a record for each primary context to the class FlowablePrimaryContexts file.
  6. If the flow has an alternate unit, add that alternate unit with a conversion factor in FlowableAltUnits.
  7. Save these changes, and run the Python script test_input_files to be sure your edits result in valid input files.
  8. After verifying that the input files are valid with the new flowable added, create the pull request to be reviewed by one of the list maintainers. Please describe the reason for your change.

Please consult a maintainer if you would like to suggest other flow list changes.
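The duplicate check in step 3 is easy to get wrong by eye, because the same name may differ only in case or spacing. The sketch below shows one possible way to search a Flowables file for an existing entry. The column names ('Flowable', 'CAS No') and the toy data are assumptions for illustration, and the check does not replace a careful manual review for synonyms.

```python
import csv
import io


def normalize(name):
    """Collapse case and extra whitespace so near-identical names compare equal."""
    return " ".join(name.split()).casefold()


def find_existing(flowables_csv, candidate, column="Flowable"):
    """Return the rows whose name matches `candidate` after normalization."""
    reader = csv.DictReader(flowables_csv)
    target = normalize(candidate)
    return [row for row in reader if normalize(row[column]) == target]


# Toy stand-in for a class Flowables file; the real files have more fields.
toy_file = io.StringIO(
    "Flowable,CAS No\n"
    "Benzene,71-43-2\n"
    "Carbon dioxide,124-38-9\n"
)

# A candidate that differs from an existing entry only in case and spacing.
matches = find_existing(toy_file, "  carbon  Dioxide ")
print(matches)
```

An empty result only means that no name matched after normalization; the flowable may still be present under a different name.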

Create or Edit Flow Mappings

The flow list mappings can be revised by adding to or editing the mapping files used to associate the flow list with other datasets. Edits to these files will update the flow list once it is rebuilt. This process must be done through forking and a pull request.

We recommend these steps:

  1. Read the EPA Report describing how the flows and contexts are defined.

  2. Identify the appropriate dataset for the flows you want to map or edit.

  3. Identify the mapping file that needs to be updated, or create a new mapping file if it is a new source. Identify the flow you would like to update in the mapping file. If that flow is not present in the mapping file but is associated with records in the dataset being mapped, you will need to add a new line for that flow.

  4. If adding a new flowable, enter information as a new record in the mapping file in fields for which you have data. If you are editing an existing flow mapping, identify the record and edit the data accordingly.

  5. For all source flows, determine the TargetFlowName, TargetFlowContext, and TargetUnit from the FlowList, and add them to the mapping file. (see note below)

  6. Run the add_uuid_to_flowmapping script to automatically look up and add the TargetFlowUUID.

  7. Run test_flow_mappings to be sure your edits or new mapping file(s) are valid.

  8. After verifying that the mapping files are valid with the new flow mapping, create the pull request to be reviewed by one of the list maintainers. Please describe the reason for your change.

Please consult a maintainer if you would like to suggest other flow list changes.
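To illustrate the idea behind steps 5 and 6, the sketch below fills in TargetFlowUUID by matching each mapping record's target name, context, and unit against a flow list. It is not the add_uuid_to_flowmapping script itself. The flow list field names 'Flowable', 'Context', and 'Unit', the 'SourceFlowName' field, and the placeholder UUIDs are assumptions made for the example.

```python
def build_uuid_lookup(flow_list):
    """Index flow UUIDs by (name, context, unit)."""
    return {
        (flow["Flowable"], flow["Context"], flow["Unit"]): flow["Flow UUID"]
        for flow in flow_list
    }


def add_target_uuids(mapping_rows, flow_list):
    """Return copies of the mapping rows with TargetFlowUUID filled in.

    Rows with no matching flow get None so they are easy to spot.
    """
    lookup = build_uuid_lookup(flow_list)
    completed = []
    for row in mapping_rows:
        key = (row["TargetFlowName"], row["TargetFlowContext"], row["TargetUnit"])
        completed.append({**row, "TargetFlowUUID": lookup.get(key)})
    return completed


# Toy data; the UUIDs are placeholders, not real flow list values.
toy_flows = [
    {"Flowable": "Benzene", "Context": "emission/air", "Unit": "kg",
     "Flow UUID": "uuid-0001"},
    {"Flowable": "Benzene", "Context": "emission/water", "Unit": "kg",
     "Flow UUID": "uuid-0002"},
]
toy_mapping = [
    {"SourceFlowName": "BENZENE", "TargetFlowName": "Benzene",
     "TargetFlowContext": "emission/water", "TargetUnit": "kg"},
    {"SourceFlowName": "UNKNOWN", "TargetFlowName": "Not in list",
     "TargetFlowContext": "emission/air", "TargetUnit": "kg"},
]

result = add_target_uuids(toy_mapping, toy_flows)
for row in result:
    print(row["SourceFlowName"], "->", row["TargetFlowUUID"])
```

A record left with no UUID usually points to a mismatch in the target name, context, or unit, which is worth resolving before running the mapping tests.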

Rebuild the List

These instructions are intended for maintainers who wish to rebuild the list after edits to the input files.

  1. Save a copy of the existing FedElemFlowListMaster.parquet.

  2. In globals.py, update list_version in the flow list specs and check that the flow classes are correct.

  3. Verify that all files for the flow classes are in input, including those beginning with the flow class name and ending with '..Flowables.csv', '..FlowablePrimaryContexts.csv', and '..FlowableAltUnits.csv', along with the two context files, Contexts.csv and SecondaryPrimaryContexts.csv.

  4. Run the test script test_input_files.py, to run a series of tests on these input files to make sure they are valid. Proceed only if all tests pass.

  5. Run the flowlist.py script. This will overwrite output/FedElemFlowListMaster.parquet.

  6. Run the test script test_flow_list.py to verify the integrity of the generated flow list.

  7. To get standard FlowList output of new flows created and expired flows, you can read the instructions and run compare_old_flowlist.py
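The comparison in the last step amounts to a set difference on flow identifiers between the saved copy and the rebuilt list. The sketch below shows that idea on toy records. It is not the implementation of compare_old_flowlist.py, and apart from 'Flow UUID' the field names and values are made up for illustration.

```python
def compare_flow_lists(old_flows, new_flows, key="Flow UUID"):
    """Split flows into new (only in the new list) and expired (only in the old)."""
    old_ids = {flow[key] for flow in old_flows}
    new_ids = {flow[key] for flow in new_flows}
    added = [flow for flow in new_flows if flow[key] not in old_ids]
    expired = [flow for flow in old_flows if flow[key] not in new_ids]
    return added, expired


# Toy records with placeholder UUIDs.
old_list = [
    {"Flow UUID": "uuid-a", "Flowable": "Benzene"},
    {"Flow UUID": "uuid-b", "Flowable": "Toluene"},
]
new_list = [
    {"Flow UUID": "uuid-a", "Flowable": "Benzene"},
    {"Flow UUID": "uuid-c", "Flowable": "Xylene"},
]

added, expired = compare_flow_lists(old_list, new_list)
print("New flows:", [flow["Flowable"] for flow in added])
print("Expired flows:", [flow["Flowable"] for flow in expired])
```

This is why step 1 asks for a copy of the existing list: without the old file there is nothing to compare the rebuilt list against.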

Doc Strings

generated from v1.0-rc

fedelemflowlist module

DESCRIPTION Allows retrieval of the Federal LCA Commons flow list and mapping files in standard pandas dataframe formats, defined within format specs

PACKAGE CONTENTS contexts flowlist globals jsonld uuid_generators

FUNCTIONS

get_flowmapping(source=None)
    Gets a flow mapping in standard format
    Looks for a dataframe of the mapping file specific to the source
    If a source list is provided, it returns only the desired mappings
    Returns an error if specified source does not equal the source name
    :param source: Name of source list in
    :return: standard Flow Mapping dataframe

get_flows(preferred_only=None)
    Gets a flow list in a standard format

    Returns the full master flow list unless preferred flows is lists
    :param preferred_only:
    :return: standard Flow List dataframe

write_jsonld(flows, path, mappings=None)
    Writes a standard openLCA JSON-LD zip archive with elementary flows and optionally
     flowmappings

    :param flows: standard pd Flow List dataframe, generally from get_flows()
    :param path: path and filename with .zip extention, e.g. 'c:users/mai/fedcommonsflows.zip'
    :param mappings: standard pd Flow Mapping dataframe , generally from get_flowmapping()
    :return: writes out .zip file

NAME fedelemflowlist.flowlist

DESCRIPTION Generate the elementary flow master list as a pandas dataframe from input files Write it to the output folder

NAME fedelemflowlist.analysis.flow_list_analysis

DESCRIPTION Functions to perform analysis of a flow list

scripts

NAME scripts.add_uuid_to_flowmapping

DESCRIPTION Gets Fed Commons Flow UUID from mapping list and adds it to mapping file(s). Mapping file must already conform to mapping format

NAME scripts.extract_flow_data_from_Excel

DESCRIPTION Flow data were originally developed in Excel. This script extracts those data and writes them to CSV files. WARNING: This will replace the existing input files The flow class files should match the class names in the flowlistspecs

NAME scripts.summarize_flowlist

DESCRIPTION Uses fedelemflowlist analysis functions to perform and export basic analysis

NAME scripts.write_preferred_flows

DESCRIPTION Writes preferred flows to a JSON-LD archive in the output folder

tests

NAME tests.test_flow_list - Tests the stored flow list to provide quality assurance

NAME tests.test_flow_mappings - Tests the stored flow mappings to provide quality assurance

NAME tests.test_input_files - Tests for input files used to build flow list to provide quality assurance

NAME tests.test_writing_jsonld - Tests for that flows and mappings are written as expected to JSON-LD archives

fedelemflowlist Code Examples

Example Python uses

### Get preferred flows only, write them to JSON_LD
import fedelemflowlist

# Get preferred flows
preferred_flows = fedelemflowlist.get_flows(preferred_only=True)
# Write out to JSON-LD zip. INFO messages will appear in console describing progress
fedelemflowlist.write_jsonld(preferred_flows, 'FedCommonsFlows_preferred_flows.zip')

### Get flows, mappings from NEI and TRI sources that use those flows, write the flows and mappings to JSON_LD

import fedelemflowlist
import pandas as pd
all_flows = fedelemflowlist.get_flows()

sources = ['NEI','TRI']
mappings = fedelemflowlist.get_flowmapping(sources)

# Identify unique UUIDs for flows that NEI and TRI map to
flow_ids_in_mappings = list(pd.unique(mappings['TargetFlowUUID']))
# Subset flow list with those UUIDs
flows_in_mappings = all_flows[all_flows['Flow UUID'].isin(flow_ids_in_mappings)]
# Write out to JSON-LD zip
fedelemflowlist.write_jsonld(flows_in_mappings, 'FedCommonsFlows_with_NEI_TRI_mappings.zip', mappings)

See the documentation of the main module and scripts in the Doc Strings section.
