tap-pendo

License: GPL v3 Python 3.7

This is a Singer tap that produces JSON-formatted data following the Singer spec.

This tap:

  • Pulls raw data from the Pendo API.
  • Supports the following two subscription types:
    • US Subscription
    • EU Subscription
  • Extracts the following resources:
    • Accounts
    • Features
    • Guides
    • Pages
    • Visitors
    • Visitor History
      • Syncs for this endpoint may be very long running when anonymous visitors are extracted; see the Visitors config option include_anonymous_visitors.
    • Track Types
    • Feature Events
    • Events
    • Page Events
    • Guide Events
    • Poll Events
    • Track Events
    • Metadata Accounts
    • Metadata Visitors
  • Outputs the schema for each resource
  • Incrementally pulls data based on the input state

Streams

accounts

  • US Subscription Endpoint: https://app.pendo.io/api/v1/aggregation
  • EU Subscription Endpoint: https://app.eu.pendo.io/api/v1/aggregation
  • Primary key fields: account_id
  • Replication strategy: INCREMENTAL (query filtered)
    • Bookmark: lastupdated
  • Transformations
    • Camel to snake case.
    • metadata.auto.lastupdated denested to root as lastupdated
    • metadata objects denested (metadata_agent, metadata_auto, metadata_custom, etc.)
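
The three transformations listed above can be illustrated with a minimal sketch. This is not the tap's actual code: the function names and the sample record (including the raw `accountId` key) are hypothetical, and only the behavior described in the list is assumed.

```python
import re

def camel_to_snake(name):
    """Convert a camelCase key such as 'accountId' to 'account_id'."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

def transform_account(record):
    """Apply the listed transformations to one raw record (hypothetical shape)."""
    out = {}
    for key, value in record.items():
        if key == "metadata" and isinstance(value, dict):
            # Denest each metadata group to the root: metadata_auto, metadata_custom, ...
            for group, fields in value.items():
                if isinstance(fields, dict):
                    fields = {camel_to_snake(k): v for k, v in fields.items()}
                out["metadata_" + camel_to_snake(group)] = fields
        else:
            out[camel_to_snake(key)] = value
    # Promote metadata.auto.lastupdated to the root as the bookmark field.
    auto = record.get("metadata", {}).get("auto", {})
    if "lastupdated" in auto:
        out["lastupdated"] = auto["lastupdated"]
    return out

# Hypothetical raw record, for illustration only.
raw = {
    "accountId": "acme",
    "metadata": {
        "auto": {"lastupdated": 1601256629237},
        "custom": {"planTier": "gold"},
    },
}
print(transform_account(raw))
```

In this sketch the root-level `lastupdated` is a copy, so the denested `metadata_auto` object still carries the original value.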

features

guides

track_types

visitors

visitor_history

feature_events

events

page_events

guide_events

poll_events

track_events

metadata accounts

metadata visitors

Authentication

Authentication is managed by integration keys. An integration key may be created on the Pendo website: Settings -> Integrations -> Integration Keys.
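
As a rough sketch of how an integration key might be attached to a request, the snippet below builds (but does not send) a request to the aggregation endpoint. It assumes the key travels in an `x-pendo-integration-key` header, mirroring the `x_pendo_integration_key` config entry; the `build_request` helper and its `subscription` parameter are hypothetical and are not part of the tap.

```python
import urllib.request

# Base URLs taken from the US and EU subscription endpoints listed under Streams.
BASE_URLS = {
    "US": "https://app.pendo.io",
    "EU": "https://app.eu.pendo.io",
}

def build_request(integration_key, subscription="US", path="/api/v1/aggregation"):
    """Build an unsent POST request carrying the integration key header."""
    headers = {
        "x-pendo-integration-key": integration_key,  # assumed header name
        "content-type": "application/json",
    }
    url = BASE_URLS[subscription] + path
    return urllib.request.Request(url, headers=headers, method="POST")

req = build_request("YOUR_INTEGRATION_KEY", subscription="EU")
print(req.full_url)
```

Nothing is sent over the network here; the request object only shows where the key and the subscription-specific host would go.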

State

{
  "currently_syncing": null,
  "bookmarks": {
    "pages": { "lastUpdatedAt": "2020-09-26T00:00:00.000000Z" },
    "page_events": { "day": "2020-09-27T04:00:00.000000Z" },
    "accounts": { "lastupdated": "2020-09-28T01:30:29.237000Z" },
    "feature_events": { "day": "2020-09-27T04:00:00.000000Z" },
    "guides": { "lastUpdatedAt": "2020-09-26T00:00:00.000000Z" },
    "poll_events": { "day": "2020-09-26T00:00:00.000000Z" },
    "events": { "day": "2020-09-27T04:00:00.000000Z" },
    "visitors": { "lastupdated": "2020-09-28T01:30:29.199000Z" },
    "features": { "lastUpdatedAt": "2020-09-26T00:00:00.000000Z" },
    "guide_events": { "day": "2020-09-26T00:00:00.000000Z" },
    "track_types": { "lastUpdatedAt": "2020-09-26T00:00:00.000000Z" }
  }
}

Interrupted syncs for event-type streams are resumed via a bookmark placed during processing, last_processed. Its value is the GUID of the parent object, as in the page_events bookmark below:

{
  "bookmarks": {
    "guides": { "lastUpdatedAt": "2020-09-22T20:23:44.514000Z" },
    "poll_events": { "day": "2020-09-20T00:00:00.000000Z" },
    "feature_events": { "day": "2020-09-27T04:00:00.000000Z" },
    "visitors": { "lastupdated": "2020-09-27T15:40:02.729000Z" },
    "pages": { "lastUpdatedAt": "2020-09-20T00:00:00.000000Z" },
    "track_types": { "lastUpdatedAt": "2020-09-20T00:00:00.000000Z" },
    "features": { "lastUpdatedAt": "2020-09-20T00:00:00.000000Z" },
    "accounts": { "lastupdated": "2020-09-27T15:39:50.585000Z" },
    "guide_events": { "day": "2020-09-20T00:00:00.000000Z" },
    "page_events": {
      "day": "2020-09-27T04:00:00.000000Z",
      "last_processed": "_E9IwR8tFCTQryv_hCzGVZvsgcg"
    },
    "events": { "day": "2020-09-27T04:00:00.000000Z" }
  },
  "currently_syncing": "track_events"
}
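
The bookmark lookup and resume behavior described in this section can be sketched as follows. Both helpers are hypothetical, and the resume logic assumes parent objects are processed in a stable order, which the README does not state.

```python
def get_bookmark(state, stream, key, start_date):
    """Return the stored bookmark for a stream, or start_date if none exists."""
    return state.get("bookmarks", {}).get(stream, {}).get(key, start_date)

def parents_to_resume(parent_ids, last_processed):
    """Skip parents handled before the interruption (assumes a stable order)."""
    if last_processed in parent_ids:
        # Restart at the parent that was in progress when the sync stopped.
        return parent_ids[parent_ids.index(last_processed):]
    return parent_ids

state = {
    "bookmarks": {
        "page_events": {
            "day": "2020-09-27T04:00:00.000000Z",
            "last_processed": "_E9IwR8tFCTQryv_hCzGVZvsgcg",
        }
    },
    "currently_syncing": "track_events",
}
start_date = "2020-09-18T00:00:00Z"

print(get_bookmark(state, "page_events", "day", start_date))
print(get_bookmark(state, "track_events", "day", start_date))
```

A stream with no bookmark, such as track_events in this state, falls back to the configured start_date.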

Quick Start

  1. Install

    Clone this repository, and then install using setup.py. We recommend using a virtualenv:

    > virtualenv -p python3 venv
    > source venv/bin/activate
    > python setup.py install
    OR
    > cd .../tap-pendo
    > pip install .
  2. Install dependent libraries. The following libraries were installed alongside the tap:

    > pip install singer-python
    > pip install jsonlines
    > pip install singer-tools
    > pip install target-stitch
    > pip install target-json
    
  3. Create your tap's config.json file. The tap config file for this tap should include these entries:

    • start_date (string, 2020-09-18T00:00:00Z): the default value to use if no bookmark exists for an endpoint (RFC 3339 date string)
    • x_pendo_integration_key (string, ABCdef123): an integration key from Pendo.
    • period (string, dayRange): dayRange or hourRange
    • lookback_window (integer, 10): lookback window for event objects. Default: 0
    • request_timeout (integer, 300): timeout passed to each request. Default: 300
    • record_limit (integer, 100000): maximum number of records the Pendo API can retrieve in a single request. Default: 100000 records
    • app_ids (string, "8877665523, 1234545"): comma-separated app IDs. If this parameter is not provided, data is collected from all apps.
    • include_anonymous_visitors (string, "true"): whether to extract anonymous visitors; syncs may be very long running when anonymous visitors are extracted.

    Note: It is important to set the record_limit parameter to an appropriate value. A smaller value may have a negative effect on the Pendo API's performance, while a larger value may result in connection errors, request timeouts, or memory overflows.
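
The entries above can be checked before running the tap. The following is a minimal sketch, not the tap's own validation: the `load_config` helper is hypothetical, and treating `start_date`, `x_pendo_integration_key`, and `period` as required is an assumption. The defaults are the ones listed above.

```python
import json

# Assumption: these three entries are treated as required in this sketch.
REQUIRED_KEYS = ("start_date", "x_pendo_integration_key", "period")
VALID_PERIODS = ("dayRange", "hourRange")

def load_config(text):
    """Parse a config.json string, check it, and fill documented defaults."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    if config["period"] not in VALID_PERIODS:
        raise ValueError("period must be dayRange or hourRange")
    # Defaults as documented in the list of entries.
    config.setdefault("lookback_window", 0)
    config.setdefault("request_timeout", 300)
    config.setdefault("record_limit", 100000)
    # app_ids is a comma-separated string; an empty list here means "all apps".
    raw_ids = config.get("app_ids", "")
    config["app_ids"] = [part.strip() for part in raw_ids.split(",") if part.strip()]
    return config

sample = """{
  "x_pendo_integration_key": "YOUR_INTEGRATION_KEY",
  "start_date": "2020-09-18T00:00:00Z",
  "period": "dayRange",
  "app_ids": "1234545, 8877665523"
}"""
print(load_config(sample))
```

Splitting `app_ids` up front keeps the whitespace handling in one place, since the documented example separates IDs with a comma followed by a space.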

    ```json
    {
      "x_pendo_integration_key": "YOUR_INTEGRATION_KEY",
      "start_date": "2020-09-18T00:00:00Z",
      "period": "dayRange",
      "lookback_window": 10,
      "request_timeout": 300,
      "record_limit": 100000,
      "include_anonymous_visitors": "true",
      "app_ids": "1234545, 8877665523"
    }
    ```
  4. Run the Tap in Discovery Mode. This creates a catalog.json for selecting objects/fields to integrate:

    tap-pendo --config config.json --discover > catalog.json

    See the Singer docs on discovery mode for details.

  5. Run the Tap in Sync Mode (with catalog) and write out to state file

    For Sync mode (tap_config.json in the commands below is the config file created in step 3):

    > tap-pendo --config tap_config.json --catalog catalog.json > state.json
    > tail -1 state.json > state.json.tmp && mv state.json.tmp state.json

    To load to json files to verify outputs:

    > tap-pendo --config tap_config.json --catalog catalog.json | target-json > state.json
    > tail -1 state.json > state.json.tmp && mv state.json.tmp state.json

    To pseudo-load to Stitch Import API with dry run:

    > tap-pendo --config tap_config.json --catalog catalog.json | target-stitch --config target_config.json --dry-run > state.json
    > tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
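
The `tail -1` step in the commands above keeps only the final line of output as the new state. A rough Python equivalent is sketched below; the `final_state` helper is hypothetical. It assumes the last relevant line is either a Singer STATE message, whose payload sits under `value`, or a bare state object as emitted by a target.

```python
import json

def final_state(lines):
    """Return the most recent state found when scanning the output backwards."""
    for line in reversed([l for l in lines if l.strip()]):
        message = json.loads(line)
        if message.get("type") == "STATE":
            # Raw tap output: the state payload is under "value".
            return message["value"]
        if "type" not in message:
            # Target output: the line is already the bare state object.
            return message
    return None

tap_output = [
    '{"type": "RECORD", "stream": "accounts", "record": {"account_id": "acme"}}',
    '{"type": "STATE", "value": {"bookmarks": {"accounts": {"lastupdated": "2020-09-28T01:30:29.237000Z"}}}}',
    "",
]
print(final_state(tap_output))
```

Unlike `tail -1`, this sketch skips trailing non-state lines, which makes it slightly more forgiving if the final line is not a state message.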
  6. Test the Tap

    While developing the Pendo tap, the following utilities were run in accordance with Singer.io best practices. Pylint was run to improve code quality:

    > pylint tap_pendo -d missing-docstring -d logging-format-interpolation -d too-many-locals -d too-many-arguments

    Pylint test resulted in the following score:

    Your code has been rated at 9.67/10

    To check the tap and verify working:

    > tap-pendo --config tap_config.json --catalog catalog.json | singer-check-tap > state.json
    > tail -1 state.json > state.json.tmp && mv state.json.tmp state.json

    Check tap resulted in the following:

    Checking stdin for valid Singer-formatted data
    The output is valid.
    It contained 3734 messages for 11 streams.
    
        13 schema messages
      3702 record messages
        19 state messages
    
    Details by stream:
    +----------------+---------+---------+
    | stream         | records | schemas |
    +----------------+---------+---------+
    | accounts       | 1       | 1       |
    | features       | 29      | 1       |
    | feature_events | 158     | 2       |
    | guides         | 0       | 1       |
    | pages          | 34      | 1       |
    | page_events    | 830     | 2       |
    | reports        | 2       | 1       |
    | visitors       | 1902    | 1       |
    | events         | 746     | 1       |
    | guide_events   | 0       | 1       |
    | poll_events    | 0       | 1       |
    +----------------+---------+---------+

    Unit Tests

    Unit tests may be run with the following.

    python -m pytest --verbose
    

    Note, you may need to install test dependencies.

    pip install -e .'[dev]'
    

Copyright © 2020 Stitch