Clay Data Analysis

Installation

git clone
nvm install v8
npm install
Authenticate to Google's Cloud API from an associated Google Cloud Platform Project and download the keyfile.json.
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS=[PATH], replacing [PATH] with the location of the keyfile.json file you downloaded in the previous step.
Enable both the BigQuery API and the Google Natural Language API within your created project.
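
Before starting the app, it can help to confirm that the credentials step above actually took effect. The following is a minimal sketch, not part of this module: checkCredentials is a hypothetical helper that only checks that the variable is set and that the file it points to exists. It does not validate the key itself.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of this module): returns a list of
// problems found with the GOOGLE_APPLICATION_CREDENTIALS setup.
function checkCredentials(env) {
  const problems = [];
  const keyfile = env.GOOGLE_APPLICATION_CREDENTIALS;

  if (!keyfile) {
    problems.push('GOOGLE_APPLICATION_CREDENTIALS is not set');
  } else if (!fs.existsSync(keyfile)) {
    problems.push('keyfile not found at ' + keyfile);
  }

  return problems;
}

// An empty array means both checks passed.
console.log(checkCredentials(process.env));
```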

Setup & Integration

In your app.js, instantiate Clay Data Science by passing in the parent directory where your tasks (data science features) will live:

const path = require('path');

// dataAnalysis is this module, required earlier in app.js
dataAnalysis.config({
  projectDir: path.resolve('./parent-directory')
});

To use the save and publish hooks, also pass Clay Data Science in as an Amphora plugin during Amphora instantiation:

return amphora({
  plugins: [dataAnalysis]
});

The parent directory should include a subdirectory called tasks. Each task in it needs a handler, a transform, and a data schema. The directory structure should look like this:

- parent-directory
  - tasks
    - feature
      - handler.js
      - schema.yml
      - transform.js
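
The layout above can be checked with a short script. This is a sketch under assumptions: findIncompleteTasks is a hypothetical helper, not something this module exports, and it assumes every subdirectory of tasks is a task that needs all three files.

```javascript
const fs = require('fs');
const path = require('path');

// The three files each task directory is expected to contain.
const REQUIRED_FILES = ['handler.js', 'schema.yml', 'transform.js'];

// Hypothetical helper (not part of this module): maps each task name
// to the list of required files missing from its directory.
function findIncompleteTasks(projectDir) {
  const tasksDir = path.join(projectDir, 'tasks');
  const missing = {};

  fs.readdirSync(tasksDir).forEach((name) => {
    const taskDir = path.join(tasksDir, name);

    // Skip stray files that sit next to the task directories.
    if (!fs.statSync(taskDir).isDirectory()) return;

    const absent = REQUIRED_FILES.filter(
      (file) => !fs.existsSync(path.join(taskDir, file))
    );

    if (absent.length) missing[name] = absent;
  });

  return missing;
}

// Example: findIncompleteTasks(path.resolve('./parent-directory'))
```

An empty object means every task has its handler, schema, and transform. The function throws if the tasks subdirectory itself does not exist.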

Data Schema

Coming soon!

Transform

Coming soon!

Handler

Coming soon!

CLI

Clay Data Science also includes a CLI for importing legacy data from Elasticsearch into BigQuery. To get started, set the ELASTICSEARCH_HOST environment variable.

Commands

npm run lint - runs eslint
./bin/cli.js
- --help
- nlp

NLP

Parses Elasticsearch data based on a specified NLP feature and stores the parsed data into a BigQuery dataset/table.

./bin/cli.js nlp --service elasticsearch --from published-articles.general --to clay_sites.content_classification --field content --query /path/to/query.json --schema /path/to/schema.yml --feature classifyContent

--service, -s <service> : The data source
--feature, -fe <feature> : An NLP feature, e.g. classifyContent
--from, -fr <index>.<type> : The Elasticsearch index and type to pull data from
--to, -t <dataset>.<table> : The BigQuery dataset and table to insert data into
--field, -f <field> : The data to analyze, based on property/field name
--query, -q <query> : The file path to a query to POST to Elasticsearch
--schema, -sc <schema> : The file path to a YAML schema to pass to BigQuery (see the BigQuery schema documentation)
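
In the example command, the source and destination are both dotted pairs: an Elasticsearch index and type (published-articles.general) and a BigQuery dataset and table (clay_sites.content_classification). The sketch below shows one way such a value could be split and validated. splitTarget is a hypothetical helper for illustration; the CLI's own parsing is not documented here and may differ.

```javascript
// Hypothetical helper (not the CLI's actual parser): splits a
// "<left>.<right>" value at its first dot and labels the two parts.
function splitTarget(value, leftName, rightName) {
  const dot = value.indexOf('.');

  // Reject values with no dot, or with nothing on one side of it.
  if (dot <= 0 || dot === value.length - 1) {
    throw new Error(
      'Expected <' + leftName + '>.<' + rightName + '>, got "' + value + '"'
    );
  }

  const result = {};
  result[leftName] = value.slice(0, dot);
  result[rightName] = value.slice(dot + 1);
  return result;
}

// Values taken from the example command above.
const from = splitTarget('published-articles.general', 'index', 'type');
const to = splitTarget('clay_sites.content_classification', 'dataset', 'table');

console.log(from, to);
```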

Coming Soon

Tests
More NLP features!
More thorough documentation on schemas within tasks
