Getl

About

Groovy ETL (Getl) - open source project on Groovy, developed since 2012 to automate loading and processing data from different sources.

Links

Getl jar is published in the Central Maven
The source code of the project is located on the GitHub project getl
Russian documentation available on Wiki
ETL/ELT Process Development Center EasyPortal

When do you need Getl?

Copying datasets between RDBMS, file and cloud sources;
Capturing and delivering data increment from sources to the data warehouse;
Copying and processing files from local and external file sources;
Rapid development of pilot projects for data warehouses (converting the structure of sources to warehouse tables, multi-threaded reloading of data from sources to warehouse tables)
Organization of development, testing and prodiuction stands for ETL projects;
Automation of testing etl processes;
Automating data health monitoring in a data warehouse;
Centralization of storage of descriptions of data sources and their structures in the repository;
Simplify the development of data processing patterns.

Supported RDBMS

IBM DB2, FireBird, H2 Database, Hadoop Hive, Cloudera Impala, MS SQLServer, MySql, IBM Netezza, NetSuite, Oracle, PostgreSql, SAP Hana, SQLite, Micro Focus Vertica.

Supported file sources

CSV, MS Excel, Json, XML, Yaml, DBF.

Supported cloud sources

Kafka, SalesForce, WebServices.

Supported file systems

Local file systems Windows and Linux, FTP, SFTP, Hadoop HDFS.

Operations with RDBMS data sources

Creating and dropping tables and temporary tables;
Creating indexes;
Managing table partitions (only Vertica)
Reading records from tables and queries;
Adding, modifying and deleting records in tables;
Batch loading CSV files into tables (only H2, Vertica, Hive and Impala);
Execution of SQL queries.

Operations with file sources

Reading records from files;
Saving records to files (only CSV and Json);
Splitting files into portions when recording (only CSV);
Working with files compressed in Gz;
Deleting files.

Working with cloud sources

Reading records from datasets;
Saving records to datasets (only Kafka).

Working with file systems

Creating and deleting directories;
Uploading and uploading a file from the server to a local directory;
Renaming and moving files at source
Executing OS command at source (only local file systems and SFTP).

Getl language operators

ETL:
- Copy records from source to source or multiple sources at the same time
- Reading records from source with error handling;
- Saving records to a source or multiple sources at the same time;
ELT:
- Executing scripts in Getl stored procedure language;
- Copying records between tables of the same RDBMS;
File systems:
- Copying a hierarchy of directories and their files between two file systems;
- Clearing the hierarchy of directories and their files in the file system;
- Multi-threaded parsing of files from the file system;
Object repository:
- Saving and reading repository object descriptions from repository files;
- Processing repository objects by name group or mask;
Configuration and resources:
- Loading and saving configuration files;
- Working with loaded configuration content;
Logging and profiling:
- Logging messages, events and errors;
- Profiling the running time of commands and code blocks;
Workflow management:
- Controlling permission to start processes;
- Automatic cloning and freeing of repository objects in threads;
- Automatically freeing temporary repository objects upon termination of their processes;
- Process and application termination Commands.

Examples

Registration of connections to Oracle and Vertica:

package packet1

import groovy.transform.BaseScript
import groovy.transform.Field
import getl.lang.Getl

@BaseScript Getl main

oracleConnection('ora', true) {
    connectHost = 'oracle-host'
    connectDatabase = 'oradb'
    login = 'user'
    password = 'password'
}

verticaConnection('ver', true) {
    connectHost = 'vertica-host1'
    connectDatabase = 'verdb'
    extended.backupservernode = 'vertica-host2,vertica-host3'
    login = 'user'
    password = 'password'
}

Creating a table in Vertica based on Oracle table:

oracleTable('ora:table1', true) {
    useConnection oracleConnection('ora')
    schemaName = 'user'
    tableName = 'table1'
}

verticaTable('ver.stage:table1', true) {
    useConnection verticaConnection('ver')
    schemaName = 'stage'
    tableName = 'table1'
    if (!exists) {
        setOracleFields oracleTable('ora:table1')
        create()
    }
}

verticaTable('ver.work:table1', true) {
    useConnection verticaConnection('ver')
    schemaName = 'public'
    tableName = 'table1'
    if (!exists)
        createLike verticaTable('ver.stage:table1')
}

Copying all rows from the Oracle table to the Vertica table by uploading to an temporary csv file and loading it through the COPY statement:

etl.copyRows(oracleTable('ora:table1'), verticaTable('ver.stage:table1')) { bulkLoad = true }

Copying table data from a staging area into a working one:

sql {
    useConnection verticaConnection('ver')
    exec '''
        /*:count_insert*/
        INSERT INTO public.table1 SELECT * FROM stage.table1;
        IF ({count_insert} > 0);
            ECHO Copied {count_insert} rows successfull.
        END IF;
        COMMIT;
    ''' 
}

Truncate staging table and purge working table:

verticaTable('ver.stage:table1') { truncate() }
verticaTable('ver.prod:table1') { purgeTable() }

Unloading rows from the Vertica table to a csv file according to a specified condition:

csv('file1') {
    useConnection csvConnection { path = '/tmp/unload' }
    fileName = 'data.table1'
    extenstion = 'csv'
    fieldDelimiter = '|'
    codePage = 'utf-8'
    header = true
}

verticaTable('ver.work:table1') {
    readOpts {
        where = 'field1 = CURRENT_DATE'
        order = ['field2', 'field3']
    }
}

etl.copyRows(verticaTable('ver.work:table1'), csv('file1'))

Copying a file via ssh to another server:

files('unload_files', true) {
    rootPath = '/tmp/unload'
}

sftp('ssh1', true) {
    host = 'ssh-host'
    login = 'user'
    password = 'password'
    rootPath = '/csv'
}

fileman.copier(files('unload_files'), sftp('ssh1')) {
    useSourcePath { mask = 'data.{table}.csv' }
}

Name		Name	Last commit message	Last commit date
Latest commit History 1,038 Commits
gradle/wrapper		gradle/wrapper
jdbc		jdbc
libs		libs
manuals		manuals
src		src
tests/csv		tests/csv
.gitignore		.gitignore
LICENSE.RU.md		LICENSE.RU.md
LICENSE.md		LICENSE.md
README.md		README.md
build.gradle		build.gradle
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Getl

About

Links

When do you need Getl?

Supported RDBMS

Supported file sources

Supported cloud sources

Supported file systems

Operations with RDBMS data sources

Operations with file sources

Working with cloud sources

Working with file systems

Getl language operators

Examples

About

Releases 118

Packages

Contributors 2

Languages

License

ascrus/getl

Folders and files

Latest commit

History

Repository files navigation

Getl

About

Links

When do you need Getl?

Supported RDBMS

Supported file sources

Supported cloud sources

Supported file systems

Operations with RDBMS data sources

Operations with file sources

Working with cloud sources

Working with file systems

Getl language operators

Examples

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 118

Packages 0

Contributors 2

Languages

Packages