SRE Application (sre)

The Sre tool combines information from the Hive Metastore (HMS) RDBMS and HDFS to produce reports and suggested actions for areas of concern. The process is READ-ONLY and does not perform any actions automatically.

Action commands for identified scenarios are written to file(s) that can be reviewed, edited, and then run through "beeline" for Hive actions or Hadoop-CLI for HDFS commands.

This process is driven by a control file. A template is here. Make a copy, edit the required parameters, and reference the copy with the '-cfg' parameter when running the process.
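As a rough illustration of that workflow, the sketch below copies a template into place and prints the command line that would run all sre reports against it. The function names, the template path, and the output directory are hypothetical; only the `sre` subcommand and the `-all`, `-cfg`, and `-o` options come from the help text in this document.

```shell
#!/bin/sh
# Sketch only: prepare_config and sre_command are hypothetical helper names.

# Copy the template to the target config path unless a config already exists,
# so an edited config is never overwritten.
prepare_config() {
    template=$1
    cfg_file=$2
    if [ ! -f "$cfg_file" ]; then
        mkdir -p "$(dirname "$cfg_file")"
        cp "$template" "$cfg_file"
    fi
}

# Print (not run) the hive-sre command for the given config and output directory.
sre_command() {
    cfg_file=$1
    out_dir=$2
    printf 'hive-sre sre -all -cfg %s -o %s\n' "$cfg_file" "$out_dir"
}
```

For instance, `prepare_config ./sre-template.yaml "$HOME/.hive-sre/cfg/my-cluster.yaml"` followed by `sre_command "$HOME/.hive-sre/cfg/my-cluster.yaml" ./sre-output` prints a command that can be reviewed before it is run; both file names here are placeholders.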

Known Issues

For a while during the evolution of Hive 3, there was a separate 'catalog' for Spark. So that this tool can be used across multiple versions of Hive, its queries do NOT include criteria on the 'catalog'. As a result, they may yield cross products in some areas if the 'spark' catalog was used at any point.

Assumptions

  • Run from a node on the cluster
    • The node includes clients and configuration files for HDFS
  • Run by a user that has READ privileges on all HDFS directories.
  • If the cluster is 'kerberized', the user has a valid Kerberos ticket before starting the application.
  • Drivers for the HMS database are available.
  • The configuration file has been defined.
  • The HMS database is on an RDBMS supported for the platform (version matters!)
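Some of these assumptions can be checked before launching the tool. The following is a minimal pre-flight sketch, not part of hive-sre itself: all function names are hypothetical, it assumes the HDFS client is invoked as `hdfs`, and it relies on `klist -s`, which reports through its exit status whether a valid ticket is cached. It does not verify HDFS read privileges, JDBC drivers, or the RDBMS version.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of hive-sre.

# Succeeds if the named client program is on the PATH.
have_client() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if the config file exists and is readable.
config_ready() {
    [ -r "$1" ]
}

# Succeeds if a valid Kerberos ticket is cached (klist -s is silent).
have_ticket() {
    klist -s 2>/dev/null
}

# Run the checks; the second argument is "yes" for a kerberized cluster.
preflight() {
    cfg_file=$1
    kerberized=$2
    status=0
    have_client hdfs || { echo "hdfs client not found on PATH"; status=1; }
    config_ready "$cfg_file" || { echo "config file not readable: $cfg_file"; status=1; }
    if [ "$kerberized" = "yes" ]; then
        have_ticket || { echo "no valid Kerberos ticket; run kinit first"; status=1; }
    fi
    return "$status"
}
```

Calling `preflight "$HOME/.hive-sre/cfg/default.yaml" yes` returns non-zero and prints one line per failed check.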

Application Help

Launching: sre
usage: hive-sre u3|sre|perf -cdh|-hdp2|-hdp3|-all|-i <proc[,proc...]> -o <output-dir> [options]
                version:2.4.0.24.0-SNAPSHOT
Hive SRE Utility
 -all,--all-reports                            Run ALL available processes.
 -cdh,--cloudera-data-hub                      Run processes that make sense for CDH.
 -cfg,--config <arg>                           Config with details for the Sre Job.  Must match the
                                               either sre or u3 selection. Default:
                                               $HOME/.hive-sre/cfg/default.yaml
 -db,--database <arg>                          Comma separated list of Databases.  Will override
                                               config. (upto 100)
 -dbRegEx,--database-regex <arg>               A RegEx of databases to process
 -dp,--decrypt-password <encrypted-password>   Used this in conjunction with '-pkey' to decrypt the
                                               generated passcode from `-p`.
 -edbRegEx,--exclude-database-regex <arg>      A RegEx that will filter OUT matching databases from
                                               processing
 -h,--help                                     Help
 -hdp2,--hortonworks-data-platfrom-v2          Run processes that make sense for HDP2.
 -hdp3,--hortonworks-data-platfrom-v3          Run processes that make sense for HDP3.
 -hfw,--hive-framework <arg>                   The custom HiveFramework check configuration. Needs
                                               to be in the 'Classpath'.
 -i,--include <arg>                            Comma separated list of process id's to run.  When
                                               not specified, ALL processes are run.
 -o,--output-dir <arg>                         Output Directory to save results from Sre.
 -p,--password <password>                      Used this in conjunction with '-pkey' to generate the
                                               encrypted password that you'll add to the configs for
                                               the JDBC connections.
 -pkey,--password-key <password-key>           The key used to encrypt / decrypt the cluster jdbc
                                               passwords.  If not present, the passwords will be
                                               processed as is (clear text) from the config file.
 -scc,--skip-command-checks                    Don't process the command checks for the process.
 -tsql,--test-sql                              Check SQL against target Metastore RDBMS

Visit https://github.com/cloudera-labs/hive-sre for detailed docs

The -db parameter is optional. When specified, it limits the search to the listed databases. For example: -db my_db,test_db

The -o parameter is required.

To limit which processes run, use the -i (include) option on the command line with a comma-separated list of the desired process ids (listed below).
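The options above can be combined in a single invocation. The sketch below builds the comma-separated lists and prints the resulting command rather than running it. The helper names and the output directory are hypothetical; the database names reuse the example above, and the process ids are taken from the table in the next section.

```shell
#!/bin/sh
# Sketch only: join_csv and sre_limited_command are hypothetical helper names.

# Join all arguments with commas, e.g. "join_csv a b c" prints "a,b,c".
# IFS is changed inside a subshell so the caller's IFS is untouched.
join_csv() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Print (not run) a hive-sre command limited to given processes and databases.
sre_limited_command() {
    out_dir=$1
    dbs=$2
    procs=$3
    printf 'hive-sre sre -i %s -db %s -o %s\n' "$procs" "$dbs" "$out_dir"
}

# Example: only the three table and partition scans, for two databases.
sre_limited_command ./sre-output "$(join_csv my_db test_db)" "$(join_csv 3 4 5)"
```

Printing the command first makes it easy to confirm the database and process lists before launching a long-running scan.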

sre Processes

id (link to sample report) | process
1 | Hive Metastore Summary - Numerous HMS reports outlining summary information about databases and tables
2 | Hive Metastore Details - Numerous HMS reports outlining detailed information about databases and tables
3 | Table and Partition Scan - Small Files
4 | Table and Partition Scan - Volume Report
5 | Table and Partition Scan - Empty Datasets
6 | Table and Partition Compactions
8 | Analyze Tables (beta - use -i 8 to activate)
9 | Analyze Tables - Detailed (beta - use -i 9 to activate)
10 | Hive Table Type Base Location UNIQUE Count
11 | Hive Table Type Base Location Count Detail - Anti Pattern

Sre must be run by a user with READ access to every HDFS location referenced by the database, table, and partition definitions.