Switch to Network Load Balancing; allow access to the Metastore from workers #2

Merged 2 commits on Mar 17, 2020

@ndabas (Contributor) commented Mar 17, 2020

This PR replaces the Application Load Balancer in front of the coordinator with a Network Load Balancer. It also opens port 9083 on the coordinator, through the NLB, to allow access to the Hive Metastore over the Thrift protocol. An ALB only handles HTTP and HTTPS, so Thrift traffic needs a TCP-capable load balancer. This access is needed when workers query the Metastore, for example when inserting data into a partitioned table.
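As a rough illustration of the change described above, an NLB with a TCP listener on port 9083 might be declared along these lines. This is only a sketch, not the diff from this PR: the resource names, the `vpc_id` and `subnet_ids` variables, and the `internal = true` setting are all assumptions made for the example.

```hcl
# Hypothetical inputs; the real module's variable names are not shown in this PR.
variable "vpc_id" { type = string }
variable "subnet_ids" { type = list(string) }

# Network Load Balancer in front of the coordinator (takes the place of the ALB).
resource "aws_lb" "coordinator" {
  name               = "coordinator-nlb" # assumed name
  load_balancer_type = "network"
  internal           = true              # assumption: reachable only inside the VPC
  subnets            = var.subnet_ids
}

# TCP target group for the Hive Metastore Thrift port on the coordinator.
resource "aws_lb_target_group" "metastore" {
  name     = "coordinator-metastore" # assumed name
  port     = 9083
  protocol = "TCP"
  vpc_id   = var.vpc_id
}

# Listener that forwards port 9083 on the NLB to the Metastore target group.
resource "aws_lb_listener" "metastore" {
  load_balancer_arn = aws_lb.coordinator.arn
  port              = 9083
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.metastore.arn
  }
}
```

The sketch leaves out how the coordinator instance is registered with the target group, and the coordinator's security group would presumably also need to admit port 9083 from the workers.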

The NLB target group for the coordinator has some changes:

  • A stickiness block was added to work around what appears to be a bug in the underlying Terraform AWS provider: when a health check is configured, the provider generates a default stickiness block, and that generated block is invalid for a TCP NLB.
  • timeout and matcher were removed from the health check because AWS does not allow these properties to be set when configuring a health check for a TCP NLB. AWS will use the default values of timeout = 6 and matcher = "200-399" instead, which I believe are fine.
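The two target group changes above could look roughly like the following. Treat it as a sketch under stated assumptions rather than the actual configuration: the resource name, port 8080, the `/v1/info` health check path, and the threshold values are illustrative, and the stickiness values are ones commonly used as a workaround with provider versions from around the time of this PR. Newer provider versions may expect different values.

```hcl
# Hypothetical input; not taken from this PR.
variable "vpc_id" { type = string }

resource "aws_lb_target_group" "coordinator" {
  name     = "coordinator-http" # assumed name
  port     = 8080               # assumption: coordinator HTTP port
  protocol = "TCP"
  vpc_id   = var.vpc_id

  # Explicit, disabled stickiness block so the provider does not generate
  # a default one that is invalid for a TCP target group.
  stickiness {
    enabled = false
    type    = "lb_cookie"
  }

  health_check {
    protocol            = "HTTP"
    path                = "/v1/info" # assumption: Presto info endpoint
    interval            = 30
    healthy_threshold   = 3          # illustrative; kept equal to unhealthy_threshold
    unhealthy_threshold = 3
    # timeout and matcher are intentionally omitted; AWS applies its
    # defaults (timeout = 6, matcher = "200-399") for this kind of check.
  }
}
```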

Note that the new port takes a few minutes to open after deployment, because the health checks have to initialize first.

@ndabas requested a review from @levand on March 17, 2020 10:35

@ndabas (Contributor, Author) commented Mar 17, 2020

Here is a minimal test case that fails before this fix and works with it. Before running it, update external_location in the table definition below to an actual bucket path that Scio can access.

create schema if not exists hive.ndabas_temp;

drop table if exists hive.ndabas_temp.partition_test;

create table if not exists hive.ndabas_temp.partition_test (
   postal_code varchar,
   country varchar
)
with (
   external_location = 's3a://data.your.domain.name/ndabas_temp/partition_test',
   format = 'ORC',
   partitioned_by = ARRAY['country']
);

call system.create_empty_partition(
  schema_name => 'ndabas_temp',
  table_name => 'partition_test',
  partition_columns => ARRAY['country'],
  partition_values => ARRAY['US']);

call system.create_empty_partition(
  schema_name => 'ndabas_temp',
  table_name => 'partition_test',
  partition_columns => ARRAY['country'],
  partition_values => ARRAY['IN']);

insert into hive.ndabas_temp.partition_test values
('10001', 'US'),
('10002', 'US'),
('90001', 'US'),
('90002', 'US'),
('110001', 'IN'),
('110002', 'IN'),
('110003', 'IN');

call system.sync_partition_metadata(
  schema_name => 'ndabas_temp',
  table_name => 'partition_test',
  mode => 'FULL');

select * from hive.ndabas_temp.partition_test;

drop table if exists hive.ndabas_temp.partition_test;
drop schema if exists hive.ndabas_temp;

@levand merged commit 9dfdd57 into master on Mar 17, 2020
@ndabas deleted the ndabas/alb-to-nlb branch on March 17, 2020 20:30