Presto plugin executor #69
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,66 @@ | ||||||
package client | ||||||
|
||||||
|
||||||
import ( | ||||||
"context" | ||||||
"net/http" | ||||||
"net/url" | ||||||
|
||||||
"github.com/lyft/flyteplugins/go/tasks/plugins/svc" | ||||||
|
||||||
"time" | ||||||
|
||||||
"github.com/lyft/flyteplugins/go/tasks/plugins/presto/config" | ||||||
) | ||||||
|
||||||
const ( | ||||||
httpRequestTimeoutSecs = 30 | ||||||
//AcceptHeaderKey = "Accept" | ||||||
Comment: delete?
Comment: Completely missed this, will do! |
||||||
//ContentTypeHeaderKey = "Content-Type" | ||||||
//ContentTypeJSON = "application/json" | ||||||
//ContentTypeTextPlain = "text/plain" | ||||||
//PrestoCatalogHeader = "X-Presto-Catalog" | ||||||
//PrestoRoutingGroupHeader = "X-Presto-Routing-Group" | ||||||
//PrestoSchemaHeader = "X-Presto-Schema" | ||||||
//PrestoSourceHeader = "X-Presto-Source" | ||||||
//PrestoUserHeader = "X-Presto-User" | ||||||
) | ||||||
|
||||||
type prestoClient struct { | ||||||
|
||||||
client *http.Client | ||||||
environment *url.URL | ||||||
} | ||||||
|
||||||
type PrestoExecuteArgs struct { | ||||||
RoutingGroup string `json:"routing_group,omitempty"` | ||||||
Catalog string `json:"catalog,omitempty"` | ||||||
Schema string `json:"schema,omitempty"` | ||||||
Source string `json:"source,omitempty"` | ||||||
} | ||||||
type PrestoExecuteResponse struct { | ||||||
Comment: nit: add json tags |
||||||
ID string | ||||||
Status svc.CommandStatus | ||||||
NextURI string | ||||||
} | ||||||
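A minimal sketch of what the "add json tags" nit could look like. The tag names (`id`, `status`, `nextUri`) are assumptions chosen for illustration, and `CommandStatus` is a local stand-in for `svc.CommandStatus`, which the diff treats as a string type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CommandStatus is a local stand-in for svc.CommandStatus (assumed to be a string type).
type CommandStatus string

// PrestoExecuteResponse mirrors the struct in the diff; the tag names are hypothetical.
type PrestoExecuteResponse struct {
	ID      string        `json:"id,omitempty"`
	Status  CommandStatus `json:"status,omitempty"`
	NextURI string        `json:"nextUri,omitempty"`
}

// encodeResponse marshals a response so the effect of the tags is visible.
func encodeResponse(r PrestoExecuteResponse) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeResponse(PrestoExecuteResponse{ID: "q1", Status: "QUEUED"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"id":"q1","status":"QUEUED"}
}
```

Without tags, `encoding/json` would emit the Go field names (`ID`, `Status`, `NextURI`) and would not omit empty values.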
|
||||||
func (p *prestoClient) ExecuteCommand( | ||||||
Comment: I don't see a reason to use pointer receivers in this client...
Comment: Interesting. I was following what the Hive client was doing here. I'm not too familiar with things in Go yet but, when would I use a reference vs a value for a method?
Comment: The Hive code should probably be changed. by value here is fine, esp since the underlying http client is already a pointer. |
||||||
ctx context.Context, | ||||||
queryStr string, | ||||||
extraArgs interface{}) (interface{}, error) { | ||||||
|
||||||
return PrestoExecuteResponse{}, nil | ||||||
} | ||||||
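To illustrate the receiver question raised in the thread above, here is a small sketch. The `counter` type and its methods are hypothetical and unrelated to the plugin; the `hits` pointer field plays the role that `*http.Client` plays in `prestoClient`.

```go
package main

import "fmt"

// counter is a hypothetical type used only to contrast receiver kinds.
type counter struct {
	hits  *int // shared state behind a pointer, like *http.Client in the diff
	label string
}

// Record uses a value receiver: the struct is copied, but the copied
// pointer still refers to the same int, so the increment is visible.
func (c counter) Record() { *c.hits++ }

// RenameByValue changes only the copy; the caller's label is untouched.
func (c counter) RenameByValue(l string) { c.label = l }

// RenameByPointer mutates the caller's struct through the pointer receiver.
func (c *counter) RenameByPointer(l string) { c.label = l }

func main() {
	n := 0
	c := counter{hits: &n, label: "a"}
	c.Record()
	c.RenameByValue("b")
	fmt.Println(n, c.label) // 1 a
	c.RenameByPointer("c")
	fmt.Println(n, c.label) // 1 c
}
```

A value receiver is enough when the method does not need to modify the struct's own fields and the struct is cheap to copy, which appears to be the case for a client holding only two pointers.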
|
||||||
func (p *prestoClient) KillCommand(ctx context.Context, commandID string) error { | ||||||
return nil | ||||||
} | ||||||
|
||||||
func (p *prestoClient) GetCommandStatus(ctx context.Context, commandID string) (svc.CommandStatus, error) { | ||||||
return NewPrestoStatus(ctx, "UNKNOWN"), nil | ||||||
} | ||||||
|
||||||
func NewPrestoClient(cfg *config.Config) svc.ServiceClient { | ||||||
Comment: I know we are not consistent on this but the more idiomatic way in go to using interfaces suggests we should return the actual type here not an interface. which means your type should also be public.
Comment: Oh, that's interesting. Normally from my experience in other langs (java, scala, rust, etc), you typically tie yourself to an interface instead of a specific implementation. In our case, I feel this will help when trying to swap out the Mozart client for one based on just Presto. Otherwise, won't we need to make a copy of the calling code (ie. the executor) in the private repo? In general, I think this might minimize the number of code changes we will need to make in the long run
Comment: Well, interfaces in Go are kind of the inverse in Java. I do not understand what do you mean by copying calling code?
Comment: So given that I made the client specific to Presto, my original comment is largely void now. What I was trying to say is that if we made the function return a specific type:
Then in the calling code that expects this client, we would also need to specifically have it accept a
Normally, I think this would be ok. But what I was trying to say (perhaps in a not so clear manner), is that if one day we wanted to abstract the client to be a more generic |
||||||
return &prestoClient{ | ||||||
client: &http.Client{Timeout: httpRequestTimeoutSecs * time.Second}, | ||||||
environment: cfg.Environment.ResolveReference(&cfg.Environment.URL), | ||||||
} | ||||||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
package client | ||
|
||
import ( | ||
"context" | ||
"strings" | ||
|
||
"github.com/lyft/flyteplugins/go/tasks/plugins/svc" | ||
"github.com/lyft/flytestdlib/logger" | ||
) | ||
|
||
// This type is meant only to encapsulate the response coming from Presto as a type, it is | ||
// not meant to be stored locally. | ||
const ( | ||
PrestoStatusUnknown svc.CommandStatus = "UNKNOWN" | ||
Comment: I tried this but ran into a problem:
I think this might be why the
Comment: Here is an example that works: I needed to implement UnmarshalJSON()...
Comment: @EngHabu Sorry, I'm still a little confused as I think this example you posted uses:
where the type is
Comment: Does it need to be a string though?
Comment: Ok, this is not a big deal... if it doesn't seem obvious to change, that's fine... |
||
PrestoStatusQueued svc.CommandStatus = "QUEUED" | ||
PrestoStatusRunning svc.CommandStatus = "RUNNING" | ||
PrestoStatusFinished svc.CommandStatus = "FINISHED" | ||
PrestoStatusFailed svc.CommandStatus = "FAILED" | ||
PrestoStatusCancelled svc.CommandStatus = "CANCELLED" | ||
) | ||
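The thread above mentions implementing `UnmarshalJSON()`. The reviewer's linked example is not preserved here, so the following is only a sketch of one way to do it for a string-backed status type. `CommandStatus` is a local stand-in for `svc.CommandStatus`, and `response` and `decodeStatus` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// CommandStatus is a local stand-in for svc.CommandStatus.
type CommandStatus string

const (
	StatusUnknown   CommandStatus = "UNKNOWN"
	StatusQueued    CommandStatus = "QUEUED"
	StatusRunning   CommandStatus = "RUNNING"
	StatusFinished  CommandStatus = "FINISHED"
	StatusFailed    CommandStatus = "FAILED"
	StatusCancelled CommandStatus = "CANCELLED"
)

// UnmarshalJSON decodes a JSON string, upper-cases it, and falls back to
// StatusUnknown for values that are not one of the known statuses.
func (s *CommandStatus) UnmarshalJSON(data []byte) error {
	var raw string
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	switch v := CommandStatus(strings.ToUpper(raw)); v {
	case StatusQueued, StatusRunning, StatusFinished, StatusFailed, StatusCancelled:
		*s = v
	default:
		*s = StatusUnknown
	}
	return nil
}

// response is a hypothetical payload carrying a status field.
type response struct {
	Status CommandStatus `json:"status"`
}

// decodeStatus parses a payload and returns the validated status.
func decodeStatus(payload string) (CommandStatus, error) {
	var r response
	if err := json.Unmarshal([]byte(payload), &r); err != nil {
		return "", err
	}
	return r.Status, nil
}

func main() {
	s, err := decodeStatus(`{"status":"running"}`)
	fmt.Println(s, err) // RUNNING <nil>
}
```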
|
||
var PrestoStatuses = map[svc.CommandStatus]struct{}{ | ||
Comment: You can use
Comment: When I do this:
I get the error messages about:
Am I doing this correctly? |
||
PrestoStatusUnknown: {}, | ||
PrestoStatusQueued: {}, | ||
PrestoStatusRunning: {}, | ||
PrestoStatusFinished: {}, | ||
PrestoStatusFailed: {}, | ||
PrestoStatusCancelled: {}, | ||
} | ||
|
||
func NewPrestoStatus(ctx context.Context, state string) svc.CommandStatus { | ||
upperCased := strings.ToUpper(state) | ||
if strings.Contains(upperCased, "FAILED") { | ||
Comment: why is failed singled out?
Comment: Not entirely sure, I sort of copied this from the existing qubole_status. Probably not necessary to single this out.
Comment: Actually, I think it was needed because Hive/Qubole has different error/failure modes and I think this maps them all to a single Failure/Error on the Flyte side. Going to leave this in for now
Comment: Add this as a comment? |
||
return PrestoStatusFailed | ||
} else if _, ok := PrestoStatuses[svc.CommandStatus(upperCased)]; ok { | ||
return svc.CommandStatus(upperCased) | ||
} else { | ||
logger.Warnf(ctx, "Invalid Presto Status found: %v", state) | ||
return PrestoStatusUnknown | ||
} | ||
} |
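A standalone sketch of the normalization logic in this file, with the logger and `svc` dependencies removed so it runs on its own. `CommandStatus`, `knownStatuses`, and `normalizeStatus` are stand-in names; the comment about failure modes restates the explanation given in the thread and is not verified against Presto.

```go
package main

import (
	"fmt"
	"strings"
)

// CommandStatus is a local stand-in for svc.CommandStatus.
type CommandStatus string

const (
	StatusUnknown   CommandStatus = "UNKNOWN"
	StatusQueued    CommandStatus = "QUEUED"
	StatusRunning   CommandStatus = "RUNNING"
	StatusFinished  CommandStatus = "FINISHED"
	StatusFailed    CommandStatus = "FAILED"
	StatusCancelled CommandStatus = "CANCELLED"
)

// knownStatuses uses map[T]struct{} as a set: the empty struct takes no space.
var knownStatuses = map[CommandStatus]struct{}{
	StatusUnknown:   {},
	StatusQueued:    {},
	StatusRunning:   {},
	StatusFinished:  {},
	StatusFailed:    {},
	StatusCancelled: {},
}

// normalizeStatus maps a raw state string to one of the known statuses.
func normalizeStatus(state string) CommandStatus {
	upper := strings.ToUpper(state)
	// Per the review thread, several distinct failure modes are collapsed
	// into a single FAILED status, hence the substring match.
	if strings.Contains(upper, "FAILED") {
		return StatusFailed
	}
	if _, ok := knownStatuses[CommandStatus(upper)]; ok {
		return CommandStatus(upper)
	}
	return StatusUnknown
}

func main() {
	fmt.Println(normalizeStatus("running"), normalizeStatus("USER_FAILED"), normalizeStatus("bogus"))
	// RUNNING FAILED UNKNOWN
}
```

The early returns replace the `if / else if / else` chain in the diff; the behavior is the same apart from the omitted warning log.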
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,67 @@ | ||||||
package config | ||||||
|
||||||
//go:generate pflags Config --default-var=defaultConfig | ||||||
|
||||||
import ( | ||||||
"context" | ||||||
"net/url" | ||||||
|
||||||
"github.com/lyft/flytestdlib/config" | ||||||
"github.com/lyft/flytestdlib/logger" | ||||||
|
||||||
pluginsConfig "github.com/lyft/flyteplugins/go/tasks/config" | ||||||
) | ||||||
|
||||||
const prestoConfigSectionKey = "presto" | ||||||
|
||||||
func URLMustParse(s string) config.URL { | ||||||
Comment: lower case?
Comment: I made it public because I call the method in one of the unit test
Comment: I like URL. |
||||||
r, err := url.Parse(s) | ||||||
if err != nil { | ||||||
logger.Panicf(context.TODO(), "Bad Presto URL Specified as default, error: %s", err) | ||||||
} | ||||||
if r == nil { | ||||||
logger.Panicf(context.TODO(), "Nil Presto URL specified.") | ||||||
} | ||||||
return config.URL{URL: *r} | ||||||
} | ||||||
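A stdlib-only sketch of the same "must parse" pattern, assuming a hypothetical `mustParseURL` that returns a plain `url.URL` instead of `config.URL`. When `url.Parse` returns a nil error it also returns a non-nil `*url.URL`, so the sketch checks only the error.

```go
package main

import (
	"fmt"
	"net/url"
)

// mustParseURL parses s and panics if it is not a valid URL.
// It is intended for hard-coded defaults where failure is a programming error.
func mustParseURL(s string) url.URL {
	u, err := url.Parse(s)
	if err != nil {
		panic(fmt.Sprintf("bad URL %q: %v", s, err))
	}
	return *u
}

func main() {
	// presto.example.com is a placeholder host, not a real endpoint.
	u := mustParseURL("https://presto.example.com:443")
	fmt.Println(u.Scheme, u.Host) // https presto.example.com:443
}
```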
|
||||||
type RoutingGroupConfig struct { | ||||||
Name string `json:"primaryLabel" pflag:",The name of a given Presto routing group"` | ||||||
Limit int `json:"limit" pflag:",Resource quota (in the number of outstanding requests) of the routing group"` | ||||||
ProjectScopeQuotaProportionCap float64 `json:"projectScopeQuotaProportionCap" pflag:",A floating point number between 0 and 1, specifying the maximum proportion of quotas allowed to allocate to a project in the routing group"` | ||||||
NamespaceScopeQuotaProportionCap float64 `json:"namespaceScopeQuotaProportionCap" pflag:",A floating point number between 0 and 1, specifying the maximum proportion of quotas allowed to allocate to a namespace in the routing group"` | ||||||
} | ||||||
|
||||||
var ( | ||||||
defaultConfig = Config{ | ||||||
Environment: URLMustParse("https://prestoproxy-internal.lyft.net:443"), | ||||||
Comment: Let's try not to put internal URLs in the public repo
Comment: @EngHabu I had a larger question with regards to configs, namely, where would I create the actual config file for the Presto plugin? I couldn't find any in this repo
Comment: Here: https://github.com/lyft/flytepropeller-private/tree/master/artifacts/overlays/production/propeller/plugins. (i think a private link is okay).
Comment: Thanks! I'll create separate PRs to add configs to these repos |
||||||
DefaultRoutingGroup: "adhoc", | ||||||
Workers: 15, | ||||||
LruCacheSize: 2000, | ||||||
AwsS3ShardFormatter: "s3://lyft-modelbuilder/{}/", | ||||||
Comment: Let's leave this out too
Comment: Leave it out, but we should put in canonical examples that work with the sandbox (docker desktop/minikube) deployment. |
||||||
AwsS3ShardCount: 2, | ||||||
RoutingGroupConfigs: []RoutingGroupConfig{{Name: "adhoc", Limit: 250}}, | ||||||
} | ||||||
|
||||||
prestoConfigSection = pluginsConfig.MustRegisterSubSection(prestoConfigSectionKey, &defaultConfig) | ||||||
) | ||||||
|
||||||
// Presto plugin configs | ||||||
type Config struct { | ||||||
Environment config.URL `json:"endpoint" pflag:",Endpoint for Presto to use"` | ||||||
DefaultRoutingGroup string `json:"defaultRoutingGroup" pflag:",Default Presto routing group"` | ||||||
Workers int `json:"workers" pflag:",Number of parallel workers to refresh the cache"` | ||||||
LruCacheSize int `json:"lruCacheSize" pflag:",Size of the AutoRefreshCache"` | ||||||
AwsS3ShardFormatter string `json:"awsS3ShardFormatter" pflag:", S3 bucket prefix where Presto results will be stored"` | ||||||
Comment: Are we limited to S3? Flyte in general is storage agnostic
Comment: In this case, we would be limited by whatever Presto supports in terms of external tables. I'm not very familiar with Flyte's generic storage model. Do you have any ideas of how I might abstract this? Currently, this logic of generating random paths is handled by flytekit but I'm not sure if the logic there is very generic as it appeared to be tied to S3.
Comment: It looks like you can. |
||||||
AwsS3ShardCount int `json:"awsS3ShardStringLength" pflag:", Number of characters for the S3 bucket shard prefix"` | ||||||
RoutingGroupConfigs []RoutingGroupConfig `json:"clusterConfigs" pflag:"-,A list of cluster configs. Each of the configs corresponds to a service cluster"` | ||||||
} | ||||||
|
||||||
// Retrieves the current config value or default. | ||||||
func GetPrestoConfig() *Config { | ||||||
return prestoConfigSection.GetConfig().(*Config) | ||||||
} | ||||||
|
||||||
func SetPrestoConfig(cfg *Config) error { | ||||||
return prestoConfigSection.SetConfig(cfg) | ||||||
} |
This is just a noop Presto client for the open-source version. There will be a separate commit for one based on Mozart