-
Notifications
You must be signed in to change notification settings - Fork 676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-14673: Add bin/solr stream CLI #2479
Conversation
… to modern SolrCLI
So much easier without having to add more bash scripts now!
…eters for headers, delimiters etc...
A few high-level questions/concerns:
|
Thanks for sharing the feedback @gerlowskija ! I think the value of the tool is only there if your second comment about being able to run a streaming expression locally is valid, and then having it do what yoru first comment highlights falls out easy, otherwise it really is a thin wrapper/duplication of the I do believe the second part is the really cool thing, that I can run a streaming expression locally and use it to process some data. We clearly need some way of specifying where the processing is happening, in the cluster or locally. I was trying to think if we have any other places in Solr where we define "Where am I doing work" that might provide a name for a parameter. Reading through docs more, we have the I have found that lots of streaming expressions don't require a Solr connection, especially during development. I'm just iterating on the logic, and I'm starting and ending iwth tuples.. it's only later when I get the mappings etc working that I then move to adding in my Also, as far as docs go, we have a LONG way to go in Streaming expressions. It's both the best docuemnted code, with all the howtos and guides, but also, I find a million expressions that exist but don't show up in our reference docs ;-). |
I went with the plural name --workers solr, and then you pass in a collection. However, I could imagine that this becomes --workers my_collection,films,worker_collection on your local solr... Not quite sure what passing more then one in means however...
@gerlowskija since you provided some early review, do you think the docs I've added etc are enough that I can merge this in? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for those added docs @epugh . They're a huge help, and I suggested a few tweaks that might help even more.
One remaining question I have - wdyt about marking the tool syntax as "experimental" in some way? Seeing all the hard work you've put into improving syntax on the other tools, and considering that we might not notice some rough edges to the syntax of this tool until it's out in the wild a bit...might be prudent to give this script equivalent of "@lucene.experimental" so that we wouldn't need to worry about backcompat if we want to make any future tweaks?
The Stream tool allows you to run a xref:streaming-expressions.adoc[] and see the results from the command line. | ||
It is very similar to the xref:stream-screen.adoc[], but is part of the `bin/solr` CLI. | ||
|
||
To run it, open a window and enter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[0] "window" -> "terminal" or maybe "shell"?
-u,--credentials <credentials> Credentials in the format username:password. Example: --credentials solr:SolrRocks | ||
-url,--solr-url <HOST> Base Solr URL, which can be used to determine the zk-host if that's not known; | ||
defaults to: http://localhost:8983. | ||
-e,--execution <CONTEXT> Execution context is either 'local' or 'solr'. Default is 'solr' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[0] I don't love this terminology, but I don't have anything better in mind (yet). "Local" to me could be misconstrued by folks running 'bin/solr' on a box that also happens to have Solr running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on:
Execution context is either 'local' (i.e CLI process) or 'solr'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noodled on this a bit but still couldn't come up with anything I liked better, so maybe you picked the best option. +1 for the current enum values with the "(i.e. CLI process)" clarification you suggested.
|
||
Caveats: | ||
|
||
* You don't get to use any of the parallelization support that is available when you run the expression on the cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] Is this only a limitation of --execution=local
is specified?
Hello World | ||
---- | ||
|
||
This also works with a `.expr` files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[0] "also works with a .expr
files." -> "also works when using .expr
files."
) | ||
---- | ||
|
||
Running this expression will read in the local file and send the first two lines to the collection `gettingstarted`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[0] Might be worth hitting this point a little more strongly: even "local" processing is likely to reach out to a remote host.
Maybe something like:
All streaming expressions are processed "locally" if that execution mode is selected.
However, "local" processing does not imply a networking sandbox.
Many streaming expressions, such assearch
andupdate
, will make network requests to remote Solr nodes if configured to do so, even in "local" execution mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I took a stab at it... If I didn't quite nail it, please edit it but with myabe the suggestion thing?
== Using the bin/solr stream Tool | ||
|
||
NOTE: The Stream Tool is classified as "experimental". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth putting this at the top of the page. Seems a little "buried" as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Left one last comment about the "experimental" designation, but feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try the stream command next, but here are a few points to look into while I'm trying stream expressions out. :)
LineNumberReader bufferedReader = null; | ||
String expr; | ||
try { | ||
Reader inputStream = | ||
expressionArgument.toLowerCase(Locale.ROOT).endsWith("expr") | ||
? new InputStreamReader( | ||
new FileInputStream(expressionArgument), Charset.defaultCharset()) | ||
: new StringReader(expressionArgument); | ||
|
||
bufferedReader = new LineNumberReader(inputStream); | ||
expr = StreamTool.readExpression(bufferedReader, cli.getArgs()); | ||
echoIfVerbose("Running Expression: " + expr); | ||
} finally { | ||
if (bufferedReader != null) { | ||
bufferedReader.close(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot happening in the runImpl
, maybe extracting the two blocks into separate private methods would keep a better overview? (this is block no 1)
public static String[] getOutputFields(CommandLine cli) { | ||
if (cli.hasOption("fields")) { | ||
|
||
String fl = cli.getOptionValue("fields"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use option object instead for consistency.
|
||
@SuppressWarnings({"rawtypes"}) | ||
public Map getLetParams() { | ||
return this.letParams; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has no usage, is it needed? And if so, should we not simply add the types to Map<String, Object>
rather than suppressing them?
A few more notes:
|
@gerlowskija @malliaridis What do you think about using the word "environment" versus "context" for the option you pass for execution? I.e, your environment is either local or solr... Context to me means more like "this is athing that you set up that has lots of variables".... "enviroment" is where you run it...? |
Leaning towards "execution" and either "local" or "remote". |
@malliaridis in reference to your question, yeah -z never has a scheme, but -s always does. Why exactly, not totally sure.. I guess that ZK communciates over same port regardless of ssl or not? |
https://issues.apache.org/jira/browse/SOLR-14673
Description
Bring in code that @joel-bernstein wrote, but using the SolrCLI infrastructure. The original code is a patch in the associated JIRA.
Solution
Another CLI client ;-)
Tests
Copied over the basic tests from the patch. I still need to write an integration style test and ideally one that exercies the basic auth.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.