feat: add string functions in a string module #3913
let ltrim = column -> <text> internal std.ltrim
let rtrim = column -> <text> internal std.rtrim
let trim = column -> <text> internal std.trim
let length = column -> <int> internal std.length
Fine for now, but eventually we probably want `length` within a `string` namespace, given how general the term is.
Yes, I'm planning to first open a PR to add the `math` module, and for this one I plan to add a `str` module.
Would that be OK?
So we would have `derive { x | str.lower | str.endswith "foo" }`
A module `str` has been added. Do you prefer `string`?
Perfect re a `str` module!
No particularly strong view on `str` vs `string`. Possibly we should pick the one that is not the name of the type.
I'd also previously thought that "string" was a bit too "engineer-y", and most folks think a string is something used for tying. So we currently have `from_text`, for example. But I'm not sure we're going to win that battle! So I'm fine with any of those: `str` / `string` / `text`.
Any thoughts from others?
It seems like most languages prefer `string` or `String` over `str` and `String` or `text`.
- C# uses `string.Format`
- D uses `std.string.isNumeric("123")`
- Elixir uses `String.first("elixir")`
- F# uses `String.length "Hello"`
- Go uses `strings.ToLower("Hello")`
- Java uses `String.format`
- JavaScript uses `String.fromCharCode(65)`
- Kotlin uses `String.format("Hello")`
- Lua uses `string.lower("Hello")`
- PowerShell uses `[string]::Format("Hello")`
- Python uses `str.format`
- Ruby uses `String::try_convert('Hello')`
- Rust uses `String::from("Hello, world!");`
- Scala uses `String.format("Hello")`
- VB.NET uses `String.Format`
#[case::generic(sql::Dialect::Generic, "LIKE CONCAT('%', 'pika', '%')")]
#[case::sqlite(sql::Dialect::SQLite, "LIKE '%' || 'pika' || '%'")] // `CONCAT` is not supported in SQLite
Interesting. I had heard of rstest but haven't used it before. (I use pytest a lot, which I think is similar)
Happy to try this! (Maybe @aljazerzen has thoughts?)
(Alternatively, we could have a normal loop and supply different names for the snapshot name.)
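For comparison, here is a minimal sketch of the "normal loop" alternative mentioned above, using only the standard library. The `Dialect` enum, `like_contains`, `cases`, and `failing_cases` are hypothetical names made up for illustration, not the compiler's actual API, and the quote-doubling is an assumption about how a literal would be escaped. The expected strings are the two from the diff.

```rust
/// Hypothetical stand-in for the compiler's dialect enum; only two variants are modelled.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Dialect {
    Generic,
    SQLite,
}

/// Builds the right-hand side of a "contains" match for a string literal.
/// As the diff comment notes, `CONCAT` is avoided for SQLite in favour of `||`.
fn like_contains(dialect: Dialect, needle: &str) -> String {
    // Assumption: single quotes are doubled so the needle stays a valid SQL literal.
    let lit = needle.replace('\'', "''");
    match dialect {
        Dialect::Generic => format!("LIKE CONCAT('%', '{lit}', '%')"),
        Dialect::SQLite => format!("LIKE '%' || '{lit}' || '%'"),
    }
}

/// Table of (case name, dialect, expected SQL), mirroring the two cases in the diff.
fn cases() -> Vec<(&'static str, Dialect, &'static str)> {
    vec![
        ("generic", Dialect::Generic, "LIKE CONCAT('%', 'pika', '%')"),
        ("sqlite", Dialect::SQLite, "LIKE '%' || 'pika' || '%'"),
    ]
}

/// Runs every case in a plain loop and returns the names of the failing ones.
fn failing_cases() -> Vec<&'static str> {
    cases()
        .into_iter()
        .filter(|(_, dialect, expected)| like_contains(*dialect, "pika") != *expected)
        .map(|(name, _, _)| name)
        .collect()
}
```

The case name in each tuple plays the role that `case::generic` / `case::sqlite` play in the attribute form: it is what a loop-based test would have to pass along by hand as the snapshot name.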
I'm a huge fan of pytest and the parametrize + fixtures mechanisms.
And I'm a huge fan of rstest for the exact same reasons. It is quite similar, and I feel it would be of great value for covering many dialects, since PRQL will probably need more tests to ensure compatibility across all the supported DBs. At my company we support 5 SQL databases/DWs and everything is parametrized on that.
But I can definitely remove that if you prefer
Very happy to try it for a while at least!
I'm +1 for rstest - on the macro-expansion level it does basically the same thing as `for_each_file!`, but offers much more flexibility.
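To illustrate what "the same thing on the macro-expansion level" can mean, here is a rough sketch of a declarative macro that expands a list of named cases into one function per case. It is not rstest's or `for_each_file!`'s actual expansion; `like_cases!` and `like_contains` are hypothetical names, and a boolean stands in for the dialect to keep the sketch short.

```rust
/// Hypothetical helper: `use_concat` stands in for the dialect choice.
fn like_contains(use_concat: bool, needle: &str) -> String {
    if use_concat {
        format!("LIKE CONCAT('%', '{needle}', '%')")
    } else {
        format!("LIKE '%' || '{needle}' || '%'")
    }
}

/// Hypothetical macro: each `name => (flag, expected)` pair becomes its own function.
macro_rules! like_cases {
    ($($name:ident => ($use_concat:expr, $expected:expr)),+ $(,)?) => {
        $(
            // In a real test suite this generated function would carry `#[test]`
            // and assert instead of returning a bool.
            fn $name() -> bool {
                like_contains($use_concat, "pika") == $expected
            }
        )+
    };
}

// Expands to two independent functions, `generic` and `sqlite`.
like_cases! {
    generic => (true, "LIKE CONCAT('%', 'pika', '%')"),
    sqlite => (false, "LIKE '%' || 'pika' || '%'"),
}
```

Because every case becomes a separately named function, a failure can be reported per case, which a single loop-based test does not give for free.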
Nice work.
Again, it would be nice to also have test-dbs tests.
@aljazerzen @eitsupi @vanillajonathan @richb-hanover Following this comment. Sorry to ping you, but since you are the main contributors on this repo, your opinion matters for the name of the module that would contain the main functions on strings
Thanks for mentioning me (and thanks for your great work!).
The
I'll weigh in to support the full name
Ok I'll change for
Great!
Thanks @PrettyWood! These are great. Indeed, I think when developing the compiler, it's easy for us to focus too much on "do we have a good algorithm for right-associative precedence", etc., but actually adding an equivalent for
As a followup, we might want to reconsider
I am OK with changing these if folks prefer...
Yeah, I think we should change those, because See: #3913 (comment)
Sorry for coming late to this (I was away for a while) but I would throw in my vote in favour of I also think our earlier discussions around
For me the big downside of text is bytes vs strings. How do you differentiate them? I agree that often people want strings but not always! And even here there are subtleties like encoding, chars vs graphemes, ...
I think The closest I came was when working with MS SQL 15 years ago, you would sometimes get unexpected ordering depending on what collation you chose for your database. Is that still much of an issue these days? It's not something I've encountered in a long time so I really can't say.
To be honest my interest in PRQL is to simplify the refactoring of a custom DSL we have that translates many steps into Mongo, Polars and many SQL dialects.
Well,
I strongly agree with @snth: regardless of how it is named, But I'm impartial on the name. A quick survey of exactly 1 person showed that
A quick survey of a couple of others reinforces that "string" is actually not that well-known. From a program manager at SpaceX (i.e. someone who is close to lots of technical things without writing code herself):
Continue work on #217
closes #3238