Implement a new method/transformer to limit.
Look at the Spark documentation: https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.limit. Pandas doesn't have a limit method. Maybe the backend can use df.head(), or df.sample() if we want it to be random.
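As a rough sketch of what a pandas-backed limit could do (just an illustration, not the actual implementation), `head` covers the deterministic case and `sample` the random one:

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [0, 2], [1, 3]], columns=["a", "b"])

# Deterministic "limit": take the first n rows in the current order.
first_one = df.head(1)

# Random "limit": take n rows uniformly at random.
random_one = df.sample(n=1)
```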
It should be `head` for pandas.
For limit, if the order by is not specified, then returning anything is valid; we don't have to sample the dataset.
We may expect something like this:

```python
with FugueWorkflow() as dag:
    df = dag.df([[0, 1], [0, 2], [1, 3]], "a:int,b:int")
    df.partition(by=["a"], presort="b desc").limit(1).show()
```
It should extract:

```
[0,2],[1,3]
```
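A minimal pandas sketch of this partitioned case (just an illustration of the idea, not Fugue's engine code): sort by the presort column, then take the first n rows of each partition:

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [0, 2], [1, 3]], columns=["a", "b"])

# Per-partition limit: sort by "b" descending, then keep the first row
# within each "a" group.
limited = (
    df.sort_values("b", ascending=False)
      .groupby("a", as_index=False)
      .head(1)
)
print(limited)  # rows [0, 2] and [1, 3]
```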
Another case: if there is no partition by, it should be something like:

```python
with FugueWorkflow() as dag:
    df = dag.df([[0, 1], [0, 2], [1, 3]], "a:int,b:int")
    df.limit(1, presort="a,b desc").show()
```
It should extract:

```
[0,2]
```
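In pandas terms this case could be sketched as a global sort followed by `head` (again just an illustration, with the presort string `"a,b desc"` translated by hand):

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [0, 2], [1, 3]], columns=["a", "b"])

# presort "a,b desc" translated by hand: sort by a ascending, b descending,
# then take the first row.
limited = df.sort_values(["a", "b"], ascending=[True, False]).head(1)
print(limited)  # [0, 2]
```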
For the simplest case:

```python
with FugueWorkflow() as dag:
    df = dag.df([[0, 1], [0, 2], [1, 3]], "a:int,b:int")
    df.limit(1).show()
```
Returning any row of df should be valid, because in a distributed environment we don't guarantee order across partitions.
For this problem, we first need to implement limit at the engine level, then at the workflow level.
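As a rough illustration only (the name, signature, and presort representation here are hypothetical, not Fugue's actual API), an engine-level limit could be a single function that takes the partition and presort information and defers to the backend:

```python
from typing import List, Optional, Tuple

import pandas as pd


def limit_pandas(
    df: pd.DataFrame,
    n: int,
    presort: Optional[List[Tuple[str, bool]]] = None,  # (column, ascending) pairs
    partition_by: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical pandas engine-level limit; not Fugue's real API."""
    if presort:
        cols = [c for c, _ in presort]
        asc = [a for _, a in presort]
        df = df.sort_values(cols, ascending=asc)
    if partition_by:
        # Take the first n rows of each partition, keeping the presorted order.
        return df.groupby(partition_by, as_index=False, sort=False).head(n)
    # No partitioning: any n rows are valid; head keeps the (possibly sorted) order.
    return df.head(n)
```

A workflow-level `limit` could then simply forward its arguments to whichever engine implementation the chosen backend provides.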