from Standard.Base import all
import Standard.Base.Data.Array_Proxy.Array_Proxy
import Standard.Base.Data.Index_Sub_Range as Index_Sub_Range_Module
import Standard.Base.Data.Ordering.Comparator
import Standard.Base.Error.Common.Index_Out_Of_Bounds
import Standard.Base.Error.Common.No_Such_Method
import Standard.Base.Error.Common.Type_Error
import Standard.Base.Error.File_Error.File_Error
import Standard.Base.Error.Illegal_Argument.Illegal_Argument
import Standard.Base.Error.Incomparable_Values.Incomparable_Values
import Standard.Base.Error.Unimplemented.Unimplemented
import project.Data.Aggregate_Column.Aggregate_Column
import project.Data.Column.Column
import project.Data.Column as Column_Module
import project.Data.Column_Name_Mapping.Column_Name_Mapping
import project.Data.Column_Selector.Column_Selector
import project.Data.Data_Formatter.Data_Formatter
import project.Data.Join_Condition.Join_Condition
import project.Data.Join_Kind.Join_Kind
import project.Data.Match_Columns.Match_Columns
import project.Data.Match_Columns as Match_Columns_Helpers
import project.Data.Position.Position
import project.Data.Report_Unmatched.Report_Unmatched
import project.Data.Row.Row
import project.Data.Set_Mode.Set_Mode
import project.Data.Sort_Column.Sort_Column
import project.Data.Storage.Storage
import project.Data.Value_Type.Value_Type
import project.Internal.Aggregate_Column_Helper
import project.Internal.Java_Problems
import project.Internal.Join_Helpers
import project.Internal.Parse_Values_Helper
import project.Internal.Problem_Builder.Problem_Builder
import project.Internal.Table_Helpers
import project.Internal.Table_Helpers.Table_Column_Helper
import project.Internal.Unique_Name_Strategy.Unique_Name_Strategy
import project.Internal.Widget_Helpers
import project.Data.Expression.Expression
import project.Data.Expression.Expression_Error
import project.Delimited.Delimited_Format.Delimited_Format
from project.Data.Column_Type_Selection import Column_Type_Selection, Auto
from project.Internal.Rows_View import Rows_View
from project.Errors import all
from project.Data.Column import get_item_string
from project.Internal.Filter_Condition_Helpers import make_filter_column
polyglot java import org.enso.table.data.column.builder.object.StorageTypeMismatch
polyglot java import org.enso.table.data.table.Table as Java_Table
polyglot java import org.enso.table.data.table.Column as Java_Column
polyglot java import org.enso.table.data.table.join.Equals as Java_Join_Equals
polyglot java import org.enso.table.data.table.join.EqualsIgnoreCase as Java_Join_Equals_Ignore_Case
polyglot java import org.enso.table.data.table.join.Between as Java_Join_Between
polyglot java import org.enso.table.operations.OrderBuilder
polyglot java import org.enso.table.data.mask.OrderMask
polyglot java import java.util.UUID
## Represents a column-oriented table data structure.
type Table
## Creates a new table from a vector of `[name, items]` pairs.
Arguments:
- columns: The `[name, items]` pairs to construct a new table from.
> Example
Create a new table with the given columns.
from Standard.Table import Table
example_new =
first_column = ["count", [1, 2, 3]]
second_column = ["is_valid", [True, False, True]]
Table.new [first_column, second_column]
new : Vector (Vector | Column) -> Table
new columns =
cols = columns.map c->
case c of
_ : Vector -> Column.from_vector (c.at 0) (c.at 1) . java_column
Column.Value java_col -> java_col
if cols.is_empty then Error.throw (Illegal_Argument.Error "Cannot create a table with no columns.") else
if (cols.all c-> c.getSize == cols.first.getSize).not then Error.throw (Illegal_Argument.Error "All columns must have the same row count.") else
if cols.distinct .getName . length != cols.length then Error.throw (Illegal_Argument.Error "Column names must be distinct.") else
Table.Value (Java_Table.new cols.to_array)
## Creates a new table from a vector of column names and a vector of vectors
specifying row contents.
Arguments:
- header: A list of texts specifying the column names.
- rows: A vector of vectors, specifying the contents of each table row.
Each element of `rows` must have the same length as `header`.
> Example
Create a table with 3 columns, named `foo`, `bar`, and `baz`, containing
`[1, 2, 3]`, `[True, False, True]`, and `['a', 'b', 'c']`, respectively.
from Standard.Table import Table
example_from_rows =
header = [ 'foo' , 'bar' , 'baz' ]
row_1 = [ 1 , True , 'a' ]
row_2 = [ 2 , False , 'b' ]
row_3 = [ 3 , True , 'c' ]
Table.from_rows header [row_1, row_2, row_3]
from_rows : Vector -> Vector -> Table
from_rows header rows =
columns = header.map_with_index i-> name-> [name, rows.map (_.at i)]
Table.new columns
## PRIVATE
A table.
Arguments:
- java_table: The internal java representation of the table.
Value java_table
## Returns a text containing an ASCII-art table displaying this data.
Arguments:
- show_rows: the number of initial rows that should be displayed.
- format_terminal: whether ANSI-terminal formatting should be used
> Example
Convert the table to a pretty-printed representation.
import Standard.Examples
example_display = Examples.inventory_table.display
display : Integer -> Boolean -> Text
display self show_rows=10 format_terminal=False =
cols = Vector.from_polyglot_array self.java_table.getColumns
index = self.java_table.getIndex
col_names = [index.getName] + cols.map .getName
col_vals = cols.map .getStorage
num_rows = self.row_count
display_rows = Math.min num_rows show_rows
rows = Vector.new display_rows row_num->
cols = col_vals.map col->
if col.isNa row_num then "Nothing" else get_item_string col row_num
[index.ilocString row_num] + cols
table = print_table col_names rows 1 format_terminal
if num_rows - display_rows <= 0 then table else
missing = '\n\u2026 and ' + (num_rows - display_rows).to_text + ' hidden rows.'
table + missing
## Prints an ASCII-art table with this data to the standard output.
Arguments:
- show_rows: the number of initial rows that should be displayed.
> Example
Convert the table to a pretty-printed representation and print it to
the console.
import Standard.Examples
example_print = Examples.inventory_table.print
print self show_rows=10 =
IO.println (self.display show_rows format_terminal=True)
IO.println ''
## Converts this table into a JS_Object.
> Example
Convert a table to a corresponding JavaScript JS_Object representation.
import Standard.Examples
example_to_json = Examples.inventory_table.to_js_object
to_js_object : JS_Object
to_js_object self =
cols = self.columns
rows = 0.up_to self.row_count . map row->
vals_kv = cols.map col-> [col.name, col.at row]
JS_Object.from_pairs vals_kv
rows
## Returns the column with the given name or index.
Arguments:
- selector: The name or index of the column being looked up.
> Example
Get the names of all of the items from the shop inventory.
import Standard.Examples
example_at = Examples.inventory_table.at "item_name"
> Example
Get the last column.
import Standard.Examples
example_at = Examples.inventory_table.at -1
@selector Widget_Helpers.make_column_name_selector
at : Text | Integer -> Column ! No_Such_Column | Index_Out_Of_Bounds
at self selector=0 = case selector of
_ : Integer ->
java_columns = Vector.from_polyglot_array self.java_table.getColumns
Column.Value (java_columns.at selector)
_ -> self.get selector (Error.throw (No_Such_Column.Error selector))
## Returns the column with the given name or index.
Arguments:
- selector: The name or index of the column being looked up.
- if_missing: The value to use if the selector isn't present.
> Example
Get the names of all of the items from the shop inventory.
import Standard.Examples
example_get = Examples.inventory_table.get "item_name"
> Example
Get the last column.
import Standard.Examples
example_get = Examples.inventory_table.get -1
@selector Widget_Helpers.make_column_name_selector
get : Text | Integer -> Any -> Column | Any
get self selector=0 ~if_missing=Nothing =
java_column = case selector of
_ : Integer -> Vector.from_polyglot_array self.java_table.getColumns . get selector
_ : Text -> self.java_table.getColumnByName selector
_ -> Error.throw (Illegal_Argument.Error "expected 'selector' to be either a Text or an Integer, but got "+(Meta.get_simple_type_name selector)+".")
if java_column.is_nothing then if_missing else Column.Value java_column
## Gets the first column.
first_column : Column ! Index_Out_Of_Bounds
first_column self = self.at 0
## Gets the second column.
second_column : Column ! Index_Out_Of_Bounds
second_column self = self.at 1
## Gets the last column.
last_column : Column ! Index_Out_Of_Bounds
last_column self = self.at -1
## Returns the number of columns in the table.
column_count : Integer
column_count self = self.java_table.getColumns.length
## Returns a new table with a chosen subset of columns, as specified by the
`columns`, from the input table. Any unmatched input columns will be
dropped from the output.
Arguments:
- columns: Column selection criteria - a single instance or Vector of
names, indexes or `Column_Selector`.
- reorder: By default, or if set to `False`, columns in the output will
be in the same order as in the input table. If `True`, the order in the
output table will match the order in the columns list. If a column is
matched by multiple selectors in reorder mode, it will be placed at
the position of the first one matched.
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` settings. Defaults
to `True`.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
! Error Conditions
- If there are no columns in the output table, a `No_Output_Columns` is
raised as an error regardless of the problem behavior, because it is
not possible to create a table without any columns.
- If a column in `columns` is not in the input table, a
`Missing_Input_Columns` is raised as an error, unless
`error_on_missing_columns` is set to `False`, in which case the
problem is reported according to the `on_problems` setting.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range` is
raised as an error, unless `error_on_missing_columns` is set to
`False`, in which case the problem is reported according to the
`on_problems` setting.
> Example
Select columns by name.
table.select_columns ["bar", "foo"]
> Example
Select columns using names passed as a Vector.
table.select_columns ["bar", "foo"]
> Example
Select columns matching a regular expression.
table.select_columns (Column_Selector.By_Name "foo.+" Case_Sensitivity.Insensitive use_regex=True)
> Example
Select the first two columns and the last column, moving the last one to the front.
table.select_columns [-1, 0, 1] reorder=True
Icon: select_column
select_columns : Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector) -> Boolean -> Boolean -> Problem_Behavior -> Table ! No_Output_Columns | Missing_Input_Columns | Column_Indexes_Out_Of_Range
select_columns self columns=[0] (reorder = False) (error_on_missing_columns = True) (on_problems = Report_Warning) =
new_columns = self.columns_helper.select_columns selectors=columns reorder=reorder error_on_missing_columns=error_on_missing_columns on_problems=on_problems
Table.new new_columns
## Returns a new table with the chosen set of columns, as specified by the
`columns`, removed from the input table. Any unmatched input columns will
be kept in the output. Columns are returned in the same order as in the
input.
Arguments:
- columns: Column selection criteria - a single instance or Vector of
names, indexes or `Column_Selector`, which are to be removed.
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` settings. Defaults
to `False`.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
! Error Conditions
- If there are no columns in the output table, a `No_Output_Columns` is
raised as an error regardless of the problem behavior, because it is
not possible to create a table without any columns.
- If a column in `columns` is not in the input table, a
`Missing_Input_Columns` is reported according to the `on_problems`
setting, unless `error_on_missing_columns` is set to `True`, in which
case it is raised as an error.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range` is
reported according to the `on_problems` setting, unless
`error_on_missing_columns` is set to `True`, in which case it is
raised as an error.
> Example
Remove columns with given names.
table.remove_columns ["bar", "foo"]
> Example
Remove columns using names passed as a Vector.
table.remove_columns ["bar", "foo"]
> Example
Remove columns matching a regular expression.
table.remove_columns (Column_Selector.By_Name "foo.+" Case_Sensitivity.Insensitive use_regex=True)
> Example
Remove the first two columns and the last column.
table.remove_columns [-1, 0, 1]
remove_columns : Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector) -> Boolean -> Problem_Behavior -> Table ! No_Output_Columns | Missing_Input_Columns | Column_Indexes_Out_Of_Range
remove_columns self (columns=[0]) (error_on_missing_columns = False) (on_problems = Report_Warning) =
new_columns = self.columns_helper.remove_columns selectors=columns error_on_missing_columns=error_on_missing_columns on_problems=on_problems
Table.new new_columns
## Returns a new table with the specified selection of columns moved to
either the start or the end in the specified order.
Arguments:
- columns: Column selection criteria - a single instance or Vector of
names, indexes or `Column_Selector`, which should be reordered and
specifying their order.
- position: Specifies how to place the selected columns in relation to
the remaining columns which were not matched by `columns` (if any).
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` settings. Defaults
to `False`.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
! Error Conditions
- If a column in `columns` is not in the input table, a
`Missing_Input_Columns` is reported according to the `on_problems`
setting, unless `error_on_missing_columns` is set to `True`, in which
case it is raised as an error.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range` is
reported according to the `on_problems` setting, unless
`error_on_missing_columns` is set to `True`, in which case it is
raised as an error.
> Example
Move a column with a specified name to the back.
table.reorder_columns ["foo"] position=Position.After_Other_Columns
> Example
Move columns using names passed as a Vector.
table.reorder_columns ["bar", "foo"] position=Position.After_Other_Columns
> Example
Move columns matching the regular expression "foo.+" to the front.
table.reorder_columns (Column_Selector.By_Name "foo.+" Case_Sensitivity.Insensitive use_regex=True)
> Example
Swap the first two columns.
table.reorder_columns [1, 0] position=Position.Before_Other_Columns
> Example
Move the first column to the back.
table.reorder_columns [0] position=Position.After_Other_Columns
reorder_columns : Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector) -> Position -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Column_Indexes_Out_Of_Range
reorder_columns self (columns = [0]) (position = Position.Before_Other_Columns) (error_on_missing_columns = False) (on_problems = Report_Warning) =
new_columns = self.columns_helper.reorder_columns selectors=columns position=position error_on_missing_columns=error_on_missing_columns on_problems=on_problems
Table.new new_columns
## Returns a new table with the columns sorted by name according to the
specified sort method. By default, sorting will be according to
case-sensitive ascending order based on the normalized Unicode ordering.
Arguments:
- order: Whether sorting should be in ascending or descending order.
- text_ordering: The sort methodology to use.
> Example
Sort columns according to the default ordering.
table.sort_columns
> Example
Sort columns according to the natural case-insensitive ordering.
table.sort_columns text_ordering=(Text_Ordering.Case_Insensitive sort_digits_as_numbers=True)
> Example
Sort columns in descending order.
table.sort_columns Sort_Direction.Descending
sort_columns : Sort_Direction -> Text_Ordering -> Table
sort_columns self order=Sort_Direction.Ascending text_ordering=Text_Ordering.Default =
new_columns = Table_Helpers.sort_columns internal_columns=self.columns order text_ordering
Table.new new_columns
## Returns a new table with the columns renamed based on either a mapping
from the old name to the new or a positional list of new names.
Arguments:
- column_map: Mapping from old column names to new or a vector of new
column names to apply by position.
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` settings. Defaults
to `True`.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
! Error Conditions
- If a column in `column_map` is not in the input table, a
`Missing_Input_Columns` is raised as an error, unless
`error_on_missing_columns` is set to `False`, in which case the
problem is reported according to the `on_problems` setting.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range` is
raised as an error, unless `error_on_missing_columns` is set to
`False`, in which case the problem is reported according to the
`on_problems` setting.
- Other problems are reported according to the `on_problems` setting:
- If a column is matched by two selectors resulting in a different
name mapping, an `Ambiguous_Column_Rename`.
- If in `By_Position` mode and more names than columns are
provided, a `Too_Many_Column_Names_Provided`.
- If any of the new names are invalid, an
`Invalid_Output_Column_Names`.
- If any of the new names clash either with existing names or each
other, a `Duplicate_Output_Column_Names`.
> Example
Rename the first column to "FirstColumn"
table.rename_columns (Column_Name_Mapping.By_Position ["FirstColumn"])
> Example
Rename the first column to "FirstColumn" passed as a Vector
table.rename_columns ["FirstColumn"]
> Example
Add a prefix to all column names.
table.rename_columns (table.columns.map c-> "prefix_" + c.name)
> Example
For all columns starting with the prefix `name=`, replace it with `key:`.
table.rename_columns (Column_Name_Mapping.By_Name (Map.from_vector [["name=(.*)", "key:$1"]]) Regex_Matcher.Value)
rename_columns : Map | Vector Text | Column_Name_Mapping -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Column_Indexes_Out_Of_Range | Ambiguous_Column_Rename | Too_Many_Column_Names_Provided | Invalid_Output_Column_Names | Duplicate_Output_Column_Names
rename_columns self (column_map=(Column_Name_Mapping.By_Position ["Column"])) (error_on_missing_columns=True) (on_problems=Report_Warning) = case column_map of
_ : Vector ->
self.rename_columns (Column_Name_Mapping.By_Position column_map) error_on_missing_columns on_problems
_ : Map ->
self.rename_columns (Column_Name_Mapping.By_Name column_map) error_on_missing_columns on_problems
_ ->
case Table_Helpers.rename_columns internal_columns=self.columns mapping=column_map error_on_missing_columns=error_on_missing_columns on_problems=on_problems of
new_names ->
new_columns = self.columns.map_with_index i->c->(c.rename (new_names.at i))
Table.new new_columns
## Returns a new table with the columns renamed based on entries in the
first row.
Arguments:
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
The following problems can occur:
- If any of the new names are invalid, an
`Invalid_Output_Column_Names`.
- If any of the new names clash either with existing names or each
other, a `Duplicate_Output_Column_Names`.
> Example
Rename the columns based on the first row.
table.use_first_row_as_names
use_first_row_as_names : Problem_Behavior -> Table
use_first_row_as_names self (on_problems=Report_Warning) =
mapper = col->
val = col.at 0
case val of
_ : Text -> val
Nothing -> Nothing
_ -> val.to_text
new_names = self.columns.map mapper
self.drop (First 1) . rename_columns (Column_Name_Mapping.By_Position new_names) on_problems=on_problems
## ALIAS group, summarize
Aggregates the rows in a table using any `Group_By` entries in columns.
The columns argument specifies which additional aggregations to perform and to return.
Arguments:
- columns: Vector of `Aggregate_Column` specifying the aggregated table.
Expressions can be used within the aggregate column to perform more
complicated calculations.
- error_on_missing_columns: Specifies if missing columns in aggregates
should result in an error regardless of the `on_problems` settings.
Defaults to `False`, meaning that a problematic aggregate will not be
included in the result and the problem is reported according to the
`on_problems` setting.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
! Error Conditions
- If there are no columns in the output table, a `No_Output_Columns` is
raised as an error regardless of the problem behavior, because it is
not possible to create a table without any columns.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range` is
reported according to the `on_problems` setting, unless
`error_on_missing_columns` is set to `True`, in which case it is
raised as an error. Problems resolving `Group_By` columns are
reported as dataflow errors regardless of these settings, as a
missing grouping will completely change semantics of the query.
- If a column selector is given as a `Text` and it does not match any
columns in the input table nor is it a valid expression, an
`Invalid_Aggregate_Column` problem is raised according to the
`on_problems` settings (unless `error_on_missing_columns` is set to
`True` in which case it will always be an error). Problems resolving
`Group_By` columns are reported as dataflow errors regardless of
these settings, as a missing grouping will completely change
semantics of the query.
- If an aggregation fails, an `Invalid_Aggregation` dataflow error is
raised.
- Additionally, the following problems may be reported according to the
`on_problems` setting:
- If there are invalid column names in the output table,
an `Invalid_Output_Column_Names`.
- If there are duplicate column names in the output table,
a `Duplicate_Output_Column_Names`.
- If grouping on or computing the `Mode` on a floating point number,
a `Floating_Point_Equality`.
- If, when concatenating values, there is an unquoted delimiter,
an `Unquoted_Delimiter`.
- If there are more than 10 issues with a single column,
an `Additional_Warnings`.
> Example
Group by the Key column, count the rows
table.aggregate [Aggregate_Column.Group_By "Key", Aggregate_Column.Count]
aggregate : Vector Aggregate_Column -> Boolean -> Problem_Behavior -> Table ! No_Output_Columns | Invalid_Aggregate_Column | Invalid_Output_Column_Names | Duplicate_Output_Column_Names | Floating_Point_Equality | Invalid_Aggregation | Unquoted_Delimiter | Additional_Warnings
aggregate self columns (error_on_missing_columns=False) (on_problems=Report_Warning) =
validated = Aggregate_Column_Helper.prepare_aggregate_columns columns self error_on_missing_columns=error_on_missing_columns
on_problems.attach_problems_before validated.problems <| Illegal_Argument.handle_java_exception <|
java_key_columns = validated.key_columns.map .java_column
index = self.java_table.indexFromColumns java_key_columns.to_array
new_columns = validated.valid_columns.map c->(Aggregate_Column_Helper.java_aggregator c.first c.second)
java_table = index.makeTable new_columns.to_array
new_table = Table.Value java_table
on_problems.attach_problems_after new_table <|
problems = java_table.getProblems
Java_Problems.parse_aggregated_problems problems
## ALIAS sort
Sorts the rows of the table according to the specified columns and order.
Arguments:
- columns: The columns and order to sort the table.
- text_ordering: The ordering method to use on text values.
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` settings. Defaults
to `True`.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
! Error Conditions
- If a column in `columns` is not in the input table, a
`Missing_Input_Columns` is raised as an error, unless
`error_on_missing_columns` is set to `False`, in which case the
problem is reported according to the `on_problems` setting.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range` is
raised as an error, unless `error_on_missing_columns` is set to
`False`, in which case the problem is reported according to the
`on_problems` setting.
- If no columns have been selected for ordering,
a `No_Input_Columns_Selected` is raised as a dataflow error regardless
of any settings.
- If a column used for ordering contains values that cannot be
compared, an `Incomparable_Values` error is raised.
? Missing Values
Missing (`Nothing`) values are sorted as less than any other object.
> Example
Sorting `table` in ascending order by the value in column `'Quantity'`.
table.order_by ['Quantity']
> Example
Sorting `table` in descending order by the value in column `'Quantity'`.
table.order_by [Sort_Column.Name 'Quantity' Sort_Direction.Descending]
> Example
Sorting `table` in ascending order by the value in column `'Quantity'`,
using the value in column `'Rating'` for breaking ties.
table.order_by ['Quantity', 'Rating']
> Example
Sorting `table` in ascending order by the value in column `'Quantity'`,
using the value in column `'Rating'` in descending order for breaking
ties.
table.order_by [Sort_Column.Name 'Quantity', Sort_Column.Name 'Rating' Sort_Direction.Descending]
> Example
Order the table by the second column in ascending order. In case of any
ties, break them based on the 7th column from the end of the table in
descending order.
table.order_by [1, Sort_Column.Index -7 Sort_Direction.Descending]
> Example
Sort the table by columns whose names start with the letter `a`.
table.order_by [(Sort_Column.Select_By_Name "a.*" use_regex=True case_sensitivity=Case_Sensitivity.Insensitive)]
order_by : Text | Sort_Column | Vector (Text | Sort_Column) -> Text_Ordering -> Boolean -> Problem_Behavior -> Table ! Incomparable_Values | No_Input_Columns_Selected | Missing_Input_Columns | Column_Indexes_Out_Of_Range
order_by self (columns = ([(Sort_Column.Name (self.columns.at 0 . name))])) text_ordering=Text_Ordering.Default error_on_missing_columns=True on_problems=Problem_Behavior.Report_Warning =
problem_builder = Problem_Builder.new error_on_missing_columns=error_on_missing_columns types_to_always_throw=[No_Input_Columns_Selected]
columns_for_ordering = Table_Helpers.prepare_order_by self.columns columns problem_builder
problem_builder.attach_problems_before on_problems <|
java_columns = columns_for_ordering.map c->
c.column.java_column
directions = columns_for_ordering.map c->
c.associated_selector.direction.to_sign
comparator = Comparator.for_text_ordering text_ordering
java_table = Illegal_Argument.handle_java_exception <| Incomparable_Values.handle_errors <|
self.java_table.orderBy java_columns.to_array directions.to_array comparator
Table.Value java_table
## Returns the distinct set of rows within the specified columns from the
input table.
When multiple rows have the same values within the specified columns, the
first row of each such set is returned if possible, but in database
backends any row from each set may be returned (for example if the row
ordering is unspecified).
For the in-memory table, the unique rows will be in the order they
occurred in the input (this is not guaranteed for database operations).
Arguments:
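- columns: The columns of the table to use for distinguishing the rows.
- case_sensitivity: Specifies if the text values should be compared case
sensitively.
- error_on_missing_columns: Specifies if a missing input column should
result in an error regardless of the `on_problems` setting. Defaults to
`True`.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.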
! Error Conditions
- If there are no columns in the output table, a `No_Output_Columns` is
raised as an error regardless of the problem behavior, because it is
not possible to create a table without any columns.
- If a column in `columns` is not in the input table, a
`Missing_Input_Columns` is raised as an error, unless
`error_on_missing_columns` is set to `False`, in which case the
problem is reported according to the `on_problems` setting.
- If no valid columns are selected, a `No_Input_Columns_Selected` is
reported as a dataflow error regardless of the `on_problems` setting.
- If floating point values are present in the distinct columns, a
`Floating_Point_Equality` is reported according to the `on_problems`
setting.
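> Example
Keep the first row for each distinct value of a single column. This is an
illustrative sketch: the column name "Name" is an assumption, not part of
any particular table.
table.distinct "Name"
> Example
Keep one row per distinct pair of values, comparing text without regard to
case. The column names "Name" and "City" are hypothetical.
table.distinct ["Name", "City"] case_sensitivity=Case_Sensitivity.Insensitive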
distinct : Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector) -> Case_Sensitivity -> Boolean -> Problem_Behavior -> Table ! No_Output_Columns | Missing_Input_Columns | No_Input_Columns_Selected | Floating_Point_Equality
distinct self (columns = self.column_names) case_sensitivity=Case_Sensitivity.Default error_on_missing_columns=True on_problems=Report_Warning =
key_columns = self.columns_helper.select_columns selectors=columns reorder=True error_on_missing_columns=error_on_missing_columns on_problems=on_problems . catch No_Output_Columns _->
Error.throw No_Input_Columns_Selected
java_columns = key_columns.map .java_column
text_folding_strategy = Case_Sensitivity.folding_strategy case_sensitivity
java_table = Illegal_Argument.handle_java_exception <|
self.java_table.distinct java_columns.to_array text_folding_strategy
on_problems.attach_problems_after (Table.Value java_table) <|
problems = java_table.getProblems
Java_Problems.parse_aggregated_problems problems
## Parses columns within a Table to a specific value type.
By default, it looks at all `Text` columns and attempts to deduce the
type (columns with other types are not affected). If `column_types` are
provided, only selected columns are parsed, according to the specified
type.
The default parser options only parse values where the process is
reversible (e.g., 0123 would not be converted to an integer as there is
a leading 0). However, settings in the `Data_Formatter` can
control this.
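> Example
Parse all `Text` columns, deducing their types automatically. A minimal
sketch that relies only on the default arguments shown in the signature.
table.parse_values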
parse_values : Data_Formatter -> (Nothing | Vector Column_Type_Selection) -> Problem_Behavior -> Table
parse_values self value_formatter=Data_Formatter.Value column_types=Nothing on_problems=Report_Warning =
columns = self.columns
problem_builder = Vector.new_builder
find_datatype index column =
matching_input = column_types.filter selection->
selector = selection.column
case selector of
_ : Text -> column.name == selector
_ : Integer -> if selector >= 0 then index == selector else
index == columns.length + selector
if matching_input.length == 0 then Nothing else
if matching_input.length == 1 then matching_input.first.datatype else
first_type = matching_input.first.datatype
ambiguous = matching_input.any s-> s.datatype != first_type
problem_builder.append (Duplicate_Type_Selector.Error column.name ambiguous)
if ambiguous then Nothing else first_type
expected_types = case column_types of
Nothing -> columns.map _->Auto
_ ->
missing_columns = Vector.new_builder
invalid_indices = Vector.new_builder
column_types.each selection->
selector = selection.column
case selector of
_ : Integer ->
valid = Table_Helpers.is_index_valid columns.length selector
if valid.not then
invalid_indices.append selector
_ : Text ->
found = columns.any col-> col.name == selector
if found.not then
missing_columns.append selector
if missing_columns.is_empty.not then
problem_builder.append (Missing_Input_Columns.Error missing_columns.to_vector)
if invalid_indices.is_empty.not then
problem_builder.append (Column_Indexes_Out_Of_Range.Error invalid_indices.to_vector)
columns.map_with_index find_datatype
new_columns = columns.zip expected_types column-> expected_type-> case expected_type of
Nothing -> column
_ ->
parser = if expected_type == Auto then value_formatter.make_auto_parser else
value_formatter.make_datatype_parser expected_type
storage = column.java_column.getStorage
new_storage_and_problems = parser.parseColumn column.name storage
new_storage = new_storage_and_problems.value
problems = Vector.from_polyglot_array new_storage_and_problems.problems . map (Parse_Values_Helper.translate_parsing_problem expected_type)
problems.each problem_builder.append
Column.Value (Java_Column.new column.name new_storage)
## TODO [RW] this `case ... of` is a workaround for incorrect dataflow handling on arrays. It can be removed once the PR fixing it is merged:
https://github.com/enso-org/enso/pull/3400
result = Table.new new_columns
on_problems.attach_problems_after result problem_builder.to_vector
## Replaces the first, last, or all occurrences of `term` with
`new_text` in each text row of selected columns.
If `term` is empty, the function returns the table unchanged.
This method follows the exact replacement semantics of the
`Text.replace` method.
Arguments:
- columns: Column selection criteria or a column name or index.
- term: The term to find.
- new_text: The new text to replace occurrences of `term` with.
If `matcher` is a `Regex_Matcher`, `new_text` can include replacement
patterns (such as `$<n>`) for a marked group.
- mode: Specifies which occurrences of `term` the engine tries to find. When the
mode is `First` or `Last`, this method replaces the first or last occurrence
of `term` in each individual table cell. If set to `All`, it replaces all
occurrences of `term`.
- matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
regular expression and matched using the associated options.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
The following problems can occur:
- If a column in `columns` is not in the input table, a `Missing_Input_Columns`.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range`.
- If a column in `columns` has a storage type other than `Text` or `Any`,
so it is guaranteed not to contain any text values, an
`Invalid_Value_Type`.
> Example
Replace dashes with underscores in a column named "variable_names".
table.replace_text "variable_names" "-" "_"
> Example
Remove leading and trailing spaces from cells in multiple columns.
table.replace_text (By_Name ["foo", "bar"]) "^\s*(.*?)\s*$" "$1" matcher=Regex_Matcher.Value
> Example
Replace texts in quotes with parentheses in column at index 1.
table.replace_text 1 '"(.*?)"' '($1)' matcher=Regex_Matcher.Value
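> Example
Replace only the first dash in each cell of a column. This is a sketch
that assumes a column named "variable_names" and that `Matching_Mode` is
in scope, as in the type signature of this method.
table.replace_text "variable_names" "-" "_" mode=Matching_Mode.First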
replace_text : Text | Integer | Column_Selector | Vector (Integer | Text | Column_Selector) -> Text -> Text -> Matching_Mode.First | Matching_Mode.Last | Regex_Mode -> (Text_Matcher | Regex_Matcher) -> Problem_Behavior -> Table
replace_text self columns=[0] term="" new_text="" mode=Regex_Mode.All matcher=Text_Matcher.Case_Sensitive on_problems=Problem_Behavior.Report_Warning = if term.is_empty then self else
problem_builder = Problem_Builder.new
selection = self.columns_helper.select_columns_helper columns reorder=False problem_builder
selected_names = Map.from_vector (selection.map column-> [column.name, True])
map_preserve_name column f = column.map f . rename column.name
do_replace = _.replace term new_text mode matcher
do_replace_only_text = case _ of
item : Text -> do_replace item
item -> item
transform column = case column.storage_type of
Storage.Text -> map_preserve_name column do_replace
Storage.Any -> map_preserve_name column do_replace_only_text
_ ->
problem = Invalid_Value_Type.Error Value_Type.Char column.value_type
problem_builder.report_other_warning problem
column
new_columns = self.columns.map column->
is_selected = selected_names.get column.name False
if is_selected then transform column else column
result = Table.new new_columns
problem_builder.attach_problems_after on_problems result
## ALIAS Filter Rows
Selects only the rows of this table that correspond to `True` values of
`filter`.
Arguments:
- column: The column to use for filtering. Can be a column name, index or
the `Column` object itself.
- filter: The filter to apply to the column. It can either be an instance
of `Filter_Condition` or a predicate taking a cell value and returning
a boolean value indicating whether the corresponding row should be kept
or not.
- on_problems: Specifies how to handle non-fatal problems, attaching a
warning by default.
! Error Conditions
- If a column name cannot be found, a `No_Such_Column` dataflow error
is raised.
- If a column index is invalid, an `Index_Out_Of_Bounds` dataflow error
is raised.
- If the column is an invalid type for the filter, an
`Invalid_Value_Type` dataflow error is raised.
- Additionally, the following problems may be reported according to the
`on_problems` setting:
- If filtering by equality on a floating-point column,
a `Floating_Point_Equality`.
> Example
Get people older than 30.
people.filter "Age" (Greater 30)
> Example
Filter people between 30 and 40.
people.filter "Age" (Between 30 40)
> Example
Select rows where more than 50% of the stock is sold.
table.filter "sold_stock" (Greater (table.at "total_stock" / 2))
> Example
Select people celebrating a jubilee.
people.filter "age" (age -> (age%10 == 0))
@column Widget_Helpers.make_column_name_selector
@filter Filter_Condition.default_widget
filter : (Column | Text | Integer) -> (Filter_Condition|(Any->Boolean)) -> Problem_Behavior -> Table ! No_Such_Column | Index_Out_Of_Bounds | Invalid_Value_Type
filter self column filter=(Filter_Condition.Is_True) on_problems=Report_Warning = case column of
_ : Column ->
mask filter_column = Table.Value (self.java_table.mask filter_column.java_column)
case filter of
_ : Filter_Condition -> mask (make_filter_column column filter on_problems)
_ : Function -> mask (column.map filter)
_ ->
table_at = self.at column
self.filter table_at filter on_problems
## ALIAS Filter Rows
Selects only the rows of this table for which the `expression` evaluates
to `True`.
Arguments:
- expression: The expression to evaluate to filter the rows.
- on_problems: Specifies how to handle non-fatal problems, attaching a
warning by default.
! Error Conditions
- If a column name cannot be found, a `No_Such_Column` dataflow error
is raised.
- If the provided expression is invalid, a corresponding
`Expression_Error` dataflow error is raised.
- If the expression returns a column that does not have a boolean type,
an `Invalid_Value_Type` dataflow error is raised.
- Additionally, the following problems may be reported according to the
`on_problems` setting:
- If the expression checks equality on a floating-point column,
a `Floating_Point_Equality`.
- If an arithmetic error occurs when computing the expression,
an `Arithmetic_Error`.
- If more than 10 rows encounter computation issues,
an `Additional_Warnings`.
> Example
Select people celebrating a jubilee.
people.filter_by_expression "[age] % 10 == 0"
filter_by_expression : Text -> Problem_Behavior -> Table ! No_Such_Column | Invalid_Value_Type | Expression_Error
filter_by_expression self expression on_problems=Report_Warning =
column = self.compute expression on_problems
self.filter column Filter_Condition.Is_True
## PRIVATE
with_no_rows self = self.take (First 0)
## Creates a new Table with the specified range of rows from the input
Table.
Arguments:
- range: The selection of rows from the table to return.
For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.
> Example
Take first 10 rows of the table.
table.take (First 10)
> Example
Take rows from the top of the table as long as their values sum to 10.
table.take (While row-> row.to_vector.compute Statistic.Sum == 10)
take : (Index_Sub_Range | Range | Integer) -> Table
take self range=(First 1) =
Index_Sub_Range_Module.take_helper self.row_count self.rows.at self.slice (slice_ranges self) range
## Creates a new Table from the input with the specified range of rows
removed.
Arguments:
- range: The selection of rows from the table to remove.
For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.