
Add split and tokenize to the Table. #5125

Closed
wdanilo opened this issue Feb 5, 2023 · 10 comments · Fixed by #6233
Labels: -libs (Libraries: New libraries to be implemented), l-regex, p-low (Low priority), x-new-feature (Type: new feature request)

Comments

wdanilo (Member) commented Feb 5, 2023

This task was automatically imported from the old Task Issue Board; it was originally created by James Dunkerley.
The original issue is here.


  • Table variants of Text.split and Text.tokenize.
  • Split to new rows or to new columns.
  • In-memory only for v1; the Database backend should report the operation as unsupported.
## Splits a column of text into a set of new columns.
   The original column will be removed from the table.
   The new columns will be named with the name of the input column followed by an incrementing number.

   Arguments:
   - column: The column to split the text of.
   - delimiter: The term or terms used to split the text.
   - column_count: The number of columns to split to. 
     If `Auto` then columns will be added to fit all data.
     If the data exceeds the number of columns, a `Column_Count_Exceeded` error 
     will follow the `on_problems` behavior.
   - on_problems: Specifies the behavior when a problem occurs.
Table.split_to_columns : Text | Integer -> Text -> Auto | Integer -> Problem_Behavior -> Table
Table.split_to_columns self column delimiter="," column_count=Auto on_problems=Report_Error = ...
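
For illustration, a minimal usage sketch of the proposed method on the in-memory backend; the Table.new construction and the example data are invented for this sketch:

from Standard.Table import all

example_split_to_columns =
    table = Table.new [["tags", ["a,b", "c,d,e"]], ["id", [1, 2]]]
    # Replaces the "tags" column with new columns named after it, each holding
    # one delimited part of the original text, per the doc comment above.
    table.split_to_columns "tags" delimiter=","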

## Splits a column of text into a set of new rows.
   The values of other columns are repeated for the new rows.

   Arguments:
   - column: The column to split the text of.
   - delimiter: The term or terms used to split the text.
   - on_problems: Specifies the behavior when a problem occurs.
Table.split_to_rows : Text | Integer -> Text -> Problem_Behavior -> Table
Table.split_to_rows self column delimiter="," on_problems=Report_Error = ...
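
A similar sketch for the row-wise variant (again with invented example data):

from Standard.Table import all

example_split_to_rows =
    table = Table.new [["tags", ["a,b", "c"]], ["id", [1, 2]]]
    # Each delimited part becomes its own row; the "id" value is repeated
    # for every row produced from the original row.
    table.split_to_rows "tags" delimiter=","
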
## Splits a column of text into a set of new columns using a regular expression.
   If the pattern contains marked groups, the group values are concatenated
   together; otherwise the whole match is returned.
   The original column will be removed from the table.
   The new columns will be named with the name of the input column followed by an incrementing number.

   Arguments:
   - column: The column to tokenize the text of.
   - pattern: The regular expression pattern to search for within the text.
   - case_sensitivity: Specifies if the text values should be compared case
     sensitively.
   - column_count: The number of columns to split to. 
     If `Auto` then columns will be added to fit all data.
     If the data exceeds the number of columns, a `Column_Count_Exceeded` error 
     will follow the `on_problems` behavior.
   - on_problems: Specifies the behavior when a problem occurs.
Table.tokenize_to_columns : Text | Integer -> Text -> Case_Sensitivity -> Auto | Integer -> Problem_Behavior -> Table
Table.tokenize_to_columns self column pattern="." case_sensitivity=Case_Sensitivity.Sensitive column_count=Auto on_problems=Report_Error = ...
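
A usage sketch for the column-wise tokenize variant; the pattern and example data are invented for illustration:

from Standard.Table import all

example_tokenize_to_columns =
    table = Table.new [["codes", ["a1 b2", "c3"]], ["id", [1, 2]]]
    # Each match of the pattern becomes a new column named after "codes"
    # with an incrementing number.
    table.tokenize_to_columns "codes" pattern="[a-z]\d"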

## Takes a regular expression pattern and returns all the matches as new rows.
   If the pattern contains marked groups, the group values are concatenated
   together; otherwise the whole match is returned.
   The values of other columns are repeated for the new rows.

   Arguments:
   - column: The column to tokenize the text of.
   - pattern: The regular expression pattern to search for within the text.
   - case_sensitivity: Specifies if the text values should be compared case
     sensitively.
   - on_problems: Specifies the behavior when a problem occurs.
Table.tokenize_to_rows : Text | Integer -> Text -> Case_Sensitivity -> Problem_Behavior -> Table
Table.tokenize_to_rows self column pattern="." case_sensitivity=Case_Sensitivity.Sensitive on_problems=Report_Error = ...
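
And a sketch for the row-wise tokenize variant, with invented example data:

from Standard.Table import all

example_tokenize_to_rows =
    table = Table.new [["codes", ["a1 b2", "c3"]], ["id", [1, 2]]]
    # Each match of the pattern becomes its own row; the other column values
    # are repeated for the rows produced from the original row.
    table.tokenize_to_rows "codes" pattern="[a-z]\d"
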
wdanilo added this to the Beta Release milestone Feb 6, 2023
jdunkerley moved this to ❓New in Issues Board Feb 7, 2023
jdunkerley moved this from ❓New to 📤 Backlog in Issues Board Feb 14, 2023
github-project-automation bot moved this from 📤 Backlog to 🟢 Accepted in Issues Board Apr 4, 2023
GregoryTravis reopened this Apr 4, 2023
github-project-automation bot moved this from 🟢 Accepted to ❓New in Issues Board Apr 4, 2023
GregoryTravis moved this from ❓New to 🔧 Implementation in Issues Board Apr 4, 2023
GregoryTravis moved this from 🔧 Implementation to 📤 Backlog in Issues Board Apr 4, 2023
GregoryTravis moved this from 📤 Backlog to 🔧 Implementation in Issues Board Apr 5, 2023
enso-bot commented Apr 5, 2023

Greg Travis reports a new STANDUP for today (2023-04-05):

Progress: tests, research, started implementation of #5125. It should be finished by 2023-04-10.

Next Day: split to rows

enso-bot commented Apr 6, 2023

Greg Travis reports a new STANDUP for today (2023-04-06):

Progress: implemented and factored Table.split and .tokenize (basic functionality only) and tests. It should be finished by 2023-04-10.

Next Day: column max

enso-bot commented Apr 7, 2023

Greg Travis reports a new STANDUP for today (2023-04-07):

Progress: problem handling for split and tokenize. It should be finished by 2023-04-10.

Next Day: more

enso-bot commented Apr 10, 2023

Greg Travis reports a new STANDUP for today (2023-04-10):

Progress: error handling and tests. It should be finished by 2023-04-10.

Next Day: more

jdunkerley linked a pull request Apr 11, 2023 that will close this issue
enso-bot commented Apr 11, 2023

Greg Travis reports a new 🔴 DELAY for today (2023-04-11):

Summary: There is a 1-day delay in the implementation of the Add split and tokenize to the Table (#5125) task.
It will cause 0 days of delay for the delivery of this weekly plan.

Delay Cause: optimize *_to_rows

enso-bot commented Apr 11, 2023

Greg Travis reports a new STANDUP for today (2023-04-11):

Progress: finish features, review, more tests, optimize *_to_rows. It should be finished by 2023-04-11.

Next Day: optimize *_to_columns

@GregoryTravis GregoryTravis moved this from 🔧 Implementation to 👁️ Code review in Issues Board Apr 12, 2023
enso-bot commented Apr 12, 2023

Greg Travis reports a new 🔴 DELAY for today (2023-04-12):

Summary: There is a 1-day delay in the implementation of the Add split and tokenize to the Table (#5125) task.
It will cause 0 days of delay for the delivery of this weekly plan.

Delay Cause: optimize *_to_cols

enso-bot commented Apr 12, 2023

Greg Travis reports a new STANDUP for today (2023-04-12):

Progress: finish features, review, more tests, optimize *_to_rows. It should be finished by 2023-04-12.

Next Day: review, 5126

enso-bot commented Apr 13, 2023

Greg Travis reports a new 🔴 DELAY for today (2023-04-13):

Summary: There is a 1-day delay in the implementation of the Add split and tokenize to the Table (#5125) task.
It will cause 0 days of delay for the delivery of this weekly plan.

Delay Cause: review

enso-bot commented Apr 13, 2023

Greg Travis reports a new STANDUP for today (2023-04-13):

Progress: split/tokenize review; getting a head start on text_to_table. It should be finished by 2023-04-13.

Next Day: review, 5126

mergify bot closed this as completed in #6233 Apr 14, 2023
github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Apr 14, 2023