regexp-cut

Uses awk to provide cut-like syntax for field extraction. The command name is rcut.

⚠️ ⚠️ Work under construction!

Motivation

cut's syntax is handy for many field extraction problems, but it doesn't allow multi-character or regexp delimiters. This project aims to provide cut-like syntax for those cases. It is currently implemented as a bash script that uses mawk.

ℹ️ Note that rcut isn't feature-compatible with the cut command, nor a replacement for it. rcut helps when you need features like a regexp field separator.

Features

Default field separation is the same as awk
Both input (-d) and output (-o) field separators can be multiple characters
Input field separator can use regular expressions
- this script uses mawk by default
- you can switch to gawk for better regexp support with the -g option
If the input field separator is a single character, the output field separator will also be that character
Fixed string input field separator can be enabled by using the -F option
- if -o is not used, the value passed to the -d option will be set as the output field separator
Field range can be specified by using - separator (same as cut)
- a lone - means all the fields (this is also the default if the -f option isn't used at all)
- if start of the range isn't given, default is 1
- if end of the range isn't given, default is last field of a line
Negative indexing is allowed if you use the -n option
- -1 means the last field, -2 means the second-last field and so on
- with -n, you'll have to use : instead of - to specify field ranges
Multiple fields and ranges can be separated using , character (same as cut)
Unlike cut, order matters with the -f option and field/range duplication is also allowed
- this assumes -c (complement) is not active
Using the -c option prints all the fields, in input order, except those specified by the -f option
Using the -s option suppresses lines that don't contain the input field separator
Minimum field number is forced to be 1
Maximum field number is forced to be last field of a line
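
To make the range rules above concrete, here is a minimal sketch of how a field spec might be expanded into field numbers for a line with a known field count. It is not taken from the rcut script: expand_fields and its three arguments are hypothetical names used only for illustration, and a range whose start is greater than its end simply produces nothing here, since that case is still listed as undecided for rcut.

```bash
# expand_fields SPEC NF [NEG]
#   SPEC: field spec such as 2-3,5,1   NF: number of fields in the line
#   NEG:  1 to mimic -n (negative indexing, ':' as the range separator)
# Hypothetical helper, not part of rcut.
expand_fields() {
    awk -v spec="$1" -v nf="$2" -v neg="${3:-0}" '
    # force a field number into the 1..nf interval
    function clamp(n) { return n < 1 ? 1 : (n > nf ? nf : n) }
    # map a negative index to a position counted from the end
    function pos(n)   { return (neg && n < 0) ? nf + 1 + n : n }
    BEGIN {
        nf = nf + 0; neg = neg + 0
        sep = neg ? ":" : "-"
        n = split(spec, items, ",")
        out = ""
        for (i = 1; i <= n; i++) {
            item = items[i]
            p = index(item, sep)
            if (p) {
                # range: a missing start means 1, a missing end means nf
                s = substr(item, 1, p - 1)
                e = substr(item, p + 1)
                s = (s == "") ? 1  : clamp(pos(s + 0))
                e = (e == "") ? nf : clamp(pos(e + 0))
            } else {
                s = e = clamp(pos(item + 0))
            }
            # order of items is preserved and duplicates are kept
            for (f = s; f <= e; f++)
                out = out (out == "" ? "" : " ") f
        }
        print out
    }'
}

expand_fields '2-3,5,1,2-4' 5    # 2 3 5 1 2 3 4
expand_fields '-' 3              # 1 2 3
expand_fields '2-9' 4            # 2 3 4
expand_fields '-2:' 5 1          # 4 5
expand_fields '-1' 3 1           # 3
```

With a list like this in hand, printing the selected fields (or, for -c, every field not in the list) becomes a simple loop over field numbers.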

Examples

$ cat spaces.txt
   1 2	3  
x y z
 i          j 		k	

# by default, it uses awk's space/tab field separation and trimming
# unlike cut, order matters
$ rcut -f3,1 spaces.txt
3 1
z x
k i

# multi-character delimiter
$ echo 'apple:-:fig:-:guava' | rcut -d:-: -f2
fig

# regexp delimiter
$ echo 'Sample123string42with777numbers' | rcut -d'[0-9]+' -f1,4
Sample numbers

# fixed string delimiter
$ echo '123)(%)*#^&(*@#.[](\\){1}\xyz' | rcut -Fd')(%)*#^&(*@#.[](\\){1}\' -f1,2 -o,
123,xyz

# multiple ranges can be specified, order matters
$ printf '1 2 3 4 5\na b c d e\n' | rcut -f2-3,5,1,2-4
2 3 5 1 2 3 4
b c e a b c d

# last field
$ printf 'apple ball cat\n1 2 3 4 5' | rcut -nf-1
cat
5

# except last two fields
$ printf 'apple ball cat\n1 2 3 4 5' | rcut -cnf-2:
apple
1 2 3

# suppress lines without input field delimiter
$ printf '1,2,3,4\nhello\na,b,c\n' | rcut -sd, -f2
2
b

# -g option will switch to gawk
$ echo '1aa2aa3' | rcut -gd'a{2}' -f2
2

See Examples.md for many more examples.
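
For comparison, some of the simpler examples above can be written directly in awk. This is only a rough equivalence for readers who know awk, not necessarily the commands that rcut runs internally; the variable names are just for illustration.

```bash
# multi-character delimiter (awk treats a multi-character FS as a regexp)
multi=$(echo 'apple:-:fig:-:guava' | awk -F':-:' '{print $2}')
echo "$multi"     # fig

# regexp delimiter; "," in print joins fields with OFS, a space by default
regexp=$(echo 'Sample123string42with777numbers' | awk -F'[0-9]+' '{print $1, $4}')
echo "$regexp"    # Sample numbers

# last field of each line
last=$(printf 'apple ball cat\n1 2 3 4 5\n' | awk '{print $NF}')
echo "$last"      # cat, then 5 on the next line

# skip lines that do not contain the delimiter
only=$(printf '1,2,3,4\nhello\na,b,c\n' | awk -F, 'NF > 1 {print $2}')
echo "$only"      # 2, then b on the next line
```

The awk versions get longer quickly once ranges, negative indexes or complements are involved, which is where the cut-like -f syntax is more convenient.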

Tests

You can use script.awk to check whether all the example code snippets work as expected.

$ cd examples/
$ awk -f script.awk Examples.md
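
The contents of script.awk are not reproduced here. As a rough illustration of the general idea only, the sketch below runs each command found in a file and compares what it prints with the expected output recorded after it. The file layout (commands start with "$ ", a blank line ends an example), the sample commands and all names are assumptions made for this sketch; the real script and Examples.md may work differently.

```bash
# hypothetical sample file standing in for a list of examples
sample=$(mktemp)
cat > "$sample" <<'EOF'
# multi-character delimiter
$ echo 'apple:-:fig' | awk -F':-:' '{print $2}'
fig

# reordering fields
$ printf '1 2 3\n' | awk '{print $3, $1}'
3 1
EOF

result=$(awk '
# run the pending command and compare its output with the recorded one
function check(   line, out) {
    if (cmd == "") return
    out = ""
    while ((cmd | getline line) > 0) out = out line "\n"
    close(cmd)
    if (out == expected) pass++
    else { fail++; print "FAIL: " cmd }
    cmd = ""; expected = ""
}
/^\$ /    { check(); cmd = substr($0, 3); next }   # a new command starts
/^$/      { check(); next }                        # blank line ends an example
cmd != "" { expected = expected $0 "\n" }          # collect expected output
END       { check(); printf "%d passed, %d failed\n", pass, fail }
' "$sample")

echo "$result"    # 2 passed, 0 failed
rm -f "$sample"
```

Lines seen while no command is pending (such as the comment lines in the sample) are ignored, so descriptive text between examples does not affect the result.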

TODO

Step value other than 1 for field range
What to do if start of the range is greater than end?
And possibly more...

Similar tools

hck: close to a drop-in replacement for cut that can use a regex delimiter, works on compressed files, etc
choose: negative indexing, regexp-based delimiters, etc

Contributing

Please open an issue for typos/bugs/suggestions/etc
Even for pull requests, open an issue for discussion before submitting PRs
In case you need to reach me, mail me at the address printed by echo 'bGVhcm5ieWV4YW1wbGUubmV0QGdtYWlsLmNvbQo=' | base64 --decode or send a DM via Twitter

License

This project is licensed under the MIT license; see the LICENSE file for details.
