Add empty-key check to mlr check #1330

Merged
merged 3 commits into from
Jun 25, 2023
21 changes: 12 additions & 9 deletions docs/src/manpage.md
@@ -936,8 +936,11 @@ MILLER(1) MILLER(1)

 1mcheck0m
 Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
 Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* If any key is the empty string
 Options:
 -h|--help Show this message.

Expand Down Expand Up @@ -1212,13 +1215,13 @@ MILLER(1) MILLER(1)
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
command-line-specified ORS/OFS/OPS, and matching the resulting line against the
regex specified here. In particular, the regex is not applied to the input
stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
regex will be matched, not against either of these lines, but against the DKVP
line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
supported, and this command is intended to be merely a keystroke-saver. To get
all the features of system grep, you can do
OFS "," and OPS "=", and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you have
CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."

1mgroup-by0m
Expand Down Expand Up @@ -3430,5 +3433,5 @@ MILLER(1) MILLER(1)



2023-06-24 MILLER(1)
2023-06-25 MILLER(1)
</pre>
21 changes: 12 additions & 9 deletions docs/src/manpage.txt
@@ -915,8 +915,11 @@ MILLER(1) MILLER(1)

 1mcheck0m
 Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
 Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* If any key is the empty string
 Options:
 -h|--help Show this message.

Expand Down Expand Up @@ -1191,13 +1194,13 @@ MILLER(1) MILLER(1)
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
command-line-specified ORS/OFS/OPS, and matching the resulting line against the
regex specified here. In particular, the regex is not applied to the input
stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
regex will be matched, not against either of these lines, but against the DKVP
line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
supported, and this command is intended to be merely a keystroke-saver. To get
all the features of system grep, you can do
OFS "," and OPS "=", and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you have
CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."

1mgroup-by0m
Expand Down Expand Up @@ -3409,4 +3412,4 @@ MILLER(1) MILLER(1)



2023-06-24 MILLER(1)
2023-06-25 MILLER(1)
19 changes: 11 additions & 8 deletions docs/src/reference-verbs.md
@@ -376,8 +376,11 @@ n a b i x y
 </pre>
 <pre class="pre-non-highlight-in-pair">
 Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
 Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* If any key is the empty string
 Options:
 -h|--help Show this message.
 </pre>
Expand Down Expand Up @@ -1355,13 +1358,13 @@ Options:
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
command-line-specified ORS/OFS/OPS, and matching the resulting line against the
regex specified here. In particular, the regex is not applied to the input
stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
regex will be matched, not against either of these lines, but against the DKVP
line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
supported, and this command is intended to be merely a keystroke-saver. To get
all the features of system grep, you can do
OFS "," and OPS "=", and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you have
CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."
</pre>

33 changes: 30 additions & 3 deletions internal/pkg/transformers/check.go
@@ -24,8 +24,12 @@ func transformerCheckUsage(
 	o *os.File,
 ) {
 	fmt.Fprintf(o, "Usage: %s %s [options]\n", "mlr", verbNameCheck)
-	fmt.Fprintf(o, "Consumes records without printing any output.\n")
+	fmt.Fprintf(o, "Consumes records without printing any output,\n")
 	fmt.Fprintf(o, "Useful for doing a well-formatted check on input data.\n")
+	fmt.Fprintf(o, "with the exception that warnings are printed to stderr.\n")
+	fmt.Fprintf(o, "Current checks are:\n")
+	fmt.Fprintf(o, "* Data are parseable\n")
+	fmt.Fprintf(o, "* If any key is the empty string\n")
 	fmt.Fprintf(o, "Options:\n")
 	fmt.Fprintf(o, "-h|--help Show this message.\n")
 }
Expand Down Expand Up @@ -79,10 +83,13 @@ func transformerCheckParseCLI(
// ----------------------------------------------------------------
type TransformerCheck struct {
// stateless
messagedReEmptyKey map[string]bool
}

func NewTransformerCheck() (*TransformerCheck, error) {
return &TransformerCheck{}, nil
return &TransformerCheck{
messagedReEmptyKey: make(map[string]bool),
}, nil
}

func (tr *TransformerCheck) Transform(
@@ -92,7 +99,27 @@ func (tr *TransformerCheck) Transform(
 	outputDownstreamDoneChannel chan<- bool,
 ) {
 	HandleDefaultDownstreamDone(inputDownstreamDoneChannel, outputDownstreamDoneChannel)
-	if inrecAndContext.EndOfStream {
+	if !inrecAndContext.EndOfStream {
+		inrec := inrecAndContext.Record
+		for pe := inrec.Head; pe != nil; pe = pe.Next {
+			if pe.Key == "" {
+				context := inrecAndContext.Context
+
+				// Most Miller users are CSV users. And for CSV this will be an error on
+				// *every* record, or none -- so let's not print this multiple times.
+				if tr.messagedReEmptyKey[context.FILENAME] {
+					continue
+				}
+
+				message := fmt.Sprintf(
+					"mlr: warning: empty-string key at filename %s record number %d",
+					context.FILENAME, context.NR,
+				)
+				fmt.Fprintln(os.Stderr, message)
+				tr.messagedReEmptyKey[context.FILENAME] = true
+			}
+		}
+	} else {
 		outputRecordsAndContexts.PushBack(inrecAndContext)
 	}
 }
21 changes: 12 additions & 9 deletions man/manpage.txt
@@ -915,8 +915,11 @@ MILLER(1) MILLER(1)

 1mcheck0m
 Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
 Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* If any key is the empty string
 Options:
 -h|--help Show this message.

Expand Down Expand Up @@ -1191,13 +1194,13 @@ MILLER(1) MILLER(1)
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
command-line-specified ORS/OFS/OPS, and matching the resulting line against the
regex specified here. In particular, the regex is not applied to the input
stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
regex will be matched, not against either of these lines, but against the DKVP
line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
supported, and this command is intended to be merely a keystroke-saver. To get
all the features of system grep, you can do
OFS "," and OPS "=", and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you have
CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."

1mgroup-by0m
Expand Down Expand Up @@ -3409,4 +3412,4 @@ MILLER(1) MILLER(1)



2023-06-24 MILLER(1)
2023-06-25 MILLER(1)
23 changes: 13 additions & 10 deletions man/mlr.1
@@ -2,12 +2,12 @@
 .\" Title: mlr
 .\" Author: [see the "AUTHOR" section]
 .\" Generator: ./mkman.rb
-.\" Date: 2023-06-24
+.\" Date: 2023-06-25
 .\" Manual: \ \&
 .\" Source: \ \&
 .\" Language: English
 .\"
-.TH "MILLER" "1" "2023-06-24" "\ \&" "\ \&"
+.TH "MILLER" "1" "2023-06-25" "\ \&" "\ \&"
 .\" -----------------------------------------------------------------
 .\" * Portability definitions
 .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down Expand Up @@ -1122,8 +1122,11 @@ Options:
.\}
.nf
Usage: mlr check [options]
Consumes records without printing any output.
Consumes records without printing any output,
Useful for doing a well-formatted check on input data.
with the exception that warnings are printed to stderr.
Current checks are:
* If any key is the empty string
Options:
-h|--help Show this message.
.fi
Expand Down Expand Up @@ -1482,13 +1485,13 @@ Options:
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
command-line-specified ORS/OFS/OPS, and matching the resulting line against the
regex specified here. In particular, the regex is not applied to the input
stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
regex will be matched, not against either of these lines, but against the DKVP
line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
supported, and this command is intended to be merely a keystroke-saver. To get
all the features of system grep, you can do
OFS "," and OPS "=", and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you have
CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."
.fi
.if n \{\
6 changes: 5 additions & 1 deletion test/cases/cli-help/0001/expout
@@ -63,8 +63,12 @@ Options:
 ================================================================
 check
 Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
 Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* Data are parseable
+* If any key is the empty string
 Options:
 -h|--help Show this message.

1 change: 1 addition & 0 deletions test/cases/verb-check/0001/cmd
@@ -0,0 +1 @@
+mlr --csv check ${CASEDIR}/input.csv
Empty file.
Empty file.
3 changes: 3 additions & 0 deletions test/cases/verb-check/0001/input.csv
@@ -0,0 +1,3 @@
+a,b,c
+1,2,3
+4,5,6
1 change: 1 addition & 0 deletions test/cases/verb-check/0002/cmd
@@ -0,0 +1 @@
+mlr --csv check ${CASEDIR}/input.csv
1 change: 1 addition & 0 deletions test/cases/verb-check/0002/experr
@@ -0,0 +1 @@
+mlr: warning: empty-string key at filename test/cases/verb-check/0002/input.csv record number 1
Empty file.
3 changes: 3 additions & 0 deletions test/cases/verb-check/0002/input.csv
@@ -0,0 +1,3 @@
+a,,c
+1,2,3
+4,5,6
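
The two test inputs above differ only in the header: case 0001 uses `a,b,c` and expects no warning, while case 0002 uses `a,,c` and expects one warning at record number 1. The sketch below shows why the second header yields an empty-string key. It uses Go's standard `encoding/csv` reader, not Miller's own CSV reader, and `emptyHeaderColumns` is a hypothetical helper introduced only for this illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// emptyHeaderColumns returns the 1-based positions of header fields that are
// the empty string. Only the first line of csvText is read.
func emptyHeaderColumns(csvText string) ([]int, error) {
	r := csv.NewReader(strings.NewReader(csvText))
	header, err := r.Read()
	if err != nil {
		return nil, err
	}
	var cols []int
	for i, name := range header {
		if name == "" {
			cols = append(cols, i+1)
		}
	}
	return cols, nil
}

func main() {
	inputs := []string{
		"a,b,c\n1,2,3\n4,5,6\n", // as in verb-check/0001/input.csv
		"a,,c\n1,2,3\n4,5,6\n",  // as in verb-check/0002/input.csv
	}
	for _, text := range inputs {
		cols, err := emptyHeaderColumns(text)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(cols)
	}
}
```

For the first input the result is empty. For the second it is column 2, the field whose name becomes the empty-string key on every record, which is consistent with the expected warning being reported once, at the first record.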