Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lens for reading termcap-style databases #274

Merged
merged 1 commit into from
Oct 5, 2017

Conversation

bodgit
Copy link
Contributor

@bodgit bodgit commented Jul 31, 2015

This is my stab at fixing #271 which seems to parse everything I've thrown at it including a ~900 KB termcap file. I can also add new records to files. However unless I disable type checking I get:

Syntax error in lens definition
./getcap.aug:34.2-.62:Failed to compile record
./getcap.aug:34.43-.54:exception: ambiguous iteration
      Iterated regexp: /(([^:\n]|\\\\:)+|[^:\n]*(\\"[^\n]+\\"[^:\n]*)+)(:|:[ \t]*\\\\\n[ \t]*:)/
      '\\:\\:\\::' can be split into
      '\\:|=|\\:\\::'

     and
      '\\:\\:\\::|=|'

    Iterated lens: ./getcap.aug:33.19-.61:

Any advice on cleaning that up would be appreciated.

I've tried writing some basic tests however I'm getting stuck where there should be a literal \ in the output. If I try \-escaping it then I get something like the following:

Syntax error in lens definition
tests/test_getcap.aug:102:17: Unexpected character #
tests/test_getcap.aug:101.17-102.8:syntax error, unexpected LIDENT, expecting '}'
tests/test_getcap.aug:syntax error

Any ideas?

@bodgit
Copy link
Contributor Author

bodgit commented Jul 31, 2015

One annoyance I have is that new records will be added without any whitespace like so:

foo:bar:baz:

which is fine for small records, but long ones will wrap and look a bit meh so I made the following change to my lens:

diff --git a/lenses/getcap.aug b/lenses/getcap.aug
index bd002de..c38893a 100644
--- a/lenses/getcap.aug
+++ b/lenses/getcap.aug
@@ -28,10 +28,9 @@ module Getcap =
   (* field must not contain ':' unless quoted or '\'-escaped  *)
   let field      = /([^:\n]|\\\\:)+|[^:\n]*(\"[^\n]+\"[^:\n]*)+/

-  let sep        = del /:|:[ \t]*\\\\\n[ \t]*:/ ":"
-  let name       = store field . sep
-  let capability = [ label "capability" . store field . sep ]
-  let record     = [ seq "record" . name . capability+ . eol ]
+  let sep        = del /:([ \t]*\\\\\n[ \t]*:)?/ ":\\\n\t:"
+  let capability = [ label "capability" . store field ]
+  let record     = [ seq "record" . store field . sep . capability . ( sep . capability )* . Sep.colon . eol ]

   let lns = ( empty | comment | record )*

which should in theory should mean new records are added like so:

foo:\
        :bar:\
        :baz:

however what gets written to the file with augtool is:

foo:
        :bar:
        :baz:

The \'s aren't written. This and the problem with my tests suggests something weird going on with augeas and literal \ characters.

"

test Getcap.lns get getcap =
{ "1" = "example|an example of binding multiple values to names"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't match what your lens is doing.

@raphink
Copy link
Member

raphink commented Aug 3, 2015

I've added a CSV lens this morning, which is very similar to what you're trying to achieve.

In fact, CSV.lns_generic ":" parses your examples (although not the way you want them parsed).

You might want to check it out.

@bodgit
Copy link
Contributor Author

bodgit commented Aug 3, 2015

I realised the structure I was parsing records into didn't allow me easily to write idempotent changes for use with Puppet so I've changed how records are parsed.

I still have a problem with my tests in that I can't give some sample test output containing literal \ characters and then test that they are present in the parsed structures.

I will take a look at the CVS lens and see how that works.

@raphink
Copy link
Member

raphink commented Dec 22, 2015

Any news on this lens @bodgit ?

@zachfi
Copy link

zachfi commented Oct 5, 2016

Oooh yes, this would be quite useful.

@bodgit
Copy link
Contributor Author

bodgit commented Sep 29, 2017

Still not working 100% but now we know there are some bugs with backslashes in general it makes sense to get those ironed out first.

@raphink
Copy link
Member

raphink commented Oct 2, 2017

@bodgit typechecking doesn't pass yet though

@lutter
Copy link
Member

lutter commented Oct 2, 2017

@bodgit nice work .. one minor nit: I now consider seq a design mistake since it just confuses people when they try to add to the tree. Would you object to changing seq "record" to label "record" ? If you don't have time to make that change, I'd be more than happy to do that after merging this PR.

@bodgit
Copy link
Contributor Author

bodgit commented Oct 3, 2017

Ironically I think it was label "record" originally before I noticed some other lenses use seq and figured it might make things more usable. I don't mind either way, I will change it back to use label again and update the tests to match.

@bodgit
Copy link
Contributor Author

bodgit commented Oct 3, 2017

Other than that, the only thing to point out is that I've dropped reading the actual termcap file. Two reasons:

  1. It doesn't seem to follow its own conventions w.r.t. escape sequences. AFAICT capabilities should not contain literal : characters anywhere and should either be replaced with \c/\C or its octal value \072, I found lots of entries that try to use \:. ^ and \ should also be escaped as \^ and \\ and often they're not. This is partly due to ^X being used as an escape sequence to represent control-X (FSVO X). If I make the lens follow the escape sequence rules then it doesn't parse termcap, if I make it aware of the bugs then the lens becomes mostly impossible to pass type checking as there's no consistency.
  2. rtadvd.conf despite using the same file format doesn't actually use the same library routines for reading the file, hence it contains bare IPv6 addresses, complete with literal : characters. Trying to parse that with getcap will split addr="2001:dead::beef" into addr="2001, dead, ``, and beef" which isn't terribly useful from an Augeas users' PoV.

In theory I should just be able to use a simple regex such as /[^:]*/ but in order to cater for the most useful files I've used something similar to /("[^"]*"|[^:"]*)/ so it matches either a quoted string containing anything, or a string containing neither colons nor quotes which seems to work for anything except the termcap file.

@raphink
Copy link
Member

raphink commented Oct 3, 2017

@lutter seq is still useful so long as #244 is not merged, or am I missing something?

@bodgit
Copy link
Contributor Author

bodgit commented Oct 3, 2017

Actually, thinking about this, I can split this into two lenses with one lens written mostly in terms of the other with just a different regexp for matching the capabilities. Bear with me 😄

@lutter
Copy link
Member

lutter commented Oct 3, 2017

@bodgit sounds good .. awaiting the refactored, much expanded lens anxiously ;) Looks like you've stumbled on one of the great things about config files: the discrepancy of the documented format and the format tools actually understand; in a lot of cases, I've had to read the parsing code of whatever consumes a file to discover that. Seems termcap is no different in that regard.

@raphink yes, #244 would still be useful, but I'd like to get a little more feedback on syntax etc. So if you have thoughts, would love to discuss more on the PR

@bodgit
Copy link
Contributor Author

bodgit commented Oct 5, 2017

The termcap lens still doesn't work on an OpenBSD /etc/termcap but I think that's a bug with their tic which generates it; it doesn't escape bare ^ characters whereas if I use tic on a RHEL7 host then it generates a file with \136 instead which the lens parses fine. I will try and raise that with them at some point.

@lutter lutter merged commit d1b6351 into hercules-team:master Oct 5, 2017
@lutter
Copy link
Member

lutter commented Oct 5, 2017

Awesome ! I am merging this for now; when you resolve the tic issues, we can always amend/improve the lens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants