Delimited lists in values #3

Open
grimblefritz opened this issue May 22, 2015 · 12 comments

Comments

@grimblefritz

It would be helpful if genders supported delimited lists (a sort of sub-fielding) in values. I envision it this way:

# in the genders file
hostx mount=fs1:fs2:fs3

Then to find any host that mounts fs2:

nodeattr -d: -q mount?fs2

The -d: option sets the list delimiter to the colon, which I would recommend as the default.

The ? is a "list contains" operator. It is interpreted as "using : as a delimiter, is the item fs2 in the mount list?"
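A minimal sketch of the proposed semantics, written in Python rather than inside genders itself. The names `parse_line` and `list_contains` are hypothetical, and the parsing assumes only the simple `host attr=val,attr` record form shown above; this is not how nodeattr is implemented.

```python
def parse_line(line):
    """Parse one 'host attr=val,attr' record into (host, {attr: value-or-None})."""
    host, _, attr_field = line.strip().partition(" ")
    attrs = {}
    for item in attr_field.strip().split(","):
        if not item:
            continue
        name, sep, value = item.partition("=")
        # Attributes without '=' carry no value.
        attrs[name] = value if sep else None
    return host, attrs


def list_contains(value, item, delim=":"):
    """True if item is exactly one of the delim-separated elements of value."""
    if value is None:
        return False
    return item in value.split(delim)


records = dict(parse_line(line) for line in [
    "hostx mount=fs1:fs2:fs3",
    "hosty mount=fs1:fs22",
    "hostz compute",
])

# Rough equivalent of the proposed: nodeattr -d: -q mount?fs2
matches = [host for host, attrs in records.items()
           if list_contains(attrs.get("mount"), "fs2")]
print(matches)  # ['hostx']
```

Because the match is on whole elements, `fs22` on hosty does not count as `fs2`.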

I would not suggest wildcards in the ? argument. I'm sure some would love it, but I personally have never seen the need.

By the way, I implement this capability now, but it requires some conventions to be imposed in the genders data, uses an external script or function, and is a multi-step process. Having the capability in-built would be a blessing!

@chu11
Member

chu11 commented May 23, 2015

Internally, the data isn't stored in a way that makes this feasible (at least not with the user specifying an arbitrary delimiter).

But perhaps a helper function could be developed to parse values based on this delimiter, creating an alternate genders instance that can then be queried as suggested above.

@wlschwartz

Please add these delimiters to genders. These would benefit our site. Thanks, Grimble, I like the way you think. There's one more from you...

@jthiesfeld

This would also be useful for my site. We do something similar, and frequently have to run these queries in a loop, taking the output of 'nodeattr -UV attr' as the input and running 'nodeattr -q attr=val' for each value.

@elstak

elstak commented Jun 1, 2015

Why not just support regular expressions? Perhaps it is not as "clean" as grimblefritz's proposal but it is definitely more generic and flexible.

nodeattr -q 'mount~\bfs2\b'
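For comparison, a small sketch of how that pattern behaves, using Python's `re` module as a stand-in (the `~` operator is only a proposal here, and the sample values are made up). One thing it shows: `\b` treats any non-word character as a boundary, so a value such as `fs2-old` also matches.

```python
import re

# Same pattern as in the proposed query: mount~\bfs2\b
pattern = re.compile(r"\bfs2\b")

values = ["fs1:fs2:fs3", "fs1:fs22", "fs1:fs2-old"]
hits = [v for v in values if pattern.search(v)]
print(hits)  # ['fs1:fs2:fs3', 'fs1:fs2-old']
```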

@grimblefritz
Author

elstak:

I considered that, but I think it strays into territory genders isn't designed for. If too many features are in-built, then the real question becomes:

Why not use a proper database and query with SQL?

If you're willing to give up the text file data source of genders and being able to edit with vim or whatever, then put the info into sqlite. It's not much different to replicate a sqlite db than it is a text file. You could even (in xcat fashion) write an edit interface that pulls data out into a text editor for modification, and then writes it back to the db on save. It would then be relatively simple to write a wrapper into sqlite to provide command line queries with all the power of SQL. (In practice, btw, if you were using xcat then genders would be redundant - I only used it as an example of the editing methodology.)

@grimblefritz
Author

chu11:

Personally, I wouldn't have a problem if the delimiter were fixed as being ":", but I don't see much difference if it is user-settable via the command line with ":" as the default.

Internally, wouldn't it be a matter of checking the operator type and if "?" then perform a match along the lines of "{d}list{d} contains {d}item{d}", where {d} is the delimiter string, list is the full value, and item is the match string supplied by the user? Pre- and post-fixing the {d} to the list provides a (less than perfect, but workable) way around needing a regex for word boundary matches.
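The padded-delimiter match described above can be sketched in a few lines of Python. The function name `contains_padded` is hypothetical; it illustrates the string test only and says nothing about how genders stores values internally.

```python
def contains_padded(value, item, delim=":"):
    """Substring test: '{d}list{d}' contains '{d}item{d}'."""
    return f"{delim}{item}{delim}" in f"{delim}{value}{delim}"


print(contains_padded("fs1:fs2:fs3", "fs2"))      # True
print(contains_padded("fs1:fs22", "fs2"))         # False
print(contains_padded("fs2", "fs2"))              # True
# The "less than perfect" part: an item that itself contains the
# delimiter matches a run of adjacent elements.
print(contains_padded("fs1:fs2:fs3", "fs1:fs2"))  # True
```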

@jthiesfeld

What would be the performance impact of either using flexible delimiters or regular expressions? Based on the background at LLNL, I'd assume the primary focus on genders was minimal overhead and run time.

Rather than having delimiters in the value field, would it make more sense to allow the same attribute to have a value defined twice? If a value is defined a second time, the attribute is changed from a string to an array whose elements are each value assigned to that attribute. Then, when doing pattern matching, if you hit an array, you search within the array rather than only the single element. The risk is that a config such as the one below is invalid today, and that restriction may be relied on to enforce sane configurations.

host1 location=Birmingham,function=nfs,function=dns
...
host1 location=Atlanta

Logically, it wouldn't make sense for a system to have a location of both Birmingham and Atlanta; therefore, someone adding a second entry was probably a mistake. However, it may make sense that a system would provide both nfs and dns.

Going this route, I don't think any of the query syntax would need to change.

Here are two example queries. The first would check whether nfs is in the list of values for the attribute function and return true. The second would look through the list of values twice and again return true.
nodeattr -Q 'function=nfs'
nodeattr -Q 'function=nfs&&function=dns'
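A rough Python sketch of this repeated-attribute idea, assuming a parser that accumulates values instead of rejecting duplicates. The names `parse_multi` and `has_value` are hypothetical, and genders does not behave this way today.

```python
def parse_multi(lines):
    """Collect every attr=value pair per host; repeated names accumulate values."""
    db = {}
    for line in lines:
        host, _, attr_field = line.strip().partition(" ")
        attrs = db.setdefault(host, {})
        for item in attr_field.strip().split(","):
            if not item:
                continue
            name, _, value = item.partition("=")
            attrs.setdefault(name, []).append(value)
    return db


def has_value(db, host, name, value):
    """'name=value' is true if any stored value for name equals value."""
    return value in db.get(host, {}).get(name, [])


db = parse_multi([
    "host1 location=Birmingham,function=nfs,function=dns",
    "host1 location=Atlanta",
])

print(has_value(db, "host1", "function", "nfs"))   # True
print(has_value(db, "host1", "function", "nfs")
      and has_value(db, "host1", "function", "dns"))  # True
# The risk mentioned above: the conflicting location is silently accepted.
print(db["host1"]["location"])  # ['Birmingham', 'Atlanta']
```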

@grimblefritz
Author

I think the current handling of the first scenario - where location cannot be redefined in another record - is valid. I believe it also precludes the definition of a name more than once in a single record. The rule is probably something like "a name cannot be defined more than once" and it doesn't matter if that's in the same record or multiple records. Just guessing on that.

As to effect, I think it's an exchange of either string scanning or regex parsing, for multi-pass scanning using a modification of the current logic. (I'm not sure an array is needed.) Allowing multiple name definitions would change the query behavior from "name equals value" to "any instance of name equals value". That requires re-coding, probably about the same as a "value in list" approach.

I'm sure it would work if multiple instances of a name were allowed - I think it simply takes a different route to get to the same destination. And it requires a change in the database structure rules.

That being said.

The query "function?nfs&&function?dns" isn't much different from "function=nfs&&function=dns", exchanging only ? and = characters. I don't think there's a meaningful difference there in terms of syntax or results.

In the database you would have "function=nfs:dns" versus "function=nfs,function=dns". The former imposes no changes on database rules, but the latter requires a change to the name duplication exclusion.

As both would require a change in the query processing, but the list approach does not require a change in the database format or restrictions, I think the list approach might be simpler overall. (And I personally prefer the structure of the list form.)

@jthiesfeld

I guess one other consideration we have at our site is that we use the colon to delineate fields which should not be interpreted by genders.

example:
host1 network:eth1:eth2:mode4
We use this format to determine the primary interface for doing a pxe install and then create a mode 4 bond between interfaces eth1 and eth2.

I would hope that enabling the colon as a delimiter wouldn't have any impact on current functions.

@grimblefritz
Author

I would expect that, unless the name?value form is used, the delimiter would be ignored. Certainly, in returning the value assigned to a name the delimiter should not be a factor.

Also, if the delimiter can be user-specified, then you could elect to use another delimiter such as the ; or | or @ characters. That would be a site-specific design choice. You'd know in the database that "network:eth1:eth2:mode4" is a compound name and "network=eth1|eth2|mode4" is a value list.
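To illustrate, here is a sketch in Python under the assumption that the delimiter only matters when a "contains" test is applied to a value. `parse_attrs` and `attr_list_contains` are hypothetical names, not genders functions.

```python
def parse_attrs(attr_field):
    """Split 'a=b,c' into {name: value-or-None}; names are left untouched."""
    out = {}
    for item in attr_field.split(","):
        name, sep, value = item.partition("=")
        out[name] = value if sep else None
    return out


def attr_list_contains(attrs, name, item, delim):
    """The delimiter is applied only to the value of an existing attribute."""
    value = attrs.get(name)
    return value is not None and item in value.split(delim)


compound = parse_attrs("network:eth1:eth2:mode4")     # compound name, no value
value_list = parse_attrs("network=eth1|eth2|mode4")   # value list using '|'

print(compound)  # {'network:eth1:eth2:mode4': None}
print(attr_list_contains(compound, "network", "eth1", ":"))    # False
print(attr_list_contains(value_list, "network", "eth1", "|"))  # True
print(value_list["network"])  # eth1|eth2|mode4 (raw value unchanged)
```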

@chu11
Member

chu11 commented Jun 1, 2015

Multiple responses to the above:

Due to backwards compatibility, I think that having a default delimiter is out of the question. All the sensible default delimiters (colon, semicolon, etc.) are already widely used.

I originally thought a new function, that takes a genders database and a delimiter, would be wise, something like:

genders_handle = genders_parse(myfile);
genders_delimiter_handle = genders_parse_delimiter(genders_handle, ":");
genders_query(genders_delimiter_handle, "mount=fs2");

But then I realized this breaks genders semantics because a host can't have the same attribute twice.

(The above would be super nice: all other genders functions would work with the delimiter, no need to break the architecture, etc. Oh well.)
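The conflict can be sketched in Python. Here `expand` stands in for the hypothetical delimiter-parsing helper and `check_unique` for the "same attribute only once per host" rule; neither is real genders code.

```python
def expand(attrs, delim):
    """Split each delimited value into separate (name, element) pairs."""
    pairs = []
    for name, value in attrs.items():
        for element in value.split(delim):
            pairs.append((name, element))
    return pairs


def check_unique(pairs):
    """Reject any attribute name that appears more than once."""
    seen = set()
    for name, _ in pairs:
        if name in seen:
            raise ValueError(f"attribute {name!r} defined more than once")
        seen.add(name)


pairs = expand({"mount": "fs1:fs2:fs3"}, ":")
print(pairs)  # [('mount', 'fs1'), ('mount', 'fs2'), ('mount', 'fs3')]

try:
    check_unique(pairs)
except ValueError as err:
    print(err)  # attribute 'mount' defined more than once
```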

Re-reading the original poster's comment, was the vision that this would only be supported with the hypothetical "?" query?

@grimblefritz
Author

Correct. The effect of a delimiter would only be in conjunction with the "?" query. There should be no change to the structure or handling of the database.

With support for a user-specified delimiter character (the proposed -d, or whatever, command line option) any existing conventions such as jthiesfeld describes should be easily accounted for.

From the use cases I've seen, the predominant mode would be to select hosts with a specified item in a list. For example, given "hostx attrib=a:b:c" in the database, they would query for hosts where "attrib?b" (attrib contains an element "b"). If someone wants to query for "attrib=a:b:c" they can (no change in behavior), or if they want to return the value of "attrib" they'd get "a:b:c" (no change in behavior).

I don't see a need to return individual list elements. It would be pointless to return them by name as you'd have to know the name to begin with. It might be someone would want the Nth element in the list, but that's really stretching for a justification I think. There might be fringe cases like that, but those could be parsed separately in the calling program.


5 participants