You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is a tracking issue for xvc data and subcommands.
This is to add metadata and labels to files, make queries.
xvc data label KEY=VALUE <targets>: Add a label to a set of file targets. This one adds a single label. There may be multiple labels for each file at the end. Labeling may be an event like store events. There may be implicit labels, like updated=<timestamp> for the metadata we track with XvcMetadata.
xvc data attach <text-file> <targets>: Attach a file containing metadata (or labels, annotations, searchable text) to targets. If the text file is JSON, YAML, or TOML, it can be parsed as labels. It must have a definite structure though. A dictionary in any of these formats is OK.
xvc data query --select [path, name, label,...] --from <targets> --where QUERY: This lists the asked info about targets that satisfy query.
QUERY can be a complex query that satisfies AND, OR, NOT operators, (), ==, =~ (regex match) operators. For numerics, we can also add numerical operations.
--select and --from are optional in xvc data query. It's possible to write xvc data query QUERY to run a query over all data.
xvc data query --name can be used to give a name to the query and run it later as a target, with --from. e.g. xvc data query --where 'class ~= .*berry' --name berries and can be used later as xvc data query --select name --from berries.
xvc data operations can be run from data by supplying an additional --query option that runs the query. xvc data move --query berries berries/
We also need a xvc data/file export <targets> command to copy a set of files to outside of the workspace. This can be used to create directories that contain subsets of data.
We may also need move, copy, remove commands to xvc data/file to make subsets of the dataset and update its metadata.
xvc data commands will run xvc file operations after running the query. xvc file won't operate with queries or metadata, xvc data will. The difference of these commands are this.
xvc data query
--select should have some implicit columns. ['path', 'filename', 'created', 'updated', ...]. labels will show all labels.
--from should accept files, directories, and globs. It will walk select similar to xvc file list, and run the query on these elements.
--where accepts a single query. The query language can be:
jaq can be embedded as a query language. In this case, json documents can be more complex, but I'm not sure if this complexity is necessary.
A simple home made query language similar to Bash / Python or something.
labels IN [strawberry, blueberry]
created < 2020-12-12 12:12:12
changed > 2022-10-31 00:00:00
name ~= .*berry
path *= images/*/*berry.png
(labels ~= .*berry) AND (created >= 2020-12-12 00:00:00)
I think this last option may be more flexible. It can contain queries in sql or jql or some other ql with other operators.
(attached JAQ '.[][key] == value')
an MVP version can be built by field OP value and other features (parens, logical ops, etc.) can be added later.
This discussion was converted from issue #144 on January 24, 2023 06:59.
Heading
Bold
Italic
Quote
Code
Link
Numbered list
Unordered list
Task list
Attach files
Mention
Reference
Menu
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
This is a tracking issue for
xvc data
and subcommands.This is to add metadata and labels to files, make queries.
xvc data label KEY=VALUE <targets>
: Add a label to a set of file targets. This one adds a single label. There may be multiple labels for each file at the end. Labeling may be an event like store events. There may be implicit labels, likeupdated=<timestamp>
for the metadata we track withXvcMetadata
.xvc data attach <text-file> <targets>
: Attach a file containing metadata (or labels, annotations, searchable text) to targets. If the text file is JSON, YAML, or TOML, it can be parsed aslabel
s. It must have a definite structure though. A dictionary in any of these formats is OK.xvc data query --select [path, name, label,...] --from <targets> --where QUERY
: This lists the asked info about targets that satisfy query.QUERY
can be a complex query that satisfiesAND
,OR
,NOT
operators,()
,==
,=~
(regex match) operators. For numerics, we can also add numerical operations.--select
and--from
are optional inxvc data query
. It's possible to writexvc data query QUERY
to run a query over all data.xvc data query --name
can be used to give a name to the query and run it later as a target, with--from
. e.g.xvc data query --where 'class ~= .*berry' --name berries
and can be used later asxvc data query --select name --from berries
.xvc data
operations can be run from data by supplying an additional--query
option that runs the query.xvc data move --query berries berries/
We also need a
xvc data/file export <targets>
command to copy a set of files to outside of the workspace. This can be used to create directories that contain subsets of data.We may also need
move
,copy
,remove
commands toxvc data/file
to make subsets of the dataset and update its metadata.xvc data
commands will runxvc file
operations after running the query.xvc file
won't operate with queries or metadata,xvc data
will. The difference of these commands are this.xvc data query
--select
should have some implicit columns.['path', 'filename', 'created', 'updated', ...]
.labels
will show all labels.--from
should accept files, directories, and globs. It will walk select similar toxvc file list
, and run the query on these elements.--where
accepts a single query. The query language can be:labels IN [strawberry, blueberry]
created < 2020-12-12 12:12:12
changed > 2022-10-31 00:00:00
name ~= .*berry
path *= images/*/*berry.png
(labels ~= .*berry) AND (created >= 2020-12-12 00:00:00)
I think this last option may be more flexible. It can contain queries in sql or jql or some other ql with other operators.
(attached JAQ '.[][key] == value')
an MVP version can be built by
field OP value
and other features (parens, logical ops, etc.) can be added later.Beta Was this translation helpful? Give feedback.
All reactions