
Vault write can overwrite and accidentally lose data #182

Closed
danielcompton opened this issue May 10, 2015 · 21 comments

Comments

@danielcompton

It's quite likely I'm misunderstanding something, but I found this behaviour surprising.

$ vault write secret/hello value=world
Success! Data written to: secret/hello

$ vault write secret/hello user=john
Success! Data written to: secret/hello

$ vault read secret/hello
Key             Value
lease_id        secret/hello/75e7df09-5487-89b6-8bca-ee654a15e6bd
lease_duration  2592000
user            john

I would expect that writing two key-value pairs with different keys to the same record would keep both of them, not erase the first one. Is this intended behaviour, and if so should it be documented more clearly?
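The overwrite behavior described above can be sketched in a few lines (illustrative Python, not Vault's actual implementation):

```python
# Illustrative sketch (not Vault's code): the KV backend treats each write as
# a full replace of the value stored at a path, like an HTTP PUT.
store = {}

def vault_write(path, **pairs):
    store[path] = dict(pairs)  # replaces everything previously at this path

vault_write("secret/hello", value="world")
vault_write("secret/hello", user="john")
# store["secret/hello"] is now {"user": "john"}; the "value" key is gone.
```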

@danielcompton danielcompton changed the title Vault write will overwrite and lose data quite easily Vault write will overwrite and accidentally lose data quite easily May 11, 2015
@danielcompton danielcompton changed the title Vault write will overwrite and accidentally lose data quite easily Vault write can overwrite and accidentally lose data May 11, 2015
@armon armon closed this as completed in 2d9b12b May 11, 2015
@armon
Member

armon commented May 11, 2015

Thanks, I've updated the documentation! This is working as intended.

@emilsoman

These are the steps that I follow now to update secrets:

  1. vault read --format=json secret/xyz
  2. Copy paste value of "data" from above output to secrets.json
  3. Add new secret to secrets.json
  4. vault write secret/xyz @secrets.json

But sometimes, I forget to read the secrets (steps 1 and 2) before writing new secrets. I am to blame for not remembering the docs, but after a lot of context switches between devops and app development, it's easy to forget these steps.

@armon I'm curious to know why the default behavior is to delete existing key-value pairs.
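The read-merge-write workflow in the steps above amounts to the following client-side logic (a hypothetical sketch; in practice each step goes through the vault CLI):

```python
# Hypothetical client-side upsert: read the existing data, merge in the new
# pairs, then write the full value back. Mirrors steps 1-4 above.
def upsert(store, path, **pairs):
    merged = dict(store.get(path, {}))  # steps 1-2: read existing data
    merged.update(pairs)                # step 3: add the new secret(s)
    store[path] = merged                # step 4: write the full value back
```

Forgetting the read step is exactly what turns this into a destructive overwrite.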

@jefferai
Member

@emilsoman Each key -- a unique URI -- is associated with a value. When you write the value specified by a key, you are writing the full value as a PUT, following CRUD semantics which is the general model for Vault's operations.

The best approach if you find yourself forgetting to save existing data would be to spread that data out to different keys, with each key uniquely identifying one piece of data. For instance, you could split the keys in your JSON object to each map to a different key (URI) in Vault.
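Splitting the data out as suggested could look like this (paths and secret names here are illustrative, not from the original report):

```python
# Illustrative: split a JSON object so each datum maps to its own Vault path.
# Writing to one path then never clobbers the others.
secrets = {"DB_PASSWORD": "password", "API_TOKEN": "abc123"}  # example data

store = {}
for name, value in secrets.items():
    store[f"secret/my-app/{name}"] = {"value": value}
```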

@danielcompton
Author

The subtlety here is that you are storing a mapping of key to key/value pairs, in effect a nested map. The action you take feels more like upserting one of the key/value pairs, not overwriting the whole thing.

@emilsoman

I agree with @danielcompton .

Take the following example of setting a secret:

# App name : 'my-app'
# Secrets {DB_PASSWORD: password}

vault write secret/my-app/DB_PASSWORD value=password

Since Vault requires secrets to be stored as key-value pairs, I have to use the above command. Here the name "value" in value=password doesn't serve any purpose. It could be called key, dummy, etc. - it's just a placeholder for a key because Vault requires it. I kept doing this initially, but when multiple users are storing secrets, everyone has to standardize on the name of this placeholder key, which made me suspect this is not the idiomatic way of storing secrets and look at better approaches.

Since Vault allows me to read a single secret with the command vault read --field=DB_PASSWORD secret/my-app, I thought maybe that's the way to do it. So I converted all my secrets to a single hash nested under secret/my-app. But now, even though Vault treats secret/my-app as a hash and allows me to read secrets one by one using the --field option, it doesn't allow me to write secrets one after the other, which is causing the confusion. I was expecting an API similar to Heroku configs.

IMO it would avoid this confusion if Vault just treated secrets as strings instead of requiring them to be key-value pairs.

Which means:

vault write secret/my-app/DB_PASSWORD password
vault read secret/my-app/DB_PASSWORD
#instead of
vault write secret/my-app/DB_PASSWORD value=password
vault read --field=value secret/my-app/DB_PASSWORD

# For folks who still want to store JSON:
vault write secret/my-app  '{"DB_PASSWORD":"itsasecret"}'
vault read --field=DB_PASSWORD secret/my-app

Is there a reason why secrets have to be key-value pairs?

@jefferai
Member

The subtlety here is that you are storing a mapping of key to key/value pairs, in effect a nested map.

This is not accurate, in the sense that it is not a mapping of strings to arbitrary types like JSON. The key and value are both strings; we are storing a mapping of a string to a string. You can decide that you want that string to be JSON or XML or base64-encoded binary data or anything else that you can serialize and escape, but that's up to the user to both perform and to remember.

The -field option is not related to how items are stored or read in any backend. It is only a control on the CLI output and meant to allow people that want to use the CLI rather than the HTTP API to pipe values into other programs.
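The string-to-string model described above means any structure in the value is the client's responsibility to serialize on write and deserialize on read, for example:

```python
import json

# Illustrative: the stored value is just a string. If you want structure,
# you encode it yourself (JSON here) and decode it again after reading.
stored_value = json.dumps({"DB_PASSWORD": "itsasecret", "API_TOKEN": "abc"})

# ...later, after reading the string back:
data = json.loads(stored_value)
```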

@CMCDragonkai

If vault write corresponds to a PUT method, perhaps we can have a vault patch to correspond to a PATCH method for partial updating of a specific value?

@vishalnayak
Member

@CMCDragonkai, more information on why this is not done yet can be found in #1468

@sidd-kulk

Is it possible to retrieve a deleted secret? Is there a Recycle Bin for secrets?

@chrishoffman
Contributor

@sidcool1234 There is currently no way to retrieve a deleted secret without doing a restore from backup of the underlying data store. Keep track of issue #1364 for updates here.

@sidd-kulk

sidd-kulk commented Feb 5, 2018 via email

@compendius

Hi,

This is particularly confusing in the UI (new in OSS v 0.10) -

You can create a new 'secret', then you are asked to add a 'key' and 'value' to the secret.
You can then append the secret with multiple keys and values, clickity-click, easy and intuitive for users but massively misleading -
Unless you found this thread you may presume that the 'keys' within the secret could be programmatically read individually with CRUD semantics, when really the 'keys' within a secret are not keys at all. In actual fact the secret name is a genuine key and the 'key/value' within the secret is just a couple of mapped strings.

Perhaps the naming convention, or the UI or both need to be changed to make this clear?

@jantman
Contributor

jantman commented Apr 22, 2018

I know this is apparently closed, but this behavior is by far the single aspect of Vault that we've received the most complaints and user confusion from. I'd say that probably every person who we've onboarded to Vault - including all of our Vault support team - has suffered from the pretty abysmal UX of having to read a secret out to JSON on disk, edit it, and then re-write back to Vault to add another field.

I'm aware of how secrets are stored in the backend... but the bottom line is that many/most users seem to be using secret backend keys like they're nested K/V stores, and the UX of the vault read / vault write commands seems to encourage this...

@jefferai
Member

jefferai commented Apr 22, 2018

@jantman what exactly would your suggestion be to fix this? The naive approach is a PATCH API as in #1468, which has been asked for in the past, but one of the many problems with a PATCH proposal is the fact that Vault K/V endpoints are not a map of strings to strings, they are URLs to blobs of JSON data:

$ curl -X POST -H "X-Vault-Token: root" http://127.0.0.1:8200/v1/kv/foo -d '{"key1": 1, "key2": null, "key3": ["list"], "key4": [1, 2], "key5": {"subkey1": 1}}'
$ curl -s -X GET -H "X-Vault-Token: root" http://127.0.0.1:8200/v1/kv/foo | jq .
{
  "request_id": "c06322bd-e7ee-0269-cbb2-ec710396c936",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 2764800,
  "data": {
    "key1": 1,
    "key2": null,
    "key3": [
      "list"
    ],
    "key4": [
      1,
      2
    ],
    "key5": {
      "subkey1": 1
    }
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

A vault patch command makes it super easy to overwrite one value with a totally different type because the CLI formats all arguments as strings. If the value you get back on read is JSON, it's much easier to understand what the format of the new value should be and ensure you're passing it appropriately.

Another issue with PATCH is that there is no locking on values and no guarantee of client request ordering. Suppose you have:

{"key1": "foo", "key2": "bar", "key3": "zip", "key4": "zap"}

If one client tries to update key1 to foobar, and another client tries to update key1 to barfoo and key3 to zipzap you have two possible outcomes:

{"key1": "foobar", "key2": "bar", "key3": "zipzap", "key4": "zap"}

or

{"key1": "barfoo", "key2": "bar", "key3": "zipzap", "key4": "zap"}

Which of these is correct? The latter is consistent with the second client's wishes but has now lost the first client's update. The former is consistent with first client's wishes -- maybe, depending on if they had specifically intended key3 to stay the same or not -- but is not consistent with the second client's wishes.
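The race can be simulated in a few lines (plain Python, standing in for two unsynchronized clients doing full-value PUTs):

```python
# Simulates the lost-update problem: two clients read the same snapshot,
# modify it independently, and write the full value back with no version
# check. The last writer wins and silently discards the other's change.
store = {"kv/foo": {"key1": "foo", "key2": "bar", "key3": "zip", "key4": "zap"}}

client_a = dict(store["kv/foo"])  # client A reads
client_b = dict(store["kv/foo"])  # client B reads the same snapshot

client_a["key1"] = "foobar"                                # A's change
client_b["key1"] = "barfoo"; client_b["key3"] = "zipzap"   # B's changes

store["kv/foo"] = client_a  # A writes first
store["kv/foo"] = client_b  # B overwrites: A's update to key1 is lost
```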

Requiring a full update to the key means that the resulting value is at least consistent in the view of the client writing it. The new KVv2 that allows both versioning and check-and-set operations means that if two clients do write different values, it is easy to examine the versions and see what has changed, and check-and-set can ensure that any given value a client writes is changing the underlying version they thought it was changing.
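The check-and-set behavior described above can be sketched as a toy model (illustrative only, not KV v2's actual code):

```python
# Toy model of KV v2 check-and-set: a write carrying a `cas` version succeeds
# only if it matches the current stored version, so a client holding a stale
# version fails loudly instead of silently clobbering a newer write.
class VersionedKV:
    def __init__(self):
        self.data, self.version = {}, 0

    def write(self, value, cas=None):
        if cas is not None and cas != self.version:
            raise RuntimeError("check-and-set failed: version mismatch")
        self.data, self.version = dict(value), self.version + 1
        return self.version
```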

I realize that this is annoying for your users, but in the security community data consistency is often (I think usually, but I have nothing to back that up) considered to be an aspect of security, because of the business risks associated with inconsistent data. This is no different than other aspects of data management, such as disaster recovery and backups, because losing data is also a risk to the business.

I don't really follow what you're saying about nested K/V stores or the UX encouraging it, but hopefully the above explains things well enough.

(Edit: fixed some JSON .)

@jantman
Contributor

jantman commented Apr 22, 2018

Jeff,

I agree that PATCH would be difficult/problematic. I apologize, I always try to offer suggestions when I have one, but I'm not sure I do for this. All I really have is the behavior and confusion I've seen among our users.

What I meant about the UX is something that I've seen with nearly every person we've onboarded to Vault:

  1. Human user executes:

$ vault read secret/test
Key                 Value
---                 -----
refresh_interval    768h
password            bar
url                 http://example.com
username            foo

  2. Human user knows they can do things like vault read -field=password secret/test
  3. Human user wants to add another field to secret/test, say, "api_token".
  4. Human user executes vault write secret/test api_token=baz because, at least in the eyes of a majority of our users, that feels like the logical UX. They're presented with something that - at the CLI at least - looks like a nested dictionary/mapping/hash with string keys and values, and they want to write a new key/value under that path.

I think part of this has to do with the mixed nomenclature of "key" (which I know has been discussed before) and the fact that there are (were?) places that referred to a path within the "key/value" backend as a "key". The other part is that your reply regarding the actual storage schema makes perfect sense to people who have worked with Vault's API and understand at least some of the internals of the KV backend, but it doesn't seem terribly intuitive for people who have only interacted with Vault via vault read and vault write. The UX of those commands, even the way arguments are taken in and the data is presented, seems to lead users to the conclusion that each path within KV is a mapping of string keys to string values, which can be read individually, and therefore it seems initially logical that they can also be written individually.

Regarding data types: in my experience most of our users who use the vault CLI, and have only used the CLI, probably don't even know that a path is arbitrary JSON. The UX of vault read and vault write present it as a mapping of string keys to string values, and that's how most people seem to be using it.

Maybe this isn't an API issue, but just a CLI issue? After running into this problem a few times myself, I have a trivial helper script (vault-append) for this; it reads the current value of the secret, updates data with the arguments I passed to the script, and writes it back. Yes, this is constrained to string keys and string values, but that's almost exclusively how we structure items in the KV backend so they'll play nicely with any API client or the CLI. For me this also solves the atomic operation issue, at least enough for low-frequency human updates.

@jefferai
Member

I agree that PATCH would be difficult/problematic. I apologize, I always try to offer suggestions when I have one, but I'm not sure I do for this.

The problem is, nobody does :-) So if nobody has a better solution, de facto this is currently the best available solution.

What I meant about the UX is something that I've seen with nearly every person we've onboarded to Vault:

Not to be blaming the victim here, but...is it possible that your onboarding needs to be more explicit/thorough about this behavior up front?

I think part of this has to do with the mixed nomenclature of "key" (which I know has been discussed before) and the fact that there are (were?) places that referred to a path within the "key/value" backend as a "key".

We've been trying to move people towards referring to the URI address as path, where at a path it is a key/value store (which is true because the data is expected to be a JSON dict, which is by definition key/value, just with the value being any valid JSON type). I'm sure our documentation could use more updating (feel free to PR changes, too!), but there are also some legacy issues at play.

Maybe this isn't an API issue, but just a CLI issue?

It is absolutely a CLI issue. On the CLI, unless you are passing in data via stdin or @ syntax, the best we can do for any generic path in Vault is to treat that value as a string. We jump through a ton of hoops on the API to be able to parse values in ways where they can either be input as a string or some other JSON type -- for instance, a field definition of type CommaStringSlice which allows you to input a slice of strings (for JSON) or comma-separated values (for CLI users), and DurationSecond where you can pass in a string like 12m or an integer number of seconds like 720. This is all to make Vault consumable via a string-only interface like a CLI and JSON via everything else.

The problem is just that Vault has a JSON API, and CLIs necessarily are text-based. Unless we want to force people to write valid JSON in their CLI commands, which we don't, there are tradeoffs to making things generally easier. You are speaking specifically of the K/V engine, but remember that the CLI has to be generally usable for all Vault APIs.

I do have one suggestion for a way forward, under certain conditions. Given that KVv2 has a check-and-set capability and given the new vault kv command which is specific to the KV engines, we could potentially (and, keep in mind, I just thought of this and haven't fully thought through all the ramifications, so it might be a terrible idea) enable a vault kv patch that will download the current value of a path, modify the parameters given, and perform a write -- but as a safety guard, will only work for KVv2 engine paths (or global config) that have check-and-set enabled. That would make the UX nicer, but the tradeoff is that we'd require consistency, so we could all be kind of happy. (Ping @briankassouf for thoughts and maybe last-minute inclusion into 0.10.1.)
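The proposed patch flow amounts to a read-merge-write guarded by check-and-set. A hypothetical client-side sketch (the read_fn/write_fn callables are illustrative stand-ins, not the eventual implementation):

```python
# Hypothetical sketch of a patch-with-CAS flow: read the current value and its
# version, merge in only the fields being changed, then write back with
# check-and-set so a concurrent change fails loudly instead of being
# silently overwritten.
def kv_patch(read_fn, write_fn, updates):
    data, version = read_fn()             # current value and its version
    merged = dict(data)
    merged.update(updates)                # apply only the changed fields
    return write_fn(merged, cas=version)  # fails if the version moved meanwhile
```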

@jefferai
Member

@briankassouf the main issue I can think of with this proposal is that iirc we don't return whether check and set is required on read. But we could memoize the global config in an atomic and we look up metadata for a key on read anyways (I think?), so it'd be trivial to return. The other bit is that write only clients would be unable to use it, but that's not going to be the general case in user interactive scenarios, and CAS mostly makes this assumption already.

@jefferai
Member

#4432

@ksupriyasingh

Is there any API that can return a single value? For example, if I have stored
{"key1":"value1","key2":"value2","key3":"value3"}, is there an API where querying with 'key2' returns its value 'value2', and similarly querying with 'key1' returns 'value1'?

@chrishoffman
Contributor

@ksupriyasingh There is not.

Also, please post questions on the mailing list (https://groups.google.com/forum/m/#!forum/vault-tool) rather than in GitHub issues.

@cjimmy

cjimmy commented Aug 6, 2020

Apologies for adding to a closed issue. @jantman's take on the user's expectations is accurate, and aside from creating a PATCH method or any other technical solution, the least you can do is update the documentation, with the assumption that some users are expecting a PATCH-like behavior. A step up from including this in documentation is a simple warning in the CLI that contents will be overwritten at that path. It would probably prevent many novices from overwriting their team's keys on their first write.

TBH, I was surprised when this happened, and expected there to be some explanation of this in the documentation. Surely, there must be a flag I'm missing in this command, or I'm doing something wrong. Nope, after reading this, that's the intended behavior.
