This repository has been archived by the owner on Jan 9, 2023. It is now read-only.

Security Concerns #5

Closed
toovy opened this issue Dec 10, 2015 · 27 comments

@toovy

toovy commented Dec 10, 2015

Hi,

Great project! I've looked at the server side and I'm interested in how you handle security, since I've looked at CouchDB and the way you simply proxy it. Are you using the CouchDB role system? How are the different roles set up, via _design documents? I haven't found a setup script or anything like that.

BR toovy

@jkleinsc
Member

We are using roles, but right now at a very basic level. I would like to expand what we have to offer role based ACL. Our setup script currently lives here: https://github.com/HospitalRun/hospitalrun-frontend/blob/master/initcouch.sh

We are using CouchDB's _users database to manage the users for HospitalRun and assigning users different roles based on their roles within HospitalRun. If you look at https://github.com/HospitalRun/hospitalrun-frontend/blob/master/app/mixins/user-roles.js you can see the current user roles in our system. The roles array for each user role gets populated into the roles attribute on the corresponding CouchDB user. Right now those roles are really only used on the frontend to determine functionality there, but I want to make some changes on the CouchDB side to limit data access based on role. Does that make sense?

@toovy
Author

toovy commented Dec 10, 2015

Thanks for your answer, and yes, that makes sense. The conclusion is that currently you could manipulate the Ember client and get more data than your role allows. I've evaluated the possibilities for CouchDB ACL and did not really find a solution other than using one DB per role / group that is allowed to access the data. If you have different permissions that might change, that model is too inflexible, which is why I've been asking. Also, as far as I can see, the search has the same issue?!

@jkleinsc
Member

I ran across this the other day and am wondering if it would be useful:
https://github.com/ermouth/covercouch

You are right about search, but I think we could implement something there as well.

@toovy
Author

toovy commented Dec 15, 2015

@jkleinsc looks like what would be needed, but the implementation seems to be tricky. Also, with a proxy in between you might run into performance issues (if you query one million docs, covercouch loops one million times to filter out unwanted stuff, as far as I can see). But I guess you can't have everything: a performant, secure, offline-first Ember app ;)

@cquillen2003

How about one DB per user with filtered replication to/from a company-wide "master" DB? If a user's permissions change, delete the user's DB and re-build it from the company-wide DB.

Based on the links below, I think this may be how eHealth Africa handles this. Also the Hood.ie framework, which is built on CouchDB.

http://docs.hood.ie/en/hoodieverse/how-hoodie-works.html

https://github.com/eHealthAfrica/couchdb-bootstrap

@toovy
Author

toovy commented Dec 17, 2015

@cquillen2003 thanks for the examples. I understand that replication is the key. I'm new to CouchDB so I'm not quite sure if I understood the concept:

  • each user writes to the MasterDB, use validate_doc_update to prevent them from updating and inserting stuff they should not (is validate_doc_update also used on insert?)
  • each UserDB is replicated from the MasterDB using a filter function that prevents them from reading data that they are not allowed to see
  • users are only reading from their own UserDB

Or is it...

  • each user works completely on the UserDB
  • inserts and updates are replicated to the MasterDB
  • the MasterDB is replicated to other UserDBs via filter

Thanks for your advice.

@cquillen2003

@toovy The second pattern you described is what I meant. The MasterDB would not be read from or written to directly. It would be kept up to date via replication only as a way for a company's users to share data.
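A minimal sketch of that second pattern, assuming replication is driven by documents in CouchDB's `_replicator` database. The server URL, the database names and the `access/by_type` filter name are made up for illustration; only the field names (`source`, `target`, `continuous`, `filter`, `query_params`) are CouchDB's own.

```typescript
// Shape of a document stored in CouchDB's _replicator database.
interface ReplicationDoc {
  _id: string;
  source: string;
  target: string;
  continuous: boolean;
  filter?: string; // "designdoc/filtername", evaluated on the source
  query_params?: Record<string, string>; // exposed to the filter as req.query
}

// Build both replications for one user database.
// couchUrl, masterDb and the "access/by_type" filter are assumptions.
function replicationsForUser(
  couchUrl: string,
  masterDb: string,
  userDb: string,
  allowedTypes: string[]
): { toMaster: ReplicationDoc; fromMaster: ReplicationDoc } {
  const master = `${couchUrl}/${masterDb}`;
  const user = `${couchUrl}/${userDb}`;
  return {
    // Inserts and updates made in the UserDB flow up unfiltered.
    toMaster: {
      _id: `${userDb}-to-${masterDb}`,
      source: user,
      target: master,
      continuous: true,
    },
    // Only documents accepted by the filter flow down into the UserDB.
    fromMaster: {
      _id: `${masterDb}-to-${userDb}`,
      source: master,
      target: user,
      continuous: true,
      filter: "access/by_type",
      query_params: { types: allowedTypes.join(",") },
    },
  };
}

const aliceReplications = replicationsForUser(
  "http://localhost:5984",
  "master",
  "userdb-alice",
  ["patient", "visit"]
);
```

Because the upward replication is unfiltered in this sketch, any write restrictions would have to be enforced separately on the MasterDB.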

@jkleinsc Any thoughts on this? I have not tried this yet myself.

@jkleinsc
Member

@cquillen2003 yes, I have been debating whether or not per-user/per-role DBs are the solution here. My only concern is the scalability/performance of a solution like this. From what I've heard it should scale well, but I think there may be more considerations around how you deal with conflicts. I think those are solvable problems, but they need to be addressed.

@jkleinsc
Member

For posterity's sake, here is a conversation I had with @janl on this topic:

jkleinsc [11:00 AM]
Does anyone have any experience doing partial syncs between pouchdb and couchdb?

jan [11:06 AM]
how partial?

jkleinsc [11:13 AM]
I can’t do a database per user, but I need a way to limit how many records get synced between pouch and couch on a per user basis.

​[11:14]
I’m working on an open source HIS (hospitalrun.io) and in some cases I need users to sync just patient records, and in other cases users that, for example, sync just inventory records

​[11:14]
Does that make sense?

jan [11:16 AM]
yup, why can’t you do db-per-user?

jkleinsc [11:17 AM]
Because different people need access to the same records (eg a nurse and a doctor both need access to the same patient data)

jan [11:18 AM]
would this work: one large DB with all data, clients never connect to this, on top of that: db-per-user and filtered replication so that user-dbs only have the docs that particular user is allowed to see?

jkleinsc [11:24 AM]
hmm.. yes I guess that would work, but wouldn’t that dramatically increase server side storage, particularly with a large user base?

jan [11:24 AM]
I wouldn’t say 2x is dramatic :)

jkleinsc [11:28 AM]
@jan is this a scenario you have seen working in a production environment?

jan [11:34 AM]
yup

@cquillen2003

@jkleinsc Thanks for sharing! That's good to hear from @janl because he is a lead on CouchDB and Hood.ie as I understand it.

On another note, do you use the _users database? If so, do you allow certain end-users to add/remove users within their company/organization?

@MiguelMadero

This is useful. I will have to look at filtered replications.
I found nolanlawson/pouchdb-authentication useful, especially the section on authentication recipes.

@pgte
Contributor

pgte commented Jan 4, 2016

If these assumptions hold:

  • the access control per role fits (you don't need to partition the data per department), and
  • each user has only one role

...in my experience, a database per role can work quite well.

To be clear that we're talking about the same thing:

CouchDB:

  • There is a master database;
  • Only a server admin user can access the main database (no clients access that database directly);
  • There is one database per role;
  • For each role database, you set up filtered replication that decides which documents that role is allowed to access;
  • For each role database, we set the security setting to allow access to that role only
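As a rough illustration of the last bullet, here is a sketch of the object that would be written to a role database's `/_security` endpoint. The `role-` naming scheme and the role name used in the example are assumptions, not something the project defines.

```typescript
// Shape of the object stored at /{db}/_security in CouchDB.
interface SecurityObject {
  admins: { names: string[]; roles: string[] };
  members: { names: string[]; roles: string[] };
}

// Hypothetical naming scheme: CouchDB database names must be lowercase,
// so the role name is normalized before it is used as a suffix.
function roleDbName(role: string): string {
  return "role-" + role.toLowerCase().replace(/[^a-z0-9]+/g, "-");
}

// Only users holding `role` (and server admins) can read or write the database.
function securityForRole(role: string): SecurityObject {
  return {
    admins: { names: [], roles: [] },
    members: { names: [], roles: [role] },
  };
}

const nurseManagerDb = roleDbName("Nurse Manager");
const nurseManagerSecurity = securityForRole("Nurse Manager");
```

The main database could be locked down the same way by listing only the `_admin` role under `members.roles`, so that no regular client can open it.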

Node

Now we have more sources of conflicts, but these conflicts will be replicated into the main database and solved there. (no need to change the merging procedure).

Proxying

Node proxies the one database right now, but from now on:

  • user will not access the main database
  • user will access the database for his role

In this case we can make the forward middleware pick the correct database based on the user role and let the client always use the /db. Node then maps that URL base into the role database base.
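The path mapping could look roughly like the sketch below. It only shows the rewrite itself, not the forward middleware around it; the role names and database names are hypothetical, and the path is assumed to arrive without its query string.

```typescript
// Hypothetical mapping from a user's role to the database for that role.
const roleDatabases: Record<string, string> = {
  "Doctor": "role-doctor",
  "Inventory Manager": "role-inventory-manager",
};

// Rewrite a client path under /db to the database for the user's role.
// Returns null when the role has no database or the path is not under /db,
// so the caller can reject the request instead of proxying it.
function rewriteDbPath(path: string, role: string): string | null {
  if (!Object.prototype.hasOwnProperty.call(roleDatabases, role)) {
    return null;
  }
  if (path !== "/db" && path.indexOf("/db/") !== 0) {
    return null;
  }
  return "/" + roleDatabases[role] + path.slice("/db".length);
}

const doctorPath = rewriteDbPath("/db/_all_docs", "Doctor");
```

Returning null for unknown roles keeps the proxy from ever falling back to the main database.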

Client

Since the access policy is role-based and controlled by CouchDB, you only need to point it to the /db database and the correct database gets proxied.

This may get weird when the user switches roles. The client database now has documents that the role shouldn't access. The database needs to be cleared and then resynced. But if the role change happens in the middle of a sync, data may get lost.
I think role changing needs some thought.


What do you think? I'm willing to work on a PoC for this if you're OK with the above.

@jkleinsc
Member

jkleinsc commented Jan 4, 2016

@pgte sounds good to me. Go ahead on a POC. I think users switching roles will be the exception not the norm, but it is something to think about. Also, I am working on something in regards to how pouch/couch sync that I think may make that issue less of an issue.

@toovy
Author

toovy commented Jan 5, 2016

I had a look at filtered replications and hacked a small prototype. Replications are pretty easy to set up. Using nano on Node I could create new databases and set up the replications.

In detail the process is:

  1. generate a random DB name (prefixed uuidv4 in my case)
  2. create the new DB (that belongs to only the user)
  3. create a new user and set the DB name as a property
  4. set up the _security (e.g. in my case, put the user's name in the members.names array)
  5. set up the replications with the filter function
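The steps above can be sketched as a list of CouchDB HTTP requests. This is not the nano-based prototype, just an outline that builds the requests without sending them; the `userDb` property name, the `access/by_type` filter and all example values are assumptions.

```typescript
interface PlannedRequest {
  method: "PUT" | "POST";
  path: string;
  body?: unknown;
}

// Returns the CouchDB requests for steps 2 to 5; nothing is sent here.
// dbName is expected to come from step 1 (e.g. a prefixed UUID).
function provisioningPlan(
  couchUrl: string,
  masterDb: string,
  dbName: string,
  userName: string,
  password: string,
  filterParams: Record<string, string>
): PlannedRequest[] {
  return [
    // 2. create the database that belongs only to this user
    { method: "PUT", path: `/${dbName}` },
    // 3. create the user; "userDb" is a made-up property name
    {
      method: "PUT",
      path: `/_users/org.couchdb.user:${encodeURIComponent(userName)}`,
      body: { name: userName, password, roles: [], type: "user", userDb: dbName },
    },
    // 4. only this user is a member of the database
    {
      method: "PUT",
      path: `/${dbName}/_security`,
      body: {
        admins: { names: [], roles: [] },
        members: { names: [userName], roles: [] },
      },
    },
    // 5. filtered replication from the master into the user database
    {
      method: "POST",
      path: "/_replicator",
      body: {
        source: `${couchUrl}/${masterDb}`,
        target: `${couchUrl}/${dbName}`,
        continuous: true,
        filter: "access/by_type",
        query_params: filterParams,
      },
    },
  ];
}

const alicePlan = provisioningPlan(
  "http://localhost:5984",
  "master",
  "userdb-1234",
  "alice",
  "secret",
  { types: "patient,visit" }
);
```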

At first the filter function seemed useless, as the userCtx is not accessible. But when setting up the replication it is possible to add query_params that carry whatever parameters you need during filtering (not tested yet, still theoretical). E.g. you could add the user's roles, or some of the roles, or other properties of the user you need (Note: the replication needs to be changed if those params change afterwards).
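A sketch of what such a filter might look like, assuming documents carry a `type` field and the replication passes a comma-separated `types` value in query_params (both are assumptions). CouchDB calls filter functions with `(doc, req)` and exposes the replication's query_params as `req.query`; the function would be stored in a design document as plain JavaScript, so the type annotations here are only for readability.

```typescript
interface FilterDoc {
  _id: string;
  _deleted?: boolean;
  type?: string;
}

interface FilterRequest {
  query: Record<string, string | undefined>;
}

// Returns true when the document may be replicated to the target.
function byType(doc: FilterDoc, req: FilterRequest): boolean {
  // Tombstones usually carry no `type`. This sketch lets them through so that
  // deletions propagate, at the cost of revealing the ids of deleted documents.
  if (doc._deleted) {
    return true;
  }
  const allowed = (req.query["types"] || "").split(",");
  return (
    typeof doc.type === "string" &&
    doc.type !== "" &&
    allowed.indexOf(doc.type) !== -1
  );
}

const patientAllowed = byType(
  { _id: "p1", type: "patient" },
  { query: { types: "patient,visit" } }
);
```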

You could even go one step further than role-based databases. Assume multiple hospitals run the same instance (e.g. all belong to the same holding). You could create a master database that replicates into tenant databases, which replicate into e.g. department databases and further down into role-based user databases. Yes, each user has her own database. If the filter functions are used properly, that part ensures that users cannot read more than they are allowed to.

To ensure that they cannot create/update/delete certain entries validate_doc_update can be used. This function has access to the old doc, the new doc and the user context. That means you could do all checks you need.
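For illustration, a sketch of such a validation function. CouchDB calls it for every write, including inserts (where the old doc is null) and deletes (where the new doc has `_deleted` set), and a write is rejected by throwing an object with a `forbidden` or `unauthorized` key. The `type` field, the `inventory` type and the `Inventory Manager` role are assumptions; in a design document this would be plain JavaScript.

```typescript
interface UserCtx {
  db: string;
  name: string | null;
  roles: string[];
}

interface ValidatedDoc {
  _id: string;
  _deleted?: boolean;
  type?: string;
}

function validateDocUpdate(
  newDoc: ValidatedDoc,
  oldDoc: ValidatedDoc | null,
  userCtx: UserCtx
): void {
  if (userCtx.name === null) {
    throw { unauthorized: "Please log in." };
  }
  // Server admins may write anything.
  if (userCtx.roles.indexOf("_admin") !== -1) {
    return;
  }
  // On delete the new doc may be a bare tombstone, so prefer the old doc's type.
  const type = oldDoc ? oldDoc.type : newDoc.type;
  if (type === "inventory" && userCtx.roles.indexOf("Inventory Manager") === -1) {
    throw { forbidden: "Only inventory managers may change inventory documents." };
  }
  // An update must not move a document to another type.
  if (oldDoc && !newDoc._deleted && newDoc.type !== oldDoc.type) {
    throw { forbidden: "The type of a document cannot be changed." };
  }
}
```

A nurse inserting an inventory document would hit the first `forbidden` branch, because `oldDoc` is null and the type is read from the new document.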

Also that means that in the Ember app the database name for a user is known after login. So the remote database needs to be set after login, not during application start. Don't know if that is easily possible.

From a security point of view I get a good feeling about the fact that every user has her own database. As a relational database user I have mixed feelings about the redundancy. What do you think?

@pgte
Contributor

pgte commented Jan 6, 2016

@jkleinsc If possible, I now need to define the replication filter functions.
For each role, can you explain the criteria that decide whether a doc is replicated or not?

@jkleinsc
Member

jkleinsc commented Jan 6, 2016

@pgte I am working on the criteria per role. I am putting together a document for this.

@jkleinsc
Member

jkleinsc commented Jan 7, 2016

@pgte Here is a spreadsheet showing which user role has access to which keys in the system:
https://docs.google.com/spreadsheets/d/19Ls4cFSf1v5sFojQjzApiIqB6ryatLXBgdgwtVkqo1E/edit?usp=sharing
Over time this will probably change and at some point this may be dynamically built (eg per implementation you may define custom access per role)

Looking at the spreadsheet, there are some entities in the system that are pretty much used system wide. Also, I realized that because of the relationship between certain entities, certain roles need access to objects they otherwise wouldn't. For example, if a user has access to a patient and can see the medication prescribed to a patient then that user also needs access to inventory because medication is linked to inventory items. All of this is to say that I'm wondering how much of the data we can segregate by role.

@tangollama do you have any thoughts on this?

@tangollama
Member

For me, what's important to remember is what we're trying to accomplish when we talk about data security with offline, and I have an assertion that anyone should feel welcome to challenge: the primary problem we're concerned with is one of patient confidentiality.

If that assertion is correct, we might really be talking about only two types of offline database experiences:

  • one where the patient's name and location are hidden / unavailable
  • and another where the identifying data about a patient is visible

As @jkleinsc and I have talked about this more, we also need to acknowledge that (and I don't think we're going to have a problem doing this, given how we're implementing) some installations are not going to be comfortable with offline capabilities b/c of perceived or imagined security concerns or "developing" regulatory environments in some of these locations. As such, we're going to need to offer a site-wide config (and perhaps someday a user-specific config) to disallow offline. However, we should all be in agreement that "out of the box" HospitalRun allows offline storage of data.

So back to this issue. I think I'm proposing that there are two "versions" of the data that a user has access to, depending on role: one with hidden names and demographic data (where presumably patients would be "recalled" by Patient ID only, demographic data would not be editable, and any "malicious" edits to the data would be ignored by the couch/pouch sync), and another that allows edits to patient data.

How does that sit with everyone?

@pgte
Contributor

pgte commented Jan 8, 2016

@tangollama My belief is that, in CouchDB, when using filtered replication you can only decide whether to replicate a given document or not — you can't change that document when replicating.

One solution to this (off the top of my head) is to have 2 main databases:

  • main (the current one)
  • main-anonymized: containing all the records, but anonymized.

This last one would be the database that is accessible to the roles that don't have access to personalized data.

There would have to be a worker working on the changes feed, anonymizing the documents and feeding them into the anonymized database.
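The transformation step of such a worker might look like the sketch below. It covers only the anonymizing of rows as they would come from a changes feed requested with `include_docs=true`, not the feed subscription or the writes; the list of identifying fields is purely hypothetical.

```typescript
type AnyDoc = { _id: string; [key: string]: unknown };

interface ChangeRow {
  id: string;
  doc?: AnyDoc;
}

// Hypothetical list of fields that identify a patient.
const identifyingFields = ["firstName", "lastName", "address", "phone"];

// Copy a document without its identifying fields. _rev is dropped as well,
// because revisions in the anonymized database are independent of the main
// one; the worker would have to look up the target's current _rev itself.
function anonymize(doc: AnyDoc): AnyDoc {
  const copy: AnyDoc = { _id: doc._id };
  for (const key of Object.keys(doc)) {
    if (key !== "_rev" && identifyingFields.indexOf(key) === -1) {
      copy[key] = doc[key];
    }
  }
  return copy;
}

// Turn a batch of change rows into documents for the anonymized database.
function anonymizeChanges(rows: ChangeRow[]): AnyDoc[] {
  const result: AnyDoc[] = [];
  for (const row of rows) {
    if (row.doc) {
      result.push(anonymize(row.doc));
    }
  }
  return result;
}

const anonymizedBatch = anonymizeChanges([
  {
    id: "p1",
    doc: { _id: "p1", _rev: "1-abc", type: "patient", firstName: "Ann", bloodType: "O+" },
  },
]);
```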

Then, if you still want to partition data based on role, you can choose which of the "main" databases to replicate to the role-specific database.

@tangollama @jkleinsc what do you think?

@jkleinsc
Member

jkleinsc commented Jan 8, 2016

@pgte @tangollama I feel like I need to give this more thought, but here are some things that come to mind:

  1. I think instead of "anonymized" records and "regular" records, the better approach would be to put the data that needs to be anonymized in its own document and reference that document in the patient document. That way we could do filtered replication the way CouchDB intends it.

  2. I think I need to further refine the spreadsheet I shared to determine if we can streamline what records a user/role needs. As I have thought more about it, depending upon role, you have a set of records that you have read/write access to and another set of records that you have read-only access to. In the case of read-only records, you most likely do not need the children records of those records (for example a user who has access to a patient needs medication records, which need inventory records, but if the user cannot affect inventory levels, then that user wouldn't need access to inventory-location or inventory-purchase records, which are only used for fulfillment).

@pgte
Contributor

pgte commented Jan 10, 2016

@jkleinsc yes, it makes sense to have one separate document containing only the personalized data.
I'll then wait for you to come to a conclusion before proceeding.

@jkleinsc
Member

@pgte I realized that with a small change to the frontend (HospitalRun/hospitalrun-frontend#261) we can significantly limit who needs access to inventory records. This makes the permissions matrix much simpler. Also, I realized that certain roles overlap with the same data requirements, so where possible I combined those roles. Based on these two things, I updated the doc (https://docs.google.com/spreadsheets/d/19Ls4cFSf1v5sFojQjzApiIqB6ryatLXBgdgwtVkqo1E/edit?usp=sharing) with a more streamlined set of permissions per role that I think we can work with.

I think it is safe to proceed with this matrix and develop the POC from it.

As far as the personalized data on the patient, I think this is something that we handle as a separate issue, which will require some frontend work to part out private data vs public data. There is an issue in the frontend repo (HospitalRun/hospitalrun-frontend#220) which I think we can use to drive the frontend changes.

@jkleinsc
Member

Using https://github.com/cloudant-labs/envoy could be a possibility here. Something to investigate at some point.

@robertkeizer

What is the status of this at the moment? I would be interested in working on this if it is still a problem spot. I've spent weeks in the past dealing with pouchdb/couchdb partial replication.

@stale

stale bot commented Aug 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 7, 2019
@fox1t fox1t self-assigned this Aug 7, 2019
@fox1t
Member

fox1t commented Aug 7, 2019

This issue will be used as a knowledge base for the 2.0.0 version.

@stale

stale bot commented Oct 6, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 6, 2019
@fox1t fox1t closed this as completed Jan 14, 2020

8 participants