Security Concerns #5

toovy · 2015-12-10T13:52:14Z

Hi,

Great project! I've looked at the server side and I'm interested in how you handle security, as I've looked at CouchDB and the way you simply proxy it. Are you using the CouchDB role system? How are the different roles set up, via _design documents? I haven't found a setup script or anything similar.

BR toovy

jkleinsc · 2015-12-10T14:16:08Z

We are using roles, but right now at a very basic level. I would like to expand what we have to offer role-based ACL. Our setup script currently lives here: https://github.com/HospitalRun/hospitalrun-frontend/blob/master/initcouch.sh

We are using CouchDB's _users database to manage the users for HospitalRun and assigning users different roles based on their roles within HospitalRun. If you look at https://github.com/HospitalRun/hospitalrun-frontend/blob/master/app/mixins/user-roles.js you can see the current user roles in our system. The roles array for each user role gets populated into the roles attribute on the corresponding CouchDB user. Right now those roles are really only used on the frontend to determine functionality there, but I want to make some changes on the CouchDB side to limit data access based on role. Does that make sense?
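
For readers unfamiliar with the _users database, here is a minimal sketch of the kind of user document described above. The `org.couchdb.user:` id prefix and the `name`, `type`, `roles`, and `password` fields are standard CouchDB; the `buildUserDoc` helper and the role names are made up for illustration and are not taken from user-roles.js.

```javascript
// Hypothetical helper: builds a CouchDB _users document for one user.
function buildUserDoc(name, password, roles) {
  return {
    _id: 'org.couchdb.user:' + name, // id format CouchDB requires in _users
    name: name,
    type: 'user',
    roles: roles.slice(),            // copied so the caller's array is not shared
    password: password               // CouchDB replaces this with a hash on save
  };
}

// Role names here are placeholders, not the real HospitalRun roles.
const doc = buildUserDoc('nurse1', 'secret', ['Nurse', 'user']);
console.log(doc._id);
```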

toovy · 2015-12-10T14:41:31Z

Thanks for your answer, and yes, that makes sense. The conclusion is that currently someone could manipulate the Ember client and get more data than their role is allowed to see. I've evaluated CouchDB ACLs and did not really find a solution other than using one DB per role / group that is allowed to access the data. If you have different permissions that might change, that model is too inflexible; that's why I've been asking. Also, as far as I can see, search has the same issue?!

jkleinsc · 2015-12-10T14:54:54Z

I ran across this the other day and am wondering if it would be useful:
https://github.com/ermouth/covercouch

You are right about search, but I think we could implement something there as well.

toovy · 2015-12-15T13:37:12Z

@jkleinsc that looks like what would be needed, though the implementation seems to be tricky. Also, with a proxy in between you might run into performance issues (if you query one million docs, covercouch loops one million times to filter out unwanted docs, as far as I can see). But I guess you can't have everything: a performant, secure, offline-first Ember app ;)

cquillen2003 · 2015-12-17T09:03:16Z

How about one DB per user with filtered replication to/from a company-wide "master" DB? If a user's permissions change, delete the user's DB and re-build it from the company-wide DB.

Based on the links below, I think this may be how eHealth Africa handles this. Also the Hood.ie framework, which is built on CouchDB.

http://docs.hood.ie/en/hoodieverse/how-hoodie-works.html

https://github.com/eHealthAfrica/couchdb-bootstrap
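
To make the "filtered replication to/from a master DB" idea concrete, here is a rough sketch of the two replication documents one user would need. The `source`, `target`, `filter`, and `continuous` fields are standard for CouchDB replication documents; the database names, the document ids, and the `access/for_user` filter name are assumptions for illustration only.

```javascript
// Returns the two replication documents for one user (a sketch, not HospitalRun code).
function buildUserReplications(masterDb, userDb) {
  return [
    {
      _id: 'pull-' + userDb,
      source: masterDb,
      target: userDb,
      filter: 'access/for_user', // hypothetical "designdoc/filtername" on the source
      continuous: true
    },
    {
      _id: 'push-' + userDb,     // user edits flow back to the shared DB unfiltered
      source: userDb,
      target: masterDb,
      continuous: true
    }
  ];
}

const docs = buildUserReplications('master', 'userdb-alice');
console.log(docs.map(function (d) { return d._id; }));
```

Rebuilding after a permission change would then mean deleting the user DB, recreating it, and letting the pull replication run again.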

toovy · 2015-12-17T09:56:26Z

@cquillen2003 thanks for the examples. I understand that replication is the key. I'm new to CouchDB so I'm not quite sure if I understood the concept:

- each user writes to the MasterDB; validate_doc_update prevents them from updating and inserting things they should not (is validate_doc_update also used on insert?)
- each UserDB is replicated from the MasterDB using a filter function that prevents them from reading data that they are not allowed to see
- users only read from their own UserDB

Or is it...

- each user works completely on the UserDB
- inserts and updates are replicated to the MasterDB
- the MasterDB is replicated to the other UserDBs via a filter

Thanks for your advice.
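
On the insert question: CouchDB runs validate_doc_update on every write to the database, including creates and deletes. For a new document the old document argument is null, and for a delete the new document carries `_deleted: true`. A minimal sketch, where the `type` field and the admin-only delete rule are assumptions for illustration:

```javascript
// Shape of a validate_doc_update function as it could be stored in a design document.
// "var" is used because older CouchDB JavaScript engines predate ES2015.
function validateDocUpdate(newDoc, oldDoc, userCtx) {
  var isInsert = !oldDoc;                    // oldDoc is null for a brand new document
  var isDelete = newDoc._deleted === true;   // deletes arrive as a doc with _deleted
  var isAdmin = userCtx.roles.indexOf('_admin') !== -1;

  if (isDelete) {
    // Assumed rule: only server admins may delete.
    if (!isAdmin) { throw { forbidden: 'only admins may delete documents' }; }
    return;
  }
  // Assumed rule: every document carries a "type" field.
  if (!newDoc.type) { throw { forbidden: 'documents need a type' }; }
  if (!isInsert && oldDoc.type !== newDoc.type) {
    throw { forbidden: 'type may not change' };
  }
}
```

Throwing an object with a `forbidden` key is how the function rejects a write; returning normally accepts it.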

cquillen2003 · 2015-12-17T17:52:55Z

@toovy The second pattern you described is what I meant. The MasterDB would not be read from or written to directly. It would be kept up to date via replication only as a way for a company's users to share data.

@jkleinsc Any thoughts on this? I have not tried this yet myself.

jkleinsc · 2015-12-17T18:21:42Z

@cquillen2003 yes, I have been debating whether per-user/per-role DBs are the solution here. My only concern is the scalability/performance of a solution like this. From what I've heard it should scale well, but I think there may be more considerations around how you deal with conflicts. I think those are solvable problems, but they need to be addressed.

jkleinsc · 2015-12-17T21:19:57Z

For posterity's sake, here is a conversation I had with @janl on this topic:

jkleinsc [11:00 AM]
Does anyone have any experience doing partial syncs between pouchdb and couchdb?

jan [11:06 AM]
how partial?

jkleinsc [11:13 AM]
I can’t do a database per user, but I need a way to limit how many records get synced between pouch and couch on a per user basis.

[11:14]
I’m working on an open source HIS (hospitalrun.io) and I need to be able, in some cases, to have users just sync patient records and, in other cases, to have users that just sync inventory records, for example

[11:14]
Does that make sense?

jan [11:16 AM]
yup, why can’t you do db-per-user?

jkleinsc [11:17 AM]
Because different people need access to the same records (eg a nurse and a doctor both need access to the same patient data)

jan [11:18 AM]
would this work: one large DB with all data, clients never connect to this, on top of that: db-per-user and filtered replication so that user-dbs only have the docs that particular user is allowed to see?

jkleinsc [11:24 AM]
hmm.. yes I guess that would work, but wouldn’t that dramatically increase server side storage, particularly with a large user base?

jan [11:24 AM]
I wouldn’t say 2x is dramatic :)

jkleinsc [11:28 AM]
@jan is this a scenario you have seen working in a production environment?

jan [11:34 AM]
yup

cquillen2003 · 2015-12-17T22:58:37Z

@jkleinsc Thanks for sharing! That's good to hear from @janl because he is a lead on CouchDB and Hood.ie as I understand it.

On another note, do you use the _users database? If so, do you allow certain end-users to add/remove users within their company/organization?

MiguelMadero · 2015-12-21T06:49:45Z

This is useful. I will have to look at filtered replications.
I found nolanlawson/pouchdb-authentication useful, especially the section on authentication recipes.

pgte · 2016-01-04T11:47:44Z

If these assumptions hold:

- the access control per role fits (you don't need to partition the data per department), and
- each user has only one role

...in my experience, a database per role can work quite well.

To be clear that we're talking about the same thing:

CouchDB:

- There is a master database;
- Only a server admin user can access the main database (no clients access that database directly);
- There is one database per role;
- For each role database, you set up filtered replication that decides which documents that role is allowed to access;
- For each role database, we set the security setting to allow access to that role only.
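
As a rough illustration of the last point, this is the shape of the security object that would be PUT to a role database's `_security` endpoint. The `admins`/`members` structure with `names` and `roles` arrays is standard CouchDB; the `buildRoleSecurity` helper and the role name are hypothetical.

```javascript
// Builds a _security object that admits only users holding the given role.
function buildRoleSecurity(role) {
  return {
    // No database admins are named, so only server admins can administer the DB.
    admins: { names: [], roles: [] },
    // Any user whose roles array contains this role may read and write documents.
    members: { names: [], roles: [role] }
  };
}

console.log(JSON.stringify(buildRoleSecurity('Nurse')));
```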

Node

Now we have more sources of conflicts, but these conflicts will be replicated into the main database and solved there. (no need to change the merging procedure).

Proxying

Node proxies the one database right now, but from now on:

- the user will not access the main database
- the user will access the database for his role

In this case we can make the forward middleware pick the correct database based on the user role and let the client always use /db. Node then maps that URL base into the role database base.
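
A minimal sketch of that mapping step, assuming each user has exactly one role and that role databases are named `role-<rolename>` (the naming scheme and the `mapToRoleDb` helper are assumptions, not the existing middleware):

```javascript
// Maps a client-facing path under /db to the matching path on the role database.
// Returns null when the path is not a database request.
function mapToRoleDb(path, role) {
  const prefix = '/db';
  if (path !== prefix && !path.startsWith(prefix + '/')) {
    return null;
  }
  // CouchDB database names must be lowercase, so normalize the role name.
  const dbName = 'role-' + role.toLowerCase().replace(/[^a-z0-9_-]/g, '-');
  return '/' + dbName + path.slice(prefix.length);
}

console.log(mapToRoleDb('/db/_all_docs', 'Nurse')); // /role-nurse/_all_docs
```

The role must come from the authenticated session on the server, never from anything the client sends, otherwise the proxy adds no protection.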

Client

Since the access policy is role-based and controlled by CouchDB, you only need to point it to the /db database and the correct database gets proxied.

This may get weird when the user switches roles. The client database now has documents that the role shouldn't access. The database needs to be cleared and then resynced. But if the role change happens in the middle of a sync, data may get lost.
I think role changing needs some thought.

What do you think? I'm willing to work on a PoC for this if you're OK with the above.

jkleinsc · 2016-01-04T13:02:58Z

@pgte sounds good to me. Go ahead on a POC. I think users switching roles will be the exception not the norm, but it is something to think about. Also, I am working on something in regards to how pouch/couch sync that I think may make that issue less of an issue.

toovy · 2016-01-05T08:55:53Z

I had a look at filtered replications and hacked together a small prototype. Replications are pretty easy to set up. Using nano on Node I could create new databases and set up the replications.

In detail the process is:

1. generate a random DB name (prefixed uuidv4 in my case)
2. create the new DB (that belongs only to the user)
3. create a new user and set the DB name as a property
4. set up the _security (e.g. in my case set the user's name in the members.names array)
5. set up the replications with the filter function
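
The steps above can be sketched as a plan of CouchDB HTTP requests. This is not the prototype itself: it only builds the requests instead of sending them, it uses random hex from Node's crypto module rather than a uuidv4, and the `userdb-` prefix, the `db` property on the user document, and the `access/for_user` filter name are all assumptions.

```javascript
const { randomBytes } = require('crypto');

// Builds the ordered list of requests needed to provision one user.
function planUserProvisioning(userName, password, masterDb) {
  // Step 1: random, prefixed database name (names must start with a lowercase letter).
  const dbName = 'userdb-' + randomBytes(16).toString('hex');
  return [
    // Step 2: create the database.
    { method: 'PUT', path: '/' + dbName },
    // Step 3: create the user and remember the database name on the user document.
    {
      method: 'PUT',
      path: '/_users/org.couchdb.user:' + encodeURIComponent(userName),
      body: { name: userName, type: 'user', roles: [], password: password, db: dbName }
    },
    // Step 4: restrict the database to this one user.
    {
      method: 'PUT',
      path: '/' + dbName + '/_security',
      body: { admins: { names: [], roles: [] }, members: { names: [userName], roles: [] } }
    },
    // Step 5: filtered replication from the master database into the user database.
    {
      method: 'POST',
      path: '/_replicator',
      body: { source: masterDb, target: dbName, filter: 'access/for_user', continuous: true }
    }
  ];
}

console.log(planUserProvisioning('alice', 'secret', 'master').map(function (r) { return r.path; }));
```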

At first the filter function seemed useless, as the userCtx is not accessible. But when setting up the replication it is possible to add query_params that can be filled with the parameters you need while filtering (not tested yet, still theoretical). E.g. you could add the user's roles, or a part of the roles, or other properties of the user you need (Note: the replication needs to be changed if those params are changed afterwards).
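
A minimal sketch of such a filter function. CouchDB calls a filter as `function(doc, req)` and exposes the replication's query_params on `req.query`; the `types` parameter, the `type` field on documents, and the choice to let all deletions through are assumptions for illustration.

```javascript
// Filter function as it could be stored (as a string) in a design doc on the source DB.
function byAllowedTypes(doc, req) {
  // Never hand design documents to user databases.
  if (doc._id.indexOf('_design/') === 0) { return false; }
  // Deletion stubs usually carry no "type", so pass them or deletes never replicate.
  if (doc._deleted) { return true; }
  // query_params arrive as strings, so the allowed types are comma-separated.
  var allowed = (req.query.types || '').split(',');
  return allowed.indexOf(doc.type) !== -1;
}

// The replication document would carry: query_params: { types: 'patient,visit' }
var req = { query: { types: 'patient,visit' } };
console.log(byAllowedTypes({ _id: 'p1', type: 'patient' }, req));   // true
console.log(byAllowedTypes({ _id: 'i1', type: 'inventory' }, req)); // false
```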

You could even go one step further than role-based databases. Assume multiple hospitals run the same instance (e.g. all belong to the same holding). You could create a master database that replicates into tenant databases, which replicate into e.g. department databases and further down into role-based user databases. Yes, each user has her own database. If the filter functions are used properly, that part ensures that users cannot read more than they are allowed to.

To ensure that they cannot create/update/delete certain entries validate_doc_update can be used. This function has access to the old doc, the new doc and the user context. That means you could do all checks you need.
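
As a sketch of what such a check could look like when it is driven by the user context, the function below allows a write only if one of the user's roles may write that document type. The role names, type names, and the `writable` table are invented for illustration; the real permissions would come from the project's own role matrix.

```javascript
// Role-based validate_doc_update sketch ("var" for older CouchDB JavaScript engines).
function validateByRole(newDoc, oldDoc, userCtx) {
  // Hypothetical table: which document types each role may create, update, or delete.
  var writable = { Pharmacist: ['medication'], Nurse: ['patient', 'visit'] };

  if (userCtx.roles.indexOf('_admin') !== -1) { return; } // server admins may do anything

  // A deletion stub has no type of its own, so fall back to the stored document.
  var type = (newDoc._deleted && oldDoc) ? oldDoc.type : newDoc.type;
  var allowed = userCtx.roles.some(function (role) {
    return (writable[role] || []).indexOf(type) !== -1;
  });
  if (!allowed) {
    throw { forbidden: 'your role may not write documents of type ' + type };
  }
}
```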

Also that means that in the Ember app the database name for a user is known after login. So the remote database needs to be set after login, not during application start. Don't know if that is easily possible.

From a security point of view I get a good feeling about the fact that every user has her own database. As a relational database user I have mixed feelings about the redundancy. What do you think?

pgte · 2016-01-06T11:50:18Z

@jkleinsc If possible, I now need to define the replication filter functions.
For each role, can you explain the criteria for whether each doc should be replicated?

jkleinsc · 2016-01-06T21:01:59Z

@pgte I am working on the criteria per role. I am putting together a document for this.

jkleinsc · 2016-01-07T21:16:52Z

@pgte Here is a spreadsheet showing which user role has access to which keys in the system:
https://docs.google.com/spreadsheets/d/19Ls4cFSf1v5sFojQjzApiIqB6ryatLXBgdgwtVkqo1E/edit?usp=sharing
Over time this will probably change and at some point this may be dynamically built (eg per implementation you may define custom access per role)

Looking at the spreadsheet, there are some entities in the system that are pretty much used system wide. Also, I realized that because of the relationship between certain entities, certain roles need access to objects they otherwise wouldn't. For example, if a user has access to a patient and can see the medication prescribed to a patient then that user also needs access to inventory because medication is linked to inventory items. All of this is to say that I'm wondering how much of the data we can segregate by role.
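
One way to reason about this is to treat the links between document types as a graph and compute everything a role ends up needing. A small sketch, where the `dependsOn` table is an assumption that only mirrors the patient, medication, and inventory example above:

```javascript
// Assumed links between document types: a type needs the types it references.
const dependsOn = {
  patient: ['medication'],
  medication: ['inventory'],
  inventory: []
};

// Returns every type reachable from the starting types, sorted by name.
function requiredTypes(startTypes, graph) {
  const seen = new Set();
  const stack = startTypes.slice();
  while (stack.length > 0) {
    const type = stack.pop();
    if (seen.has(type)) { continue; }
    seen.add(type);
    (graph[type] || []).forEach(function (dep) { stack.push(dep); });
  }
  return Array.from(seen).sort();
}

// A role that starts with patient access ends up needing all three types.
console.log(requiredTypes(['patient'], dependsOn).join(', ')); // inventory, medication, patient
```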

@tangollama do you have any thoughts on this?

tangollama · 2016-01-08T03:20:06Z

For me, what's important to remember is what we're trying to accomplish when we talk about data security with offline, and I have an assertion that anyone should feel welcome to challenge: the primary problem we're concerned with is one of patient confidentiality.

If that assertion is correct, we might really be talking about only two types of offline database experiences:

- one where the patient's name and location are hidden / unavailable
- and another where the identifying data about a patient is visible

As @jkleinsc and I have talked about this more, we also need to acknowledge that (and I don't think we're going to have a problem doing this, given how we're implementing) some installations are not going to be comfortable with offline capabilities b/c of perceived or imagined security concerns or "developing" regulatory environments in some of these locations. As such, we're going to need to offer a site-wide config (and perhaps someday a user-specific config) to disallow offline. However, we should all be in agreement that "out of the box" HospitalRun allows offline storage of data.

So back to this issue. I think I'm proposing that there are two "versions" of the data that a user has access to, depending on role: one with hidden names and demographic data (where presumably patients would be "recalled" by Patient ID only, demographic data would not be editable, and any "malicious" edits to the data would be ignored by the couch/pouch sync), and another that allows edits to patient data.

How does that sit with everyone?

pgte · 2016-01-08T09:35:36Z

@tangollama My belief is that, in CouchDB, when using filtered replication you can only decide whether to replicate a given document or not — you can't change that document when replicating.

One solution to this (off the top of my head) is to have 2 main databases:

main (the current one)
main-anonymized: containing all the records, but anonymized.

This last one would be the database that is accessible to the roles that don't have access to personalized data.

There would have to be a worker working on the changes feed, anonymizing the documents and feeding them into the anonymized database.

Then, if you still want to partition data based on role, you can choose which of the "main" databases to replicate to the role-specific database.
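
A rough sketch of the transformation such a worker would apply to each changed document before writing it to the anonymized database. The list of identifying fields is an assumption (the real patient schema may differ), and handling of deletions and of revisions in the target database is left out.

```javascript
// Assumed names of identifying fields on a patient document.
const IDENTIFYING_FIELDS = ['firstName', 'lastName', 'address', 'phone'];

// Returns a copy of the document without identifying fields.
function anonymize(doc) {
  const copy = {};
  Object.keys(doc).forEach(function (key) {
    // _rev belongs to the source database; the worker would have to look up
    // the current revision in the anonymized database before saving.
    if (key === '_rev') { return; }
    if (IDENTIFYING_FIELDS.indexOf(key) === -1) {
      copy[key] = doc[key];
    }
  });
  return copy;
}

// A changes feed row requested with include_docs=true carries the document in "doc".
const change = { id: 'p1', doc: { _id: 'p1', _rev: '3-abc', type: 'patient', firstName: 'Ada', sex: 'F' } };
console.log(anonymize(change.doc));
```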

@tangollama @jkleinsc what do you think?

jkleinsc · 2016-01-08T19:41:37Z

@pgte @tangollama I feel like I need to give this more thought, but here are some things that come to mind:

- I think instead of "anonymized" records and "regular" records the better approach would be to put data that needs to be anonymized in its own document and reference that document in the patient document. That way we could do filtered replication the way CouchDB intends it.
- I think I need to further refine the spreadsheet I shared to determine if we can streamline what records a user/role needs. As I have thought more about it, depending upon role, you have a set of records that you have read/write access to and another set of records that you have read-only access to. In the case of read-only records, you most likely do not need the children records of those records (for example a user who has access to a patient needs medication records, which need inventory records, but if the user cannot affect inventory levels, then that user wouldn't need access to inventory-location or inventory-purchase records, which are only used for fulfillment).
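
To illustrate the first point, here is a sketch of splitting a patient document into a public part and a referenced private part, plus a filter that keeps the private part out of some replications. The `patient-private` type, the `privateRef` field, the id scheme, and the field names are all assumptions.

```javascript
// Splits one patient document into a public document and a private document.
function splitPatient(patient, privateFields) {
  const privateId = 'private_' + patient._id; // hypothetical id scheme
  const publicDoc = { privateRef: privateId };
  const privateDoc = { _id: privateId, type: 'patient-private' };
  Object.keys(patient).forEach(function (key) {
    if (privateFields.indexOf(key) !== -1) {
      privateDoc[key] = patient[key];
    } else {
      publicDoc[key] = patient[key]; // includes _id, _rev, and type
    }
  });
  return { publicDoc: publicDoc, privateDoc: privateDoc };
}

// Filter for replications to roles that must not see identifying data.
function withoutPrivateDocs(doc, req) {
  return doc.type !== 'patient-private';
}

const parts = splitPatient({ _id: 'p1', type: 'patient', firstName: 'Ada', sex: 'F' }, ['firstName']);
console.log(withoutPrivateDocs(parts.publicDoc, {}), withoutPrivateDocs(parts.privateDoc, {})); // true false
```

Because the filter only decides whether a whole document is replicated, this layout avoids having to modify documents during replication.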

pgte · 2016-01-10T17:00:01Z

@jkleinsc yes, it makes sense to have a separate document containing only the personalized data.
I'll then wait for you to come to a conclusion before proceeding.

jkleinsc · 2016-01-14T21:02:39Z

@pgte I realized that with a small change to the frontend (HospitalRun/hospitalrun-frontend#261) we can significantly limit who needs access to inventory records. This makes the permissions matrix much simpler. Also, I realized that certain roles overlap with the same data requirements, so where possible I combined those roles. Based on these two things, I updated the doc (https://docs.google.com/spreadsheets/d/19Ls4cFSf1v5sFojQjzApiIqB6ryatLXBgdgwtVkqo1E/edit?usp=sharing) with a more streamlined set of permissions per role that I think we can work with.

I think it is safe to proceed with this matrix and develop the POC from it.

As far as the personalized data on the patient, I think this is something that we handle as a separate issue, which will require some frontend work to part out private data vs public data. There is an issue in the frontend repo (HospitalRun/hospitalrun-frontend#220) which I think we can use to drive the frontend changes.

jkleinsc · 2016-07-11T19:01:01Z

Using https://github.com/cloudant-labs/envoy could be a possibility here. Something to investigate at some point.

robertkeizer · 2017-07-14T14:19:56Z

What is the status of this at the moment? I would be interested in working on this if it is still a problem spot. I've spent weeks in the past dealing with pouchdb/couchdb partial replication.

stale · 2019-08-07T10:02:08Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

fox1t · 2019-08-07T20:47:10Z

This issue will be used as a knowledge base for the 2.0.0 version.

stale · 2019-10-06T21:25:49Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jkleinsc mentioned this issue Jan 14, 2016

Change inventory relationship on medication to be async HospitalRun/hospitalrun-frontend#261

Closed

jkleinsc mentioned this issue Jan 14, 2016

Encryption for patient data HospitalRun/hospitalrun-frontend#220

Closed

pgte mentioned this issue Jan 18, 2016

server duplication HospitalRun/hospitalrun-frontend#264

Closed

jkleinsc mentioned this issue Feb 1, 2016

Allow partial offline usage HospitalRun/hospitalrun-frontend#61

Closed

tangollama added help wanted data labels Apr 22, 2016

stale bot added the wontfix label Aug 7, 2019

fox1t added v2.x and removed help wanted wontfix labels Aug 7, 2019

fox1t self-assigned this Aug 7, 2019

stale bot added the wontfix label Oct 6, 2019

fox1t closed this as completed Jan 14, 2020
