Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomic "find or create" Repository API #1956

Closed
4 tasks
bajtos opened this issue Nov 1, 2018 · 9 comments
Closed
4 tasks

Atomic "find or create" Repository API #1956

bajtos opened this issue Nov 1, 2018 · 9 comments
Labels
feature Repository Issues related to @loopback/repository package stale

Comments

@bajtos
Copy link
Member

bajtos commented Nov 1, 2018

While implementing hasOne relation in #1879, we discovered a requirement for an atomic "find or create" operation that's needed to avoid race conditions in hasOne's implementation of create operation.

Cross-posting #1879 (comment)

It is crucial to leverage an atomic implementation of findOrCreate provided by our connectors. The current proposal is prone to race conditions, where to instances of the target "hasOne" model can be created.

Consider the following scenario: the LB4 server is handling two incoming HTTP requests to create a hasOne target and the code is executed in the following way by Node.js runtime:

  1. Request 1 arrives, DefaultHasOneRepository#create is invoked
  2. targetRepository.find is called. It's an async function so other stuff happens while the query is in progress.
  3. Request 2 arrives, DefaultHasOneRepository#create is invoked.
  4. targetRepository.find is called. It's an async function so other stuff happens while the query is in progress.
  5. targetRepository.find for Request 1 returns, no model was found. targetRepository.create in called.
  6. targetRepository.find for Request 2 returns, no model was found. targetRepository.create in called.
  7. Both create requests eventually finish. We have two models that are a target of hasOne relation. 💥 💣 💥

Acceptance criteria

  • Add a new Repository-like interface describing atomic operations.

    export interface FindOrCreateResult<T extends Entity> {
     entity: T;
     found: boolean;
    }
    
    export interface AtomicCrudRepository<T extends Entity, ID>
       extends EntityCrudRepository<T, ID> {
        findOrCreate(
         filter: Filter<T>,
         entity: DataObject<T>,
         options?: Options,
       ): Promise<FindOrCreateResult>
    }
  • Add a new repository class implementing this new interface while extending DefaultCrudRepository. Either the constructor or the findOrCreate method should verify that the connector provides atomic implementation of findOrCreate and throw an error with a machine-readable code if it does not.

  • Modify the table mapping known error codes to HTTP status codes to map the new error code to 501 Not Implemented.

Nice to have but not required:

  • A new repository class implementing AtomicCrudRepository on top of CrudRepositoryImpl.

Out of scope

  • Support for other atomic operations like patchOrCreate, replaceOrCreate, etc.
@bajtos bajtos added feature TOB Repository Issues related to @loopback/repository package labels Nov 1, 2018
@bajtos
Copy link
Member Author

bajtos commented Nov 1, 2018

Few thoughts:

(1)
In LoopBack 3.x, when a connector did not implement atomic findOrCreate, the DataAccessObject provided a non-atomic fallback implementation calling findandcreate`. While this may work well when building prototypes, it's creates a high risk of data corruption when deployed to production.

For LB4 and beyond, I am proposing to provide strong guarantees about robustness of this operation. If the connector and/or the database does not provide a solution that's atomic, then the repository method should throw a "Not Implemented" error (instead of falling-back to a non-atomic version).

If there is enough user demand, then we can provide a different repository class that implements operations like findOrCreate in an unsafe (non-atomic) way.

(2)
In order to allow repository consumers to better express whether they require atomic operations or not, and also in line with Relation refactoring: Split Repository interface into different interfaces #1356, I am proposing to introduce a new repository interface and an new implementation class.

For example:

interface AtomicCrudRepository<T extends Entity, ID> 
  extends EntityCrudRepository<T, ID> {

  // declare atomic operations like findOrCreate
}

class DefaultAtomicCrudRepository<T extends Entity, ID> 
  extends EntityCrudRepository<T, ID> 
  implements AtomicCrudRepository<T, ID> {

  // implement atomic operations like findOrCreate
}

/cc @strongloop/loopback-maintainers thoughts?

@bajtos bajtos changed the title Atomic "find or create" command Atomic "find or create" Repository API Nov 1, 2018
@dhmlau dhmlau added this to the November 2018 Milestone milestone Nov 5, 2018
@raymondfeng
Copy link
Contributor

I think the issue is more complex than just implementing the atomic findOrCreate. It depends on how hasOne is enforced:

  1. By LoopBack using atomic findOrCreate from the connector
  2. By the database with a unique constraint of the FK, for example, a user has a profile and the profile model has a unique index of userId. This can be potentially automated by automigrate.
  3. For relational databases that don't have atomic findOrCreate, can we use transactional find then create with strong isolation level?

My understanding is that we only have mongodb and memory connector support atomic findOrCreate. If that's the default behavior, hasOne create will fail for most databases.

Maybe we have to introduce different options for hasOne.create method to honor users' policies, such as:

  • byAtomicFindCreate
  • byDatabase
  • ...

@jannyHou
Copy link
Contributor

jannyHou commented Nov 6, 2018

based on Raymond's comment, Cloudant/Couchdb2 have a similar situation as sql databases.

@bajtos
Copy link
Member Author

bajtos commented Nov 8, 2018

I think the issue is more complex than just implementing the atomic findOrCreate. It depends on how hasOne is enforced.

Good point!

This is loosely related to my ideas related to validation, see #1872 (comment)

By LoopBack using atomic findOrCreate from the connector

👍

By the database with a unique constraint of the FK, for example, a user has a profile and the profile model has a unique index of userId. This can be potentially automated by automigrate.

Makes sense. To make this option easy to use, I would like our implementation to have the following traits:

  • The database-specific error reporting violating of unique constraint is converted into a well-known LoopBack error code, so that the errors returned by REST APIs are consistent and independent on the underlying database.

  • The application/framework should check the database schema to verify the presence of the correct unique constraint. We should not blindly rely on the user to correctly configure the constraints. I see this as very important for any users that have multiple environments, where it's too easy to forget to add unique constraint to test/staging/production database. This would be of a lesser concern if we had a good database migration framework in place (see Database Migration Management Framework #487).

For relational databases that don't have atomic findOrCreate, can we use transactional find then create with strong isolation level?

Makes sense to me. In the current design, it's up to the connector how it implements findOrCreate. Opening a new transaction and doing two-step find + create is a perfectly valid solution IMO. A possible caveat I see: how to handle the case when findOrCreate is already executed within an (outer) transaction. For example, MySQL does not seem to support nested transactions.

My understanding is that we only have mongodb and memory connector support atomic findOrCreate. If that's the default behavior, hasOne create will fail for most databases.
based on Raymond's comment, Cloudant/Couchdb2 have a similar situation as sql databases.

Ouch! When I proposed to use findOrCreate for hasMany, I has incorrectly assumed that most of our connectors already support this operation in an atomic way. Mea culpa.

Maybe we have to introduce different options for hasOne.create method to honor users' policies, such as:

  • byAtomicFindCreate
  • byDatabase

I don't think it's a good idea to ask the users to decide this matter. In my experience, many LB users don't fully understand ramifications of different ways of enforcing has-one constraint and as a result, we can end up with many LB4 applications prone to race conditions.

What I would like to see instead, is a single Model/Repository-level API that allows:

  • the HasMany relation to express the has-one constraint
  • the connectors to provide a database-specific implementation enforcing the constraint in a reliable (atomic) way

An interesting approach for document databases like Cloudant/Couchdb, based on this StackOverflow discussion:

When a User has one Credentials instance, we can encode the user's id in credentials' id. Even better, Credentials id can be set to the same value as the User id it belongs to. Now

In that setup, when we attempt a PUT request of an existing document id (findOrCreate, the document already exists), it will be rejected with a HTTP 409 error - unless you supply the correct revision (_rev property) of the existing document.

It makes me wonder - would it make sense to merge the primary key with the foreign key into the same property in the target model of HasOne relation and use this as the solution to enforce uniqueness across all databases?

@model()
class User {
  @hasOne(() => Credentials)
  credentials: Credentials;
}

@model
class Credentials {
  @belongsTo(() => User)
  @property({id: true})
  userId: string;
}

Or even

@model()
class User {
  @hasOne(() => Credentials, {keyTo: 'id'})
  credentials: Credentials;
}

@model
class Credentials {
  @belongsTo(() => User)
  @property({id: true})
  id: string;
}

Thoughts?

@bajtos
Copy link
Member Author

bajtos commented Nov 9, 2018

@b-admike and me have discussed this matter yesterday. We concluded there are two ways forward:

  1. Look into transaction-based implementation of findOrCreate
  2. Require the target of hasOne relation to use the foreign key as the primary key. I believe the unique constraint of primary keys is enforced by all databases (SQL and NoSQL), thus not atomic operation like findOrCreate is needed.

@dhmlau
Copy link
Member

dhmlau commented Nov 26, 2018

@b-admike, per our discussion last week, IIUC since this feature is only applicable for a small subset of connectors (or databases), this task is not a blocker for the hasOne task.
If this should not be part of the Nov milestone or top of backlog, please update accordingly. Thanks.

@b-admike b-admike removed this from the November 2018 Milestone milestone Nov 27, 2018
@b-admike
Copy link
Contributor

Yes @dhmlau per #1956 (comment), we've decided to take approach 2) with #2005, and decided to close #1974 as rejected. I've removed this issue from the milestone.

@dhmlau dhmlau removed the TOB label Feb 12, 2019
@stale
Copy link

stale bot commented Feb 7, 2020

This issue has been marked stale because it has not seen activity within six months. If you believe this to be in error, please contact one of the code owners, listed in the CODEOWNERS file at the top-level of this repository. This issue will be closed within 30 days of being stale.

@stale stale bot added the stale label Feb 7, 2020
@stale
Copy link

stale bot commented Mar 8, 2020

This issue has been closed due to continued inactivity. Thank you for your understanding. If you believe this to be in error, please contact one of the code owners, listed in the CODEOWNERS file at the top-level of this repository.

@stale stale bot closed this as completed Mar 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Repository Issues related to @loopback/repository package stale
Projects
None yet
Development

No branches or pull requests

5 participants