-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for manual application of indexes when using code first #914
Comments
@williamdenton thanks for raising this important question, it's something I thought out before but never had time to fully explore. First, when you say "apply migration manually to production db", do you mean issue the CREATE INDEX yourself via SQL, and then updating __MigrationHistory yourself? If so, what exactly do you gain by doing this, as opposed to just executing the EF migration as usual? Is this simply a way to add CONCURRENTLY to the creation? If that is so, then adding an option to have EF Core generated CONCURRENTLY on index creation seems like the right solution. In fact, I'm wondering whether it should not become the default - there are some caveats to concurrent index creation (here are the docs) but the pros seem to outweigh the cons. Either way we would still allow users to change the behavior via the fluent API. Otherwise I may be missing something in your description. FYI the generally recommended way to apply migrations to production databases is to generate an SQL migration script and apply that to databases. |
One step of our deployment process is to execute migrations using Index creation sometimes happens as a result of performance tuning. Index creation on very large tables can take some time to complete and we sometimes choose to do this manually via psql rather than normal deployment. What i would like is an option where ef can apply the index in dev, staging and newly bootstrapped environments automatically, but for some environments we choose to apply the index in advance, perhaps in response to a production issue we face as the database volume grows over time and the query optimizer changes the plan as statistics are updated. Wrapping the create index in an if block that checks if the index name already exists would be helpful to enable this scenario. so we dont have to mess the the migrations table is is fraught with human error. Im not sure of how this would be exposed in the api and i am not suggesting that this become the default behaviour of creating indexes. the request i am making here is orthogonal from the concurrent option, although that would improve the options available to us in creating indexes on large tables. |
guessing any change would probably be made in here? (this is from a 5 min search of github rather than any deep understanding of what is actually happening |
Thanks for the added details. First, it would indeed be trivial to add IF NOT EXISTS to CREATE INDEX. This is generally not done, probably because it is not needed if EF manages your schema, and mixed EF/manual schema seem very error-prone. For example, a migration could be thought to have been applied, but in fact table creation was skipped because someone already created a slightly different table manually. It seems like a dangerous-enough pit of failure to not have it. That's for the general case - I do agree that chances for this kind of mess are less for indices. But I'm still curious about the way you work: EF allows you to generate an SQL script for your migrations, and apply it outside of your program (this is in fact the recommended practice). This would do all schema changes and update the migration history table, without risk of human error. What advantage does manual index creation have over this? BTW any opinion on whether CONCURRENTLY should be the default or not? @ajcvickers @divega any opinion here? |
I'll give you a background on how we deploy, it's always helpful to understand how others do it, so i'll share mine. Infrastructure wise I run in AWS using docker on ECS using Amazon RDS Aurora. My app consists of several web nodes behind an ALB (application load balancer), typically 3 (across three availability zones). Also a job engine of which I run a single instance. Deployments look a little like:
This is done for each environment. It happens automatically to the staging/test environment after the build completes. Then to production by clicking deploy in octopus by the developer who write the PR that was initially merged. Developers are responsible for their code all the way to production (and beyond) If a database migration was part of the merged code that will usually be the only code in the deployment, no functional changes to the service are deployed at the same time. This is to reduce risk. Rinse - repeat. Process from merging to deployment to production can be as short as 10 minutes but typically about 30 - depending on risk and need to test new functionality in the test environment. We rely heavily on feature flagging to disable new functional code in production, but still allow that code path to be tested in the staging/test environment. Releasing of new functionality to production is simply changing the value of the feature flag to true for that environment. Once feature is on in all environments, the technical debt of the feature flag infrastructure is then paid down and the Very rarely an operational issue is encountered that may need immediate remediation in production, this is where an index may be applied directly to the database then back ported into the ef model to keep schema consistent. We call this co-piloting as this is always done with two people, two sets of eyes on what is being applied. This is a fairly rare case, but in this situation it would be valuable to have the if not exists flag on the index name, so we have a trust me, i did this manually escape hatch. This may seem unnecessary given how fast we can deploy changes, but this is still a useful option to have. Currently we also co-pilot new indexes to production because they cause locking of the tables and take some time to execute. Running the migrations in the usual manner above for an index that takes more than 10 -20 seconds to build would result in unacceptable table locking and essentially result in service downtime. The concurrent option would help here. This use case would never apply to tables or other schema objects. |
Thanks for the detail @williamdenton, the process is very clear. But I'm still struggling to understand why for your manual updates you can't just use the EF-generated migration SQL script... It's essentially the same as the CREATE INDEX that you're already doing manually, except it also updates the migrations history table for you, with no risk of error anywhere. All this is regardless of adding CONCURRENTLY to CREATE INDEX in migrations, which I'm leaning towards but no yet 100% sure (would be good to have other opinions). |
@williamdenton CREATE INDEX CONCURRENTLY is being added in #967. Am going to move this discussion to the backlog as you haven't posted any responses on the other questions - but we can definitely revisit. |
Using code first in Efcore means defining indexes as well as schema in code, generated migrations then apply migrations to the database.
For large tables, adding a new index must be done with care to avoid locking to keep the db running while the migration is taking place.
Currently our work around is to
__migration_history
this means that on the next migration execution (which we do as part of the deploy) there is no work to do.
A couple of thinking points:
The text was updated successfully, but these errors were encountered: