Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Resource: azurerm_sql_failover_group #3901

Merged
merged 12 commits into from
Aug 20, 2019
Merged

New Resource: azurerm_sql_failover_group #3901

merged 12 commits into from
Aug 20, 2019

Conversation

simongh
Copy link
Contributor

@simongh simongh commented Jul 22, 2019

I've created a resource for adding SQL Failover Groups. I've added tests & documentation. Hopefully there's not much that I've missed. Still a novice when it comes to Go

(fixes #2989)

@leewilson86
Copy link
Contributor

I am very interested in this as a native resource. I currently have this functionality in place using ARM templates, but a proper resource would be much nicer.

It looks like you’ve covered all the key points.

The only questions I have are:

  • does this resource work properly on deletion, when you have databases synchronised into the secondary region? As failover groups will create databases in the secondary region for you, when using ARM templates, TF becomes unaware of the secondary databases existence, meaning an error is thrown by the ARM API on delete as the FOGs contain databases. Have you handled and tested for this eventuality?

  • does this work ok with databases in elastic pools? On creation, update and delete?

  • have you considered that in the primary region a database may be in an elastic pool, but when may be moved out as a Single SQL Database? Likewise, databases may be moved from one elastic pool into another, and a Single SQL Database may be moved into an elastic pool. Ideally the secondary region should always mirror the primary region so when failover happens infrastructure is identical. Does your work account for this?

@simongh
Copy link
Contributor Author

simongh commented Jul 26, 2019

Failover groups do create a secondary if they don't exist, but there is no way with the API to prevent this or as far as I can tell make TF create a resource for the secondary. If you have this requirement, you can create the primary & secondary, then add the failover group. You're then tracking everything. Doing this also gives you control over the sizing etc of the secondary.

Elastic pools shouldn't affect the databases. Certainly the docs make no mention of this, but then the docs are fairly light on detail.

With regard to moving or sizing the databases, failover groups have no control over this. if you need this control, it's best to create & manage the resources as specific items.

This seems to be the intent of failover groups as defined in Azure. I'm not sure that making it do more is the right thing, but then I'm not always right ;-)

Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the new resource @simongh,

I've left some comments inline that once addressed this should be good to merge

azurerm/resource_arm_sql_failover_group.go Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
website/docs/r/sql_failover_group.html.markdown Outdated Show resolved Hide resolved
website/docs/r/sql_failover_group.html.markdown Outdated Show resolved Hide resolved
website/docs/r/sql_failover_group.html.markdown Outdated Show resolved Hide resolved
website/docs/r/sql_failover_group.html.markdown Outdated Show resolved Hide resolved
@katbyte katbyte added this to the v1.33.0 milestone Aug 2, 2019
@simongh
Copy link
Contributor Author

simongh commented Aug 4, 2019

I think I've addressed your comments

@ghost ghost removed the waiting-response label Aug 4, 2019
Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making the requested changes @simongh,

It looks like you missed one, and i've left a couple additional minor ones. Overall this is looking pretty good and nearly ready to merge 🙂

for _, server := range *input {
info := make(map[string]interface{})
info["id"] = *server.ID
info["location"] = *server.Location
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be nil checked

azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
@simongh
Copy link
Contributor Author

simongh commented Aug 5, 2019

It causes an error if you set the grace minutes & the mode to manual

Failure sending request: StatusCode=400 -- Original Error: Code="FailoverGroupCreateOrUpdateRequestInvalidReadWriteEndpointFieldsForManualPolicy" Message="Grace period value should not be provided when failover policy Manual is selected for the read-write endpoint."

@ghost ghost removed the waiting-response label Aug 5, 2019
@simongh
Copy link
Contributor Author

simongh commented Aug 5, 2019

I've renamed the funcs as suggested. I'm not sure I follow where the nil check you mentioned needs to go. Partner Server is required & you always need at least 1.

Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simongh,

I've elaborated with suggestions inline on how to do the nil checks (and spotted another couple minor issues), but once those are fixed up this should be good to merge!

azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
for _, server := range *input {
info := make(map[string]interface{})
info["id"] = *server.ID
info["location"] = *server.Location
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ie

Suggested change
info["location"] = *server.Location
if v := server.Location; v != nil {
info["location"] = *v
}

azurerm/resource_arm_sql_failover_group.go Outdated Show resolved Hide resolved
@simongh
Copy link
Contributor Author

simongh commented Aug 9, 2019

Gotacha, thanks!

I've addressed those comments & rebased to the most recent master, fixing the merge conflict.

@ghost ghost removed the waiting-response label Aug 9, 2019
Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revisions @simongh,

This LGTM now hwoever the tests are failing, i pushed a small change to try and fix them (hope you don't mind) but now i am getting:

------- Stdout: -------
=== RUN   TestAccAzureRMSqlFailoverGroup_basic
=== PAUSE TestAccAzureRMSqlFailoverGroup_basic
=== CONT  TestAccAzureRMSqlFailoverGroup_basic
--- FAIL: TestAccAzureRMSqlFailoverGroup_basic (502.57s)
    testing.go:568: Step 0 error: errors during apply:
        
        Error: Error issuing create/update request for SQL Failover Group "acctest190809180127363915" (Resource Group "acctestRG-190809180127363915", Server "acctestsqlserver190809180127363915-primary"): sql.FailoverGroupsClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidFailoverGroupRegion" Message="Secondary server specified in a Failover Group cannot reside in the same region."
        
          on /opt/teamcity-agent/temp/buildTmp/tf-test378278636/main.tf line 36:
          (source code not available)
        
        
FAIL

@simongh
Copy link
Contributor Author

simongh commented Aug 10, 2019

Don't mind at all.

Failover groups require the partner servers to be in different regions, ideally region replication partners, but any 2 region will do. The tests use testLocation() to get the location in which to create the resources. I'm not sure how to get 2 different locations for the test, other than hard-coding in 2 different regions.

@ghost ghost removed the waiting-response label Aug 10, 2019
@katbyte
Copy link
Collaborator

katbyte commented Aug 10, 2019

@simongh,

we can use testAltLocation() to get another location 🙂

@simongh
Copy link
Contributor Author

simongh commented Aug 10, 2019

@leewilson86 I don't think changing the tests is a hiding anything. Having tested it myself, a secondary db gets created, which isn't tracked by TF, but it's existence doesn't prevent the server from being deleted.

@katbyte could the tests be picking up on the fact that an untracked db was created on the secondary server? I've spotted & fixed another issue with the read-write policy. I've also removed the tags support, since although the API supports them, they don't appear anywhere in the UI.

@katbyte
Copy link
Collaborator

katbyte commented Aug 10, 2019

@simongh,

First, even if the tags don't show up in the portal i think we should leave them in? I think that test is failing because it is the only test that attempts to update the resource? and for some reason it is trying to create a new DB. If you create another test that calls update i imagine you will see the same error?

for insufficient items for attribute "partner_servers"; must have at least 1 i have a feeling the API isn't returning the correct number of partner_servers ? at some point.

@leewilson86, could you suggest a terraform config/test that we could write to ensure we have implemented it correctly?

@simongh
Copy link
Contributor Author

simongh commented Aug 11, 2019

It could be. I've tried doing updates, both by changing the resource & by changing the HCL, all without failure.

If you delete a server it also appears to remove the failover group.
If you delete the failover group, it does leave behind the replicated database it made, but then running apply just re-adds the failover group ok & destroy destroys as expected.

I'm at a loss as to what's causing the errors. I'll add the tags support back in though. I'm running the acceptance tests now & I'll see what I can figure out.

@ghost ghost removed the waiting-response label Aug 11, 2019
@simongh
Copy link
Contributor Author

simongh commented Aug 12, 2019

I found a couple of things - typos in the tests. The update now passes. The delete is still failing, even though I thought I found the cause. I'm fairly certain it's something that the test is doing, since it works when I manually run the same HCL.

Since I'm only half certain of what the test is actually doing, would you mind casting an eye over the test? I've likely missed something simple.

@katbyte
Copy link
Collaborator

katbyte commented Aug 13, 2019

👋 @simongh,

So i consulted core internally, and it sounds like that error means upon read, we got zeropartnert servers instead of the expected 1. Looking at the requests i see

{
	"location": "West Europe",
	"tags": {},
	"properties": {
		"readWriteEndpoint": {
			"failoverPolicy": "Automatic",
			"failoverWithDataLossGracePeriodMinutes": 60
		},
		"readOnlyEndpoint": {
			"failoverPolicy": "Disabled"
		},
		"replicationRole": "Primary",
		"replicationState": "CATCH_UP",
		"partnerServers": [{
			"id": "/subscriptions/x/resourceGroups/acctestRG-190813120733021836/providers/Microsoft.Sql/servers/acctestmssql190813120733021836-secondary",
			"location": "West US",
			"replicationRole": "Secondary"
		}],
		"databases": ["/subscriptions/x/resourceGroups/acctestRG-190813120733021836/providers/Microsoft.Sql/servers/acctestmssql190813120733021836-primary/databases/acctestdb190813120733021836"]
	},
	"id": "/subscriptions/x/resourceGroups/acctestRG-190813120733021836/providers/Microsoft.Sql/servers/acctestmssql190813120733021836-primary/failoverGroups/acctestsfg190813120733021836",
	"name": "acctestsfg190813120733021836",
	"type": "Microsoft.Sql/servers/failoverGroups"
}

a delete, and then a

	"location": "West Europe",
	"tags": {},
	"properties": {
		"readWriteEndpoint": {
			"failoverPolicy": "Automatic",
			"failoverWithDataLossGracePeriodMinutes": 60
		},
		"readOnlyEndpoint": {
			"failoverPolicy": "Disabled"
		},
		"replicationRole": "Primary",
		"replicationState": "CATCH_UP",
		"partnerServers": [],
		"databases": []
	},
	"id": "/subscriptions/x/resourceGroups/acctestRG-190813120733021836/providers/Microsoft.Sql/servers/acctestmssql190813120733021836-primary/failoverGroups/acctestsfg190813120733021836",
	"name": "acctestsfg190813120733021836",
	"type": "Microsoft.Sql/servers/failoverGroups"
}

So it seems like the API starts a delete and gets a:

{
	"operation": "DropFailoverGroup",
	"startTime": "2019-08-13T19:11:58.047Z"
}

and when terraform does a get on it next, it still returns the resource as its not done deleting yet. You may have to add a poll there so the delete call doesn't return until it succeeds?

@simongh
Copy link
Contributor Author

simongh commented Aug 17, 2019

You may have to add a poll there so the delete call doesn't return until it succeeds?

Do you some existing resource that does this that I can look at it & get the test working?

@ghost ghost removed the waiting-response label Aug 17, 2019
@katbyte
Copy link
Collaborator

katbyte commented Aug 18, 2019

@simongh,

the cosmos DB resources do this:

//the SDK now will return a `WasNotFound` response even when still deleting
	stateConf := &resource.StateChangeConf{
		Pending:    []string{"Deleting"},
		Target:     []string{"NotFound"},
		Timeout:    60 * time.Minute,
		MinTimeout: 30 * time.Second,
		Refresh: func() (interface{}, string, error) {

			resp, err2 := client.Get(ctx, resourceGroup, name)
			if err2 != nil {

				if utils.ResponseWasNotFound(resp.Response) {
					return resp, "NotFound", nil
				}
				return nil, "", fmt.Errorf("Error reading CosmosDB Account %q after delete (Resource Group %q): %+v", name, resourceGroup, err2)
			}

			return resp, "Deleting", nil
		},
	}
	if _, err = stateConf.WaitForState(); err != nil {
		return fmt.Errorf("Waiting forCosmosDB Account %q to delete (Resource Group %q): %+v", name, resourceGroup, err)
	}

@katbyte katbyte changed the title Added SQL failover group resource New Resource: azurerm_sql_failover_group Aug 18, 2019
@simongh
Copy link
Contributor Author

simongh commented Aug 19, 2019

I think I've found the problem. The resource does a future.WaitForCompletionRef and the acceptance test didn't. I modified the test to use this & it's working as expected.

I've also updated the documentation to outline what happens with secondary databases

@ghost ghost removed the waiting-response label Aug 19, 2019
Copy link
Collaborator

@katbyte katbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! thanks for fixing that @simongh 🙂 this LGTM now 🚀

@katbyte
Copy link
Collaborator

katbyte commented Aug 20, 2019

I hope you don't mind but i pushed some changes to fix the merge errors.

@katbyte katbyte merged commit a1483fd into hashicorp:master Aug 20, 2019
katbyte added a commit that referenced this pull request Aug 20, 2019
@simongh
Copy link
Contributor Author

simongh commented Aug 20, 2019 via email

@ghost
Copy link

ghost commented Aug 22, 2019

This has been released in version 1.33.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 1.33.0"
}
# ... other configuration ...

@ghost
Copy link

ghost commented Sep 20, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Sep 20, 2019
@simongh simongh deleted the sql-failover-groups branch August 21, 2020 18:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Resource: Azure SQL Failover Groups
3 participants