Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlAG/SqlAGReplica: Are both needed? #1059

Open
johlju opened this issue Mar 7, 2018 · 14 comments
Open

SqlAG/SqlAGReplica: Are both needed? #1059

johlju opened this issue Mar 7, 2018 · 14 comments
Labels
help wanted The issue is up for grabs for anyone in the community. resource proposal The issue is proposing a new resource in the resource module.

Comments

@johlju
Copy link
Member

johlju commented Mar 7, 2018

Details of the scenario you tried and the problem that is occurring:
Wouldn't it be better to concatenate SqlAG and SqlAGReplica into SqlAG? I want to discuss to see if this is possible, there might be a scenario that make having two resources necessary.

When looking at issue #518 and issue #1057 I'm started to wonder if SqlAGReplica is really needed. They need the same properties, which means the same code (which could be resolved by helper functions if the resources are kept as-is). We have properties that can only be set on the primary replica, so there is a logic in SqlAGReplica to search for and connect to the primary to make those changes.
The "new SqlAG" could instead use $availabilityGroup.LocalReplicaRole to determine which role the replica is running under, and set the properties accordingly.

In the "new SqlAG" there could be a property to specify an existing replica (for example ExistingReplicaName). This new property is assigned a server instance 'ServerName\InstanceName', which is the same format that is needed for the Name parameter. This ExistingReplicaName is the same as the properties PrimaryReplicaServerName and PrimaryReplicaInstanceName in SqlAGReplica. This will be used to find the primary replica, for example to be able to add a secondary to the Availability Group.

/cc @randomnote1, @TraGicCode

The DSC configuration that is using the resource (as detailed as possible):
n/a

Version of the Operating System, SQL Server and PowerShell the DSC Target Node is running:
n/a

What module (SqlServer or SQLPS) and which version of the module the DSC Target Node is running:
n/a

Version of the DSC module you're using, or 'dev' if you're using current dev branch:
n/a

@johlju johlju added the discussion The issue is a discussion. label Mar 7, 2018
@TraGicCode
Copy link
Contributor

@johlju i feel as if a refactor is needed for these DSC resources. There are a couple of questions i had to ask myself when utilizing these. As is, these dsc resources have the following issues.

  1. I would assume someone could use both sqlag + sqlagreplica resources on the same node and things would not blow up but the resources would both be managing the same "thing" which is not a good or desirable thing. This also reveals a design flaw. To allow someone to do this means the public api is not well designed to allow people to misuse it so to speak.

Proposed Solution:
i think SqlAg OR SqlAgReplica should go away and we should follow the xCluster ( https://github.com/powershell/xfailovercluster ) DSC resource design in which there is 1 resource that Creates the cluster AND joins nodes to the cluster. This also will reduce the duplication we currently have between the 2 resources.

Benefits:

  • Prevents misuse and confusion between these resources.
  • Reduces Code Duplication.
    • Typically when dealing with clustering in other technologies the logic is the same across all nodes except the first or all the other nodes that are not the first have a tiny but of different logic.
  • Consistency between other DSC resource modules. Namely ( xFailOverCluster ).
  1. There is a constant mention of a "primary replica" on the SqlAgReplica dsc resource. This is confusing because your "primary replica" can change when failovers happen.

Proposed Solution:
Maybe this should be called "initialPrimaryReplica" or "initialPrimaryNode" or if a replica can be added to an availability group simply by contacting and executing sql on ANY replica/node in the availability group ( regardless of if it's primary ) it should then be called "existingNodeInAvailabilityGroup" or "existingReplicaInAvailabiltyGroup".

Benefits:

  • Prevents confusion as to if the current primary, the first node in the AG, or any node in the ag can be specified.

@johlju
Copy link
Member Author

johlju commented Mar 18, 2018

@TraGicCode I agree with both your proposals, both proposals aligns with what I wrote in the issue description. The second proposal aligns with the proposed parameter name ExistingReplicaName which would be set to the service instance of an existing node in the availability group, i.e. 'MyServer\MyInstance. or ExistingReplicaName could be split up into the parameters ExistingReplicaServerName and ExistingReplicaInstanceName.

Right now it seems better to concatenate the resources SqlAG and SqlAGReplic, but is there a scenario where the resource SqlAGReplica is needed. 🤔

@johlju
Copy link
Member Author

johlju commented Mar 18, 2018

I tried to wrap my head around this. I concatenated the parameters for both SqlAG and SqlAGReplica into a new SqlAGNew resource (temporary name just for the purpose of this discussion, in the end I'm seeing it replacing the current SqlAG). See the new suggested schema.mof below.
I added assumptions and a questions table below that we can add stuff that comes up. All is up for discussion. I tried to figure out how the Ensure parameter would work, please provide feedback on this, and anything else that you see being a potential problem (high and low).

Assumptions

Number Assumption
A1 When Ensure is set to $true the Availability Group will be created.
A2 When Ensure is set to $false the Availability Group will be removed.
A3 When Ensure is not set (default value $null) the target node will only attempt to join an existing Availability Group.
A4 Some settings should only be set when the target node is in the primary role. If the target node is in secondary role these settings (parameters) will be ignored, and also, in these instances the resource should not not use ExistingReplicaServerName an ExistingReplicaInstanceName to find and set these on the current primary replica.
A5 Some settings should only be set when the target node is in the secondary role. If the target node is in primary role these settings (parameters) will be ignored.
A6 The replica name to use when joining the Availability Group is used from ServerName and InstanceName
A7 The parameters ExistingReplicaServerName and ExistingReplicaInstanceName must be set for the resource to be able to join the target node to the Availability Group.

Questions

Number Question Suggested answer
Q1 How should Ensure work for adding the Availability Group? If A7 is not fulfilled, and A1 is fulfilled, the Availability Group will be created.
Q2 How should Ensure work for removing the Availability Group? If A7 is not fulfilled, and A2 is fulfilled, the Availability Group will be removed (if possible).
Q3 How should Ensure work for adding a node to the Availability Group? If A7 and A3 is fulfilled, the target node will try join Availability Group.
Q4 How should Ensure work for removing a node from the Availability Group? ?
Q5 If assumption A3 is wrong (because of Q4), and we instead should use assumption A1 and A2 to add or remove a target node from an existing Availability Group. How should Ensure work for not trying to create the Availability Group again when used on the second target node? A7 must be fulfilled. When A7 is fulfilled the Availability Group will not be created or removed, instead we use A7 to find the current active primary replica to remove or join the node to the existing Availability Group depending on the value in the Ensure parameter. If A7 is not fulfilled, or A7 cannot be contacted, then an error should be thrown.
Q6 If Q5 would work for the second node, how would we know how to create the Availability Group on the first node? A7 does not need to be fulfilled on the first node, see Q1 and Q2.
Q7 Is A7 never needed on the first node, even when it moved from the primary role and starts to run in the secondary role? ?

Suggested schema.mof

[ClassVersion("1.0.0.0"), FriendlyName("SqlAGNew")]
class MSFT_SqlAGNew : OMI_BaseResource
{
    [Key, Description("The name of the availability group.")] String Name;
    [Required, Description("Hostname of the SQL Server to be configured.")] String ServerName;
    [Key, Description("Name of the SQL instance to be configured.")] String InstanceName;
    [Write, Description("Specifies if the availability group should be present or absent. Default is Present."), ValueMap{"Present","Absent"}, Values{"Present","Absent"}] String Ensure;
    [Write, Description("Specifies the automated backup preference for the availability group. Default is None"), ValueMap{"Primary","SecondaryOnly","Secondary","None"}, Values{"Primary","SecondaryOnly","Secondary","None"}] String AutomatedBackupPreference;
    [Write, Description("Specifies the replica availability mode. Default is 'AsynchronousCommit'."), ValueMap{"AsynchronousCommit","SynchronousCommit"}, Values{"AsynchronousCommit","SynchronousCommit"}] String AvailabilityMode;
    [Write, Description("Specifies the desired priority of the replicas in performing backups. The acceptable values for this parameter are: integers from 0 through 100. Of the set of replicas which are online and available, the replica that has the highest priority performs the backup. Default is 50.")] UInt32 BackupPriority;
    [Write, Description("Specifies the type of availability group is Basic. This is only available is SQL Server 2016 and later and is ignored when applied to previous versions.")] Boolean BasicAvailabilityGroup;
    [Write, Description("Specifies if the option Database Level Health Detection is enabled. This is only available is SQL Server 2016 and later and is ignored when applied to previous versions.")] Boolean DatabaseHealthTrigger;
    [Write, Description("Specifies if the option Database DTC Support is enabled. This is only available is SQL Server 2016 and later and is ignored when applied to previous versions. This can't be altered once the Availability Group is created and is ignored if it is the case.")] Boolean DtcSupportEnabled;
    [Write, Description("Specifies how the availability replica handles connections when in the primary role."), ValueMap{"AllowAllConnections","AllowReadWriteConnections"}, Values{"AllowAllConnections","AllowReadWriteConnections"}] String ConnectionModeInPrimaryRole;
    [Write, Description("Specifies how the availability replica handles connections when in the secondary role."), ValueMap{"AllowNoConnections","AllowReadIntentConnectionsOnly","AllowAllConnections"}, Values{"AllowNoConnections","AllowReadIntentConnectionsOnly","AllowAllConnections"}] String ConnectionModeInSecondaryRole;
    [Write, Description("Specifies the hostname or IP address of the availability group replica endpoint. Default is the instance network name which is set in the code because the value can only be determined when connected to the SQL Instance.")] String EndpointHostName;
    [Write, Description("Specifies the automatic failover behavior of the availability group."), ValueMap{"OnServerDown","OnServerUnresponsive","OnCriticalServerErrors","OnModerateServerErrors","OnAnyQualifiedFailureCondition"}, Values{"OnServerDown","OnServerUnresponsive","OnCriticalServerErrors","OnModerateServerErrors","OnAnyQualifiedFailureCondition"}] String FailureConditionLevel;
    [Write, Description("Specifies the length of time, in milliseconds, after which AlwaysOn availability groups declare an unresponsive server to be unhealthy. Default is 30000.")] UInt32 HealthCheckTimeout;
    [Write, Description("Specifies that the resource will only determine if a change is needed if the target node is the active host of the SQL Server Instance.")] Boolean ProcessOnlyOnActiveNode;
    [Write, Description("Specifies the failover mode. Default is 'Manual'."), ValueMap{"Automatic","Manual"}, Values{"Automatic","Manual"}] String FailoverMode;
    [Write, Description("Specifies the fully-qualified domain name (FQDN) and port to use when routing to the replica for read only connections.")] String ReadOnlyRoutingConnectionUrl;
    [Write, Description("Specifies an ordered list of replica server names that represent the probe sequence for connection director to use when redirecting read-only connections through this availability replica. This parameter applies if the availability replica is the current primary replica of the availability group.")] String ReadOnlyRoutingList[];
    [Write, Description("Hostname of the SQL Server node that exist in the Availability Group which can be used to find the primary replica. If the primary replica is not found here, the resource will attempt to find the host that holds the primary replica and connect to it.")] String ExistingReplicaServerName;
    [Write, Description("Name of the SQL Server instance that exist in the Availability Group which can be used to find the primary replica. This is must be used in conjunction with the parameter ExistingReplicaServerName.")] String ExistingReplicaInstanceName;
    [Read, Description("Output the endpoint URL of the Availability Group Replica.")] String EndpointUrl;
    [Read, Description("Output the network port the database mirroring endpoint is listening on.")] UInt32 EndpointPort;
    [Read, Description("Gets the major version of the SQL Server instance.")] UInt32 Version;
    [Read, Description("Determines if the current node is actively hosting the SQL Server instance.")] Boolean IsActiveNode;
};

Update 1: Updated the suggested answer for Q5

@stale
Copy link

stale bot commented Jun 6, 2018

This issue has been automatically marked as needs more information because it has not had activity from the community in the last 30 days. It will be closed if no further activity occurs within 10 days. If the issue is label with any of the work labels (e.g bug, enhancement, documentation, or tests) the issue will not auto-close.

@stale stale bot added the needs more information The issue needs more information from the author or the community. label Jun 6, 2018
@johlju johlju added help wanted The issue is up for grabs for anyone in the community. resource proposal The issue is proposing a new resource in the resource module. and removed discussion The issue is a discussion. needs more information The issue needs more information from the author or the community. labels Jun 6, 2018
@nabrond
Copy link
Contributor

nabrond commented Jul 1, 2018

Reviving this thread a bit as I recently went through some pain implementing AGs with DSC. I agree completely that these two resources should be merged. I think the MOF that @johlju proposed above covers all the necessary functionality. I am questioning whether a few of these parameters are necessary?

  • EndpointHostname - I understand the use case for this, but if the consumer needs to customize the Endpoint URL, should we just require them to specify the entire URL rather than dynamically calculating it?
  • ProcessOnlyOnActiveNode - Should be default action for the resource, not configurable.
  • ExistingReplicaServerName - Can be determined automatically by querying the cluster.
  • ExistingReplicaInstanceName - Can be determined automatically by querying the cluster.

I also was thinking that we want to add some additional parameters:

[Read, Description("Unique identifier representing the Availability Group.")] String GroupId;
[Read, Description("Unique identifier representing the current replica.")] String ReplicaId;
[Read, Description("Role for the current replica, Primary or Secondary."] String ReplicaRole;
[Write, Description("Sets the mode used for seeding secondary replicas. Automatic seeding is only available on SQL Server 2016 and greater."), ValueMap{"Automatic","Manual"}, Values{"Automatic","Manual"}] String SeedingMode;

Last thought, is there value in also merging the SqlAGDatabase resource as well? I think this would only require an additional two parameters:

[Write, Description("Database(s) to be include in the Availability Group. Supports explicit, wild card, or regular expressions.")] String IncludeDatabases[];
[Write, Description("Defines the share to be used when 'SeedingMode' is set to 'Manual'")] String ManualSeedingShare;

Thoughts on these additions?

@johlju
Copy link
Member Author

johlju commented Jul 3, 2018

Regarding EndpointHostname, wouldn't it be easier for the user if we did as now an calculated it? If there are benefits if we choose to set the entire EndpointUrl, then that is a way to go? Bit it is also an optional parameter, so if the user leave it out, we still need to calculate the EndpointUrl?

Regarding ProcessOnlyOnActiveNode it sounds inline with my assumption 4 (A4) above? If ProcessOnlyOnActiveNode is removed should it be default action for some of the properties (mentioned in A4)? If it is always, then the secondary node will not be configured until the node is primary. Are there configuration options that should be configured while in sencondary role?

Regarding ExistingReplicaServerName and ExistingReplicaInstanceName, could you explain more how we can determine those from the cluster?

Regarding merging SqlAGDatabase, there could be beneficial to have that outside the new (refactored) resource since then it is possible for (partial) configurations to add databases to an existing Availability Group.

@nabrond
Copy link
Contributor

nabrond commented Jul 15, 2018

Since the resource's code will be running on a cluster member, we can utilize the cluster registry to access some mapping information stored with the cluster resource. Based on some research and testing, I think the following flow would work for replacing ExistingReplicaServerName and ExistingReplicaInstanceName:

  1. Use WMI to query the AG resource information from the cluster
$ag = Get-CimInstance -Namespace root/MSCluster -ClassName MSCluster_Resource -Filter "Type = 'SQL Server Availability Group' AND Name = '$Name'"
  1. Use the Id to build the instance map for the AG
$registryPath = "HKLM:\Cluster\Resources\$($ag.Id)\SqlInstToNodeMap"
$instanceNodeMap = (Get-Item -Path $registryPath).Property | ForEach-Object -Process { 
    New-Object -Property @{ 
        SqlServerInstance = $_
        NodeName = (Get-ItemPropertyValue -Path $registryPath -Name $_)
    }
}
  1. With the node map, we can just match up the OwnerNode property from the Availability Group
$ReplicaServerName, $ReplicaInstanceName = ($instanceNodeMap | Where-Object -Property NodeName -EQ $ag.OwnerNode).SqlServerInstance -split '\\'
if ([String]::IsNullOrEmpty($ReplicaInstanceName))
{
    $ReplicaInstanceName = 'MSSQLSERVER'
}

@johlju
Copy link
Member Author

johlju commented Jul 18, 2018

@nabrond That looks doable!

@mdaniou
Copy link
Contributor

mdaniou commented Jan 25, 2019

@johlju I was wondering if this would be implemented.
I need seeding mode and ReadOnlyRouting to be handle for my current deployment.
Before considering to work on any update to the resources, I would like to know where we stand in this debate here :)
Or it anyone working on it already ?

Cheers all !

@nabrond
Copy link
Contributor

nabrond commented Jan 26, 2019

@mdniou I started implementing these changes awhile back, but was sidetracked by other work. I think all that is left is some unit testing.

@johlju
Copy link
Member Author

johlju commented Jan 26, 2019

@nabrond Would be great to get this issue resolved. If you have something working already then maybe we can help out getting it tested?

@mdaniou
Copy link
Contributor

mdaniou commented Jan 28, 2019

@nabrond I'd be happy to help finish your work so that we can benefit of it.

@bvaradinov-c
Copy link

bvaradinov-c commented Dec 8, 2020

I'm doing SQL AG Cluster automation and found this issue. We are using SQL 2019 and I'm trying to use automatic seeding. However, it looks like it is not implemented in the current module version. Any chance to work again on this?

@johlju
Copy link
Member Author

johlju commented Dec 12, 2020

@bvaradinov-c If you want to work on this then please do, I don't think there have been any work on this. Happy to review a PR for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted The issue is up for grabs for anyone in the community. resource proposal The issue is proposing a new resource in the resource module.
Projects
None yet
Development

No branches or pull requests

5 participants