Search before asking
I have searched the issues and found no similar feature request.
Problem Description
At present, each microservice that DSS deploys in every environment runs as a single node. Whether a service exception or a host exception occurs, there is a significant risk of service unavailability, which affects the availability of the entire product. In addition, every version upgrade requires stopping all services for 1-2 hours, which also degrades the user experience.
Therefore, all DSS microservices need to be transformed to a multi-active mode, so that DSS remains available when an exception occurs on one node.
Description
Implement multi-active deployment of DSS so that, while one set of service machines is under maintenance, the services on the other machines remain usable, without affecting users or being noticed by them. Based on this, a complete multi-active deployment plan needs to be provided.
If a service becomes abnormal during the publishing process, an error message will be returned indicating that the system has taken a nap and asking the user to try again later.
Use case
No response
Solutions
1. Overall design
To transform DSS from supporting only single-node deployment to supporting multi-node, multi-active deployment, the following points must be considered: data sharing and synchronization, data consistency, load balancing and failover, and service discovery and registration. The latter two can directly reuse the existing capabilities of Linkis. DSS itself needs to take care of two things. The first is whether any cache is involved in service invocation or inside each microservice, so as to avoid data inconsistency. The second is the tasks executed inside a service, such as workflow or node execution tasks, workflow publishing tasks, workflow or project copy tasks, and workflow import/export tasks, to prevent the abnormal task states of a failed node from being returned to users.
1.1 Technical Architecture
DataSphereStudio technology selection:

| Module | Category | Selection | Version |
| --- | --- | --- | --- |
| Microservice module | Microservice governance | Spring Cloud | Finchley.RELEASE |
| Microservice module | Service registration and discovery | Nacos | Not involved yet |
| Microservice module | Unified configuration center | Managis | 1.3.5 |
| Microservice module | Gateway routing | Spring Cloud Gateway | 2.0.1.RELEASE |
| Microservice module | Service invocation | OpenFeign | 2.0.0.RELEASE |
| Microservice module | Service security authentication | UC | Under planning |
| Microservice module | Interface document engine | GITBOOK (Swagger) | Not involved yet |
| Microservice module | Service application monitoring | Spring Cloud Admin | Not involved yet |
| Microservice module | Service link tracing | Skywalking | Under planning |
| Microservice module | Service degradation, circuit breaking and rate limiting | Evaluate and compare Sentinel/Hystrix | Under planning |
| Microservice module | Load balancing between services | Spring Cloud Ribbon | 2.0.0.RELEASE |
| Basic common module | Database | MySQL | 5.1.34 (driver version) |
| Basic common module | Data access / persistence | MyBatis | 3.4.6 |
| Basic common module | MVC | Spring MVC | 1.19.1 |
| Basic common module | Load balancing | Nginx | 1.16.1 |
| Basic common module | Project build and management tool | Maven | 3.0+ |
| Basic common module | Distributed locks | Tentative DB implementation | - |
| Basic common module | Unified distributed cache | Research when needed | Not involved yet |
| Basic common module | Unified log collection and storage | Tentative ELK | Under planning |
| Basic common module | Message queue | Research when needed | Not involved yet |
| Basic common module | Distributed transactions | Research when needed | Not involved yet |
| Basic common module | Log printing | Log4j2 + slf4j | 2.17.1 |
| Basic common module | Front-end framework | TypeScript | 3.5.3 |
1.2 Business architecture
From the user's perspective, it is impossible to tell whether the backend service runs as a single node or as multiple nodes, so the business architecture remains unchanged.
2. Module design
Since the microservice merge has already consolidated DSS into two services, and there are no cache-related calls between them, the cache problem does not need to be considered. The focus is therefore on the various tasks executed within a single service: when a node with tasks in flight encounters an exception, the other nodes must be able to report to the user that those tasks have failed. A periodic inspection approach is adopted here: a scheduled task checks task status and saves it to the database so that it can be returned to the user. The inspection interval is controlled by a configuration parameter and defaults to 60 seconds, as sketched below.
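A minimal wiring sketch of such a check, assuming a Spring scheduled task whose interval is read from configuration. The class name and property key are illustrative, not actual DSS code, and @EnableScheduling is assumed to be declared on some configuration class:

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Illustrative sketch only; not actual DSS code.
@Component
public class TaskStatusCheckScheduler {

    // Default interval of 60 000 ms, overridable via configuration,
    // matching the design described above.
    @Scheduled(fixedDelayString = "${dss.task.status.check.interval:60000}")
    public void runCheck() {
        // Query the tasks recorded as running, compare their owning
        // instances against the service registry, and persist a failed
        // status for tasks stranded on dead instances (see the
        // step-by-step sketch in section 2.1.1).
    }
}
```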
2.1 Workflow Publishing Tasks
2.1.1 Open source workflow conversion
Because there is no publish operation in the open source version and the DSS workflow is only converted into a scheduling system workflow, the task state of the OrchestratorConversionJob needs to be saved. The existing code only keeps the job state in a cache, so the job state has to be stored in the database; the dss_orchestrator_job_info table is reused for this. The scheduled task here is CheckOrchestratorConversionJobTask, defined in the Orchestrator server module.
The check proceeds in five steps (see the sketch after this list):
1. Obtain all service instances; if every instance is alive, return directly, otherwise record the dead instances.
2. Fetch the tasks that are running or initialized from the dss_orchestrator_job_info table.
3. Compare against the instance information: if the instance executing a task no longer exists in Eureka, the status of those tasks must be updated to failed.
4. Update the task status information.
5. If a node is abnormal, send an alarm message to the developers, including the failed tasks on that node.
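A condensed sketch of these five steps. JobDao, JobRecord and the helper methods are hypothetical names, and the registry lookup and alert call are placeholders rather than real Linkis APIs:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of CheckOrchestratorConversionJobTask's check logic.
public class ConversionJobCheckSketch {

    public static class JobRecord {
        public long id;
        public String instanceName;
    }

    public interface JobDao {
        List<JobRecord> selectRunningOrInitJobs();
        void updateStatus(long id, String status, String errorMsg);
    }

    private final JobDao jobDao;

    public ConversionJobCheckSketch(JobDao jobDao) {
        this.jobDao = jobDao;
    }

    public void doCheck() {
        // Step 1: instances currently alive in the registry (Eureka).
        Set<String> alive = fetchAliveInstances();

        // Step 2: conversion tasks recorded as running or initialized
        // in dss_orchestrator_job_info.
        List<JobRecord> activeJobs = jobDao.selectRunningOrInitJobs();

        // Steps 3 and 4: tasks owned by dead instances are marked failed.
        Map<String, List<JobRecord>> failedByInstance = new HashMap<>();
        for (JobRecord job : activeJobs) {
            if (!alive.contains(job.instanceName)) {
                jobDao.updateStatus(job.id, "Failed",
                        "Instance " + job.instanceName + " is down");
                failedByInstance
                        .computeIfAbsent(job.instanceName, k -> new ArrayList<>())
                        .add(job);
            }
        }

        // Step 5: alert developers, listing the failed tasks per dead node.
        failedByInstance.forEach(this::sendAlarm);
    }

    private Set<String> fetchAliveInstances() {
        // Placeholder: would query Eureka (via Linkis) for live instances.
        return Collections.emptySet();
    }

    private void sendAlarm(String instanceName, List<JobRecord> failedJobs) {
        // Placeholder: would push an alert to the developers.
        System.out.printf("Node %s is down, %d tasks marked failed%n",
                instanceName, failedJobs.size());
    }
}
```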
It should be noted that in the ConvertOrchestration method of OrchestratorPluginServiceImpl, the current instance needs to be obtained through the Sender.getThisInstance method and saved to the dss_orchestrator_job_info table. This table also stores the information of the conversion workflow task, and the OrchestratorConversionJob then updates that information as the conversion proceeds.
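A hypothetical fragment of that registration step. The DAO and entity names are illustrative, and the Sender import path assumes Linkis 1.x; only the Sender.getThisInstance call comes from the design above:

```java
import org.apache.linkis.rpc.Sender;

// Illustrative sketch; ConversionJobInfo and JobInfoDao are hypothetical.
public class ConversionJobRegistrar {

    public static class ConversionJobInfo {
        public String jobId;
        public String instanceName;
        public String status;
    }

    public interface JobInfoDao {
        void insert(ConversionJobInfo info);
    }

    private final JobInfoDao jobInfoDao;

    public ConversionJobRegistrar(JobInfoDao jobInfoDao) {
        this.jobInfoDao = jobInfoDao;
    }

    // Called when a conversion starts: record which node owns the task in
    // dss_orchestrator_job_info so the checker can later detect tasks
    // stranded on a dead instance.
    public void register(String jobId) {
        ConversionJobInfo info = new ConversionJobInfo();
        info.jobId = jobId;
        info.instanceName = Sender.getThisInstance(); // current service instance
        info.status = "Inited";
        jobInfoDao.insert(info);
    }
}
```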
The existing dss_orchestrator_job_info table is reused. The table changes add the fields instance_name, status and error_msg, and rename the updated_time field to update_time (see section 3.1.1).
2.2 Open source workflow execution tasks
The existing table dss_workflow_task is used here to record the instance information, and the scheduled task is CheckWorkflowExecuteTask, defined in the flow-execution-server module. The overall implementation process is similar to 2.1.1.
The persist method in WorkflowPersistenceEngine saves the instance information, while the change method updates the workflow execution information.
2.3 Open source workflow copy task
The existing table dss_orchestrator_copy_info is used here to record the instance information, and the scheduled task is CheckOrchestratorCopyTask, defined in the framework-orchestrator-server module. The overall implementation process is similar to 2.1.1.
The copyOrchestrator method in OrchestratorFrameworkServiceImpl saves the instance information, while OrchestratorCopyJob updates the workflow copy task information.
2.4 Determine whether the scheduled cleanup of CS tasks is supported in a multi-active state.
3. Data structure / storage design (confirm the field changes and update the initialization statements used for a fresh installation)
3.1 Workflow Publishing Tasks
3.1.1 Add the fields instance_name, status and error_msg to table dss_orchestrator_job_info, and rename updated_time to update_time
```sql
ALTER TABLE `dss_orchestrator_job_info` ADD `instance_name` varchar(128) DEFAULT NULL COMMENT 'An instance of executing a task';
ALTER TABLE `dss_orchestrator_job_info` ADD `status` varchar(128) DEFAULT NULL COMMENT 'Conversion task status';
ALTER TABLE `dss_orchestrator_job_info` ADD `error_msg` varchar(2048) DEFAULT NULL COMMENT 'Conversion task exception information';
ALTER TABLE `dss_orchestrator_job_info` CHANGE `updated_time` `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP;
ALTER TABLE `dss_orchestrator_job_info` MODIFY `job_id` varchar(64) DEFAULT NULL COMMENT 'task id';
```
3.2 Add the field instance_name to table dss_workflow_task
```sql
ALTER TABLE `dss_workflow_task` ADD `instance_name` varchar(128) DEFAULT NULL COMMENT 'An instance of executing a task' AFTER `status`;
```
3.3 Add the field instance_name to table dss_orchestrator_copy_info
```sql
ALTER TABLE `dss_orchestrator_copy_info` ADD `instance_name` varchar(128) DEFAULT NULL COMMENT 'An instance of executing a task' AFTER `status`;
```
3.4 The DDL statements of the related tables must also be updated for a fresh installation
Anything else
No response
Are you willing to submit a PR?
Yes I am willing to submit a PR!