DataSphere Studio (DSS for short) is a one-stop data application development and management portal, self-developed by WeDataSphere, the big data platform of WeBank.
DataSphere Studio is positioned as a data application development framework that closes the loop over the entire data application development process. Through a unified UI and a workflow-style graphical drag-and-drop experience, it covers the full lifecycle of data application development: data import, desensitization and cleansing, data analysis, data mining, quality inspection, visualization, scheduling, and data output.
With a pluggable framework architecture, DSS is designed to allow users to quickly integrate new data application tools, or replace various tools that DSS has integrated.
DSS has integrated a variety of upper-layer data application systems by implementing multiple AppConns, which cover most users' data development needs.
If desired, new data application systems can also be easily integrated to replace or enrich DSS's data application development process. See the guide on how to quickly integrate new application systems.
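The idea of a pluggable architecture, where an integrated tool can be added or swapped out without touching the rest of the system, can be illustrated with a minimal sketch. This is not the DSS AppConn API: `ToolRegistry`, `builtin_scheduler`, and `external_scheduler` are hypothetical names invented for this example, and a real AppConn involves considerably more than a single function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToolRegistry:
    """Hypothetical registry mapping a capability name to the tool providing it."""

    _tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, capability: str, tool: Callable[[str], str]) -> None:
        # Registering under an existing capability replaces the previous tool.
        self._tools[capability] = tool

    def run(self, capability: str, payload: str) -> str:
        if capability not in self._tools:
            raise KeyError(f"no tool registered for {capability!r}")
        return self._tools[capability](payload)


# Two interchangeable (hypothetical) tools that offer the same capability.
def builtin_scheduler(workflow: str) -> str:
    return f"builtin scheduler ran {workflow}"


def external_scheduler(workflow: str) -> str:
    return f"external scheduler ran {workflow}"


registry = ToolRegistry()
registry.register("scheduling", builtin_scheduler)
first = registry.run("scheduling", "daily_report")

# Swap the tool behind the same capability; callers do not change.
registry.register("scheduling", external_scheduler)
second = registry.run("scheduling", "daily_report")
```

Callers depend only on the capability name, so replacing the tool behind `"scheduling"` requires no changes on their side. That is the property a pluggable framework aims for.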
Component | Description | DSS 0.X Version Requirements | DSS 1.0 Version Requirements | Version Planning |
---|---|---|---|---|
DataApiService | Data API service. A SQL script can be quickly published as a RESTful interface, providing REST access to external callers. | Not supported | >=1.0.0 | Released |
Scriptis | A web tool for data analysis that supports writing scripts such as SQL, PySpark, and HiveQL online and submitting them to Linkis for execution. | >=0.5.0 | >=1.0.0 | Released |
Schedulis | Workflow task scheduling system based on Azkaban secondary development, with financial-grade features such as high performance, high availability and multi-tenant resource isolation. | >=0.5.0 | >=1.0.0 | Released |
EventCheck | Provides cross-business, cross-project, and cross-workflow signaling capabilities. | >=0.5.0 | >=1.0.0 | Released |
SendEmail | Provides the ability to send data; the result sets of any other workflow node can be sent by email. | >=0.5.0 | >=1.0.0 | Released |
Qualitis | Data quality verification tool, providing data verification capabilities such as data integrity and correctness checks. | >=0.5.0 | 1.0.1 (version currently in preparation) | Expected end of January |
Streamis | Streaming application development and management tool. It supports the release of Flink Jar and Flink SQL, and provides development, debugging, and production management capabilities for streaming applications, such as start/stop, status monitoring, and checkpointing. | Not supported | 1.0.1 (version currently in preparation) | Expected end of January |
Exchangis | A data exchange platform that supports data transmission between structured and unstructured heterogeneous data sources. The upcoming Exchangis 1.0 will be connected with DSS workflows. | Not supported | Planned in 1.0.2 | In Development |
Visualis | A data visualization BI tool based on the secondary development of Davinci, an open source project of CreditEase. It provides users with financial-grade data visualization capabilities with respect to data security. | >=0.5.0 | Planned in 1.0.2 | In Development |
Prophecis | A one-stop machine learning platform that integrates multiple open source machine learning frameworks. Prophecis' MLFlow can be connected to DSS workflow through AppConn. | Not supported | Planned in 1.0.2 | In Development |
UserManager | Automatically initializes all user environments necessary for a new DSS user, including creating Linux users, setting up the various user paths, and authorizing directories. | >=0.9.1 | Planned in 1.0.2 | In Development |
DolphinScheduler | Apache DolphinScheduler, a distributed and scalable visual workflow task scheduling platform, supports one-click publishing of DSS workflows to DolphinScheduler. | Not supported | Planned in 1.1.0 | In Development |
UserGuide | Mainly provides help documentation, a beginner's guide, dark mode skinning, etc. | Not supported | Planned in 1.1.0 | In Development |
DataModelCenter | Mainly provides data warehouse planning, data model development, and data asset management capabilities. Data warehouse planning includes subject domains, data warehouse layers, modifiers, etc.; data model development includes indicators, dimensions, metrics, wizard-based table building, etc.; data assets are connected to Apache Atlas to provide data lineage capabilities. | Not supported | Planned in 1.2.0 | In Development |
Airflow | Supports publishing DSS workflows to Airflow for scheduling. | >=0.9.1, not yet merged | Not supported | No plans yet |
Please go to the DSS Releases Page to download a compiled version or a source code package of DSS.
Please follow the Compile Guide to compile DSS from source code.
Please refer to the Deployment Documents to deploy DSS.
DataSphere Studio's script execution feature carries high security risks, and the isolation of the WeDataSphere Demo environment has not yet been completed. Because many users have been asking about the Demo environment, we decided to first issue invitation codes to the community and accept trial applications from enterprises and organizations.
If you want to try out the Demo environment, please join the DataSphere Studio community user group (see the end of this document) and contact the WeDataSphere Group Robot to get an invitation code.
DataSphereStudio Demo environment user registration page: click me to enter
DataSphereStudio Demo environment login page: click me to enter
For a complete list of documents for DSS 1.0, see DSS-Doc.
The following is the installation guide for DSS-related AppConn plugins:
We have opened an issue where users can give feedback and record who is using DSS.
Since its first release in 2019, DSS has accumulated more than 700 trial companies and 1000+ sandbox trial users across diverse industries, including finance, banking, telecommunications, manufacturing, and internet companies.
DataSphere Studio uses GitBook to manage its documentation, and the entire project will be organized into a GitBook e-book for everyone to download and use.
WeDataSphere will provide a unified document reading entry in the future. For the usage of GitBook, please refer to the GitBook Documentation.
Contributions are always welcome. We need more contributors to build DSS together, whether through code, documentation, or any other support that helps the community.
For code and documentation contributions, please follow the contribution guide.
For any questions or suggestions, please kindly submit an issue.
You can scan the QR codes below to join our WeChat and QQ groups for a more immediate response.
DSS is under the Apache 2.0 license. See the License file for details.