Skip to content

Latest commit

 

History

History
915 lines (713 loc) · 145 KB

DevLakeDomainLayerSchema.md

File metadata and controls

915 lines (713 loc) · 145 KB
title description sidebar_position
Domain Layer Schema
The data tables to query engineering metrics
1

Summary

This document describes Apache DevLake's domain layer schema.

Referring to DevLake's architecture, the data in the domain layer is transformed from the data in the tool layer. The tool layer schema is based on the data from specific tools such as Jira, GitHub, GitLab, Jenkins, etc. The domain layer schema can be regarded as an abstraction of tool-layer schemas.

DevLake Dataflow

Use Cases

  1. All metrics from pre-built dashboards are based on this data schema.
  2. As a user, you can create your own customized dashboards based on this data schema.
  3. As a contributor, you can refer to this data schema while working on the ETL logic when adding/updating data source plugins.

Data Models

This is the up-to-date domain layer schema for DevLake. Tables (entities) are categorized into 5 domains.

  1. Issue tracking: Jira issues, GitHub issues, GitLab issues, etc.
  2. Source code management: Git/GitHub/GitLab commits and refs(tags and branches), etc.
  3. Code review: GitHub PRs, GitLab MRs, etc.
  4. CI/CD: Jenkins jobs & builds, etc.
  5. Code Quality: SonarQube issues, hotspots, file metrics, etc.
  6. Cross-domain: entities that map entities from different domains to break data isolation.

Schema Diagram

Domain Layer Schema

When reading the schema, you'll notice that many tables' primary key is called id. Unlike auto-increment id or UUID, id is a string composed of several parts to uniquely identify similar entities (e.g. repo) from different platforms (e.g. GitHub/GitLab) and allow them to co-exist in a single table.

Tables that end with WIP are still under development.

Naming Conventions

  1. The name of a table is in plural form. E.g. boards, issues, etc.
  2. The name of a table which describe the relation between 2 entities is in the form of [BigEntity in singular form]_[SmallEntity in plural form]. E.g. board_issues, sprint_issues, pull_request_comments, etc.
  3. Value of the field in enum type are in capital letters. E.g. table.issues.type has 3 values, REQUIREMENT, BUG, INCIDENT. Values that are phrases, such as 'IN_PROGRESS' of table.issues.status, are separated with underscore '_'.

How to Customize Data Models

Apache DevLake provides 2 plugins:

  • customize: to create/delete columns in the domain layer schema with the data extracted from raw layer tables
  • dbt: to transform data based on the domain layer schema and generate new tables

DWD Entities - (Data Warehouse Detail)

Domain 1 - Issue Tracking

issues

An issue is the abstraction of GitHub/GitLab/BitBucket/Jira/TAPD/Zentao... issues.

field type length description key
id varchar 255 An issue's id is composed of < plugin >:< Entity >:< PK0 >[:PK1]..."
  • For Github issues, a Github issue's id is like "github:GithubIssues:< GithubIssueId >". E.g. 'github:GithubIssues:1049355647'
  • For Jira issues, a Github repo's id is like "jira:JiraIssues:< JiraSourceId >:< JiraIssueId >". E.g. 'jira:JiraIssues:1:10063'. < JiraSourceId > is used to identify which Jira source the issue came from, since DevLake users can import data from several different Jira instances at the same time.
PK
issue_key varchar 255 The key of this issue. For example, the key of this Github issue is 1145.
url varchar 255 The url of the issue. It's a web address in most cases.
title varchar 255 The title of an issue
description longtext The detailed description/summary of an issue
type varchar 100 The standard type of this issue. There are 3 standard types:
  • REQUIREMENT: this issue is a feature
  • BUG: this issue is a bug found during test
  • INCIDENT: this issue is a bug found after release
The 3 standard types are transformed from the original type of an issue. The transformation rule is set in the Scope Config on the Connections page of Jira, GitHub and etc.
For issues with an original type that has not mapped to a standard type, the value of type will be the same as issue's original_type.
original_type varchar 100 The original type of an issue.
status varchar 100 The standard statuses of this issue. There are 3 standard statuses:
  • TODO: this issue is in backlog or to-do list
  • IN_PROGRESS: this issue is in progress
  • DONE: this issue is resolved or closed
The 3 standard statuses are transformed from the original statuses of an issue. The transformation rule:
  • For Jira issue status: transformed from the Jira issue's statusCategory. Jira has 3 default issue status categories: 'To Do', 'In Progress', 'Done'.
  • For Github issue status:
    • open -> TODO
    • closed -> DONE
original_status varchar 100 The original status of an issue.
story_point double The story point of this issue. Only certain types(e.g. story) of Jira or TAPD issues has story points
priority varchar 255 The priority of the issue
urgency varchar 255 The urgency of the issue
component varchar 255 The component a bug-issue affects. This field only supports Github plugin for now. The value is transformed from Github issue labels by the rules set according to the user's configuration of .env by end users during DevLake installation.
severity varchar 255 The severity level of a bug-issue. This field only supports Github plugin for now. The value is transformed from Github issue labels by the rules set according to the user's configuration of .env by end users during DevLake installation.
parent_issue_id varchar 255 The id of its parent issue
epic_key varchar 255 The key of the epic this issue belongs to. For tools with no epic-type issues such as Github and GitLab, this field is default to an empty string
original_estimate_minutes int The original estimation of the time allocated for this issue
time_spent_minutes int The original estimation of the time allocated for this issue
time_remaining_minutes int The remaining time to resolve the issue
creator_id varchar 255 The id of issue creator
creator_name varchar 255 The name of the creator
assignee_id varchar 255 The id of issue assignee.
  • For Github issues: this is the last assignee of an issue if the issue has multiple assignees
  • For Jira issues: this is the assignee of the issue at the time of collection
assignee_name varchar 255 The name of the assignee
created_date datetime 3 The time issue created
updated_date datetime 3 The last time issue gets updated
resolution_date datetime 3 The time the issue changes to 'DONE'.
lead_time_minutes int Describes the cycle time from issue creation to issue resolution.
  • For issues whose type = 'REQUIREMENT' and status = 'DONE', lead_time_minutes = resolution_date - created_date. The unit is minute.
  • For issues whose type != 'REQUIREMENT' or status != 'DONE', lead_time_minutes is null
original_project varchar 255 The name of the original project this issue belongs to. Transformed from a Jira project's name, a TAPD workspace's name, etc.
icon_url varchar 255 The url of the issue icon.
x_custom_field_1 It depends on the type of the converted field - The value of the custom field. This field is available when utilizing the customize plugin to convert Jira's raw layer fields to the domain layer fields.

issue_assignees

This table shows the assignee(s) of issues. Multiple entries can exist per issue, as a GitHub/TAPD issue may have multiple assignees at the same time. This table can be used to get the detailed information of all issue assignees.

field type length description key
issue_id varchar 255 Issue ID FK_issues.id
assignee_id varchar 255 Assignee ID FK_accounts.id
assignee_name varchar 255 Assignee Name

issue_labels

This table shows the labels of issues. Multiple entries can exist per issue. This table can be used to filter issues by label name.

field type length description key
label_name varchar 255 Label name. Collect from GitHub issue labels or Jira issue labels
issue_id varchar 255 Issue ID FK_issues.id

issue_comments

This table shows the comments of issues. Only GitHub and TAPD issue comments are collected. Issues with multiple comments are shown as multiple records. This table can be used to calculate metric - issue response time.

field type length description key
id varchar 255 The unique id of a comment PK
issue_id varchar 255 Issue ID FK_issues.id
account_id varchar 255 The id of the account who made the comment FK_accounts.id
body longtext The body/detail of the comment
created_date datetime 3 The creation date of the comment
updated_date datetime 3 The update date of the comment

issue_changelogs

This table shows the changelogs of issues. Only Jira issue changelogs are collected for now. Issues with multiple changelogs are shown as multiple records. This is transformed from Jira or TAPD changelogs.

field type length description key
id varchar 255 The unique id of an issue changelog PK
issue_id varchar 255 Issue ID FK_issues.id
author_id varchar 255 The id of the user who made the change FK_accounts.id
author_name varchar 255 The id of the user who made the change
field_id varchar 255 The id of changed field
field_name varchar 255 The id of changed field
original_from_value longtext The original value of the changed field
original_to_value longtext The new value of the changed field
from_value longtext The transformed/standardized original value of the changed field
to_value longtext The transformed/standardized new value of the changed field
created_date datetime 3 The creation date of the changelog

issue_worklogs

This table shows the work logged under issues. Only Jira issue worklogs are collected for now. Usually, an issue has multiple worklogs logged by different developers.

field type length description key
id varchar 255 The id of the worklog. PK
author_id varchar 255 The id of the author who logged the work FK_accounts.id
comment longtext 255 The comment made while logging the work.
time_spent_minutes int The time logged. The unit of value is normalized to minute. E.g. 1d =) 480, 4h30m =) 270
logged_date datetime 3 The time of this logging action
started_date datetime 3 Start time of the worklog
issue_id varchar 255 Issue ID FK_issues.id

issue_relationships

This table shows the metadata of information about relationships between issues.

field type length description key
id varchar 255 Issue ID PK
source_issue_id int ID of the source issue in the relationship
target_issue_id int ID of the target issue in the relationship
original_type varchar 255 Type of relationship between the source and target issues

issue_repo_commits

This table shows the metadata of commits made to a code repository associated with specific issues.

field type length description key
issue_id varchar 255 Issue ID PK
repo_url varchar 255 The URL of the code repository PK
commit_sha varchar 255 The SHA of the commit. PK
host varchar 255 The hostname
namespace varchar 255 The namespace of the code repository
repo_name varchar 255 The name of the code repository.

issue_custom_array_fields

The table below presents the custom fields of issues in an 'array' type. This table is available when utilizing the customize plugin to convert Jira's raw layer fields to the domain layer fields. It is important to note that custom fields of other types will be displayed as 'x_custom_field_1' in the issues table.

field type length description key
issue_id varchar 255 Issue ID PK
field_id varchar 255 The ID of the array field of the issue. It starts with 'x_', e.g. x_product_lines.
field_value varchar 255 The value of the array field. E.g. DevLake, DevSea, DevPond.

You can refer to the following SQL to use this table.

-- query issue count by product lines
select 
  count(*) as issue_count, field_value 
from issues 
  join issue_custom_array_fields on issues.id = issue_custom_array_fields.issue_id
where field_id = 'x_product_lines'
group by field_value;

boards

A board is an issue list or a collection of issues. It's the abstraction of a Jira board, a Jira or TAPD project, a GitHub repo's issue list or a GitLab repo's issue list. This table can be used to filter issues by the boards they belong to.

field type length description key
id varchar 255 A board's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
  • For a Github repo's issue list, the board id is like "< github >:< GithubRepos >:< ConnectionId >:< GithubRepoId >".
    E.g. "github:GithubRepo:384111310"
  • For a Jira Board, the id is like "< jira >:< JiraSourceId >< JiraBoards >:< ConnectionId >:< JiraBoardsId >".
    E.g. "jira:1:JiraBoards:1:12"
PK
name varchar 255 The name of the board. Note: the board name of a Github repo 'apache/incubator-devlake' is 'apache/incubator-devlake', representing the default issue list.
description varchar 255 The description of the board.
url varchar 255 The url of the board. E.g. https://github.com/apache/incubator-devlake/issues
created_date datetime 3 Board creation time
type varchar 255 Identify scrum and non-scrum board

board_issues

This table shows the relation between boards and issues. This table can be used to filter issues by board.

field type length description key
board_id varchar 255 Board id FK_boards.id
issue_id varchar 255 Issue id FK_issues.id

sprints

A sprint is the abstraction of Jira sprints, TAPD iterations and GitHub milestones. A sprint contains a list of issues.

field type length description key
id varchar 255 A sprint's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
  • A sprint in a Github repo is a milestone, the sprint id is like "< github >:< GithubRepos >:< GithubRepoId >:< milestoneNumber >".
    Eg. The id for this sprint is "github:GithubRepo:384111310:5"
  • For a Jira Board, the id is like "< jira >:< JiraSourceId >< JiraBoards >:< JiraBoardsId >".
    Eg. "jira:1:JiraBoards:12"
PK
name varchar 255 The name of sprint.
For Github projects, the sprint name is the milestone name. For instance, 'v0.10.0 - Introduce Temporal to DevLake' is the name of this sprint.
url varchar 255 The url of sprint.
status varchar 255 There are 3 statuses of a sprint:
  • CLOSED: a completed sprint
  • ACTIVE: a sprint started but not completed
  • FUTURE: a sprint that has not started
  • SUSPENDED: a sprint that has been suspended
started_date datetime 3 The start time of a sprint
ended_date datetime 3 The planned/estimated end time of a sprint. It's usually set when planning a sprint.
completed_date datetime 3 The actual time to complete a sprint.
original_board_id datetime 3 The id of board where the sprint first created. This field is not null only when this entity is transformed from Jira sprints.
In Jira, sprint and board entities have 2 types of relation:
  • A sprint is created based on a specific board. In this case, board(1):(n)sprint. This field original_board_id is used to show the relation.
  • A sprint can be mapped to multiple boards, a board can also show multiple sprints. In this case, board(n):(n)sprint. This relation is shown in table.board_sprints
FK_boards.id

sprint_issues

This table shows the relation between sprints and issues that have been added to sprints. This table can be used to show metrics such as 'ratio of unplanned issues', 'completion rate of sprint issues', etc

field type length description key
sprint_id varchar 255 Sprint id FK_sprints.id
issue_id varchar 255 Issue id FK_issues.id

board_sprints

field type length description key
board_id varchar 255 Board id FK_boards.id
sprint_id varchar 255 Sprint id FK_sprints.id

incidents

An incidents is the abstraction of data sources' incidents.

field type length description key
id varchar 255 An incident's id is composed of < plugin >:< Entity >:< PK0 >[:PK1]..."
  • For Github incident, a Github incident's id is like "github:GithubIssues:< GithubIssueId >". E.g. 'github:GithubIssues:1049355647'
  • For Jira incident, it is like "jira:JiraIssues:< JiraSourceId >:< JiraIssueId >". E.g. 'jira:JiraIssues:1:10063'. < JiraSourceId > is used to identify which jira source the incident came from, since DevLake users can import data from several different Jira instances at the same time.
PK
incident_key varchar 255 The key of this incident.
url varchar 255 The url of the incident. It's a web address in most cases.
title varchar 255 The title of an incident
description longtext The detailed description/summary of an issue
status varchar 100 The standard statuses of this incident. There are 3 standard statuses:
  • TODO: this incident is in backlog or to-do list
  • IN_PROGRESS: this incident is in progress
  • DONE: incident issue is resolved or closed
The 3 standard statuses are transformed from the original statuses of an incident. The transformation rule:
  • For Jira issue status: transformed from the Jira issue's statusCategory. Jira issue has 3 default status categories: 'To Do', 'In Progress', 'Done'.
  • For Github issue status:
    • open -> TODO
    • closed -> DONE
original_status varchar 100 The original status of an incident.
priority varchar 255 The priority of the incident
urgency varchar 255 The urgency of the incident
component varchar 255 The component a bug-incident affects. This field only supports Github plugin for now. The value is transformed from Github issue labels by the rules set according to the user's configuration of .env by end users during DevLake installation.
severity varchar 255 The severity level of a bug-incident. This field only supports Github plugin for now. The value is transformed from Github issue labels by the rules set according to the user's configuration of .env by end users during DevLake installation.
parent_incident_id varchar 255 The id of its parent incident
original_estimate_minutes int The original estimation of the time allocated for this incident
time_spent_minutes int The original estimation of the time allocated for this incident
time_remaining_minutes int The remaining time to resolve the incident
creator_id varchar 255 The id of incident creator
creator_name varchar 255 The name of the creator
created_date datetime 3 The time incident created
updated_date datetime 3 The last time incident gets updated
resolution_date datetime 3 The time the incident changes to 'DONE'.
lead_time_minutes int Describes the cycle time from incident creation to incident resolution. The unit is minute.
original_project varchar 255 The name of the original project this incident belongs to. Transformed from a Jira project's name, a TAPD workspace's name, etc.

incident_assignees

This table shows the assignee(s) of incidents. Multiple entries can exist per incident, some data sources may have multiple assignees at the same time. This table can be used to get the detailed information of all incidents' assignees.

field type length description key
incident_id varchar 255 Incident ID FK_incidents.id
assignee_id varchar 255 Assignee ID FK_accounts.id
assignee_name varchar 255 Assignee Name

Domain 2 - Source Code Management

repos

GitHub, GitLab or BitBucket repositories.

field type length description key
id varchar 255 A repo's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github repo's id is like "< github >:< GithubRepos >:< ConnectionId >:< GithubRepoId >". E.g. 'github:GithubRepos:1:384111310'
PK
name longtext The name of repo. For DevLake, it's 'apache/incubator-devlake'
description longtext The description of repo.
url longtext The url of repo. E.g. https://github.com/apache/incubator-devlake
owner_id varchar 255 The id of the owner of repo FK_accounts.id
language varchar 255 The major language of repo. E.g. The language for apache/incubator-devlake is 'Go'
forked_from longtext Empty unless the repo is a fork in which case it contains the id of the repo the repo is forked from.
deleted tinyint 1 0: repo is active 1: repo has been deleted
created_date datetime 3 Repo creation date
updated_date datetime 3 Last full update was done for this repo

repo_commits

The commits belong to the history of a repository. More than one repo can share the same commits if one is a fork of the other.

field type length description key
repo_id varchar 255 Repo id FK_repos.id
commit_sha char 40 Commit sha FK_commits.sha

refs

A ref is the abstraction of a branch or tag.

field type length description key
id varchar 255 A ref's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github ref is composed of "github:GithubRepos:< GithubRepoId >:< RefUrl >". E.g. The id of release v5.3.0 of PingCAP/TiDB project is 'github:GithubRepos:384111310:refs/tags/v5.3.0' A repo's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
PK
name varchar 255 The name of the ref. E.g. 'refs/tags/v0.9.3' or 'origin/main'
repo_id varchar 255 The id of repo this ref belongs to FK_repos.id
commit_sha char 40 The commit this ref points to at the time of collection
is_default tinyint 1
  • 0: not the default branch
  • 1: the ref is the default branch. By the definition of Github, the default branch is the base branch for pull requests and code commits.
ref_type varchar 64 There are 2 typical types:
  • BRANCH
  • TAG

commits_diffs

This table shows the commits added in a new commit compared to an old commit. This table can be used to support tag-based and deploy-based metrics.

The records of this table are computed by RefDiff plugin. The computation should be manually triggered after using GitRepoExtractor to collect commits and refs. The algorithm behind is similar to this.

field type length description key
new_commit_sha char 40 The commit new ref/deployment points to at the time of collection PK
old_commit_sha char 40 The commit old ref/deployment points to at the time of collection PK
commit_sha char 40 One of the added commits in the new ref compared to the old ref/deployment PK
sorting_index bigint An index for debugging, please skip it

ref_commits

field type length description key
new_ref_id varchar 255 The new ref's id for comparison PK
old_ref_id varchar 255 The old ref's id for comparison PK
new_commit_sha char 40 The commit new ref points to at the time of collection
old_commit_sha char 40 The commit old ref points to at the time of collection

commits

field type length description key
sha char 40 One of the added commits in the new ref compared to the old ref FK_commits.sha
message varchar 255 Commit message
author_name varchar 255 The value is set with command git config user.name xxxxx commit
author_email varchar 255 The value is set with command git config user.email xxxxx author
authored_date datetime 3 The date when this commit was originally made
author_id varchar 255 The id of commit author FK_accounts.id
committer_name varchar 255 The name of committer
committer_email varchar 255 The email of committer
committed_date datetime 3 The last time the commit gets modified.
For example, when rebasing the branch where the commit is in on another branch, the committed_date changes.
committer_id varchar 255 The id of committer FK_accounts.id
additions bigint Added lines of code
deletions bigint Deleted lines of code
dev_eq int A metric that quantifies the amount of code contribution. The data can be retrieved from AE plugin.

commit_files

The files that have been changed by commits.

field type length description key
id varchar 255 The id is composed of "< Commit_sha >:< file_path >" FK_commits.sha
commit_sha char 40 Commit sha FK_commits.sha
file_path varchar 255 Path of a changed file in a commit
additions bigint The added lines of code in this file by the commit
deletions bigint The deleted lines of code in this file by the commit

components

The components of files extracted from the file paths. This can be used to analyze Git metrics by component.

field type length description key
repo_id varchar 255 The repo id FK_repos.id
name varchar 255 The name of component
path_regex varchar 255 The regex to extract components from this repo's paths

commit_file_components

The relationship between commit_file and component_name.

field type length description key
commit_file_id varchar 255 The id of commit file FK_commit_files.id
component_name varchar 255 The component name of a file

commit_parents

The parent commit(s) for each commit, as specified by Git.

field type length description key
commit_sha char 40 commit sha FK_commits.sha
parent_commit_sha char 40 Parent commit sha FK_commits.sha

Domain 3 - Code Review

pull_requests

Pull requests are the abstraction of GitHub pull requests, GitLab merge requests, BitBucket pull requests, etc.

field type length description key
id varchar 255 A pull request's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..." E.g. For 'github:GithubPullRequests:1347'
title longtext The title of pull request
description longtext The body/description of pull request
status varchar 100 The PR/MR statuses are standardized to 'OPEN', 'MERGED' and 'CLOSED'. Learn how each plugin's PR statuses are standardized.
original_status varchar 100 The original status of pull requests.
parent_pr_id varchar 255 The id of the parent PR
pull_request_key varchar 255 The key of PR. E.g. 1563 is the key of this PR
base_repo_id varchar 255 The repo that will be updated.
head_repo_id varchar 255 The repo containing the changes that will be added to the base. If the head repository is NULL, this means that the corresponding project had been deleted when DevLake processed the pull request.
author_name varchar 100 The author's name of the pull request
author_id varchar 100 The author's id of the pull request
url varchar 255 the web link of the pull request
type varchar 255 The work-type of a pull request.For example: feature-development, bug-fix, docs,etc.
component varchar 255 The component this PR affects.
The value is transformed from Github/GitLab pull request labels by configuring GITHUB_PR_COMPONENT in .env file during installation.
created_date datetime 3 The time PR created.
merged_date datetime 3 The time PR gets merged. Null when the PR is not merged.
closed_date datetime 3 The time PR closed. Null when the PR is not closed.
merge_commit_sha char 40 the merge commit of this PR. By the definition of Github, when you click the default Merge pull request option on a pull request on Github, all commits from the feature branch are added to the base branch in a merge commit. FK_commits.sha
base_ref varchar 255 The branch name in the base repo that will be updated
head_ref varchar 255 The branch name in the head repo that contains the changes that will be added to the base
base_commit_sha char 40 The base commit of this PR.
head_commit_sha char 40 The head commit of this PR.

pull_request_labels

This table shows the labels of pull request. Multiple entries can exist per pull request. This table can be used to filter pull requests by label name.

field type length description key
label_name varchar 255 Label name
pull_request_id varchar 255 Pull request ID FK_pull_requests.id

pull_request_commits

A commit associated with a pull request.

The list is additive. This means if a rebase with commit squashing takes place after the commits of a pull request have been processed, the old commits will not be deleted.

field type length description key
pull_request_id varchar 255 Pull request id FK_pull_requests.id
commit_sha char 40 Commit sha FK_commits.sha
commit_author_name varchar 255 The name of the person who authored the commit
commit_author_email varchar 255 The email address of the person who authored the commit.
commit_authored_date varchar 255 The date and time when the commit was authored.

pull_request_comments

Normal comments, review bodies, reviews' inline comments of GitHub's pull requests or GitLab's merge requests.

field type length description key
id varchar 255 Comment id PK
pull_request_id varchar 255 Pull request id FK_pull_requests.id
body longtext The body of the comments
account_id varchar 255 The account who made the comment FK_accounts.id
created_date datetime 3 Comment creation time
position int Deprecated
type varchar 255 - For normal comments: NORMAL
- For review comments, ie. diff/inline comments: DIFF
- For reviews' body (exist in GitHub but not GitLab): REVIEW
review_id varchar 255 Review_id of the comment if the type is REVIEW or DIFF
status varchar 255 Status of the comment

Domain 4 - CI/CD

cicd_scopes

The entity to filter or group 'cicd_pipelines'.

  • For GitHub: a GitHub repo is converted to a cicd_scope
  • For GitLab: a GitLab project is converted to a cicd_scope
  • For Jenkins: a Jenkins job is converted to a cicd_scope
  • For Bamboo CI: a Bamboo plan is converted to a cicd_scope
field type length description key
id varchar 255 A cicd_scope's id is composed of "< plugin >:< Entity >:< PluginConnectionId >[:PK1]..."
For example, a GitHub cicd_scope's id is "github:GithubRepos:< GithubConnectionId >:< GithubRepoId >", a Bamboo cicd_scope's id is 'bamboo:BambooPlan:< BambooConnectionId >:< BambooPlanKey >'
PK
name varchar 255 The name of cicd_scope.
description longtext The description of cicd_scope.
url varchar 255 The url of cicd_scope. E.g. https://github.com/apache/incubator-devlake or https://jenkins.xxx.cn/view/PROD/job/OPS_releasev2/
created_date datetime 3 Creation date of the cicd_scope, nullable
updated_date datetime 3 Updation date of the cicd_scope, nullable

cicd_pipelines

A cicd_pipeline is the abstraction of a top-level CI/CD execution, e.g. a GitHub workflow run, a GitLab pipeline, a BitBucket pipeline, a Jenkins build, a Bamboo plan build, etc. A cicd_pipeline contains one or more of cicd_tasks.

field type length description key
id varchar 255 This key is generated based on details from the original plugin PK
name varchar 255 For gitlab, as there is no name for pipeline, so we use projectId, others have their own name
result varchar 100 The result of the pipeline. It will be standardized to 'SUCCESS', 'FAILURE', '' in DevLake based on each plugin's possible results.
original_result varchar 100 The original_result of the pipeline. Its value depends on the state of the corresponding entity in the different plugins.
status varchar 100 The status of the pipeline. It will be standardized to 'DONE', 'IN_PROGRESS', 'OTHER' in DevLake based on each plugin's possible statues.
original_status varchar 100 The original_status of the pipeline. Its value depends on the state of the corresponding entity in the different plugins.
type varchar 100 The value will be set to 'DEPLOYMENT' if it matched the regex configured in the Scope Config, otherwise it is an empty string.
environment varchar 255 The value will be set to 'PRODUCTION' if it matched the regex configured in the Scope Config, otherwise it is an empty string.
duration_sec double how long does this pipeline take
queued_duration_sec double how long does this pipeline take queuing
created_date datetime 3 When this pipeline created.
queued_date datetime 3 The queued time of the pipeline.
started_date datetime 3 The started time of the pipeline.
finished_date datetime 3 When this pipeline finished.
cicd_scope_id longtext The id of cicd_scope this pipeline belongs to FK_cicd_scopes.id

cicd_pipeline_commits

field type length description key
pipeline_id varchar 255 This key is generated based on details from the original plugin PK
commit_sha varchar 255 The commit that triggers this pipeline PK
branch varchar 255 The branch that triggers this pipeline
repo_url varchar 255
repo_id varchar 255 The repo that this pipeline belongs to

cicd_tasks

A cicd_task is the abstraction of the bottom-level CI/CD execution.

  • For GitHub: a cicd_task is a GitHub job run in a GitHub workflow run.
  • For GitLab: a cicd_task is a GitLab job run of a GitLab pipeline run.
  • For Jenkins: a cicd_task is a subtask of a Jenkins build. If a build does not have subtask(s), then the build will also be saved as a cicd_task in this table.
  • For Bamboo CI: a cicd_task is a Bamboo job build in a Bamboo plan build.
field type length description key
id varchar 255 This key is generated based on details from the original plugin PK
name varchar 255
pipeline_id varchar 255 The id of the cicd_pipeline it belongs to
result varchar 100 The result of the task. It will be standardized to 'SUCCESS', 'FAILURE', '' in DevLake based on each plugin's possible.
original_result varchar 100 The original_result of the task. Its value depends on the state of the corresponding entity in the different plugins.
status varchar 100 The status of the task. It will be standardized to 'DONE', 'IN_PROGRESS', 'OTHER' in DevLake based on each plugin's possible states.
original_status varchar 100 The original_status of the task. Its value depends on the state of the corresponding entity in the different plugins.
type varchar 100 The value will be set to 'DEPLOYMENT' if it matched the regex configured in the Scope Config, otherwise it is an empty string.
environment varchar 255 The value will be set to 'PRODUCTION' if it matched the regex configured in the Scope Config, otherwise it is an empty string.
duration_sec double How long does this task take
ququed_duration_sec double How long does this task take queuing
created_date datetime 3 The created time of the task.
queued_date datetime 3 The queued time of the task.
started_date datetime 3 When this task started
finished_date datetime 3 When this task finished
cicd_scope_id longtext The id of cicd_scope this task belongs to FK_cicd_scopes.id

cicd_deployments

A cicd_deployment refers to a deployment at the project level. In the case where a pipeline run or build deploys across three distinct repositories, it will be categorized as ONE cicd_deployment while being recorded as THREE separate cicd_deployment_commits. It may come from several sources:

  • Domain layer cicd_pipelines, such as GitHub workflow runs, GitLab pipelines, Jenkins builds and BitBucket pipelines, etc. Deployments from cicd_pipelines will be transformed according to the regex configuration set in the Blueprint transformation before adding to this table.
  • Tool layer deployments: in v0.20, only the BitBucket\Bamboo\GitLab and GitHub(Use GraphQL APIs) plugins collect the independent deployment entity which you can find in table._tool_bitbucket_deployments and _tool_bamboo_deploy_builds, there will be more in the future.
  • Deployments pushed directly from webhooks

Additional Notes

  • CICD deployments are recorded at the project level, not the repository level.
  • CICD deployment commits are individual commits that represent a single deployment across multiple repositories.
field type length description key
id varchar 255 The deployment_id of this deployment. The value will be set with id when it comes from webhooks. PK
cicd_scope_id varchar 255 The id of cicd_scope this deployment belongs to FK_cicd_scopes.id
name varchar 255 The name of the deployment
result varchar 100 The result of the deployment, enum. 'SUCCESS', 'FAILUR'E, ''
original_result varchar 100 The original_result of the deployment. Its value depends on the state of the corresponding entity in the different plugins.
status varchar 100 The status of this deployment, enum: 'IN_PROGRESS', 'DONE', 'OTHER'
original_status varchar 100 The original_status of the deployment. Its value depends on the state of the corresponding entity in the different plugins.
environment varchar 255 The environment to deploy, only 'PRODUCTION' deployment will appear in v0.17
original_environment varchar 255 The original environment field of the deployment. It appeared since v1.0.
created_date datetime 3 The created time of the deployment.
queued_date datetime 3 The queued time of the deployment.
started_date datetime 3 The started time of the deployment.
finished_date datetime 3 The finished time of the deployment
duration_sec double The time this deployment takes
ququed_duration_sec double The time this deployment takes queuing

cicd_deployment_commits

A cicd_deployment_commit is a deployment in a specific repo. A deployment may come from several sources:

  • Domain layer cicd_pipelines, such as GitHub workflow runs, GitLab pipelines, Jenkins builds and BitBucket pipelines, etc. Deployments from cicd_pipelines will be transformed according to the regex configuration set in the Blueprint transformation before adding to this table.
  • Tool layer deployments: in v0.18, only the BitBucket and Bamboo plugins collect the independent deployment entity which you can find in table._tool_bitbucket_deployments and _tool_bamboo_deploy_builds, but there will be more in the future.
  • Deployments pushed directly from webhooks

You can query deployments from this table by SELECT DISTINCT cicd_deployment_id FROM cicd_deployment_commits.

Normally, one deployment only deploy to one repo. But in some cases, one deployment may deploy in multiple repos with different commits. In these cases, there will be multiple pairs of deployment-commit-repo, appeared in multiple entries in this table.

field type length description key
id varchar 255 This key is the combination of the deployment's id and repo_url, e.g.
- from a GitHub workflow run: github:GithubRun:1:384111310:3521097091:https://github.com/apache/incubator-devlake
- from a Jenkins build, jenkins:JenkinsBuild:1:deploy#7:https://github.com/apache/incubator-devlake
- from a webhook, webhook:1:90489d3951711d72:e6bde456807818c5c78d7b265964d6d48b653af6
PK
cicd_scope_id varchar 255 The id of cicd_scope this deployment_commit belongs to FK_cicd_scopes.id
cicd_deployment_id varchar 255 The deployment_id of this deployment_commit. The value will be set with id when it comes from webhooks.
name varchar 255 The name of the deployment
result varchar 100 The result of the deployment, enum: 'SUCCESS', 'FAILURE', ''
original_result varchar 100 The original_result of the deployment. Its value depends on the state of the corresponding entity in the different plugins.
status varchar 100 The status of this deployment, enum: 'IN_PROGRESS', 'DONE', 'OTHER'
original_status varchar 100 The original_status of the deployment. Its value depends on the state of the corresponding entity in the different plugins.
environment varchar 255 The environment to deploy
original_environment varchar 255 The original environment of the deployment. It appeared since v1.0.
created_date datetime 3 The created time of the deployment.
queued_date datetime 3 The queued time of the deployment.
started_date datetime 3 The started time of the deployment.
finished_date datetime 3 The finished time of the deployment
duration_sec double The time this deployment takes
queued_duration_sec double The time this deployment takes queueing
commit_sha char 40 The commit sha that triggers the deployment
ref_name varchar 255 The ref (branch/tag) name of the commit
repo_id varchar 255 -
repo_url varchar 191 The url of the repo
prev_success_deployment_commit_id varchar 255 The last successful deployment_commit_id before this one in the same cicd_scope, repo and environment, which is used to calculate the new commits deployed by this deployment, thereby measuring DORA - change lead time.

Domain 5 - Code Quality

The names of tables in the 'Code Quality' domain will start with a prefix cq_

cq_projects

field type length description key
id varchar 255 This key is generated based on details from the original plugin PK
name varchar 255 The name of the project in SonarQube
qualifier varchar 255 The type of project. Examples include "TRK" for regular projects and "VW" for views
visibility varchar 64 The visibility of the project. Examples include "public" and "private"
last_analysis_date datatime 3 The date and time of the most recent analysis of the project
commit_sha varchar 128 It represents the version number or code version identifier of a project

cq_issues

field type length description key
id varchar 255 This key is generated based on details from the original plugin PK
rule varchar 255 The key of the rule that the issue is violating
severity varchar 255 The severity level of the issue
component varchar 255 The name of the component where the issue was found
project_key varchar 255 The key of the project that the issue belongs to
line bigint The line number where the issue was found
status varchar 255 The status of the issue
message longtext The message associated with the issue
debt bigint The estimated time required to fix the issue
effort bigint The effort required to fix the issue
commit_author_email varchar 255 The email address of the author of the commit that introduced the issue
assignee varchar 255 The person assigned to fix the issue
hash varchar 255 A hash code for the issue
tags varchar 255 Any tags associated with the issue
type varchar 255 The type of the issue
scope varchar 128 The scope of the issue
start_line bigint 255 The starting line of the issue
end_line bigint 255 The ending line of the issue
start_offset bigint 255 The starting offset of the issue
end_offset bigint 255 The ending offset of the issue
vulnerability_probability varchar 100 The probability of the issue being a vulnerability
security_category varchar 100 The security category of the issue
created_date datetime 3 The time when the issue was created
updated_date datetime 3 The time when the issue was last updated

cq_issue_code_blocks

field type length description key
id varchar 255 This key is generated based on details from the original plugin PK
issue_key varchar 255 A string that stores the key of the issue that the code block is associated with
component varchar 255 A string that stores the name of the component that the code block is associated with
start_line bigint 255 An integer that stores the line number where the code block starts
end_line bigint 255 An integer that stores the line number where the code block ends
start_offset bigint 255 An integer that stores the offset where the code block starts
end_offset bigint 255 An integer that stores the offset where the code block ends
msg longtext A long text field that stores the message associated with the code block

cq_file_metrics

field type length description key
id varchar 255 This key is generated based on details from the original plugin PK
project_key varchar 255 The key of the project that the issue belongs to PK
file_name longtext longtext fields that store the name of the file
file_path longtext longtext fields that store the path of the file
file_language longtext longtext fields that store the language of the file
code_smells bigint Code smells of this file
sqale_index bigint Sqale index of the file
sqale_rating double Sqale rating of the file
bugs bigint Bugs rating of the file
reliability_rating longtext Reliability rating of the file
vulnerabilities bigint Vulnerabilities of the file
security_rating longtext Security rating of the file
security_hotspots bigint Security hotspots of the file
security_hotspots_reviewed double Security hotspots reviewed of the file
security_review_rating longtext Security review rating of the file
ncloc bigint Ncloc of the file
coverage double Ncoverage of the file
lines_to_cover bigint Lines to cover of the file
duplicated_lines_density double Duplicated lines density of the file
duplicated_blocks bigint Duplicated blocks of the file
duplicated_files bigint Duplicated files of the file
duplicated_lines bigint Duplicated lines of the file
effort_to_reach_maintainability_rating_a bigint Effort to reach maintainability rating a of the file
complexity bigint Complexity of the file
cognitive_complexity bigint Cognitive complexity of the file
num_of_lines bigint Num of lines of the file

Domain 6 - Cross-Domain Entities

These entities are used to map entities between different domains. They are the key players to break data isolation.

There are low-level entities such as issue_commits, users, and higher-level cross domain entities such as board_repos

issue_commits

A low-level mapping between "issue tracking" and "source code management" domain by mapping issues and commits. Issue(n): Commit(n).

The original connection between these two entities lies in either issue tracking tools like Jira or source code management tools like GitLab. You have to use tools to accomplish this.

For example, a common method to connect Jira issue and GitLab commit is a GitLab plugin Jira Integration. With this plugin, the Jira issue key in the commit message written by the committers will be parsed. Then, the plugin will add the commit urls under this jira issue. Hence, DevLake's Jira plugin can get the related commits (including repo, commit_id, url) of an issue.

field type length description key
issue_id varchar 255 Issue id FK_issues.id
commit_sha char 40 Commit sha FK_commits.sha

pull_request_issues

This table shows the issues closed by pull requests. It's a medium-level mapping between "issue tracking" and "source code management" domain by mapping issues and commits. Issue(n): Commit(n).

The data is extracted from the body of pull requests conforming to certain regular expression. The regular expression can be defined in GITHUB_PR_BODY_CLOSE_PATTERN in the .env file

field type length description key
pull_request_id varchar 255 Pull request id FK_pull_requests.id
issue_id varchar 255 Issue id FK_issues.id
pull_request_number varchar 255 Pull request key
issue_number varchar 255 Issue key

board_repos (Deprecated)

A way to link "issue tracking" and "source code management" domain by mapping boards and repos. Board(n): Repo(n).

field type length description key
board_id varchar 255 Board id FK_boards.id
repo_id varchar 255 Repo id FK_repos.id

accounts

This table stores of user accounts across different tools such as GitHub, Jira, GitLab, etc. This table can be joined to get the metadata of all accounts. metrics, such as 'No. of Issue closed by contributor', 'No. of commits by contributor',

field type length description key
id varchar 255 An account's id is the identifier of the account of a specific tool. It is composed of "< Plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github account's id is composed of "< github >:< GithubAccounts >:< GithubUserId >)". E.g. 'github:GithubUsers:14050754'
PK
email varchar 255 Email of the account
full_name varchar 255 Full name
user_name varchar 255 Username, nickname or Github login of an account
avatar_url varchar 255
organization varchar 255 User's organization(s)
created_date datetime 3 User creation time
status int 0: default, the user is active. 1: the user is not active

users

field type length description key
id varchar 255 id of a person PK
email varchar 255 the primary email of a person
name varchar 255 name of a person

user_accounts

field type length description key
user_id varchar 255 users.id Composite PK, FK
account_id varchar 255 accounts.id Composite PK, FK

teams

field type length description key
id varchar 255 id from the data sources, decided by DevLake users PK
name varchar 255 name of the team. E.g. team A, team B, etc.
alias varchar 255 alias or abbreviation of a team
parent_id varchar 255 teams.id, default to null FK
sorting_index int 255 the field to sort team

team_users

field type length description key
team_id varchar 255 Full name of the team. E.g. team A, team B, etc. Composite PK, FK
user_id varchar 255 users.id Composite PK, FK

project

field type length description key
name varchar 255 name for project PK
description longtext description of the project
created_at datetime 3 created time of project
updated_at datetime 3 last updated time of project

project_metric_settings

field type length description key
project_name varchar 255 name for project PK
plugin_name varchar 255 name for plugin PK
plugin_option longtext check if metric plugins have been enabled by the project
enable tinyint 1 if the metric plugins is enabled

project_mapping

field type length description key
project_name varchar 255 name for project PK
table varchar 255 the table name of Scope PK
row_id varchar 255 the row_id in the Scope table PK

project_pr_metrics

field type length description key
id varchar 255 Id of PR PK
project_name varchar 100 The project that this PR belongs to PK
first_review_id longtext The id of the first review on this pr
first_commit_sha longtext The sha of the first commit
pr_coding_time bigint The time it takes from the first commit until a PR is issued
pr_pickup_time bigint The time it takes from when a PR is issued until the first comment is added to that PR
pr_review_time bigint The time it takes to complete a code review of a PR before it gets merged
deployment_id longtext The id of cicd_task which deploy the commits of this PR
pr_deploy_time bigint The time it takes from when a PR is merged to when it is deployed
pr_cycle_time bigint The total time from the first commit to when the PR is deployed

project_issue_metrics

field type length description key
id varchar 255 Id of Issue PK
project_name varchar 100 The project that this Issue belongs to PK
deployment_id longtext The id of cicd_task which cause an incident

refs_issues_diffs

This table shows the issues fixed by commits added in a new ref compared to an old one. The data is computed from table.commits_diffs, table.pull_requests, table.pull_request_commits, and table.pull_request_issues.

This table can support tag-based analysis, for instance, 'No. of bugs closed in a tag'.

field type length description key
new_ref_id varchar 255 The new ref's id for comparison FK_refs.id
old_ref_id varchar 255 The old ref's id for comparison FK_refs.id
new_ref_commit_sha char 40 The commit new ref points to at the time of collection
old_ref_commit_sha char 40 The commit old ref points to at the time of collection
issue_number varchar 255 Issue number
issue_id varchar 255 Issue id FK_issues.id

Get Domain Layer Models in Developer Mode

When developing a new plugin, you need to refer to domain layer models, as all raw data should be transformed to domain layer data to provide standardized metrics across tools. Please use the following method to access the domain data models.

import "github.com/apache/incubator-devlake/models/domainlayer/domaininfo"

domaininfo := domaininfo.GetDomainTablesInfo()
for _, table := range domaininfo {
// do something
}

If you want to learn more about plugin models, please visit PluginImplementation