
Commit

docs: Update website-docs
PULAK0717 committed Sep 5, 2023
1 parent 552acb1 commit b74104d
Showing 5 changed files with 258 additions and 7 deletions.
3 changes: 2 additions & 1 deletion docs/index.md
@@ -2,4 +2,5 @@

Welcome to the Datachecks Documentation!

Let's jump to the **<u>[Getting Started!](getting_started.md)</u>**

33 changes: 32 additions & 1 deletion docs/metrics/combined.md
@@ -1,3 +1,34 @@
# **Combined Metrics**

Combined metrics in data quality serve as a cornerstone for ensuring the accuracy and efficiency of your data operations. These metrics provide a holistic view of your data ecosystem, amalgamating various aspects to paint a comprehensive picture.

By consistently tracking these combined metrics, you gain invaluable insights into the overall performance of your data infrastructure. This data-driven approach enables you to make informed decisions on optimization, resource allocation, and system enhancements. Moreover, these metrics act as sentinels, promptly detecting anomalies or bottlenecks within your data pipelines. This proactive stance allows you to mitigate potential issues before they escalate, safeguarding the integrity of your data.

A combined metric raises an error if more than two arguments are passed to a single operation; to combine more than two values, nest function calls.


## **Available Functions**

- `div()`
- `sum()`
- `mul()`
- `sub()`
- `percentage()`

**Example**

```yaml title="dcs_config.yaml"
metrics:
- name: combined_metric_example
metric_type: combined
expression: sum(count_us_parts, count_us_parts_valid)
```

**Example**

```yaml title="dcs_config.yaml"
metrics:
- name: combined_metric_example
metric_type: combined
expression: div(sum(count_us_parts, count_us_parts_valid), count_us_parts_not_valid)
```
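
The `percentage()` function is listed above but not demonstrated. Here is a minimal sketch; the argument order is an assumption (the first value expressed as a percentage of the second), as is the metric name:

```yaml title="dcs_config.yaml"
metrics:
  - name: combined_metric_percentage_example
    metric_type: combined
    # percentage() argument order assumed: part first, whole second
    expression: percentage(count_us_parts_valid, count_us_parts)
```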
70 changes: 68 additions & 2 deletions docs/metrics/completeness.md
@@ -1,3 +1,69 @@
# **Completeness Metrics**

Completeness metrics play a crucial role in data quality assessment, ensuring your datasets are comprehensive and reliable. By regularly monitoring these metrics, you can gain profound insights into the extent to which your data captures the entirety of the intended information. This empowers you to make informed decisions about data integrity and take corrective actions when necessary.

These metrics unveil potential gaps or missing values in your data, enabling proactive data enhancement. Like a well-oiled machine, tracking completeness metrics enhances the overall functionality of your data ecosystem. Just as reliability metrics guarantee up-to-date information, completeness metrics guarantee a holistic, accurate dataset.


## **Null Count**

Null count metrics gauge missing data, a crucial aspect of completeness metrics, revealing gaps and potential data quality issues.



**Example**

```yaml title="dcs_config.yaml"
metrics:
- name: null_count_in_dataset
metric_type: null_count
resource: product_db.products
field_name: first_name

```
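
Like the numeric metrics shown later in these docs, a null count can presumably be narrowed to a subset of rows. This sketch assumes completeness metrics accept the same `filters` clause with a `where` condition:

```yaml title="dcs_config.yaml"
metrics:
  - name: null_count_in_indian_rows
    metric_type: null_count
    resource: product_db.products
    field_name: first_name
    filters:
      where: "country_code = 'IN'"  # filter support for null_count assumed
```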


## **Null Percentage**

Null percentage metrics reveal missing data, a vital facet of completeness metrics, ensuring data sets are whole and reliable.

**Example**

```yaml title="dcs_config.yaml"
metrics:
- name: null_percentage_in_dataset
metric_type: null_percentage
resource: product_db.products
field_name: first_name

```

## **Empty String**

Empty string metrics gauge the extent of missing or null values, exposing gaps that impact data completeness and reliability.

**Example**

```yaml title="dcs_config.yaml"
metrics:
- name: empty_string_in_dataset
metric_type: empty_string
resource: product_db.products
field_name: first_name

```

## **Empty String Percentage**

Empty String Percentage Metrics assess data completeness by measuring the proportion of empty strings in datasets.

**Example**

```yaml title="dcs_config.yaml"
metrics:
- name: empty_string_percentage_in_dataset
metric_type: empty_string_percentage
resource: product_db.products
field_name: first_name

```
128 changes: 126 additions & 2 deletions docs/metrics/numeric_distribution.md
@@ -1,3 +1,127 @@
# **Numeric Distribution Metrics**

Numeric distribution metrics serve as vital tools for ensuring the ongoing integrity of your data. These metrics offer valuable insights into the distribution of values within your datasets, aiding in data quality assurance.

By consistently monitoring these metrics, you gain a deeper understanding of how your data behaves. This knowledge empowers you to make informed decisions regarding data cleansing, anomaly detection, and overall data quality improvement.

Furthermore, numeric distribution metrics are your early warning system. They help pinpoint outliers and anomalies, allowing you to address potential data issues before they escalate into significant problems in your data pipelines.


## **Average**

Average metrics compute the mean of a numeric field in transactional databases and search engines, offering a quick view of typical values.


**Example**

```yaml title="dcs_config.yaml"
metrics:
- name: avg_price
metric_type: avg
resource: product_db.products
field_name: price
filters:
where: "country_code = 'IN'"
```
## **Minimum**

Minimum metrics ensure consistency across transactional databases and search engines, enhancing data quality and retrieval accuracy.

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: min_price
    metric_type: min
    resource: product_db.products
    field_name: price
```
## **Maximum**

Maximum metrics gauge the highest values within datasets, helping identify outliers and understand the upper limits of a data distribution.

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: max_price
    metric_type: max
    resource: product_db.products
    field_name: price
```

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: max_price_of_products_with_high_rating
    metric_type: max
    resource: product_db.products
    field_name: price
    filters:
      where: "rating > 4"
```
## **Variance**

Variance in data quality measures the degree of variability or dispersion in a dataset, indicating how spread out the data points are from the mean.

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: variance_of_price
    metric_type: variance
    resource: product_db.products
    field_name: price
```
## **Skew**

The skew metric measures the asymmetry of the distribution of data values, helping assess how far the data departs from a symmetric distribution.

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: skew_of_price
    metric_type: skew  # metric_type name assumed
    resource: product_db.products
    field_name: price
```

## **Kurtosis**

Kurtosis is a data quality metric that measures the tailedness of a dataset's distribution, indicating how peaked or flat it is relative to a normal distribution.

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: kurtosis_of_price
    metric_type: kurtosis  # metric_type name assumed
    resource: product_db.products
    field_name: price
```

## **Sum**

The sum metric totals a specific numeric attribute across records, providing a check on the accuracy and consistency of aggregated values.

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: sum_of_price
    metric_type: sum  # metric_type name assumed
    resource: product_db.products
    field_name: price
```

## **Geometric Mean**

The geometric mean metric in data quality is a statistical measure that calculates the nth root of the product of n data values, often used to assess the central tendency of a dataset.

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: geometric_mean_of_price
    metric_type: geometric_mean  # metric_type name assumed
    resource: product_db.products
    field_name: price
```

## **Harmonic Mean**

The Harmonic mean metric in data quality is a statistical measure used to assess the quality of data by calculating the reciprocal of the average of the reciprocals of data values.

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: harmonic_mean_of_price
    metric_type: harmonic_mean  # metric_type name assumed
    resource: product_db.products
    field_name: price
```
31 changes: 30 additions & 1 deletion docs/metrics/uniqueness.md
@@ -1,3 +1,32 @@
# **Uniqueness Metrics**

Uniqueness metrics play a pivotal role in upholding data quality standards. Just as reliability metrics ensure timely data updates, uniqueness metrics focus on the distinctiveness of data entries within a dataset.

By consistently tracking these metrics, you gain valuable insights into data duplication, redundancy, and accuracy. This knowledge empowers data professionals to make well-informed decisions about data cleansing and optimization strategies. Uniqueness metrics also serve as a radar for potential data quality issues, enabling proactive intervention to prevent major problems down the line.


## **Distinct Count**

A distinct count metric in data quality measures the number of unique values within a dataset, ensuring accuracy and completeness.

**Example**

```yaml title="dcs_config.yaml"
metrics:
- name: distinct_count_of_product_categories
metric_type: distinct_count
resource: product_db.products
field_name: product_category
```
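
A distinct count can likewise be scoped to a subset of rows. This sketch assumes uniqueness metrics accept the same `filters` clause as the numeric metrics:

```yaml title="dcs_config.yaml"
metrics:
  - name: distinct_count_of_highly_rated_categories
    metric_type: distinct_count
    resource: product_db.products
    field_name: product_category
    filters:
      where: "rating > 4"  # filter support for distinct_count assumed
```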
## **Duplicate Count**

Duplicate count is a data quality metric that measures the number of identical or highly similar records in a dataset, highlighting potential data redundancy or errors.

**Example**

```yaml title="dcs_config.yaml"
metrics:
  - name: duplicate_count_of_product_names
    metric_type: duplicate_count
    resource: product_db.products
    field_name: product_name  # field name illustrative
```
