[watremark] Add watermark HLD (#245)
Signed-off-by: Mykola Faryma <[email protected]>
mykolaf authored and lguohan committed Nov 28, 2018
1 parent 20f045d commit 05df96c
Showing 3 changed files with 366 additions and 0 deletions.
366 changes: 366 additions & 0 deletions doc/watermarks_HLD.md
# Watermark counters in SONiC
# High Level Design Document
### Rev 0.1

# Table of Contents
* [List of Tables](#list-of-tables)
* [Revision](#revision)
* [About this Manual](#about-this-manual)
* [Scope](#scope)
* [Definitions/Abbreviation](#definitionsabbreviation)

* [1 Overview](#1-overview)
* [1.1 System Chart](#11-system-chart)
* [1.2 Modules description](#12-modules-description)
* [1.2.1 gRPC](#121-grpc)
* [1.2.2 Counter DB](#122-counter-db)
* [1.2.3 Orchestration Agent](#123-orchestration-agent)
* [1.2.4 SAI Redis](#124-sai-redis)
* [1.2.5 ASIC DB](#125-asic-db)
* [1.2.6 syncd](#126-syncd)
* [1.2.7 SAI (Redis and Switch)](#127-sai-redis-and-switch)

* [2 Requirements](#2-requirements)
* [2.1 Watermark counters requirements](#21-watermark-counters-requirements)

* [3 Modules Design](#3-modules-design)
* [3.1 Modules that need to be updated](#31-modules-that-need-to-be-updated)
* [3.1.1 Counter DB](#311-counter-db)
* [3.1.2 CLI](#312-cli)
* [3.1.3 Lua plugins](#313-lua-plugins)
* [3.1.4 SWSS](#314-swss)
* [3.1.5 SAI Redis](#315-sai-redis)
* [3.1.6 CONFIG DB](#316-config-db)
* [3.1.7 Syncd](#317-syncd)
* [3.1.8 SAI](#318-sai)
* [3.1.9 gRPC](#319-grpc)

* [4 Flows](#4-flows)

* [5 Open questions](#5-open-questions)

# List of Tables
* [Table 1: Revision](#revision)
* [Table 2: Abbreviations](#definitionsabbreviation)
* [Table 3: COUNTERS_DB Table details](#table-details)
* [Table 4: Virtual paths](#virtual-paths)

###### Revision
| Rev | Date | Author | Change Description |
|:---:|:-----------:|:------------------:|-----------------------------------|
| 0.1 | | Mykola Faryma | Initial version |

# About this Manual
This document provides general information about the watermark feature implementation in SONiC.
# Scope
This document describes the high level design of the watermark feature.
# Definitions/Abbreviation
###### Table 2: Abbreviations
| Definitions/Abbreviation | Description |
|--------------------------|--------------------------------------------|
| gRPC | gRPC Remote Procedure Calls |
| gNMI | gRPC Network Management Interface |
| API | Application Programming Interface |
| SAI | Switch Abstraction Interface |

# 1 Overview
## 1.1 System Chart
The following diagram gives a top-level overview of the architecture:

![](https://github.com/mykolaf/SONiC/blob/gh-pages/images/watermark_HLD/SystemOverview.png)

## 1.2 Modules description
### 1.2.1 gRPC
The system's data telemetry infrastructure. It allows clients to issue Get requests for data held in the SONiC DBs (and more).
### 1.2.2 Counter DB
Located in Redis DB instance #2, running inside the "database" container. Redis stores data as key-value pairs and needs no predefined schema. The Counter DB holds various counters, such as port counters and ACL counters.
### 1.2.3 Orchestration Agent
This component runs in the "orchagent" docker container and is responsible for processing updates to the APP DB and making the corresponding changes in the ASIC DB via SAI Redis.
### 1.2.4 SAI Redis
SAI Redis is an implementation of the SAI API which translates API calls into SAI objects which are stored in the ASIC DB.
### 1.2.5 ASIC DB
Redis DB instance #1. Holds serialized SAI objects.
### 1.2.6 syncd
Reads ASIC DB data (serialized SAI objects) and performs the appropriate calls to the switch SAI.
### 1.2.7 SAI (Redis and Switch)
A unified API which represents the switch state as a set of objects. In SONiC it exists in two implementations: the ASIC DB front end (SAI Redis) and the ASIC SDK wrapper (switch SAI).
# 2 Requirements
## 2.1 Watermark counters requirements
The following watermarks should be supported:

| Watermark | SAI attribute mapping |
|------------------------------------------|------------------------|
| Ingress headroom per PG | SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES |
| Ingress shared pool occupancy per PG | SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES |
| Egress shared pool occupancy per queue (including both unicast queues and multicast queues) | SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES |

System behavior:
We consider at most one regular user and at most one special user, which is streaming telemetry (gRPC).

Streaming telemetry is only interested in the periodic watermark, i.e., it queries the watermark at regular intervals. The interval is configurable. Streaming telemetry does not use the persistent watermark.
The regular user can query and reset the watermark. When the watermark is reset, it starts a new recording from the time the reset is issued.
The regular user can also query and reset the persistent watermark. When the persistent watermark is reset, it starts a new recording from the time the reset is issued.

When one regular user and the streaming telemetry coexist, they do not interfere with each other. Their behaviors stay the same as described above. So the software should be able to handle the following situations and return the correct watermark values to each user:

![](https://github.com/mykolaf/SONiC/blob/gh-pages/images/watermark_HLD/timeline2.png)


t0 - clear user watermark event

t1 - show user watermark event. Shows highest watermark value for the period t0-t1

t2 - show user watermark event. Shows highest watermark value for the period t0-t2

t3 - clear persistent watermark event

t4 - show persistent watermark event. Shows highest watermark value for the period t3-t4

t5 - show persistent watermark event. Shows highest watermark value for the period t3-t5

t6 - clear persistent watermark event

t7 - clear user watermark event

t8 - show user watermark event. Shows highest watermark value for the period t7-t8

t9 - show persistent watermark event. Shows highest watermark value for the period t6-t9
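The independence of the two user-visible watermarks on this timeline can be sketched as follows. This is only an illustration: the `WatermarkTracker` class and the sample values are hypothetical and not part of the design.

```python
class WatermarkTracker:
    """Tracks the user and persistent watermarks independently (illustrative only)."""

    def __init__(self):
        self.user = 0
        self.persistent = 0

    def record(self, sample):
        # A new sample can only raise a watermark, never lower it.
        self.user = max(self.user, sample)
        self.persistent = max(self.persistent, sample)

    def clear_user(self):
        # Clearing the user watermark leaves the persistent one untouched.
        self.user = 0

    def clear_persistent(self):
        # Clearing the persistent watermark leaves the user one untouched.
        self.persistent = 0


tracker = WatermarkTracker()
tracker.clear_user()                    # t0: clear user watermark
tracker.record(40)                      # made-up buffer occupancy samples
tracker.record(25)
user_at_t1 = tracker.user               # t1: highest value seen since t0
tracker.clear_persistent()              # t3: clear persistent watermark
tracker.record(10)
persistent_at_t4 = tracker.persistent   # t4: highest value seen since t3
user_at_t4 = tracker.user               # unaffected by the clear at t3
```

Each clear resets only its own watermark, so the value shown for one kind never depends on when the other kind was last cleared.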

# 3 Modules Design
## 3.1 Modules that need to be updated

### 3.1.1 Counter DB

#### The following new Queue counters should be available for each queue entry in the DB:
- "COUNTERS:queue_vid"
  - SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES
#### For every Ingress PG the following should be available in the DB:
- "COUNTERS:pg_vid"
  - SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES
  - SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES
#### Additionally a few mappings should be added:
- "COUNTERS_PG_PORT_MAP" - map PG oid to port oid
- "COUNTERS_PG_NAME_MAP" - map PG oid to PG name
- "COUNTERS_PG_INDEX_MAP" - map PG oid to PG index

The watermark counters are provided via Flex Counter, with a polling period of 1s. Flex Counter clears the value in HW each time it reads it.

#### New tables will be introduced:

| Table | Updated by | Cleared by | Used by | Purpose |
| ------------- | ------------- | --- | --- | --- |
| COUNTERS | Flex counter | No need to clear; Flex Counter clears the value in HW every 1s (by default) and overwrites the DB | Lua plugins (Flex counter plugins) | Contains the counters updated by Flex counters |
| PERIODIC_WATERMARKS | Flex counter Lua plugins | Cleared every telemetry period (watermark orch handles the timer) | CLI (show queue\|priority-group watermark); accessible to telemetry via virtual path | Contains the telemetry watermarks |
| PERSISTENT_WATERMARKS | Flex counter Lua plugins | Cleared by the user with the clear CLI (clear queue\|priority-group persistent-watermark) | CLI (show queue\|priority-group persistent-watermark); accessible to telemetry via virtual path | Contains the highest watermark since switch boot or the last clear of the persistent watermark |
| USER_WATERMARKS | Flex counter Lua plugins | Cleared on user request (clear queue\|priority-group watermark) | CLI (show queue\|priority-group watermark) | Contains the highest watermark since the last user clear |


The structure of the three watermark tables is the same as that of the COUNTERS table, but their hashes contain only watermark counters.

For example:

- "PERIODIC_WATERMARKS:queue_vid"
  - "SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES"
- "PERIODIC_WATERMARKS:pg_vid"
  - "SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES"
  - "SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES"
- "PERSISTENT_WATERMARKS:queue_vid"
  - "SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES"
- "PERSISTENT_WATERMARKS:pg_vid"
  - "SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES"
  - "SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES"
- "USER_WATERMARKS:queue_vid"
  - "SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES"
- "USER_WATERMARKS:pg_vid"
  - "SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES"
  - "SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES"
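The key and field layout listed above can be modeled as nested dictionaries. The sketch below is illustrative only: the helper name, the example VID and the initial value of 0 are assumptions, not part of the design.

```python
WATERMARK_TABLES = ("PERIODIC_WATERMARKS", "PERSISTENT_WATERMARKS", "USER_WATERMARKS")

QUEUE_FIELDS = ("SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES",)
PG_FIELDS = (
    "SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES",
    "SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES",
)


def empty_watermark_hashes(vid, fields):
    """Build one hash per watermark table for the object with the given VID.

    Keys follow the "<TABLE>:<vid>" pattern; starting every field at 0 is an assumption.
    """
    return {
        f"{table}:{vid}": {field: 0 for field in fields}
        for table in WATERMARK_TABLES
    }


# "oid:0x1a000000000001" is a made-up PG VID used only for illustration.
pg_hashes = empty_watermark_hashes("oid:0x1a000000000001", PG_FIELDS)
```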


### 3.1.2 CLI

The CLI flow does not involve any logic; the CLI only reads the data from the related table in the DB (see the table above).
It does not compare watermark values.

#### 3.1.2.1 CLI show

A new script and alias should be implemented to display watermark values:

$ show priority-group [watermark|persistent-watermark] headroom

```
Ingress headroom per PG:
Interface PG0 PG1 PG2 PG3 PG4 PG5 PG6 PG7
Ethernet0 0 0 0 23 0 0 0 0
Ethernet128 0 0 0 0 0 0 0 0
```
$ show priority-group [watermark|persistent-watermark] shared

```
Ingress shared pool occupancy per PG:
Interface PG0 PG1 PG2 PG3 PG4 PG5 PG6 PG7
Ethernet0 0 1092 0 380 0 0 0 0
Ethernet128 0 0 0 0 0 0 0 0
```
$ show queue [watermark|persistent-watermark] unicast

```
Egress shared pool occupancy per unicast queue:
Interface UC0 UC1 UC2 UC3 UC4 UC5 UC6 UC7
Ethernet0 0 14 0 11 0 1 0 0
Ethernet128 0 0 0 0 0 0 0 0
```
$ show queue [watermark|persistent-watermark] multicast

```
Egress shared pool occupancy per multicast queue:
Interface MC0 MC1 MC2 MC3 MC4 MC5 MC6 MC7
Ethernet0 0 3 0 0 0 0 0 0
Ethernet128 0 0 0 0 0 0 0 0
```
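Since the CLI only reads and formats values, rendering one of the tables above could look roughly like the sketch below. The function name and the in-memory `rows` input are hypothetical; the real script would read the values from the related watermark table in the Counter DB.

```python
def format_watermark_table(title, column_prefix, rows, columns=8):
    """Render per-port watermark values as a right-aligned text table.

    rows maps an interface name to a list of `columns` watermark values.
    """
    header = ["Interface"] + [f"{column_prefix}{i}" for i in range(columns)]
    table = [header] + [
        [port] + [str(value) for value in values] for port, values in rows.items()
    ]
    # Each column is as wide as its widest cell.
    widths = [max(len(row[col]) for row in table) for col in range(len(header))]
    lines = [title + ":"]
    for row in table:
        lines.append("  ".join(cell.rjust(width) for cell, width in zip(row, widths)))
    return "\n".join(lines)


rows = {
    "Ethernet0": [0, 0, 0, 23, 0, 0, 0, 0],
    "Ethernet128": [0, 0, 0, 0, 0, 0, 0, 0],
}
output = format_watermark_table("Ingress headroom per PG", "PG", rows)
print(output)
```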

#### 3.1.2.2 CLI clear

In addition, clear functionality will be added:

```
# clear priority-group [watermark|persistent-watermark] headroom
# clear priority-group [watermark|persistent-watermark] shared
# clear queue [watermark|persistent-watermark] unicast
# clear queue [watermark|persistent-watermark] multicast
```

The user can clear the persistent watermark and the "user" watermark. The user cannot clear the periodic (telemetry) watermark. The clear command requires sudo, because the watermark is shared by all users and a clear affects every user (for example, when several people are connected through ssh).

#### 3.1.2.3 Show/configure telemetry interval

The telemetry interval can be viewed and configured with the following CLI:

```
$ show watermark telemetry interval
# config watermark telemetry interval <value>
```

Note: a newly configured interval takes effect only when the current telemetry interval ends.

### 3.1.3 Lua plugins

To keep track of the highest watermark, Lua plugins for queues and priority groups will be implemented.
They will read the new watermark value from the COUNTERS table, compare it with the stored values, and overwrite the values in the PERIODIC_WATERMARKS, PERSISTENT_WATERMARKS and USER_WATERMARKS tables when the new value is higher.

The plugin logic as pseudo code:

```
lua:
PERIODIC_WATERMARKS[object_vid][watermark_name] = max(COUNTERS[object_vid][watermark_name], PERIODIC_WATERMARKS[object_vid][watermark_name])
PERSISTENT_WATERMARKS[object_vid][watermark_name] = max(COUNTERS[object_vid][watermark_name], PERSISTENT_WATERMARKS[object_vid][watermark_name])
USER_WATERMARKS[object_vid][watermark_name] = max(COUNTERS[object_vid][watermark_name], USER_WATERMARKS[object_vid][watermark_name])
```
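The same max-and-overwrite logic, written as a runnable Python sketch against an in-memory dictionary standing in for the Counter DB. The real plugins are Lua scripts executed by the flex counter; the function name, the example VID and the default of 0 for a missing field are assumptions made for illustration.

```python
WATERMARK_TABLES = ("PERIODIC_WATERMARKS", "PERSISTENT_WATERMARKS", "USER_WATERMARKS")


def update_watermarks(db, object_vid, watermark_name):
    """Raise the stored watermarks to the latest COUNTERS value where it is higher."""
    new_value = int(db[f"COUNTERS:{object_vid}"][watermark_name])
    for table in WATERMARK_TABLES:
        entry = db.setdefault(f"{table}:{object_vid}", {})
        # A missing field is treated as 0 (assumption).
        entry[watermark_name] = max(new_value, int(entry.get(watermark_name, 0)))


field = "SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES"
db = {
    "COUNTERS:queue_vid": {field: "14"},
    "USER_WATERMARKS:queue_vid": {field: "3"},         # lower: will be raised to 14
    "PERSISTENT_WATERMARKS:queue_vid": {field: "20"},  # higher: stays at 20
}
update_watermarks(db, "queue_vid", field)
```

After the call, the periodic and user watermarks hold 14 while the persistent watermark keeps its earlier, higher value of 20.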

### 3.1.4 SWSS

Portorch should be updated to:
- implement new flex counter groups for queue and PG watermarks. These groups are configured with the read-and-clear stats mode, meaning the value is cleared from HW every time it is read.
- implement PG to port map generation

A new watermark orch should be implemented with the following functionality:
- Handle watermark configuration, for example configuring TELEMETRY_INTERVAL.
- Listen to the CLEAR_WATERMARK notification channel and handle clear watermark requests for USER_WATERMARKS and PERSISTENT_WATERMARKS for every type: PG_HEADROOM, PG_SHARED, QUEUE_UNICAST, QUEUE_MULTICAST.
  A clear request only means clearing the data from the related table.
- Create and manage a timer which clears the telemetry watermark every TELEMETRY_INTERVAL.
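Clear handling in the watermark orch can be sketched as below, again with a dictionary standing in for the Counter DB. The function name, the type-to-field mapping and the convention that the caller supplies the VIDs matching the requested type (for example, only unicast queues for QUEUE_UNICAST) are assumptions; the actual orch is part of SWSS.

```python
FIELD_BY_TYPE = {
    "PG_HEADROOM": "SAI_INGRESS_PRIORITY_GROUP_STAT_XOFF_ROOM_WATERMARK_BYTES",
    "PG_SHARED": "SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES",
    "QUEUE_UNICAST": "SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES",
    "QUEUE_MULTICAST": "SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES",
}

CLEARABLE_TABLES = ("USER_WATERMARKS", "PERSISTENT_WATERMARKS")


def clear_watermarks(db, table, wm_type, object_vids):
    """Zero one watermark field for the given objects in one user-clearable table."""
    if table not in CLEARABLE_TABLES:
        raise ValueError(f"table {table} cannot be cleared on user request")
    field = FIELD_BY_TYPE[wm_type]
    for vid in object_vids:
        entry = db.get(f"{table}:{vid}")
        if entry is not None and field in entry:
            entry[field] = 0


headroom = FIELD_BY_TYPE["PG_HEADROOM"]
shared = FIELD_BY_TYPE["PG_SHARED"]
store = {"USER_WATERMARKS:pg_vid": {headroom: 23, shared: 380}}
clear_watermarks(store, "USER_WATERMARKS", "PG_HEADROOM", ["pg_vid"])
```

Only the headroom field is zeroed; the shared watermark of the same PG is left as it was.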

### 3.1.5 SAI Redis

Flex counter should be extended to support new PG counters.

### 3.1.6 CONFIG DB

Add a new table, WATERMARK_TABLE, with fields such as TELEMETRY_PERIOD.

### 3.1.7 Syncd

FlexCounter should be extended to:

- collect PG stats
- generate maps (PG to port, PG to index, PG to name)
- support a new attribute, STATS_MODE
- use get_*_stats_ext() calls for counter collection to support the read-and-clear stats mode

To support the stats mode, the flex counter group schema will be extended:
1) "POLL_INTERVAL"
2) "1000"
3) "STATS_MODE"
4) "STATS_MODE_READ_AND_CLEAR"
5) "FLEX_COUNTER_STATUS"
6) "disable"
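
The numbered listing above is a flat sequence of alternating field names and values. A small sketch of turning such a sequence into a mapping follows; the helper name is hypothetical.

```python
def pairs_to_dict(flat):
    """Convert [field1, value1, field2, value2, ...] into {field1: value1, ...}."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of items (field/value pairs)")
    return dict(zip(flat[0::2], flat[1::2]))


group = pairs_to_dict([
    "POLL_INTERVAL", "1000",
    "STATS_MODE", "STATS_MODE_READ_AND_CLEAR",
    "FLEX_COUNTER_STATUS", "disable",
])
```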


### 3.1.8 SAI

The SAI APIs and calls are:

- sai_queue_api
  - sai_get_queue_stats_ext()
- sai_buffer_api
  - sai_get_ingress_priority_group_stats_ext()

### 3.1.9 gRPC

Sonic-telemetry will have access to the data in the PERIODIC_WATERMARKS and PERSISTENT_WATERMARKS tables. For this, the virtual DB should be extended to access these tables, and the virtual path should support mapping ports to queues and priority groups. The exact syntax of the virtual paths is TBD.

Examples of virtual paths:

| DB target | Virtual path | Description |
|---- |:----:| ----|
| COUNTERS_DB | "WATERMARKS/Ethernet*/Queues/PERIODIC_WATERMARKS" | Queue watermarks on all Ethernet ports |
| COUNTERS_DB | "WATERMARKS/Ethernet`<port number>`/Queues/PERIODIC_WATERMARKS" | Queue watermarks on one Ethernet port |
| COUNTERS_DB | "WATERMARKS/Ethernet*/PriorityGroups/PERIODIC_WATERMARKS" | PG watermarks on all Ethernet ports |
| COUNTERS_DB | "WATERMARKS/Ethernet`<port number>`/PriorityGroups/PERIODIC_WATERMARKS" | PG watermarks on one Ethernet port |
| COUNTERS_DB | "WATERMARKS/Ethernet*/Queues/PERSISTENT_WATERMARKS" | Queue highest watermarks on all Ethernet ports |
| COUNTERS_DB | "WATERMARKS/Ethernet`<port number>`/Queues/PERSISTENT_WATERMARKS" | Queue highest watermarks on one Ethernet port |
| COUNTERS_DB | "WATERMARKS/Ethernet*/PriorityGroups/PERSISTENT_WATERMARKS" | PG highest watermarks on all Ethernet ports |
| COUNTERS_DB | "WATERMARKS/Ethernet`<port number>`/PriorityGroups/PERSISTENT_WATERMARKS" | PG highest watermarks on one Ethernet port |
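
Because the virtual path syntax is still TBD, the following is only a sketch of how paths shaped like the examples above might be expanded into Counter DB keys. The function name, the `port_to_vids` input structure and the example VIDs are all hypothetical; the real mapping would come from the port-to-queue and port-to-PG maps.

```python
from fnmatch import fnmatchcase


def resolve_virtual_path(path, port_to_vids):
    """Expand "WATERMARKS/<port pattern>/<Queues|PriorityGroups>/<TABLE>" into DB keys.

    port_to_vids maps "Queues" / "PriorityGroups" to {port name: [object VIDs]}.
    Returns {port name: ["<TABLE>:<vid>", ...]} for every port matching the pattern.
    """
    root, port_pattern, object_kind, table = path.split("/")
    if root != "WATERMARKS":
        raise ValueError(f"unsupported virtual path root: {root}")
    vids_by_port = port_to_vids[object_kind]
    return {
        port: [f"{table}:{vid}" for vid in vids]
        for port, vids in vids_by_port.items()
        if fnmatchcase(port, port_pattern)
    }


# Made-up VIDs for two ports, used only for illustration.
maps = {
    "Queues": {"Ethernet0": ["q0", "q1"], "Ethernet4": ["q8"]},
    "PriorityGroups": {"Ethernet0": ["pg0"], "Ethernet4": ["pg8"]},
}
all_ports = resolve_virtual_path("WATERMARKS/Ethernet*/Queues/PERIODIC_WATERMARKS", maps)
one_port = resolve_virtual_path("WATERMARKS/Ethernet4/PriorityGroups/PERSISTENT_WATERMARKS", maps)
```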

### 4 Flows

#### 4.1 Watermark general flow

![](https://github.com/mykolaf/SONiC/blob/gh-pages/images/watermark_HLD/WM_general.png)

The core components are the flex counter, the watermark orch, the DB and the CLI.

The flex counter reads and clears the watermarks with a period of 1s by default. The values are written directly to the COUNTERS table. The flex counter also has plugins configured for queues and PGs, which are triggered on every flex counter group interval. The Lua plugin updates PERIODIC_WATERMARKS, PERSISTENT_WATERMARKS and USER_WATERMARKS if the new value exceeds the value that was read from the table.

The watermark orch has 2 main functions:
- Handle the timer that clears the PERIODIC_WATERMARKS table, and handle the configuration of the timer interval.
- Handle clear notifications. On a clear event the orch should just zero out the corresponding watermarks in the table. They will soon be repopulated by the Lua plugin.

The DB contains all the tables with watermarks, and the configuration table.

The CLI reads the watermarks from the tables, formats them and prints them.

#### 4.2 Resetting the telemetry period flow

![](https://github.com/mykolaf/SONiC/blob/gh-pages/images/watermark_HLD/WM_period.PNG)

The watermark orch handles notifications on changes in WATERMARK_TABLE in config DB. The new interval will be assigned to the timer during the timer handling, so the orch will reset the interval only when the current timer expires.
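
The deferred interval change can be sketched as a timer that stores a newly configured interval and applies it only when the current period expires. The class below is a hypothetical illustration driven by explicit one-second ticks, not the orch's actual timer.

```python
class TelemetryTimer:
    """Periodic timer whose interval changes take effect only on expiry (illustrative)."""

    def __init__(self, interval):
        self.interval = interval  # interval in effect for the current period
        self.pending = None       # newly configured interval, not yet applied
        self.elapsed = 0

    def configure(self, new_interval):
        # Remember the new value; the running period is not shortened or extended.
        self.pending = new_interval

    def tick(self, seconds=1):
        """Advance time; return True when the period expires (the periodic table would be cleared)."""
        self.elapsed += seconds
        if self.elapsed < self.interval:
            return False
        self.elapsed = 0
        if self.pending is not None:
            self.interval, self.pending = self.pending, None
        return True


timer = TelemetryTimer(interval=3)
timer.configure(5)                          # arrives in the middle of the 3s period
fired = [timer.tick() for _ in range(8)]    # expires at second 3, then 5 seconds later
```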

#### 4.3 Cli flow

![](https://github.com/mykolaf/SONiC/blob/gh-pages/images/watermark_HLD/WM_cli.png)

### 5 Open questions

#### Does the addition of watermark counters to the flex counter affect the performance of the PFC WD counters?