Proposal: Intel RDT/MBA support for OCI/runc and Docker #1596

Open
xiaochenshen opened this issue Sep 18, 2017 · 11 comments

@xiaochenshen
Contributor

xiaochenshen commented Sep 18, 2017

The descriptions of Intel RDT/MBA features, use cases, and the Linux kernel interface are
heavily based on the Intel RDT documentation of the Linux kernel:

https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

Thanks to the authors of the kernel patches:
* Vikas Shivappa
* Fenghua Yu
* Tony Luck

Status: Intel RDT/MBA support for OCI and Docker software stack

Intel RDT/MBA support in OCI (merged PRs):

1. Intel RDT/MBA support in OCI/runtime-spec

opencontainers/runtime-spec#932

2. Intel RDT/MBA support in OCI/runc

#1632
#1913
#1930
#1955
#2042

3. Intel RDT/MBA Software Controller support in OCI/runtime-spec

opencontainers/runtime-spec#992

4. Intel RDT/MBA Software Controller support in OCI/runc

#1919

TODO list - Intel RDT/MBA support in Docker:

1. Intel RDT/MBA support in containerd

2. Intel RDT/MBA support in Docker Engine (moby/moby)

3. Intel RDT/MBA support in Docker CLI


What is Intel RDT/MBA:

Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature of Intel Resource Director Technology (RDT). Cache Allocation Technology (CAT) is another. For details of Intel RDT and CAT support in runc and Docker, please refer to #433.

MBA hardware details can be found in section 17.18 of the Intel Software Developer's Manual and on the Intel RDT homepage.

MBA gives software indirect and approximate throttling of memory bandwidth (b/w). A user controls the resource by specifying a percentage of the maximum memory bandwidth, or a memory bandwidth limit in MBps if the MBA Software Controller is enabled (#1919).

Linux kernel interface for Intel RDT/MBA:

Intel RDT/MBA is supported on some Intel Xeon platforms in Linux kernel 4.12 and newer with kernel config CONFIG_INTEL_RDT, and in Linux kernel 5.1 and newer with kernel config CONFIG_X86_CPU_RESCTRL.

To check whether MBA is enabled:
$ cat /proc/cpuinfo
Check whether the output contains the 'rdt_a' and 'mba' flags.
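
The flag check above can also be scripted. The following is a minimal sketch, assuming the usual "flags : ..." line layout of /proc/cpuinfo; the function names are hypothetical and not part of runc.

```python
def cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines list CPU features; other keys are ignored.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def mba_supported(cpuinfo_text):
    """True when both the 'rdt_a' and 'mba' flags are present."""
    return {"rdt_a", "mba"} <= cpu_flags(cpuinfo_text)


# Made-up sample text; on a real host, pass the contents of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu vme rdt_a cat_l3 mba\n"
print(mba_supported(sample))
```

The check only reports what the kernel advertises. The resctrl filesystem still has to be mounted before MBA can be configured.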

The Intel RDT kernel interface is documented at the link below. MBA and CAT make use of the same interface.
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

Intel RDT "resource control" filesystem hierarchy:

mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks
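
As an illustration of how a tool might consume the info/MB files in this hierarchy, here is a small sketch. It assumes each file holds a single integer, and the helper name read_mb_info is hypothetical. The demo runs against a temporary directory with made-up values rather than a real resctrl mount.

```python
import os
import tempfile

MB_INFO_FILES = ("bandwidth_gran", "delay_linear", "min_bandwidth", "num_closids")


def read_mb_info(resctrl_root="/sys/fs/resctrl"):
    """Read the integer parameters under <resctrl_root>/info/MB into a dict."""
    info_dir = os.path.join(resctrl_root, "info", "MB")
    values = {}
    for name in MB_INFO_FILES:
        with open(os.path.join(info_dir, name)) as f:
            values[name] = int(f.read().strip())
    return values


# Demo against a throwaway directory that mimics the layout (made-up values).
with tempfile.TemporaryDirectory() as fake_root:
    os.makedirs(os.path.join(fake_root, "info", "MB"))
    sample = {"bandwidth_gran": 10, "delay_linear": 1, "min_bandwidth": 10, "num_closids": 8}
    for name, value in sample.items():
        with open(os.path.join(fake_root, "info", "MB", name), "w") as f:
            f.write(f"{value}\n")
    demo_info = read_mb_info(fake_root)

print(demo_info)
```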

For MBA support in runc, we will reuse the infrastructure and code base of Intel RDT/CAT, which was implemented in #1279. We can also use the tasks and schemata files to configure memory b/w resource constraints.

The file tasks has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent.
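
A minimal sketch of the tasks-file interaction described above, with hypothetical helper names. It runs against a temporary directory so it can be tried without a resctrl mount. On a real system the group directory would live under /sys/fs/resctrl and the write would typically need root. Note one difference: a regular file is overwritten by each write, whereas on resctrl each write moves one more task into the group.

```python
import os
import tempfile


def add_task(group_dir, pid):
    """Write a task ID to the group's tasks file (on resctrl this moves the task)."""
    with open(os.path.join(group_dir, "tasks"), "w") as f:
        f.write(f"{pid}\n")


def read_tasks(group_dir):
    """Return the task IDs currently listed in the group's tasks file."""
    with open(os.path.join(group_dir, "tasks")) as f:
        return [int(line) for line in f if line.strip()]


# Demo with a throwaway directory standing in for a "<container_id>" group.
with tempfile.TemporaryDirectory() as fake_root:
    group = os.path.join(fake_root, "container_demo")
    os.mkdir(group)  # on resctrl, mkdir creates the group and its files
    add_task(group, 4242)
    demo_tasks = read_tasks(group)

print(demo_tasks)
```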

The file schemata has a list of all the resources available to this group. Each resource (L3 cache, memory b/w) has its own line and format.

Memory b/w is per L3 cache domain. The schema format:

    Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
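
To make the format concrete, here is a small sketch that converts between a schema line and a mapping of cache ID to bandwidth value. The function names are hypothetical, and the parser only covers the "MB" line shown above.

```python
def parse_mb_schema(line):
    """Parse 'MB:0=20;1=70' into {0: 20, 1: 70}."""
    resource, sep, body = line.strip().partition(":")
    if not sep or resource.strip() != "MB":
        raise ValueError(f"not an MB schema line: {line!r}")
    domains = {}
    for entry in body.split(";"):
        cache_id, sep, bandwidth = entry.partition("=")
        if not sep:
            raise ValueError(f"malformed entry: {entry!r}")
        domains[int(cache_id)] = int(bandwidth)
    return domains


def format_mb_schema(domains):
    """Format {0: 20, 1: 70} back into 'MB:0=20;1=70' (sorted by cache ID)."""
    return "MB:" + ";".join(f"{cid}={bw}" for cid, bw in sorted(domains.items()))


print(parse_mb_schema("MB:0=20;1=70"))
print(format_mb_schema({1: 70, 0: 20}))
```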

The examples for runc:

For example, consider a two-socket machine with two L3 caches, where the minimum memory b/w
is 10% and the memory b/w granularity is 10%. Tasks inside the container may use a maximum
memory b/w of 20% on socket 0 and 70% on socket 1.

"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=20;1=70"
    }
}
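
A sketch of how a percentage could be checked against min_bandwidth and bandwidth_gran before it is written. It assumes that valid control steps are min_bandwidth + N * bandwidth_gran and that intermediate values are rounded up to the next step; that rounding rule is an assumption here and should be confirmed against the kernel documentation. The function name is hypothetical.

```python
def effective_mb_percentage(value, min_bandwidth=10, bandwidth_gran=10):
    """Return the control step a requested percentage would map to.

    Assumes steps of min_bandwidth + N * bandwidth_gran, rounding up.
    """
    if not min_bandwidth <= value <= 100:
        raise ValueError(f"{value}% is outside [{min_bandwidth}, 100]")
    # Ceiling division: number of granularity steps needed to reach `value`.
    steps = -(-(value - min_bandwidth) // bandwidth_gran)
    return min(100, min_bandwidth + steps * bandwidth_gran)


print(effective_mb_percentage(20))  # 20, already on a step
print(effective_mb_percentage(25))  # 30 under the round-up assumption
```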

If MBA Software Controller is enabled through mount option "-o mba_MBps":

mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

We could then specify memory bandwidth in MBps (megabytes per second) instead of percentages. The kernel uses a software feedback mechanism, the "Software Controller", which reads the actual bandwidth using MBM counters and adjusts the memory bandwidth percentages to ensure: "actual memory bandwidth < user specified memory bandwidth".
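
The feedback idea can be illustrated with a toy simulation. This is not the kernel's algorithm: it assumes bandwidth scales linearly with the throttle percentage, and the peak bandwidth, step size, and function names are all made up for the sketch.

```python
def next_percentage(pct, measured_mbps, limit_mbps, delta_mbps, min_pct=10, gran=10):
    """One feedback step: throttle when over the limit, relax only with headroom."""
    if measured_mbps > limit_mbps:
        return max(min_pct, pct - gran)
    # Relax only if one more step is expected to stay under the limit.
    if measured_mbps + delta_mbps < limit_mbps:
        return min(100, pct + gran)
    return pct


def simulate(limit_mbps, peak_mbps=12000, start_pct=100, rounds=20, gran=10):
    """Run the loop under a linear model: bandwidth = peak * pct / 100."""
    pct = start_pct
    for _ in range(rounds):
        measured = peak_mbps * pct // 100
        delta = peak_mbps * gran // 100  # expected gain from one step up
        pct = next_percentage(pct, measured, limit_mbps, delta, gran=gran)
    return pct, peak_mbps * pct // 100


print(simulate(5000))  # (40, 4800): settles below the 5000 MBps limit
```

The headroom check before relaxing is what keeps the toy loop from oscillating around the limit.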

For example, on a two-socket machine, the schema line could be "MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0 and 7000 MBps memory bandwidth limit on socket 1.

"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=5000;1=7000"
    }
}
@cyphar
Member

cyphar commented Sep 18, 2017

First of all, this needs a PR against runtime-spec (as did RDT/CAT). Secondly, this schema:

Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."

Doesn't make sense to me in the context of JSON. Why not make it an array (or map) and then generate this in our code rather than adding this weird type information in a string?

@xiaochenshen
Contributor Author

xiaochenshen commented Sep 18, 2017

@cyphar

> First of all, this needs a PR against runtime-spec (as did RDT/CAT).

Yes, I will submit a PR in runtime-spec soon. Thank you.

> Doesn't make sense to me in the context of JSON. Why not make it an array (or map) and then generate this in our code rather than adding this weird type information in a string?

The string "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." is just the MBA schema line as it appears in the kernel's schemata file.

The number of cache domains in the MBA schema depends heavily on the Intel Xeon CPU hardware topology. For example, we could run the runc container on a 1-socket, 2-socket or 4-socket server, which means we have 1, 2 or 4 cache domains for <cache_idx>. A user who is interested in the Intel RDT feature would first need to learn some CPU topology details and then write the appropriate MBA JSON config accordingly.

I am open to either the single-string or the array/map format for the MBA JSON config, and I'd also like to hear other maintainers' and reviewers' opinions. The array/map format JSON may look like:

"linux": {
    "intelRdt": {
        "l3CacheSchema": "L3:0=7f0;1=1f",
        "memBwSchema": [
            {
                "cacheId": 0,
                "bwPercentage": 20
            },
            {
                "cacheId": 1,
                "bwPercentage": 70
            }
        ]
    }
}
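
If the array format were adopted, the runtime would have to generate the kernel schema line itself. A minimal sketch of that conversion follows, using the hypothetical field names cacheId and bwPercentage from the example above. Neither the field names nor the function are part of any merged spec.

```python
import json


def mem_bw_schema_from_entries(entries):
    """Build the kernel 'MB:...' line from [{'cacheId': ..., 'bwPercentage': ...}]."""
    seen = set()
    parts = []
    for entry in sorted(entries, key=lambda e: e["cacheId"]):
        cache_id = entry["cacheId"]
        if cache_id in seen:
            raise ValueError(f"duplicate cacheId {cache_id}")
        seen.add(cache_id)
        parts.append(f"{cache_id}={entry['bwPercentage']}")
    return "MB:" + ";".join(parts)


config = json.loads("""
{
    "linux": {
        "intelRdt": {
            "memBwSchema": [
                {"cacheId": 0, "bwPercentage": 20},
                {"cacheId": 1, "bwPercentage": 70}
            ]
        }
    }
}
""")
schema_line = mem_bw_schema_from_entries(config["linux"]["intelRdt"]["memBwSchema"])
print(schema_line)  # MB:0=20;1=70
```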

In my opinion, the single-string format for the MBA JSON config has some advantages:

  1. (+) It is more straightforward for anyone familiar with the Intel RDT kernel interface, because it is the same string as in the kernel interface file.
  2. (+) The update command support for MBA is simpler (e.g., runc update --mem-bw-schema "MB:0=10;1=80").
  3. (+) If we support this in Docker in the future, Docker will have a simpler docker run option (e.g., --mem-bw-schema) to support MBA.

And the drawbacks:

  1. (-) The JSON config is not as user-friendly as an array/map of MBA schema entries.

@cyphar
Member

cyphar commented Sep 19, 2017

@xiaochenshen There are several reasons why I don't like having an opaque string. Ultimately the runtime-spec maintainers are the ones that make a decision here, but I believe they'd agree with me:

  • Validation of the spec using a JSON schema (which we publish in releases) is not really possible for opaque strings. So there's no real way for a tool to automatically verify whether the string is correct (without writing code explicitly for it).

  • Users have to generate this string before calling down to runc (or any OCI configuration). While that might be fine for some users that are using runc (or whatever) interactively, scripts will have to generate the schema.

  • If the format is extended in the future, it's much less transparent when upgrades occur (in a JSON object you can add extra fields).
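
To illustrate the first point, this is roughly the kind of explicit code a tool would need in order to check the opaque string. It is a sketch with hypothetical names: a regular expression covers the overall shape, while a check such as duplicate cache IDs still needs hand-written logic.

```python
import re

# Shape of the string: "MB:" followed by one or more <id>=<value> pairs.
MB_SCHEMA_RE = re.compile(r"MB:\d+=\d+(;\d+=\d+)*")


def validate_mb_schema(schema):
    """Return a list of problems; an empty list means these checks passed."""
    if MB_SCHEMA_RE.fullmatch(schema) is None:
        return ["does not match MB:<id>=<value>;..."]
    ids = [int(entry.split("=")[0]) for entry in schema[3:].split(";")]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    return [f"duplicate cache id {i}" for i in duplicates]


print(validate_mb_schema("MB:0=20;1=70"))
print(validate_mb_schema("MB:0=20;0=70"))
```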

@xiaochenshen
Contributor Author

@cyphar

> There are several reasons why I don't like having an opaque string. Ultimately the runtime-spec maintainers are the ones that make a decision here, but I believe they'd agree with me:

Makes sense to me. Thank you.

xiaochenshen added a commit to xiaochenshen/runtime-spec that referenced this issue Oct 16, 2017
Add support for Intel Resource Director Technology (RDT) /
Memory Bandwidth Allocation (MBA). Add memory bandwidth resource
constraints in Linux-specific configuration.

This is the prerequisite of this runc proposal:
opencontainers/runc#1596

For more information about Intel RDT/MBA, please refer to:
opencontainers/runc#1596

Signed-off-by: Xiaochen Shen <[email protected]>
xiaochenshen added a commit to xiaochenshen/runtime-spec that referenced this issue Oct 20, 2017
Add support for Intel Resource Director Technology (RDT) /
Memory Bandwidth Allocation (MBA). Add memory bandwidth resource
constraints in Linux-specific configuration.

In this PR, the spec for memory bandwidth (memBwSchema) keeps
the same format as the existing spec for L3 cache (l3CacheSchema)
for consistency and compatibility in runtime-spec 1.x.

Example:

"linux": {
    "intelRdt": {
        "l3CacheSchema": "L3:0=7f0;1=1f",
        "memBwSchema": "MB:0=20;1=70"
    }
}

This is the prerequisite of this runc proposal:
opencontainers/runc#1596

For more information about Intel RDT/MBA, please refer to:
opencontainers/runc#1596

Signed-off-by: Xiaochen Shen <[email protected]>
@xiaochenshen
Contributor Author

@cyphar
Do you mind if I submit a runc Pull Request with the "unstructured opaque string" format for memBwSchema throughout the 1.x spec lifetime, for the following tradeoff reasons?

  1. Consistency and compatibility with the existing l3CacheSchema in runtime-spec and runc are required throughout the 1.x spec lifetime.
  2. All RDT resources (memory bandwidth and L3 cache) should have unified formats (e.g., "l3CacheSchema": "L3:0=7f0;1=1f", "memBwSchema": "MB:0=20;1=70").

Here is the background. Thank you for reviewing.

@wking and I had a discussion about the format of l3CacheSchema and memBwSchema
in opencontainers/runtime-spec#932 (comment)

> I don't think the spec is a good place to play with the config format, because now that we've cut 1.0.0 with the existing l3CacheSchema, we need to continue to support it until this spec hits v2.

> we'd need to continue to support the deprecated l3CacheSchema throughout the 1.x spec lifetime.

My plan for runtime-spec part:
opencontainers/runtime-spec#932 (comment)

  1. First, I will address "L3 cache" and "memory bandwidth" with unified formats in a single runtime-spec PR.
  2. To support the existing "l3CacheSchema" throughout the 1.x spec lifetime, and to avoid confusion over a deprecated property,
  3. If there is a requirement to change all Intel RDT resources into "structured schemata" in spec 2.0, I could open a new PR to rework this at an appropriate time during the spec 2.0 phase.

@xiaochenshen
Contributor Author

ping @cyphar
Could you comment on #1596 (comment)? Thank you.

xiaochenshen added a commit to xiaochenshen/runtime-spec that referenced this issue Sep 11, 2018
Add support for Intel Resource Director Technology (RDT) /
Memory Bandwidth Allocation (MBA). Add memory bandwidth resource
constraints in Linux-specific configuration.

In this PR, the spec for memory bandwidth (memBwSchema) keeps
the same format as the existing spec for L3 cache (l3CacheSchema)
for consistency and compatibility in runtime-spec 1.x.

Example:

"linux": {
    "intelRdt": {
        "closID": "guaranteed_group",
        "l3CacheSchema": "L3:0=7f0;1=1f",
        "memBwSchema": "MB:0=20;1=70"
    }
}

This is the prerequisite of this runc proposal:
opencontainers/runc#1596

For more information about Intel RDT/MBA, please refer to:
opencontainers/runc#1596

Signed-off-by: Xiaochen Shen <[email protected]>
@caoruidong

@xiaochenshen Any progress on containerd or docker?

@xiaochenshen
Contributor Author

@caoruidong We plan to support this in containerd and Docker, but some dependencies in runc are still a work in progress.

@caoruidong

@xiaochenshen Do you mean opencontainers/runtime-spec#1076? I see that most of the RDT feature PRs have been merged in runtime-spec.

@xiaochenshen
Contributor Author

@caoruidong This is one of the reasons. Generally, we need to make the framework and APIs stable enough in runtime-spec and runc.

@caoruidong

@xiaochenshen Thanks for the information.
