Merge branch 'ethtool-add-ability-to-control-transceiver-modules-power-mode'

Ido Schimmel says:

====================
ethtool: Add ability to control transceiver modules' power mode

This patchset extends the ethtool netlink API to allow user space to
control transceiver modules. Two specific APIs are added, but the plan
is to extend the interface with more APIs in the future (see "Future
plans").

This submission is a complete rework of a previous submission [1] that
tried to achieve the same goal by allowing user space to write to the
EEPROMs of these modules. It was rejected as it could have enabled user
space binary blob drivers.

However, the main issue is that by directly writing to some pages of
these EEPROMs, we are interfering with the entity that is controlling
the modules (kernel / device firmware). In addition, some functionality
cannot be implemented solely by writing to the EEPROM, as it requires
the assertion / de-assertion of hardware signals (e.g., "ResetL" pin in
SFF-8636).

Motivation
==========

The kernel can currently dump the contents of module EEPROMs to user
space via the ethtool legacy ioctl API or the new netlink API. These
dumps can then be parsed by ethtool(8) according to the specification
that defines the memory map of the EEPROM: for example, SFF-8636 [2] for
QSFP and CMIS [3] for QSFP-DD.

In addition to read-only elements, these specifications also define
writeable elements that can be used to control the behavior of the
module. For example, they control whether the module is put in low or
high power mode in order to limit its power consumption.

The CMIS specification even defines a message exchange mechanism (CDB,
Command Data Block) on top of the module's memory map. This allows the
host to send various commands to the module, for example, to update its
firmware.

Implementation
==============

The ethtool netlink API is extended with two new messages,
'ETHTOOL_MSG_MODULE_SET' and 'ETHTOOL_MSG_MODULE_GET', that allow user
space to set and get transceiver module parameters. Specifically, the
'ETHTOOL_A_MODULE_POWER_MODE_POLICY' attribute allows user space to
control the power mode policy of the module in order to limit its power
consumption. See detailed description in patch #1.

The user API is designed to be generic enough so that it could be used
for modules with different memory maps (e.g., SFF-8636, CMIS).

The only implementation of the device driver API in this series is for a
MAC driver (mlxsw) where the module is controlled by the device's
firmware, but it is designed to be generic enough so that it could also
be used by implementations where the module is controlled by the kernel.

Testing and introspection
=========================

See detailed description in patches #1 and #5.

Patchset overview
=================

Patch #1 adds the initial infrastructure in ethtool along with the
ability to control transceiver modules' power mode.

Patches #2-#3 add required device registers in mlxsw.

Patch #4 implements in mlxsw the ethtool operations added in patch #1.

Patch #5 adds extended link states in order to allow user space to
troubleshoot link down issues related to transceiver modules.

Patch #6 adds support for these extended states in mlxsw.

Future plans
============

* Extend 'ETHTOOL_MSG_MODULE_SET' to control Tx output among other
attributes.

* Add new ethtool message(s) to update firmware on transceiver modules.

* Extend ethtool(8) to parse more diagnostic information from CMIS
modules. No kernel changes required.

[1] https://lore.kernel.org/netdev/[email protected]/
[2] https://members.snia.org/document/dl/26418
[3] http://www.qsfp-dd.com/wp-content/uploads/2021/05/CMIS5p0.pdf

Previous versions:
[4] https://lore.kernel.org/netdev/[email protected]/
[5] https://lore.kernel.org/netdev/[email protected]/
[6] https://lore.kernel.org/netdev/[email protected]/
[7] https://lore.kernel.org/netdev/[email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
kuba-moo committed Oct 7, 2021
2 parents 9cbfc51 + 235dbbe commit 4c82708
Showing 13 changed files with 697 additions and 6 deletions.
81 changes: 79 additions & 2 deletions Documentation/networking/ethtool-netlink.rst
@@ -41,6 +41,11 @@ In the message structure descriptions below, if an attribute name is suffixed
with "+", parent nest can contain multiple attributes of the same type. This
implements an array of entries.

Attributes that need to be filled-in by device drivers and that are dumped to
user space based on whether they are valid or not should not use zero as a
valid value. This avoids the need to explicitly signal the validity of the
attribute in the device driver API.
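
The convention above can be illustrated with a small user-space sketch. It is not the kernel implementation: every name below (`sketch_params`, `sketch_driver_get`, `sketch_count_attrs`) is hypothetical and exists only for illustration. The assumption is that the core zeroes the parameter block before calling the driver, so a field that is still zero afterwards means "not filled in" and is not dumped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical enum: valid values start at 1 so that 0 can mean
 * "not filled in by the driver".
 */
enum sketch_power_mode {
	SKETCH_POWER_MODE_LOW = 1,
	SKETCH_POWER_MODE_HIGH,
};

/* Hypothetical parameter block, zeroed by the core before the call. */
struct sketch_params {
	uint8_t policy;
	uint8_t mode;
};

/* Hypothetical driver callback: the mode is filled in only when a
 * module is plugged in; otherwise the field keeps its zero value.
 */
static void sketch_driver_get(struct sketch_params *params, bool plugged_in)
{
	assert(params);
	params->policy = 1;
	if (plugged_in)
		params->mode = SKETCH_POWER_MODE_HIGH;
}

/* Core side: count the attributes that would be dumped to user space.
 * No separate "valid" flag is needed, a zero field is simply skipped.
 */
static int sketch_count_attrs(bool plugged_in)
{
	struct sketch_params params;
	int count = 0;

	memset(&params, 0, sizeof(params));
	sketch_driver_get(&params, plugged_in);
	if (params.policy)
		count++;
	if (params.mode)
		count++;
	return count;
}
```

In this sketch, a call with `plugged_in == false` leaves `mode` at zero, so only the policy would be dumped.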


Request header
==============
@@ -179,7 +184,7 @@ according to message purpose:

Userspace to kernel:

-===================================== ================================
+===================================== =================================
``ETHTOOL_MSG_STRSET_GET`` get string set
``ETHTOOL_MSG_LINKINFO_GET`` get link settings
``ETHTOOL_MSG_LINKINFO_SET`` set link settings
@@ -213,7 +218,9 @@ Userspace to kernel:
``ETHTOOL_MSG_MODULE_EEPROM_GET`` read SFP module EEPROM
``ETHTOOL_MSG_STATS_GET`` get standard statistics
``ETHTOOL_MSG_PHC_VCLOCKS_GET`` get PHC virtual clocks info
-===================================== ================================
+``ETHTOOL_MSG_MODULE_SET`` set transceiver module parameters
+``ETHTOOL_MSG_MODULE_GET`` get transceiver module parameters
+===================================== =================================

Kernel to userspace:

@@ -252,6 +259,7 @@ Kernel to userspace:
``ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY`` read SFP module EEPROM
``ETHTOOL_MSG_STATS_GET_REPLY`` standard statistics
``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY`` PHC virtual clocks info
``ETHTOOL_MSG_MODULE_GET_REPLY`` transceiver module parameters
======================================== =================================

``GET`` requests are sent by userspace applications to retrieve device
@@ -520,6 +528,8 @@ Link extended states:
power required from cable or module

``ETHTOOL_LINK_EXT_STATE_OVERHEAT`` The module is overheated

``ETHTOOL_LINK_EXT_STATE_MODULE`` Transceiver module issue
================================================ ============================================

Link extended substates:
@@ -613,6 +623,14 @@ Link extended substates:
``ETHTOOL_LINK_EXT_SUBSTATE_CI_CABLE_TEST_FAILURE`` Cable test failure
=================================================== ============================================

Transceiver module issue substates:

=================================================== ============================================
``ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY`` The CMIS Module State Machine did not reach
                                                    the ModuleReady state. For example, if the
                                                    module is stuck at ModuleFault state
=================================================== ============================================

DEBUG_GET
=========

@@ -1521,6 +1539,63 @@ Kernel response contents:
``ETHTOOL_A_PHC_VCLOCKS_INDEX`` s32 PHC index array
==================================== ====== ==========================

MODULE_GET
==========

Gets transceiver module parameters.

Request contents:

===================================== ====== ==========================
``ETHTOOL_A_MODULE_HEADER``           nested request header
===================================== ====== ==========================

Kernel response contents:

====================================== ====== ==========================
``ETHTOOL_A_MODULE_HEADER``            nested reply header
``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` u8     power mode policy
``ETHTOOL_A_MODULE_POWER_MODE``        u8     operational power mode
====================================== ====== ==========================

The optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute encodes the
transceiver module power mode policy enforced by the host. The default policy
is driver-dependent, but "auto" is the recommended default and it should be
implemented by new drivers and drivers where conformance to a legacy behavior
is not critical.

The optional ``ETHTOOL_A_MODULE_POWER_MODE`` attribute encodes the operational
power mode of the transceiver module. It is only reported when a module
is plugged-in. Possible values are:

.. kernel-doc:: include/uapi/linux/ethtool.h
    :identifiers: ethtool_module_power_mode

MODULE_SET
==========

Sets transceiver module parameters.

Request contents:

====================================== ====== ==========================
``ETHTOOL_A_MODULE_HEADER``            nested request header
``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` u8     power mode policy
====================================== ====== ==========================

When set, the optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute is used
to set the transceiver module power policy enforced by the host. Possible
values are:

.. kernel-doc:: include/uapi/linux/ethtool.h
    :identifiers: ethtool_module_power_mode_policy

For SFF-8636 modules, low power mode is forced by the host according to table
6-10 in revision 2.10a of the specification.

For CMIS modules, low power mode is forced by the host according to table 6-12
in revision 5.0 of the specification.
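
How the two policies relate to the forced low power mode can be sketched as a pure function. This is a simplified model that follows the mlxsw code in this series (under "auto", low power is forced while no port using the module is administratively up), not a definition taken from the specifications. The names `sketch_policy` and `sketch_low_power_forced` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two policies described above. */
enum sketch_policy {
	SKETCH_POLICY_HIGH = 1,
	SKETCH_POLICY_AUTO,
};

/* Returns true when the host would force the module into low power mode.
 * num_ports_up is the number of ports using the module that are
 * administratively up.
 */
static bool sketch_low_power_forced(enum sketch_policy policy,
				    int num_ports_up)
{
	assert(num_ports_up >= 0);

	/* "high": the host never forces low power mode. */
	if (policy == SKETCH_POLICY_HIGH)
		return false;

	/* "auto": low power mode is forced until the first port goes up. */
	return num_ports_up == 0;
}
```

Under this model, changing the policy while at least one port is up has no immediate effect on the module, which matches the early exit in `mlxsw_env_set_module_power_mode()`.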

Request translation
===================

@@ -1620,4 +1695,6 @@ are netlink only.
n/a ``ETHTOOL_MSG_CABLE_TEST_TDR_ACT``
n/a ``ETHTOOL_MSG_TUNNEL_INFO_GET``
n/a ``ETHTOOL_MSG_PHC_VCLOCKS_GET``
n/a ``ETHTOOL_MSG_MODULE_GET``
n/a ``ETHTOOL_MSG_MODULE_SET``
=================================== =====================================
193 changes: 190 additions & 3 deletions drivers/net/ethernet/mellanox/mlxsw/core_env.c
@@ -17,6 +17,7 @@ struct mlxsw_env_module_info {
	bool is_overheat;
	int num_ports_mapped;
	int num_ports_up;
	enum ethtool_module_power_mode_policy power_mode_policy;
};

struct mlxsw_env {
@@ -445,6 +446,152 @@ int mlxsw_env_reset_module(struct net_device *netdev,
}
EXPORT_SYMBOL(mlxsw_env_reset_module);

int
mlxsw_env_get_module_power_mode(struct mlxsw_core *mlxsw_core, u8 module,
				struct ethtool_module_power_mode_params *params,
				struct netlink_ext_ack *extack)
{
	struct mlxsw_env *mlxsw_env = mlxsw_core_env(mlxsw_core);
	char mcion_pl[MLXSW_REG_MCION_LEN];
	u32 status_bits;
	int err;

	if (WARN_ON_ONCE(module >= mlxsw_env->module_count))
		return -EINVAL;

	mutex_lock(&mlxsw_env->module_info_lock);

	params->policy = mlxsw_env->module_info[module].power_mode_policy;

	mlxsw_reg_mcion_pack(mcion_pl, module);
	err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mcion), mcion_pl);
	if (err) {
		NL_SET_ERR_MSG_MOD(extack, "Failed to retrieve module's power mode");
		goto out;
	}

	status_bits = mlxsw_reg_mcion_module_status_bits_get(mcion_pl);
	if (!(status_bits & MLXSW_REG_MCION_MODULE_STATUS_BITS_PRESENT_MASK))
		goto out;

	if (status_bits & MLXSW_REG_MCION_MODULE_STATUS_BITS_LOW_POWER_MASK)
		params->mode = ETHTOOL_MODULE_POWER_MODE_LOW;
	else
		params->mode = ETHTOOL_MODULE_POWER_MODE_HIGH;

out:
	mutex_unlock(&mlxsw_env->module_info_lock);
	return err;
}
EXPORT_SYMBOL(mlxsw_env_get_module_power_mode);

static int mlxsw_env_module_enable_set(struct mlxsw_core *mlxsw_core,
				       u8 module, bool enable)
{
	enum mlxsw_reg_pmaos_admin_status admin_status;
	char pmaos_pl[MLXSW_REG_PMAOS_LEN];

	mlxsw_reg_pmaos_pack(pmaos_pl, module);
	admin_status = enable ? MLXSW_REG_PMAOS_ADMIN_STATUS_ENABLED :
				MLXSW_REG_PMAOS_ADMIN_STATUS_DISABLED;
	mlxsw_reg_pmaos_admin_status_set(pmaos_pl, admin_status);
	mlxsw_reg_pmaos_ase_set(pmaos_pl, true);

	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(pmaos), pmaos_pl);
}

static int mlxsw_env_module_low_power_set(struct mlxsw_core *mlxsw_core,
					  u8 module, bool low_power)
{
	u16 eeprom_override_mask, eeprom_override;
	char pmmp_pl[MLXSW_REG_PMMP_LEN];

	mlxsw_reg_pmmp_pack(pmmp_pl, module);
	mlxsw_reg_pmmp_sticky_set(pmmp_pl, true);
	/* Mask all the bits except low power mode. */
	eeprom_override_mask = ~MLXSW_REG_PMMP_EEPROM_OVERRIDE_LOW_POWER_MASK;
	mlxsw_reg_pmmp_eeprom_override_mask_set(pmmp_pl, eeprom_override_mask);
	eeprom_override = low_power ? MLXSW_REG_PMMP_EEPROM_OVERRIDE_LOW_POWER_MASK :
				      0;
	mlxsw_reg_pmmp_eeprom_override_set(pmmp_pl, eeprom_override);

	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(pmmp), pmmp_pl);
}

static int __mlxsw_env_set_module_power_mode(struct mlxsw_core *mlxsw_core,
					     u8 module, bool low_power,
					     struct netlink_ext_ack *extack)
{
	int err;

	err = mlxsw_env_module_enable_set(mlxsw_core, module, false);
	if (err) {
		NL_SET_ERR_MSG_MOD(extack, "Failed to disable module");
		return err;
	}

	err = mlxsw_env_module_low_power_set(mlxsw_core, module, low_power);
	if (err) {
		NL_SET_ERR_MSG_MOD(extack, "Failed to set module's power mode");
		goto err_module_low_power_set;
	}

	err = mlxsw_env_module_enable_set(mlxsw_core, module, true);
	if (err) {
		NL_SET_ERR_MSG_MOD(extack, "Failed to enable module");
		goto err_module_enable_set;
	}

	return 0;

err_module_enable_set:
	mlxsw_env_module_low_power_set(mlxsw_core, module, !low_power);
err_module_low_power_set:
	mlxsw_env_module_enable_set(mlxsw_core, module, true);
	return err;
}

int
mlxsw_env_set_module_power_mode(struct mlxsw_core *mlxsw_core, u8 module,
				enum ethtool_module_power_mode_policy policy,
				struct netlink_ext_ack *extack)
{
	struct mlxsw_env *mlxsw_env = mlxsw_core_env(mlxsw_core);
	bool low_power;
	int err = 0;

	if (WARN_ON_ONCE(module >= mlxsw_env->module_count))
		return -EINVAL;

	if (policy != ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH &&
	    policy != ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO) {
		NL_SET_ERR_MSG_MOD(extack, "Unsupported power mode policy");
		return -EOPNOTSUPP;
	}

	mutex_lock(&mlxsw_env->module_info_lock);

	if (mlxsw_env->module_info[module].power_mode_policy == policy)
		goto out;

	/* If any ports are up, we are already in high power mode. */
	if (mlxsw_env->module_info[module].num_ports_up)
		goto out_set_policy;

	low_power = policy == ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO;
	err = __mlxsw_env_set_module_power_mode(mlxsw_core, module, low_power,
						extack);
	if (err)
		goto out;

out_set_policy:
	mlxsw_env->module_info[module].power_mode_policy = policy;
out:
	mutex_unlock(&mlxsw_env->module_info_lock);
	return err;
}
EXPORT_SYMBOL(mlxsw_env_set_module_power_mode);

static int mlxsw_env_module_has_temp_sensor(struct mlxsw_core *mlxsw_core,
					    u8 module,
					    bool *p_has_temp_sensor)
@@ -794,15 +941,33 @@ EXPORT_SYMBOL(mlxsw_env_module_port_unmap);
int mlxsw_env_module_port_up(struct mlxsw_core *mlxsw_core, u8 module)
{
	struct mlxsw_env *mlxsw_env = mlxsw_core_env(mlxsw_core);
	int err = 0;

	if (WARN_ON_ONCE(module >= mlxsw_env->module_count))
		return -EINVAL;

	mutex_lock(&mlxsw_env->module_info_lock);

	if (mlxsw_env->module_info[module].power_mode_policy !=
	    ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO)
		goto out_inc;

	if (mlxsw_env->module_info[module].num_ports_up != 0)
		goto out_inc;

	/* Transition to high power mode following first port using the module
	 * being put administratively up.
	 */
	err = __mlxsw_env_set_module_power_mode(mlxsw_core, module, false,
						NULL);
	if (err)
		goto out_unlock;

out_inc:
	mlxsw_env->module_info[module].num_ports_up++;
out_unlock:
	mutex_unlock(&mlxsw_env->module_info_lock);

-	return 0;
+	return err;
}
EXPORT_SYMBOL(mlxsw_env_module_port_up);

@@ -814,7 +979,22 @@ void mlxsw_env_module_port_down(struct mlxsw_core *mlxsw_core, u8 module)
		return;

	mutex_lock(&mlxsw_env->module_info_lock);

	mlxsw_env->module_info[module].num_ports_up--;

	if (mlxsw_env->module_info[module].power_mode_policy !=
	    ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO)
		goto out_unlock;

	if (mlxsw_env->module_info[module].num_ports_up != 0)
		goto out_unlock;

	/* Transition to low power mode following last port using the module
	 * being put administratively down.
	 */
	__mlxsw_env_set_module_power_mode(mlxsw_core, module, true, NULL);

out_unlock:
	mutex_unlock(&mlxsw_env->module_info_lock);
}
EXPORT_SYMBOL(mlxsw_env_module_port_down);
@@ -824,7 +1004,7 @@ int mlxsw_env_init(struct mlxsw_core *mlxsw_core, struct mlxsw_env **p_env)
	char mgpir_pl[MLXSW_REG_MGPIR_LEN];
	struct mlxsw_env *env;
	u8 module_count;
-	int err;
+	int i, err;

	mlxsw_reg_mgpir_pack(mgpir_pl);
	err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mgpir), mgpir_pl);
@@ -837,6 +1017,13 @@ int mlxsw_env_init(struct mlxsw_core *mlxsw_core, struct mlxsw_env **p_env)
	if (!env)
		return -ENOMEM;

	/* Firmware defaults to high power mode policy where modules are
	 * transitioned to high power mode following plug-in.
	 */
	for (i = 0; i < module_count; i++)
		env->module_info[i].power_mode_policy =
			ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH;

	mutex_init(&env->module_info_lock);
	env->core = mlxsw_core;
	env->module_count = module_count;
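
The disable, set, enable sequence in `__mlxsw_env_set_module_power_mode()` above, including its rollback labels, can be modelled in user space as a rough sketch. Nothing here touches hardware or firmware: `struct sketch_module`, the `fail_at` fault-injection field and all `sketch_*` helpers are hypothetical and exist only to show that a failed step leaves the module enabled and in its previous power mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the module state held by firmware. */
struct sketch_module {
	bool enabled;
	bool low_power;
	int fail_at;	/* 1-based index of the write that fails, 0 for none */
	int writes;	/* number of writes attempted so far */
};

/* Result of one simulated transition. */
struct sketch_result {
	int err;
	bool enabled;
	bool low_power;
};

/* Simulated register write with fault injection. */
static int sketch_write(struct sketch_module *m)
{
	return ++m->writes == m->fail_at ? -1 : 0;
}

static int sketch_enable_set(struct sketch_module *m, bool enable)
{
	if (sketch_write(m))
		return -1;
	m->enabled = enable;
	return 0;
}

static int sketch_low_power_set(struct sketch_module *m, bool low_power)
{
	if (sketch_write(m))
		return -1;
	m->low_power = low_power;
	return 0;
}

/* Same control flow as the driver function: disable the module, change
 * the power mode, re-enable, and unwind in reverse order on failure.
 */
static int sketch_set_power_mode(struct sketch_module *m, bool low_power)
{
	int err;

	assert(m);

	err = sketch_enable_set(m, false);
	if (err)
		return err;

	err = sketch_low_power_set(m, low_power);
	if (err)
		goto err_low_power_set;

	err = sketch_enable_set(m, true);
	if (err)
		goto err_enable_set;

	return 0;

err_enable_set:
	sketch_low_power_set(m, !low_power);
err_low_power_set:
	sketch_enable_set(m, true);
	return err;
}

/* Runs one transition on a module that starts enabled in high power mode. */
static struct sketch_result sketch_run(bool low_power, int fail_at)
{
	struct sketch_module m = { .enabled = true, .fail_at = fail_at };
	struct sketch_result res;

	res.err = sketch_set_power_mode(&m, low_power);
	res.enabled = m.enabled;
	res.low_power = m.low_power;
	return res;
}
```

For instance, `sketch_run(true, 2)` injects a failure into the power mode write, and the module ends up re-enabled and still in high power mode. The rollback writes are best effort in this model, just as their return values are ignored in the driver.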
10 changes: 10 additions & 0 deletions drivers/net/ethernet/mellanox/mlxsw/core_env.h
@@ -28,6 +28,16 @@ int mlxsw_env_reset_module(struct net_device *netdev,
			   struct mlxsw_core *mlxsw_core, u8 module,
			   u32 *flags);

int
mlxsw_env_get_module_power_mode(struct mlxsw_core *mlxsw_core, u8 module,
				struct ethtool_module_power_mode_params *params,
				struct netlink_ext_ack *extack);

int
mlxsw_env_set_module_power_mode(struct mlxsw_core *mlxsw_core, u8 module,
				enum ethtool_module_power_mode_policy policy,
				struct netlink_ext_ack *extack);

int
mlxsw_env_module_overheat_counter_get(struct mlxsw_core *mlxsw_core, u8 module,
				      u64 *p_counter);