Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a dispatch to Ephemeral net class for is_BSD() #363

Closed
wants to merge 4 commits into from

Conversation

igalic
Copy link
Collaborator

@igalic igalic commented May 13, 2020

add BSD support to EphemeralIPv4Network by dispatching on is_BSD()

@igalic
Copy link
Collaborator Author

igalic commented May 13, 2020

first test: cloud-init once again doesn't recognize my DataSource

second test: "it works", here's the generated rc.conf:

clear_tmp_enable=YES
syslogd_flags=-ss
sendmail_enable=NONE
hostname=cloud-init-dev01
ifconfig_vtnet0=DHCP
ifconfig_vtnet0_ipv6='inet6 2a01:4f9:c010:6cdf::/64'
sshd_enable=YES
ntpd_enable=YES
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev=AUTO
zfs_enable=YES
cloudinit_enable=YES
ifconfig_vtnet0_name=eth0
ifconfig_eth0=DHCP

which of course then means that IPv6 no longer works, because vtnet0 has been renamed to eth0, and the freebsd renderer doesn't support IPv6 yet.
But, baby steps.

@igalic igalic force-pushed the ephemeral-net/freebsd branch from 961d110 to 24ab08c Compare May 14, 2020 07:40
@igalic igalic changed the title add a dispatch to Ephemeral class for is_FreeBSD() add a dispatch to Ephemeral class for is_BSD() May 14, 2020
@igalic igalic changed the title add a dispatch to Ephemeral class for is_BSD() add a dispatch to Ephemeral net class for is_BSD() May 14, 2020
@igalic
Copy link
Collaborator Author

igalic commented May 14, 2020

@goneri would you mind taking a look at this?
i'm not sure how you'd best test it at this stage, without DHCP support tho…

@goneri
Copy link
Contributor

goneri commented May 14, 2020

@goneri would you mind taking a look at this?
i'm not sure how you'd best test it at this stage, without DHCP support tho…

Hi!

Well, I can try to run my usual tests (ConfigDrive and OpenStack+DHCP). At least we will know if there is a regression.

@OddBloke
Copy link
Collaborator

Hey Mina, thanks for putting this together! I've only taken a cursory look at this, but I'm worried about using a pattern that ensures we will have to have many, many different functions within a same class to support multiple platforms. This seems like a clear case where separate classes for each of BSD and Linux would make sense.

One approach to implementing such classes piece-by-piece (rather than landing a single enormous refactor PR which would take ~months to write and review) would be to have EphemeralIPv4Network instantiate the appropriate class in its __init__ and store it on self. Then the entire rest of the class can (hopefully) be written to be platform-agnostic. For example, _network_delete could just call self.network_thingie.delete_address(self.interface).

Another, perhaps better long-term approach, would be to follow @rjschwei's proposal on the mailing list from back in January and start moving these methods onto the Distro classes. This would require plumbing a distro object through to EphemeralIPv4Network wherever it is used. A cursory examination suggests it is used directly by the Hetzner datasource (which has the distro object passed to it on instantiation) and EphemeralDHCPv4. It looks to me like EphemeralDHCPv4 is always invoked by a datasource (albeit via a helper object in Azure's case), so I think this is a feasible forward direction. (This would then mean that _network_delete would call something like self.distro.delete_address(self.interface).)

I think the galaxy brain iteration of this approach would actually move EphemeralIPv4Network and EphemeralDHCPv4 to be accessed via the Distro classes, so instead of instantiating the objects directly, datasources would do something like ephipv4 = self.distro.ephemeral_ipv4_network(**kwargs) or with self.distro.ephemeral_dhcpv4(...):. That feels like more than we should attempt to achieve in the first iteration of this process, though.

(As an aside, we may want to consider implementing the distro networking so it's in a separate namespace because e.g. distro.bringup_device() is ambiguous (what type of device?). We could rename it distro.network_bringup_device() or distro.bringup_network_device(), but I would propose a more structured approach: all Distro subclasses having a partner Networking class which is instantiated at Distro instantiation time at self.networking. This makes the interface for consumers: distro.networking.bringup_device() which gives us a clear indication that this is a networking command without having to mentally parse the method's name to determine what its field of operation is. I think it will also make our distro code clearer and more organised.)

@igalic
Copy link
Collaborator Author

igalic commented May 14, 2020

my understanding of a Networking subclass is that it would be mostly static methods? or?

can a method of subclass still be extended?

class Linux(Distro): # i know this is a lie, let's just pretend
  def __init(self):
    self.networking = Networking.new()

  class Networking:
    def is_wireless(self):
       pass

how do we extend is_wireless() in the SLES Distro?

@OddBloke OddBloke self-assigned this May 14, 2020
@OddBloke
Copy link
Collaborator

OddBloke commented May 14, 2020

I would envision something like this:

diff --git a/cloudinit/distros/__init__.py b/cloudinit/distros/__init__.py
index e99529dff..a71bf7ede 100755
--- a/cloudinit/distros/__init__.py
+++ b/cloudinit/distros/__init__.py
@@ -56,6 +56,21 @@ PREFERRED_NTP_CLIENTS = ['chrony', 'systemd-timesyncd', 'ntp', 'ntpdate']
 LDH_ASCII_CHARS = string.ascii_letters + string.digits + "-"
 
 
+class Networking(metaclass=abc.ABCMeta):
+
+    @abc.abstractmethod
+    def is_wireless(self):
+        pass
+
+
+class LinuxNetworking(Networking):
+
+    def is_wireless(self):
+        # use common Linux commands to check
+        is_wireless = util.subp(['some', 'helper'])
+        return is_wireless
+
+
 class Distro(metaclass=abc.ABCMeta):
 
     usr_lib_exec = "/usr/lib"
@@ -66,11 +81,13 @@ class Distro(metaclass=abc.ABCMeta):
     init_cmd = ['service']  # systemctl, service etc
     renderer_configs = {}
     _preferred_ntp_clients = None
+    networking_cls = LinuxNetworking
 
     def __init__(self, name, cfg, paths):
         self._paths = paths
         self._cfg = cfg
         self.name = name
+        self.networking = self.networking_cls()
 
     @abc.abstractmethod
     def install_packages(self, pkglist):
diff --git a/cloudinit/distros/bsd.py b/cloudinit/distros/bsd.py
index 37cf93bfb..9173db75c 100644
--- a/cloudinit/distros/bsd.py
+++ b/cloudinit/distros/bsd.py
@@ -10,7 +10,15 @@ from cloudinit import util
 LOG = logging.getLogger(__name__)
 
 
+class BSDNetworking(distros.Networking):
+
+    def is_wireless(self):
+        is_wireless = util.subp(['bsd', 'helper'])
+        return is_wireless
+
+
 class BSD(distros.Distro):
+    networking_cls = BSDNetworking
     hostname_conf_fn = '/etc/rc.conf'
     rc_conf_fn = "/etc/rc.conf"
 
diff --git a/cloudinit/distros/opensuse.py b/cloudinit/distros/opensuse.py
index 68028d20c..19883a67d 100644
--- a/cloudinit/distros/opensuse.py
+++ b/cloudinit/distros/opensuse.py
@@ -22,7 +22,15 @@ from cloudinit.settings import PER_INSTANCE
 LOG = logging.getLogger(__name__)
 
 
+class OpenSuseNetworking(distros.LinuxNetworking):
+
+    def is_wireless(self):
+        is_wireless = util.subp(['suse', 'specific', 'helper'])
+        return is_wireless
+
+
 class Distro(distros.Distro):
+    networking_cls = OpenSuseNetworking
     clock_conf_fn = '/etc/sysconfig/clock'
     hostname_conf_fn = '/etc/HOSTNAME'
     init_cmd = ['service']

So we have a parallel hierarchy of Networking sub-classes, distros specify which one they should use, and the common __init__ ensures that it's instantiated appropriately. Does that make sense?

@igalic igalic force-pushed the ephemeral-net/freebsd branch from 24ab08c to b827075 Compare May 14, 2020 16:59
@igalic
Copy link
Collaborator Author

igalic commented May 14, 2020

thank you for your guidance @OddBloke.
i think the approach we're settling on is a Networking subclass…

what i'm still unclear on is what to do with the current module functions (such as the is_wireless())
once they're moved into the subclass, we'll have to change how all of them are used…

and by all of them i mean:

  • is_physical()
cloudinit/sources/DataSourceOpenNebula.py
460:    return dict([(m, n) for m, n in devs.items() if net.is_physical(n)])

cloudinit/sources/helpers/digitalocean.py
57:    nics = [f for f in cloudnet.get_devicelist() if cloudnet.is_physical(f)]
  • is_netfail_master()
cloudinit/sources/DataSourceOracle.py
27:    is_netfail_master,
146:                    elif is_netfail_master(cur_name):
157:                    elif is_netfail_master(cur_name):
  • is_up()
cloudinit/sources/DataSourceAzure.py
479:            use_cached_ephemeral = (net.is_up(self.fallback_interface) and
1398:    if net.is_up(fallback_nic):
  • read_sys_net_int() — extremely Linux specific
cloudinit/sources/helpers/digitalocean.py
60:    return min(nics, key=lambda d: cloudnet.read_sys_net_int(d, 'ifindex'))
  • find_fallback_nic() — lots:
cloudinit/sources/DataSourceHetzner.py
52:        nic = cloudnet.find_fallback_nic()

cloudinit/sources/__init__.py
399:            self._fallback_interface = net.find_fallback_nic()

cloudinit/net/__init__.py
334:def find_fallback_nic(blacklist_drivers=None):
453:    target_name = find_fallback_nic(blacklist_drivers=blacklist_drivers)

cloudinit/net/dhcp.py
16:    EphemeralIPv4Network, find_fallback_nic, get_devicelist,
145:        nic = find_fallback_nic()

cloudinit/sources/DataSourceOracle.py
197:            with dhcp.EphemeralDHCPv4(net.find_fallback_nic()):

cloudinit/sources/DataSourceScaleway.py
213:            self._fallback_interface = net.find_fallback_nic()
239:            self._fallback_interface = net.find_fallback_nic()
  • is_disabled_cfg()
cloudinit/stages.py
661:            if net.is_disabled_cfg(ncfg):
  • get_devicelist()
cloudinit/net/netplan.py
12:from cloudinit.net import SYS_CLASS_NET, get_devicelist
254:                    for iface in get_devicelist() if

cloudinit/net/dhcp.py
16:    EphemeralIPv4Network, find_fallback_nic, get_devicelist,
149:    elif nic not in get_devicelist():
151:            'Skip dhcp_discovery: nic %s not found in get_devicelist.', nic)

cloudinit/sources/helpers/digitalocean.py
57:    nics = [f for f in cloudnet.get_devicelist() if cloudnet.is_physical(f)]

cloudinit/net/cmdline.py
18:from . import get_devicelist
61:            for k in get_devicelist():

etc, etc…

@igalic
Copy link
Collaborator Author

igalic commented May 14, 2020

and finally, if we're splitting out utility functions from EphemeralIPv4, we need a channel to communicate back how to undo what the functions did, or else have the equivalent teardown_x() for every bringup_x() the latter seems more sensible… and i don't understand why it's not done like that in EphemeralIPv4.

@OddBloke
Copy link
Collaborator

what i'm still unclear on is what to do with the current module functions (such as the is_wireless())
once they're moved into the subclass, we'll have to change how all of them are used…

Right, this is not a small change that we're talking about. All of these call sites will need to have access to the Distro object to call distro.networking.<method>(...).

Where we're calling these things from a datasource, we know we have self.distro available to us, so it's really just a case of (a) using self.distro in methods, (b) passing that into to functions, or (c) converting module-level functions to methods so they can use self.distro.

For the non-datasource cases, it's less clear-cut. But, for example, the cloudinit.net.cmdline get_devicelist() call does originate from Init._find_networking_config, which does have the Distro object available at self.distro. I would expect that we will find that the majority of these are achievable, but we should figure out up-front if we anticipate any of them being a major problem.

An important point I want to make (or perhaps restate): this is a big change we're talking about, but we don't need to land this change in one piece. We can (and absolutely should, IMO) migrate function-by-function, and make this big change as a series of incremental small changes. (We should figure out upfront what all the functions we intend to move are, so that we know when we are Done (TM), but we can land the individual migrations separately.)

is_disabled_cfg()

This one isn't distro-specific, I don't think, it's related to network configuration that is passed to cloud-init, not network configuration that cloud-init is writing to the system. This is an example of something that this proposed refactor would help with: distinguishing which networking functions are to do with cloud-init's network configuration (these functions are not distro-specific, and would presumably stay in cloudinit.net), and which are to do with applying that configuration to the system (which are distro-specific, and would move into the Networking class hierarchy).

@OddBloke
Copy link
Collaborator

OddBloke commented May 14, 2020

and finally, if we're splitting out utility functions from EphemeralIPv4, we need a channel to communicate back how to undo what the functions did

So I was thinking that the interface for EphemeralIPv4Network would be (using Hetzner as the example; not wedded to TitleCase for the name):

with self.distro.networking.EphemeralIPv4Network(
    nic, "169.254.0.1", 16, "169.254.255.255"
):
    md = hc_helper.read_metadata(
        self.metadata_address, timeout=self.timeout,
        sec_between=self.wait_retry, retries=self.retries)
    ud = hc_helper.read_userdata(
        self.userdata_address, timeout=self.timeout,
        sec_between=self.wait_retry, retries=self.retries)

So we don't need to communicate anything back to the caller of EphemeralIPv4Network: all that logic is encapsulated in the context manager.

The follow-up question is then: what does that implementation look like on the Networking class. We could decide that each Networking class defines that itself (presumably we're only really talking about one for Linux and one for BSD), or we could have a hierarchy of EphemeralIPv4Network classes too, with the super-class looking something like this:

class EphemeralIPv4Network(metaclass=abc.ABCMeta):

    def __init__(self, interface, ip, prefix_or_mask, broadcast, router=None,
                 connectivity_url=None, static_routes=None):
        # ... the current content of EphemeralIPv4Network.__init__ ...
        pass

    def __enter__(self):
        # ... the current content of EphemeralIPv4Network.__enter__ ...
        pass

    def __exit__(self, excp_type, excp_value, excp_traceback):
        # ... the current content of EphemeralIPv4Network.__exit__ ...
        pass

    @abc.abstractmethod
    def _delete_address(self, address, prefix):
        pass

    @abc.abstractmethod
    def _bringup_device(self):
        pass

    @abc.abstractmethod
    def _bringup_static_routes(self):
        pass

    @abc.abstractmethod
    def _bringup_router(self):
        pass

Then the Linux subclass would basically be the current class (with the __init__, __enter__ and __exit__ methods dropped, of course) and the BSD class would have all the __thing_on_bsd methods you've implemented in this PR (without the extra _ or "_on_bsd", of course).

or else have the equivalent teardown_x() for every bringup_x() the latter seems more sensible… and i don't understand why it's not done like that in EphemeralIPv4.

This becomes a problem if bringup_x could perform part of its operation and then fail (_bringup_router is an example of this). teardown_x would then have to be written in such a way that it can (a) detect exactly what state the system is in as regards "x", and (b) handle tearing down from any of the potential intermediate states that bringup_x could have left the system in. Contrast that with the current implementation, where the corresponding "teardown" command is pushed into self.cleanup_cmds after a particular "bringup" command succeeds: all that is required on failure is that every cleanup_cmd is run in order: no state detection required, and no conditional logic to handle partial teardown.

@raharper
Copy link
Collaborator

@dan and @igalic

Thanks for working through this topic. In general I'm happy to see us push toward a distro owned Networking subclass; this has a number of benefits including easier handling of rendering for the variants (knowing it's on Fedora vs. Centos vs SLES or OpenSuSE) and utlizing distro specific tooling and paths for acquiring the requested change or information.

Dan has outlined a solid approach to refactoring this and I'm on board with enumerating all of the caller's users so we can see the scope of the work. A couple of thoughts for consideration.

  1. The rendering classes will likely need a refactor as well; they lightly touch distro specific things but I suspect this will need a bigger change to avoid re-implementing the renderer types in each distro. The current call path is:
    cloudinit/stages.py -> distro.apply_network_config -> net.renderers(config)

and we've "lost" the distro variant info; and I think we want the renderer to take the distro class object; it can then abstractly call paths and networking sublcass functions to render out the appropriate content for the distro.

This is not required initially, but as we work through the conversion I think it will clean up a lot of the code in say, sysconfig, which is already rather split-brained for different variants.

  1. for the networking subclass, I'm excited to see if we can apply some lru_cache() to the attributes; cloud-init invokes many of the net methods which rarely change (though some will). Some careful though on this can be a source of improved boot speed

  2. The datasource has a distro object attribute; this is persisted in many datasources (/var/lib/cloud/data/obj.pkl; adding a networking subclass to this could have some pitfalls here w.r.t the types of data we keep (if we keep any, see (2)).

  3. Do we expect to want to retain any cloudinit.net. wrappers for out-of-tree users. There are some known users of the cloudinit module API. If we wanted to, I suspect that importing from cloudinit.net could on import determine the distro and instantiate the distro networking subclass and expose the API.

  4. Another path is to retain cloudinit.net as an API that is just abstract methods that are provided by an instance of distro.networking subclass; if at import time of cloudinit.net we could determine the current distro, we could plug in the subclass; avoid re-writing all callers of the cloudinit.net API from having to provide/get the distro class and calling distro.net.foo whey they already call net.foo.

@igalic igalic force-pushed the ephemeral-net/freebsd branch from b827075 to 25f27b7 Compare May 18, 2020 18:07
@igalic
Copy link
Collaborator Author

igalic commented May 18, 2020

@raharper thank you very much for weighing in.
As some of y'all might know, i'm not an actual software engineer. I hail from Ops, and all development i have done is usually Ops related (i count cloud-init as ops related)

however, this is a major transformation, and I will need engineering guidance, and as such invite you or @OddBloke or whoever else feels qualified (by the virtue of having the skill, insight into the code, or simply: the time) to help me out here by pushing commits providing the basic structure and tests transformation, and so i can "colour in the blanks".

igalic added 2 commits May 20, 2020 20:20
start adding FreeBSD support to EphemeralIPv4Network. Very crudely.
so we use util.is_BSD() as dispatcher, and name our functions _on_bsd()
while we're at it, we can also start fixing pylint and pycodestyle
errors
@igalic igalic force-pushed the ephemeral-net/freebsd branch from 25f27b7 to 5ac9eb3 Compare May 20, 2020 19:00
@OddBloke
Copy link
Collaborator

Thanks for the thoughts, Ryan!

  1. The rendering classes will likely need a refactor as well; they lightly touch distro specific things but I suspect this will need a bigger change to avoid re-implementing the renderer types in each distro. The current call path is:
    cloudinit/stages.py -> distro.apply_network_config -> net.renderers(config)

and we've "lost" the distro variant info; and I think we want the renderer to take the distro class object; it can then abstractly call paths and networking sublcass functions to render out the appropriate content for the distro.

This is not required initially, but as we work through the conversion I think it will clean up a lot of the code in say, sysconfig, which is already rather split-brained for different variants.

Yep, agreed that the path here is to pass something into the renderers that they can use to, essentially, parameterise functionality. We may want to consider passing in the Networking instance of the distro instead of the full Distro instance, but I also agree that this isn't required initially, and I think the appropriate refactor in this particular instance will be clearer once we're further down the broader refactoring path.

  1. for the networking subclass, I'm excited to see if we can apply some lru_cache() to the attributes; cloud-init invokes many of the net methods which rarely change (though some will). Some careful though on this can be a source of improved boot speed

Yes, I think this could definitely give us some performance wins. I propose that we treat this as a second-pass: focus on the correctness first, and then look for performance wins. (We don't necessarily need to wait until the whole refactor is complete to start this second-pass, to be clear; once $function is migrated over, then we can examine it for cache-ability.)

  1. The datasource has a distro object attribute; this is persisted in many datasources (/var/lib/cloud/data/obj.pkl; adding a networking subclass to this could have some pitfalls here w.r.t the types of data we keep (if we keep any, see (2)).

You're absolutely right, we'll need to be careful about (a) what we're storing in the pickled object, and (b) how that gets round-tripped.

(Live look at me remembering the problems I've had with obj.pkl in the past: image)

  1. Do we expect to want to retain any cloudinit.net. wrappers for out-of-tree users. There are some known users of the cloudinit module API.

Who are they, and do we know how badly affected they would be by an API break?

If we wanted to, I suspect that importing from cloudinit.net could on import determine the distro and instantiate the distro networking subclass and expose the API.

I'd be hesitant to do this on import, but certainly a function which instantiates and returns the subclass would make sense. This would be an API break, but provided we do it thoughtfully then it could set us up well for future API breaks too. (Though, at the same time: as I don't believe we consider this a public interface, we should be careful to not make it seem too carefully curated, as that may lead people to depend on it more than we would like.)

(An alternative API might be a classmethod on the base Networking class which determines the Distro class, instantiates it and returns distro.networking: from cloudinit.distros import Networking; networking = Networking.determine_from_environment() or similar.)

My hesitancy to do it on import stems from the fact that I think consumers would find this behaviour unexpected. Generally, you don't expect importing one module or another to be particularly expensive, and this might lead people to move imports from happening at their top-level to the specific place they want to use cloudinit.net (so that they only pay the cost if they're using it). This is generally not the way imports in Python work, which indicates that an API that pushed people to do that is the wrong one. (We would also need to be very careful about accidentally importing cloudinit.net ourselves, due to the cost.)

  1. Another path is to retain cloudinit.net as an API that is just abstract methods that are provided by an instance of distro.networking subclass; if at import time of cloudinit.net we could determine the current distro, we could plug in the subclass; avoid re-writing all callers of the cloudinit.net API from having to provide/get the distro class and calling distro.net.foo whey they already call net.foo.

Again, I have concerns about doing work at import time. That aside, I see no reason why we would change the names during the refactor (absent some specific reason for individual functions, perhaps), so if we provide an interface that returns an instantiated Networking object, then net = please_give_me_a_networking_object() could be a fairly decent replacement for from cloudinit import net; both return a namespace that contains networking-related callables with the same names and similar interfaces.

If we do want to provide a consistent API for external consumers, however, that does mean that we need to slightly revisit how we approach this refactor. If, as previously proposed, we did it by removing a function from cloudinit.net and adding it to the appropriate Networking (sub)classes, one at a time, then we wouldn't have a complete API in either location from the time the first function is moved to the time the last function is moved. That is going to be longer than this API can acceptably be broken for, because of the upstream snapshotting we do for Ubuntu.

So, instead, I propose that at the start of this effort, we determine what functions from cloudinit.net should be moved over (as we planned to do anyway) and then add concrete method implementations for every single one of them to the Networking super-class that look like this:

def is_physical(self, devname):
    return cloudinit.net.is_physical(devname)

This means that we immediately have the full, functionally-equivalent API available on the Networking (sub)classes and so can immediately direct people to migrate to it (via the distro-discovery helper described above, of course). The incremental work of the migration would then be:

  • pick a function in Networking
  • refactor its implementation from cloudinit.net into the Networking hierarchy (e.g. if it has an if/else on BSD, this is the time to put the implementations in their respective subclasses)
  • refactor all of its callers to call the Networking method instead of cloudinit.net (this is likely to be the most time-consuming step, as it may require plumbing Distro objects through to places that previously have not consumed them)
  • Finally, remove it from cloudinit.net (else we will have parallel implementations).

(If we settle on retaining some sort of module-level interface for cloudinit.net then the "remove" step won't be exactly that, but it will still remove the duplicate implementation.)

@OddBloke
Copy link
Collaborator

(FYI, I hope to put up a WIP proposal PR for that initial "start" of the refactor tomorrow, unless someone raises an objection before I get to it. So if you have a fundamental objection, please at least ping me!)

@raharper
Copy link
Collaborator

Thanks for the thoughts, Ryan!

  1. The rendering classes will likely need a refactor as well; they lightly touch distro specific things but I suspect this will need a bigger change to avoid re-implementing the renderer types in each distro. The current call path is:
    cloudinit/stages.py -> distro.apply_network_config -> net.renderers(config)

and we've "lost" the distro variant info; and I think we want the renderer to take the distro class object; it can then abstractly call paths and networking sublcass functions to render out the appropriate content for the distro.
This is not required initially, but as we work through the conversion I think it will clean up a lot of the code in say, sysconfig, which is already rather split-brained for different variants.

Yep, agreed that the path here is to pass something into the renderers that they can use to, essentially, parameterise functionality. We may want to consider passing in the Networking instance of the distro instead of the full Distro instance, but I also agree that this isn't required initially, and I think the appropriate refactor in this particular instance will be clearer once we're further down the broader refactoring path.

Agreed, that sounds like the right path.

  1. for the networking subclass, I'm excited to see if we can apply some lru_cache() to the attributes; cloud-init invokes many of the net methods which rarely change (though some will). Some careful though on this can be a source of improved boot speed

Yes, I think this could definitely give us some performance wins. I propose that we treat this as a second-pass: focus on the correctness first, and then look for performance wins. (We don't necessarily need to wait until the whole refactor is complete to start this second-pass, to be clear; once $function is migrated over, then we can examine it for cache-ability.)

Agreed.

  1. Do we expect to want to retain any cloudinit.net. wrappers for out-of-tree users. There are some known users of the cloudinit module API.

Who are they, and do we know how badly affected they would be by an API break?

Well, I don't know. For a start, we can look at cloudinit code itself to see
the callers of cloudinit.net ... I look at those as "external callers" even if
they are part of the code base. The common methods used would be a good guess
toward what outside tree code might call.

If we wanted to, I suspect that importing from cloudinit.net could on import determine the distro and instantiate the distro networking subclass and expose the API.

I'd be hesitant to do this on import, but certainly a function which instantiates and returns the subclass would make sense. This would be an API break, but provided we do it thoughtfully then it could set us up well for future API breaks too. (Though, at the same time: as I don't believe we consider this a public interface, we should be careful to not make it seem too carefully curated, as that may lead people to depend on it more than we would like.)

(An alternative API might be a classmethod on the base Networking class which determines the Distro class, instantiates it and returns distro.networking: from cloudinit.distros import Networking; networking = Networking.determine_from_environment() or similar.)

My hesitancy to do it on import stems from the fact that I think consumers would find this behaviour unexpected. Generally, you don't expect importing one module or another to be particularly expensive, and this might lead people to move imports from happening at their top-level to the specific place they want to use cloudinit.net (so that they only pay the cost if they're using it). This is generally not the way imports in Python work, which indicates that an API that pushed people to do that is the wrong one. (We would also need to be very careful about accidentally importing cloudinit.net ourselves, due to the cost.)

Right

  1. Another path is to retain cloudinit.net as an API that is just abstract methods that are provided by an instance of distro.networking subclass; if at import time of cloudinit.net we could determine the current distro, we could plug in the subclass; avoid re-writing all callers of the cloudinit.net API from having to provide/get the distro class and calling distro.net.foo whey they already call net.foo.

Again, I have concerns about doing work at import time. That aside, I see no reason why we would change the names during the refactor (absent some specific reason for individual functions, perhaps), so if we provide an interface that returns an instantiated Networking object, then net = please_give_me_a_networking_object() could be a fairly decent replacement for from cloudinit import net; both return a namespace that contains networking-related callables with the same names and similar interfaces.

If we do want to provide a consistent API for external consumers, however, that does mean that we need to slightly revisit how we approach this refactor. If, as previously proposed, we did it by removing a function from cloudinit.net and adding it to the appropriate Networking (sub)classes, one at a time, then we wouldn't have a complete API in either location from the time the first function is moved to the time the last function is moved. That is going to be longer than this API can acceptably be broken for, because of the upstream snapshotting we do for Ubuntu.

So, instead, I propose that at the start of this effort, we determine what functions from cloudinit.net should be moved over (as we planned to do anyway) and then add concrete method implementations for every single one of them to the Networking super-class that look like this:

def is_physical(self, devname):
    return cloudinit.net.is_physical(devname)

This means that we immediately have the full, functionally-equivalent API available on the Networking (sub)classes and so can immediately direct people to migrate to it (via the distro-discovery helper described above, of course). The incremental work of the migration would then be:

* pick a function in `Networking`

* refactor its implementation from `cloudinit.net` into the `Networking` hierarchy (e.g. if it has an if/else on BSD, this is the time to put the implementations in their respective subclasses)

* refactor all of its callers to call the `Networking` method instead of `cloudinit.net` (this is likely to be the most time-consuming step, as it may require plumbing `Distro` objects through to places that previously have not consumed them)

* Finally, remove it from `cloudinit.net` (else we will have parallel implementations).

(If we settle on retaining some sort of module-level interface for cloudinit.net then the "remove" step won't be exactly that, but it will still remove the duplicate implementation.)

I like this quite a lot Dan. Thanks for moving this forward!

@OddBloke
Copy link
Collaborator

OddBloke commented May 22, 2020 via email

@OddBloke
Copy link
Collaborator

@igalic any objections to closing this out now that we're going in a different direction (starting in #384)?

@igalic igalic closed this May 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants