Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Work in progress] Implement concurrent resync #852

Merged
merged 98 commits into from
Nov 17, 2015
Merged

Conversation

fasaxc
Copy link
Member

@fasaxc fasaxc commented Oct 21, 2015

Rather than stopping the world to resync with etcd, process the resync in parallel with watching etcd. Improves scalability under high churn.

  • Introduce driver threads
  • Merge updates into consistent stream
  • Pass logging config to driver
  • Driver initialises logging
  • Pass status messages from driver to Felix
  • Handle config key updates in Felix
  • Handle Ready key update in driver
  • Special-case top-level directory deletions? If a delete nukes the ready flag, it's better to resync than to send in 1000s of deletes and spot the ready flag in the middle.
  • Rework snapshot processing to use in-sync message instead
    • do cleanup after in-sync message
    • ipset manager defers update to dirty tags until we're in-sync
    • dispatch chains defers updates until in-sync
    • profile rules manager defers programming (at least deletions) until we're in sync
    • hosts ipset
    • status reporting cleanup
  • Explicit logfile name in config
  • Handle trie corner cases: encode invalid characters?
  • Handle etcd error cases in driver:
    • Snapshot error response
    • Event error response
  • Tidy up
    • Driver code: comments, logging
    • ijson loop
    • fetcd code
    • Check syslog process name
  • Make simple etcd response handling common (reuse python-etcd objects?)
  • logrotate config for new driver log file
  • Fix existing UTs
  • UTs for driver
  • UTs for felix
  • Package ijson for Ubuntu
  • Package datrie for Ubuntu
  • Package deps for Redhat?
  • Remove private version number
  • Docs

Extras?:
Let's do these as follow-ups:

  • Optimize incremental parsing perf?
  • Use faster json library for parsing events?

@fasaxc fasaxc self-assigned this Oct 21, 2015
@fasaxc fasaxc added this to the 1.3.0 milestone Oct 21, 2015
@fasaxc fasaxc force-pushed the smc-concurrent-resync branch 2 times, most recently from 215fd2a to 3947ec7 Compare October 23, 2015 11:02
@fasaxc fasaxc force-pushed the smc-concurrent-resync branch 4 times, most recently from 2d51fed to a3a96ce Compare October 27, 2015 14:26
@fasaxc
Copy link
Member Author

fasaxc commented Oct 27, 2015

retest this please

@fasaxc
Copy link
Member Author

fasaxc commented Oct 27, 2015

@plwhite I think this is worth a first look while I'm on vacation. Some areas that I know still need work:

  • There's duplicated etcd request/response code in the driver. Might make sense to use python-etcd for the simpler requests.
  • I have to fake up a Node object to pass to the PathDispatcher, which feels a bit ugly. Maybe that's liveable-with though.
  • Feels like the socket read/write code could go into utility functions.

@fasaxc fasaxc assigned plwhite and unassigned fasaxc Oct 27, 2015
@@ -47,6 +47,7 @@ def __init__(self, config, ip_version, iptables_updater):
self.ifaces = set()
self.programmed_leaf_chains = set()
self._dirty = False
self._datamodel_in_sync = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the name of this variable confusing. It really means "is there a snapshot in progress", but I read it as "the first snapshot has completed". Worth renaming it, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your first reading was right: it means "first snapshot complete".

@plwhite
Copy link
Contributor

plwhite commented Oct 29, 2015

Mostly this just looks right, though had a poke round rather than a full review. Some comments.

  • Can we rely in messages arriving in order with SOCK_STREAM? Probably this is OK since there is only one thing on the other end, so it must be fine, but should we use something with guaranteed ordering to make the point?
  • If the driver process dies, how does Felix notice? I assume it must, but I couldn't see the code.

I need to do more, but net of this is that I'm comfortable with what I've seen.

@fasaxc
Copy link
Member Author

fasaxc commented Nov 2, 2015

  • SOCK_STREAM is guaranteed in-order. It's basically equivalent to TCP.
  • If the driver dies, Felix should get an exception on its recv().

@Lukasa
Copy link
Contributor

Lukasa commented Nov 2, 2015

@plwhite For the record, SOCK_STREAM is definitionally in-order, regardless of the underlying protocol (AF_UNIX, AF_APPLETALK, whatever), which means your desire to "use something with guaranteed ordering to make the point" is actually already done by this code! This is one of the few good decisions of the socket API: you can ask for "whatever is equivalent to TCP" on your chosen protocol and get it. Specifically, from the man page:

SOCK_STREAM provides sequenced, reliable, two-way, connection-based byte streams.

Note also:

SOCK_DGRAM supports datagrams (connectionless, unreliable messages of fixed maximum length).

Though I have no idea how AF_UNIX and SOCK_DGRAM combine to produce unreliability: feels like a fun thing to investigate some time: can you actually lose a datagram on an AF_UNIX socket?

@matthewdupre might be interested by the possibility of combining AF_UNIX and SOCK_SEQPACKET:

SOCK_SEQPACKET provides a sequenced, reliable, two-way connection-based data transmission path for datagrams of fixed maximum length; a consumer is required to read an entire packet with each input system call.

In other words: SCTP you can actually use! Clearly @matthewdupre needs to get this into the Felix code sometime.

@fasaxc fasaxc force-pushed the smc-concurrent-resync branch from 8a68443 to 1d6bc1c Compare November 2, 2015 10:38
@fasaxc
Copy link
Member Author

fasaxc commented Nov 2, 2015

@Lukasa I started with SOCK_SEQPACKET but it puts a maximum size limit on the size of a message. Although that limit is large, I'm always wary of limits that could bite at some random time in the future so I switched to SOCK_STREAM instead. (msgpack had good stream support so there wasn't any pain in our code to use a stream anyway.)

@fasaxc
Copy link
Member Author

fasaxc commented Nov 2, 2015

retest this please

@plwhite
Copy link
Contributor

plwhite commented Nov 3, 2015

I concluded that it probably was OK, but I'm happy with being told that it definitely is! So if that's not the subtle, hard-to-find and yet critical bug that's hidden in the code, where is it?

@fasaxc
Copy link
Member Author

fasaxc commented Nov 4, 2015

retest this please

@fasaxc fasaxc force-pushed the smc-concurrent-resync branch from ea6edfd to 8c90301 Compare November 4, 2015 09:08
@fasaxc
Copy link
Member Author

fasaxc commented Nov 5, 2015

retest this please

2 similar comments
@fasaxc
Copy link
Member Author

fasaxc commented Nov 5, 2015

retest this please

@matthewdupre
Copy link
Member

retest this please

@matthewdupre
Copy link
Member

I suggest you add package the RPM python debs up to the list too. While we haven't done it for the others, we can at least prevent the problem from growing.

@matthewdupre
Copy link
Member

Another required pre-merge item: remove the 1.3.0~~smc version.

@fasaxc fasaxc force-pushed the smc-concurrent-resync branch from 9c322f6 to 025379c Compare November 10, 2015 09:10
@fasaxc
Copy link
Member Author

fasaxc commented Nov 10, 2015

@matthewdupre the debs were cleanly built with py2dsc so I say we defer the build until there's something for Jenkins to build.

plwhite pushed a commit that referenced this pull request Nov 17, 2015
[Work in progress] Implement concurrent resync
@plwhite plwhite merged commit 573af2f into master Nov 17, 2015
@plwhite
Copy link
Contributor

plwhite commented Nov 17, 2015

This PR is merged, but we should still discuss the queue size issue. That is a small enough detail not to block anything.

@fasaxc fasaxc deleted the smc-concurrent-resync branch December 7, 2015 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants