Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-merge the CMI branch #2

Closed
quicksketch opened this issue Sep 4, 2013 · 36 comments
Closed

Re-merge the CMI branch #2

quicksketch opened this issue Sep 4, 2013 · 36 comments
Assignees
Milestone

Comments

@quicksketch
Copy link
Member

We need to re-merge the CMI branch into the repository. It looks like the initial version of CMI was after the PSR-0 change, but only for the last month of development. A mostly working (presumably) version was complete the month before, that we can probably utilize instead of the PSR-0 version.

Here's the last commit to the CMI branch prior to the PSR-0 conversion, when heyrocker pulled from 8.x before starting the conversion:

http://drupalcode.org/project/drupal.git/commit/b6d5723

So to generate the non-PSR-0 version of CMI such as we have available, this diff is the best starting point (that commit against its parent):

http://drupalcode.org/project/drupal.git/commitdiff/b6d5723?hp=a08405829ebe6e20f82c94d8e5c323f73ab1741b

@quicksketch
Copy link
Member Author

It should be noted that this version of CMI used XML, while D8 currently uses YAML. The most descriptive summary of why XML was used at the time can be found here: https://groups.drupal.org/node/167584

Ironically, JSON was eliminated as a choice at the time because when it encoded UTF-8 characters, it encoded the strings for Unicode. Since then PHP 5.4 has added options both for pretty-printing and skipping Unicode encoding, potentially making JSON a more feasible option: http://us3.php.net/json_encode

But at the same time, unless we're going to depend on PHP 5.4 (I don't think we should), then this will require some work to make a backwards compatibility layer for PHP 5.3. It seems entirely possible to emulate both PHP 5.4 json_encode() flags to make JSON human-readable. As noted several times in the comments on that g.d.o thread though, json_decode() already works fine (even in PHP 5.3) with UTF-8 characters that are not Unicoded. Likewise, pretty printing has no effect on json_decode() either, so the only compatibility layer we'd need to write is around json_encode(), decoding will work fine on all versions without any assistance.

Or of course we could just convert to using YAML like D8 did anyway, despite its performance problems.

@quicksketch
Copy link
Member Author

Hey @heyrocker! Yes! There's a button at the bottom of each issue that should say "Watch thread". If you comment, you're subscribed automatically.

@bojanz
Copy link

bojanz commented Sep 11, 2013

It makes more sense for Backdrop to implement PSR4 directly, and convert code to use it, than reinvent CMI and plugins. Effort-wise, that's insanity.

@davereid
Copy link

Yeah, I actually like the current CMI. :/

@rudiedirkx
Copy link

I think Backdrop should depend on PHP 5.4. It's been stable 18 months and Backdrop isn't releaseworthy yet. And YAML is fine, as long as it's used everywhere, not combined with .info and .xml and .json.

@swentel
Copy link

swentel commented Sep 11, 2013

I wouldn't care anymore about php 5.3 either tbh, last security fixes will be around june/july 2014, see http://php.net/archive/2013.php#id2013-07-11-1

Other than that, also agreeing with bojanz in regards to effort and time: this won't be funny to get right - again ..

@larowlan
Copy link

Be sure to update the performance charts on backdropcms.org once this goes in. The install time is influenced by the amount of default config to be imported and the current comparison is apples and oranges.

@rudiedirkx
Copy link

If we cherry-pick/merge D8's CMI, how hard is it to unify all config format to .info or .yaml?

@sun
Copy link

sun commented Sep 12, 2013

None of the initial CMI versions worked correctly. In fact, after the initial merge into core, we discovered critical problems with the implementation, and so we rewrote the fundamental CMI architecture and implementation ~3 times to make it work. If you want to incorporate CMI, I would strongly recommend to use the latest code in D8.

Luckily, the component and module are cleanly isolated, so there should be no problems at all to copy and use the latest code. After all, a potential reuse in other projects was an architectural design goal right from the start. :)

If you want to include history, I'd recommend to perform a git subtree split + join. I believe that @RobLoach actually maintains git subtree splits on github somewhere.

Lastly, the summary of the format discussion above is very incomplete. Opinions are irrelevant, only technical aspects matter. I personally invested multiple weeks of painful work to actually implement and test the various formats, so as to determine whether they are compatible with the requirements. I strongly advise to stay with YAML as the file format.

@davereid
Copy link

I would also highly suggest the recent CMI being used as well.

@davidstrauss
Copy link

Recent CMI is important, considering the engineering work we're doing at Pantheon to support it. We'd like it to work for both Drupal 8 and Backdrop.

@alexweber
Copy link

@sun Thanks for explaining that, it was pretty enlightening :) I'm curious as to why you want to stick with YAML over JSON though?

I never got the whole YAML thing in Symfony, it doesn't add that much more value for readability or anything-wise (AFAIK) that it's worth the overhead over using JSON, which is built-in right into PHP.

I'm not familiar with new CMI internals but I'm pretty sure it won't be very hard to swap out the storage back-end and supporting APIs... will it?

@sun
Copy link

sun commented Sep 14, 2013

@alexweber Please understand that I do not want to repeat all the nitty-gritty details of the configuration file format discussion that exist on drupal.org and groups.drupal.org already. Admittedly, they are cluttered across more than a dozen of different threads that have ~300+ comments each, so not exactly easy to distill.

I'm not interested in repeating that discussion, and I believe that questioning the outcome based on personal opinions and personal favorites shows nothing but disrespect against the most brilliant minds of the Drupal community, who worked insanely hard for multiple months to come to a final and working solution that meets all the technical requirements, whereas the final answer factually was objected to by personal opinions of some of these minds originally, but everyone is actively supporting it now. I've been one of the few lead developers and architects of CMI at that time.

All I can and will share here is that YAML is the only format that meets all technical requirements as well as DX requirements. If you do not trust my word on this and believe you know better, then of course you're free to redo and repeat the entire technical analysis on your own. Nothing prevents you from doing so.

I do, however, recommend to block and force-stop any attempt on re-opening the format discussion.

Lastly, to be clear, YAML is an open and portable data serialization format that aims to be human friendly. It has nothing to do with Symfony. The Symfony project just happens to have a library/component for parsing and writing YAML, which is just one of many implementations. We'd all use the native php-yaml extension, (1) if it was installed/enabled in all PHP/OS distros by default and (2) if it would implement the latest 1.2 spec of YAML (as opposed to 1.1).

@quicksketch
Copy link
Member Author

Hi guys, although you might not be interested in rehashing this again @sun, I also think JSON is a better choice.

The downsides of JSON can all be overcome:

  • Unicode illegibility: PHP 5.4 has an option in json_encode() to skip Unicode conversion. All versions of PHP can read non-encoded characters just fine with json_decode() natively.
  • Pretty printing: PHP 5.4 also has an option for this in json_encode(). We'd need to write a wrapper for PHP 5.3. All versions of PHP can decode JSON regardless of formatting.
  • Code comments: Use _comment keys for comments. This isn't ideal, but my experience with CMI has been that you're not actually writing these files by hand. If you're modifying and resaving configs in the UI, comments are not maintained anyway. Lastly, I don't think you need comments in config files at all. These are almost entirely for machines, not humans. The only time you'd look at them is when diffing.

And like @alexweber says, JSON's chief advantage is speed. Parsing JSON is really fast.

What we'd up with here is a wrapper around writing JSON (for PHP 5.3 only) but native JSON reading for all PHP versions. Fast reads and slow writes. Perfectly acceptable for this situation where reading is much more common.

In benchmarking, it looks like JSON is somewhere between 50 and 120 times faster than YAML:

YAML parsing time: 1.1127049922943
JSON parsing time: 0.016693115234375

I used this benchmarking script: https://gist.github.com/quicksketch/6564942, against 182 configuration files in a default install of Drupal 8. I didn't do an averaging, but it's clear that JSON is much, much faster. Sure you could install a PECL extension to speed up YAML, but I don't think that is going to be the common scenario.

Now, I don't think YAML is bad for all situations. I actually think we should use YAML for our .info files just like Drupal 8 has done, because info files are usually written by hand, by humans.

I know that a huge amount of bikeshedding has already occurred on the YAML v. JSON v. others debate. YAML won in D8. However, considering how slow Drupal 8 has become, I think speed is a very valid concern.

@quicksketch
Copy link
Member Author

I appended the test with memory usage:

YAML parsing time: 1.1680061817169
YAML memory usage: 1.75 MB
JSON parsing time: 0.018508911132812
JSON memory usage: 0 bytes

Although parsing time shifts per execution, memory usage was constant. As far as JSON goes, it's free. Symfony's YAML parser takes 1.75 MB of memory, regardless of how many files it parses.

@sun
Copy link

sun commented Sep 14, 2013

Like the variable system in D7 and below, the configuration system caches the parsed configuration file data (normally as serialized PHP). To fully regain the runtime performance of the former variable system, there's an issue to implement preloading of configuration objects (most likely using a page/route-specific algorithm). The old variable system preloaded and cached all available variables at once, which caused major performance issues on some sites.

You can try to go with JSON (using a different format is piece of cake with the CMI architecture), and by doing so, and when considering the intended target audience, you might even be able to skip the entire caching.

You should bump the minimum PHP version to 5.4 either way, since 5.3 entered EOL March 2013, limiting updates to security fixes. I wasn't able to find a clear statement on when 5.3 will be discontinued, but the majority of PHP core devs seem to prefer 1 year, which appears to be in line with the PHP Release Process RFC. In short, it's probably safe to expect 5.3 to be dead in March 2014 (→ 5.5 months from now).

In general, however, I'm really not sure whether there were not further problems with the JSON format than the ones listed above. All I can remember is that I did actually try it out and that I ran into problems. Therefore, I'd recommend to be very careful.

@quicksketch
Copy link
Member Author

and when considering the intended target audience, you might even be able to skip the entire caching.

Yep, I'd like to figure out the benchmark on that too. Right now with CMI, every YAML file has a cache of its file after parsing in the cache_config table. I would guess (but am not certain) that reading and parsing a JSON file from disk is roughly as fast as a SQL query and unserializing the PHP. Because the cache system only loads one entry at a time (typical key/value store), for the default cache that's potentially hundreds of queries before even more caches are in place to cache the combined values (i.e. cache_field's field_info:fields).

I'll think about PHP version requirements. For the immediate future, we'll plan on 5.3.x until we run into limitations.

Therefore, I'd recommend to be very careful.

Of course. :)

@quicksketch
Copy link
Member Author

I posted this over in #78 (YAML for .info files), but it's actually more relevant here:

I amended the tests once again just for fun, comparing raw PHP vs. the others. For anyone curious how fast YAML is versus PHP.

https://gist.github.com/quicksketch/6575436

Symfony YAML parsing time: 1.1098148822784
Spyc YAML parsing time: 0.72001695632935
JSON parsing time: 0.015068054199219
PHP include and execute time: 0.018171072006226
PHP execute-only time: 0.0055980682373047

I put the results into a table, taking the left column and dividing by the top column.

So results are basically (from slowest to fastest):

Parser Symfony YAML Spyc YAML JSON PHP include+execute PHP execute-only
Symfony YAML - 1.5x slower 73x slower 61x slower 200x slower
Spyc YAML 1.5x faster - 48x slower 40x slower 131x slower
JSON 73x faster 48x faster - 1.2x faster 2.7x slower
PHP include+execute 61x faster 40x faster 1.2x slower - 3.2x slower
PHP execute only 200x faster 131x faster 2.7x faster 3.2x faster -

It's interesting to me that JSON is just as fast or faster than including and then executing PHP. These tests are with APC enabled. Without it, the PHP include+execute times take roughly twice as long, making JSON a consistently faster option when needing to load files.

@tedbow
Copy link

tedbow commented Sep 16, 2013

Code comments: Use _comment keys for comments. This isn't ideal, but my experience with CMI has been that you're not actually writing these files by hand.

It seems like having a good/readable comment system in configuration files is not just matter of humans being able to write comments easily but potentially having code be able to create readable comments for humans to read.

The Features module now puts comments in the PHP files it creates for Drupal 7 configuration. I often find these comments useful for just reading through and figuring out what config is in a feature or in combination with "git diff". It would nice if the CMI in Backdrop or in contrib modules integrated with the CMI could write comments in a readable way into the configuration.

@rudiedirkx
Copy link

We could even remove config comments with regex before json_decode'ing. It requires a file_get_contents() anyway. Not sure about performance =)

@ghost ghost assigned quicksketch Nov 22, 2013
@quicksketch
Copy link
Member Author

I'm looking at where we are with this over this weekend. It looks like it's fairly trivial getting the current CMI branch functioning again, but it's far behind what D8 is currently using: no active/staging directories, no separation of settings.php variables and state variables, minimal UI... but it does have some basics in place. I'll start with the existing branch of code and start backporting the general concepts from D8 and see how it goes.

@quicksketch
Copy link
Member Author

I started work on this and filed an in-progress PR at backdrop/backdrop#97. It currently:

  • Lays out the basic API usage.
  • Successfully installs Backdrop with the new CMI settings being installed.
  • Uses JSON for file storage.
  • Saves and updates configuration files when changed.
  • Provides the interface for diffing.

But:

  • Tests are a mess.
  • Forms for syncing, importing, and exporting configs are totally busted.

As others suggested, the state of CMI in our branch was really needing some work. For the most part, this is a direct backport of D8's functionality (though some things like "Config context" are not included). However the conversion is partial, you'll still get a bunch of errors when using the interfaces for importing and exporting configurations.

@quicksketch
Copy link
Member Author

I've pushed up new changes to backdrop/backdrop#97 that get syncing, full import and export working. The only remaining piece of functionality is single import/export. Then we'll need to update the tests.

@victorkane
Copy link

Getting tooled up to work on this issue. Forked your fork :) , checked out branch 2/cmi and refreshed from upstream. Upon attempting to install as I would any backdrop instance, I got the following error:

Error

The website encountered an unexpected error. Please try again later.
Error messageException: The configuration directory type "active" does not exist. in config_get_config_directory() (line 31 of /home/backdrop/backdrop-quicksketch-vk/core/includes/config.inc).

The code is looking for a needed directory which I guess should be there by default:

  if (!empty($config_directories[$type])) {
    // Allow a configuration directory path to be outside of webroot.
    if (empty($config_directories[$type]['absolute'])) {
      $path = conf_path() . '/files/' . $config_directories[$type]['path'];
    }
    else {
      $path = $config_directories[$type]['path'];
    }
  }
  else {
    throw new Exception(format_string('The configuration directory type "@type" does not exist.', array('@type' => $type)));     // this is line 31
  }
  return $path;

Newbie indications? (Later on today when I start studying the code I will probably find it myself, but...)

backdrop-ci referenced this issue in backdrop/backdrop Mar 21, 2022
Cleanup from Pantheon Settings Injection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests