Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change collation in query to adapt to charset #337

Merged
merged 1 commit into from
Jan 31, 2017
Merged

change collation in query to adapt to charset #337

merged 1 commit into from
Jan 31, 2017

Conversation

danrot
Copy link
Contributor

@danrot danrot commented Jan 27, 2017

This change is related to doctrine/DoctrinePHPCRBundle#270. If a connection to the RepositorySchema with a charset of utf8mb4 is passed it is not allowed to use utf8_bin in a select statement.

I simply changed it in here to utf8mb4_bin, which will work then. However, I know that this can't be the final solution. The question is how that one should look like. We could use the connection to get the charset and use an if in here, or use the charset and simply postfix it with _bin, maybe check before if this collation really exists (is that somehow possible? What should be done in case it isn't?).

@dbu
Copy link
Member

dbu commented Jan 27, 2017

where can do you configure the encoding? would it be possible to have that place also configure a binary encoding to go with that, which by default is just $encoding.'_bin' but can be customized, then read that value in the client for that query?

or alternatively, have the client check if the driver is on utf8 and if not not use any collate in the query? from what i understand, collate is only used as a performance improvement. or is it also about being case sensitive or a vs ä?

@danrot
Copy link
Contributor Author

danrot commented Jan 27, 2017

I've been looking for the commit in which this change is, and I've found it: 5273f18

Looks like it is about case-sensitivity. So I guess it it is not possible to simply remove it in case of UTF-8, right? Seems there was a problem at some point.

The current configuration is from the DoctrineBundle, so not really related to Jackalope. If we want to introduce a new configuration we would have to find a place for it first.

I have just checked the available collations on my system, and there is a *_bin-collation for every character set, with binary being the only exception. So should we maybe take the "adding _bin" option and just check for the binary character set?

@dbu
Copy link
Member

dbu commented Jan 27, 2017

i guess that is about as save as the current code using hardcoded utf8_bin. maybe we can add a `setCaseSensitiveEncoding method on the client class, as a last resort for people that have a special case with some requirements? something like:

private $caseSensitiveEncoding;
private function getCaseSensitiveEncoding()
{
 if (!$this->caseSensitiveEncoding) {
   $this->caseSensitiveEncoding = get driver collation .'_bin';
 }

 return $this->caseSensitiveEncoding;
}

(and only call this in the mysql case.)

@danrot
Copy link
Contributor Author

danrot commented Jan 27, 2017

@dbu But that's only useful if it is at least protected, right? If we make it private nobody can change it, so there would be no reason to introduce that. Or am I overseeing something now? 🙈

@danrot danrot changed the title changed collate to utf8mb4 change collation in query to adapt to charset Jan 27, 2017
if ($this->getConnection()->getDriver() instanceof \Doctrine\DBAL\Driver\PDOMySql\Driver) {
$query = 'SELECT id FROM phpcr_nodes WHERE path COLLATE utf8_bin = ? AND workspace_name = ?';
$params = $this->conn->getParams();
$charset = isset($params['charset']) ? $params['charset'] : 'utf8';
Copy link
Contributor Author

@danrot danrot Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem with that solution... The charset option and the default charset with which a table is created can be different... How should we handle that? Couldn't find another way to get that from the SchemaManager in Doctrine for the specific table 😕

Just to be clear, I am talking about this situation in the config:

doctrine:
    dbal:
        charset: utf8mb4
        default_table_options:
            charset: utf8
            collate: utf8_unicode_ci

Copy link
Member

@dbu dbu Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh well. but in that case lets expect people to set the case sensitive encoding manually?

i think we really must avoid doing a database call in every request to know this, so it must come from the connection / driver if thats possible at all. worst case we just add this new method and have a compiler pass in the phcpr-bundle that reads the dbal configuration and does something sensible?

@lsmith77 can you help us here on where to get the right information?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm would need to dig in here too .. but I agree we should do as little work as possible at run time ..
so what we can do at install time or at config compile time the better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear: The current state doesn't involve any database call at all, or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, currently not. i was just saying that we must not end with something that does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you are also saying that the current solution is not satisfying, right? Would it be enough to add the setCaseSensitiveEncoding method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i would say so. have a setter, and a get method that does the code you added above and below, so that this code can just say if mysql and not binary, do collate with $this->getCaseSensitiveEncoding()

@dbu
Copy link
Member

dbu commented Jan 27, 2017 via email

@danrot
Copy link
Contributor Author

danrot commented Jan 30, 2017

@dbu Added these methods, is it ok now?

Copy link
Member

@dbu dbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool! added a few comments, otherwise looks good.

can you please also add an entry to the changelog to mention this fix and the method in case somebody runs into problems?

$params = $this->conn->getParams();
$charset = isset($params['charset']) ? $params['charset'] : 'utf8';

return $charset === 'binary' ? $charset : $charset . '_bin';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do $this->caseSensitiveEncoding = $charset === 'binary' ? $charset : $charset.'_bin'; instead of return, so its only done once

@@ -439,6 +444,23 @@ public function setCheckLoginOnServer($bool)
$this->checkLoginOnServer = $bool;
}

public function setCaseSensitiveEncoding($encoding)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a docblock explaining that this is to control what collate should be used when querying for nodes. and mention that its only relevant for mysql and usually the autodetection of appending _bin to the charset is good enough.

@danrot
Copy link
Contributor Author

danrot commented Jan 30, 2017

@dbu Fixed your comments.

@@ -1,6 +1,13 @@
Changelog
=========

1.2.8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we merge this change into master, it will become 1.3, as master is for 1.3 and there are other changes in it. do you urgently need a fixed release? vondrasekpavel is busy doing lots of cleanups and fixes, we would release sometimes in february.

for an urgent fix, i would have to update the 1.2 branch to have everything that is in 1.2 (it currently looks massively outdated: 1.2...1.2.8). if you need a fix urgently, i can do that and then merge your branch into 1.2 and tag 1.2.9 instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought master is the next hotfix release... Actually @alexander-schranz has to tell how fast he needs that fix, for me the next minor release is also good 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have talked to @alexander-schranz, it's not that important. Changed it to 1.3

@dbu dbu merged commit ad1284c into jackalope:master Jan 31, 2017
@dbu
Copy link
Member

dbu commented Jan 31, 2017

great, then it will go into 1.3. i guess we should release that soonish, quite a few fixes and improvements going on in the last weeks! thanks daniel!

@danrot danrot deleted the bugfix/utf8mb4 branch January 31, 2017 10:08
@danrot
Copy link
Contributor Author

danrot commented Jan 31, 2017

Cool, thank you! Now only doctrine/DoctrinePHPCRBundle#270 is missing in order for it to work in a symfony environment :-)

@dbu
Copy link
Member

dbu commented Jan 31, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants