Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RegexField should support unicode when use regex string #5667

Closed
zhnpeng opened this issue Dec 14, 2017 · 7 comments
Closed

RegexField should support unicode when use regex string #5667

zhnpeng opened this issue Dec 14, 2017 · 7 comments

Comments

@zhnpeng
Copy link

zhnpeng commented Dec 14, 2017

Checklist

I have verified that that issue exists against the master branch of Django REST framework.
I have searched for similar issues in both open and closed tickets and cannot find a duplicate.

Steps to reproduce

  1. use RegexField in Serializer with regex=r'^\w[\w.@+-]$'
    name = RegexField(r'^\w[\w.@+-]
    $')
  2. pass name="你好" to serialize and validate

Expected behavior

"你好" is a valid string

Actual behavior

"你好" is an invalid string and raise validate failed exception

code of RegexField in the 'master' branch is as below:

class RegexField(CharField):
    default_error_messages = {
        'invalid': _('This value does not match the required pattern.')
    }

    def __init__(self, regex, **kwargs):
        super(RegexField, self).__init__(**kwargs)
        validator = RegexValidator(regex, message=self.error_messages['invalid'])
        self.validators.append(validator)

To support unicode, It should likely be:

class RegexField(CharField):
    default_error_messages = {
        'invalid': _('This value does not match the required pattern.')
    }

    def __init__(self, regex, **kwargs):
        super(RegexField, self).__init__(**kwargs)
        if isinstance(regex, six.string_types):
            regex = re.compile(regex, re.UNICODE)
        validator = RegexValidator(regex, message=self.error_messages['invalid'])
        self.validators.append(validator)
@carltongibson
Copy link
Collaborator

Hi @layjump — Yes. Prima facie this is something we should handle.

RegexValidator accepts a flags argument. I'm thinking the fix is to just use that. (?)

If you want to open a PR adding your failing test case and making the change we can review and merge.

Thanks for the report!

@zhnpeng
Copy link
Author

zhnpeng commented Dec 14, 2017

Hi @carltongibson Well If the flags are set, regex must be a regular expression string, Then a compiled regex won't be supported.

@zhnpeng zhnpeng changed the title RegexField should support unicode RegexField should support unicode when use regex string Dec 14, 2017
@carltongibson
Copy link
Collaborator

carltongibson commented Dec 14, 2017

Ah, yes, of course.

In that case I guess I'd rather check whether we should pass the flag... — it seems like otherwise we're pulling logic back out of RegexValidator that rightly belongs in there... (Ultimately doing that might be the right course of action.)

Just a thought, has this come up on Django itself? Do they have a view on this? (i.e. Should RegexValidator not, these days, be expecting to use the re.UNICODE flag?)

Other than checking on Django, a PR with your test case is the right next step here.

@carltongibson
Copy link
Collaborator

Ah, well done 🙂 #5669

@zhnpeng
Copy link
Author

zhnpeng commented Dec 14, 2017

Django's RegexField is as below:

class RegexField(CharField):
    def __init__(self, regex, max_length=None, min_length=None, error_message=None, *args, **kwargs):
        """
        regex can be either a string or a compiled regular expression object.
        error_message is an optional error message to use, if
        'Enter a valid value' is too generic for you.
        """
        kwargs.setdefault('strip', False)
        super(RegexField, self).__init__(max_length, min_length, *args, **kwargs)
        self._set_regex(regex)

    def _get_regex(self):
        return self._regex

    def _set_regex(self, regex):
        if isinstance(regex, six.string_types):
            regex = re.compile(regex, re.UNICODE)
        self._regex = regex
        if hasattr(self, '_regex_validator') and self._regex_validator in self.validators:
            self.validators.remove(self._regex_validator)
        self._regex_validator = validators.RegexValidator(regex=regex)
        self.validators.append(self._regex_validator)

    regex = property(_get_regex, _set_regex)

It will compile using re.UNICODE when set_regex。
I am using an old version of rest_framework which the bug #2617 is not been fixed yet, so I can't get rid of this problem by passing a compiled regex..
I mean it may be not necessary to add unicode support for regex string .. XD

@carltongibson
Copy link
Collaborator

carltongibson commented Dec 14, 2017

Well, RegexField is not RegexValidator...

RegexValidator hands off to _lazy_re_compile

https://github.com/django/django/blob/6fd6d8383f48ea2fe4e058725fa30529a083e9a5/django/core/validators.py#L50

It looks to me as if that should be doing something sensible with the re.UNICODE flag, just as RegexField does, but it's not:

https://github.com/django/django/blob/6fd6d8383f48ea2fe4e058725fa30529a083e9a5/django/core/validators.py#L16-L25

So, first step is to see if that requirement on RegexValidator has come up on Django. What do they say there? It may be we can work around this ourselves, with your PR, but I'd like to get that feedback first

@zhnpeng
Copy link
Author

zhnpeng commented Dec 15, 2017

Well, I think RegexValidator support this, but it was designed not support both pre-compile regex and flags the same time as the code shown. So they pre-compile regex with re.UNICODE flags and then pass it into RegexValidator in RegexField ( My PR do the same).
Yap, maybe I should see what Django say about this requirement. It may be possible to change behavior of RegexValidator by just ignore flags if regex is pre-compiled or other way.

@encode encode locked and limited conversation to collaborators Mar 3, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Projects
None yet
Development

No branches or pull requests

3 participants