Option for dealing with unusual spaces #32

Closed
GeorgeDewar opened this issue May 31, 2016 · 4 comments

Comments

@GeorgeDewar

Strip_attributes currently supports custom regex for characters or patterns that should be removed. However, a requirement that we (and perhaps others) have is to turn non-breaking spaces (U+00A0) into spaces.

I am wondering if you think that this would sit well in strip_attributes, either as an optional normalise_special_spaces type feature, or as an option to declare a custom regex replacement (i.e. in my case, strip_attributes :replace => [/\u00A0/, " "])?
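For illustration, the replacement being asked for can be sketched in plain Ruby, outside the gem. The helper name `replace_nbsp` is hypothetical and is not part of strip_attributes; the proposed `:replace` option did not exist at the time of writing.

```ruby
# Hypothetical helper for illustration only; not part of strip_attributes.
NBSP = "\u00A0" # non-breaking space

def replace_nbsp(value)
  # gsub with a String pattern replaces every literal occurrence.
  value.gsub(NBSP, " ")
end

puts replace_nbsp("Hello\u00A0world")
```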

I might be willing to contribute the feature if it is desirable.

@rmm5t
Owner

rmm5t commented May 31, 2016

strip_attributes already strips these non-breaking spaces from the beginning and end of strings, but I wanted to get some more clarity on your requirement.
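As a rough sketch of what stripping non-breaking spaces from the ends involves: Ruby's built-in `String#strip` only removes ASCII whitespace and null characters, so a Unicode-aware pattern such as `[[:space:]]` is needed. The helper below is hypothetical and is not taken from the gem's source.

```ruby
# Hypothetical helper; illustrates the idea, not the gem's implementation.
def strip_unicode_space(value)
  # [[:space:]] matches Unicode whitespace, including U+00A0.
  value.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

padded = "\u00A0 hello \u00A0"
puts padded.strip == padded      # String#strip leaves the NBSPs in place
puts strip_unicode_space(padded) # leading and trailing NBSPs removed
```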

Do you just want to convert these non-breaking spaces to regular spaces, or do you want to potentially collapse both single and multiple non-breaking spaces down to just one (regular) space?

The reason I ask is that it might be prudent to enhance the :collapse_spaces option to collapse all multibyte whitespace instead of just regular spaces.
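The difference between collapsing only regular spaces and collapsing all multibyte whitespace can be sketched as follows. This is a minimal illustration under assumptions: the constant and method names are hypothetical, and the list of zero-width characters is one plausible choice rather than the gem's confirmed set.

```ruby
# Hypothetical names; a sketch of the idea, not the gem's actual code.
# Zero-width characters are not covered by [[:blank:]], so list them explicitly.
ZERO_WIDTH = "\u200B\u200C\u200D\u2060\uFEFF"
BLANK_RUN  = /[[:blank:]#{ZERO_WIDTH}]+/

def collapse_blanks(value)
  # Each run of blanks (but not newlines) becomes one regular space.
  value.gsub(BLANK_RUN, " ")
end

text = "a\u00A0\u00A0b"
puts text.squeeze(" ") == text # squeeze(" ") only touches regular spaces
puts collapse_blanks(text)
```

Because `[[:blank:]]` excludes line breaks, newlines pass through this sketch unchanged.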

rmm5t added a commit that referenced this issue May 31, 2016
* Now collapses all multibyte whitespace (non-breaking, joiner,
  separator characters) down to just a regular space.
* This better mimics the multibyte leading and trailing whitespace
  stripping behavior

Ref #32
@rmm5t
Owner

rmm5t commented May 31, 2016

@GeorgeDewar As an experiment, I added multi-byte space collapsing support to the :collapse_spaces option. It's currently only in the master branch (specifically 14f6a35). Would this behavior suffice for your requirements?

Example Usage:

class Comment < ActiveRecord::Base
  strip_attributes collapse_spaces: true
end

To test, edit your Gemfile to point at the master branch:

gem "strip_attributes", github: "rmm5t/strip_attributes"

If this behavior still doesn't suffice, could you please elaborate on your use-case where you want to replace non-breaking spaces, but also avoid collapsing them?

@GeorgeDewar
Author

GeorgeDewar commented Jun 2, 2016

Thanks @rmm5t! I believe this solves our problem. Collapsing consecutive spaces is not necessary for us, but not harmful either.

I didn't think of the :collapse_spaces option because I mistakenly thought that it collapsed all whitespace, including newlines.

We have an application that deals with inbound emails from various sources, and some email clients do strange things with spaces - including using non-breaking spaces instead of normal spaces, or including the occasional zero-width space (Outlook Web Access does that). These cause various problems for us when trying to process the text in certain ways.
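To see which of these characters an inbound message actually contains, something like the following could be used. It is a small sketch with hypothetical names, and it checks only the two characters mentioned above (U+00A0 and U+200B).

```ruby
# Hypothetical helper for inspecting text; not part of strip_attributes.
UNUSUAL_SPACE = /[\u00A0\u200B]/ # non-breaking space, zero-width space

def unusual_space_codepoints(text)
  # Report each distinct unusual space found, as a U+XXXX label.
  text.scan(UNUSUAL_SPACE).map { |ch| format("U+%04X", ch.ord) }.uniq
end

p unusual_space_codepoints("Hi\u00A0there\u200B!")
```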

@rmm5t
Owner

rmm5t commented Jun 2, 2016

@GeorgeDewar Great. I just published v1.8.0 to rubygems with this feature.

rmm5t closed this as completed Jun 2, 2016