Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing letters and incorrect ordering of authoraties #2482

Closed
petterreinholdtsen opened this issue May 25, 2015 · 12 comments
Closed

Missing letters and incorrect ordering of authoraties #2482

petterreinholdtsen opened this issue May 25, 2015 · 12 comments
Assignees
Labels
bug Breaks expected functionality x:norway

Comments

@petterreinholdtsen
Copy link
Contributor

When visisting https://alaveteli-dev.nuug.no/en/body , the Norwegian letters Æ Ø Å are missing from the list of letters:

Show only...
Beginning with
* A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

The Norwegian letters should have been listed after Z in that list.

Even worse, the ordering of authorities are wrong. For example "Åfjord Municipality" should be found after "Ytre Helgeland District Psychiatric Centre", and not sorted as an A.

Is there something wrong with our setup?

@crowbot crowbot added 1 - new bug Breaks expected functionality and removed 1 - new labels May 26, 2015
@crowbot
Copy link
Member

crowbot commented May 26, 2015

Hi @petterreinholdtsen,

The template that renders the alphabetical list currently just lists an unaccented roman alphabet - you can override that template in your theme by copying the file to lib/views/public_body/_alphabet.html.erb and editing it to display the correct letters.

The ordering of the authorities is done on the basis of a sort by name in the database, so I think the solution here may be to provide a 'collate' clause to the query as described here.

@garethrees
Copy link
Member

Debian Squeeze has Postgres 8.4. The article referenced says that collate is new in 9.1, so we're a bit blocked until we complete #2582

@garethrees
Copy link
Member

Squeeze does have postgres-9.1 in backports, but I'm not sure how easy it would be to drop in on an existing install with existing data.

@garethrees
Copy link
Member

Since Squeeze's EOL is so soon, I don't see any benefit in us dropping support of Postgres 8.4 before we stop supporting Squeeze.

We should be able to use collate under a conditional:

ActiveRecord::Base.connection.send(:postgresql_version)
# => 90112

@garethrees
Copy link
Member

collation-aware comparisons are considerably slower than the traditional "C" sort order

http://rhaas.blogspot.co.uk/2012/03/perils-of-collation-aware-comparisons.html

@garethrees
Copy link
Member

I've taken a first stab at this. I think its going to be a tricky one.

I've got a passing spec:

diff --git a/app/controllers/public_body_controller.rb b/app/controllers/public_body_controller.rb
index 376b202..18bcf01 100644
--- a/app/controllers/public_body_controller.rb
+++ b/app/controllers/public_body_controller.rb
@@ -204,7 +204,7 @@ class PublicBodyController < ApplicationController
                 where_sql += [underscore_locale] + where_parameters
                 @public_bodies = PublicBody.where(where_sql).
                                    joins(:translations).
-                                     order("public_body_translations.name").
+                                     order(%Q(public_body_translations.name COLLATE "#{ underscore_locale }")).
                                        paginate(:page => params[:page], :per_page => 100)
             end

diff --git a/spec/controllers/public_body_controller_spec.rb b/spec/controllers/public_body_controller_spec.rb
index c5c94a4..f748f7f 100644
--- a/spec/controllers/public_body_controller_spec.rb
+++ b/spec/controllers/public_body_controller_spec.rb
@@ -198,6 +198,22 @@ describe PublicBodyController, "when listing bodies" do
         assigns[:description].should == ""
     end

+    it 'list bodies in alphabetical order according to the locale' do
+        AlaveteliLocalization.set_locales('nn_NO', 'en')
+        ActiveRecord::Base.connection.execute(%Q(CREATE COLLATION "nn_NO" (LOCALE = 'nn_NO.utf8');))
+        InfoRequestEvent.delete_all
+        Comment.delete_all
+        InfoRequest.all.each(&:fully_destroy)
+        PublicBody.destroy_all
+        first = FactoryGirl.create(:public_body, :name => 'Ytre Helgeland District Psychiatric Centre')
+        last  = FactoryGirl.create(:public_body, :name => 'Åfjord Municipality')
+        middle = FactoryGirl.create(:public_body, :name => 'Åbjord Municipality')
+
+        get :list
+
+        expect(assigns[:public_bodies]).to eq([first, middle, last])
+    end
+
     it "should support simple searching of bodies by title" do
         get :list, :public_body_query => 'quango'
         assigns[:public_bodies].should == [ public_bodies(:geraldine_public_body) ]

Here are a few issues off the top of my head:

  • The spec doesn't pass unless you install the nn language pack on your system (apt-get install language-pack-nn). This would add another dependency to the list, which would only be appropriate for an Alaveteli run in Norway.
  • I don't know how to get Postgres to update its collations table. I had to do it manually (CREATE COLLATION "nn_NO" (LOCALE = 'nn_NO.utf8');)
  • By convention, the locales postgres uses are of the form xx_XX. Alaveteli (on the whole) only uses a two-character locale name (nn). We could create the collation with that name so that it matches (CREATE COLLATION "nn" (LOCALE = 'nn_NO.utf8');) but it would be nice to use what the OS uses.
  • The change to public_body_controller.rb makes lots of other specs fail because its trying to look up en as the collation, which as explained doesn't exist. It should be looking for en_GB (or whatever variation of English your system is set as).
  • We could use the built in environment variables to find the default collation (ENV['LC_ALL']), though of course this isn't quite right! Ruby: "en_GB.UTF-8" vs Postgres: en_GB.utf8)
  • I'm not sure who's responsibility it should be to do any of this. Should Alaveteli just install/handle all the locale stuff? Should the re-user need to manually (or write a script to) create the collations?

@garethrees
Copy link
Member

We could try COLLATE only if the current locale's collation is available:

@locale
# => 'en'

ActiveRecord::Base.
  connection.
    execute(%(SELECT * FROM pg_collation;)).
      map{ |row| row['collname'] }.
        include?(@locale)
# => false

@petterreinholdtsen
Copy link
Contributor Author

[Gareth Rees]

  • By convention, the locales postgres uses are of the form
    xx_XX. Alaveteli (on the whole) only uses a two-character locale
    name (nn). We could create the collation with that name so that it
    matches (CREATE COLLATION "nn" (LOCALE = 'nn_NO.utf8');) but it
    would be nice to use what the OS uses.

I suspect this comment indicate a misunderstanding of what a locale
really is, as 'nn' is not a locale name, it is a language name. Locales
are really triplets of langauge, country/area and character map, where
the character map sometimes is the language/area default value unless
specified explicitly. So nn_NO.utf8 is nynorsk (nn) language, country
Norway (NO) and charmap UTF-8, while 'nn' only specify the language to
use.

Because collation rules depend on the country, not on the language, it
do not make sense to state that nn is a locale, nor really to associate
any collation rules based on the langauge. The 'nn' string only make
sense with regard to translations, not collation. nn_NO and nn_DK would
use the same translation but different collation rules. By chance nn_NO
is the only locale present in glibc these days, but it is not
unthinkable that this might change. And I know strange locales like
en_DK with DK collation have been used in the past, so the problem might
arise with other locales.

Perhaps you already knew this, but I thought it best to state it
explicity, as I have work too much with locales to let the
"two-character locale name (nn)" statement pass unnoticed.

JFYI.

Happy hacking
Petter Reinholdtsen

@garethrees
Copy link
Member

Perhaps you already knew this, but I thought it best to state it
explicity, as I have work too much with locales to let the
"two-character locale name (nn)" statement pass unnoticed.

I loosely used the term 'locale' as I was just brain-dumping
at the end of the day. Rails/Alaveteli uses 'locale' pretty loosely
too (e.g. all the language and locale translations are in a locale
dir).I was aware of the things you point out, but probably couldn't
have explained it as eloquently, so I definitely have a better
understanding now :)

The main problem I wanted to illustrate is that most of the time,
we only have some variable containing a two-character code
(e.g, nn) of whatever it's actually trying to represent (language,
locale and/or character map) and the thing we need to look up
(the collation in postgres) is the full triplet.

I'm thinking at the moment the cleanest solution is to only use
collation if there's a locale available to postgres, leaving it to the
install maintainer to add a collation that matches the "locale"
they're using from the locale directory in Alaveteli.

@garethrees
Copy link
Member

I've created a PR which results in the desired behaviour #2619.

It needs a bit of cleanup and we need to consider how we suggest people add collation support to their install.

@garethrees
Copy link
Member

Hi @petterreinholdtsen,

https://alaveteli-dev.nuug.no/en/body now seems to be doing the correct thing.

Did you make any changes to get this to work?

@petterreinholdtsen
Copy link
Contributor Author

[Gareth Rees]

Hi @petterreinholdtsen,

https://alaveteli-dev.nuug.no/en/body now seems to be doing the
correct thing.

Did you make any changes to get this to work?

Yes. We followed the instructions from Louise Crow and overrided the
template and recreated the database with another default collation rule.

Gorm have the details. I'll try to get him to have a look and comment if
I forgot something.

Happy hacking
Petter Reinholdtsen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Breaks expected functionality x:norway
Projects
None yet
Development

No branches or pull requests

4 participants