This document goes over how we have set up OpenAustralia on our development machines. For a server set up, you can use this as a base however it might require a different approach unless you have complete control (i.e. you are root) on the server and can install all the dependencies, secure the machine, etc.
Configuring Apache, PHP, MySQL or any other application for optimal performance is beyond the scope of this document. You should be able to find enough information online to help you along the way.
These steps have only been tested on Mac OS X 10.5 (leopard). They might work as well on other Unix derivatives.
- Unix
- Apache + PHP + MySQL (we’ve tested with Apache 2.X.X, PHP5, MySQL 5.0.x)
- Ruby (we’ve used the included version in Leopard)
- the following rubygems
- mechanize (version 0.8.5)
- builder
- rmagick
(this has dependencies like GraphicsMagick/ImageMagick, which in turn needs
ghostscript) - rcov
- htmlentities
- rspec
- activesupport
- log4r
- git
Apache, PHP and Ruby all come with Leopard. If you need to install any of these
on Mac OS X (if for whatever reason you don’t have them installed) there’s a ton
of information online:
You should also be able to get MySQL from MySQL’s website as they now distribute binary versions for Mac OS X (at the time of writing this document, you can find the 5.0.51a MySQL Community Server at MySQL Community Server).
Install DarwinPorts and then install git, ImageMagick and ghostscript:
$ sudo port install git-core $ sudo port install ImageMagick $ sudo port install ghostscript
Note: the previous step takes a long while to complete, make yourself a coffee (or two)
As the parsing of XML files to insert into the database is done with Perl (and there’s quite a few scripts in Perl), you will need a few Perl CPAN modules:
$ sudo perl -MCPAN -e shell cpan> install Error cpan> install XML::Twig cpan> install DBD::mysql cpan> install XML::RSS
Use apt-get to install the requirements:
$ sudo apt-get install apache2 php5 php5-cli mysql-server libmysqlclient15-dev git-core imagemagick libmagick9-dev ghostscript ruby rubygems ruby1.8-dev
Install the required rubygems:
$ sudo gem install mechanize -v 0.8.5 $ sudo gem install builder $ sudo gem install rmagick $ sudo gem install rcov $ sudo gem install htmlentities $ sudo gem install rspec $ sudo gem install activesupport $ sudo gem install log4r
Note: Currently OpenAustralia requires an older version of mechanize (0.8.5), but this might change in the future.
As the parsing of XML files to insert into the database is done with Perl (and there’s quite a few scripts in Perl), you will need a few Perl CPAN modules:
$ sudo perl -MCPAN -e shell cpan> install Error cpan> install XML::Twig cpan> install DBD::mysql cpan> install XML::RSS
Apache, PHP and MySQL can all be installed together with the Xampp for Windows package. Perl can be downloaded from ActiveState in the ActivePerl package. Ruby has its own Windows versions that you need to get from Ruby Downloads (choose the one-click installer option).
The x86 version of ActivePerl comes with a GUI for installing packages that makes the whole process a lot easier. Refer to http://aspn.activestate.com/ASPN/docs/ActivePerl/5.10/faq/ActivePerl-faq2.html for instructions on running it. ActivePerl comes with a lot of packages already installed but there are a few you’ll need to install yourself, namely: XML-Twig, DBD-mysql and Error.
If DBD-mysql fails to install try installing it manually with
ppm install http://cpan.uwinnipeg.ca/PPMPackages/10xx/DBD-mysql.ppd
Reference http://dev.mysql.com/doc/refman/5.0/en/activestate-perl.html (bottom of page).
In addition to the Ruby gems required above you’ll need to install Ruby-MySQL, which can be downloaded from http://www.tmtm.org/en/ruby/mysql/.
For development purposes we have our web application and the parser under /Library/WebServer/Documents/
and, unless you want to patch the configuration too much, we recommend that you install it there (also, if you do it this way, you have the application available under the root of your webserver on your Mac).
Note for Ubuntu/Linux users: rather than /Library/WebServer/Documents/
, the web application should go under /var/www/
. The rest of this document will refer to /Library/WebServer/Documents/
however, so substitute as necessary.
$ cd /Library/WebServer/ $ sudo chown -R $USER:staff Documents
And enter your admin password. This is necessary so that you don’t have to always use sudo when editing files or coding on the website.
$ cd Documents $ git clone git://github.com/mlandauer/openaustralia.git $ cd openaustralia $ git submodule init $ git submodule update
You should now have the website files located at /Library/WebServer/Documents/openaustralia
. Also, you should now have the parser
installed under /Library/WebServer/Documents/openaustralia/openaustralia-parser
.
The only configuration necessary is to change the web-root if you have installed the web application in another location. That value is web_root
in the configuration.yml
file at the root of the openaustralia-parser
.
We now need to configure the web application, which includes creating a DB in MySQL and loading the schema. We assume that you have MySQL running and that your MySQL super user is root
and the account has a password.
Remember to edit your Apache httpd.conf file to include the httpd.conf file in \openaustralia\twfy\conf\.
We need to note again that these instructions are just for developers wanting to run the application on their machines and not recommendations or best-practices in performance and security.
We need to create the database. This is pretty simple:
$ mysqladmin -u root -p create openaustralia Enter password: ******
You are now ready to import the schema
$ mysql -u root -p openaustralia < /Library/WebServer/Documents/openaustralia/twfy/db/schema.sql Enter password: ******
There is a file that you need to edit (and remember NOT to commit your changes on that file) on the web application:
/Library/WebServer/Documents/openaustralia/twfy/conf/general
It’s well documented and quite explanatory. It contains the configuration for MySQL (database name, host, username, etc) as well as the URL and paths for the web application on your machine.
Just for initial testing you probably don’t want to install Xapian, the search engine, so if that’s the case make sure that you set
define ("XAPIANDB", '');
Before you can run the parser, you will need to create the directories that will hold the images of the MPs.
$ mkdir -p pwdata/images/mps pwdata/images/mpsL
You are now ready to create the members information. You should just use:
$ ./parse-members.rb # you should see messages on the console similar to the following Reading members data... Running consistency checks... Writing XML... Replacing existing member with new data for 5 This is for your information only, just check it looks OK. $VAR1 = [ '5', '10006', 1, '', 'Albert', 'Adermann', 'Fisher', 'National Party', '1972-12-02', '1984-12-01', 'general_election', 'elected_elsewhere' ]; [...]
To download the members images:
$ ./member-images.rb
If you want, though it is not particularly important initially, you can also download the links information (which goes on the Representative’s pages) by running:
$ ./parse-member-links.rb
You should now parse the speeches and you would have a full database.
To download the Hansard data (the speeches) for one day, say Sept 20th, 2007:
$ ./parse-speeches.rb 2007.09.20 INFO HansardParser: Parsing speeches for Thu 20 Sep 2007... WARN HansardParser: Not yet supporting: Procedural text: CROSS-BORDER INSOLVENCY BILL 2007: First Reading WARN HansardParser: Not yet supporting: Procedural text: TRADEX SCHEME AMENDMENT BILL 2007: First Reading WARN HansardParser: Not yet supporting: Procedural text: FAMILIES, COMMUNITY SERVICES AND INDIGENOUS AFFAIRS AND OTHER LEGISLATION AMENDMENT (EMERGENCY RESPONSE CONSOLIDATION) BILL 2007: First Reading WARN HansardParser: Not yet supporting: Procedural text: TAX LAWS AMENDMENT (TAXATION OF FINANCIAL ARRANGEMENTS) BILL 2007: First Reading WARN HansardParser: Not yet supporting: Procedural text: VETERANS' ENTITLEMENTS AMENDMENT (DISABILITY, WAR WIDOW AND WAR WIDOWER PENSIONS) BILL 2007: First Reading WARN HansardParser: Not yet supporting: Procedural text: COMMITTEES: Legal and Constitutional Affairs Committee: Report > 09:46:00 WARN HansardParser: Not yet supporting: Procedural text: COMMITTEES: Legal and Constitutional Affairs Committee: Report: Referral to Main Committee WARN HansardParser: Not yet supporting: Procedural text: NATIONAL HEALTH AMENDMENT (PHARMACEUTICAL BENEFITS) BILL 2007: Referred to Main Committee [...] db loading debates 2007-09-20
You should now be able to view the results at Your Webserver URL