



Unicode Support in Ruby on Rails with MySQL and Microsoft SQL Server

With the right configuration, Ruby on Rails applications can accept and display Unicode text through a Unicode-enabled web browser, and can read and write Unicode in a Unicode-enabled database. However, string manipulation with Unicode characters still has shortcomings. This article shows how to make Ruby more Unicode-aware and how to communicate with a database in Unicode.


Index

  1. Ruby Code
  2. Connecting to a MySQL Database
  3. Connecting to a SQL Server Database
  4. Further Resources

 

Ruby Code

Ruby source code can be encoded in ASCII, UTF-8, EUC, or SJIS. The encoding is selected with the command-line options -kN, -kU, -kE, and -kS respectively, or through the RUBYOPT environment variable. Unfortunately, as of Ruby on Rails 1.1.6, models, controllers, views, and migrations cannot be encoded in UTF-8, even with "-kU" added to RUBYOPT. All of them have to be encoded in ASCII. Any Unicode character has to be written as the octal codes of its UTF-8 bytes, each preceded by a backslash (e.g. \303\230 for Ø). A table of UTF-8 byte codes is handy for looking these up.
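Rather than looking the codes up by hand, you can compute them. The following minimal sketch uses a hypothetical helper, octal_escape, which rewrites every non-ASCII byte of a UTF-8 string as a three-digit octal escape. It builds its sample characters from code points with pack('U') so that the file itself stays ASCII. It assumes the input contains no quotes or backslashes, which the helper does not escape.

```ruby
# Hypothetical helper: produce an ASCII-only string literal body from a
# UTF-8 string by replacing each byte >= 128 with a three-digit octal
# escape. ASCII bytes pass through unchanged (quotes and backslashes are
# NOT escaped; this sketch assumes text without them).
def octal_escape(str)
  str.unpack('C*').map { |byte|
    byte < 128 ? byte.chr : format('\\%03o', byte)
  }.join
end

o_slash = [0xD8].pack('U')   # U+00D8 "Ø", built from its code point
puts octal_escape(o_slash)   # prints \303\230
puts octal_escape("J" + [0xF8].pack('U') + "rgen")   # prints J\303\270rgen
```

Pasting the printed escapes into a double-quoted Ruby string yields the same bytes as the original character.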

The standard Ruby classes do not support Unicode for string manipulation. The String class treats a Unicode string as an array of bytes and considers each byte one character. As a result, methods like length return the wrong value for multi-byte characters. For instance, "Ø" occupies two bytes in UTF-8, so its length is reported as 2. To remedy this, you can prefix your code with the following lines:

$KCODE = 'u'
require 'jcode'

In a Rails application, this code belongs in config/environment.rb.
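Here is a sketch of how those lines might sit in config/environment.rb. The version guard is an addition, not part of the original setup. In Ruby 1.9, $KCODE stopped having any effect and jcode was removed, because strings there are encoding-aware. The guard keeps the file loadable on newer interpreters.

```ruby
# config/environment.rb (excerpt)
# Guard added for portability: only Ruby 1.8 ships jcode and honors $KCODE.
if RUBY_VERSION < '1.9'
  $KCODE = 'u'     # interpret strings as UTF-8
  require 'jcode'  # UTF-8-aware chop, delete, squeeze, succ, tr and friends
end
```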

This code makes the following String methods UTF-8-aware: chop!, chop, delete!, delete, squeeze!, squeeze, succ!, succ, tr!, tr, tr_s!, and tr_s. It also adds the methods jlength and jcount, which should be used instead of length and count. Be aware that jcode does not cover reverse, size, index, [], downcase, capitalize, strip, rstrip, lstrip, or slice.
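For the methods jcode leaves byte-oriented, one common workaround is a three-step round trip. First, decode the string into code points with unpack('U*'). Then work on that array. Finally, encode the result again with pack('U*'). The helpers below (utf8_length, utf8_reverse, utf8_slice) are hypothetical names for a minimal sketch of that technique; they are not part of jcode. unpack and pack with the 'U' directive behave this way on Ruby 1.8 and later.

```ruby
# Character-aware stand-ins for byte-oriented String methods.
# unpack('U*') decodes UTF-8 into an array of code points and
# pack('U*') encodes them back, so each element is one character.

# Same count jlength gives; String#length on Ruby 1.8 counts bytes.
def utf8_length(str)
  str.unpack('U*').length
end

def utf8_reverse(str)
  str.unpack('U*').reverse.pack('U*')
end

# len characters starting at character index start; "" if start is past the end.
def utf8_slice(str, start, len)
  (str.unpack('U*')[start, len] || []).pack('U*')
end

name = "Ren" + [0xE9].pack('U')  # "René", built from U+00E9 so this file stays ASCII
puts name.unpack('C*').length     # prints 5 (bytes)
puts utf8_length(name)            # prints 4 (characters)
puts utf8_reverse(name)           # prints éneR
puts utf8_slice(name, 2, 2)       # prints né
```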

 

Connecting to a MySQL Database

First, make sure that your tables are encoded in Unicode. If you plan to run full-text searches, make sure the storage engine is MyISAM and the character set is UTF-8. Full-text search does not work with other storage engines or with MySQL's ucs2 Unicode character set. The SQL statement to create a table would then look something like this:

DROP TABLE IF EXISTS `people`;
CREATE TABLE `people` (
  `id` smallint(5) unsigned NOT NULL auto_increment,
  `first_name` varchar(100) collate utf8_spanish_ci NOT NULL default '',
  `last_name` varchar(100) collate utf8_spanish_ci NOT NULL default '',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_spanish_ci;

To make these the defaults for all newly created tables, add the following lines to the [mysqld] section of MySQL's configuration file (my.cnf or my.ini):

default-character-set=utf8
default-storage-engine=MyISAM

Configuring Rails to open a Unicode connection is straightforward: set encoding to utf8 for each environment in config/database.yml. It would look something like this:

development:
  adapter: mysql
  database: database_name
  username: user
  password: password
  host: localhost
  port: 3306
  encoding: utf8
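A single environment without encoding: utf8 is easy to miss, so a small script can check database.yml. This minimal sketch assumes Ruby 2.3 or later, for YAML.safe_load from the standard yaml library and the squiggly heredoc. environments_missing_utf8 is a hypothetical helper that flags mysql environments lacking the setting.

```ruby
require 'yaml'

# Hypothetical helper: names of environments in a database.yml that use
# the mysql adapter but do not set encoding: utf8.
def environments_missing_utf8(yaml_text)
  config = YAML.safe_load(yaml_text)
  config.select { |_env, settings|
    settings['adapter'] == 'mysql' && settings['encoding'] != 'utf8'
  }.keys
end

sample = <<~YML
  development:
    adapter: mysql
    database: database_name
    encoding: utf8
  test:
    adapter: mysql
    database: database_test
YML

p environments_missing_utf8(sample)  # prints ["test"]
```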

 

Connecting to a SQL Server Database

Make sure you use the data types that support Unicode: nchar, nvarchar, and ntext (instead of char, varchar, and text). For backward compatibility, connections to SQL Server databases do not use Unicode by default. Setting encoding to utf8 in database.yml has no effect. Instead, to force Rails' connection to a SQL Server database to use Unicode, add the following lines to the ApplicationController class in app/controllers/application.rb:

before_filter :set_charset

# Serve pages as UTF-8 and make WIN32OLE convert strings using the UTF-8 code page
def set_charset
  response.headers["Content-Type"] = "text/html; charset=utf-8"
  WIN32OLE.codepage = WIN32OLE::CP_UTF8
end
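When converting an existing schema, you can apply the type advice at the start of this section mechanically. The sketch below uses a hypothetical helper, unicode_column_type. It maps char, varchar, and text to nchar, nvarchar, and ntext while keeping any length suffix, and passes every other type through unchanged.

```ruby
# Hypothetical helper: map a SQL Server column type to its Unicode
# counterpart, keeping a length suffix such as "(100)". Types it does not
# recognize, or cannot parse, are returned unchanged.
UNICODE_TYPES = { 'char' => 'nchar', 'varchar' => 'nvarchar', 'text' => 'ntext' }.freeze

def unicode_column_type(type)
  match = type.strip.match(/\A(\w+)(\(.*\))?\z/)
  return type unless match
  base, length = match.captures
  "#{UNICODE_TYPES.fetch(base.downcase, base)}#{length}"
end

p %w[varchar(100) char(10) text int].map { |t| unicode_column_type(t) }
# prints ["nvarchar(100)", "nchar(10)", "ntext", "int"]
```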

 

Further Resources



Posted on November 15th | 4 comments | Filed Under: Ruby on Rails