Import CSV files in Rails 3
CSV is a ubiquitous importing and exporting format for tables of information conforming to row-column layout. Lines correspond to rows. On each line, commas separate values under each column; hence Comma-Separated Values. Values between commas define table-cell contents, strings or numbers. The format is relatively flexible, relatively easy for machines and even humans to read and write.
In terms of user stories, it answers the following requirement.
As a: database maintainer
I want to: import information from external files, particularly Comma-Separated Value files where rows appear on each line and columns within each row have comma delimiters
So that I can: select columns (and rows) from the CSV files and merge these to create other new pieces of information.
Question is, How do you import CSV files in Rails 3. This article attempts an answer, or at least part of an answer, enough perhaps to point you in the right direction. You can find the sources here at GitHub.
Prerequisites
- Rails 3
- FasterCSV gem (Ruby parlance for an installable package)
- JQuery
Why?
That is an important question. Has this sort of thing not been done before, many times? Well, yes. This is nothing new. But it does offer a fresh look at the subject, especially in the light of the upcoming new version of Rails and with preference for jQuery over Prototype for Rails’ JavaScript library. If you like, consider this a scenic voyage into Rails 3 and jQuery. The seas are fairly calm even though only at Beta.
Check your versions
Entering rails -v on the command line reports,
Rails 3.0.0.beta4
At the time of writing this version is the latest prerelease. Install it using sudo gem install rails --pre; option pre allows installation of prerelease versions, i.e. betas.
Doing similarly for RubyGems using gem -v gives,
1.3.7
Test-driven approach
Begin with a new Rails application.
The command to create a new application has changed since Rails 2. The Rails project now brings together the various disparate scripts and generators under one umbrella, so to speak. New applications now appear when you ask,
rails new APP_PATH
where APP_PATH specifies where your Rails application tree will reside on the local file system. The exact name does not actually matter too much. Although the name does become part of the application proper: it names the application module, and appears in the cookie store key; but you can change these later if you wish.
I will give it a long name. The following command creates a new folder called import-csv-files-in-rails-3. Rails fills it with basic structure and content, enough to get you started.
rails new import-csv-files-in-rails-3
The application is ready to run already. It does not do anything yet, of course. However, you can launch the server and access the application at localhost:3000. Note, syntax for launching a local server has also changed in Rails 3. You no longer enter script/server; it is now in the slightly-less wordy form,
rails s
s for server, naturally.
Test bed
As a simple starting point, imagine a Rails application comprising people and places. People have a first and last name; places have an address. Let the application’s database store these two entities, along with these attributes. Person entity with first and last name attributes, both strings. Place entity with address attribute, a simple string for now.

Generating the basic Rails scaffolding is very easy. Just two commands gives us the entities and their properties, together with corresponding controllers and views to access them via HTML and RESTful XML. g is short for generate in the following two commands.
rails g scaffold person first_name:string last_name:string
rails g scaffold place address:string
Migrate the database using rake db:migrate to update the database schema. By default Rails will use SQLite. Good enough for this exercise. You can then launch a local server to access people at localhost:3000/people and places at localhost:3000/places.
Once set up, the goal will become to import people and place information using CSV files. This is the real challenge.
Test import
One more thing. You may not have a list of people and places handy. But a CSV file or two will definitely be very useful for testing. So just for testing purposes, let us write a script to generate a file.
See below. Copy this and run it to generate some CSV files comprised of random words using Dan Croak’s Webster gem, a package for randomly generating short non-offensive words. You will need to install the gem using sudo gem install webster. Do the same for fastercsv if you haven’t already.
#!/usr/bin/env ruby
options = OpenStruct.new
options.rows = 10
options.columns = 10
OptionParser.new do |opts|
opts.on('-r', '--rows [INTEGER]', 'Number of rows', Integer) do |x|
options.rows = x
end
opts.on('-c', '--columns [INTEGER]', 'Number of columns', Integer) do |x|
options.columns = x
end
end.parse!
webster = Webster.new
ARGV << '/dev/stdout' if ARGV.empty?
ARGV.each do |file_name|
FasterCSV.open(file_name, 'w') do |csv|
options.rows.times do |row_index|
row = []
options.columns.times do |column_index|
row << webster.random_word
end
csv << row
end
end
end
You will find this script here as a Gist. Download it, make it executable and run using,
./gen-webster-csv.rb test.csv
Two-stage process: upload and merge
For simplicity, the application design will take a two-stage approach to importing.
- Stage 1: upload
- Stage 2: merge
Here, upload refers to the action of receiving a raw CSV file from a browser and storing it somewhere within the application’s database or otherwise readily-accessible place. Merge refers to the action of moving selected columns to the database proper. In a real application, you might choose to fold both stages into one action. But for this exercise, we make them distinct. The first stage takes a CSV file and stores it. The second stage lets the user pick out one or more columns and choose where to move the data: to people or places within the database. This second stage does have an advantage: it allows the user to see the imported information and select and deselect columns. Only selected columns will merge and become useful to the app.
Uploading, stage 1
First upload the CSV file, then store it.
Uploading files
Uploading a file is straightforward. Just add a form with a ‘file field.’ When submitted, the form will send a POST request with the file contents attached.
Create a CSV controller with an import action.
$ rails g controller csv import
create app/controllers/csv_controller.rb
route get "csv/import"
invoke erb
create app/views/csv
create app/views/csv/import.html.erb
invoke test_unit
create test/functional/csv_controller_test.rb
invoke helper
create app/helpers/csv_helper.rb
invoke test_unit
create test/unit/helpers/csv_helper_test.rb
Notice, Rails creates a default controller, a GET-request route for the action, an ERB-based view for the action along with test and helper stubs. Hence, when an external browser sends the application a GET request at csv/import, Rails will match the route, find the ERB template, render the HTML response and reply. The ‘view’ only needs to incorporate a standard HTML form which the browser will promptly display.
Navigate to the import view, app/views/csv/import.html.erb and add the HTML form, as follows.
TextMate makes this very easy to type. Type ff then press Tab to lay out the basic form_for skeleton. Use : then Tab to add a key-value pair of the form :key => "value"; type the key, press Tab and type the value. Tab again moves you to the comma delimiter, and so on.
The most important part of the form is the :html => { :multipart => true } option. This ensures that the POST-request parameters will contain the actual file contents. The request will contain a parameter called upload (as defined by the first argument of the form_for) which maps to a dictionary with key csv mapping to a File object which the application can parse to extract the uploaded CSV. Within the Rails controller for handling the POST request, the File object will be found by accessing params[:upload][:csv].
When the user fills in the form and clicks Upload, the browser will send a POST request containing the CSV-file contents. The Rails application needs to respond. Hence it needs a route describing how the request and URL path translates to controller and action. Under Rails 3, this becomes very easy. Just add the following line to your config/routes.rb. Here, csv/import specifies the URL path; while csv#upload specifies controller and action respectively. When Rails routing sees a POST at that URL path, it will invoke method upload of the CSV controller, class CsvController.
post "csv/import" => 'csv#upload'
This route needs a corresponding action for the CSV controller. Add this method to your CsvController class, source at app/controllers/csv_controller.rb under the application’s source tree.
# While under development, just respond by rendering some in-line text.
# Send back the request parameters in JSON (JavaScript Object Notation)
# format, i.e. something reasonably easy to parse with the human eye.
render :text => params.to_json
end
Store the CSV contents
Where to store the imported CSV information? We could store the CSV file verbatim somewhere on a local file system; that would be typical. But let us assume that there is no local file system such as for web applications running in the cloud.
Storing the CSV temporarily in some kind of document-based key-value store like MongoDB or CouchDB would be another alternative. It would live outside the database. The application database would contain a link to the external ‘document’ and read it when required. But no. For now, let us keep it fairly simple. Instead, we will conceptually map the CSV to a simple data model and store the CSV information within the database itself. See data model below.

ImportTables have many ImportCells; ImportCells have row and column indices, string contents and belong to one particular ImportTable. The database will index the cells by their import_table_id (back-reference to the owning import table) in order to pull all cells belonging to a table quickly from the database into main memory.
Implementing this model in Rails proves relatively easy.
$ rails g scaffold import_table original_path:string
invoke active_record
create db/migrate/20100715123031_create_import_tables.rb
create app/models/import_table.rb
invoke test_unit
create test/unit/import_table_test.rb
create test/fixtures/import_tables.yml
route resources :import_tables
invoke scaffold_controller
create app/controllers/import_tables_controller.rb
invoke erb
create app/views/import_tables
create app/views/import_tables/index.html.erb
create app/views/import_tables/edit.html.erb
create app/views/import_tables/show.html.erb
create app/views/import_tables/new.html.erb
create app/views/import_tables/_form.html.erb
invoke test_unit
create test/functional/import_tables_controller_test.rb
invoke helper
create app/helpers/import_tables_helper.rb
invoke test_unit
create test/unit/helpers/import_tables_helper_test.rb
invoke stylesheets
identical public/stylesheets/scaffold.css
$ rails g model import_cell import_table_id:integer row_index:integer column_index:integer contents:string
invoke active_record
create db/migrate/20100715123207_create_import_cells.rb
create app/models/import_cell.rb
invoke test_unit
create test/unit/import_cell_test.rb
create test/fixtures/import_cells.yml
To establish the one-to-many association between import table and import cell, add two new lines: a has_many and a belongs_to. Your app/models/import_table.rb and app/models/import_cell.rb should look like this. The import table’s :dependent => :destroy option instructs Rails to delete its cells when you delete a table.
has_many :import_cells, :dependent => :destroy
end
belongs_to :import_table
end
For speedy performance, you also need to add an index for the import cells. Generate a migration.
$ rails g migration add_index_for_import_cells
invoke active_record
create db/migrate/20100715124738_add_index_for_import_cells.rb
Then edit the newly-created migration code, adding instructions to add and remove an index. Without this, whenever you ask Rails for a table’s cells using table.import_cells where table references an instance of ImportTable, the answer will arrive but slowly because the database cannot quickly lookup the cells given a particular import_table_id which uniquely identifies the import table. Note also, the index is non-unique. The answers to table.import_cells which navigate the one-to-many table-to-cell association is an array of many ImportCell instances.
add_index :import_cells, [:import_table_id]
end
remove_index :import_cells, :column => [:import_table_id]
end
end
Finally, do not forget to run the ‘up’ migrations to upgrade your database schema. Run rake db:migrate.
Upload action
Thus far, we have the means to store CSV table information, but do not yet have the code in place to parse the uploaded CSV into import tables. The CsvController method below uses the FasterCSV gem to parse the incoming file. Edit your app/controllers/csv_controller.rb source to look like this.
end
The code constructs a new import table passing the original file’s path; original_path is a column in the ImportTable database table. The upload method uses the build method on the to-many association between table and cell in order to build new cells and attach them to the table. After parsing, the method saves the table then redirects the user to its ‘show’ page so that the user can see the newly-imported data.
Add the following to Gemfile for Rails 3. This tells Rails to prepare the package for import. Without this you cannot do a require 'fastercsv'. Gemfile appears in the root directory of the application folder. Although, if you are using TextMate on a Mac, locating any file no matter where it lives is very quick using Command+T then start typing the name. A list of matches starts to appear. Just pick the one you want.
gem 'fastercsv'
Then to view the uploaded table, add these lines to your ImportTablesController’s show method.
Add the lines above after finding the import table, i.e. after @import_table = ImportTable.find(params[:id]). Then display cell contents in app/views/import_tables/show.html.erb by adding this piece of Ruby-embedded HTML.
Given the import cells collected by the CSV controller together with the index maxima, this view code builds a standard HTML table comprised of tr and td (table row and table data) tags. The row = line (line 3) optimises the iteration somewhat by first collecting together all cells matching the current row index. This is a smaller number of cells from which to select individual cells matching the column index within the inner loop.
Merging, stage 2
At this point, you can navigate to http://localhost:3000/csv/import. You can generate and drop a CSV file from Finder onto the Choose File button (a Mac shortcut for choosing a file) and click Upload. The Rails application then uploads and builds an import table of cells. By redirection, the app then displays the table. So far so good.
However, this is only half the story. What the user actually wants to do is to create People and Places. The CSV information is just the raw data. Some of the columns will contain names of people (first name, last name) while some columns may contain addresses. The columns just contain random words of course, unless you have some real data to play with. But you get the idea. The data needs merging with the other tables to create workable records.
Sketch below outlines the design for merging CSV import tables.

We will add a row of table headers to the import table viewing page. Initially they will contain empty ‘select’ buttons. The user starts by selecting which table to import into. After choosing, the page will automatically update the column headers with the appropriate choices for input field mapping. If the chosen table is Person, column choices switch to none, first name or last name. For Places, only two choices: none or address. The user can then assign columns to fields, clicking Merge when ready.
We therefore want the import-table show page to become dynamic. When the user selects the table, the selectors for column field must change appropriately. To make this happen, we take a page out of Ryan Bates’ book. Since the database records are static, we can set up a JavaScript array of mappings between tables and columns. When the user selects the table to import into, person or place, JavaScript within the page can respond by updating the column selectors.
Converting to jQuery with UJS
We plan to use jQuery rather than Prototype as our JavaScript framework. By default however, Rails has set up our application to include Prototype and script.aculo.us. Remove the following list of files.
public/javascripts/controls.jspublic/javascripts/dragdrop.jspublic/javascripts/effects.jspublic/javascripts/prototype.js
Replace the rails.js UJS script found in public/javascripts with the jQuery-based version found at GitHub. UJS stands for Unobtrusive JavaScript, new in Rails 3. By making use of HTML 5 enhancements, the UJS feature avoids littering HTML pages with embedded JavaScript. Download and install jQuery 1.4.1 in the same place, at public/javascripts.
There is one more thing to do before jQuery becomes the new default JavaScript framework for our application: create a new ‘initialiser’ to override Prototype and script.aculo.us as the default framework. Create a new Ruby script, config/initializers/jquery.rb; exact name is not critical. This will run when the Rails server initialises. Add the following code.
end
This will override the default JavaScript sources with jQuery 1.4.1 and its UJS helper. jQuery ready!
Table headers, table selection, Merge button
Next, we need to add the import table headers, a table selection and a Merge button to the current import-table viewing code. Add the following. I’ve commented out where the table contents go, just to make it a little clearer.
<% form_for :merge do |f| -%>
<table border="1" cellspacing="1" cellpadding="1">
<tr>
<% 0.upto(@column_index_max) do |column_index| %>
<th>
<%= f.select(column_index, []) %>
</th>
<% end %>
</tr>
# same as before goes here
</table>
<label><%= 'Select a table for merging:' %></label>
<%= f.select(:table, @tables.collect {|t| [t, t] }, {:include_blank => true }, {:onchange => "mergeTableSelected();" }) %>
<%= f.submit "Merge", :disable_with => 'Merging...' %>
<% end -%>
Some things to note about this code:
- It encloses the entire table within a form. That means that the form will send a POST request with the form parameters as fields. Within the receiving controller method, expression
params[:merge]will answer a Hash of values: one for each of the columns, and one for the selected table. - Initially, the table headers are null selections. It will produce select tags with no options. We will fill this with content using JavaScript when the user chooses which table (people or places) for merging into.
- Whenever the user changes the table selection, the browser will invoke a JavaScript function named
mergeTableSelected.
The code accesses an instance variable @tables. This is not yet defined. But we will now define it in the import tables controller as follows.
This code performs a little bit of Active Record introspection. It asks Rails for the tables associated with its database connection. The answer is an array of table names, strings. The Array select method de-selects the schema_migrations table, an internal table used by Rails. We want our app to ignore this one, in future possibly others too.
Merge table selected
Next, add the following JavaScript code to the show.html.erb amoung the import tables views. It injects a blob of JavaScript into the show page.
var columns = new Array();
columns.push(new Array('', ''));
function mergeTableSelected() {
var o = $('select#merge_table option:selected');
$('th select option').remove();
$('th select').each(function(i, select) {
select.options.add(new Option());
$.each(columns, function(j, column) {
if (o.text() == column[0]) {
select.options.add(new Option(column[1]));
}
});
});
}
When Rails renders the page, it iterates the introspected @tables, building an array of arrays called columns. Each element contains a table name and a column name. When the user picks a table, the mergeTableSelected() function accesses the selection, removes all existing options from the table header select tags and fills them with new options based on the column names matching the newly selected table name. Does that make sense? The JavaScript relies on jQuery of course. See jQuery documentation.
The merge
Finally, the actual merge operation. When the user clicks Merge, their browser posts the parameters to import_tables/:id where :id stands for the unique identifier of the relevant import table. We need to match this to a method on the import tables controller. Add the following route to routes.rb.
match 'import_tables/:id' => 'import_tables#merge'
And add the merge method implementation.
Hopefully, the in-line comments will help to elucidate what this method is doing and why.
Conclusions
What have we learned? Importing CSV is relatively straightforward using Rails 3 and jQuery. But the exercise has raised some outstanding questions and pointed forward to some interesting future directions.
Outstanding questions
- Uploading of CSV files does not deal with encoding issues. For the English language that might not be an enormous problem. But supporting other languages might begin to throw up bigger and bigger issues depending on how exotic the language needing conversion to Unicode (the internal encoding). There may also be number-formatting issues, e.g. some languages use commas for full stops (periods) within numbers.
- Column typing has not been considered. Some CSV columns may contain strings, others numbers. How these convert from CSV to real data within the application will depend on the conversion path between uploading and merging of data.
Future directions
- Some CSV files may have headings in the first row. Import merging might be friendlier if the user could ignore the first row by checking a box entitled, “Ignore first row.”
- The software does not yet support any form of row selection and de-selection. It merges all rows. A real application would need to trim off the fat: remove duplicates, sort by columns, combine rows by dragging and dropping cell contents in-between rows, etc.
30 days later
Great job! Pitty it doesn’t run on Rails 3.0.0.rc
about 1 month later
Good point, Arwed. Thanks. I noticed that too when upgrading from beta4 to rc. I’ve updated the code to support 3.0.0.rc at GitHub.