FieldWorks Database Port Plan

From LSDevLinux
Jump to: navigation, search

FieldWorks uses SQL Server Express for its database. It runs only on Windows. In order to run on other platforms, we have to use a different database.

Research and Select Databases Replacement Candidates

We have researched different databases several times over the past decade. The most recent study, done in 2005, turned up three potential candidates: MySQL, PostgreSQL, and Firebird. Each of them have their own strengths and weaknesses.

MySQL was the world leader, and had the most momentum. But its code base was the least mature, and the GPL license continues to be a constant source of tension. PostgreSQL has table inheritance, which is very attractive for our object-oriented mapping of classes. However, in the end, Firebird was selected. While it is largely unknown to Americans, it has an active user base in other parts of the world. Its small footprint and embedded capabilities have long been attractive for low power devices, and we may be able to make use of its custom collations.

Create Database Schema

The database schema is generated from a UML object model, drawn in a software application called Magic Draw. Whenever "the model" changes, the database schema changes. Our test databases, TestLangProj, Lela-Teli 2, and Lela-Teli 3, get regenerated. (Databases in the field must be updated, however, and we write migration scripts to accomplish this.)

The code that generates the SQL to create the new schema must account for differences in SQL. For example, Firebird expects semicolons at the end of each statement. SQL Server understands semicolons, because it is a SQL standard, but does not require them, and most of the SQL in our code was written without them. Other differences must be accounted for as well. For example, FieldWorks uses a bit data type in SQL Server. Firebird does not have a bit data type. (See Mapping SQL Server data types to Firebird data types.) A particular frustration is that Firebird limits names to 31 characters. Many of our generated names are longer than that. We were forced to shorten names in our existing code base, which took more than two months of labor.

A significant portion of schema generation comes from SQL code. For instance, tables aren't created with the CREATE TABLE command. Instead, a record is inserted into the Class$ table of the database. Nor are the fields of the table created with an ALTER TABLE command. Rather, records are inserted into the Field$ table. The insert triggers of Class$ and Field$ take care of generating the schema. They use a surprising number of stored procedures to do this. Therefore these triggers and stored procedures must be ported to the new database before the database schema can be created.

Load Test Database Data

The data for each of our test databases is stored in an XML file. The file that loads that data is called LoadXML. LoadXML is also used in FieldWorks to backup data into an XML file. LoadXML must be able to connect to the appropriate database and perform the necessary INSERT commands, with the appropriate data types.

Port SQL Procedures and their Unit Tests

We have tens of thousands of lines of SQL code that must be ported. These include stored procedures, stored functions, and triggers.

Rather than port all of the procedures over at once, the team decided to concentrate on porting only the SQL necessary to run the FieldWorks Translation Editor (TE). TE uses a minimum number of procedures. (But as noted above, a significant number of procedures must be ported to create the initial schema.)

To ensure that the procedures and functions work properly, each should have a unit test. Since most of the SQL was built prior to implementing unit tests in FieldWorks, these will need to be written from scratch.

Furthermore, existing unit tests and our current unit test architecture, TSQLUnit, is written specifically for SQL Server, in Transact-SQL (MS SQL Server's version of SQL). Our options to resolve this were:

  1. Port TSQLUnit to Firebird.
  2. Find a unit test framework in Firebird, and port all the existing unit tests to that.
  3. Find a unit test framework that would work with both SQL Server and Firebird.

The problem with the first two solutions is that they require redundant test code. Writing dozens of unit tests once for SQL Server, and then again for Firebird, seems to be a waste of time. Therefore, research was done to see if unit test code could be found for both.

Python has a unit test framework of its own, called PyUnit [1]. It is built into the language. Moreover, Python has a database API specification to connect to different databases [2]. Firebird has an API built to this specification called KInterbasDB [3].

SQL Server has at least one API built to this specification, but it did not support CallProc(). CallProc() is the method that calls a stored procedure. I believe I saw another that did support CallProc(), but did not support return values. To address this, I started playing with the .NET Data Provider. The result was an API that supports CallProc(). It is now on SourceForge as ipymssql. [4].

ipymssql uses IronPython [5]. The last time I tried to upload to SourceForge a download file (in 2007), I couldn't do it. Nor could local IT support help. Now that the proxy server has gone away in Dallas, I might have more success. Various TODOs are scattered throughout ipymssql. The most important one is probably that Cursor.execute doesn't yet support parameters.

Modify Application Code to Connect to the Database

The connection string to a Firebird database is different than it is for SQL Server.

Research is taking place give FieldWorks the ability to swap out the "back end". This means that FieldWorks could use a relational database such as Firebird or SQL Server, an object database such as db4objects, or even use XML as a datastore. The discussion is archived in Jira as FWC-48 ([6]).

Modify SQL in Application Code

Hundreds of SQL commands exist in the application code. Most of these commands are simple SELECT queries, and may need nothing more than a semicolon added to the end of them. However, other blocks of code have more complicated SQL commands, sometimes bundling several commands together in one string. Still other application code assembles bits of SQL together.

Various approaches have been suggested. The one proposed by John Thomson and John Hatton involves moving the SQL out of the various areas of the application and putting them in a set of classes. A ServiceLocator would determine which back end is in use. For more information, see the Jira issue FWC-48 ([7]).