I make no representation whatsoever about the production-readiness of WordGratis. I started working on it a couple of years ago to improve my knowledge of the Microsoft Word object model (I had programmed in Visual Basic, Excel, and Access VBA for years, but never had occasion to develop an application in Word). WordGratis came about as a result of looking for an interesting problem to solve while learning Word's object model.
I am releasing all of the source code, as it is highly unlikely that I alone would ever have the time needed to make WordGratis a viable standalone CAT tool. I would ask two things, however. First, that you give me credit in any code you may modify or reuse and/or provide a link to http://www.perly-gates.com/The_Perly-Gates/WordGratis.html. Second, that you upload any improvements you make back to this website, so everyone else can benefit.
For those of you who are interested in the code itself, you will see that I tried various approaches to searching TMs and glossaries. I started with the simplest (i.e. looping through each line of the file looking for a match) and graduated to using SQLite, a free, open-source database engine that stores an entire database in a single file on disk. In particular, I am using dhSQLite, a Visual Basic wrapper for SQLite. dhSQLite is a pleasure to program with, as it closely mimics the ADO classes used natively by Visual Basic to access databases.
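To make the difference between the two search approaches concrete, here is a minimal sketch in Python (not the VBA and dhSQLite code that WordGratis itself uses). The tab-separated sample data, the `tm` table, and its `source` and `target` columns are assumptions made up for illustration, not the actual WordGratis file format or schema.

```python
import sqlite3

# Hypothetical translation-memory lines: source and target separated by a tab.
TM_LINES = [
    "Hello world.\tBonjour le monde.",
    "Good morning.\tBonjour.",
    "Thank you.\tMerci.",
]


def lookup_by_scan(lines, source):
    """Simplest approach: walk every line until the source text matches."""
    for line in lines:
        src, _, tgt = line.partition("\t")
        if src == source:
            return tgt
    return None


def build_tm_database(lines):
    """Load the same pairs into an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tm (source TEXT PRIMARY KEY, target TEXT)")
    rows = [tuple(line.split("\t", 1)) for line in lines]
    conn.executemany("INSERT INTO tm (source, target) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup_by_sql(conn, source):
    """Database approach: let SQLite find the row via the primary-key index."""
    row = conn.execute(
        "SELECT target FROM tm WHERE source = ?", (source,)
    ).fetchone()
    return row[0] if row else None
```

Both lookups return the same answer for an exact match. The scan reads every line on each search, whereas the SQL version can use an index, which matters more as the TM grows.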
Obviously there are many areas in which the code can be improved. One of the most interesting (unexpectedly so, which was naive on my part) was the challenge of identifying a segment. Early on I decided to use sentence-level segmentation (a choice that could very easily be changed), as paragraph-level segmentation didn’t seem to be very popular in most of the CAT tools that I briefly read about. You can see the resulting wildcard string used to select the sentence segment in the "SelectSentence" sub in the ‘segments’ module. I won’t pretend that the "SelectSentence" sub works flawlessly in all cases, so any suggested improvements would be very welcome. Since this is the entry point to all the other functionality, such as glossary matching and TM fuzzy matching, it is a rather important area to focus on.
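The difficulty of sentence segmentation is easy to show with a small sketch. The Python function below is not the Word wildcard string from "SelectSentence"; it is a hypothetical regular-expression rule (break after ".", "!" or "?" when whitespace and then a capital letter or digit follows), chosen only to illustrate where naive rules go wrong.

```python
import re

# Assumed rule: a sentence ends at ., ! or ? when the next non-space
# character is an uppercase ASCII letter or a digit.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text):
    """Split text into sentence segments using the naive boundary rule."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]
```

A rule like this handles plain text such as "This is one. Is this two?" but splits "See Dr. Smith today." after the abbreviation, which is the kind of edge case any segmentation routine has to deal with.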
I also make no claims to user interface programming ability. In other words, please feel free to modify the look and feel of any of the toolbars, menus, or forms. I have also not verified that all these user interface elements work under Word 2007, as I’m still on 2003.
The other interesting challenge (as if the above weren’t enough) is the SQL behind the glossary and TM matching. At first I was using the FTS2 module of SQLite to provide full-text search of the glossary and TM tables. However, this module didn’t seem to work very well with the version of dhSQLite I was using at the time. Prior to uploading this project to Google Code, however, I updated to the newest version of dhSQLite, which may have some improvements in this area. SQLite itself also now supports FTS3, which may have added some compelling functionality. For those who don’t know, FTS3 includes full-text search code that was contributed to the SQLite project by an engineer at Google.
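For readers who have not used SQLite's full-text search, the following Python sketch shows roughly what an FTS3 lookup looks like. The `glossary` table and its `source` and `target` columns are assumptions for illustration and do not reflect the actual WordGratis schema or SQL. Because not every SQLite build includes FTS3, the sketch falls back to an ordinary table with a `LIKE` query when the module is missing.

```python
import sqlite3


def build_glossary(entries):
    """Create an in-memory glossary, as an FTS3 table if the build supports it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE glossary USING fts3(source, target)")
        uses_fts = True
    except sqlite3.OperationalError:
        # FTS3 is not compiled into this SQLite build: use a plain table.
        conn.execute("CREATE TABLE glossary (source TEXT, target TEXT)")
        uses_fts = False
    conn.executemany(
        "INSERT INTO glossary (source, target) VALUES (?, ?)", entries
    )
    return conn, uses_fts


def find_terms(conn, uses_fts, word):
    """Return (source, target) rows whose source text contains the word."""
    if uses_fts:
        # MATCH searches the full-text index for the word as a whole token.
        sql = ("SELECT source, target FROM glossary "
               "WHERE source MATCH ? ORDER BY source")
        params = (word,)
    else:
        # Fallback: substring search, which scans every row.
        sql = ("SELECT source, target FROM glossary "
               "WHERE source LIKE ? ORDER BY source")
        params = ("%" + word + "%",)
    return conn.execute(sql, params).fetchall()
```

The two branches are not equivalent in general: `MATCH` works on whole tokens while `LIKE` matches any substring, so the fallback can return extra rows for short search words.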
Due to its open-source nature, using SQLite also presents advantages when accessing the data in your TMs and glossaries. There are many, many tools out there (besides the ones in WordGratis itself) that you can use to import and export data from your "WordGratis.db" SQLite database. Using WordGratis does not lock your data in a proprietary format. You can access YOUR data when and how YOU want.
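As one example of reaching the data from outside Word, the Python sketch below lists the tables in a SQLite database and writes any one of them out as CSV. It discovers table and column names at run time, so it makes no assumption about how "WordGratis.db" is laid out; the function names are hypothetical helpers, not part of WordGratis.

```python
import csv
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def export_table(conn, table, out):
    """Write one table to the file-like object `out` as CSV with a header row."""
    if table not in list_tables(conn):
        raise ValueError(f"no such table: {table}")
    # Quote the identifier so unusual table names are handled safely.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
```

To try it on real data, open your own copy with `sqlite3.connect(...)`, pass the connection to `list_tables` to see what is there, then call `export_table` with a file opened for writing with `newline=""`.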
OK, let's get started. To install WordGratis, download WordGratis.zip, extract the contents to a folder, then launch "install WordGratis.doc". This document contains a macro that will copy the SQLite DLLs to the Word application's startup path and register them with Windows. This brings me to another relevant point: WordGratis is currently Windows-only. However, SQLite comes installed with Mac OS X, so it should be possible to port WordGratis.