bobquasit: (Default)
bobquasit ([personal profile] bobquasit) wrote2004-10-05 09:46 am
Entry tags:

Unicode Compression

Okay. I know that not many people read this journal. Of those, I don't know how many work with Microsoft Access. But for the one or two who might, this should be a very useful tip.

Unicode compression is an option for every text field in an Access database. It allows Access files with text data to be greatly reduced in size. The larger the file, the greater the compression. But this isn't like zipping a file; it remains fully functional, and in fact it opens and runs proportionally faster. And unless you're using extremely unusual foreign language characters, you won't lose any data.

I took a two-gigabyte file, turned on unicode compression, and reduced it to under 50 MEGABYTES without losing any data. And it ran much, much faster (the 2-GB file actually crashed some computers here). Even a very small file can be shrunk by 20% or more.

I won't try to explain the underlying logic, so instead I'm grabbing an explanation from online:

Unicode is used in Access and Jet as the format for Text fields. This is a concern because Unicode employs two bytes per character, which for western character sets includes the normal ASCII value as the low order byte and zero for the high-order byte. This would make text fields twice as large as in older versions of Jet.Access, for no useful reason for many users.

To ameliorate this, Access/Jet includes a "Unicode compression" feature, which, in theory looks at the high-order bytes of characters in a text field, and if zero stores the string in a format that omits the zeros. Unicode compression is visible as a property of text fields in the Access Table Design window.

Unfortunately, there are a number of end-user and programming situations where Access or Jet defaults to Unicode Compression disabled. One such situation is where Access 97 databases are imported to Access 2000. It is possibly to manually go through each table, set the compression on, and then compact the database -- though it's sometimes not evident whether this has had the desired effect.

- "Access 2000, Unicode and File Size: Some Notes" by Graham Wideman


But in my own experience, unicode compression is ALWAYS turned off by default in Access 2000. And you can reduce large files to far less than 50% of the original files.

I had to experiment a little to make it work, and it didn't work the way I was told - so here's how to do it.

Quick caveat: I'm running Access 2000. Some of my data is sometimes imported from Access 97, but not always. I haven't tried any of this with any other version of Access. You would be wise to back up your data the first time you try this.

Open the database, and go to the design view of the first table. Choose the first text field, and look at the General properties. Third from the bottom is "Unicode Compression". Change it to "Yes".

Repeat for every text field.

Note: as far as I know, unicode compression ONLY applies to text fields. It definitely doesn't apply to number fields, although it does apply to numbers which are formatted as text. The listed data type is the controlling factor.

It's possible that there are other field types to which UC applies, but I doubt it.


Once you've turned UC on for every text field in the table, close the table. The table design will be converted on closing the design view; this can take a few moments, and you'll see a progress bar. For very large tables (upwards of 1,000,000 records) I have sometimes received a warning that conversion wasn't possible due to a lack of system resources. In that case I cancel the process, but when I open the design view again I usually find that quite a few of the text fields (in order, starting with the first) did convert. So far it has always been possible to continue the process and eventually convert the entire table design.

I should note that none of this takes a long time - minutes at most. Unless you have a hundred or more text fields, in which case it would, of course, take longer. Unfortunately there is no way that I know to turn unicode compression on for ALL text fields simultaneously - you have to do each one by hand.

Each table in the database should be fully converted to unicode compression. Actually, I suspect you only need to convert the tables you actually plan to use, but I am not 100% sure of that. I've always converted every field in every table in a database.

Supposedly, at this point unicode compression should shrink the file. But in my experience it doesn't. Access doesn't like to give up space on the hard drive. So you must create a new Access database, and import the tables you've altered directly from the previous database. Unicode compression will apply to that new database.

After you've done all this (not before) you can go to Tools: Database Utilities: Compact and Repair Database to gain a little more compression, if you want.