Itzam Core

A Tutorial

 


Itzam/Core 4.0.3
Open Source License (GPL)
Non-free, Closed-Source License: $749

Itzam/Java 2.1.1
Open Source License (GPL)
Non-free, Closed-Source License: $749


Itzam Core (or just plain "Itzam") embodies a relatively simple set of concepts. The example program, itzam-contact, shows most of Itzam's features.

Most of the processes and concepts used in Itzam are the same as those used "under the hood" by more complex database systems. Itzam is a nuts-and-bolts approach to database development, providing great flexibility and customization at the expense of less abstraction. You define how data is connected and associated; such control, however, requires careful understanding of your data.

Basic Concepts and Structures

Itzam associates key values with objects serialized in files, by creating one or more B-tree indexes to connect keys with objects. In many respects, Itzam is very object-oriented, with functions tied by name to target data structures in much the same way as a C++ program links methods to class objects.

Everything you need is defined in the itzam.h header file. You'll find several data structures and internal definitions inside; you can safely ignore most of them, since you'll be working directly with only a few core types and structures.

itzam_ref (simple type)

An itzam_ref is the address (position) of information inside an Itzam-defined database. By default, Itzam uses 32-bit file pointers; defining the preprocessor macro ITZAM64 during compilation will create an Itzam library that uses 64-bit file pointers. The 32-bit file operations handle files up to 2,147,483,648 bytes (2 GiB) in size, though some file systems may impose a lower limit; 64-bit file operations allow access to files up to roughly 9 quintillion bytes long.

Using 64-bit file pointers supports the growing size of databases and 64-bit CPUs. Handling 64-bit values imposes a slight performance penalty and has a significant effect on file size.

For practical purposes, itzam_int is simply a synonym for itzam_ref, reflecting different contexts. An itzam_int is a length or size, while an itzam_ref is a file reference (position).
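
To make the effect of the pointer width concrete, here is a minimal stand-alone sketch. The names demo_ref, DEMO64, DEMO_REF_MAX, demo_position_fits, and demo_to_ref are hypothetical stand-ins; they are not the definitions found in itzam.h. The sketch only assumes that a file position is stored as a signed integer of the selected width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compile-time switch, playing the role that ITZAM64
   plays for the real library. These are NOT Itzam's definitions. */
#ifdef DEMO64
typedef int64_t demo_ref;
#define DEMO_REF_MAX INT64_MAX
#else
typedef int32_t demo_ref;
#define DEMO_REF_MAX INT32_MAX
#endif

/* Returns 1 if the byte position can be stored in a demo_ref, else 0. */
static int demo_position_fits(uint64_t position)
{
    return position <= (uint64_t)DEMO_REF_MAX;
}

/* Convert a byte position to a demo_ref; the caller must check the range. */
static demo_ref demo_to_ref(uint64_t position)
{
    assert(demo_position_fits(position));
    return (demo_ref)position;
}
```

Compiled without DEMO64, the last addressable byte position is 2,147,483,647, which is why a file cannot grow past 2,147,483,648 bytes; compiling with -DDEMO64 widens every stored reference to eight bytes.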

itzam_status (enum)

Most Itzam functions return a status code, which can be ITZAM_OKAY, ITZAM_NOT_FOUND, or any of the other constants found in itzam.h. Note that many functions will call an error handler function for serious errors; itzam_status is designed for programmatic purposes (e.g., discovering that a key was not found), as opposed to error handling.

itzam_datafile (struct)

An itzam_datafile is a structured data file for storing fixed- or variable-length records; it automatically writes new records in space freed by deleted records. This structure is useful on its own, without any of the indexing capabilities inherent in itzam_btree. In some applications, it may be practical to store data in individual itzam_datafiles, indexing them via one or more itzam_btrees.

For the purposes of an itzam_datafile, consider a "record" to be nothing more than a blob of binary data; the semantics of that "blob" are the province of your program. If you're talking to Itzam from Java, for example, you can serialize objects to and from byte streams stored in an itzam_datafile.

To handle variable-length records, itzam_datafile records the size of each record. When a record is deleted, the space it occupied is marked as empty. The file maintains a linked list of deleted record locations and their sizes.

Inserting a new record involves a traversal of the deleted list, looking for an empty record that is large enough to contain the new information. If the deleted list is empty, or the new record is too large to fit into any open slot, the new record is appended to the file.

Reusing deleted record space has a drawback: it leaves dead space in the file when a newly-inserted record is smaller than the "deleted" space it overwrites. Deleted records also use space in the file until a new record is written into their location. If your record sizes vary widely, it may make sense to periodically compact the file by removing the wasted space, eliminating deleted records, and regenerating indexes. If your records are fixed-length, the data file shouldn't contain much wasted space.
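
The reuse strategy described above can be illustrated with a small in-memory model. This is only a sketch: the demo_* names are hypothetical, the deleted list is a fixed-size array rather than a linked list stored in the file, and nothing here reflects the actual layout of an itzam_datafile. It does show the first-fit search, the fallback to appending, and how dead space accumulates.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SLOTS 16

/* One entry in the hypothetical deleted list. */
typedef struct
{
    long   where;   /* position of the deleted record */
    size_t size;    /* bytes available at that position */
}
demo_free_slot;

typedef struct
{
    demo_free_slot slots[DEMO_MAX_SLOTS];
    size_t count;        /* number of entries in slots */
    long   end_of_file;  /* position where appended records go */
    size_t dead_bytes;   /* space lost to small records in big slots */
}
demo_file;

static void demo_init(demo_file * file)
{
    file->count = 0;
    file->end_of_file = 0;
    file->dead_bytes = 0;
}

/* Mark the record at `where` as deleted so its space can be reused. */
static void demo_delete(demo_file * file, long where, size_t size)
{
    assert(file->count < DEMO_MAX_SLOTS);
    file->slots[file->count].where = where;
    file->slots[file->count].size = size;
    file->count++;
}

/* Choose a position for a new record: the first deleted slot that is
   large enough, or the end of the file if no slot fits. */
static long demo_place(demo_file * file, size_t size)
{
    size_t i;
    long where;

    for (i = 0; i < file->count; i++)
    {
        if (file->slots[i].size >= size)
        {
            where = file->slots[i].where;
            file->dead_bytes += file->slots[i].size - size;
            /* drop the slot by moving the last entry into its place */
            file->slots[i] = file->slots[file->count - 1];
            file->count--;
            return where;
        }
    }

    where = file->end_of_file;
    file->end_of_file += (long)size;
    return where;
}
```

Placing a 60-byte record into a deleted 100-byte slot leaves 40 dead bytes in this model; with fixed-length records every slot fits exactly and dead_bytes stays at zero.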

All functions that directly manipulate datafiles follow the naming pattern itzam_datafile_*.

itzam_btree (struct)

An itzam_btree associates key (index) values with a reference or an embedded record. In general, the reference is a file pointer that locates an object that "belongs" to the key value. The reference can, however, be interpreted however you like, since the index imposes no semantics on the itzam_ref values associated with keys.

As a convenience, Itzam implements functions that store records in the index file itself, using the associated itzam_ref to tie each record to its key. In many cases (including the example program), it makes sense to combine the index with records; in other applications, you may want a separate data file indexed by one or more independent indexes. Itzam provides great design flexibility, and can support relational and network database models.

The B-tree data structure maintains a list of keys in order, as determined by a user-supplied key comparison function. You determine the sort order through the comparison function you supply when opening or creating an itzam_btree. Just be certain that you always use the same comparison function whenever making changes to the index.
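
As an illustration of how the comparison function controls sort order, here is a sketch of a case-insensitive comparator. The name nocase_comparator is hypothetical, and the sketch assumes the comparator takes two const void * keys and returns a negative, zero, or positive int, which is the shape of the string comparator used in the contact example later in this tutorial.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical alternative comparator: orders string keys without
   regard to letter case. Keys must be NUL-terminated strings. */
static int nocase_comparator(const void * key1, const void * key2)
{
    const unsigned char * a = (const unsigned char *)key1;
    const unsigned char * b = (const unsigned char *)key2;

    assert(a != NULL && b != NULL);

    /* advance while both strings match, ignoring case */
    while (*a != '\0' && tolower(*a) == tolower(*b))
    {
        a++;
        b++;
    }

    return tolower(*a) - tolower(*b);
}
```

With this comparator, "Smith" and "smith" are the same key. An index built with it would be ordered differently from one built with a plain strcmp comparator, which is why the same function must be supplied every time the index is opened.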

All functions that directly manipulate b-tree indexes follow the naming pattern itzam_btree_*.

itzam_btree_iterator (struct)

An itzam_btree_iterator is a list of the records in a database, and can be traversed in key order. It does not store records; instead, it is a list of records that can be traversed forward and backward in order (as defined by the database's key comparison function).

When you create an itzam_btree_iterator, it represents the current state of the itzam_btree. Changes to the index, especially adding or removing keys, will not be reflected in existing iterators. As with threading, you need to ensure that you either prevent index changes during the lifetime of an iterator, or that you get a new iterator after changes have been made. Higher-level abstractions in C++ or Python will automatically track iterators and provide index locking; Itzam Core is, however, a low-level tool, and should be treated like memory allocation in terms of caution.

In many ways, an itzam_btree_iterator can be used like a traditional database cursor. You can also treat it like a query, filtering records as you iterate through a list of records.
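
The "iterator as query" idea can be sketched without any Itzam calls. In the example below, a plain array that is already in key order stands in for the iterator, and a predicate function decides which records to keep. The demo_* names and the trimmed-down record are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed-down, hypothetical record used only for this sketch. */
typedef struct
{
    char last_name[33];
    char state[33];
}
demo_contact;

/* Predicate type: returns nonzero if the record should be kept. */
typedef int (*demo_filter)(const demo_contact * record, const void * arg);

/* Example predicate: keep records whose state matches the string in arg. */
static int demo_state_is(const demo_contact * record, const void * arg)
{
    return strcmp(record->state, (const char *)arg) == 0;
}

/* Walk `count` records in their existing (key) order, copying those that
   pass the filter into `out`. Returns the number of matches stored. */
static size_t demo_query(const demo_contact * records, size_t count,
                         demo_filter keep, const void * arg,
                         demo_contact * out, size_t out_max)
{
    size_t i, found = 0;

    assert(records != NULL && keep != NULL && out != NULL);

    for (i = 0; i < count && found < out_max; i++)
    {
        if (keep(&records[i], arg))
            out[found++] = records[i];
    }

    return found;
}
```

Because the walk follows key order, the matches come out already sorted by key. With a real iterator, the loop body would read each record and apply the same kind of predicate before moving to the next position.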

All functions that directly manipulate b-tree iterators follow the naming pattern itzam_btree_iterator_*.

A Contact Database Example

The Itzam Contact Keeper (ICK) is a very simple database of names, addresses, phone numbers and e-mail addresses. Download itzam-contact-1.0.1.tar.gz (134K) for a Linux/GTK+ project with complete source code. The core code below works for any operating system or graphical environment; the user interface is secondary to the underlying code that exercises an Itzam database.

This example is a complete program, but it lacks certain niceties (import and export, for example) that would make it a practical application for day-to-day use. For the moment, it serves as tutorial material for Itzam Core; in the future, I might expand it into a "real" application, if people show an interest.

Contact Records

The contact record isn't anything fancy, just a struct with fields for various pieces of information.

// our data record
typedef struct
{
    char first_name[33];
    char middle_name[33];
    char last_name[33];
    
    char company[65];
    char address1[65];
    char address2[65];
    char city[65];
    char state[33];
    char postal[17];
    
    char email1[129];
    char email2[129];
    char web[257];
    
    char phone1[33];
    char phone2[33];
    char phone3[33];
    
    char comment[4097];
}
contact_record_t;

static const size_t MAX_KEY_LEN = 100;
static const char * KEY_FORM = "%-32s%-32s%-32s";

The KEY_FORM string is for use by snprintf in generating an index key from the components of the name.
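
The key-building helper itself is not listed in this tutorial, so here is a sketch of how such a function might look. Everything named demo_* is hypothetical, and the field order (last name, then first, then middle) is my assumption for this illustration; the format string and buffer length mirror KEY_FORM and MAX_KEY_LEN above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_KEY_LEN  100                  /* mirrors MAX_KEY_LEN */
#define DEMO_KEY_FORM "%-32s%-32s%-32s"    /* mirrors KEY_FORM */

/* Only the name fields of the contact record matter for the key. */
typedef struct
{
    char first_name[33];
    char middle_name[33];
    char last_name[33];
}
demo_name_t;

/* Build a heap-allocated key from the name fields. The caller frees the
   result. Returns NULL if allocation or formatting fails.
   Assumption: keys sort by last name, then first, then middle. */
static char * demo_make_key(const demo_name_t * name)
{
    char * key;
    int written;

    assert(name != NULL);

    key = (char *)malloc(DEMO_KEY_LEN);

    if (key == NULL)
        return NULL;

    /* each field is left-justified and padded with spaces to 32 columns */
    written = snprintf(key, DEMO_KEY_LEN, DEMO_KEY_FORM,
                       name->last_name, name->first_name, name->middle_name);

    if (written < 0 || written >= DEMO_KEY_LEN)
    {
        free(key);
        return NULL;
    }

    return key;
}
```

Padding each field to a fixed width means a plain strcmp on two keys compares the first field completely before it ever looks at the second, so no separator characters are needed.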

Creating a New Database

In the case of ICK (yes, I recognize what the acronym spells), the database is a single itzam_btree file that also contains the contact data records. When the program starts, it checks for the existence of a database file in ~/.itzam-contact, and creates one if it isn't found.

    // create database
    itzam_btree database;

    if (ITZAM_OKAY != itzam_btree_create(&database,get_db_name(),ITZAM_BTREE_ORDER_DEFAULT,string_comparator))
    {
        show_message("Unable to create contact database",GTK_MESSAGE_WARNING);
        return;
    }
    
    // close database
    itzam_btree_close(&database);

The string comparison function is very simple:

static int string_comparator(const void * key1, const void * key2)
{
    return strcmp((const char *)key1,(const char *)key2);
}

Opening an Existing Database

Opening a database looks very much like creating one.

    // open database, read record
    itzam_btree database;
    itzam_btree_iterator db_iterator;

    // open the existing database
    if (ITZAM_OKAY != itzam_btree_open(&database,get_db_name(),string_comparator,true))
    {
        show_message("Unable to find contact database",GTK_MESSAGE_ERROR);
        exit(1);
    }

Adding Records

Adding a record is a one-step process, since the data is stored in the index file itself.

static void add_record(const contact_record_t * record)
{
    // open database
    itzam_btree database;

    if (ITZAM_OKAY != itzam_btree_open(&database,get_db_name(),string_comparator,false))
    {
        show_message("Unable to open contact database",GTK_MESSAGE_WARNING);
        return;
    }
    
    // create a key
    char * key = make_key(record);
    
    // write record to database
    if (ITZAM_OKAY != itzam_btree_insert_rec(&database,key,strlen(key) + 1,record,sizeof(contact_record_t)))
        show_message("Unable to write record",GTK_MESSAGE_WARNING);
    
    // free key memory
    free(key);
    
    // close database
    itzam_btree_close(&database);
}

Retrieving Single Records

Reading a record involves calling itzam_btree_read_rec with a key and a buffer for holding the record associated with that key.

static void get_record(const char * key, contact_record_t * record)
{
    // open database
    itzam_btree database;

    if (ITZAM_OKAY != itzam_btree_open(&database,get_db_name(),string_comparator,false))
    {
        show_message("Unable to open contact database",GTK_MESSAGE_WARNING);
        return;
    }
    
    // read record from database
    if (ITZAM_OKAY != itzam_btree_read_rec(&database,key,record,sizeof(contact_record_t)))
        show_message("Unable to read record",GTK_MESSAGE_WARNING);
    
    // close database
    itzam_btree_close(&database);
}

Removing Records

Since the contact records reside inside the index file, ICK calls itzam_btree_remove with a NULL value for the third parameter, asking the function to remove the record along with its key.

static void remove_record(const char * key)
{
    // open database
    itzam_btree database;

    if (ITZAM_OKAY != itzam_btree_open(&database,get_db_name(),string_comparator,false))
    {
        show_message("Unable to open contact database",GTK_MESSAGE_WARNING);
        return;
    }
    
    // remove key and record from database
    if (ITZAM_OKAY != itzam_btree_remove(&database,key,NULL))
        show_message("Unable to remove key and record",GTK_MESSAGE_WARNING);
    
    // close database
    itzam_btree_close(&database);
}

Retrieving a Set of Records with an Iterator

An iterator works much like an array index; it begins at a position within the index, and moves sequentially forward or backward much as an array index can be incremented or decremented. The following code fragment processes all the records in the contact database, in order by name.

    // get a record iterator
    itzam_btree_iterator_create(&database,&db_iterator);
    
    // if there are records, put them in the list
    if (itzam_btree_iterator_count(&db_iterator) > 0)
    {
        char pretty_name[128];
        gint i = 0;
        contact_record_t record;

        /* add data to the list store */
        for (i = 0; i < itzam_btree_iterator_count(&db_iterator); i++)
        {
            // read the record
            itzam_btree_iterator_read_rec(&db_iterator, & record, sizeof(contact_record_t));

            /***
                Do something with the record
            ***/            
            
            // next item
            itzam_btree_iterator_move_next(&db_iterator);
        }
    }

Changes to the database (deleting records, for instance) may cause an iterator to generate an exception. I strongly recommend against modifying a database while it is being iterated.

Conclusion

I hope this tutorial helps you understand the basic techniques for using Itzam. The package has much more depth than I've described here; it is possible, for example, to create complex queries by combining sets of keys obtained from different indexes. Itzam provides considerable freedom of design in exchange for increased creativity and responsibility on the part of the programmer.
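
As one illustration of combining key sets, the sketch below intersects two lists of keys that are each already in sorted order, as they would be if gathered by walking two indexes. It uses no Itzam calls; demo_intersect is a hypothetical helper, and it assumes both lists are NUL-terminated strings sorted in strcmp order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Intersect two key lists that are each sorted in strcmp order.
   Pointers to keys present in both lists are written to `out`.
   Returns the number of common keys stored (at most out_max). */
static size_t demo_intersect(const char * const * a, size_t a_count,
                             const char * const * b, size_t b_count,
                             const char ** out, size_t out_max)
{
    size_t i = 0, j = 0, found = 0;

    assert(a != NULL && b != NULL && out != NULL);

    while (i < a_count && j < b_count && found < out_max)
    {
        int order = strcmp(a[i], b[j]);

        if (order < 0)
            i++;            /* a[i] is not in b; skip it */
        else if (order > 0)
            j++;            /* b[j] is not in a; skip it */
        else
        {
            out[found++] = a[i];
            i++;
            j++;
        }
    }

    return found;
}
```

Because both inputs are sorted, a single pass over each list is enough, and the result is itself in key order. A union or difference follows the same pattern with different actions in the three branches.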

If you have any questions, please e-mail me. I can't promise an instant turn-around for people using Itzam under the GPL -- but I do try to reply within 24 hours.

Thank you.

 


©  2008
Scott Robert Ladd
All rights reserved.
Established 1996