MDBM(3B) MDBM(3B) NAME mdbm: mdbm_open, mdbm_close, mdbm_fetch, mdbm_store, mdbm_delete, mdbm_first, mdbm_firstkey, mdbm_next, mdbm_nextkey, mdbm_pre_split, mdbm_set_alignment, mdbm_limit_size, mdbm_invalidate, mdbm_close_fd, mdbm_sync, mdbm_lock, mdbm_unlock, mdbm_sethash- data base subroutines SYNOPSIS cc [flag ...] file ... -lmdbm [library ...] #include <mdbm.h> MDBM *mdbm_open(const char *file, int flags, mode_t mode, int pagesize); void mdbm_close(MDBM *db); datum mdbm_fetch(MDBM *db, kvpair kv); int mdbm_store(MDBM *db, datum key, datum content, int flags); int mdbm_delete(MDBM *db, datum key); kvpair mdbm_first(MDBM *db, kvpair kv); datum mdbm_firstkey(MDBM *db, datum key); kvpair mdbm_next(MDBM *db, kvpair kv); datum mdbm_nextkey(MDBM *db, datum key); int mdbm_pre_split(MDBM *db, uint64 pages, int flags); int mdbm_set_alignment(MDBM *db, int power); int mdbm_limit_size(MDBM *db, uint64 pages, int (*func)(MDBM *db, datum key, datum content, void *priority)); int mdbm_invalidate(MDBM *db); void mdbm_close_fd(MDBM *db); void mdbm_sync(MDBM *db); int mdbm_lock(MDBM *db); int mdbm_unlock(MDBM *db); int mdbm_sethash(MDBM *db, int number); int mdbm_set_chain(MDBM *db); datum mdbm_chain_fetch(MDBM *db, kvpair kv); kvpair mdbm_chain_first(MDBM *db, kvpair kv); kvpair mdbm_chain_next(MDBM *db, kvpair kv); int mdbm_chain_store(MDBM *db, datum key, datum val, int flags); int mdbm_chain_delete(MDBM *db, datum key); datum mdbm_chainP_fetch(MDBM *db, kvpair kv); kvpair mdbm_chainP_first(MDBM *db, kvpair kv); kvpair mdbm_chainP_next(MDBM *db, kvpair kv); int mdbm_chainP_store(MDBM *db, datum key, datum val, int flags); int mdbm_chainP_delete(MDBM *db, datum key); int mdbm_set_dataformat(MDBM *db, uint8_t dataformat); datum mdbm_test_and_set(MDBM *db, kvpair kv, datum storage); int mdbm_compare_and_swap(MDBM *db , datum key, datum oldval, datum newval); int mdbm_compare_and_delete(MDBM *db, datum key, datum oldval); int mdbm_atomic_begin(MDBM *db, datum key, mdbm_genlock_t *storage); int mdbm_atomic_end(MDBM *db); uint32 mdbm_check(char *file, uint32 flags); uint32 mdbm_repair(char *file, uint32 flags); int mdbm_bytes_per_page(MDBM *db, int *pages); int mdbm_elem_per_page(MDBM *db, int *pages); DESCRIPTION The mdbm routines are used to store key/content pairs in a high performance mapped hash file. Mdbm databases use fixed size pages, but the page size can be set on first open. The database does per page writer locking, and readers can detect and automaticly deal with writers. This allows for scalable multiple simultaneous readers and writers to be operating on the same mdbm file at the same time. Core functions are built into libc, the rest of them require linking with libmdbm. The functions in libc are: mdbm_close, mdbm_close_fd, mdbm_fetch, mdbm_first, mdbm_firstkey, mdbm_invalidate, mdbm_next, mdbm_nextkey, mdbm_open, mdbm_sethash, mdbm_store, mdbm_sync. Before a database can be accessed, it must be opened by mdbm_open. This will open and/or create the given file depending on the flags parameter (see open(2)). The pagesize parameter is a power of two between 8 (256 bytes) and 16 (65536 bytes). Once open, the data stored under a key is accessed by mdbm_fetch . The kvpair argument must be setup as follows: typedef struct { char *dptr; int dsize; } datum; typedef struct { datum key; datum val; } kvpair; kv.key.dptr should point to the key, and kv.key.dsize should be set to the length of the key. kv.val.dptr should point to allocated memory, which will be used to copy the result into. kv.val.dsize should be the size of the data. If kv.val.dptr is null, the returned datum's dsize field will bet set to the size of the value. It is always sufficient to allocate MDBM_PAGE_SIZE(db) bytes. A linear pass through all keys in a database may be made, in an (apparently) random order, by use of mdbm_firstkey (mdbm_first) and mdbm_nextkey(mdbm_next). Mdbm_firstkey will return the first key (key/content) in the database. Mdbm_nextkey will return the next key (key/content) in the database. Space for both the key (and the value) must be allocated as above. The following code will traverse the database retrieving every key and it's associated value. #include <mdbm.h> #include <alloca.h> int pagesize = MDBM_PAGE_SIZE(db); char *key = alloca(pagesize); char *val = alloca(pagesize); for (kv.key.dptr = key , kv.key.dsize=pagesize, kv.val.dptr = val, kv.val.dsize=pagesize, kv.key = mdbm_first(db,kv); kv.key.dptr != NULL; kv.key.dptr = key , kv.key.dsize=pagesize, kv.val.dptr = val, kv.val.dsize=pagesize, kv.key = mdbm_next(db,kv)) It is possible to use mdbm_firstkey and mdbm_nextkey to traverse the database retriving only the keys. Data is placed under a key by mdbm_store. The flags field can be either MDBM_INSERT or MDBM_REPLACE. MDBM_INSERT will only insert new entries into the database and will not change an existing entry with the same key. MDBM_REPLACE will replace an existing entry if it has the same key. A key (and its associated contents) is deleted by mdbm_delete. The mdbm_pre_split function can be used immediately after first open to expand the directory to a specified size. This helps to reduce the number of pages that will be split when storing a large amount of data. The pages parameter is a power of two for the depth of the tree. The pages parameter is used to compute the number of pages in the database. The formula for this is (2^pages). The contents may be forced to exist at a particular byte alignment using mdbm_set_alignment so that you can store structures in the database. The parameter is a power of two between 0 and 3. The only time this is needed is when accessing non-copied data from a mdbm file. This is not safe unless all access are protected by calls to mdbm_lock and mdbm_unlock, which is not recommended. The flags field can be 0 or MDBM_ALLOC_SPACE. MDBM_ALLOC_SPACE attempts to pre-allocate blocks to back the file if it is on a holely filesystem. On the fly memory checking of mdbm database can be requested by passing the MDBM_MEMCHECK flag to mdbm_open. If this is done during the creation of a database, all subsequent opens of the database will inherit this requirement. This checking incurs a measurable performance penalty on all operations. All mdbm functions can operate in a synchronous mode by passing the O_SYNC flag to mdbm_open. All changes to the database will case a msync(2) of the appropriate section of the database to be performed. This incurs a significant performance penalty on all mdbm_store operations. The mdbm_limit_size function can be used after first open to set a maximum size limit on the database. If supplied func is a function which will be called with each item on a page if an item needs to be stored, but there is no space. The priority argument is an integer between 1 and 3 giving the pass number. After three passes through the data, if space has not been freed the store will fail. The pages parameter is used to compute the number of pages in the database. The formula for this is (2^pages). mdbm_sync simply forces the data to disk. Since the database is a mapped file most work is simply done in memory, and the state of the disk file is only guaranteed after mdbm_sync. Mdbm_invalidate should be called to mark a mdbm file as invalid or outdated. This should be done before unlinking the mdbm file. If this is not done long running programs that may have the mdbm file open will not detect that another process has remove the file. If the database is a fixed size then the associated file descriptor can be closed using mdbm_close_fd. Otherwise the database will grow the file and possibly remap it into memory as data is added. The routines mdbm_lock and mdbm_unlock can be used to synchronize access to an mdbm file. These use a very unsophisticated spinlock in the shared file to do the locking. It is almost always unnecessary to use these functions, and their use is highly discouraged. mdbm_sethash will set the has function of a database to number. This must be done immediately after creating the mdbm file. The default hash is 0. 0 32 bit CRC. 1 ejb's hsearch hash. Not recommended. 2 Phong Vo's linear congruential hash. Good for short keys. 3 Ozan Yigit's sdbm hash. Good for long keys. 4 Torek's Duff device hash. Good for ASCII key. Bad for binary key. 5 Fowler/Noll/Vo prime hash. Very good for keys under ~20 bytes. Under normal usage all mdbm databases have a limitation on the size of key/value pairs that can be stored. The available space for a single key/value pair is (pagesize - 10) bytes, where pagesize is determined by raising 2 to the pagesize argument passed to mdbm_open when the database is created. Thus with pagesize set to 7 a page is 128 bytes long (2^7 bytes), and with pagesize set to 12 a page is 4kb (2^12 bytes). If you know the maximum size of key/value pair that you will be storing then you should use this parameter to tune the mdbm database. If you dont know the maxium size of a key/value pair, then you should use the mdbm_chain interface described below. There is always a slight performance penalty when using the mdbm_chain interface. There are a number of functions that allow storage of unlimited size values via mdbm. These are the the mdbm_chain_* functions. Before any keys are stored in an mdbm database, use the mdbm_set_chain routine to enable this feature in a given database. Use mdbm_open and mdbm_close with a mdbm file that supports chaining. It is impossible to get a pointer to the copy of the data in the database by passing a NULL value. In this case the val.dsize field will be set to the size needed to retrieve the data. mdbm_chain_fetch, mdbm_chain_store, and mdbm_chain_delete behave similar to the non chaining versions. mdbm_chain_first and mdbm_chain_next do NOT return the value, and kv.val should have dptr set to NULL with dsize set to 0. On return dsize will be set to the size needed to copyout the data. When chaining is enabled, the first character of a key may not start with a NULL byte unless the MDBM_CHAIN_NULL flag is passed to the mdbm_chain_store function. All keys starting with NULL are ignored during mdbm_chain_first and mdbm_chain_next. There are a number of functions that will do the basic mdbm operations on mdbm files correctly supporting chaining if enabled for a given database. These are the mdbm_chainP_* functions. They can be used in place of the associated mdbm_* or mdbm_chain_* functions. The mdbm_set_dataformat function allows a program to set a program specific database format identified that can be accessed with the (uint8_t) mdbm_dataformat (MDBM *db) function. This dataformat is set to 0 by default, and is only used by programs makeing explict use of this feature. The mdbm_fetch, mdbm_first, mdbm_firstkey, mdbm_next, mdbm_nextkey, and mdbm_store functions as well as the mdbm_chain_* and mdbm_chainP_* counterparts are atomic in nature. There are a number of complex atomic operators that can fetch and store keys in an atomic manner. The mdbm_test_and_set function takes a new key/value pair, and sets key to the value. The current value of key will be returned in the storage provided. The mdbm_compare_and_swap function takes a key and compares the current value it is associated with with oldval. If they are exactly the same key will be set to newval. The mdbm_compare_and_delete is similar, but if the current value is oldval, the key will be delete. Atomic creates can be done with mdbm_store with the MDBM_INSERT flag set. The mdbm_atomic_begin and mdbm_atomic_end functions allow complex atomic functions to be built. mdbm_atomic_begin may be called multiple times for multiple keys. Then use the standard mdbm functions to access or modify these keys. Then call mdbm_atomic_end. Only one call to mdbm_atomic_end is required. The third argument is the address of storage large enough to hold a mdbm_genlock_t. It is suggested that this is the address of a variable on the stack. It is not safe to next mdbm_atomic operations. Do NOT call mdbm_compare_and_swap between mdbm_atomic_begin and mdbm_atomic_end. Use of atomic operators increases the possibility that a program will crash while holding a database lock. If this happens a database can be checked and repaired. It is NOT safe to repair a database while any other processes or threads are accessing it. But it is safe to check a database. The mdbm_check and mdbm_repair functions work on a mdbm file and take a set of flags telling them what to check or try to repair. The functions return a set of flags telling what checked out bad, or was unrepairable. The output of mdbm_check can be used as the input flags to mdbm_repair. The macro MDBM_CHECK_ALL requests that all possible error cases are check. mdbm_bytes_per_page, and mdbm_elem_per_page take a pointer to an array of length MDBM_NUMBER_PAGES (MDBM *db) which will be filled with the appropriate statistic. DIAGNOSTICS All functions that return an int indicate errors with negative values. A zero return indicates ok. Routines that return a datum indicate errors with a null (0) dptr. If mdbm_store called with a flags value of MDBM_INSERT finds an existing entry with the same key it returns 1. If there is an error during an mdbm call, the global error value errno will be set. The following errors will will be set. EBADF During the open or access of a database the file version was not valid, or internal structures are invalid. This implies that the mdbm file is corrupted. EBADFD During the open of a database the file magic number was not valid. File is not a recognised mdbm file. EBUSY During a mdbm_compare_and_swap or mdbm_compare_and_delete the current value in the database is not oldval. EDEADLK A accessed page was locked, and after a number of attempts did not become unlocked. EDEADLOCK During a page lock, a livelock was detected and held locks may have been broken. EEXIST A store with the MDBM_INSERT flag set failed since the key already exists. A function that changes header parameters is attempted when there is data already stored in the database. EINVAL An invlid argument was passed. ENOEXIST The requested key does not exist in the database. ENOLCK A request to find the lock on a page that should be locked by the current process failed. ENOMEM Internal allocation of memory failed. The only allocation that is performed is the MDBM structure that is returned by mdbm_open which is freed by mdbm_close. ENOSPC There is insufficient space in the database to store the requested key or if no shake function is defined for a fixed size database. EPERM Permission to perform a write to the mdbm file is denied. ESTALE The mdbm file has been invalidated. It is suggested that an attempt to re-open the database should be made, since a writing process may invalidate an mdbm file, and create a new one with the same filename. CAVEATS The file is designed to contain holes in files. The EFS file system does not implement holes, so the file will frequently be significantly larger than the actual content. The sum of the sizes of a key/content pair must not exceed the internal block size (defaults to 4096 bytes). Moreover all key/content pairs that hash together must fit on a single page. The page size can be set to a maximum of 64KB on first open. Mdbm_store will return an error in the event that a disk block fills with inseparable data. The order of keys presented by mdbm_first, mdbm_firstkey, mdbm_next and mdbm_nextkey depends on a hashing function and the order they were stored in the database, not on anything interesting. Mdbm_delete does not physically reclaim file space, although it does make it available for reuse. It is incorrect to truncate a mdbm database. The correct procedure is to create a new database, open the old database, rename the new database on top of the old database, invalidate the old database, unlink the old database. Page 8