
Know How Durus 3.4

Last update: Thursday, October 5, 2006

This document was published in the 20061005 release of the BerkeleyDB Backend Storage Engine for DURUS.

DURUS is an excellent persistence system for the Python programming language.


This document is not about the BerkeleyDB storage engine for Durus itself; it tries to clarify Durus operation and its inner workings.

This document describes the operation of the Durus 3.4 release.

If you find an error in the documentation, or would like it to be expanded to include new topics, send an email to jcea@jcea.es.

To ease navigation in this document, each section begins with "###" (three "#" characters). You can search for that sequence to jump around the text. Each section also documents its date of last modification.

### Concurrency using the Durus Storage Server (20060421)

The Durus Storage Server allows a single storage to be shared between several (remote) clients. So you can access the storage remotely, and writes by one client will be visible to the others.

The Durus Storage Server listens for requests from all connected clients, but when a request arrives, the server is busy ONLY with that client. Other requests are queued until it finishes. If that client is very slow, or disk access is slow, the server will sit idle even if other clients are demanding attention.

Hopefully a future Durus release will be able to process multiple read requests in parallel. Each client would wait less, and the disk would be better utilized (it is better to sort multiple seeks to serve several requests than to do a long seek to serve only one).

Remember, nevertheless, that Durus clients have a local cache to avoid hitting the storage server. Sizing that cache, and learning how to use it in an effective way, are important issues in any demanding Durus deployment.
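
A minimal connection sketch follows. The import paths come from the Durus distribution; the host, port and cache_size values are placeholders for your deployment, and the exact constructor keywords should be checked against your Durus 3.4 sources.

  from durus.client_storage import ClientStorage
  from durus.connection import Connection

  # ClientStorage talks to a running Durus Storage Server.
  storage = ClientStorage(host='localhost', port=2972)

  # cache_size is (roughly) the number of objects kept in the local cache.
  connection = Connection(storage, cache_size=100000)

  root = connection.get_root()   # the root persistent mapping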

### ACID using the Durus Storage Server (20060512)

ACID = Atomicity, Consistency, Isolation, Durability

DSS = Durus Storage Server

Since DSS only processes a request from a single client at a time, commits are atomic. No other client will be served until the commit completes.

Durability is guaranteed by the Storage Backend used by Durus. Some backends (for example, my BerkeleyDB Storage backend) can be configured to not guarantee Durability in exchange for (vastly) improved performance. Some applications can take advantage of that. Others require durability.

Transactions under DSS are Isolated. If you don't do any dirty tricks, DSS guarantees "degree 3 isolation". That is, you only see committed data, and reads are repeatable.

You shouldn't do it, but if you manually request a cache shrink, DSS would only guarantee "degree 2 isolation". That is, you could get different data in two reads of the same object.

This could be a bug: http://mail.mems-exchange.org/durusmail/durus-users/514/

In response to this, David Binger <dbinger@mems-exchange.org> said in a private email:

I don't think this is right. If you read something that has been committed by another client since your last commit or abort, you should get a ReadConflictError.

Of course, he is right. A read of an object modified by another connection raises a conflict exception. But this exception is raised by the DSS, not by the client code, so I missed it in my code study. Thanks to David Binger for pointing it out.

So, transactions in Durus are actually guaranteed "degree 3 isolation".

BEWARE: http://mail.mems-exchange.org/durusmail/durus-users/610/

Consistency is also provided by the Storage Backend used by Durus. It means that no transaction can leave the Storage in a physically inconsistent state. If the application logic has integrity constraints, they must be enforced by the application.

### Durus Storage Server conflicts (20060620)

Durus clients implement a local cache to improve performance, avoiding DSS accesses. Objects fetched or written are kept in the cache. The cache size is configurable, and evictions are transparent.

The eviction routine can be called directly or, better, it runs automatically when you do a transaction commit or abort.

BEWARE: http://mail.mems-exchange.org/durusmail/durus-users/610/

Cache consistency is checked when you do a commit or abort. Each connection has its own cache, even in the same process.

If you do an abort, locally modified objects are purged. If the cache has objects that another client modified, they are also purged. So, after an abort, your cache only keeps objects that were not modified, either locally or remotely.

If you do a commit, it will fail if, during your current transaction, you touched any object that was also modified remotely by another client. This is a big improvement in Durus 3.4. If your commit conflicts, the eviction procedure is the same as in the abort case.

BEWARE: http://mail.mems-exchange.org/durusmail/durus-users/623/

If your commit succeeds, your cache will purge remotely modified objects not used in the current transaction.

If your code touches an object not in the cache, and that object was modified remotely since your current transaction started, you will get a conflict at that moment, before the commit/abort.

Some discussion about this issue:

http://mail.mems-exchange.org/durusmail/durus-users/508/
http://mail.mems-exchange.org/durusmail/durus-users/514/

Another important issue is that the DSS keeps a changeset per connected client, with the OIDs of the objects changed (and committed) by the other Durus clients. That changeset is sent to its client (and then cleared) when the client does a commit or an abort, in order to synchronize its cache. This system has two consequences:

  1. An idle Durus client will have a growing changeset stored in the DSS, waiting for a commit/abort. If the storage write rate is high, it could be advisable for idle clients to do a periodic "abort" to sync their caches and keep the changeset size low enough.

    If the "idle" client has a very low "duty" cycle, could be better to simply break the DSS connection.

    The changeset size depends on the number of objects changed and on the change rate. But if you have a lot of writes to a small object set, the changeset size will be small. It tracks which objects were changed, not how many changes were done to each object.

  2. If a client is going to start a new transaction, but its last activity was a while ago, it is advisable to do an "abort" just before beginning the transaction, to synchronize the "cold" cache and reduce the risk of Durus discovering stale data at commit time.

    http://mail.mems-exchange.org/durusmail/durus-users/379/

    Also, keep your transactions as short as you can, to reduce conflict risk.

The usual approach to conflicts is to abort the transaction, repeat the computation with up-to-date data, and try again, as many times as needed.
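
A minimal retry-loop sketch of that approach. The connection object and the do_work() callable are placeholders; the ConflictError import assumes the durus.error module shipped with Durus 3.4.

  from durus.error import ConflictError

  def run_transaction(connection, do_work, max_retries=5):
      for attempt in range(max_retries):
          try:
              do_work(connection.get_root())   # recompute with fresh data
              connection.commit()
              return
          except ConflictError:
              connection.abort()               # purge stale objects, sync the cache
      raise RuntimeError('too many conflicts, giving up')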

### Persistent object references and transactions (20060520)

Keeping persistent object references around between transactions is asking for trouble. You SHOULDN'T do that.

Your references can get stale without notice, especially with Storage backends like my BerkeleyDB Storage backend, which deletes garbage objects promptly. Example:

http://mail.mems-exchange.org/durusmail/durus-users/397/

Other problems with keeping references around are:

  • Unbounded object cache growth, since a reference keeps the object in memory.

You should always access objects from the "root" object, without keeping intermediate references, unless you know what you are doing and understand the inner lifecycle of those objects. You can safely keep references only while inside a transaction; discard them at commit/abort time.

You can, nevertheless, keep weak references to loaded objects across transaction boundaries. More info later.
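
A small sketch of the recommended access pattern: walk from the root inside each transaction and let the local references go out of scope at commit time. The "invoices" mapping is a hypothetical application structure.

  def add_invoice(connection, invoice_id, data):
      root = connection.get_root()          # start from the root every time
      root['invoices'][invoice_id] = data   # 'invoices': a hypothetical BTree
      connection.commit()                   # local references are dropped here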

About this section, David K. Hess <dhess@verscend.com> tells me in a private email:

If you are using FileStorage rather than ClientStorage, then (at least it seems in my testing and usage) persistent references to objects are not only safe but handy. When combined with a regular call to commit(), it becomes a very clean, easy and efficient way to persist a single application's internal data structures to disk. This results in having a snapshot that is very easy to reload application state from in case of a crash/shutdown.

Yes, if a program is the only "mutator" of its data structures, then you don't need to travel all the way from the root, as long as you are sure that a given object has not vanished under your feet.

### Durus clients, threading and multiple connections (20060424)

Durus is not thread-safe, but you can use it in a threaded Python program if you take care:

  • At any given moment, only one thread should have access to objects fetched through a given Durus connection. That thread is the only one that should do a commit/abort.

    This is critical, since DSS accesses from multiple threads could be interleaved, and a crash would be the best possible outcome (you could also crash the DSS and corrupt your storage database).

  • Different threads can access the DSS if each one uses a different DSS connection. Objects from different connections MUST NOT be shared. The rules of the previous point apply (see the sketch after this list).

    You can't coordinate transactions between different DSS connections.

    Since sharing objects is forbidden, you can only exchange state between connections, even in the same program, by going through the DSS, using normal transactions.

  • The "do not mix objects fetched from different DSS connections" rule applies also for single thread processes, if they use multiple DSS connections. You can't coordinate transactions between connections, but this pattern can be useful to keep different data over different connections for, for example, better durus cache behaviour. Beware, nevertheless.

  • A newly created persistent object can only be linked to objects from one connection. When the object is created, it is free to be linked from anywhere. But when it is linked to other persistent objects and the program does a "commit", the new object becomes bound to that connection. More info later.

  • If you commit object changes based on data from objects owned by other connections, you risk committing data based on stale info, since the conflict logic can't detect that those dependencies became outdated.

    Don't do that unless you know what you are doing.

  • The same object fetched from different DSS connections will be different objects in RAM. If modified, the first "commit" wins; the others will get a conflict.
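
A per-thread connection sketch under those rules: each thread opens its own ClientStorage + Connection and never shares the fetched objects. Host and port are placeholders.

  import threading

  from durus.client_storage import ClientStorage
  from durus.connection import Connection

  def worker():
      # One private connection per thread; its objects never leave this thread.
      connection = Connection(ClientStorage(host='localhost', port=2972))
      root = connection.get_root()
      # ... read/modify objects reachable from root ...
      connection.commit()   # only this thread commits/aborts on this connection

  threads = [threading.Thread(target=worker) for _ in range(4)]
  for thread in threads:
      thread.start()
  for thread in threads:
      thread.join()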

### Data compression in the Storage (20060424)

By default Durus stores object data compressed on disk. The algorithm used is zlib (http://www.zlib.net/).

In some situations compression can be inconvenient. For example, when the data is big and already compressed (say, graphics). Or, perhaps, a better algorithm could be used for that data.

You can disable compression simply by setting "WRITE_COMPRESSED_STATE_PICKLES" to False in "durus.serialize". This way Durus will save new and modified objects uncompressed. Durus will still load both compressed and uncompressed objects correctly, so you don't need to update your whole database.

If you need to customize your compression, you can follow the advice of David Binger: (http://mail.mems-exchange.org/durusmail/durus-users/492/)

Here's what I would do. Set WRITE_COMPRESSED_STATE_PICKLES to False. Add __getstate__() and __setstate__() methods to your persistent classes that provide the customized compression behavior. If you want compressed pickles for a certain class, make the __getstate__() return a compressed pickle of self.__dict__ instead of the dict itself. The __setstate__() must have the corresponding inverse behavior.

A curiosity note: all zlib streams start with the char "x", so if your "__getstate__()" returns a string starting with "x", when loading Durus will try to unzip it. It will fail, of course, and then your "__setstate__()" will be called. So, if you are worried about efficiency, make sure your "__getstate__()" strings never start with an "x" char :-).
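
A sketch of the quoted advice, assuming Python 2 (the Durus 3.4 era) and that WRITE_COMPRESSED_STATE_PICKLES lives in durus.serialize as described above. The class name is hypothetical.

  import zlib
  import cPickle as pickle

  import durus.serialize
  from durus.persistent import Persistent

  # Global switch described above: write new/modified objects uncompressed.
  durus.serialize.WRITE_COMPRESSED_STATE_PICKLES = False

  class BigBlob(Persistent):
      # Hypothetical class that handles its own state compression.
      def __getstate__(self):
          return zlib.compress(pickle.dumps(self.__dict__, 2), 9)

      def __setstate__(self, state):
          self.__dict__.update(pickle.loads(zlib.decompress(state)))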

### Weak references (20061003)

Your program can keep internal (non-persistent) weak references to loaded persistent instances. Those references will be cleared automatically when necessary: cache shrink, conflicts, etc. You can keep such references across transaction boundaries.

Persistent objects can't hold weak references to other persistent objects; this restriction eases garbage collection in the Storage. All inter-object references in the Storage are strong.

Nevertheless, you can simulate sort-of weak references by hand using the internal OID of the referenced persistent objects, since you can use the connection's "get()" method to load a persistent object given its OID. Such a manually managed reference doesn't preclude garbage collection of the referenced object if its reference count goes to zero, just like standard weak references (a sketch follows the list below).

Some details:

  • This usage pattern is not recommended by Durus developers. See http://mail.mems-exchange.org/durusmail/durus-users/712/

  • Current Durus code does TWO database fetches when doing the connection's "get()". This is a Durus bug and should be solved in a future release. See http://mail.mems-exchange.org/durusmail/durus-users/701/

  • An object fetched via the connection's "get()" is not strongly referenced, so it can vanish under your feet if another Durus client/thread makes its reference count go to zero.

  • Newly created persistent objects have no OID until the transaction commits, so you can't store their OIDs until they have been committed by a previous transaction.
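
A sketch of such a manually managed, OID-based reference. The class is hypothetical; it relies on the "_p_oid" attribute of persistent instances and on the connection "get()" method mentioned above.

  from durus.persistent import Persistent

  class Catalog(Persistent):
      # Keeps a pseudo-weak reference by storing only the OID.
      def remember(self, obj):
          self.target_oid = obj._p_oid     # 'obj' must already be committed

      def recall(self, connection):
          # Load by OID. If the target was garbage collected, this lookup
          # can fail; the exact behaviour depends on the storage backend.
          return connection.get(self.target_oid)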

### Implicit object loading/dump (20060424)

Transparent object loading/dumping is the key to a successful persistence system. The details are simple to understand when you "get it". You can read more about this in:

http://mail.mems-exchange.org/durusmail/durus-users/533/

Some random details:

  • A newly created object doesn't get its OID until you do a "commit".

  • Objects are loaded into RAM when you access (read or write) any of their attributes.

  • When an object is loaded, its state will be in RAM and available. Any persistent reference in that object will create a "ghost" object. That is, an "empty" object of the right class, ready to load its state if you touch it (see the sketch after this list).

    So if you load a persistent object with references to 1000 other persistent objects, only the state of the parent object will be loaded, but 1000 ghost objects will be created.

  • If Durus is going to create a ghost object but that object is already in the cache, it will reuse the cached object. So the same object reached through different graph paths will also be the same object in RAM.

  • You can overload "__init__" and "__del__" in your persistent classes, but you must remember that "__del__" will be called everytime the object is "unloaded" from RAM, and won't be called when the object is actually deleted from the Storage. In general you shouldn't use "__del__".

    Remember also that "__init__" will be called only when the object is created first time, not each time it is loaded in RAM.

### "gen_oid_record()" (20060424)

Durus storage backends usually define a "gen_oid_record()" method. That method iterates over all the objects in the Storage, in no particular order. Current backend implementations have the following caveats: (http://mail.mems-exchange.org/durusmail/durus-users/500/)

  • Don't do any writing to the storage while you are iterating, since you could skip or repeat records. You can write to other storages, nevertheless.

  • You can get deleted objects that have not been collected yet.

This method is usually used to convert a storage to another format, or to update the classes of already stored objects. You can also use it for backup purposes.

The usual approach is to iterate over the source storage, loading objects, and storing them as-is in the destination storage. When the migration is done, you do a "giant" commit. This approach is doable when your database is small enough to fit in RAM+swap, but if your machine is 32-bit you are ultimately limited by the addressable space you have, typically on the order of 2^30 bytes.

You can't do multiple smaller "commits" because some storages (for example, my BerkeleyDB storage backend implementation) would do a background garbage collection and delete copied but not yet referenced objects.
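
A record-level migration sketch along those lines. It assumes a FileStorage-like backend exposing the begin()/store()/end() protocol besides gen_oid_record(); the file names and the readonly flag are placeholders.

  from durus.file_storage import FileStorage

  src = FileStorage('old.durus', readonly=True)
  dst = FileStorage('new.durus')

  dst.begin()
  for oid, record in src.gen_oid_record():
      dst.store(oid, record)   # copy every record as-is
  dst.end()                    # one single "giant" commit at the end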

Releases of my BerkeleyDB storage backend since 20060509 include a "migrate()" method to migrate HUGE datasets with no RAM requirements, in a clean and efficient way.

Remember also that "gen_oid_record()" in the ClientStorage (the standard DSS implementation) is very inefficient. Time to transfer all the objects will be O(MAX_OID) and not to O(N). That is, time will be proportional to the number of OIDs ever generated, not to the number of really existant objects.

### ComputedAttribute (20060424)

ComputedAttribute is a special persistent class without state, used to keep (in RAM) cached values of "costly" functions. Those cached values are discarded if the instance is purged from memory (for instance, on a cache shrink) or if any other DSS client sends an "invalidation".

Access to the cached value is done via a "get()" method. If the cached value is still current, we get it. If the cached value wasn't computed before, or was invalidated, a new value is computed and cached.

The function used to compute the cached value, if necessary, is passed as a parameter to the "get()" method. That function MUST NOT take any parameters. This seems like a big limitation, but you can use a lambda or a closure to pass "hidden" parameters.
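
A minimal sketch of that usage. The import path and the invalidate() call are assumptions based on the Durus 3.4 sources; the Report class and expensive_summary() function are hypothetical.

  from durus.persistent import ComputedAttribute, Persistent

  class Report(Persistent):
      def __init__(self):
          self.summary = ComputedAttribute()   # stateless, but it still gets an OID

      def get_summary(self, data):
          # get() receives a zero-argument callable; the lambda closes over 'data'.
          return self.summary.get(lambda: expensive_summary(data))

      def data_changed(self):
          self.summary.invalidate()            # force recomputation on the next get()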

Some comments:

  • Even if a ComputedAttribute has no data, it has an OID.

  • Each time you do an invalidation, an (unnecessary) write will be done to the disk. The write is small, but it is synchronous. So, the DSS will be busy for some time.

  • The function used to recalculate the cached value is not stored in the storage, so the application must be careful to keep things consistent.

### Non persistent attributes (20060424)

Durus has no support for non-persistent attributes. That is, all attributes are always stored on disk.

See: http://mail.mems-exchange.org/durusmail/durus-users/411/

I guess you can implement them in your own persistent classes by overriding "__getstate__" (a sketch follows below).

Keep in mind this comment from David Binger:

In my opinion, convenience attributes on Persistent instances like this invite trouble.
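
A hedged sketch of that "__getstate__" idea. The "_v_" naming convention is an invention of this example, not a Durus feature.

  from durus.persistent import Persistent

  class CachedThing(Persistent):
      def __getstate__(self):
          # Drop attributes whose names start with '_v_' before pickling,
          # so they behave as non-persistent, RAM-only attributes.
          return dict((key, value) for key, value in self.__dict__.items()
                      if not key.startswith('_v_'))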

### Newly created persistent objects and conflicts/aborts (20060620)

When you create a new persistent instance, it is not associated with any particular connection, so transaction-related actions (commits, aborts, conflicts) don't affect the new object.

When you link your new persistent object to an already persistent object you have, you are tying the new object to the connection of that old object. Now there are three cases:

  • Commit: Your new object is committed. Now it is like any other persistent object. The new object is bound to that connection.

  • Abort: Your new object is not touched. It is free to be reassigned to another connection, if you wish. Remember, nevertheless, to first break the link from the old object.

  • Conflict: Like abort.

If the object is not touched, you can reuse it as-is in a new transaction try. You don't need to "recreate" it, unless you had a conflict and the data in the new object was based on stale objects. Of course, in that case you must recalculate the data.

If the object is not bound to a connection, you can transfer it to another one, or to another thread. That is, "free" new objects can be shared between threads. But only while the new object is not bound to a particular connection via a link from another persistent object.

As a side note, if you have a conflict while committing a transaction with new objects, you will "lose" OIDs. Not an issue, since you have 2^64 available...

Also, latency when committing lots of new objects can be an issue, since each new object needs a round trip to the server to get its OID. See the discussion in http://mail.mems-exchange.org/durusmail/durus-users/563/.
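
A sketch of reusing a newly created object across commit retries, as described above. The Entry class and the "entries" PersistentList are hypothetical; ConflictError is assumed to live in durus.error.

  from durus.error import ConflictError
  from durus.persistent import Persistent

  class Entry(Persistent):
      def __init__(self, payload):
          self.payload = payload

  def append_entry(connection, payload):
      entry = Entry(payload)                  # "free" object, not bound to any connection yet
      while True:
          try:
              root = connection.get_root()
              root['entries'].append(entry)   # linking binds 'entry' at commit time
              connection.commit()
              return entry
          except ConflictError:
              connection.abort()              # 'entry' survives and can be reused as-is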

### BTree methods (20061003)

Durus provides several persistent classes to use in your programs. The most interesting and "different" is BTree.

BTree provides a (persistent) dictionary-like class. The main advantage is that a BTree is not fully loaded into RAM. Only the elements needed are fetched. So you can have an arbitrarily huge BTree without eating your RAM.

As said, a BTree is used like a normal Python dictionary, but there are some additional useful features (a usage sketch follows the list):

  • BTree keys are always kept sorted. So when you iterate over the elements, you get them in order.

  • Some useful methods (some shared with Python dictionaries, some BTree-specific):

    • "iteritems": Iterator over the (key,value) data in the BTree. Keys are given in order.

    • "items_from": Iterator over the (key,value) data in the BTree, starting in the specified key. Keys are given in order.

    • "items_backward": Iterator over the (key,value) data in the BTree. Keys will be given in reverse order.

    • "items_backward_from": Iterator over the (key,value) data in the BTree, starting in the specified key. Keys are given in reverse order.

    • "items_range": Iterator over the (key,value) data in the BTree, with keys in the specified range. Keys are given in order.

    • "iterkeys": Iterator over (key) data in the BTree. Keys are given in order.

    • "itervalues": Iterator over (value) date in the BTree. The values are given in key order.

    • "items": Give a list of (key,value) elements, in key order. Beware RAM usage, if your BTree is big.

    • "keys": Give a list of key elements, in key order. Beware RAM usage if your BTree is big.

    • "values": Give a list of value elements, in key order. Beware RAM usage if your BTree is big.

    • "__reversed__": Iterator over (key) data in the BTree, in inverse order.

    • "get_min_item": Gives the (key,value) pair with the minimal key in the BTree. Since BTree's are stored sorted, this method is very fast.

    • "get_max_item": Gives the (key,value) pair with the maximum key in the BTree. Since BTree's are stores sorted, this method is very fast.

    • "add": Add a new key->value to the BTree. Default "value" is "True".

    • "has_key", "setdefault", "clear", "get", etc: Like python dictionaries.


