Menu

NHibernate (8)

Implementing Equals() and GetHashCode() in ORM classes with autoincrement keys

The requirements for implementing Equals() and GetHashCode() in .Net are very hard to satisfy and in some areas nearly impossible. There are some situations where an object's identity inevitably changes, and saving an ORM-mapped object with autogenerated key is a case in point. Making its hash code conform with requirements would require an inordinately large quantity of code.

The plot goes something like this: loading ORM objects on the same session (in NHibernate speak) or context (Entity Framework) guarantees that for one record only one object instance will be created. But, if you use multiple sessions/context, no such guarantee exists. (And you want to do this if objects have different lifespans: for example, you fill a combo box once and then bind it to different records... Obviously, I'm talking about WinForms here, but the principle applies to server-side logic although it's probably not as frequent). In case of multiple instances, .Net components don't know it's the same record unless you override Equals() and get them compared by primary key values. In WinForms, for example, this means that a combo box won't know which record in the dropdown is equal to the one in the bound property, and won't select it.

Ok, so we override Equals(): usually, the record has an autoincrement key called, say, ID. We implement it so that it compares ID's of different objects (and type, obviously)... And now we run into the .Net requirement which says that two objects that are equal must have the same hash code, and that the hash code must never change for an object. We can override GetHashCode() to return the hash of the ID, but if the object gets saved to the database, the ID - and therefore the hash - will change.

Here's an example of how it would work: create a new ORM object instance, it's ID is NULL or zero. Use it as index in a dictionary, the dictionary retrieves the index's hash code and stores data in a special bucket for this hash. Save the record - the ID changes. If the hash code changes now, you won't be able to retrieve your data from the dictionary anymore. But if you load this record on a different session/context, it will have a different hash code unless we somehow notify it to use the already generated one... Which would probably mean using a static instance of a component that tracks all objects. Way too much work to get a hash code right, isn't it...

A couple of details that could lead us closer to a solution:

  • On a new object instance (unsaved - with ID null or zero or whatever), we cannot use the ID in our overrides. Two objects with the empty ID are not equal nor will they ever be: if they both get saved into the database, it will create two separate records, and their IDs will acquire new values that were never used before... An unsaved object is only equal to itself. We could generate same hash codes for unsaved objects, but this wouldn't resolve our problem if a saved object gets its hash from the ID - it would still be different.
  • While we're at it, it's useful knowing how the Dictionary works: it calls GetHashCode() on the given index and stores the entry in a bucket tagged with this hash code. Of course, there may be multiple objects with the same hash, and a bucket may contain multiple entries. Therefore, when retrieving data, the dictionary also calls Equals() on indexes in the bucket to see which of the entries is the right one. This means we have to get both Equals() and GetHashCode() right for the dictionary to work: Equals() should be OK in its simplest form if we always use the same instance for the index - basically, Equals() must be able to recognise the object as equal to itself.
  • Other components, like grids and combo boxes, also use hash codes to efficiently identify instances, so a dictionary isn't the only thing we're supporting with this.

One part of the solution seems mandatory: we need to remember the generated hash on the object. This is mandatory only on an object whose ID may change in the future: if a saved object gets a permanent ID (as they usually do), caching is not necessary. If an unsaved object gets saved, we still use the hash we generated using the empty ID. We do this on-demand: when GetHashCode() is called, the hash is generated and remembered. This is probably the only meaningful way to do this, but it's worth pointing out one detail: if the object isn't used in a dictionary, it's hash won't be generated and won't change when it's saved. Thus, we narrowed down our problem to where this feature is actually used.

But there's still the possibility to have two objects that are equal (same record loaded on different sessions) but have different hash codes (the first one saved into the database and then the same record loaded afterwards). I'm not sure what problems this would create, but one is obvious: we wouldn't be able to use both of them interchangeably in a dictionary (or, possibly, a grid). This is not entirely unnatural, at least to me: if I used an object as an index in a dictionary, I'm regarding the stored entry as related to the object instance and not the record that sits behind it. I'm unaware of other consequences - please comment if you know any.

Note that we can also avoid the dictionary problem by using the IDs themselves instead of objects... But still, it would remain for grids and elsewhere. Also I'm not sure if it could be resolved by not using the data layer (ORM) objects in the dictionaries and grids but having data copied into business layer object instances: if we did this, we'd still need a component that tracks duplicates, only it would track business objects instead of data objects.

Can we narrow this further down? A rather important point is that the ID gets changed only when saving data - and ORMs usually save the whole session as one transaction. If we discarded the saved session and started afresh, we'd get correct hash codes and unchanged IDs. We'd only have a brief period of possible irregularity in the time after the old data is saved and before new data is loaded, and only if we load data on different sessions and use it in a dictionary or some other similar component. In client-side applications, this is a risky period anyway because different components get the new data at different times, and care must be taken not to mix it. At least some kind of freeze should be imposed on the components - suspending layout, disabling databinding etc. Also, reloading data is natural if you have logic that runs in the database (usually triggers) that may perform additional changes to data after we saved ours (and it may do this on related records, not just the ones we saved)... But that is a different, as they say, can of worms: it's just that these worms often link up to better plot our ruin.

Workaround for HQL SELECT TOP 1 in a subquery

HQL doesn’t seem to support clauses like “SELECT TOP N…”, which can cause headaches when for example you need to get the data for the newest record from a table. One way to resolve this would be to do something like “SELECT * FROM X WHERE ID in (SELECT ID FROM X WHERE Date IN (SELECT MAX(Date) FROM X))”, a doubly nested query which looks complicated even in this simple example and gets out of control when query conditions need to be more complex.

What is the alternative? Use EXISTS – as in “a newer record doesn’t exist”. It still looks a bit ugly but at least it’s manageable. The above query would then look like this: “SELECT * FROM X AS X1 WHERE NOT EXISTS(SELECT * FROM X AS X2 WHERE X2.Date > X1.Date)”

Note that this works only for “SELECT TOP 1”. For a greater number there doesn’t seem to be a solution at all.

Repository moved temporarily to '/viewvc/nhibernate/trunk/'; please relocate

When trying to switch my local Subversion copy of the NHibernate source to a different tag (from 3.1GA to trunk, in this case), I got this error:

Repository moved temporarily to '/viewvc/nhibernate/trunk/'; please relocate

The frustrating thing was that I was trying to relocate to exactly this url. And if I tried others, it said that I should relocate to them… I searched the net in vain for the solution, the only information I got is that I should re-configure my apache server (thanks a bunch!)

The problem is, in fact, simple: the URL is wrong. I thought I could just copy the repository’s URL from my web browser, like I do with other sites. Not here: there’s a separate entry for direct SVN access. So instead of using this url:

http://nhibernate.svn.sourceforge.net/viewvc/nhibernate/

use this one:

https://nhibernate.svn.sourceforge.net/svnroot/nhibernate

It does seem like a simple problem but the solution wasn’t so easy to find.

How to make LINQ to NHibernate eager-load joined properties like the Criteria API

In terms of “lazyness” of a property, there are currently three different ways in which it can be mapped in NHibernate:

  • lazy=”false” means that it’s not lazy at all – the property’s content will be loaded along with its owner object. This means additional data is always loaded when you load an object, and may mean additional sql queries, too.
  • lazy=”proxy” means that the object contained in the property is loaded when any of its public members is accessed. This property will contain an instance of a proxy, which is an object that knows how to initialize itself on-access. It performs the initialization not by loading its properties but by loading an instance of a real object and redirecting its properties and methods to it. This is why everything on the class that is to be proxied needs to be virtual: proxy is an instance of a class derived from it, which is dynamically generated and has every public member overridden to support lazy-loading.
  • lazy=”no-proxy” means that the property is lazy-loaded, but without a proxy. From what I’ve seen, here the lazy property iself is manipulated on the owner object so that it facilitates on-access loading. In any case, there’s no proxy and no duplicate instances. This feature is currently (in v3.0.0) buggy and it seems to work the same as the first option (lazy=”false”), just as it did in 2.0 when it was unsupported.

Each of the options has its bad sides: with proxies, you get duplicate objects and must make everything virtual on your data classes. In the non-lazy option, for each eager property a join is usually added in the sql query so that the property’s values are loaded at the same time, and the same goes for the property’s properties etc. Even worse, HQL queries don’t respect this joining method, they load the main table in one SQL query and then execute an additional query or two (or dozen) for each record to collect its related data – this is called the “N+1 selects” problem. Needles to say, using HQL for such queries is madness: in these cases, it is best to switch to Criteria API which does the joins properly.

And the bad sides of the no-proxy option? It doesn’t work… Other than that, it seems the perfect solution: no joins, no duplicate objects. If you ask me, I don’t want my data to be loaded on-demand at all. I don’t want the application to decide when it will load its data: if I fill a datagrid with one hundred objects and then the grid triggers lazy-loading on each of the objects in turn, this will create chaos. No, I want the application to break if it accesses data that was not explicitly loaded. But with the current implementation I have no choice: it’s either eager or proxy, and I’m choosing eager, for better or worse.

How about LINQ queries? In 2.x it was implemented over the Criteria API which means it knew how to join-load additional records. Not so in 3.x: now it behaves like HQL, N+1 selects all over the place.

So, what is there to do? It seems the only option left is to write all queries with explicit fetch statements for every non-lazy property… This would definitely solve the execution performance issue, but development performance would suffer: if I add a new non-lazy property, I’d have to rewrite all queries where it appears.

Ok, but if we’re using LINQ, it’s a dynamical query, right? It can be modified to include all required fetches. After some research, it turns out that there’s a solution that (at least on the outside) looks even elegant: use an extension method to do this. So, you would do something like:

session.Query<Person>().Where(…).EagerFetchAllNonLazyProperties()

or:

(from p in session.Query<Person> where … select …).EagerFetchAllNonLazyProperties()

Here’s one way this method could be implemented. Note that this is a somewhat hacked implementation and that there are probably some unsupported cases – one thing that is suspicious to me is that Criteria API joined the eager properties recursively while only the first level is covered here, so be careful. But it’s a good start... Preliminary tests were very promising ;).

public static IQueryable<TOriginating> EagerFetchAll<TOriginating>
  (this IQueryable<TOriginating> query)
{
  // hack the session reference out of the provider - or is
  // there a better way to do this?
  ISession session = (ISession)typeof(NhQueryProvider)
    .GetField("_session", System.Reflection.BindingFlags.Instance
      | System.Reflection.BindingFlags.NonPublic)
    .GetValue(query.Provider);

  IClassMetadata metaData = session.SessionFactory
    .GetClassMetadata(typeof(TOriginating));

  for(int i = 0; i < metaData.PropertyNames.Length; i++)
  {
    global::NHibernate.Type.IType propType = metaData.PropertyTypes[i];

    // get eagerly mapped associations to other entities
    if (propType.IsAssociationType && propType.IsEntityType
      && !metaData.PropertyLaziness[i])
    {
      ParameterExpression par = Expression.Parameter(typeof(TOriginating), "p");

      Expression propExp = Expression.Property(par, metaData.PropertyNames[i]);

      Expression callExpr = Expression.Call(null,
        typeof(EagerFetchingExtensionMethods).GetMethod("Fetch")
          .MakeGenericMethod(typeof(TOriginating), propType.ReturnedClass),
        // first parameter is the query, second is property access expression
        query.Expression, Expression.Lambda(propExp, par)
      );

      LambdaExpression expr = Expression.Lambda(callExpr, par);

      Type fetchGenericType = typeof(NhFetchRequest<,>)
        .MakeGenericType(typeof(TOriginating), propType.ReturnedClass);
      query = (IQueryable<TOriginating>)Activator.CreateInstance
        (fetchGenericType, query.Provider, callExpr);
    }
  }

  return query;
}

Collection owner not associated with session? Not quite.

I hate when this happens. I upgraded to NHibernate 2.0 and then quickly afterward to 2.1.0 (you guessed it: because of LINQ). I had to change a couple of things to support it in my company’s application framework and it all seemed to work well – until I discovered that deleting any entity that has a one-to-many relation with cascade=”all-delete-orphan” stopped functioning. It died with a cryptic error message of “collection owner not associated with session”… If I changed to cascade=”all” it worked, but this is not the point, it wasn’t broken earlier. Of course, I tried looking all over the web and apart from a page in Spanish (which wouldn’t be helpful even if it was in English) came up blank. Tried moving to NHibernate 2.1.2 - which is not that simple since we’re using a slightly modified version of NHibernate (a reason more to suspect that the solution to this problem would be hard to find). So here’s a short post for anyone stumbling upon a similar problem.

In the end, I traced it to this behaviour: the collection owner is not found in the session because NHibernate tries to find it using ID = 0, while it’s original ID was 48. The logic is somewhat strange here, because the method receives the original collection owner (which is in the session), retrieves its ID (which was for some reason reset to 0) and then tries to find it using this wrong ID. Moreover, there’s a commented-out code that says “// TODO NH Different behavior” that would seem to do things properly (I checked it, it’s still standing in the NHibernate trunk as is). But the real reason why this happened is that blasted zero in the ID: further debugging (thankfully, there’s a full source for NHibernate available), revealed that it was reset because “use_identifier_rollback” was turned on in the configuration. Well… I probably set this to experiment with it and forgot. Turning it off solved the problem for me… Luckily, I didn’t really need this rollback functionality - as it’s not exactly what it seems to be: it doesn’t rollback identifiers when the transaction is rolled back, it rolls them back when entities are deleted! Why the second feature made more sense to implement than the first one is a mystery to me...

NHibernate 2.1 updates schema metadata without being asked to

In NHibernate 2.1, the session factory is set up to access the database immediately when you build it. This is done by a Hbm2ddl component to update something called SchemaMetaData: I’m not sure what this is all about, but I am certain that such behaviour is not nice. The previous version of NHibernate didn’t do it, so I expect the new one to behave likewise unless I explicitly order the change.

The solution for this is to add a line to your hibernate.cfg.xml file that says:

<property name="hbm2ddl.keywords">none</property>

Note that completely omitting this setting will actually enable the feature… Did I already mention I don’t like it? I don’t, so much that I decided not to change config files but to hardcode it disabled. I use one global method to load the NHibernate configuration, so this is easy. The code looks something like this:

_configuration = new global::NHibernate.Cfg.Configuration();
_configuration.Configure();
_configuration.SetProperty("hbm2ddl.keywords", "none");

NHibernate queued adds on lazy-load collections

I don’t know if this behaviour is documented (well, yeah, the (N)Hibernate documentation is pretty thin but it’s improving), I wasn’t fully aware of it and this caused a bug… I’m posting this in hope it may save for someone else the time I have lost today :).

In our software we use a custom NHibernate collection type that is derived from AbstractPersistentCollection and keeps track of “back references”: that is, references to the record that owns the collection. It does this automatically when an object is added to the collection.

Now, on collections mapped as lazy-loading, Add() operations are allowed even if the collection is not initialized (i.e. the collection just acts as a proxy). When adding an object to a collection in this state, a QueueAdd() method is called that stores the added object in a secondary collection. Once a lazy initialization is performed, this secondary collection is merged into the main one (I believe it’s the DelayedAddAll() method that does this). This can be hard to debug because lazy load is transparently triggered if you just touch the collection with the debugger (providing the session is connected at that moment), and everything gets initialized properly.

Our backreference was initialized at the moment the object was really added into the main collection. But this is not enough, we had to support queued adds - that is, the cases when QueueAdd returns true. The other alternative is to disable delayed adds by commenting out the places where QueueAdd is called – I don’t know if this is possible, there seems to be some code that supports it. We decided to support delayed add, and it seems to work. The modification looks something like this (this is the PersistentXyz class):

int IList.Add(object value) 
{ 
    if (!QueueAdd(value)) 
    { 
        Write(); 
        return ((IList) bag).Add(value); 
    } 
    else 
    { 
        // if the add was queued, we must set the back reference explicitly 
        if (BackReferenceController != null) 
        { 
            BackReferenceController.SetBackReference(value); 
        } 
        return -1; 
    } 
}

Supporting triggers that occasionally generate values with NHibernate

NHibernate supports fields that are generated by the database, but in a limited way. You can mark a field as generated on insert, on update, or both. In this case, NHibernate doesn’t write the field’s value to the database, but creates a select statement that retrieves its value after update or insert.

Ok, but what if you have a trigger that updates this field in some cases and sometimes doesn’t? For example, you may have a document number that is generated for some types of documents, and set by the user for other types. You cannot do this with NHibernate in a regular way – but there is a workaround…

It is possible to map multiple properties to the same database column. So, if you make a non-generated property that is writable, and a generated read-only property, this works. You have to be careful, though, because the non-generated property’s value won’t be refreshed after database writes.

A more secure solution would be to make one of the properties non-public and implement the other one to support both functionalities. Like this:

//
// This field is used only to send a value to the database trigger:
// the value set here will be written to the database table and can be consumed by
// the trigger. But it will not be refreshed if the value was changed by the trigger.
// 
private int? setDocumentNumber;	

private int? _documentnumber;

//
// The public property that works as expected, generated but not read-only
// public int? DocumentNumber { get { return _documentnumber; } set { // NHibernate is indifferent to this property's value (it will not // be written to the database), so we have to update the setDocumentNumber // field which is regularly mapped _documentnumber = value; setDocumentNumber = value; } }

Here’s the NHibernate mapping for these two:

<property name="DocumentNumber" generated="always" insert="false" update="false"/>
<property name="setDocumentNumber" column="DocumentNumber" access="field"/>
Subscribe to this RSS feed