Isn't it funny how we as developers sometimes take things for granted? We know certain things instinctively, others because we've read about them, and yet others from personal experience. But it's few and far between that we (ok, maybe that's just me <g>) actually test our assumptions, especially when they concern fundamental behaviors and functionality in the .NET Framework.

 

I was working on some framework code today to simplify some DataRow to Entity mapping that I’ve been using in my applications. In my framework I use DataSets to pull data down and then work with Entities that point into the DataSet via the DataRow that acts as a representation for the entity.

 

It actually works well in that it provides the best of both worlds: the DataSet's ability to hold data and provide easy updateability, and the strong typing of an Entity without the massive overhead of typed DataSets.

 

One of the things I did is formalize a method interface I had been using in my business layer to retrieve an Entity, which was an explicit step. Essentially, the business object would load up a DataSet (often with a single record) and I could then call a method to retrieve this ‘record’ as an entity:

 

busCustomer Customer = new busCustomer();

 

if (!Customer.Load(192))  // by Pk

      return;

 

CustomerEntity Cust = Customer.GetEntity();

string Company = Cust.Company;

 

This works well enough, but the syntax is ugly and it’s also a disconnected approach where every time the data changes you have to retrieve a new entity. In reality this is not a problem in many applications, because you often deal with individual records/entities one at a time, and in isolation this code is fine.

 

Anyway, I wanted something a little more formal and attached to the business object. With some changes that require a little bit of specialization at the business object layer you can now do:

 

string Company = Customer.Entity.Company;

 

In this code the Entity automatically syncs to the current DataRow member (if one is available – otherwise it uses a private data member so the Entity works both ways).

 

In order for this to work I added a general-purpose GetEntity method at the business object level, and each specific business object has to implement the Entity property with its respective type:

 

public tt_entriesRow Entity

{

      get

      {

            if (this.m_Entity == null || this.IsEntityDirty)

                  return (tt_entriesRow) this.GetEntity(false,typeof(tt_entriesRow));

 

            return (tt_entriesRow) this.m_Entity;

      }

      set { this.m_Entity = value; }

}

 

This sort of thing will be a thing of the past once we can use generics – with generics this method could be pushed right back into the business layer as you don't have to hard code the type.
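To make the generics idea concrete, here is a rough sketch of how the Entity property might be written once in a generic base class. This is not the framework's actual code: `RowContainer`, `EntryRow`, `BusinessObject<TEntity>` and `BusEntry` are made-up stand-ins, and the caching logic is simplified for illustration.

```csharp
using System;
using System.Data;

// Hypothetical stand-in for the entity base class (not the real wwDataRowContainer).
public abstract class RowContainer
{
    public DataRow DataRow;
    public void SetDataRow(DataRow row) { DataRow = row; }
}

// Hypothetical entity with one field; falls back to a private member without a row.
public class EntryRow : RowContainer
{
    private string _title = "";
    public string Title
    {
        get { return DataRow == null ? _title : (string)DataRow["Title"]; }
        set { if (DataRow != null) DataRow["Title"] = value; _title = value; }
    }
}

// The Entity property lives here once; the type parameter replaces the hard-coded cast.
public abstract class BusinessObject<TEntity> where TEntity : RowContainer, new()
{
    private TEntity _entity;
    public DataRow DataRow;            // current row, or null if nothing is loaded
    public bool IsEntityDirty = true;  // set when the current row changes

    public TEntity Entity
    {
        get
        {
            if (_entity == null || IsEntityDirty)
            {
                _entity = new TEntity();
                _entity.SetDataRow(DataRow);
                IsEntityDirty = false;
            }
            return _entity;
        }
        set { _entity = value; }
    }
}

// A concrete business object only names its entity type.
public class BusEntry : BusinessObject<EntryRow> { }

public static class GenericEntityDemo
{
    public static void Main()
    {
        DataTable table = new DataTable("entries");
        table.Columns.Add("Title", typeof(string));

        BusEntry entry = new BusEntry();
        entry.DataRow = table.Rows.Add("First entry");
        Console.WriteLine(entry.Entity.Title);
    }
}
```

With something along these lines, each business object would shrink to a one-line class declaration and the per-type cast would disappear.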

 

Now, as I was contemplating this code and the messiness of having to define this for each business object I create to get the custom type (tt_entriesRow – which is auto-generated from the database), I started wondering what is the real overhead of using a property for this.  

 

In particular I had the following questions in my mind:

 

  • What’s the overhead of using a light-weight property as opposed to a plain variable?
  • What’s the overhead of casting an object value to a specific object type?
  • How much performance improvement is there in indexing DataRow columns by Column object vs. by field name?

So I set up a few simple tests to check out these scenarios.

 

The first test was to check direct variable access vs. using a property that runs a minimal bit of code to check for object existence. Basically it’s the code I showed above for the Entity object.

 

To test I ran two sets of loops, timing each and then writing out the results to a WinForm (since I’m working on a WinForm project at the moment).

 

 

busEntry Entry = new busEntry();

 

// *** Returns an empty object – not null but empty

tt_entriesRow TEntry = Entry.Entity;

 

// *** Load PK 129

Entry.Load(129);

 

 

string lcMessage = "";

DateTime Start = DateTime.Now;

 

// *** Test with Sub-Entity Property

for (int x=0; x < 10000000; x++)

{

      string Value = Entry.Entity.Title;

}

 

DateTime Stop = DateTime.Now;

 

string Message = Stop.Subtract(Start).TotalMilliseconds.ToString() + "\r\n";

 

 

Start = DateTime.Now;

 

// *** Test with copied Entity object

TEntry = Entry.Entity;

TEntry.UseColumns = false;

TEntry.SetDataRow( Entry.DataRow );

 

for (int x=0; x < 10000000; x++)

{

      string Value = TEntry.Title;

}

 

Stop = DateTime.Now;

 

Message += Stop.Subtract(Start).TotalMilliseconds.ToString() + "\r\n";

 

MessageBox.Show(Message);

 

The results were not too surprising: using the property mechanism was roughly 10% slower. For the 10 million iterations the times were around:

 

3900 milliseconds and 3550 milliseconds

 

The numbers were surprisingly consistent from run to run.

However, we also have to keep this in perspective: each loop runs 10 million times and takes about 4 seconds in total, so overall performance is very fast regardless.

 

Still, if you are in a tight loop reading or writing properties, or walking through properties along a call chain via implicit ‘dot’ syntax, keep in mind that each one of those dots is likely a property and therefore a method call.

 

Taking a complex object path like:

 

string Title = this.Entry.Customer.Entity.Title;

string Name = this.Entry.Customer.Entity.Name;

string Descript = this.Entry.Customer.Entity.Descript;

 

would be more efficiently expressed as:

 

CustEntity = this.Entry.Customer.Entity;

string Title = CustEntity.Title;

string Descript = CustEntity.Descript;

string Name = CustEntity.Name;

 

which avoids repeating the three property calls in the chain on every access.

 

This makes sense when you make more than a few calls to the same object in the chain, and especially if you are running inside of a loop. IOW, get rid of the ‘dot’ syntax whenever best performance is required.
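To see that each dot really does run a getter, here is a small sketch that counts getter invocations for the chained and the cached form. All of the names (`GetterCounter`, `BusCustomer`, `BusEntry`, `CustomerEntity`, `ChainDemo`) are hypothetical stand-ins rather than the framework classes, and the getters do nothing beyond counting.

```csharp
using System;

// Shared counter so both getters in the chain can record their calls.
public static class GetterCounter { public static int Calls; }

public class CustomerEntity
{
    public string Title = "Sample title";
}

public class BusCustomer
{
    private readonly CustomerEntity _entity = new CustomerEntity();
    public CustomerEntity Entity { get { GetterCounter.Calls++; return _entity; } }
}

public class BusEntry
{
    private readonly BusCustomer _customer = new BusCustomer();
    public BusCustomer Customer { get { GetterCounter.Calls++; return _customer; } }
}

public static class ChainDemo
{
    // Walks the whole property chain on every iteration.
    public static int CountChained(BusEntry entry, int iterations)
    {
        GetterCounter.Calls = 0;
        for (int x = 0; x < iterations; x++)
        {
            string title = entry.Customer.Entity.Title;
        }
        return GetterCounter.Calls;
    }

    // Resolves the chain once, then reads from the local reference.
    public static int CountCached(BusEntry entry, int iterations)
    {
        GetterCounter.Calls = 0;
        CustomerEntity cust = entry.Customer.Entity;
        for (int x = 0; x < iterations; x++)
        {
            string title = cust.Title;
        }
        return GetterCounter.Calls;
    }

    public static void Main()
    {
        BusEntry entry = new BusEntry();
        Console.WriteLine("Chained getter calls: " + CountChained(entry, 1000));
        Console.WriteLine("Cached getter calls:  " + CountCached(entry, 1000));
    }
}
```

How much each of those calls costs depends on what the getter does. A trivial getter may well be inlined by the JIT, while one that checks state or creates objects, like the Entity property above, pays the full price every time.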

Casting Overhead

Next I wanted to see what the cost is of casting a generic object to a specific type of object. To do this I set up two loops again:

 

object TEntry1 = Entry.Entity;

 

for (int x=0; x < 10000000; x++)

{

      string Value = ((tt_entriesRow) TEntry1).Title;

}

 

and

 

tt_entriesRow     TEntry = Entry.Entity;

 

for (int x=0; x < 10000000; x++)

{

      string Value = TEntry.Title;

}

 

This was a little surprising to me: there was just about no difference between these two. In fact the two results stayed within about 20 milliseconds of each other, with the lead swapping back and forth between runs.

 

Moral here: Casting an object reference to a specific type is very inexpensive, close to free. Upcasting or downcasting objects is not something to worry about. Just be sure there's no boxing involved, that is, converting value types to reference types or back.
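The difference between a plain reference cast and boxing can be shown with a tiny sketch. The class and method names here are made up for illustration; the point is only that a reference cast hands back the same object, while boxing a value type allocates a new object each time.

```csharp
using System;

public static class CastDemo
{
    // Up- and downcasting a reference type only changes the static type;
    // the object itself is never copied.
    public static bool ReferenceCastKeepsIdentity()
    {
        string original = "Title";
        object asObject = original;        // upcast
        string back = (string)asObject;    // downcast: a runtime type check
        return object.ReferenceEquals(original, back);
    }

    // Boxing a value type allocates a new heap object on every conversion.
    public static bool BoxingCreatesNewObjects()
    {
        int pk = 129;
        object first = pk;     // box
        object second = pk;    // box again
        return !object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine("Reference cast keeps identity: " + ReferenceCastKeepsIdentity());
        Console.WriteLine("Boxing creates new objects:    " + BoxingCreatesNewObjects());
    }
}
```

This is also worth remembering for the DataRow indexer, which returns object: value-type columns such as the Int32 Pk come back boxed and get unboxed by the cast.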

DataRow indexer access: Field Names vs. Column Objects

Finally I wanted to see what the effect is of using Column objects rather than field names for the indexer. The Entity object talks to a DataRow, and the implementation of the object can optionally keep track of the columns. The entity implementation is auto-generated from the database and looks like this:

 

[Serializable()]

public class tt_entriesRow : wwDataRowContainer

{                

      public tt_entriesRow() : base() {}

      public tt_entriesRow(DataRow Row) : base(Row) {}

 

      public Int32 Pk

      {

            get

            {

                  if (this.DataRow == null)

                        return this._Pk;

 

                  if (this.UseColumns)

                        return (Int32) this.DataRow[this.PkColumn];

                  else

                        return (Int32) this.DataRow["Pk"];

            }

            set

            {

                  if (DataRow != null)

                        this.DataRow["Pk"] = value;

                 

                  this._Pk = value;                        

            }

      }

      private Int32 _Pk;

 

      public String Title

      {

            get

            {

                  if (this.DataRow == null)

                        return this._Title;

 

                  if (this.UseColumns)

                        return (String) this.DataRow[this.TitleColumn];

                  else

                        return (String) this.DataRow["Title"];

            }

            set

            {

                  if (DataRow != null)

                        this.DataRow["Title"] = value;

                 

                  this._Title = value;                           

            }

      }

      private String _Title;

 

 

      // *** Column Definitions

      DataColumn PkColumn;

      DataColumn CustomerpkColumn;

      DataColumn ProjectpkColumn;

      DataColumn InvoicepkColumn;

      DataColumn UserpkColumn;

      DataColumn TitleColumn;

 

 

      protected override void CreateColumns()

      {

            PkColumn = this.DataRow.Table.Columns["Pk"];

            CustomerpkColumn = this.DataRow.Table.Columns["Customerpk"];

            ProjectpkColumn = this.DataRow.Table.Columns["Projectpk"];

            InvoicepkColumn = this.DataRow.Table.Columns["Invoicepk"];

            UserpkColumn = this.DataRow.Table.Columns["Userpk"];

            TitleColumn = this.DataRow.Table.Columns["Title"];

            this.ColumnsCreated = true;

      }

 

}

 

If the UseColumns property is set, the Entity retrieves information via column indexers rather than via the field name indexer. Here's the simple test for that:

 

// *** Plain access with Fieldname indexer

for (int x=0; x < 10000000; x++)

{

      string Value = TEntry.Title;

}

 

 

// *** using Column Indexing for retrieval

TEntry = Entry.Entity;

TEntry.UseColumns = true;

 

// load the row to force columns to load

TEntry.SetDataRow( Entry.DataRow );  

 

for (int x=0; x < 10000000; x++)

{

      string Value = TEntry.Title;

}

 

This result was also surprising. Around 3900ms vs. – drum roll - 650ms.

 

I knew that column lookups were faster than named lookups, but here indexing by column was about six times faster than using the field name indexer against the DataRow.

 

Behind the scenes the only logical difference is:

 

if (this.UseColumns)

      return (Int32) this.DataRow[this.PkColumn];

else

      return (Int32) this.DataRow["Pk"];

 

with the first branch about six times faster.

 

Moral of that story: if you are reading data from a DataSet in a loop, for example to assign it to something like a ListView, it pays to set up the Column objects up front and retrieve the values out of the DataRows with the columns as indexers.
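Outside of the entity classes, the same idea can be applied directly in a loop. Here's a minimal sketch, assuming a small in-memory table with made-up data; `ColumnLookupDemo`, `BuildTable` and `ReadTitles` are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ColumnLookupDemo
{
    // Builds a small in-memory table to stand in for data loaded from a database.
    public static DataTable BuildTable()
    {
        DataTable table = new DataTable("entries");
        table.Columns.Add("Pk", typeof(int));
        table.Columns.Add("Title", typeof(string));
        table.Rows.Add(1, "First");
        table.Rows.Add(2, "Second");
        return table;
    }

    // Resolves the DataColumn once, then indexes every row with it.
    public static List<string> ReadTitles(DataTable table)
    {
        DataColumn titleColumn = table.Columns["Title"];   // name lookup happens once
        List<string> titles = new List<string>(table.Rows.Count);

        foreach (DataRow row in table.Rows)
        {
            titles.Add((string)row[titleColumn]);           // no per-row name lookup
        }
        return titles;
    }

    public static void Main()
    {
        foreach (string title in ReadTitles(BuildTable()))
        {
            Console.WriteLine(title);
        }
    }
}
```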

 

Another way to get better performance is to use a DataReader to retrieve the data and bypass the DataSet storage altogether, which can provide another big boost in performance.
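A reader-based loop might look roughly like the sketch below. To keep it self-contained, the demo gets its reader from `DataTable.CreateDataReader()`, which of course still sits on top of a DataTable; in a real application the reader would come from a command's `ExecuteReader()` instead. `ReaderDemo` and `ReadTitles` are hypothetical names. As with Column objects, the ordinal is resolved once before the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ReaderDemo
{
    // Reads the Title field from any forward-only reader.
    public static List<string> ReadTitles(IDataReader reader)
    {
        int titleOrdinal = reader.GetOrdinal("Title");   // resolve the field position once
        List<string> titles = new List<string>();

        while (reader.Read())
        {
            titles.Add(reader.GetString(titleOrdinal));
        }
        return titles;
    }

    public static void Main()
    {
        // Stand-in data source for the demo only.
        DataTable table = new DataTable("entries");
        table.Columns.Add("Title", typeof(string));
        table.Rows.Add("First");
        table.Rows.Add("Second");

        using (IDataReader reader = table.CreateDataReader())
        {
            foreach (string title in ReadTitles(reader))
            {
                Console.WriteLine(title);
            }
        }
    }
}
```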