Concurrency and State Tracking

Developer
Jul 11, 2007 at 1:54 AM
Chris,

I look forward to seeing what comes out of this project, but I have some doubts about its usefulness unless work is done on concurrency and state tracking.

One of the big problems is that there is no concurrency help. If one does an update right now, the concurrency method is OverwriteChanges, which is useful only in projects where concurrency is not an issue. One could use a timestamp, but that is a database-dependent solution and does not fit your objective of database independence. It also doesn't work well with legacy databases that have no timestamp fields. Given that there is no tracking of changes to fields, one can't use other concurrency methods that compare only changed fields or all fields. My thought is that one will need to implement a base class or interface to help with some of this tracking in order to offer concurrency.

An issue that goes along with this is intelligent updates. When I change only one field, only that field should be updated, not all fields. This is particularly important since you don't have concurrency. To pull this off, you would need to track field changes as mentioned above. The good news is that these two enhancements go together, killing two birds with one stone.
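A rough sketch of the kind of base class I have in mind (all names here are hypothetical, nothing from the guidance package): snapshot the original values at load time so a repository can later ask which fields changed.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: a base class that snapshots original values so a
// repository can later ask which fields changed, enabling both
// changed-columns-only updates and optimistic concurrency checks.
public abstract class TrackedEntity
{
    private readonly Dictionary<string, object> originals =
        new Dictionary<string, object>();

    // Call once per field, right after loading the entity from the database.
    public void AcceptOriginal(string field, object value)
    {
        originals[field] = value;
    }

    public bool IsDirty(string field, object currentValue)
    {
        object original;
        if (!originals.TryGetValue(field, out original))
            return true; // never snapshotted: treat as changed
        return !Equals(original, currentValue);
    }

    public object OriginalValue(string field)
    {
        return originals[field];
    }
}

public class Post : TrackedEntity
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
}
```

An Update method can then emit a SET clause for the dirty fields only, and a WHERE clause that compares the original values for concurrency, covering both enhancements at once.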

For more thoughts on this, I recently added this to my own ActiveRecord Framework which is why it is so fresh on my mind:

Hayden.ActiveRecord Updated: Concurrency and 1:1, 1:m, and m:m Relationships - O/R Mapper

Regards,

Dave

______________________________

David Hayden
Microsoft MVP C#
Jul 11, 2007 at 4:41 AM
Dave,

I wonder, how do you see that solution handling an ASP.NET scenario (where the instance is reinitialized with original values, perhaps even automatically), or one that is required to use stored procedures instead of direct queries?

I guess those would be the primary issues in getting that onto the factory (solving them might enable the team, or the community, to get it there sooner rather than later). On the other hand, I guess supporting a scenario with a high number of columns per table doesn't play well with the guidance sense of the factory. Another scenario with a similar need would be updating subsets of entities from a list, but that is probably meant to be handled at a higher layer anyway.

Freddy Rios
Developer
Jul 11, 2007 at 2:38 PM
Hey Freddy,

I don't see the needs being any different in an ASP.NET solution with a stateless environment. The entity is read from the database, changes are made to it, and you need to update the record in the database. In a multi-user and scalable environment, you want to make sure you are not 1) overwriting someone else's changes and 2) being chatty by sending unchanged data to the database server to be updated unnecessarily. Ideally anyway :)

For SQL Server, a Timestamp (Rowversion) is a natural choice when using stored procedures or if you cannot or will not do any runtime dynamic query generation. I believe the Data Access Guidance Package does look for Timestamps and uses them accordingly to create update commands and stored procedures. If not, it should.

In enterprise environments, however, I often come across legacy databases that are not SQL Server, have no timestamp column, or cannot have one added, but of course the need is still there to ensure I am not overwriting changes. Pessimistic locking is not the answer, although it would sure make things easier. I end up having to generate the query at runtime based on the state of the entity when possible, which of course means I am not using stored procedures. A good O/R mapper will do this for you easily.
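Roughly the kind of runtime query generation I mean (hypothetical names; this sketch assumes something else already reports which columns changed and what their values were when loaded):

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: build an UPDATE that writes only the changed columns
// and guards against lost updates by comparing the original values in the
// WHERE clause, so no timestamp column is required.
public static class UpdateBuilder
{
    public static string Build(
        string table,
        string keyColumn,
        IDictionary<string, object> changed,   // column -> new value
        IDictionary<string, object> originals) // column -> value as loaded
    {
        var sql = new StringBuilder("UPDATE " + table + " SET ");

        var sets = new List<string>();
        foreach (var column in changed.Keys)
            sets.Add(column + " = @" + column);
        sql.Append(string.Join(", ", sets.ToArray()));

        sql.Append(" WHERE " + keyColumn + " = @" + keyColumn);
        foreach (var column in originals.Keys)
            sql.Append(" AND " + column + " = @original_" + column);

        // If the command affects 0 rows, someone else changed the record:
        // raise a concurrency violation instead of silently overwriting.
        return sql.ToString();
    }
}
```

The same original-values dictionary serves both purposes: it decides which columns go in the SET clause and supplies the comparison values for the WHERE clause.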

I guess my point is that the Data Access Guidance Package has very limited use and needs a bit of strengthening in the areas of concurrency and intelligent updates to be of greater value in more environments. That being said, you can't really pull these off without introducing some type of change tracking and dynamic query generation, which is a bit more complex than generating simple command factory classes, repository classes, and stored procedures.

Regards,

Dave

_____________________________

David Hayden
Microsoft MVP C#
Developer
Jul 11, 2007 at 3:32 PM
To toss a realistic but simple example out there, let's say we have a blog engine where a Posts table has a many-to-many relationship with a Categories table, and we mapped it in the real world like:

public class Post {

    public CategoryCollection Categories { get; set; }

}

and you have a page that of course allows people to update their post and its assigned categories. In your PostRepository class, what do you do during update?

public class PostRepository : Repository<Post> {

    public void Update(Post post) {

        // ???
    }
}

Since we have no change/state tracking information available in the classes, and I'll be danged if I am going to re-read the original Post from the database and do a comparison in code, I end up doing one of the following:

1. Delete the original post, post categories, etc. and re-add everything as if it was a new Post. Interesting, but unrealistic as we will lose comments, etc.
2. Delete all records in intermediate table PostCategories, re-insert all new PostCategory records, update Post record. Better, but a lot of chatter...
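A third option, which is what change tracking would buy us here, is to diff the category sets and touch only the intermediate rows that actually changed. A rough sketch (hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: given the category ids loaded with the post and the
// ids assigned now, compute the minimal set of PostCategories rows to
// insert and delete, instead of deleting and re-inserting everything.
public static class CategoryDiff
{
    public static void Compute(
        IEnumerable<int> originalIds,
        IEnumerable<int> currentIds,
        out List<int> toInsert,
        out List<int> toDelete)
    {
        var original = new HashSet<int>(originalIds);
        var current = new HashSet<int>(currentIds);
        toInsert = current.Except(original).ToList(); // newly assigned
        toDelete = original.Except(current).ToList(); // removed by the user
    }
}
```

Only the rows in toInsert and toDelete hit the database; unchanged links generate no chatter at all. But again, this only works if something remembered the original ids.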

Now, those are the challenges for a one-person blog engine. What happens if this is an outside sales application with 5,000 sales agents updating customer information? The problem becomes unbearable if we can't make wise choices about what has changed and how to update the information appropriately.

Now there are things you can do to make this situation a little more bearable at the UI level, etc., but that doesn't really solve the problem - only masks it.

This is where we run into problems, and more often than not, applications have these cases.

Regards,

Dave

________________________________

David Hayden
Microsoft MVP C#
Developer
Jul 11, 2007 at 7:04 PM
I see the problem that arises from your example, David.

I have only one question: is your design right? Are you doing the right thing? Shouldn't the logic reside in the PostHandlingClass? Why do business logic in the repository?

What are your Use Cases in your example?

regards
Benny

PS:
I don't know the answer; I'm just asking some other questions.
Developer
Jul 11, 2007 at 8:20 PM
Interesting that you would consider this a business logic problem. I personally see it as a persistence issue, since the use case here is a Post that needs to be updated in a datastore. At this point any business rules and logic have been applied, and now the post just needs to be updated in the data store, which is the job of the repository class.

Still, the problem is not where you put the unit of work that has to update the Post entity and its collection of categories. The problem is how you handle updates and collisions wisely. That is the guidance missing from this guidance package, but it is also what differentiates a real-world problem from a sample.

Regards,

Dave

_______________________________

David Hayden
Microsoft MVP C#
Jul 11, 2007 at 9:11 PM
David,

Those are excellent suggestions. However, this is a much bigger problem than I have budget for right now.

One of the goals here is, once our dev is done, to turn it over to the community.

So, I ask you and everyone else here - what would you like to see? And are you willing/able to build and contribute it?

-Chris
Developer
Jul 11, 2007 at 9:12 PM

DavidHayden wrote:
Interesting that you would consider this a business logic problem. I personally see it as a persistence issue, since the use case here is a Post that needs to be updated in a datastore. At this point any business rules and logic have been applied, and now the post just needs to be updated in the data store, which is the job of the repository class.


It's here that we fundamentally disagree with each other. Why should your solution hold that information in the Post entity, whose purpose is only to hold some presentation data? It looks like your mind is focused on the table layout of the database, which gives your solution a poor object design. Your scenario, and others, could easily be solved with the Command pattern: each change to your post becomes a command, say AddCategoryCommand and DeleteCategoryCommand. I recommend Robert C. Martin and his book Agile Principles, Patterns, and Practices in C#, with a focus on the Payroll case study.
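A rough sketch of what I mean (hypothetical names, just an illustration, not code from the book): each edit is captured as a command object, so by save time you know exactly what changed.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: each edit to a post is captured as a command object,
// so persistence knows exactly what changed without any field tracking.
public interface IPostCommand
{
    void Execute(); // in a real system this would talk to the repository
}

public class AddCategoryCommand : IPostCommand
{
    private readonly int postId;
    private readonly int categoryId;

    public AddCategoryCommand(int postId, int categoryId)
    {
        this.postId = postId;
        this.categoryId = categoryId;
    }

    public void Execute()
    {
        // e.g. INSERT INTO PostCategories (PostId, CategoryId) VALUES (...)
    }
}

public class DeleteCategoryCommand : IPostCommand
{
    private readonly int postId;
    private readonly int categoryId;

    public DeleteCategoryCommand(int postId, int categoryId)
    {
        this.postId = postId;
        this.categoryId = categoryId;
    }

    public void Execute()
    {
        // e.g. DELETE FROM PostCategories WHERE PostId = ... AND CategoryId = ...
    }
}

// The application layer records commands as the user edits, then replays
// them at save time.
public class CommandLog
{
    private readonly List<IPostCommand> commands = new List<IPostCommand>();

    public int Count { get { return commands.Count; } }
    public void Record(IPostCommand command) { commands.Add(command); }
    public void ExecuteAll() { commands.ForEach(c => c.Execute()); }
}
```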

Your view of the problem is the reason I'm not sure I agree with the purpose of the Repository project: when used wrongly, it will justify a poor design of the classes used to solve the problem. What is the use of business entities when they only hold the raw data for the structure of the database tables?

I don't say your solution is the wrong one. But ask yourself whether you need a repository system when your solution is designed from the base of your database structure. Why decouple the persistence system from your solution when your design is not decoupled?

regards

Benny
Developer
Jul 11, 2007 at 10:32 PM
You lost me, Benny. I don't understand how you think my domain model is coupled to the database model.

The fact that a post has a collection of categories models the problem domain, not the database.

When the domain model is done doing its interesting domain-related activities, the post gets tossed into the repository. Hence, somewhere, something like this happens:

 IRepository<Post> repository = ObjectFactory.Create<IRepository<Post>>();
 repository.Update(post);

The repository then deals with the persistence.

And what you are saying is there is something like an IUpdateCommand<Post> somewhere within the repository that then handles the updating. Okay-

public class PostRepository : Repository<Post>, IRepository<Post> {

   public void Update(Post post) {

       IUpdateCommand<Post> updateCommand = new MyKickAssUpdateCommand( ... ); // could be a factory method
       this.Update(updateCommand, post);
   }
}

I get all that. This isn't a case of me not understanding design patterns to solve a problem. Btw, I read that book :)

The problem isn't above.

The problem is inside MyKickAssUpdateCommand. Using the Data Access Guidance Package (which I admit I strayed a little from here to prove a point), I don't have any good way to know how the post coming in differs from what is stored in the database. I don't know what has changed, so I can't make an intelligent decision about what to update. Because I don't know what has changed, I also can't handle concurrency scenarios other than OverwriteChanges and Timestamp (if I am lucky enough to have one).

There are ways to solve the problem, but you can't do it without a base class and/or dynamic query generation. This is my only point. This is something that would be really interesting to add as an option for those who want a bit more intelligence in their updates. It is a fun problem to solve, too, as I just did it in my O/R mapper.

Hopefully this clears things up. Great conversation, Benny! I love putting a little code in the posts :)

Regards,

Dave

_____________________________

David Hayden
Microsoft MVP C#
Jul 11, 2007 at 10:35 PM
Benny, we had a lot of similar arguments when we were building the data access GP originally. Many of us (myself included) wanted to start from the entities, but our PM (who will remain nameless here, but you all know him) absolutely insisted that starting from the database schema was the way to go. So that's what we built.

Having said that, you don't actually need to generate the entities from the database schema; you can create a repository for any class as long as you've got public properties and the corresponding supporting stored procs in the database.

As for David's contention that we need change tracking, I'd be more inclined to implement the Unit of Work pattern instead. That way you effectively get change tracking without the overhead inside the entity classes themselves.
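Roughly the shape I have in mind (a hypothetical sketch loosely after Fowler's pattern, not actual block code): callers register what happened to each entity, and Commit flushes everything at once, so the entities themselves carry no tracking code.

```csharp
using System.Collections.Generic;

// Hypothetical Unit of Work sketch: callers register what happened to each
// entity; Commit flushes everything in one batch, keeping tracking overhead
// out of the entity classes themselves.
public class UnitOfWork<T>
{
    private readonly List<T> newEntities = new List<T>();
    private readonly List<T> dirtyEntities = new List<T>();
    private readonly List<T> removedEntities = new List<T>();

    public void RegisterNew(T entity) { newEntities.Add(entity); }

    public void RegisterDirty(T entity)
    {
        if (!dirtyEntities.Contains(entity))
            dirtyEntities.Add(entity);
    }

    public void RegisterRemoved(T entity) { removedEntities.Add(entity); }

    // A real implementation would open a transaction here and hand each list
    // to the appropriate repository or command; this sketch just reports how
    // many operations would be issued.
    public int Commit()
    {
        int operations = newEntities.Count + dirtyEntities.Count + removedEntities.Count;
        newEntities.Clear();
        dirtyEntities.Clear();
        removedEntities.Clear();
        return operations;
    }
}
```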

And yes, we do need to do something about bringing back timestamps or versions or something for concurrency. I'll go add that to the work item list.

-Chris
Developer
Jul 11, 2007 at 10:38 PM
Chris,

I understand. I knew it was probably a bit out of scope but I wanted to bring it up anyway just in case.

Overall, I really like the guidance package because it is very productive and a nice way to learn how to create a data access layer. I have used it in a few projects with good success.

Regards,

Dave

______________________________

David Hayden
Microsoft MVP C#
Developer
Jul 11, 2007 at 10:52 PM
Chris,

I actually chose the Unit of Work pattern for my O/R mapper, and it is the one I prefer, too. However, you still need to track changes for certain forms of concurrency and for intelligent updates. The Unit of Work pattern just helps you better manage a transaction.

NHibernate avoids explicit change tracking by caching the entities returned from the database and comparing them with what you are going to update, but that can be a decent amount of work to pull off.
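Roughly how that approach looks (a hypothetical sketch, not NHibernate's actual implementation): keep a snapshot of each entity's values at load time, and compare against it at flush time instead of tracking changes inside the entity.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of snapshot-based dirty checking: on load, property
// values are copied aside; on flush, current values are compared against the
// snapshot to decide which columns need updating.
public class SnapshotCache
{
    private readonly Dictionary<object, Dictionary<string, object>> snapshots =
        new Dictionary<object, Dictionary<string, object>>();

    // Call when the entity is loaded (values keyed by property name).
    public void Snapshot(object entity, IDictionary<string, object> values)
    {
        snapshots[entity] = new Dictionary<string, object>(values);
    }

    // Returns the properties whose current value differs from the snapshot.
    // Assumes the same property keys are supplied at load and flush time.
    public List<string> DirtyColumns(object entity, IDictionary<string, object> current)
    {
        Dictionary<string, object> snapshot;
        if (!snapshots.TryGetValue(entity, out snapshot))
            return current.Keys.ToList(); // unknown entity: everything is dirty

        return current.Where(kv => !Equals(snapshot[kv.Key], kv.Value))
                      .Select(kv => kv.Key)
                      .ToList();
    }
}
```

The entities stay plain; all the bookkeeping lives in the session-level cache, which is exactly why pulling it off takes real infrastructure.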

Regards,

Dave

______________________________

David Hayden
Microsoft MVP C#
Developer
Jul 11, 2007 at 11:09 PM
The Unit of Work pattern pulled it all together for me. Change tracking is difficult to get without a big cost, and maybe my design concerns don't belong here.

I know you read that book, David; if you hadn't, I wouldn't have either. Thanks for that, it was a very good book.

We could argue about this much more, I believe, but my poor English isn't up to it.

And I agree with you, the Guidance Package did the trick for me too, from the first time I laid eyes on it. It's a good initiative.

Regards
Benny

Developer
Jul 11, 2007 at 11:16 PM

The repository then deals with the persistence.

And what you are saying is there is something like an IUpdateCommand<Post> somewhere within the repository that then handles the updating. Okay-

public class PostRepository : Repository<Post>, IRepository<Post> {

   public void Update(Post post) {

       IUpdateCommand<Post> updateCommand = new MyKickAssUpdateCommand( ... ); // could be a factory method
       this.Update(updateCommand, post);
   }
}

I get all that. This isn't a case of me not understanding design patterns to solve a problem. Btw, I read that book :)


And no, that isn't my drift.

The commands themselves call the repository and the respective update functions. But as Martin Fowler says, this has a cost: transaction handling has to deal with lots of small updates. And that's one reason why we want to use the Unit of Work pattern.


Regards
Benny
Developer
Jul 11, 2007 at 11:43 PM
Edited Jul 11, 2007 at 11:55 PM

ctavares wrote:

So, I ask you and everyone else here - what would you like to see? And are you willing/able to build and contribute it?

-Chris


In our company, our database design uses the following naming convention. Table names: DatabaseName_TableName; column names: DatabaseName_TableName_ColumnName.

One table in our database CRM may look like this:
CRM_PERSON (
  CRM_PERSON_ID INT
, CRM_PERSON_NAME NVARCHAR(100)
, CRM_PERSON_ADDRESS NVARCHAR(100)
, CRM_POSTAL_ID INT -- foreign key
)

Maybe we are the only people in the world who use this schema, but if not, I would like a way to get the Repository guidance to remove the unwanted information from my entities.

So the result may look like this:
partial class Person
{
    public Person(int id, string name, string address, int postalId)
    {
        this.idField = id;
        this.nameField = name;
        this.addressField = address;
        this.postalIdField = postalId;
    }
}
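The prefix stripping could be a simple name-mapping step during generation. A rough sketch (hypothetical names, assuming the convention above):

```csharp
using System;

// Hypothetical sketch: strip the "Database_Table_" prefix from a column name
// and Pascal-case the remainder, e.g. CRM_PERSON_NAME -> Name.
public static class NameMapper
{
    public static string ToPropertyName(string tablePrefix, string columnName)
    {
        // tablePrefix is e.g. "CRM_PERSON"; columnName e.g. "CRM_PERSON_NAME"
        string rest = columnName.StartsWith(tablePrefix + "_", StringComparison.OrdinalIgnoreCase)
            ? columnName.Substring(tablePrefix.Length + 1)
            : columnName;

        string[] parts = rest.Split('_');
        for (int i = 0; i < parts.Length; i++)
        {
            if (parts[i].Length == 0) continue;
            string p = parts[i].ToLowerInvariant();
            parts[i] = char.ToUpperInvariant(p[0]) + p.Substring(1);
        }
        return string.Join("", parts);
    }
}
```

The generator would apply this once per column when emitting the entity and its constructor, so the entity exposes Name and Address rather than CRM_PERSON_NAME and CRM_PERSON_ADDRESS.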

Another wish is to extract the foreign-key information so that partial classes and their constructors are auto-generated to handle references to other entities.

If I get a Person entity from my schema extraction, I would also get a file named PersonPostal.cs that encapsulates the Postal object in the constructor (removing the postalId reference from the main file and replacing it with a Postal entity reference):

public Person(int id, string name, string address, int postalId) // main file, Person.cs

public Person(int id, string name, string address, Postal postal) // PersonPostal.cs
{
    this.Postal = postal;
}

private Postal postalField;

public Postal Postal
{
    get { return postalField; }
    set
    {
        postalField = value;
        postalIdField = postalField.Id;
    }
}
 

Hope this made sense.

Regards
Benny
Jul 13, 2007 at 3:53 AM
David,

I strongly agree with Benny that some of the functionality does not belong in the repository system. Perhaps the real issue is drawing a clear line around what the responsibilities of the repository system should be, thus making clear what should be handled elsewhere. This doesn't mean the factory shouldn't include those responsibilities; rather, it means they should be included in the correct places, with the correct guidance. Clearly, the functionality (and thus the responsibilities) you include in your O/R mapper amounts to more than a repository system. With this in mind, you shouldn't be looking at the current implementation of the repository system to support all of the responsibilities of your O/R mapper, but rather at how it should enable that implementation.

Regarding the Unit of Work pattern, take a look at this nice example used in a really different context: http://evanhoff.com/archive/2007/06/28/22.aspx. Using the Unit of Work pattern, the repository system can have an implementation that enables its use in a context where the logic layer uses the Unit of Work pattern to send the repository system the operations that should be executed on the entities. In your example, this means the repository system would receive a list of operations on the posts to be modified (only the ones that were modified), and it would also receive modification operations on the updated categories only. If more granular control over what is to be modified is desired, i.e., per field, it could receive the information on the modification of each entity as one operation carrying only the subset of information to be changed (a dynamic format).

The point is to leave the repository as dumb as possible, i.e., no logic in the repository system; instead it is told what to do. Benny is right that the factory misguides devs into modeling the entities after the tables; especially, and most importantly, it misguides devs into thinking that entities that relate directly to tables are what you are supposed to use in all data access scenarios. Perhaps that is because it doesn't address advanced scenarios like the ones you are pointing out. However, regarding the categories update, I do think the only missing part might be a word of advice, i.e., if sending a list to be updated, only send modified entities in order to save resources (which makes clear that this responsibility doesn't belong to the repository).
Jul 13, 2007 at 5:18 AM

ctavares wrote:
Benny, we had a lot of similar arguments when we were building the data access GP originally. Many of us (myself included) wanted to start from the entities, but our PM (who will remain nameless here, but you all know him) absolutely insisted that starting from the database schema was the way to go. So that's what we built.

Having said that, you don't actually need to generate the entities from the database schema; you can create a repository for any class as long as you've got public properties and the corresponding supporting stored procs in the database.

As far as David's contention that we need change tracking, I'd be more inclined to implement the Unit of Work pattern instead. That way you get effectively change tracking without the overhead inside the entity classes themselves.

And yes, we do need to do something about bringing back timestamps or versions or something for concurrency. I'll go add that to the work item list.

-Chris


Chris,

You didn't mention your PM's arguments for building it with entity generation from the database. It is a hard decision, since not doing so also misses the point that the domain model should usually have many similarities with the database model (since both derive from the logical modeling of the problem domain). Following my previous post, one could argue the real problem is wider than the guidance telling you to generate entities from the database model. I think it has more to do with the guidance failing to tell developers what is supposed to be handled at each layer and which scenarios are currently supported, and also with the limited scope meaning there is no guidance on more advanced features (even if only some of those are supported, as in my previous post). Achieving that is also quite hard, especially since it was created as part of the Service Factory. In other words, I think the limitations are reasonable in the context in which this version of the factory was created.

In the version that comes with the service factories, timestamps are supported. Did this break in the current factory?


ctavares wrote:
So, I ask you and everyone else here - what would you like to see? And are you willing/able to build and contribute it?

-Chris


I would like to see the following next steps:
  • Include guidelines about the responsibilities of each of the projects included in the factory. Also, if some responsibilities outside the factory are already known to be excluded from certain projects of the factory, include information about them in the guidance.
  • Enable data operation classes to be created and used in the generated repository classes. Scope: data operations are configured and created based on existing entities. This could support David's requirement for subset-of-columns updates, by also generating operations that take a more dynamic approach (perhaps similar to the filters currently supported for selects).
  • Add support for executing a set of operations that are handled by different repositories.
  • Create a new application block or library that implements the Unit of Work pattern. The implementation would help in the automatic construction of units of work in different contexts. It would be provider-based, in order to accommodate different needs in determining the units of work to be created, and would include one or two provider implementations best suited to the data access operations in a couple of documented scenarios (the guidance would indicate which scenarios are supported).
  • Use the created block or library to provide guidance on intelligent updates. The guidance would be organized in such a way that the data operations are clearly created at a layer higher than data access.

I see each of these steps as adding a great amount of value when released. Thus I would expect them to be released not all at once, but rather in small versions.

I am looking forward to contributing to it.

Keep up the great work!

PS: I see the Unit of Work block or library as a great combination with the Page Flow block of the Web Client Software Factory.
Developer
Jul 13, 2007 at 4:46 PM
I actually think we all agree, but it is just difficult to explain and describe in the forums.

One of my big problems in discussing the topic is that I think of the term Repository as it is used in DDD, which is not how it is being used in the Data Access Guidance Package. I think of a repository as associated only with aggregates, whereas in the guidance package each entity has a repository. I think of a repository as a domain model class whose purpose is to work with the unit of work and do all I mention above. A repository in the guidance package does nothing but shove data to and from the datastore.

I think this is where the communication is breaking down a bit and perhaps it is my tendency to think of Repository in a different manner.

My main goal, and I think it has been achieved, was simply to initiate thinking about what is missing from the current guidance package. It sounds like we all agree that some important pieces are missing, and I am excited about the prospect of them happening.

Regards,

Dave

___________________________

David Hayden
Microsoft MVP C#
Aug 7, 2007 at 11:50 PM
Edited Aug 8, 2007 at 12:05 AM
Hi,

The discussion seems unnecessarily lengthy. Stored procedures offer a level of isolation, and in the current version of the Service Factory they are used. Both options, 1) old and new values passed as parameters on update/delete, or 2) a timestamp (rowversion) field, should be taken into account so that the implementor has the freedom to choose an approach.

Any optimizations, like updating only a field or two out of all the fields of a record, are a matter of SQL. Just write the stored procs and use them to your heart's content; this is just another layer, one that is not part of the Repository.

Again, domain classes and all that stuff are yet another layer, the business logic layer: it consumes the "building blocks" (possibly assembling much more elaborate domain classes and relations) produced by the Repository and supplies back updated data.

Let's beware of mixing all this stuff together, so that the problem remains manageable. To me it looks as if some see the Repository as a "do it all" piece of software, which is wrong.

Let's keep things simple, as simple as possible, but no simpler... ;) The idea looks good and deserves good implementation.