Data access code usually follows simple implementation patterns. One very common pattern consists of fetching data from a database, modifying the data and persisting it back to the database in the same transaction. Using this pattern in a concurrent environment, where multiple threads (or processes) can work with the same record at the same time, brings new challenges. How should our application handle the scenario where another thread updated the record after the first thread fetched it from the database but before the first thread was able to persist its changes? This area of database access implementation is generally referred to as concurrency handling. We distinguish between two common solutions: optimistic concurrency handling and pessimistic concurrency handling.
Optimistic concurrency uses a special column in the table to track whether the record was changed. The main requirement is that the column must change its value every time the record is updated. Many relational databases offer a special data type for this purpose which is automatically updated every time its record is updated. Microsoft SQL Server offers the deprecated TIMESTAMP and the newer, recommended ROWVERSION data types for this purpose.
If we use optimistic concurrency to solve the initial scenario with two threads, the first thread loads the record together with the current value of the timestamp, and when it tries to persist the change it uses the timestamp's value in the WHERE condition of the UPDATE command together with the record identification (for example the key). If another thread updates the record in the meantime, the update from the first thread doesn't find any record matching its WHERE condition. This results in 0 updated records (in Microsoft SQL Server this information is received by executing a SELECT @@ROWCOUNT query after the update command) and the thread must react to this situation. The reaction follows a common pattern of reloading the record to get the current values, applying the changes to the reloaded data and persisting them again. A real implementation of this pattern may involve, for example, user interaction where a user must resolve the conflict. Entity Framework has direct support for optimistic concurrency in both the ObjectContext API (with the required mapping defined in EDMX) and the DbContext API. Code First supports the mapping in both data annotations and the fluent API.
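The pattern described above can be sketched outside of Entity Framework. The following is a minimal illustration only, assuming Python's built-in sqlite3 module and a hypothetical account table. SQLite has no ROWVERSION type, so the version column here is an ordinary integer that the UPDATE statement increments itself, and the cursor's rowcount plays the role of @@ROWCOUNT.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table: "version" stands in for a ROWVERSION-like column.
    conn.execute(
        "CREATE TABLE account ("
        "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO account (id, balance, version) VALUES (1, 100, 1)")
    conn.commit()


def load(conn, account_id):
    # Fetch the data together with the current version value.
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def try_update(conn, account_id, new_balance, expected_version):
    # The version loaded earlier is part of the WHERE condition, so the
    # statement matches no row if somebody else changed the record meanwhile.
    cursor = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 affected rows means a concurrency conflict


def add_with_retry(conn, account_id, amount, max_attempts=3):
    # Reaction to a conflict: reload current values, reapply the change, retry.
    for _ in range(max_attempts):
        balance, version = load(conn, account_id)
        if try_update(conn, account_id, balance + amount, version):
            return True
    return False
```

If one caller loads the record, a second caller updates it, and the first caller then calls try_update with the stale version, the call returns False and nothing is overwritten. add_with_retry shows the simplest automatic reaction; an application with user interaction would present the conflict to the user instead.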
Pessimistic concurrency doesn't use any special column; instead it uses the transactional and locking mechanisms offered by the relational database. The whole idea of pessimistic concurrency can be compared to a critical section. Fetching the record, updating it and persisting it must be considered an atomic operation which can be executed by only a single thread at a time (per record or record set). When the first thread enters this critical section, no other thread can modify the record or enter the critical section for the same record. Using this mechanism requires an understanding of transaction isolation levels and the locking mechanism of the database in use. The bad news is that Entity Framework doesn't have any built-in support for pessimistic concurrency.
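The critical section comparison can be made concrete with a small in-process analogy. This is only a sketch of the idea, not of database locking: the RecordStore class and its methods are hypothetical, the "table" is a Python dictionary, and one threading.Lock per record key stands in for the lock a database would hold on the record.

```python
import threading
from collections import defaultdict


class RecordStore:
    """Hypothetical in-memory stand-in for a table with one lock per record."""

    def __init__(self):
        self._rows = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects the lock dictionary

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks[key]

    def insert(self, key, value):
        self._rows[key] = value

    def read(self, key):
        return self._rows[key]

    def update(self, key, modify):
        # Fetch, modify and persist happen inside one critical section,
        # so no other thread can touch the same record in between.
        with self._lock_for(key):
            current = self._rows[key]
            self._rows[key] = modify(current)
            return self._rows[key]


def run_concurrent_increments(store, key, threads=8, per_thread=1000):
    def work():
        for _ in range(per_thread):
            store.update(key, lambda value: value + 1)

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return store.read(key)
```

Because every read-modify-write cycle for a given key runs under that key's lock, no increment is lost, while updates to different keys do not block each other. In a real database the equivalent protection comes from locks taken inside a transaction, which is why isolation levels and locking behavior matter.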
This article was inspired by a question asked by Ari on Stack Overflow. The rest of the article will show why the basic Entity Framework update patterns do not provide pessimistic concurrency and why the approach described in the mentioned Stack Overflow question doesn't work. The article will also provide a real solution for pessimistic concurrency, which requires the use of native SQL.
The Entity Framework team today announced a new level of cooperation with the community by releasing Entity Framework on CodePlex under an open source license (Apache 2.0). This license gives us the opportunity to get the code, modify it and use it in our own hobby, open source or commercial projects (for further details check the license page). We can also contribute back to the repository - Arthur Vickers wrote a very nice series for contributors. This move may raise some questions about support for Entity Framework. There will still be a main branch maintained by Microsoft which will be used for official releases with full support. Entity Framework follows a move already made by, for example, the ASP.NET MVC team. We can only hope that other teams will follow this decision. I would like to see WCF under an open source license as well (it is the top request on WCF UserVoice). Open sourcing the code can only make .NET APIs better.
I finally had some time to finish reading Programming Entity Framework DbContext by Julie Lerman and Rowan Miller. This book is like the second part of Programming Entity Framework Code First, which I reviewed last time. I liked this book even more than the first one. Again the book covers exactly what the title says because it is the definitive guide to using the Entity Framework DbContext API. Together the two books form an ultimate resource on Entity Framework 4.1 (and newer versions), containing some interesting information not mentioned elsewhere. The authors wrote a set of two books providing detailed answers to all the commonly asked questions on Stack Overflow and the MSDN forum.
Again my biggest complaint is the missing index in the electronic edition of the book bought directly from O'Reilly. It is not a big problem in the electronic version but it would be a big disappointment if the index were missing from the paper version as well.
There are still a few topics missing from the whole Entity Framework puzzle that are not described in either of these books. These topics include mapping with the designer in the database first and model first approaches, which are possible with the DbContext API as well. These topics remain the same as in Entity Framework 4.0, which is covered in the previous great Entity Framework resource: Programming Entity Framework, 2nd edition by Julie Lerman. Combining the new books with a few selected topics from the last mentioned book forms the knowledge base we need to learn, understand and successfully use Entity Framework and DbContext.
I found one general suggestion on page 79 where I disagree with the book because it is incomplete. This suggestion describes the importance of detaching entities when not using POCOs and claims that detaching is not needed when using POCOs. The suggestion is correct if we use pure POCOs with no lazy loading or change tracking proxies (such classes are used by the examples in the book), but in the case of proxied POCOs the original importance of detaching entities still applies.
Eager loading is one of the core features of Entity Framework. It allows us to load related entities together with the main entities in a single round trip to the database. This feature should be extended in the future because the current implementation has several limitations:
We can't specify conditions for eager loading
A lot has been written about this limitation, including an existing highly voted UserVoice suggestion. The current implementation always loads all related records, so if we have a main entity with a thousand related entities and we want to get only the 10 most recent ones through eager loading, we don't have a direct way to achieve that. We can use a projection to a custom type or an anonymous type to overcome this limitation but that is just a workaround for the missing feature. Eager loading should offer the possibility to limit eager loaded data through conditions and to limit the number of returned entities - generally, the strongly typed Include method should support extension methods like Where, OrderBy and Take.
We can't declare eager loading globally
The current implementation demands that eager loading is specified in every query through the Include extension method. In most cases the eager loading definition can be wrapped in a separate extension method and reused, so global eager loading would be just a nice-to-have feature, but there are situations where the current Include method doesn't work - for example, eager loading on inheritance hierarchies is problematic. Entity Framework should offer additional ways to define eager loading. I can imagine two such new ways:
- Global eager loading defined directly in the mapping, as proposed in an already existing UserVoice suggestion
- Context scope eager loading defined per context instance - a similar mechanism was available in LINQ-to-SQL and its DataLoadOptions
Both implementations would force the query engine to automatically eager load related entities every time the main entity is queried. They should work with inheritance strategies and also, for example, when the main entity is loaded through lazy loading (the lazy loaded entity would come with all relations configured for global eager loading).
We can't specify how eager loading should be executed
At the moment all eagerly loaded data are joined into one result set. This can be absolutely fine in some cases (eager loading of a reference property or a small collection for a few entities) but it can have a significant impact on larger queries with a lot of eagerly loaded data - I described this problem in my answer to a question asked by puretechy on Stack Overflow. We should be able to specify whether eager loading should use a join or a separate query to load related data. All the details needed to build the separate query automatically are already present in the main query where we are using eager loading. As a workaround we can specify and execute those separate queries manually, but the disadvantage is that Entity Framework doesn't offer any way to batch multiple queries into a single round trip to the database. I posted the related suggestion to UserVoice yesterday.
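The difference between the two loading strategies can be illustrated at the SQL level. This is a rough sketch, assuming Python's built-in sqlite3 module and a hypothetical blog/post schema; it does not reproduce the SQL that Entity Framework generates, it only shows why a joined result set repeats the parent columns for every related row while separate queries return each row once.

```python
import sqlite3


def build_sample(conn, blogs=2, posts_per_blog=3):
    # Hypothetical schema: one blog has many posts.
    conn.execute("CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE post ("
        "id INTEGER PRIMARY KEY, "
        "blog_id INTEGER NOT NULL REFERENCES blog(id), "
        "title TEXT NOT NULL)"
    )
    for b in range(1, blogs + 1):
        conn.execute("INSERT INTO blog (id, name) VALUES (?, ?)", (b, f"blog-{b}"))
        for p in range(posts_per_blog):
            conn.execute(
                "INSERT INTO post (blog_id, title) VALUES (?, ?)",
                (b, f"post-{b}-{p}"),
            )
    conn.commit()


def load_joined(conn):
    # One query, one result set: blog columns are repeated for every post.
    return conn.execute(
        "SELECT b.id, b.name, p.id, p.title "
        "FROM blog b LEFT JOIN post p ON p.blog_id = b.id "
        "ORDER BY b.id, p.id"
    ).fetchall()


def load_separately(conn):
    # Two queries: every blog and every post is transferred exactly once.
    blogs = conn.execute("SELECT id, name FROM blog ORDER BY id").fetchall()
    posts = conn.execute(
        "SELECT id, blog_id, title FROM post ORDER BY blog_id, id"
    ).fetchall()
    return blogs, posts


def group(blogs, posts):
    # Stitch the two result sets together in memory.
    result = {blog_id: {"name": name, "posts": []} for blog_id, name in blogs}
    for _post_id, blog_id, title in posts:
        result[blog_id]["posts"].append(title)
    return result
```

With a single collection the repetition is limited to the parent columns, but once several collections are joined into the same result set the number of rows multiplies, which is where separate queries start to pay off. The price is the extra round trip per query.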
All these features would greatly improve our development experience when using Entity Framework and eager loading. They would also fit the most requested feature on UserVoice, Improve SQL Generation, because the ability to control how eager loading is executed and how much data is really loaded is nothing more than improved control over the generated SQL.
Last week I finished reading Programming Entity Framework Code First by Julie Lerman and Rowan Miller. I am quite impressed by the content of the book because it contains exactly what the title says and nothing more. It's the definitive guide to using code mapping in Entity Framework but it doesn't cover any additional topics you are not interested in. That keeps the book thin (the real content has 176 pages), so you can finish it in less than a week and use it as a reference for your upcoming experience with Entity Framework Code First. The drawback of the book's size and content is that you should not use it for learning Entity Framework itself. You should first go through some tutorials, training or other books before you use this one because it will not tell you anything about Entity Framework itself. I recommend this book to anybody who has some basic experience with Entity Framework and who is using, or is going to use, Entity Framework Code First in their projects. The biggest problem I see in the book is the missing index - at least the digital versions bought directly from O'Reilly don't have one. Using a printed technical book without an index as a reference can be pretty annoying.
Side note: The book targets Entity Framework 4.2, so the small changes in Entity Framework 4.3 and Code First Migrations are not covered. There is only a very short chapter (2 pages) about migrations explaining what migrations will bring to the Code First development approach.
Entity Framework contains mapping functionality for entities and complex types, and for mapping integer columns to enumerations (only when .NET Framework 4.5 Beta and VS 11 are used). Is it enough? I don't think so, and because of that I started a new suggestion on Data UserVoice: Support for simple type mapping or mapped type conversions.
Sometimes we need to map (convert) values of simple types as well. This need for simple type mapping becomes much more significant once we try to use Entity Framework with a legacy database (or simply a database which was not created primarily for Entity Framework). Perhaps these examples sound familiar:
- Char column containing values y and n or VarChar column containing values yes and no or true and false - we want to map it to a boolean property
- VarChar column containing date - we want to map it to a DateTime or DateTimeOffset property
- VarChar column containing numeric value - we want to map it to a numeric type property
- VarChar column containing enum value - we want to map it to an enum property
But there can be more advanced cases:
- VarChar column containing joined values - we want to map to a list of strings
- VarChar column containing XML - we want to map to XElement or XmlDocument
- VarChar column containing some identification - we want to map it to a type defined by this identification (for example create CultureInfo from a string like en-us)
- Binary data which we want to map to a stream or a specific type
- As the most advanced case we can even think about using multiple columns to get a single property of some specific type (I don't mean mapped complex type in this case)
At the moment this can be achieved only with a workaround in mapped entities, where we map the column to a property with the type demanded by Entity Framework and at the same time expose a second, non-mapped property which internally converts the value from the mapped one. That works for .NET code but it has several issues:
- It is mapping logic, so it doesn't belong in the entity. The entity should not need to know anything about how it is persisted.
- It doesn't work if we want to use our new non-mapped property, for example, for filtering in LINQ to Entities queries. LINQ to Entities queries must use the original mapped property, which means that our mapping logic leaves the entity and creeps into other parts of our code.
- If LINQ to Entities queries need to use the mapped property, the property must also be accessible to the code defining the query. That usually means an ugly public interface on our entity.
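The workaround and its issues are not specific to .NET, so they can be sketched in a few lines. The following is an illustration only, assuming Python's built-in sqlite3 module and a hypothetical customer table whose is_active column stores y or n. The Customer class mirrors the described approach: a raw field with the stored type plus a second property doing the conversion.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    is_active_raw: str  # "mapped" value exactly as stored in the column: "y" or "n"

    @property
    def is_active(self) -> bool:
        # "Non-mapped" property: the conversion logic lives inside the entity.
        return self.is_active_raw == "y"

    @is_active.setter
    def is_active(self, value: bool) -> None:
        self.is_active_raw = "y" if value else "n"


def create_customers(conn):
    # Hypothetical legacy table using y/n instead of a boolean column.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, is_active TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO customer (id, is_active) VALUES (?, ?)",
        [(1, "y"), (2, "n"), (3, "y")],
    )
    conn.commit()


def find_active(conn):
    # The query cannot use the converted property, so the "y" encoding
    # leaks out of the entity and into the query code.
    rows = conn.execute(
        "SELECT id, is_active FROM customer WHERE is_active = ? ORDER BY id", ("y",)
    ).fetchall()
    return [Customer(id=row[0], is_active_raw=row[1]) for row in rows]
```

Code working with loaded objects can stay on the boolean is_active property, but find_active has to know that true is stored as y, and is_active_raw has to stay public for such code to work. A mapping-level conversion would keep both details out of the entity and the queries.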
If you feel that this would be a valuable feature in Entity Framework, don't forget to vote for the suggestion on UserVoice because upcoming features in Entity Framework are currently selected only from highly voted suggestions.
Julie Lerman recently wrote a very nice article about using Entity Framework 4.3 Code First Migrations with an existing database. Using an existing database is a very common scenario so this task will come up often. The mentioned article contains one cumbersome operation where we must create the initial migration and manually clean out all the code prepared for us, because we are using an existing database where these changes must not be executed (it would result in an exception and stop our migration process). The ADO.NET team promptly addressed this issue, and Entity Framework 4.3.1, released yesterday, contains a new command parameter which creates the empty initial migration for us.
The rest of the article will show the whole walkthrough of adding the initial migration and will especially target the situation where we are upgrading an existing database created by the Code First approach with Entity Framework 4.1 or Entity Framework 4.2.
Microsoft today released multiple highly anticipated products: Windows 8 Consumer Preview, Windows Server 8 Beta, .NET Framework 4.5 Beta, Visual Studio 11 Beta, Visual Studio 11 Team Foundation Server Beta and several related applications, tools and toolkits. If you work with Microsoft technologies and want to check what's new, now is the best time to download this stuff and set up a virtual machine for exploring the new features, tools and APIs.
.NET Framework 4.5 also contains a new version of the Entity Framework Core libraries (ObjectContext API). This new version contains all the features previewed in the Entity Framework June 2011 CTP: enums, spatial types, mapped table valued functions and other features are finally here. The new Entity Designer presented in the June 2011 CTP is a direct part of Visual Studio 11. Together with this release the ADO.NET team released Entity Framework 5.0 Beta (DbContext API), which is able to use the new features from the Entity Framework Core libraries. Some features are still available only with the database first approach (EDMX) - for example mapped table valued functions - but features like enums, spatial types, auto-compiled LINQ queries and performance improvements are available for Code First as well.
Do you remember the Entity Framework June 2011 CTP with all those nice features? This CTP was initially released as a first attempt at an out-of-band release (independent of .NET Framework) of a core .NET library - System.Data.Entity.dll. The out-of-band release caused some issues because we had to use a separate .NET target to use it, which made it incompatible with other tools and features. Last week the ADO.NET team announced that they will not be able to release core functionality out-of-band. It means that all changes and major features or improvements can be released only with a new .NET Framework version. The team also announced that we can expect the features from the June 2011 CTP in the upcoming .NET Framework 4.5. What does it mean? .NET Framework 4.5 should contain a new version of the Core Entity Framework libraries (that is the new name for the ObjectContext API and Entity Framework core features). The rest of this article describes all the changes related to the upcoming release, including new features, missing features and the relation to the DbContext API.
A few weeks ago the ADO.NET team published a minor update to Entity Framework 4.1 (download page). Judging by the ADO.NET team blog, I guess the purpose of this minor update was to fix a few bugs, add support for a context factory and prepare support for the Code First Migrations August 2011 CTP, which requires this update. Everything looks awesome because the ADO.NET team is fixing bugs, adding new features and working on new tools (EF Power Tools, migrations) and a new version of Entity Framework. Still, unexpected problems can sometimes appear. The mentioned update contains one annoying hidden breaking change reported by many developers using external profilers. The rest of the article will describe this problem, its impact and the announced fix.