Sunday, September 30, 2007

Adelaide Geek Dinner

I am organising a dinner get-together for local developers to meet, socialise, and learn from each other. I am still finalising details of the time and venue (which will depend on the numbers of guests) but I am currently thinking it will be an evening in the middle of November at a restaurant in the Adelaide CBD.

Also, to avoid being overwhelmed by numbers and to ensure the concept will work here before scaling up, I am going to start by only inviting .NET developers who also publish a blog. I already know a few such people in South Australia and have found a few more on Google and will ensure they receive an invitation but I'm sure I've missed some.

So, if you are, or know someone who is, a .NET developer and a blogger and will be in Adelaide in November, send me an email or post a comment here, and I will make sure you receive an invitation when the details have been settled.

If you are a .NET developer but not a blogger and would really like to attend, just grab a free Blogger or Live Spaces account and post one article about your .NET experiences then let me know.

If the evening is a success then we can plan to have these dinners on a regular basis and open the guest list to a larger range of people. Looking forward to meeting you...

 Friday, September 28, 2007

Streaming Large Objects with ADO.NET, Properly

It has been a while since I last worked with storing files in a SQL database and I decided to Google around to remind myself of the best way to do it. I was very disappointed with most of the approaches I found. Unfortunately, my Google-Fu didn't return the MSDN articles I've linked to below, and I had to find out the hard way.

To begin, all solutions I found dealt only with reading a BLOB from a SQL Server image or varbinary(max) column in a streaming fashion. Worst of all very few actually understood what streaming should do, and that is not load the entire object into an array in memory.

My whinging aside, streaming a file out of a SQL table is easy. You start by using a DataReader created by passing CommandBehavior.SequentialAccess to a DbCommand's ExecuteReader function. I also find that selecting only the blob column and only the desired row(s) from the table is the most effective.

When you have the DataReader positioned on the appropriate record you repeatedly call the GetBytes method in a loop, retrieving a small chunk each time and writing it to the output stream. The output can be any IO.Stream like a file or even your ASP.NET response. This MSDN article has a good description of the situation with the SequentialAccess enumeration and some sample code.

Writing a stream of data into a SQL table turned out to be slightly less obvious. I'm only working with SQL Server 2005 so I didn't consider supporting older versions but the approach is similar. SQL 2005 provides a Write "method" on the large value data types in the UPDATE statement.

My solution was to first insert the new row into the table providing values for all columns except the blob. Then I had a stored procedure that would take the row's primary key values, an offset, and a chunk of the data to insert and use the UPDATE .Write method to update the row.

Similar to the reading code, my writing code would read a small chunk from the incoming IO.Stream and pass it to the stored procedure, incrementing the offset each time. Once again, there is another MSDN article that describes the process well but their code looks like it will also work with SQL versions prior to 2005.

In both cases tweaking the size of the chunk used in each iteration of the loop will require some testing and measuring to find the best performance but now you can read and write files of almost 2GB into SQL Server without trying to allocate a similarly sized array in memory first.

 Wednesday, September 26, 2007

Business and Data Layer Architectures

Strongly Typed DataSets in .NET 2.0 do many things very well but they fall short in other places. For a new project I have been researching alternative approaches for structuring the model, logic, and data access projects to allow for easy customisation and good testing.

Firstly, there are some features about DataSets that are very useful and would need to exist in any alternative I consider:

  • Collection filtering and sorting by any properties.
  • Collection and item change notification.
  • IEditableObject support (ie BeginEdit, EndEdit, CancelEdit).
  • Excellent encapsulation of SQL commands.

There is also the limitations of DataSets that should not exist in the alternative I choose:

  • Computed values too complex for DataColumn.Expression cannot be bound to.
  • AllowDBNull and MaxLength violations cause immediate exceptions.
  • Restrictive relationship between model and underlying database structure.
  • Does not utilise Nullable<T> to support columns that allow DBNull.
  • Change notification events cannot be completely silenced.

So far I have briefly investigated CSLA, SubSonic, NHibernate, and some existing applications both real (DotNetKicks) and example (MS Pet Shop), and found that they are all very similar in many ways. Rather than commit myself to a particular framework that may have it's own set of issues I have decided to hand code my own solution to the business and data layering problem.

Others have expressed concerns about going with a third party framework and to a point I agree, but I'm not totally sold either way yet.

I intend to watch my own code evolve as I solve the difficulties associated with the task and refactor my work into common classes. I expect that my code will begin to look much like one of the existing frameworks (although that may be biased by what I've already seen) and perhaps I will end up switching to the solution that most closely matches my own.

The last thing we need is yet another framework.

 Friday, September 21, 2007

Staying Optimistic

The topic of concurrency control in database applications has been on my mind lately and I wanted to get my thoughts in order. Here I intend to present my views on the situation and probably contort some official definitions in the process.

At the outset there are two choices: pessimistic concurrency that will lock data from the time an edit begins until the time the edit is committed or optimistic concurrency that does not lock but only commits data that has not been changed by another process since the edit began.

Pessimistic concurrency has it's place but for user databases it is losing popularity for two reasons. Firstly it requires additional server resources to track which data is locked and by whom, and secondly it means that one user will need to wait before working on the same data that another user is editing.

Optimistic concurrency avoids the first issue by having the client remember a checksum or timestamp of the original data that it can check for at commit time to ensure no other changes have been made. The compromise is that if changes have been made, the user has to either re-input their changes to the new data or use a merging tool (if available) to reconcile the differences.

Obviously, optimistic concurrency is going to work really well in situations where it is rare for two users to edit the same data and in practice with databases full of thousands of records and a only a relatively small group of users working with them, the chances are low. (So low that many database applications don't bother with a merging tool).

However, time is a major factor with optimistic concurrency. The less time between the reading of the original data and the final committing of new data the less likely another user has changed the same data. If a user begins updating a customer's address though, but stops to answer a phone call then go to lunch, when this user returns and remembers to save there is a much greater chance a fellow employee has worked on the customer record.

There are additional factors beyond a user's actions that can increase the chance of an optimistic concurrency violation. It is fair to expect that an application should load at least a subset of records when it first starts to give the user a populated initial view but now the clock is ticking because the application has remembered the state of the data at the time the program opened. Assuming the architecture handles it, this is easily resolved by re-retrieving the latest data from the server immediately before the user begins making changes.

Also, while it may only take 30 seconds for the user to type in a new phone number for an existing customer, the chance of a concurrency violation continues to grow as each minute passes until the application pushes these changes back to the database. In this example having the application save the new data to the server when the user closes the customer dialog is an obvious solution. But time between read and commit is not the only concern...

When there is a concurrency violation, some action needs to be taken. An easy but often inappropriate action is to simply ignore the changes on the server and overwrite them with the latest data from the user. If that was acceptable for your application you wouldn't be here worrying about concurrency. As mentioned briefly above the options are to compare and merge or discard and re-input.

With either approach, the difficulty to merge or re-input grows proportionally with both the time since the edit was made and the amount of data being committed. The difficulty lies not only in the user's ability to remember all the changes they have entered but also the application's ability to determine the concurrency relationships between the various data elements being committed (ie if an existing order conflicts due to a changed shipping address should the added order line be considered a conflict too?).

Use optimistic concurrency, save often, save automatically.

 Saturday, September 15, 2007

PowerShell Vulnerability

Some time ago I put together a template for a file that can be interpreted as both a batch command script for cmd.exe and a SQL script for SQL Server Management Studio and sqlcmd.exe. It works really well for enabling someone unfamiliar with SQL Server to deploy database update scripts.

I've been working with PowerShell quite a lot lately and I was wondering if it would be possible to have a file that would be interpreted by both cmd.exe and PowerShell.exe. It turns out that Jay Bazuzi, a developer at Microsoft, has already found an elegant solution. However, upon inspecting the code I became concerned how it would be effected by PowerShell's default setting to disable all scripts and especially disable unsigned scripts from remote sources.

I have since worked with Jay's sample on my home machine and discovered that it effectively bypasses PowerShell's anti-script security by piping the commands to it's interactive mode. As a result, it was relatively trivial to write a batch command script that would enable PowerShell to run all future scripts and modify the user's PowerShell profile to ensure it stays that way.

Thankfully, security in the OS and other applications is making it harder to get a command script on to another computer and get the user to run it but with the capabilities of PowerShell potentially available for malicious use it's just one more reason to not run as an administrator and minimise the damage.

 Friday, September 14, 2007

The Theory Of Thirds

It can be expected that for any blog about a difficulty experienced by a user of  Windows that the third comment to be posted will recommend the Linux or Mac OS as the solution to the problem. The first two comments preceding the third usually add actual value for other readers of the blog.

The theory is not restricted to just Windows and could equally be applied to a language (eg VB), a concept (eg Agile), a product (eg iPod), or any topic with distinct followers and opposers. Obviously, in these cases the suggested solution corresponds to the exchanged topic. For example, an article about VB's failings will be responded to with "You should use C# instead".

 Wednesday, September 12, 2007

PowerShell Regular Expression Compiler

As an exercise in the mostly useless, I decided it would be interesting to try my hand at writing a PowerShell script to compile a regular expression pattern to a .NET assembly. I sure someone has already created a NAnt or MSBuild task to do this but I felt this would be a good way to increase my familiarity with PowerShell.

The script requires the RegEx pattern and an output type name and full namespace to work. You can optionally pass an AssemblyName but if omitted the type and namespace will be used to form the output file name. You can also specify the -ignoreCase or -multiLine switches to enable that behaviour on your expression. There are other options that probably should be supported but they can be easily added if you actually have a use for this. Without further delay, here is the code:

param
(
    [string] $pattern = "",
    [string] $typeName = "",
    [string] $fullNamespace = "",
    [System.Reflection.AssemblyName] $assemblyName = $null,
    [switch] $ignoreCase,
    [switch] $multiLine
)

if ($pattern -eq "") { throw ("-pattern required"); }
if ($typeName -eq "") { throw ("-typeName required"); }
if ($fullNamespace -eq "") { throw ("-fullNamespace required"); }

if ($assemblyName -eq $null)
{
    $assemblyName = New-Object System.Reflection.AssemblyName `
        ($fullNamespace + "." + $typeName);
    $assemblyName.Version = New-Object System.Version (1, 0);
}

$sysRx = @{};
$sysRx.Namespace = "System.Text.RegularExpressions";
$sysRx.RegEx = [System.Text.RegularExpressions.RegEx];
$sysRx.RegExOptions = [System.Text.RegularExpressions.RegExOptions];

$options = $sysRx.RegExOptions::None;
if ($ignoreCase) { $options = $options -bor $sysRx.RegExOptions::IgnoreCase; }
if ($multiLine) { $options = $options -bor $sysRx.RegExOptions::Multiline; }

$info = New-Object ($sysRx.Namespace+".RegexCompilationInfo") `
    ($pattern, $options, $typeName, $fullNamespace, $true);

$popDir = [System.Environment]::CurrentDirectory;
[System.Environment]::CurrentDirectory = $PWD;
$sysRx.RegEx::CompileToAssembly($info, $assemblyName);
[System.Environment]::CurrentDirectory = $popDir;

Here is an example usage:

./Compile-RegEx.ps1 "\s+" "Spaces" "CodeAssassin.RegEx" -ignoreCase

 Monday, September 10, 2007

Another DVD to iPod

I have another weekend driving holiday planned in the next few weeks and my fiancee has requested Family Guy episodes on the iPod for the trip. Now that I know how to do it, I figure I better get started because there are a lot of episodes.

This time I decided to try Vrata's suggestion of DVDFab instead of the Videora software I used last time. What follows is a very quick summary of my experience with DVDFab.

Benefits:

  • Does not require separate DVD ripping software.
  • Supports optionally embedding subtitles into the video stream.
  • GUI is very clean and easy to follow.
  • Hasn't crashed yet on Vista x64.

Disappointments:

  • Still insists on running as Administrator.
  • The Platinum version for iPod conversion costs about US$50.

In all other ways DVDFab is similar to Videora. Similar conversion times, same resulting file sizes. Support for multiple devices' resolutions and bit rates.

If you think you'll be converting quite a lot of DVDs to an iPod compatible format, DVDFab is probably worth the price and you can try the fully-functional 30-day trial yourself to be sure.

 Sunday, September 09, 2007

Refactor By Region

The nature of the software development profession is such that you need to be constantly learning new techniques and improving your style. A side-effect of always seeking to be better is that when you revisit old code it suddenly doesn't seem as good as it did when you first wrote it.

I was revisiting a large class recently and was struggling to navigate between all the methods to follow the logic. I decided to make use of #region blocks (something I rarely do) to help group related methods together and hide less relevant code from view.

By arranging groups of methods together by their purpose, a pattern started to emerge. Each region started to look like an ideal candidate to refactor the code into a new class with a single responsibility. I already had a set of unit tests in place for the existing code so I felt confident in being able to refactor and being able to verify I hadn't broken anything afterward.

What was originally one class became eight, resulting in greater maintainability, testability, and of course readability, solving my original problem. For this class the refactoring by region worked well because I had grouped code by it's behaviour and partly by it's call chain.

However, I've seen other approaches to using #region blocks. I've seen all event handlers put in their own region. I've seen collections of overloaded methods grouped by name. I've even seen properties grouped together, method/subs group together, and functions grouped together. In these arrangements not only does a refactoring pattern not automatically emerge but it becomes harder to search for one.

As I mentioned above, I'm not a big user of #regions. This is probably because I try to refactor my code before it gets big enough and because of the region anti-patterns I've wrestled with. I think I might have to give them a second chance.