Friday, April 11, 2008

The Next Step For VS2008 Database Edition

I started using VSTS Database Edition back when it was called Data Dude and available as a CTP download. Since then I have slowly embraced all its features and I now use it as my complete database development solution from schema management, to data generation, and finally deployment.

DB Edition has its quirks but you learn to understand them and work with them and each new version has new features to improve the workflow. However, as I have been using DB Edition each day, an idea has been steadily stewing in my head for where I'd like to see DB Edition go next. My thought process in a nut-shell follows...

A Database Project allows you to define your schema and data generation and, from a Visual Studio context menu, deploy the database to a chosen server as a new database or as an upgrade to an existing database. You can also use the DatabaseTestService in a test project to deploy the database and test data for automated testing. And finally you can use the SqlDeploy MSBuild task to deploy the database as part of a continuous integration build.

However, all of these methods of deployment require the relative path to the Database Project and all its SQL scripts and some settings in a configuration file. This causes problems in deployed test runs and Team Build test runs where the relative path to Database Project often changes. It also restricts using multiple test databases in a single test project due to the way the configuration file works.

I propose that, at build time, DB Edition could package into a .NET assembly, the full definition of the Database Project along with some standard bootstrap code but minus the deployment configuration. Test projects could then include a reference to the DB assembly and make calls into the bootstrap code, perhaps something in the form:

MyDBAssembly.Schema.DeployTo(someConnectionString, someOptions);
MyDBAssembly.SomeDataGenPlan.DeployTo(someConnectionString);

The DB assembly will be treated as a dependency like any other references would and will naturally be moved around wherever the primary assembly gets deployed and all the necessary information will always be available to perform a completely new database deployment or to perform a schema upgrade on any compatible existing database instance.

If the DB assembly happened to double as a console application, it could be used for ad hoc command-line deployments or even included in batch, MSBuild, or PowerShell scripts for automated deployments.

I am contemplating several ways to hack a feature like this into DB Edition myself but I'm hoping someone else has already done it or maybe the DB Edition team already has it on the cards for Rosario.

 Monday, February 04, 2008

Database Evolution

K Scott Allen just recently posted the final article in a series of five about managing the development and deployment of a relational database alongside your code in a team environment.

He highlights some common failing points and good solutions to tricky obstacles, most of which I've faced throughout my career and learned about the hard way. I haven't quite achieved the same level of streamlined schema management that Scott has but it's good to know I'm on the right track.

Visual Studio for Database Professionals projects have started to be included in our product source control and we are pushing this great but still relatively new tool to automate as much of our schema management as possible. No matter how good the tools are though, I'm not the only one who feels at least a few developers on any team "need to step up and get comfortable with SQL".

In fact, getting comfortable with effective use of PowerShell, in addition to SQL, has helped to get reams of lookup data (ie post codes, etc) into repeatable T-SQL scripts and also to manage deployment of schema changes to multiple sites.

I've also just ordered Refactoring Databases: Evolutionary Database Design by Scott Ambler and Pramod Sadalage. The book focuses on applying the same incremental refactoring techniques used in code to evolve a database schema over time with minimal upset to existing systems. I expect to find some real insight into database development in this book.

Ambler has recently recorded some good interviews on both .NET Rocks and OnSoftware and Pramod has an e-book on Recipes For Continuous Database Integration worth investigating.

 Sunday, October 14, 2007

Report Services Automation With PowerShell

In late September Paul Stovell wrote about a set of VB.NET scripts he prepared to help deploy reports to SQL Server Reporting Services. If you've ever had the displeasure of deploying SSRS reports without Visual Studio then you'll understand how much it sucks.

Paul went to the effort to write individual scripts for creating folders and data sources on the server and uploading report definitions and configuring permissions. With Paul's work simple command scripts can then be used deploy reports.

However these command scripts still need to be written and they end up containing much of the same information as can be found in the .rptproj project file and the .rds data source files. I despise the idea of maintaining any sort of configuration information in more than one place so adding to the deploy command script whenever I add a report to the project in Visual Studio just makes me cringe.

Additionally, as Paul briefly mentions, MSBuild (and therefore Team Build) does not support Report Services projects so, once again, to deploy your reports as part of Continuous Integration you need to have separate tools.

Today I constructed a lengthy PowerShell script to take a Report Services .rptproj project file and output a command script that utilises Paul's VB.NET scripts to deploy the reports as per the project settings. Due to the size of the script rather than publishing it inline, you can download it here.

The script accepts three parameters. ProjectFile is the path to .rptproj file for the reports you want to deploy. If you omit this parameter the script uses the first report project file it finds in the current directory. The second parameter, ConfigurationName tells the script which project configuration to use for the target server URL and destination folders. If you omit this parameter the script uses the first configuration defined in the project. The last parameter SearchPaths is a list of paths for the script to search when locating both rs.exe and Paul's .rss files. The SearchPaths parameter is automatically combined with the environment PATH variable and may be omitted.

Here is an example usage:

PS C:\Users\Jason\Dev\MyReports> .\Deploy-SqlReports.ps1 `
    -ProjectName MyReports.rptproj `
    -ConfigurationName Release `
    -SearchPaths "C:\Tools\Report Services\" `
    | Out-File deploy.cmd -Encoding ASCII;

As always, my PowerShell skills are slowly improving and this script is not necessarily perfect in either robustness or efficient use of PowerShell. Hopefully it will be as useful to you as it has been to me and any changes you need should be easily made. Please leave a comment with your thoughts and suggestions.

 Friday, September 28, 2007

Streaming Large Objects with ADO.NET, Properly

It has been a while since I last worked with storing files in a SQL database and I decided to Google around to remind myself of the best way to do it. I was very disappointed with most of the approaches I found. Unfortunately, my Google-Fu didn't return the MSDN articles I've linked to below, and I had to find out the hard way.

To begin, all solutions I found dealt only with reading a BLOB from a SQL Server image or varbinary(max) column in a streaming fashion. Worst of all very few actually understood what streaming should do, and that is not load the entire object into an array in memory.

My whinging aside, streaming a file out of a SQL table is easy. You start by using a DataReader created by passing CommandBehavior.SequentialAccess to a DbCommand's ExecuteReader function. I also find that selecting only the blob column and only the desired row(s) from the table is the most effective.

When you have the DataReader positioned on the appropriate record you repeatedly call the GetBytes method in a loop, retrieving a small chunk each time and writing it to the output stream. The output can be any IO.Stream like a file or even your ASP.NET response. This MSDN article has a good description of the situation with the SequentialAccess enumeration and some sample code.

Writing a stream of data into a SQL table turned out to be slightly less obvious. I'm only working with SQL Server 2005 so I didn't consider supporting older versions but the approach is similar. SQL 2005 provides a Write "method" on the large value data types in the UPDATE statement.

My solution was to first insert the new row into the table providing values for all columns except the blob. Then I had a stored procedure that would take the row's primary key values, an offset, and a chunk of the data to insert and use the UPDATE .Write method to update the row.

Similar to the reading code, my writing code would read a small chunk from the incoming IO.Stream and pass it to the stored procedure, incrementing the offset each time. Once again, there is another MSDN article that describes the process well but their code looks like it will also work with SQL versions prior to 2005.

In both cases tweaking the size of the chunk used in each iteration of the loop will require some testing and measuring to find the best performance but now you can read and write files of almost 2GB into SQL Server without trying to allocate a similarly sized array in memory first.

 Friday, September 21, 2007

Staying Optimistic

The topic of concurrency control in database applications has been on my mind lately and I wanted to get my thoughts in order. Here I intend to present my views on the situation and probably contort some official definitions in the process.

At the outset there are two choices: pessimistic concurrency that will lock data from the time an edit begins until the time the edit is committed or optimistic concurrency that does not lock but only commits data that has not been changed by another process since the edit began.

Pessimistic concurrency has it's place but for user databases it is losing popularity for two reasons. Firstly it requires additional server resources to track which data is locked and by whom, and secondly it means that one user will need to wait before working on the same data that another user is editing.

Optimistic concurrency avoids the first issue by having the client remember a checksum or timestamp of the original data that it can check for at commit time to ensure no other changes have been made. The compromise is that if changes have been made, the user has to either re-input their changes to the new data or use a merging tool (if available) to reconcile the differences.

Obviously, optimistic concurrency is going to work really well in situations where it is rare for two users to edit the same data and in practice with databases full of thousands of records and a only a relatively small group of users working with them, the chances are low. (So low that many database applications don't bother with a merging tool).

However, time is a major factor with optimistic concurrency. The less time between the reading of the original data and the final committing of new data the less likely another user has changed the same data. If a user begins updating a customer's address though, but stops to answer a phone call then go to lunch, when this user returns and remembers to save there is a much greater chance a fellow employee has worked on the customer record.

There are additional factors beyond a user's actions that can increase the chance of an optimistic concurrency violation. It is fair to expect that an application should load at least a subset of records when it first starts to give the user a populated initial view but now the clock is ticking because the application has remembered the state of the data at the time the program opened. Assuming the architecture handles it, this is easily resolved by re-retrieving the latest data from the server immediately before the user begins making changes.

Also, while it may only take 30 seconds for the user to type in a new phone number for an existing customer, the chance of a concurrency violation continues to grow as each minute passes until the application pushes these changes back to the database. In this example having the application save the new data to the server when the user closes the customer dialog is an obvious solution. But time between read and commit is not the only concern...

When there is a concurrency violation, some action needs to be taken. An easy but often inappropriate action is to simply ignore the changes on the server and overwrite them with the latest data from the user. If that was acceptable for your application you wouldn't be here worrying about concurrency. As mentioned briefly above the options are to compare and merge or discard and re-input.

With either approach, the difficulty to merge or re-input grows proportionally with both the time since the edit was made and the amount of data being committed. The difficulty lies not only in the user's ability to remember all the changes they have entered but also the application's ability to determine the concurrency relationships between the various data elements being committed (ie if an existing order conflicts due to a changed shipping address should the added order line be considered a conflict too?).

Use optimistic concurrency, save often, save automatically.

 Monday, July 02, 2007

Dreams and Mistakes

Firstly, in my recent post about the new .Net DateTimeOffset class I mentioned that I would love to see the SQL Server team include this type in databases. My wishes have been answered, we can expect to see this type and some other very interesting additions in the next CTP/Beta of SQL Server 2008. Unfortunately, all of our clients have bought into SQL Server 2005 and won't be interested in upgrading anytime soon so I'll have to find a new project to try all the new toys.

Secondly, also in a recent post I complained about the CollectionBase type being left behind when generics were added to .NET 2.0. I've since discovered the new Collection<T> class tucked away in the System.Collections.ObjectModel namespace. It doesn't offer all the events that the previous equivalent did but it is inheritable and overridable so it should do the job nicely. I obviously didn't search very hard for it all the other times I looked; very embarrassing.

 Tuesday, June 05, 2007

SQL Server 2008

It's hard to keep up. Microsoft have recently released the June CTP for SQL Server 2008. You may need to sign in with your Windows Live ID to follow that link and you definitely need to apply for the SQL CTP program if you want to download it. I guess it makes sense for us to start seeing SQL 2008 now as it probably coincides with the Visual Studio 2008 RTM later this year.

 Thursday, May 31, 2007

Virtual Library

Microsoft now has a collection of Virtual PC/Virtual Server VHD images to trial a decent range of their enterprise products. They currently have Windows Vista, Visual Studio 2005 Team Suite, Windows Server 2003 R2, SQL Server 2005, Exchange 2007, and several others. These should be really useful for testing their new products without requiring a spare machine and also for testing your own software against their various platforms. Hopefully they will continue to keep their images up to date and expand on the packages available.

 Friday, March 02, 2007

Self-executing SQL scripts

I am currently preparing our deployment processes for our most recent software product at work. A fair portion of it involves executing SQL scripts to install and update SQL Server 2005 databases. The deployment will ultimately be performed to several sites by staff who are less familiar with SQL Server than the development team.
 
At the moment it involves executing the SQLCMD tool from the command line with appropriate parameters to connect to the server, process the script and output a log. However, I feel that this is just one more error-prone step that should be avoided. Some time ago I read about Polyglots on Wikipedia and I was inspired. I thought I would try to write a batch file that also contained a SQL script. The ultimate goal would be a single-file that could be double-clicked and the script would run and the log would be created.
 
This meant it must be written so the batch command interpreter would ignore the SQL and the SQLCMD tool would ignore the batch commands. The trick was finding the keywords and structures in each language that had similar syntax. After several attempts I settled on GOTO, proving that it isn't totally harmful.
 
Here is a base example of my solution that you can use to create your own self executing SQL scripts. Just put it in a file with a ".cmd" extension and change as appropriate:
 
:setvar NUMBEROFROWS 15
 
GOTO startofpolyglotsqlbatch /*
:startofpolyglotsqlbatch
@echo off
sqlcmd.exe -S MySqlServer -E -e -i "%~f0" -o "%~f0.log"
more "%~f0.log"
goto endofpolyglotsqlbatch
::
*/
startofpolyglotsqlbatch:

USE MyDb;
GO
SELECT TOP $(NUMBEROFROWS) * FROM MyTable;
GO

/*
:endofpolyglotsqlbatch
::
*/
 
I have added some colour to highlight how the code is interpretted. The initial GOTO is parsed by both the batch command processor and SQLCMD but goes to a different destination for each. The green text is only seen and processed by the batch parser and the blue text is only seen and processed by SQLCMD. The grey text can be replaced with content relevant to your script. This is designed to only work with SQL Server 2005 and only on Windows XP, Windows Server 2003, or later.
 Saturday, February 24, 2007

What rolls down stairs, alone or in pairs?

I recently discovered that our databases in SQL Server 2005 using the Bulk Logged recovery model were not getting their logs backed up by the Log Backup maintenance plan. I did some searching to no avail and ending up posting a question to the MSDN forums. I was told this is a known bug in SQL 2005 RTM and SP1 and will be fixed in SP2. SP2 is now available and the problem is fixed... but not very well.
 
The Log Backup maintenance plan allows you to choose "all databases" when asked what to back up. In pre-SP2 it would select all databases on the server, filter out the ones with inappropriate recovery models, then execute a BACKUP LOG statement against the remaining databases. The problem was that instead of only excluding Simple recovery models databases, it excluded Bulk Logged also. I imagined the fix in SP2 would be to correct the filter. Not so.
 
Now, in SP2, the maintenance plan selects all databases on the server and attempts to execute a BACKUP LOG statement against each of them. Those with appropriate recovery models succeed, the others fail. Unfortunately the failures are then treated by SQL Agent as a job error and the maintenance plan reports as failed even though all the right databases were backed up successfully.
 
The benefit in SP2 is that the Bulk Logged databases actually get backed up now because you couldn't back them up at all with a maintenance plan previously even if you explicitly selected the Bulk Logged databases only. The downside is that I have to update my maintence plan everytime a new database is added or the recovery model changes. I appreciate that you should give more time to getting backups right on a production database, but on the development servers this is tedious.
 Monday, May 01, 2006

Role your own

I discovered an annoying limitation in the SQL Server Management Studio today. The GUI does not offer an option to add a Database Role as a member of another Database Role. In my case I wanted to add my "bigAppAdmins" role to the "bigAppUsers" role so all administrators are implicitly users also. Unfortunately I had to resort to the sp_addrolemember stored procedure to achieve this. If I didn't already know that roles could be members of other roles I might have been convinced that SQL Server didn't support it.
 Monday, February 20, 2006

Stringly Typed Data

The project I am currently working on involves the new SQL Server 2005 Reporting Services. So far, other members of my team have been responsible for the reporting modules but I know I will be working with it soon so I decided to do some preliminary research.

I believe in keeping my blog as a source of positive information and I prefer to post about problems where I have already found a solution I can offer to the community. However, a particular problem I have encountered with Reporting Services does not seem to have a solution, so I hope this post will bring more attention to the problem and perhaps a solution will eventually be provided by Microsoft. This is assuming that this problem is not a result of my inability to find the right documentation.

I am referring to the ReportParameter class in the Microsoft.Reporting.WinForms namespace. This class is used to pass parameters to the Report Server for determining the contents of the report produced. The problem is that the ReportParameter only seems to support String typed parameters. Considering many reports will be based on the results returned by stored procedures in SQL Server I would expect Reporting Services to be using a very similar structure for its input parameters.

With the current structure I cannot pass NULL to a ReportParameter. I would need to use special values such as -1 maybe for an int parameter and the stored procedure would need to be changed to understand that. If I want to be able to distinguish between an empty string and a NULL string I have to go to more effort to choose a special value that won’t be used in my data.

Another filter that I will commonly use for reports is a date filter. The user should be able to choose a start and end date and generate a report with data within that range. With standard Stored Procedure calling, ADO.NET will handle all the necessary converions for passing DateTime parameters in the correct format for the database. With Reporting Services I need to convert the DateTime to a String manually by considering the locale of the user’s computer and the Report Server.

ADO.NET has been built to provide a base level of data classes to suit all types of features and capabilities found in various database engines. There is no reason why Reporting Services should not have been developed the same way, especially considering this is now the second release of the product.

I am interested in reading the comments of others on this topic and if I am wrong about all this can someone please point me in right direction.

 Friday, June 10, 2005

Today's blog brought to you by XML, XSL, and SQL

The major project I am currently working on requires many tables to be created in several copies of an SQL Server database in multiple offices. All of these tables need very similar views and stored procedures. There usually are two views per table, one to show all the records, and one to show the records 'WHERE deleted = 0'. I never delete any information in this database, it just gets hidden. There also at least three stored procedures per table, two for synchronisation with other databases, and the other for inserting new records. Quite obviously this becomes tedious quickly. I couldn't find much on the web that suited my exact requirements and I was feeling too lazy to develop my own software. The answer is a very nice combo of XML, XSL, and a command line transformation tool kindly provided by Microsoft in the form of MSXSL.EXE.

Firstly I define any new tables I want to create in a fairly straight forward XML layout. For example:

<table name="locations">
 <field name="deleted" type="bit" nullable="false" id="deleted" />
 <field name="lastRevision" type="lastRevisionType" nullable="false" />
 <field name="number" type="int" nullable="false" id="pk">
  <constraint type="primaryKey" />
 </field>
 <field name="suburb" type="varchar" size="64" nullable="false" />
 <field name="state" type="varchar" size="4" nullable="false" />
 <field name="postCode" type="varchar" size="4" nullable="false" />
 <field name="freightZone" type="int" nullable="false">
  <constraint type="foreignKey" foreignTableName="freightZones" foreignFieldName="number" />
 </field>
</table>

Secondly I use some rather fine tuned XSL scripts and a neat batch file to call MSXSL with the appropriate parameters to output the CREATE TABLE SQL, and the SQL for all the views and stored procedures into a file ready to be pasted into Query Analyzer. As you can see, the XML layout is very easy to understand and it would be equally easy to export this layout for existing tables in almost any database or to use this layout to generate any SQL or perhaps even documentation, diagrams, .NET classes or .NET Windows Forms with all the needed controls.

The tricky part in using XSL to generate SQL is getting commas between all fields except the last and, even more so, getting the whitespace output neatly. Fortunately I was cunning enough to ask a markup-wizard colleague of mine to do most of the XSL work for me. It was greatly appreciated as I only had to add the finishing touches and didn't need to read the W3C's XSL specs again. Admittedly, I kept the transformations simple. If I need to define a table with a multi-field primary key then I will need to adjust the resulting SQL before I use it. This happens rarely and is still easier than doing it all manually. There are probably other situations that cause the XSL to fall over but I haven't encountered them yet.

For anyone who has used similar techniques before or decides to try it now, I would like to hear some of your experiences. Here is an example of one of the XSL scripts to get you started:

<?xml version="1.0" encoding="iso-8859-1" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:output method="text" encoding="iso-8859-1" indent="no" />
 <xsl:strip-space elements="*" />
 
 <xsl:template match="table">
CREATE VIEW dbo.vw_<xsl:value-of select="@name" /> AS
SELECT
<xsl:for-each select="field[not(@id='deleted')]">
 <xsl:text> </xsl:text>
 <xsl:value-of select="@name"/>
 <xsl:if test="position() &lt; last()"><xsl:text>,&#10;</xsl:text></xsl:if>
</xsl:for-each>
FROM dbo.tb_<xsl:value-of select="@name" />
WHERE
<xsl:for-each select="field[@id='deleted']">
 <xsl:text> </xsl:text>
 <xsl:if test="position() &gt; 1"><xsl:text>AND </xsl:text></xsl:if>
 <xsl:value-of select="@name"/>
 <xsl:text> = 0</xsl:text>
</xsl:for-each>
GO

GRANT SELECT ON dbo.vw_<xsl:value-of select="@name" /> TO sales, visitors
GO
 </xsl:template>
 
</xsl:stylesheet>