September 22nd, 2013
Most modern applications have at least some degree of modularity and customizability. By customizability I mean its simplest form – having one “vanilla” application with standard features and many derived versions adapted with minimal modifications to the customer’s needs. It is common practice to use dependency injection (DI), and with it we can influence the behaviour of our application by replacing one component with another. But DI alone does not provide everything needed to make the application truly customizable. Each aspect of an application needs to support modularity and customizability in its own way. For example, how do you maintain a customized relational database?
This post is intended to be the first of a series on the issues and solutions we found while developing our own customizable application: a kind of introduction to the subject, so that I can write about concrete solutions to concrete problems as and when they pop up. Here I’ll give an overview of the general architecture, problems and possible solutions.
Our product is a .Net WinForms line-of-business application. I think WinForms is still the best environment for LOB apps as the others are still not mature enough (like WPF) or simply not suitable (like the web). I would say it’s also good for customizability because it doesn’t impose an overly complex architecture: customizability will make it complex by itself.
As I said, DI fits naturally at the heart of such a system. DI can be used for large-grained configuration in the sense that it can transparently replace big components with other components implementing the same features. It’s generally suitable for assembling interfaces and possibly bigger parts of the business logic. It’s not very suitable for a data access layer: even if you used a superfast DI container that adds no noticeable overhead when used with thousands of records, that would be just part of the solution. A bigger question would be how you can customize queries or the database schema. So, for data access we need a different solution, and I believe that this part of the puzzle is the most slippery since it’s so heterogeneous. Firstly, there’s usually the relational database that’s not modular at all: how do you maintain different versions of the same database, each with part of its structure custom-built for a specific client? (It would be, I suppose, a huge relief if an object-oriented database were used, but this is rarely feasible in LOB). Then, there are SQL queries which you cannot reuse/override/customize unless you parse the SQL. Then the data access classes, etc.
At this level, the required solution actually depends on the application architecture. If the architecture contains just what I described at the beginning, a homogeneous “vanilla” application and only one customizing plug-in per client, things get much simpler. For example, in this case it is possible to use SQL scripts to maintain the database schema: just develop a mechanism that remembers which scripts it has already executed so that they don’t repeat. This is possible because the scripts will always be run in their chronological order. Contrast this with having several modules with different features: if an existing client wants to add a module to his existing application, you would need to upgrade the database by running the scripts from the module. But the database is already up-to-date for the current configuration and the scripts from the new module may have been written years ago. For this scenario you’d probably need a smarter solution, and I’m not sure that current migration frameworks can support it… Which sidetracks us to one of the most important things in modularity: integration. Most of the “usual” frameworks employed in applications don’t fully support modularity. You have either to customize and integrate them into your application or write your own equivalents. Which presents a kind of continuity of approach: if you want to develop a customizable application, you have to customize other people’s code.
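The “remember which scripts were already executed” mechanism mentioned above for the single-plug-in case can be sketched roughly as follows. This is only an illustration in Python with SQLite, not our actual implementation: the applied_scripts table, the function name and the idea of sorting scripts by file name are all assumptions made for the example.

```python
import sqlite3


def apply_pending_scripts(conn, scripts):
    """Run each (name -> sql) script at most once, in name order.

    `applied_scripts` is a hypothetical bookkeeping table; names are
    assumed to sort chronologically (e.g. a date or counter prefix).
    Returns the names of the scripts executed by this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_scripts (name TEXT PRIMARY KEY)"
    )
    already = {row[0] for row in conn.execute("SELECT name FROM applied_scripts")}
    ran = []
    for name, sql in sorted(scripts.items()):
        if name in already:
            continue  # executed on an earlier upgrade, do not repeat
        conn.executescript(sql)
        conn.execute("INSERT INTO applied_scripts (name) VALUES (?)", (name,))
        conn.commit()
        ran.append(name)
    return ran
```

Note that this relies entirely on chronological ordering, which is exactly what breaks down in the multi-module scenario: a script from a newly added module may sort before scripts that have already been applied.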
What about the data layer? Once again, many things depend on the rules you establish. If you prohibit the higher level modules from deleting/renaming structures used by lower modules (which, I suppose, is common sense), you will be able to use SQL queries when needed – and given that SQL is unavoidable for complex queries, they most certainly will be. In this respect, SQL actually supports modularity because you can write a query using tables and columns from the base module and it won’t care if there are other structures added to them in the customization process. C# classes aren’t like that, but C# interfaces are.
For other things – like simple CRUD editing of data – an ORM is the logical choice. There are many ways in which you can add modularity to an ORM – and it seems to me none of those can be done without customizing the ORM. The route we took is to use dynamic ORM mapping to replace basic types (contained in the vanilla module) with customized ones (present in the client-specific customization module). This is possible only if the module hierarchy is simple – that is, there’s single inheritance between basic and customized module classes. If two or more different modules wanted to add something to the same data access class, a different method would need to be employed, and that would probably mean using either dynamically generated classes (reflection emit or code generation) or .Net 4’s dynamic objects.
One thing an ORM allows you to do is customize the queries themselves since they are not text-only but represented by a tree-like object structure. You could provide extension points to allow the derived modules to access this structure and modify your SELECTs or JOINs or WHEREs… With LINQ it would probably need to be done by building the query in stages and calling extension methods for each stage because a LINQ query can only be modified by making a different copy of itself. Other ORMs may prove to be more flexible – for example, modifying an NHibernate QueryOver/Criteria query is absolutely trivial since it allows direct modification of the expression structure.
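To make the “build the query in stages” idea a little more concrete, here is a rough sketch of the pattern. It is written in Python as a stand-in and is not LINQ or NHibernate code; the Query class, the extension functions and the column names (Active, LoyaltyLevel) are all hypothetical. The point is only that each stage receives an immutable query and returns a modified copy, which is how a derived module could hook into the base module’s query.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    """Hypothetical immutable query description (not a real ORM type)."""
    table: str
    columns: tuple = ("*",)
    filters: tuple = ()

    def to_sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql


def build_query(base, extensions):
    """Pass the query through each extension point; every stage returns a copy."""
    query = base
    for extend in extensions:
        query = extend(query)
    return query


# Extension supplied by the vanilla module (illustrative only).
def only_active(query):
    return replace(query, filters=query.filters + ("Active = 1",))


# Extension supplied by a client-specific module (illustrative only).
def add_loyalty_column(query):
    return replace(query, columns=query.columns + ("LoyaltyLevel",))
```

The base query object is never mutated, so the vanilla module can keep reusing it while each customization module contributes its own stage.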
One thing that using LINQ may offer is automatic integration with smarter UI controls: there are some third-party controls that understand LINQ queries and can accept them as a data source. They then modify them to perform server-side paging (with sorting and filtering) in order to retrieve just the data needed for display… In that case, all the customizations done on the data layer directly carry through to the interface.
For this and other reasons, my company chose to ignore the recommendation of not using data access classes in the business logic or interface layers. If you add a customer-specific field to an existing table, you need to piggy-back it to the appropriate data layer class and it will be transported through the bowels of business logic right to your special overridden method where you can use it. It also goes straight to the CRUD UI and you can get away with just adding a control for this field to the existing form and not changing anything else.
That part sounds deceptively easy – “can you just add another field for me to the existing form, please?” Visual Studio 2002 supported derived controls which could have been used for this, but Microsoft was incapable of getting this feature to work so they simply abandoned it. For this, WPF is somewhat more suitable because it separates visual from logical control structure so that theoretically you can add a control to the form and then have a UI designer person go over it and make sure everything is in place. But WPF is also good because its layout logic is dictated from the inside out – that is, everything is laid out to suit the content, while in WinForms the content gets the space allotted to it by its parents. So, if you use WinForms you have to get a smart layout control – and customize it. I’m under the impression that the whole WinForms designer should be replaced by something less verbose, for example some kind of a fluent interface. There would have to be an intelligent layout engine underneath it so that it can get everything looking right. But once you get that (yes I know, this also sounds simple but it should be possible to implement some basic logic) a whole world of opportunities opens, because then we can leave it to the engine to actually choose which control to display for which field, which validation logic to attach to it etc. Fluent UI code would also be much easier to merge in your versioning system than the dreaded *.Designer.cs files.
This is probably enough to illustrate the scope and type of the subject. Since it’s ongoing research I allowed myself to mention some solutions we employed and some we hope to employ. There are a lot of other subjects like validation, business rules, data models that support customization etc. I will be working on those and posting my findings here, and these future posts will be more focused because each will be more practical and cover a single topic.
November 16th, 2012
How do you get the text displayed in a WPF DataGridCell? It should be simple, but incredibly it doesn’t seem to be: all the solutions given on the ’net contain at least a page of code (I suppose the grid designers didn’t think anyone would want to get the value from a grid cell). But when you quick-view a DataGridCell in the debugger, it routinely shows the required value in the “value” column. It does this by calling a GetPlainText() method, which, unfortunately, isn’t public. We can hack it by using reflection – and, absurdly, this solution seems more elegant than any other I’ve seen.
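DataGridCell cell = something;
// Look up the non-public GetPlainText() method via reflection.
var value = typeof(DataGridCell).GetMethod("GetPlainText",
System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance)
// Assumed completion of the truncated statement: call it on the cell, no arguments.
.Invoke(cell, null);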
August 16th, 2012
It’s a silly error but the solution is not very obvious or logical… I modified my CruiseControl.Net configuration to include multiple source control nodes, but it started complaining that the XML was malformed. The error was something like
[CCNet Server] ERROR CruiseControl.NET [(null)] - Exception:
Unable to instantiate CruiseControl projects from configuration document.
Configuration document is likely missing Xml nodes required for properly populating CruiseControl configuration.
Missing Xml node (sourceControls) for required member (ThoughtWorks.CruiseControl.Core.Sourcecontrol.MultiSourceControl.SourceControls).
Xml: <sourcecontrol><sourcecontrols><hg><executable>C:\Program Files\TortoiseHg\hg.exe</executable> […]
It complains of a missing sourceControls node for required member SourceControls – but it’s present in the xml. The problem is the capital “C”: 1. XML is case-sensitive. 2. CruiseControl config tags are inconsistent in that sourcecontrol is not camel-cased but sourceControls is. That’s what confused me – hopefully this post will help someone else.
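The case-sensitivity part is easy to demonstrate with any XML parser. The following is a small sketch using Python’s standard library on a simplified fragment of the config (real CruiseControl.Net configuration has more attributes and nodes, which are left out here): a lookup for sourceControls simply does not see an element spelled sourcecontrols.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the offending config: note the all-lowercase inner tag.
config = ET.fromstring(
    "<sourcecontrol><sourcecontrols><hg /></sourcecontrols></sourcecontrol>"
)

# The element exists under its lowercase name...
lowercase_match = config.find("sourcecontrols")
# ...but a lookup with the camel-cased name finds nothing.
camelcase_match = config.find("sourceControls")

print(lowercase_match is not None, camelcase_match is None)
```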
May 7th, 2012
HQL doesn’t seem to support clauses like “SELECT TOP N…”, which can cause headaches when for example you need to get the data for the newest record from a table. One way to resolve this would be to do something like “SELECT * FROM X WHERE ID in (SELECT ID FROM X WHERE Date IN (SELECT MAX(Date) FROM X))”, a doubly nested query which looks complicated even in this simple example and gets out of control when query conditions need to be more complex.
What is the alternative? Use EXISTS – as in “a newer record doesn’t exist”. It still looks a bit ugly but at least it’s manageable. The above query would then look like this: “SELECT * FROM X AS X1 WHERE NOT EXISTS(SELECT * FROM X AS X2 WHERE X2.Date > X1.Date)”
Note that this works only for “SELECT TOP 1”. For a greater number there doesn’t seem to be a solution at all.
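As a quick sanity check that the two formulations pick the same row, here is a sketch that runs both against an in-memory SQLite database from Python. This is plain SQL rather than HQL, and the sample rows are made up; it only illustrates the logic of the NOT EXISTS rewrite, not how NHibernate will translate it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (ID INTEGER PRIMARY KEY, Date TEXT)")
# Made-up sample data; ISO dates compare correctly as text.
conn.executemany(
    "INSERT INTO X (ID, Date) VALUES (?, ?)",
    [(1, "2012-01-05"), (2, "2012-03-20"), (3, "2012-02-11")],
)

# Doubly nested version: find the max date, then the IDs, then the rows.
nested = conn.execute(
    "SELECT * FROM X WHERE ID IN "
    "(SELECT ID FROM X WHERE Date IN (SELECT MAX(Date) FROM X))"
).fetchall()

# NOT EXISTS version: keep the row for which no newer row exists.
not_exists = conn.execute(
    "SELECT * FROM X AS X1 WHERE NOT EXISTS "
    "(SELECT * FROM X AS X2 WHERE X2.Date > X1.Date)"
).fetchall()

print(nested, not_exists)
```

If several rows share the newest date, both queries return all of them, so neither is a strict “TOP 1”.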
March 14th, 2012
The unexpected answer (that I learned the hard way) is: well, it depends on whether you have parameters on your command or not. The point being, if you execute a parameterless SqlCommand, the SQL gets executed directly, the same way as if you entered it into the query analyzer. If you add a parameter, things change in that a call to the sp_executesql stored procedure gets inserted in the executed SQL. The difference here is the scope: if you create a temporary table from within sp_executesql, its scope will be the stored procedure call and it will be dropped once the stored procedure finishes. In that case, you cannot use different commands to access it. If you execute a parameterless command, the temporary table will be connection-scoped and will be left alive for other commands to access. In that case, the other commands can have parameters because their sp_executesql call will be a child scope and will have access to the parent scope’s temporary table.
As to why they did it this way, I can’t say I understand.
January 13th, 2012
Migrating from SVN to Mercurial is a simple process only if the SVN repository has a straight-and-square structure – that is, there are trunk, branches and tags folders in the root and nothing else, not in its present state or ever before. If you used your SVN repository in a way that was convenient in SVN but not in Mercurial – for example, you created branches in various subdirectories – it still shouldn’t be too hard to migrate. But if you, like me, decided late in the game to create the mentioned folders in SVN and then moved and renamed your folders, you will need to invest serious time if you don’t want to lose parts of your history. You need to plot your migration very thoroughly and do a lot of test runs.
The reason for this is that Mercurial’s ConvertExtension is somewhat of a low-level tool. (In other words, although reliable it is not too bright). Browsing the internet you may get the impression that it’s an automated conversion system: it isn’t. It does fully automated migration only for straight SVN repositories, but for the rest it’s more like something to use in your migration script. It seems to do its primary purpose – converting revisions from one repository format to another – quite well but the rest of the tool is not so intelligent and it needs help. So, lesson number one: if you have a complex repository, don’t take the migration lightly.
A small disclaimer is in order: this post is not intended to be a complete step-by-step guide to migration. Rather, it’s something to fill in the blanks left by what little is available on the internet. I’ve done a complex migration and I want to do a brain dump for my future reference or for “whomeverother it may concern”.
Between the two alternatives I perceived as most promising, hgsubversion and the convert extension, I chose the latter. Hgsubversion was claimed by some to be the better tool for this job, but it was somewhat troublesome. The problem with hgsubversion was that it had a memory leak and broke easily in the middle of the conversion (note that this happened a couple of months ago: things may have changed in the meantime). The solution, they say, was to do hg pull repeatedly until it finishes. I wanted to do a hg clone with a filemap, but when the import broke I was in trouble because hg pull doesn’t accept filemaps. (It could be that the filemap was cached somewhere inside of the new repository and my worries were unfounded, I don’t really know). I may try that in the future. One other way around it would be to do a straight clone of SVN – no branches or anything – into an intermediate Mercurial repository and then split that into separate final repositories. In that case, hgsubversion could be a viable solution, maybe even better than the convert extension. I had more success with the convert extension so this is what we’ll talk about here.
Part 1 – splitting import by revision
The repository in question here – that is, a folder within the SVN repository – started as part of another project and only a thousand revisions later was moved into its separate folder – thankfully, at that point I at least created the proper trunk and branches folders. So, I had one part of history where everything was trunk but it moved around the repository, and another part where nothing moved but I had a trunk and a couple of branches. Luckily, I had a buffer zone of a couple of hundred revisions where everything was trunk and nothing moved so I didn’t have to pinpoint the exact revision on which I had to split the import.
Let’s say that the early version had a structure like this:
Crm/Other unimportant folders (that is, not to be imported)
At revision 1000 the first two folders were moved to
The first branches appeared at revision 1200 in Framework/branches.
So, it was to be done like this:
- Step one, import everything up to revision 1100 into the default branch. Include the Crm/Fwk* folders and Framework/trunk/Fwk* folders. As I said, in this revision the Framework/branches folder was empty so we don’t lose anything.
- Step two, import the rest but tell the conversion extension that the branches are in the Framework/branches folder so that it picks them up properly.
Sounds simple? Note that I had to do some serious research on my repository: had a branch been created earlier than revision 1000 or had I made a branch at the same moment I created the trunk, things would have been more complicated. That is to say, I would have probably had to split the import into more steps and repeat some operations and do more testing to see on which exact revision I should stop the first step. Lesson number two: know thy repository.
The first step of this import is not so hard. I used the --rev 1100 argument to stop it at revision 1100: the convert extension purportedly remembers what it imported so far and when called again continues at that point (well, not exactly… read on).
"c:\Program Files\TortoiseHg\hg.exe" convert d:\data\Subversion Framework --rev 1100
-s svn --filemap=fwkmap_step1.txt
Note that I have access to a local SVN repository – for some reason, the convert extension didn’t want to access the local repository using an svn:// url (possibly the firewall had something to do with it).
The only thing left is to make a good filemap. Something like this:
rename Framework/trunk .
What we want to do here is to include the old folders – this is the first pair of lines: the folders exist only in earlier revisions since they were removed (that is, moved) later. I was under the impression that it would have been sufficient to include just “Framework/trunk” and that the convert extension would somehow detect where this path originated from and include the full history, but it didn’t work out. On another repository I tried I was surprised to see that it actually did something like that, but it may have been a coincidence (a combination of other includes, possibly). In any case, it doesn’t hurt to specify the filemap as precisely as possible since you may have to fiddle with various parameters and do repeat runs. Make the filemap tight so that nothing unexpected leaks through it and eliminate any uncertainty.
The last line of the filemap – “rename Framework/trunk .” – tells it to make the trunk folder root. This is to make sure the structure of the folders is the same as it will be in the second step, where we use different parameters and import a completely different structure into the same folders.
Always keep in mind that the filemap (probably as well as everything else) is case sensitive. I spent hours debugging my imports because I didn’t notice the difference in case. Also, if you have a folder (or file) whose case has changed through history, it may be wise to add a rename statement in your filemap to make it consistent so that the conversion logic understands that it’s the same file/folder in different revisions (otherwise I’m not sure it would?).
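Case differences are easy to miss by eye, so it may be worth checking for them mechanically before writing the filemap. Below is a small sketch in Python; how you collect the list of paths from the SVN history is up to you, and the paths in the usage example are hypothetical. It simply groups paths that differ only in letter case, which are the candidates for a rename line in the filemap.

```python
from collections import defaultdict


def case_variants(paths):
    """Return groups of paths that are identical except for letter case."""
    groups = defaultdict(set)
    for path in paths:
        groups[path.lower()].add(path)
    # Keep only the groups where more than one spelling was seen.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical paths collected from different revisions.
print(case_variants(["Crm/FwkCore", "Crm/fwkcore", "Framework/trunk"]))
```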
In step two, we tell it where the trunk and branches are, using the --config convert.svn.trunk and convert.svn.branches parameters. I’ve come to the conclusion that this changes the game for everything: the convert extension regards trunk and each branch as a root folder so a filemap like the one from the first step wouldn’t work. I haven’t tried it with pathnames relative to trunk and/or branches, though, and it may be worth investigating. In this case I didn’t need a filemap because after revision 1100 everything was done “by the book” in the Framework/trunk and Framework/branches folders.
So, when I ran both steps I got two distinct revision lineages: one that started at 0 and finished at 1100 and included the first run, and another that had revisions from 1000 onwards, but there was no connection between the two, each ended with its own head. And – oh, yeah, I got the branches the way I wanted them in the second part. But, how to connect the two parts?
The thing is, when running an import from a local repository (be it SVN or another HG repo), a file called SHAMAP is stored in the .hg folder of the destination repository (if you import from a remote repository, there’s an equivalent file the name of which I forgot – I believe it’s stored somewhere in .hg/svnsomething). The SHAMAP file contains pairs of revision hashes/numbers so that it knows which source revision was converted into which destination revision. For SVN import, it contains a GUID for the repository and a revision number, in the format of “SVN_REPO_GUID@SVNREV”. I’m also under the impression that revisions stored here won’t be imported again on subsequent repeated conversions. This behaviour is (as far as I know) wrong, because a filemap include/exclude may cause a partial import of a revision and other parts of it may need to be updated again in the following steps. In such cases it is you who needs to help by supplying your own revision mapping file, and that’s probably what the convert extension authors also thought because it can be done by supplying the REVMAP parameter to hg convert. Remember what I said about it being a low-level tool? This is it. You need to write your own script to do the import properly, and hg convert is a tool used in the script. The bottom line – at the end, you should know what you’re doing. You can (and probably will) learn as you go, though, so don’t be afraid to experiment. (And while we’re at it: if you’re doing a time-consuming import in multiple steps, test each step separately and when you’re satisfied with it, zip the resulting repository so that you don’t have to repeat that step while testing the next one).
But I digress… Back to SHAMAP: the fact that it remembers revisions already imported and doesn’t allow repeated imports didn’t bother me here because I don’t have overlaps – that is, I don’t need to import the same revision (but different files) in multiple passes. The Crm/* folders disappear long before revision 1000 and at that point I only need Framework/trunk, which is also true in the second step that comes in after revision 1100.
Ok, but I did get duplicate revisions. It imported Framework/trunk up to rev 1100 in the first step and then again imported Framework/trunk from its inception to the end. Looking at SHAMAP shows why: the revisions were registered in a different way here in the second step. Instead of the “SVN_REPO_GUID@SVNREV” format, it stored something like “SVN_REPO_GUID/Framework/trunk@SVNREV”. Why? I’m not sure, it may have something to do with treating the trunk and branch folders as roots. It’s probably an attempt to prevent the problem mentioned above, when a revision needs to be imported multiple times. But this is far from complete, because in that case the filemap also needs to have similar influence on the SHAMAP so that it reflects both filtering and renaming policies set by it. (Mission impossible, I know… That’s probably why the convert extension is badly documented – when you need to explain something like this you risk receiving questions like “so why didn’t you make it better to prevent this problem?”).
One solution for this could be the splice map: it’s a file wherein you can define which revision needs to be connected to which during import. I tried this without success (I suspect that I didn’t pick the right revisions – probably the two spliced revisions need to be identical) but found a hack that produced immediate results: I opened SHAMAP and did a quick find/replace of “SVN_REPO_GUID@” with “SVN_REPO_GUID/Framework/trunk@”. This converted the revision identifiers into the format the second step used, so it understood them and connected them correctly.
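The find/replace hack can also be scripted, which is handy when the step has to be repeated over several test runs. Here is a minimal sketch in Python that works on the file’s text; the function name is made up, repo_id stands for whatever GUID your SHAMAP actually contains, and the two-column layout in the usage example is an assumption based on what I saw in my own file.

```python
def retarget_source_ids(shamap_text, repo_id, root="Framework/trunk"):
    """Rewrite 'REPO_ID@REV' entries to 'REPO_ID/<root>@REV'.

    Entries already in the new form are left alone, because in them the
    repository id is followed by '/' rather than '@'.
    """
    return shamap_text.replace(repo_id + "@", repo_id + "/" + root + "@")


# Hypothetical two-line SHAMAP content; the ids and hashes are placeholders.
sample = "SVN_REPO_GUID@1099 aaaa\nSVN_REPO_GUID@1100 bbbb\n"
print(retarget_source_ids(sample, "SVN_REPO_GUID"))
```

In practice you would read the SHAMAP file, keep a backup copy, and write the returned text back before running the second step.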
Here’s the command used for conversion.
"c:\Program Files\TortoiseHg\hg.exe" convert d:\data\Subversion Framework -s svn
--config convert.svn.trunk=Framework/trunk --config convert.svn.branches=Framework/branches
--config convert.svn.tags=Framework/tags --branchmap=fwkbranchmap.txt
The fwkbranchmap.txt file has one line (I’m not sure why it is needed anyway, I supposed that the convert extension understands that “trunk” in SVN is “default” in Mercurial):
So this is one way to do it. I thought I would need to investigate things further for the import of other repositories, but came up with a different strategy. So it’s left at this state, a bit unpolished but usable.
Part 2 – splitting import by trunk and branches
For the second repository, I had three projects of which two were partial branches of the third one which from now on is to be considered the trunk. So I thought I could import them one by one: the trunk has moved a bit through the repository, and the branches have stayed mostly in place. This is what it looks like:
trunk – up to revision 2000:
Crm/Fwk* which were imported in part 1 and need to be ignored now.
trunk – after revision 2000:
Crm/branches - which we will ignore to make things simpler, as they are obsolete anyway
branch for client1:
Client1/branches – ignored for simplicity
branch for client2:
Client2/branches – ignored for simplicity
What do we do now? Split the repository vertically – import the trunk folders into the default branch in step 1. Then import the first client in step 2 using a branchmap to move the default branch into a new “client1” branch and repeat for the client2 branch. We won’t use the convert.svn.branches parameter but import each branch explicitly. This we can do even with a straight Mercurial-to-Mercurial conversion, which I tried to do: I imported the full SVN repo into a Mercurial repo and then did the next conversion from it. I thought it would be faster: it wasn’t. Also, the difference in speed between importing from an SVN repository folder and importing from a local SVN server is not significant.
In a case like this, you need to exert full control over the import. Treat the convert extension like it doesn’t know much and tell it all the details about the conversion. From what I’ve seen, its logic is somewhat counter-intuitive: it would make sense for it to reconstruct the revision history by following each file through its revisions, whether through branching or moving around in the repository. In that case it could reconstruct a file’s history from its creation till today, and all you would need is to tell it where the file is today. But it doesn’t do it like that: instead, you give it a bunch of include/exclude filters to tell it which files to retrieve, and these filters are applied at any point in history. If you made an accidental move of a folder at some point in the past, make sure you include that path also or your revision history will stop at that point.
We will possibly be importing each revision multiple times (in case multiple branches were committed to SVN in a single commit – which is improbable but possible), but each time with a different include/exclude filter in the filemap – for this, we have to make sure the filters don’t overlap unless necessary. Here we come to a new problem: it seems that the convert extension doesn’t want to do repeated conversions of old history. It seems to remember what was the last revision imported and only imports newer ones. When convert.svn.branches or convert.svn.trunk is used, it views the revisions differently (relative to a different root) and doesn’t mind importing them again. But we won’t use those here, at least in this case, and it wouldn’t have made much difference anyway – I think it would just help with this particular problem and nothing else.
We solve this by using an empty REVMAP file. An empty REVMAP will replace the SHAMAP and make it look like nothing was imported yet. And, better still, we can also use it to get rid of the splicing problem: put in it the mapping for revisions where the source was branched so that the two branches connect at the proper point. Otherwise, the conversion will create branches that aren’t connected. How do we do this? After the first step – in which we import the trunk with all of its history from day one (and here I assume that all branches originate from it) – we should have all junction points in the repository. The next step would be to view the history for each branch and note at what revision in the trunk it was branched. We find that revision in our SHAMAP file (which was updated at step 1 – trunk import – but won’t be used afterwards) and add this line in our REVMAP file for that branch. It may happen that multiple revisions from the source repository were imported into this one in our new trunk; in that case I put all of them in the REVMAP, just in case.
One important thing to note here is that we have full power over the outcome. If we misconnect the branches, we may get odd results – I don’t expect the source to be screwed up, but the history may become a bit strange. In fact, the complicated combination of filemap filtering, splicing and the rest may produce some odd contents in the repository. I managed to get a folder that was deleted years ago to reappear in the latest version of the repository. It seems that it was deleted on the trunk after a branch was split from it, but the branch didn’t include it in the first place – it probably happened that the filemap filters for the branch made it ignore the folder completely, and since it wasn’t mentioned anywhere in the branch history (primarily not as being deleted), it appeared in the branch as it was at the point where the branch was created. Strange stuff, but instead of trying to tune everything to get it correctly imported (and risk getting more unneeded garbage in the process), I simply deleted the folder and committed this change in the destination repository.
There’s another lesson learned here: in order to make the import for this part easier, I tried moving folders around in the SVN repository to recreate the canonical trunk/branches/tags structure. If you want to do your conversion without convert.svn.branches – that is, convert each branch separately – don’t do it. It will only make your life harder because you will also have to include that folder in your filemap, and probably rename its contents to become root. I myself stripped this revision from the destination repository as if it never happened.
The command sequence looks something like this. First the trunk import:
"c:\Program Files\TortoiseHg\hg.exe" convert Full Crm -v --filemap=crmfullmap_step1.txt
(I added the “-v” switch so that I can see what’s going on, remove it if the output is too verbose… This switch is useful because it causes the printout of all files included, so you can check to see if there’s anything suspicious – it won’t be of too much help determining whether something’s missing but you’ll be able to see if there’s anything you don’t want).
At this point we need to look at the SHAMAP file generated and create REVMAP files for each branch, as described above. Since the file will be overwritten by the import logic, you may want to make a revmap template file and copy it to the real file used each time this step is run (I zipped the repository after the first step so that I can repeat the second part as many times as needed until I get it right). It looks something like this:
copy "crmfull revmap template step 2.txt" crmfullrevmap_step2.txt
"c:\Program Files\TortoiseHg\hg.exe" convert Full Crm --filemap crmfullmap_step2.txt
--branchmap crmfullbranchmap_step2.txt crmfullrevmap_step2.txt
Note that here I have a Mercurial copy of SVN in the folder named "Full", and I import from it. The same thing could probably be done directly from SVN, only the REVMAP file would look different.
The "crmfullrevmap template step2.txt" file:
Here be7726e1e2c98b3694b0c28ca5f058769a382018 is the hash ID of the revision at which the branch and trunk join. I got this number by going to the original SVN history to see what revision it was joined at, then went to my destination repository (imported at step 1) to find the equivalent Mercurial revision – luckily, the SVN revision numbers are kept even in svn-to-mercurial-to-mercurial conversion. Just to be sure, I put in all the lines from the SHAMAP file where this revision appears in the right column.
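Picking those lines out of the SHAMAP by hand is error-prone, so here is a minimal Python sketch of the filtering. It assumes each SHAMAP line has the form "source-id destination-id" separated by whitespace, with the destination hash as the last token; the function name is my own invention, not part of Mercurial.

```python
def lines_for_destination(shamap_lines, dest_hash):
    """Return the SHAMAP lines whose right-hand column equals dest_hash.

    Assumption: one mapping per line, "<source id> <destination id>",
    so the destination hash is the last whitespace-separated token.
    """
    matches = []
    for line in shamap_lines:
        parts = line.rstrip().rsplit(None, 1)  # split off the last column only
        if len(parts) == 2 and parts[1] == dest_hash:
            matches.append(line.rstrip("\r\n"))
    return matches
```

Reading the SHAMAP file into a list of lines, passing it through this function together with the join revision's hash, and writing the result into the revmap template would reproduce the manual step described above.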
The branch file crmfullbranchmap_step2.txt is a one-liner to move everything into the appropriate branch, called "client1":
This process is then repeated for other branches.
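Since the same two commands are repeated for every branch, the step could be wrapped in a small script. The following is only a sketch under assumptions: the hg.exe path and the argument order are taken from the commands above, while the function names and the idea of passing the file names as parameters are mine.

```python
import shutil
import subprocess

# Path as used in the commands above; adjust for your installation.
HG = r"c:\Program Files\TortoiseHg\hg.exe"

def build_convert_command(source, dest, filemap, branchmap, revmap, hg=HG):
    """Return the argument list for one per-branch 'hg convert' run."""
    return [hg, "convert", source, dest,
            "--filemap", filemap,
            "--branchmap", branchmap,
            revmap]

def convert_branch(source, dest, filemap, branchmap, revmap_template, revmap):
    """Restore the revmap from its template, then run the conversion."""
    # The converter overwrites the revmap, so always start from a fresh copy.
    shutil.copyfile(revmap_template, revmap)
    command = build_convert_command(source, dest, filemap, branchmap, revmap)
    subprocess.run(command, check=True)  # raises if hg reports a failure
```

Calling convert_branch once per branch, each time with that branch's filemap, branchmap and revmap template, mirrors the manual sequence; restoring the zipped repository between attempts would still have to be done separately.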
The end result here is that we have working repositories in Mercurial, have been using them for a couple of months now (yes, this post is a bit old but hindsight is also worth something) and all seems right. I haven't noticed losing any SVN revisions (although I may have) and the imported Mercurial repositories behave just like any others – they even exhibit Mercurial's flaws (like problems with Unicode comments) the same way in the imported and newly created revisions. So, this procedure may be far from perfect but it did the job. If someone creates a better and more automated one, I'll be sure to try it since I have a couple of low-key projects still left in SVN and awaiting migration.
December 29th, 2011
(Note: this text is a result of a couple of months of research… Only when I finished the migration and got to the point where things were running smoothly did I get around to finishing this post. Some of this information may be a bit outdated, but from what I've seen things haven't moved much in the meantime. Anyway, I added comments in the text where the situation might have changed).
Now that I think about it, there was one crucial reason to abandon SVN and upgrade to something more powerful: merging. My company now has a lot of parallel development (several customized instances of an application), and SVN's support for this is not as good as it should be – that is, not as good as the competition's. In SVN you can separate a development branch and keep it parallel to the main one, and merge-reintegrate it back. SVN will remember what revision it merged so that it doesn't try to do it again (which would otherwise produce merge conflicts). But that seems to be the limit of its capabilities: with a slightly more complicated structure, a merge produces so many conflicts that it is no better than doing everything manually. Contrast that with a more recent VCS like Git or Mercurial, where the essential feature is the ability to manipulate changesets, stack them one upon another or restructure their connections: if you can merge a set of changes from one branch to another, then continue developing both of them, do a back-merge (even though you shouldn't), and the system doesn't produce an enormous amount of conflicts, it gives you more power.
Of course, there are additional advantages that both Git and Mercurial give over SVN, but in our case that was just a bonus, not a necessity. I believe that many developers think likewise. SVN is good enough at what it does; the problem is when you need something that it wasn't designed to do.
So which one to choose? It seems that Git, Mercurial and Bazaar are the most popular. People say that of the three Git is the most powerful because it was written by Linux kernel developers who do complicated merges every day. Ok, so it seemed natural to choose that one.
The first impression that I got is that Linux kernel developers are guys who talk between themselves in hexadecimal and to the rest of the world in mnemonics. Git had some documentation but not nearly enough explanation of how things work. The error messages the tools produced were cryptic – and not only that, they were formatted to be viewed in the console, so when they are displayed by GUI tools they tend to get messed up. Ok, so Linux kernel developers think GUIs are lame… While we're at it, error messages are also misleading: I spent a lot of time trying to diagnose what was wrong with my repository because the clone operation kept producing a warning about the remote head pointing to a non-existent revision or branch (and yeah, it was a fatal warning – a new term, at least for me – because the command didn't do anything but only gave a warning). It turned out in the end that I was using a wrong URL for the repository – that is, I was trying to clone a nonexistent repository. I agree, in a nonexistent repository the nonexistent head points to a nonexistent revision, but it's a bit of a roundabout way to say it, isn't it?
But these are the things that can – and most probably will – be addressed as the tools mature. Since there are so many Git users, it's surely reliable enough. I don't mind some rough edges, I've been there with SVN and it was worth it (it was much better to cope with occasional bugs in SVN than to start with CVS and then migrate to SVN).
Ok, so how do we get Git to work on a Windows server? The answer that seemed complete enough hinted that I should simulate a Linux environment, install the shell, something like an SSH daemon and everything else. In other words, it's not supposed to run on Windows but it can. Ok, I tried – there are several variations on the theme and each one had a glossed-over part that needed to be researched (something like "at this point you need to import the created certificate into Putty" – well, Putty doesn't want to read this file format). And it didn't help that Git itself doesn't always tell you the real reason something doesn't work, as I already mentioned. Moreover, it was for some reason allergic to empty repositories – unless I committed something locally, it wouldn't work. And the repository got easily screwed up while experimenting – that is, it got into a state where I didn't know what to do with it and it was easier to create a new one.
At this point it was clear that the client tooling also leaves a lot to be desired – there was TortoiseGit that locked up occasionally on clone/pull (and it never displayed the progress of the operation so you never really knew if it was doing something), there was Git GUI that was a bit more stable, and there was Git Bash that was the most reliable. (One interesting sidenote is that at one point I managed to get three different error messages for doing the same – clone – operation with these three tools). One thing, though: the Bash in Git Bash is probably the best part of the package, I had almost forgotten the comfort of working in a Unix shell. Command prompt is light years behind it.
I did get the server to work, though, after a week of running around in circles and repeating the same stuff with slight variations. In the end I was able to create a new repository, clone it, make changes, commit, push, pull… Everything worked until I restarted the server. Or so it seems – when I tried to clone the same repository afterwards, it again started producing weird error messages. I didn't know what else changed except for the restart (which shouldn't have affected it – and probably didn't, but what else? It's an isolated environment). If I didn't have the cloned repository with revision history I would have doubted I actually succeeded in doing it… Ok, so it's also fragile.
Then I tried to find an alternative: there's a couple of PHP scripts that wrap Git server functionality, but they don't seem to work on Windows (I tried them in Apache). There's Bonobo Git server that is written in .Net – well, I never looked at it seriously, how's a .Net Git server going to work when their own true-to-the-original-idea SSH configuration doesn't? But it does work – it also needs a bit of tinkering (you have to Google-translate a page in Chinese to get the info on how to really do it, WebDAV etc.) but the installation is amazingly painless: it takes a couple of hours, which is nothing compared to a week wasted on the previous experiment.
So, on to the next step: migration. I migrated a test folder from SVN with no trouble. Tried a bigger one, something under 1000 revisions – well, it finished in a couple of days, I suppose it's tolerable. Finally, tried to convert the central project – and when after two weeks of non-stop import it managed only 2000 of the repository's 5000 revisions, I gave up. Back to Google: why is it so slow, how to speed it up? Turns out that the Git client is developed on Windows XP and that it should probably work well there. As it did: it managed to get all 5000 revisions in a couple of hours. Ok, now this I didn't like. How can a serious tool not work on newer Windows versions? They said it's slow because of the UAC (introduced in Windows Vista), the UAC slows everything down. Well, it's not like Vista was released yesterday. If this problem has existed for years, should I expect a solution ever to appear? More research hinted that Linux kernel programmers think Windows users are lame. So – Git was slow on Windows newer than XP. TortoiseGit seems to execute the same Git shell commands, so it's the same. I found Git Extensions in the meantime, which is supposed to be independent – but it didn't even handle HTTP repositories.
In the meantime, I tried cloning the big repository I converted – big as in 200 megabytes big – and it was, as expected, slow. But I don't really know which one was to blame here – it seems the Bonobo server choked on the large repository, since it nailed the CPU at 100% and produced around 40 bytes per second to the client (possibly it was just some kind of sync messages and no data at all). Ok – Bonobo is open source and was built around GitSharp (or something like that), a Git library written in .Net. What if I tried to update the library and compile the server myself? Well – GitSharp is discontinued at version 0.3. They've all gone to some new library written in C.
Ok, that was enough. After three weeks completely wasted, I gave up on Git.
(Update: Bonobo has since been upgraded to version 1.1. I looked at the change log hoping to see a note about moving away from GitSharp, but it didn't seem to happen. So as far as I know, this performance issue may still be present – nevertheless Bonobo seems the most promising solution for a Windows Git server).
So? Should I look at Mercurial? The Mercurial supporters seem a bit shy – that is, compared to Git fanatics who shout GIT! GIT! GIT! GIT! at each possible occasion, there occasionally appears one that says "or, you could try Mercurial".
Well, the first look revealed the difference: I can download everything I need from a single web page. Mercurial basic installation – right there, server support included (as a script that is installed in a web server). TortoiseHg, right below it – it even has an ad-hoc server built in! Python extensions – do I need this? So I thought – no, this is too easy. Let's try something unheard of in Git – RhodeCode. It's a server implementation that is in fact a full-featured web application. Seems very nice, but due to some incompatibility in a crypto library, very hard to get installed on Windows: it took a lot of workarounds, I ended up installing Visual C++ Express 2008 (it has to be 2008, 2010 doesn't cut it) and another kind of simulated shell environment (MingW32) to try to get the installer to compile it from source, but it was impossible. That is: impossible on Windows 2008 R2, 64-bit. The RhodeCode developers say they're working on making it more compatible with Windows (and for one thing changing the crypto library), and I found that I believe them, so I'll be coming back to it. (In the meantime, they've released a couple of versions with Windows-specific bugfixes, so it might be worth checking it out again).
In the vanilla Mercurial installation there's a script called hgweb.cgi that contains the full functionality needed to get a server running. A bit of tinkering is needed to make it run inside IIS – and there are a couple of slightly outdated tutorials on how to do this. I found out that the best combination is to download Mercurial for Python – so, no Mercurial or TortoiseHG on the server. This download says in its name which version of Python it was written for, and that version of Python is the second thing needed. Once both are installed, it is sufficient to put hgweb.cgi in a virtual server in IIS, add an hgweb.config and a web.config file, configure IIS (basic authentication and what have you), and set permissions on the folders – including the repositories. It took less than one day, research included, to get it up and running.
The client tooling seems better than SVN. TortoiseSVN (in fact, TortoiseCVS) was a breakthrough idea – a VCS integrated into Windows, allowing you to version-control any folder on your disk. Well, TortoiseHG went one step further and actually improved the user experience. It has its bugs – more than SVN – but also has a lot more features. The whole thing was written in Python and seems to have a good API because a lot of plugins have been written for it, and TortoiseHG includes the most important ones. At this point I had to install TortoiseHG on the server because that's the only way to get the Subversion converter plug-in. The other way would be to install the Python module, but it cannot be done: first of all, they'll tell you to install Subversion for Python (which is quite simple, there's a pre-packaged downloadable setup), but when you do and get an error from the convert extension, you'll find out that you don't need that package but something called SWiG. But SWiG doesn't have anything to do with Subversion – you have to download the Subversion package from the Subversion site, which moved to Apache leaving the Python part behind, and the best you can do is find a source somewhere and compile it, but nobody says how it's done.
On to converting the repositories – for one thing, its speed is normal, on any Windows. As fast as Git on XP, maybe even faster. So I was encouraged to do the thing I never even got around to thinking about with Git – and that is splitting the repositories into individual projects. It did take a week – the details of it will be a subject of a future post – but in the end it produced much less pain than Git.
Looking at it now, I think that Mercurial is unfairly underrated, and this seems to be due to the loud chanting of Git proponents. They say Git is blazingly fast – well, if you calculate the average performance on all operating systems, I think that on average it's either very slow or Windows users don't use it at all. On Windows XP, Mercurial is at least as fast as Git, and on newer Windows versions Git is usable only for small repositories. Git is largely badly documented (which is getting better but Mercurial is way ahead – suffice it to say that I understood some Git concepts only when I read the Mercurial documentation). Git tooling, generally speaking, sucks – it was designed to be used in a Unix shell and nowhere else – and on Windows it is total crap. On the other hand, Mercurial tooling on Windows is, after this experience with Git, impressive. My conclusion is that for Git I would have to require special skills when employing programmers – "C# and Bash knowledge required", how sane is that? Ok, I'm joking but it's not far from the truth: there has to be at least one Git specialist on the team for when ordinary developers get stuck. With Mercurial, the usual SVN-like skill level should be enough for all because it's not that easy to get stuck. And, after all this, I'm inclined to think that the story about Git being so great should be taken with a grain of salt, since everything I've seen so far from it seems to tell exactly the opposite.
August 23rd, 2011
I imported a couple of repositories from SVN into Mercurial and discovered that characters not present in the standard ASCII table have become mangled in the comments… Or at least they looked mangled in the console output as well as in TortoiseHG – now, the console is not that important, but how to fix this in Tortoise?
I tried searching for a way to modify the import process and found nothing. Tried to add a new comment to the repository with a non-ASCII character and got a Python error ("expected string, QString found"). Some said that I should change my Windows default system encoding (which is English (US)), and that solved the problem, but I would have liked a simpler solution, since changing the default encoding has caused other problems in the past. I managed to find a couple of workarounds that solve the problem of console display and involve setting environment variables… Would it work for Tortoise? Actually: it does. The solution is simple: go to (this is on Windows 7) Control Panel → System → Advanced System Settings → Environment Variables, add a new variable called HGENCODING and set its value to either "utf-8" or your code page (mine is "cp1250"). TortoiseHg respects this. There's a slight difference between the two values, though, because the diff viewer doesn't really like "utf-8", it prefers the concrete code page. There may be other components that behave like this, so I suppose that setting the code page is the optimal solution.
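To show what kind of mangling is at play, here is a small Python sketch. It only demonstrates the general mechanism (text written in one encoding and read in another) and is not a description of what Mercurial or TortoiseHg do internally; the function names are hypothetical, and the second helper simply builds an environment with HGENCODING set, which is an alternative to the Control Panel route when launching hg from a script.

```python
import os

def misread(text, stored_as="utf-8", shown_as="cp1250"):
    """Encode text one way and decode it another, as a mismatched viewer would."""
    return text.encode(stored_as).decode(shown_as, errors="replace")

def env_with_hgencoding(encoding="cp1250", base_env=None):
    """Return a copy of the environment with HGENCODING set, for launching hg."""
    env = dict(os.environ if base_env is None else base_env)
    env["HGENCODING"] = encoding
    return env
```

Passing a character such as "č" through misread with the default arguments gives something other than "č", while using the same encoding on both sides returns it unchanged. The dictionary from env_with_hgencoding can be handed to subprocess.run through its env parameter.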
July 26th, 2011
When trying to switch my local Subversion copy of the NHibernate source to a different tag (from 3.1GA to trunk, in this case), I got this error:
Repository moved temporarily to ‘/viewvc/nhibernate/trunk/’; please relocate
The frustrating thing was that I was trying to relocate to exactly this URL. And if I tried others, it said that I should relocate to them… I searched the net in vain for the solution; the only information I got was that I should reconfigure my Apache server (thanks a bunch!)
The problem is, in fact, simple: the URL is wrong. I thought I could just copy the repository's URL from my web browser, like I do with other sites. Not here: there's a separate entry for direct SVN access. So instead of using this URL:
use this one:
It does seem like a simple problem but the solution wasn't so easy to find.
May 17th, 2011
A C# project that worked with Visual Studio 2008, when converted to Visual Studio 2010, starts complaining about not being able to find classes defined in Microsoft.SQLServer.ManagedDTS.dll and others. These dlls are contained in the SQL Server 2005. If you try to remove the reference and add it again, the errors disappear in the editor, but appear again when you compile the solution. At the end of the jumble of compiler errors there is a small one that betrays the cause:
warning MSB3258: The primary reference "Microsoft.SQLServer.ManagedDTS, Version=188.8.131.52, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL" could not be resolved because it has an indirect dependency on the .NET Framework assembly "mscorlib, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" which has a higher version "2.0.3600.0" than the version "184.108.40.206" in the current target framework.
The problem lies in the Microsoft.SQLServer.msxml6_interop.dll that references the beta version of the .Net framework 2.0. Yes, even after installing three service packs – and worse still, even if you install SQL Server 2008 it will remain there. Why? Apparently, there's a newer msxml6_interop dll with this reference fixed but unfortunately it has the same version as the old one so it doesn't replace it in the GAC. Talk about eliminating DLL hell.
But that's not all: you cannot simply find the new dll and replace it in the GAC. The old one cannot be removed because it's referenced by the Windows Installer. You have to use brute force, something like this: open the command prompt and try to find the real path to the assembly on the disk. (From Windows Explorer you cannot do this because it replaces the real GAC folder structure with a conceptual, flat view). So, CD to c:\Windows\Assembly and find the folder called Microsoft.SqlServer.msxml6_interop. In it, there will be another folder called something like 220.127.116.11__89845dcd8080cc91, and in it the dll we've been looking for. On my computer, the full path is
Ok, now you should be able to manipulate the dll directly and replace it with the new one. What I like to do in these cases is SUBST the folder and make it accessible from Windows Explorer. Type something like this -
SUBST x: c:\windows\assembly\GAC_MSIL\Microsoft.SqlServer.msxml6_interop\18.104.22.168__89845dcd8080cc91
- and you will be able to see the folder in Windows Explorer as a separate volume X:. From here you can delete the existing file and copy over the newer one. You can find the new one only if you have a machine where SQL Server 2008 was installed first – it's in the same (or similar) place in the GAC. I again used the command prompt trick to get the file. (Note that I did everything as administrator; you might have to employ additional tricks to work around security).
Here's a more detailed description with other possible solutions:
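The hunt for the real folder can also be scripted instead of done by hand in the command prompt. This is a sketch under assumptions: it relies on the flat view being only an Explorer feature, so that an ordinary directory walk sees the real structure, and the function name is made up for illustration.

```python
import os

def find_assembly_files(gac_root, assembly_name):
    """Return full paths of <assembly_name>.dll found anywhere under gac_root."""
    wanted = assembly_name.lower() + ".dll"
    hits = []
    # os.walk visits the real directory tree, including the version folders.
    for dirpath, _dirnames, filenames in os.walk(gac_root):
        for filename in filenames:
            if filename.lower() == wanted:
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)
```

Calling it with r"c:\Windows\assembly" and "Microsoft.SqlServer.msxml6_interop" should list the candidate paths; the same call on the machine with SQL Server 2008 would locate the newer file to copy over.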