6.11.10

Software archaeology

A recent addition to our portfolio of problems is a widely successful application that has been developed by business users to process commission payments.

The application was largely built by the client over a period of time and remains useful to this day. However, a change in the operational landscape means that the software won’t scale beyond its current functionality and processing capacity. An upgrade to a full-scale enterprise system is a necessity rather than a luxury.

As is often the case with such organically grown systems, the core concepts of the domain are buried in functions more than 1,000 lines long. You start off following the code, setting a breakpoint on every significant statement, and before you know it you have lost your bearings. The question then becomes: “what was the intention of the civilization that built this technology?” With curiously named variables and loops nested five or more levels deep, deciphering that intention is non-trivial. One has to methodically piece together all the inputs and configuration data, and perhaps chant some voodoo spells, to get the system to work.

UML interaction diagrams seem to be a powerful tool for understanding the runtime intention of legacy software, but the subtlety contained in conditional statements can be quite problematic to document. It is tempting to ignore the body of knowledge embodied in the legacy application, but doing so makes the reverse and forward engineering unnecessarily complex. So, like an archaeologist sifting through the fossilised remains of a once-flourishing civilization, I crack open the IDE and good old Notepad in an attempt to gain understanding.

18.10.10

A good coffee from back home...Zambia Terranova

I have been contemplating going back to Soho to replenish my coffee supplies, but this rare find in Sainsbury’s saved me the trek to central London. The taste is not as sour as I would like, and it lacks the heavy taste I have become accustomed to. Anyway, I would give it 3/5.

17.10.10

A delightfully parallel problem

I am currently recruiting, and part of this process involves the analysis of lots of CVs. This can be quite time-consuming, so I have decided to expedite the process by developing a small text-mining application that analyzes the resumes and produces a signature for each document. The computer will then read out the most common words in each CV, and if I like the sound of the digest I will shortlist the candidate.

The essential parts of text mining are tokenizing the document and filtering out noise words. Once that is done, PLINQ makes it easy to calculate the word frequencies in parallel:

PLINQ clusters words in doc
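    // Requires: using System.Collections.Generic; using System.Linq;
    private static IEnumerable<IGrouping<string, string>> CalculateWordFrequency(string[] content)
    {
        // AsParallel() goes first so that the normalisation, grouping and
        // filtering all run as a single PLINQ query across the available cores.
        return content.AsParallel()
                      .Select(word => word.ToLowerInvariant())   // normalise case once
                      .GroupBy(word => word)                       // one group per distinct word
                      .Where(group => group.Count() > 2);          // keep words seen more than twice
    }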

The document tokenizer approach is fairly naive and could do with more work.
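The post does not show TokenizeContent or FilterContent, which ReciteResume calls below. A minimal sketch of what they might look like, assuming a regular-expression split and a small hard-coded noise-word list (both are my assumptions, not the original implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: the original TokenizeContent/FilterContent bodies are not shown.
public static class ResumeText
{
    // Illustrative noise-word list (an assumption); a real list would be far longer.
    private static readonly HashSet<string> NoiseWords = new HashSet<string>(
        new[] { "a", "an", "and", "the", "of", "to", "in", "for", "with", "on", "i", "my" },
        StringComparer.OrdinalIgnoreCase);

    // Splits on runs of characters other than word characters, '#', '+' and '.',
    // so that terms such as "C#", "C++" and ".NET" survive as single tokens.
    public static string[] TokenizeContent(string document)
    {
        return Regex.Split(document, @"[^\w#+.]+")
                    .Select(token => token.TrimEnd('.'))   // drop sentence-ending full stops
                    .Where(token => token.Length > 0)
                    .ToArray();
    }

    // Removes noise words, comparing case-insensitively.
    public static string[] FilterContent(string[] tokens)
    {
        return tokens.Where(token => !NoiseWords.Contains(token)).ToArray();
    }
}
```

Keeping '#', '+' and '.' inside tokens is a deliberate choice for CVs, where technology names would otherwise be split into meaningless fragments.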

After we have grouped the words in each resume, we use the Microsoft Speech API in the System.Speech.Synthesis namespace to recite the most common terms:

Reciting the most common terms
    // Requires: using System.Speech.Synthesis;
    public static void ReciteResume()
    {
        using (var speechSynthesizer = new SpeechSynthesizer())
        {
            // Configure the output device once rather than on every iteration.
            speechSynthesizer.SetOutputToDefaultAudioDevice();

            foreach (var item in GetGroupedTermsInResume(FilterContent(TokenizeContent(ReadDocument()))))
            {
                // Each group's key is the term itself.
                speechSynthesizer.Speak(item.Key);
            }
        }
    }
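GetGroupedTermsInResume is not shown either, and ReciteResume speaks each group in whatever order it arrives. A hypothetical version that returns only the N most frequent terms, most frequent first, might look like this (the top parameter and its default of 10 are assumptions):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical: assumes GetGroupedTermsInResume should return the most frequent terms first.
public static class ResumeDigest
{
    public static IList<IGrouping<string, string>> GetGroupedTermsInResume(string[] words, int top = 10)
    {
        return words.AsParallel()
                    .Select(word => word.ToLowerInvariant())
                    .GroupBy(word => word)
                    .OrderByDescending(group => group.Count())   // most common terms first
                    .ThenBy(group => group.Key)                   // deterministic order for ties
                    .Take(top)
                    .ToList();                                    // PLINQ preserves the established order here
    }
}
```

Because the optional parameter has a default, the single-argument call in ReciteResume would still compile against this signature.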

I have already hired more than a dozen developers using the traditional filtering approach, so it will be interesting to see how the results of the automated CV selection process compare.

An alternative way of representing the CV digest is to generate a logarithmic plot, a “signature file”, as shown below. I wonder what signature represents an ideal candidate; I would probably need to analyze large amounts of data to arrive at an empirically valid conclusion.

[image: logarithmic “signature file” plot of a candidate’s CV]

Incidentally, this candidate was not hired: their CV contained a lot of buzzwords, and they could not explain how they had used the technologies.

And this is what it sounds like [audio file].
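The post does not say how the logarithmic “signature file” plot is built. One plausible reading, and it is only an assumption, is a Zipf-style rank/frequency plot on log-log axes, where each distinct word is plotted by its frequency rank against its count. The hypothetical CvSignature.Points below computes the points for such a plot:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: assumes the "signature" is a log-log rank/frequency plot.
public static class CvSignature
{
    // Returns (log10 rank, log10 frequency) pairs, most frequent word first (rank 1).
    public static IList<Tuple<double, double>> Points(IEnumerable<string> words)
    {
        return words.GroupBy(word => word.ToLowerInvariant())
                    .Select(group => group.Count())
                    .OrderByDescending(count => count)
                    .Select((count, index) => Tuple.Create(Math.Log10(index + 1), Math.Log10(count)))
                    .ToList();
    }
}
```

Feeding these points to any charting library gives a curve whose slope and shape could then be compared across candidates.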

9.10.10

You gotta love Design By Contract

Code Snippet
    [TestMethod]
    [ExpectedException(typeof(TradingServiceException))]
    public void ShouldThrowExceptionForInvalidTrade()
    {
        // A strict mock throws on any call that has not been explicitly set up.
        var mockTrade = new Mock<AbstractTrade>(MockBehavior.Strict);
        mockTrade.SetupAllProperties();

        // Force the precondition on TradeManager.Execute() to fail.
        mockTrade.Setup(trade => trade.IsValid()).Returns(false);

        var tradeManager = new TradeManager(mockTrade.Object);

        // Execute() should throw; [ExpectedException] turns that into a pass.
        tradeManager.Execute();
    }

And when the test fails, we get the following message:

TradeGeneratorTests.ShouldThrowExceptionForInvalidTrade : Failed

Test method Zainco.Commodities.Unit.Tests.TradeGeneratorTests.ShouldThrowExceptionForInvalidTrade threw exception: 
Zainco.Commodities.Exceptions.TradingServiceException: Precondition failed: Trade.IsValid() == true  Trade execution failed

at System.Diagnostics.Contracts.__ContractsRuntime.Requires<TException>(Boolean condition, String message, String conditionText) in :line 0
at Zainco.Commodities.TradingService.TradeManager.Execute() in TradeManager.cs: line 28
at Zainco.Commodities.Unit.Tests.TradeGeneratorTests.ShouldThrowExceptionForInvalidTrade() in TradeGeneratorTests.cs: line 36

Sweet!
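With the Code Contracts binary rewriter enabled, Contract.Requires<TException> becomes a runtime check that throws TException when the condition is false; the stack trace above shows it calling a Requires<TException>(condition, message, conditionText) helper. To illustrate the idea without the tooling, here is a hypothetical Guard.Requires<TException> with the same parameter list. Guard and the stand-in TradingServiceException are my own illustrative types, not the rewriter’s actual output or Zainco’s classes:

```csharp
using System;

// Stand-in for the real exception type (assumption: it has a public (string) constructor).
public class TradingServiceException : Exception
{
    public TradingServiceException(string message) : base(message) { }
}

// Hypothetical guard helper approximating a rewritten Contract.Requires<TException>.
public static class Guard
{
    public static void Requires<TException>(bool condition, string message, string conditionText)
        where TException : Exception
    {
        if (condition)
        {
            return;
        }

        // Loosely mirrors the "Precondition failed: <condition> <message>" text in the test output.
        var text = "Precondition failed: " + conditionText + " " + message;

        // TException must expose a public constructor taking a single string.
        throw (TException)Activator.CreateInstance(typeof(TException), text);
    }
}
```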

CodeContracts break encapsulation and ReSharper 5.0 is blissfully unaware of them

It seems I have to make my helper methods public if I want to use them within a code contract. I avoid this by defining a property with a private setter, which leaves me with the Contract.Requires<TException>(…) implementation below:

Code Snippet
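    // Requires: using System.Diagnostics.Contracts;
    private AbstractTrade _trade;

    // Public getter so the contract can reference it; the private setter keeps writes internal.
    public AbstractTrade Trade
    {
        get { return _trade; }
        private set { _trade = value; }
    }

    public void Execute()
    {
        // Precondition: the rewriter turns this into a check that throws TradingServiceException.
        Contract.Requires<TradingServiceException>(Trade.IsValid(),
                                                   "Trade execution failed");

        // Copy the delegate so a subscriber detaching between the check and the call cannot cause a NullReferenceException.
        var handler = TradeExecutedEvent;
        if (handler != null)
        {
            handler(this, new TradeEventArgs(Trade));
        }
    }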

C:\Sandbox\Pricing\CommodityServer\TradingService\TradeManager.cs(24,13): error CC1038: Member 'Zainco.Commodities.TradingService.TradeManager.get_Trade' has less visibility than the enclosing method 'Zainco.Commodities.TradingService.TradeManager.Execute'.
C:\Sandbox\Pricing\CommodityServer\TradingService\TradeManager.cs(24,13): warning CC1036: Detected call to method 'Zainco.Commodities.Interfaces.AbstractTrade.IsValid' without [Pure] in contracts of method 'Zainco.Commodities.TradingService.TradeManager.Execute'.
  elapsed time: 294.0169ms
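Warning CC1036 is asking for IsValid() to be marked [Pure] (System.Diagnostics.Contracts.PureAttribute), which declares the method free of side effects and therefore safe to call from a contract. The post only tells us that AbstractTrade has an IsValid() method, so the Quantity and Price members and the SpotTrade subclass below are hypothetical:

```csharp
using System.Diagnostics.Contracts;

// Hypothetical shape of AbstractTrade: only IsValid() appears in the post.
public abstract class AbstractTrade
{
    public decimal Quantity { get; set; }
    public decimal Price { get; set; }

    // [Pure] tells the Code Contracts tools that IsValid() has no side effects,
    // which is what warning CC1036 asks for. It stays virtual so Moq can set it up.
    [Pure]
    public virtual bool IsValid()
    {
        return Quantity > 0 && Price > 0;
    }
}

// Hypothetical concrete trade, used only to exercise the example.
public sealed class SpotTrade : AbstractTrade
{
}
```

CC1038, by contrast, is purely about visibility: as the error text says, a member referenced in a precondition must not be less visible than the method the precondition guards.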

8.10.10

Are all circular dependencies created equal?

I am using the separated interface pattern to implement a commodity trading engine for a bourse in the emerging markets. My unit test package mocks one of the interfaces and consequently depends on the interfaces package. Likewise, the Service package depends on the interfaces package.

Superficially, I appear to have a circular dependency, and an eager ReSharper 5.0 complains about it with a fairly descriptive error message: “Failed to reference module. Probably reference will produce circular dependencies between projects.” The result is that IntelliSense breaks!

[images: screenshots of the ReSharper circular dependency error]

What to do?

Are all circular dependencies created equal? Does it matter that the offending dependency in this case is an abstraction rather than a concrete type?

On further examination, the only real dependency is between the test package and the trading service; the other dependencies essentially enforce the contracts between the interface package and the packages that either implement the behaviours it defines or consume the behaviours those contracts provide.

Tools hmmm….
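One way to check the intuition in this post is to model the project references as a directed graph and search it for a cycle. The sketch below uses a depth-first search; the project names are hypothetical stand-ins for the test, service and interfaces packages described above. With tests → service, tests → interfaces and service → interfaces, the graph is a diamond rather than a cycle:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: project references as a directed graph (A -> B means A references B).
public static class DependencyGraph
{
    // Returns true if following references from some project leads back to a project on the current path.
    public static bool HasCycle(IDictionary<string, string[]> references)
    {
        var state = new Dictionary<string, int>(); // 0 = unvisited, 1 = on current path, 2 = finished
        foreach (var project in references.Keys)
        {
            if (Visit(project, references, state)) return true;
        }
        return false;
    }

    private static bool Visit(string project, IDictionary<string, string[]> references, Dictionary<string, int> state)
    {
        int current;
        state.TryGetValue(project, out current);   // missing key leaves current = 0
        if (current == 1) return true;              // back edge: re-entered a project on the current path
        if (current == 2) return false;             // already fully explored
        state[project] = 1;

        string[] targets;
        if (references.TryGetValue(project, out targets))
        {
            foreach (var target in targets)
            {
                if (Visit(target, references, state)) return true;
            }
        }

        state[project] = 2;
        return false;
    }
}
```

Only a reference from the interfaces package back to one of its consumers would close a loop.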

29.7.10

Refreshing Exceptions…

Ever faced cryptic exceptions that leave you red-eyed from staring at the debug window and inspecting automatic variables? Well, if you have spent most of your adult life making money from software development, the probability of this happening is fairly high. So when you encounter an exception message like the one below, you know that the software/API has been developed by developers who care:

 

WorkOrderTests.ShouldSaveWorkOrderDocument : Failed

Test method WorMaSysUnitTests.WorkOrderTests.ShouldSaveWorkOrderDocument threw exception: 
System.InvalidOperationException: The maximum number of requests (30) allowed for this session has been reached.
Raven limits the number of remote calls that a session is allowed to make as an early warning system. Sessions are expected to be short lived, and 
Raven provides facilities like Load(string[] keys) to load multiple documents at once and batch saves.
You can increase the limit by setting DocumentConvention.MaxNumberOfRequestsPerSession or DocumentSession.MaxNumberOfRequestsPerSession, but it is
advisable that you'll look into reducing the number of remote calls first, since that will speed up your application signficantly and result in a 
more responsive application.

at Raven.Client.Document.InMemoryDocumentSessionOperations.IncrementRequestCount()
at Raven.Client.Document.DocumentSession.SaveChanges()
at WorMaSysUnitTests.WorkOrderTests.ShouldSaveWorkOrderDocument() in WorkOrderTests.cs: line 58
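The early-warning idea in that message is easy to model. The CountingSession class below is a hypothetical stand-in, not the RavenDB client API: it borrows the MaxNumberOfRequestsPerSession name from the message, counts each SaveChanges() as one remote call, and queues Store() calls locally, so batching saves keeps a session well under the limit:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in, NOT the RavenDB client: it only models a session that
// counts remote calls and queues stored entities until SaveChanges().
public sealed class CountingSession
{
    private readonly List<object> _pending = new List<object>();

    public int MaxNumberOfRequestsPerSession { get; set; }
    public int RequestCount { get; private set; }

    public CountingSession()
    {
        MaxNumberOfRequestsPerSession = 30;
    }

    // Queues the entity locally; no remote call is counted here.
    public void Store(object entity)
    {
        _pending.Add(entity);
    }

    // Sends every queued entity in one simulated remote call and returns how many were sent.
    public int SaveChanges()
    {
        IncrementRequestCount();
        int sent = _pending.Count;
        _pending.Clear();
        return sent;
    }

    private void IncrementRequestCount()
    {
        RequestCount++;
        if (RequestCount > MaxNumberOfRequestsPerSession)
        {
            throw new InvalidOperationException(
                "The maximum number of requests (" + MaxNumberOfRequestsPerSession + ") allowed for this session has been reached.");
        }
    }
}
```

Storing 100 documents and saving once costs a single request, whereas saving after every document trips the limit on the 31st call.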

Playing with the Raven


Zainco is currently evaluating NoSQL approaches for a groundbreaking application we are developing for a security client. At present Raven is looking very promising.

18.5.10

30 sprints later…

I have spent the best part of the last year on a large project with many cross-functional teams, building a .NET application that integrates with a specialised vendor ERP system. As with any experience, you learn a great deal about yourself and others.

My greatest lesson is that there is a lot of crap software out there making lots of money and that good software (read this as open source) rarely, if ever, makes money.

With the crap software come the zealots and snake-oil salesmen selling their bogus cures and panaceas.

If ever I had to express this in mathematical terms, I would say that:

The rate of return on investment on a software asset is inversely proportional to the quality of the code deployed.
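Written as a formula, with $Q$ for code quality and $k$ an arbitrary positive constant (both symbols are my own shorthand, not a measured model):

```latex
\[
\mathrm{ROI} \propto \frac{1}{Q}
\quad\Longleftrightarrow\quad
\mathrm{ROI} = \frac{k}{Q}, \qquad k > 0
\]
```

Taken literally, doubling $Q$ would halve the ROI.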

This appears to tie in with the empirical data, but, as with any such observation, outliers and exceptions will exist.

I suppose our challenge as software craftsmen is to ensure that quality and ROI are balanced.