Diagnosing performance issues in production environments – Part II of II

It has been a hectic few days, and at last I have found some time to discuss the potential solutions. In the previous post we discussed the importance of metrics and how they can help us diagnose performance issues in a production environment. We were left with the question of whether we should implement the metrics logic on our own. The simple answer is NO. The developer community has already come up with effective libraries that provide a variety of features. In this post, let’s look at the potential solutions we have for C#, Java and JavaScript.

The project into which I had to integrate performance metrics was written in Java, so I’ll start with the Java solutions. Once we chose to go ahead with metrics, we found that there are numerous metrics implementations available for Java. To choose a library rationally, we decided to prepare a DAR report. DAR (Decision Analysis and Resolution) is a formal process in which decisions are made by evaluating the identified alternatives against established criteria. Out of the available Java libraries, we shortlisted a few that clearly had the edge over the others in popularity, features and support.

  • Java Metrics
  • Perf4J
  • JAMon
  • Java Simon

The following criteria, listed in order of their assigned weights, were drawn up to choose the best solution from the above list of alternatives.

  • Ability to create custom counters and timers for monitoring and measuring performance of code blocks.
  • Ability to visually evaluate the performance results which would help in finding trends and patterns.
  • Free and open source software license considering the cost and ability to do modifications as and when required.
  • Ability to enable or disable the features on demand.
  • Active support provided through mailing list or by any other means.

The Java Metrics (http://metrics.dropwizard.io/3.2.0/) library easily outplayed the rest with a total of 47 points based on the criteria weights assigned. It satisfies the performance counter creation requirement with features to create meters, gauges, counters, timers and health checks. The visualization requirement is satisfied through reporting capabilities with Graphite and Ganglia. The licensing requirement is fulfilled with Apache License 2.0. Active support is available through an active mailing list and dedicated Stack Overflow tags. Enabling or disabling the features on demand is not provided, so we chose to wrap the library and provide that capability on our own! Perf4J was the second best option with 34 points.

We also had a C# component, which called for a DAR report on the available C# libraries. Out of all the available libraries, the following three were shortlisted for the DAR analysis.

  • Metrics.NET (etishor)
  • Statsd
  • Serilog Metrics

The same criteria were considered and Metrics.NET was chosen ahead of the rest with 50.5 points. It satisfies the performance counter creation requirement by allowing you to create gauges, counters, meters, timers and histograms. The visualization requirement is satisfied through Graphite reporting along with other reporting targets such as HTTP endpoints, InfluxDB and Elasticsearch. The licensing requirement is fulfilled with Apache Software License 2.0. Unlike the Java solution, Metrics.NET lets you enable and disable its features through a configuration file. This means that metrics reporting can be completely disabled when it is not needed. Also unlike the Java solution, support is available only through the GitHub page. The second best option was statsd with 41.5 points. Metrics.NET can be regarded as an equivalent implementation of the Java Metrics library.

Now let us have a look at the possible JavaScript solutions that are out there.

  • metrics
  • measured
  • node-monitor
  • appmetrics

The same set of criteria was considered and the “metrics” library was easily the better option compared to the rest, with 47.5 points. It satisfies the performance counter creation requirement by providing features such as meters, gauges, counters and timers. The visualization requirement is satisfied through its Graphite reporting capability. CSV and console reporting are among the other reporting options available. The licensing requirement is satisfied with the MIT License, though configuration to enable or disable the features is not provided. Support is available only through the GitHub page. This library can also be regarded as an equivalent implementation of the Java Metrics library.

Anyone who read the previous post may have wondered, “What do we do when we don’t need the metrics logic to be executed?!”. Reading through the solutions in this post, you may already have the answer. Most of the libraries provide an enable/disable feature through configuration, and even where it is not present, we can still provide it ourselves by wrapping the library. Another popular question would be, “What about the overhead?!”. We carried out a simple test based on our application’s normal flow, with and without the metrics library integrated, and noticed no visible performance issues. Besides, the library websites and GitHub pages claim a very minimal overhead that can simply be ignored. Okay, but still, “Won’t having this code integrated into the business logic cause confusion and reduce readability?!”. Well, some of that is inevitable. But you can isolate the metrics code through the use of design patterns. The Observer pattern is one option, where the business logic acts as the Subject and the metrics logic acts as the Observer.
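To make the wrapping and Observer ideas a little more concrete, here is a minimal sketch in JavaScript. It does not use any of the libraries discussed above; MetricsObserver, OrderService and the 'operation' event are hypothetical names made up for illustration. The business logic only emits an event, and the observer decides whether to record anything based on an on/off switch.

```javascript
const { EventEmitter } = require('events');

// Hypothetical observer: collects timings only while it is enabled.
class MetricsObserver {
  constructor(enabled) {
    this.enabled = enabled;
    this.timings = {}; // operation name -> array of durations in ms
  }

  setEnabled(enabled) {
    this.enabled = enabled;
  }

  // Attach to a subject that emits 'operation' events.
  observe(subject) {
    subject.on('operation', (name, durationMs) => {
      if (!this.enabled) return; // metrics switched off: do nothing
      (this.timings[name] = this.timings[name] || []).push(durationMs);
    });
  }
}

// Hypothetical subject: the business logic knows nothing about metrics
// beyond emitting an event when an operation finishes.
class OrderService extends EventEmitter {
  placeOrder(order) {
    const start = Date.now();
    const result = { id: order.id, status: 'placed' }; // stand-in business logic
    this.emit('operation', 'placeOrder', Date.now() - start);
    return result;
  }
}

const service = new OrderService();
const metrics = new MetricsObserver(true);
metrics.observe(service);

service.placeOrder({ id: 1 }); // recorded
metrics.setEnabled(false);
service.placeOrder({ id: 2 }); // ignored, metrics are disabled

console.log(metrics.timings.placeOrder.length); // 1
```

In a real project the observer would forward the values to the chosen metrics library instead of keeping them in an array, and the enabled flag would come from configuration.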

So the next time you are starting over with a new project, just consider the world of possibilities that capturing performance metrics could present you with. No more dependence on any developer for finding bottlenecks, no need to replicate the production environment data to debug and find the issue and above all, no need to squeeze your brain to think of the critical areas where it could’ve gone wrong!

Diagnosing performance issues in production environments – Part I of II

Several reasons could be put forward for the presence of performance issues in large scale, highly distributed systems. The following factors could be identified as some of the most popular concerns.

  • Lack of understanding towards the language features.
  • Lapse of concentration during development.
  • Ineffective requirements modelling techniques resulting in vague and imprecise requirements.
  • Absence of unit tests and failure to carry out boundary value analysis during the testing phase.
  • Unrealistic schedules resulting in burnouts.

The traditional way of investigating a performance issue would predictably start with log analysis, where we would have to read through a bundle of log files to find the cause of the issue. Of course, this becomes easier with log analysis tools such as Splunk. But then again, that depends on the quality of the logs and requires domain knowledge of the whole application to begin with. Another option is debugging the application to locate the flaw. This is not possible unless we can transfer the production data, or simulate it in the development environment, to recreate the behavior. This is as time and resource consuming as it gets. When there are several developers in a team, it is normally difficult for any one person to pinpoint the critical parts of the application, depending on the size of the team and the number of components involved. Performance issues can turn into a nightmare if the developers who contributed to certain vital components are no longer around.

Considering the aforementioned problems, what is the most effective way to approach this issue? Performance metrics would be that one magical element that we had missed.

Incorporating performance metrics into the code would provide you with an idea about the performance of your code and its behavior. An abnormal behavior in the code could be spotted with ease provided that the right set of metrics have been used at the right places of your code. It is important to understand that monitoring and diagnosing are different in motives. Unlike in monitoring, we need all the metrics we could possibly have when it comes to diagnosing. The following could be some of the metrics that could be integrated into your code to better understand its behavior.

  • Timers :
    Provide you with the details of the execution time of a particular code block.
  • Counters :
    Simple counters that could be incremented and decremented to keep track of the sizes of data structures being used.
  • Gauges :
    Simplest metric of them all. Return the size of something like a cache at frequent intervals to keep an eye on its growth in size.
  • Health Checks :
    Centralize your application health on an external dependency such as a database connection or communication channel connection.
  • Meters :
    Get the rate of events at given intervals. This could be the amount of messages being published at a given interval from a channel.
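As a rough illustration of what these metric types boil down to, the sketch below hand-rolls a counter, a gauge, a timer and a meter in plain JavaScript for Node.js. These classes are simplified stand-ins invented for this example and are not the API of any metrics library; health checks are left out because they depend on an external resource.

```javascript
// Illustrative stand-ins only, not a real metrics library.

// Counter: a value that is incremented and decremented explicitly.
class Counter {
  constructor() { this.count = 0; }
  inc(n = 1) { this.count += n; }
  dec(n = 1) { this.count -= n; }
}

// Gauge: reads the current value of something on demand.
class Gauge {
  constructor(readFn) { this.readFn = readFn; }
  value() { return this.readFn(); }
}

// Timer: records how long a block of code takes, in milliseconds.
class Timer {
  constructor() { this.durationsMs = []; }
  time(fn) {
    const start = process.hrtime.bigint();
    try {
      return fn();
    } finally {
      this.durationsMs.push(Number(process.hrtime.bigint() - start) / 1e6);
    }
  }
}

// Meter: counts events and reports the mean rate per second since creation.
class Meter {
  constructor() { this.events = 0; this.startedAt = Date.now(); }
  mark(n = 1) { this.events += n; }
  meanRate(now = Date.now()) {
    const elapsedSec = (now - this.startedAt) / 1000;
    return elapsedSec > 0 ? this.events / elapsedSec : 0;
  }
}

const cache = new Map();
const cacheSize = new Gauge(() => cache.size);
const queued = new Counter();
const lookupTimer = new Timer();
const published = new Meter();

cache.set('a', 1);
queued.inc();
queued.inc();
queued.dec();
const found = lookupTimer.time(() => cache.get('a'));
published.mark(5);

console.log(cacheSize.value(), queued.count, found, lookupTimer.durationsMs.length);
```

A real library adds what this sketch omits, such as percentiles and moving-average rates, and handles reporting to a graphing backend.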

Now that we are familiar with the basic set of metrics, how do we use them at moments of crisis? How would we identify the causes of bottlenecks with them? We need a reporting mechanism that is comprehensive enough to help us spot abnormal behavior in the metrics we have included. The best option would be to feed the metrics data into a graphing platform such as Graphite or Ganglia so that we can just look at the graph and point to the misbehavior and its time of occurrence. We could also feed the data into the log file we are using, to keep everything in one place. So the next time something kicks the bucket, it will only be a matter of checking the metrics graph and finding a pattern, then reading the log entries around the times and values of the abnormal behavior to see where the code has fallen short. Not only can we use metrics to diagnose performance issues, we can also use them to benchmark our application’s performance. This would provide an expected standard level for the testers as well as the clients.

It’s amazing to think of the world of possibilities that metrics provide us with. But how do we implement all this? If the plan is to implement it all on our own, that would result in a separate side project with the need for additional resources and effort. Apart from that, there would be concerns about running the metrics code in the production environment. Would we need the capability to switch the metrics code on and off as required? Even if that’s the plan, wouldn’t integrating the metrics code into your business logic reduce readability and cause confusion? But then again, most of the problems in the world of programming have already been solved. It’s only a matter of finding the right solution and adapting it to our own needs! We will discuss the list of available solutions in the next post!

Creating a MQTT client using Javascript for NodeJS and browser

Recently I came across a requirement to implement an MQTT client using Javascript for NodeJS and the browser. Using the npm mqtt package, it is easier than you might think. The latest release to date is 1.11.2, but if you run into any errors while running browserify on the created client, I advise switching back to version 1.8 because of an issue with the mqtt-packet dependency. Install the mqtt dependency using


npm install mqtt --save

You could then require the module as shown below.


var mqtt    = require('mqtt');

Create a simple client by passing in the MqttOptions object. Information on the properties is given on the GitHub page, but it is very basic. If you’re new to MQTT, you’re better off reading the explanation provided in the HiveMQ essentials series.

Create a simple MqttOptions as shown below passing in the basic parameters.


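var mqttOptions = {
    clientId: 'f1b948b7-2114-4c8e-962f-d15f4cf90abe',
    protocolId: 'MQTT',
    protocolVersion: 4,
    keepalive: 10000,       // keep-alive interval, in seconds
    clean: false,           // ask the broker to keep the session across reconnects
    reconnectPeriod: 1000,  // milliseconds between reconnect attempts (a number, not a string)
    will: willMessage       // willMessage must be defined before this object is created
};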

The above requires a will message object, which the broker uses to notify connected clients when this client disconnects ungracefully. It can be created as follows.


var willMessage = {
    topic: 'WillMessage',
    payload: 'This is the last will message',
    qos: 2,
    retain: true
};

Use the above created MqttOptions object to establish the connection to the Mosquitto broker.


var client = mqtt.connect("mqtt://localhost:1883", mqttOptions);

The protocol in the URL could be one of ‘mqtt’, ‘mqtts’, ‘tcp’, ‘tls’, ‘ws’, or ‘wss’. I’ll cover establishing secure connections in the upcoming tutorials.

Once the connection has been established you can publish and subscribe to messages as shown below.


client.subscribe('someTopic');

client.publish('someTopic','someMessage');

You could also hook onto the following callbacks and implement logic accordingly.

  • connect – function(connack) {}
  • reconnect – function() {}
  • close – function() {}
  • offline – function() {}
  • error – function(error) {}
  • message – function(topic, message, packet) {}
  • packetsend – function(packet) {}
  • packetreceive – function(packet) {}

Below is an example implementation of the connect and message callbacks mentioned above.

 


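// Called each time the client connects to the broker.
client.on('connect', function () {
    console.log('client connected');
});

// Called for every message received on a subscribed topic.
client.on('message', function (topic, message) {
    // message is a Buffer, so convert it before printing
    console.log(message.toString());
});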

Automatic reconnection after a failed connection is already implemented by the library, and you can observe it by hooking onto the ‘reconnect’ event. I have come across an abnormal behavior where the previously subscribed topics were lost after a successful reconnect, even though I had set the clean property in the MqttOptions object to false. Check whether you encounter this issue, and if so, keep track of the subscribed topics in a list and subscribe to them again in the reconnect callback.
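The workaround described above could be sketched roughly as follows. trackSubscriptions and FakeClient are hypothetical names introduced only for this example; the fake client stands in for a real MQTT client so the sketch runs without a broker. One deliberate difference from the description: the sketch resubscribes on the ‘connect’ event rather than ‘reconnect’, on the assumption that ‘connect’ fires again once a reconnect has succeeded, whereas ‘reconnect’ fires when the attempt starts. Verify this against the version of the library you are using.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: remembers every topic subscribed through it and
// subscribes again whenever the client connects after the first time.
// `client` only needs to emit 'connect' and expose subscribe(topic).
function trackSubscriptions(client) {
  const topics = new Set();
  let connectedBefore = false;

  client.on('connect', () => {
    if (connectedBefore) {
      topics.forEach((topic) => client.subscribe(topic));
    }
    connectedBefore = true;
  });

  return {
    topics,
    subscribe(topic) {
      topics.add(topic);
      client.subscribe(topic);
    },
  };
}

// Stand-in for a real MQTT client, used only to exercise the helper.
class FakeClient extends EventEmitter {
  constructor() {
    super();
    this.calls = [];
  }
  subscribe(topic) {
    this.calls.push(topic);
  }
}

const fake = new FakeClient();
const tracked = trackSubscriptions(fake);

fake.emit('connect');           // initial connection
tracked.subscribe('someTopic'); // normal subscription
fake.emit('connect');           // simulated successful reconnect

console.log(fake.calls); // [ 'someTopic', 'someTopic' ]
```

With a real client you would pass the object returned by mqtt.connect to trackSubscriptions and call tracked.subscribe instead of client.subscribe.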

You could terminate the client using the client.end method.

I hope that has given you a clear idea of how to create an MQTT client using Javascript for NodeJS. You could use browserify to create a version of the file that can be used in browsers. As I mentioned at the start of this post, if you encounter any errors while browserifying the file, switch back to version 1.8, in which it works fine.

So I’ll wind up this post here for now. I’ll add another post on how to establish ssl/tls and wss connections through the clients! 🙂

Getting started with MQTT and Mosquitto

What it is and why you would need it

If you’re looking for a lightweight messaging protocol, then MQTT is an answer you could consider. It follows the publish-subscribe mechanism, but of course you could tweak it to suit one-to-one messaging as well. MQTT has been quite a trending topic these days with the evolution of the Internet of Things. The objective of the protocol is to provide high reliability through assured delivery while using minimal network bandwidth. Clear evidence that the protocol achieves this can be seen in its use in IoT, where sensors, mobile devices and embedded computers use it for messaging.

Recently I came across the need for a messaging bus, and we first came up with our own implementation. It took about 5-7 milliseconds for end-to-end delivery of messages. Later on we came across MQTT, and it took only one millisecond for end-to-end delivery. It also supports websocket connections, which helped us remove our own websocket client implementation from the project as well. The FAQ page of MQTT should answer most of the questions that may have come to your mind by now!

Mosquitto – The messaging broker

The messaging broker we had used is Mosquitto which is an open source project from Eclipse that implements the MQTT protocols 3.1 and 3.1.1.

You could download the latest version of Mosquitto from here. Simply run the mosquitto executable from the downloaded folder and you’re good to go. You could enable verbose logging with the -v parameter, and configurations could be loaded from a specific file with the -c parameter. Read more about the configurations here. Running the mosquitto executable opens a non-secure listener on the default port 1883; encrypted network connections and authentication can be established through SSL. Download the latest version of OpenSSL and copy the libeay32.dll, ssleay32.dll and libssl32.dll files to your mosquitto installation folder. Apart from that, download pthreadvc2.dll and place it in the mosquitto installation folder as well.

In order to configure the server for certificate authentication, follow these steps and generate a certificate authority certificate and key, server key and a server certificate by creating a CSR and signing it with your CA key. Place the below entries into the configuration file and restart the mosquitto broker.


listener 8883
cafile certs/ca.crt
certfile certs/server.crt
keyfile certs/server.key
require_certificate true

Default port 8883 has been used in this scenario and setting the require_certificate to true would require the client to provide a valid certificate in order to establish the connection. This could be set to false if clients are not expected to be authenticated through their certificates.

Websocket support also needs to be explicitly enabled. This requires libwebsockets and a step by step instruction set on how to achieve this could be found here . You could also enable SSL authentication through websockets and a sample configuration would look like shown below.

listener 9002 127.0.0.1
protocol websockets
cafile certs/ca.crt
certfile certs/server.crt
keyfile certs/server.key
require_certificate true

That’s it! You have set up your mosquitto with additional websocket and SSL support!

MQTT could be the answer to any requirement for a lightweight messaging protocol, even if it doesn’t involve IoT, just like in my case! I hope this has given you an idea of what MQTT is and how to set up Eclipse’s Mosquitto broker. Soon I’ll follow this up with tutorials on creating clients using Java and NodeJS. Happy messaging folks! 🙂

Getting the progress percentage from a burn bootstrapper installer

One of the basic requirements of an installer created using the burn bootstrapper is to display the progress percentage. I had a tough time finding a proper solution to this, as most of the solutions on the internet didn’t work properly when displaying the uninstall percentage. So I have put together the various things I found into one single solution. So let’s get started!

In order to display the progress bar, we need to handle two events. The first is CacheAcquireProgress, which gives you the percentage related to caching the packages. The next is ExecuteProgress, which gives you the percentage for the executed packages. Most of the sites say to add the two values and divide the sum by two. This does not always work, as some actions do not have a cache phase. So in order to find the denominator, we need to use OnApplyBegin in v4 of WiX and OnApplyPhaseCount in versions below 4. Since there hasn’t been a stable v4 release yet, I will give you a sample of how it’s done in versions under v4 with the OnApplyPhaseCount method.

Create a view for the percentage bar as shown below.


<WrapPanel Margin="10" >
<Label VerticalAlignment="Center">Progress:</Label>
<Label Content="{Binding Progress}" />
<ProgressBar Width="200"
Height="30"
Value="{Binding Progress}"
Minimum="0"
Maximum="100" />
</WrapPanel>

Now let’s bind this to a property called Progress.


private int progress;
public int Progress
{
    get
    {
        return this.progress;
    }
    set
    {
        this.progress = value;
        this.RaisePropertyChanged(() => this.Progress);
    }
}

Now let’s add the event handlers for the CacheAcquireProgress and ExecuteProgress events.


private int cacheProgress;
private int executeProgress;
private int phaseCount;

// Overall progress is the sum of both percentages divided by the number of phases.
this.Bootstrapper.CacheAcquireProgress += (sender, args) =>
{
    this.cacheProgress = args.OverallPercentage;
    this.Progress = (this.cacheProgress + this.executeProgress) / phaseCount;
};
this.Bootstrapper.ExecuteProgress += (sender, args) =>
{
    this.executeProgress = args.OverallPercentage;
    this.Progress = (this.cacheProgress + this.executeProgress) / phaseCount;
};

We then get the phase count by hooking onto the ApplyPhaseCount event as shown below.

WixBA.Model.Bootstrapper.ApplyPhaseCount += this.ApplyPhaseCount;

private void ApplyPhaseCount(object sender, ApplyPhaseCountArgs e)
{
    this.phaseCount = e.PhaseCount;
} 

This would give you the perfect progress percentage for your custom installer!
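To see why dividing by the phase count matters, here is the same arithmetic as a small standalone JavaScript function. overallProgress is a hypothetical helper written for this illustration and is not part of WiX; the guard against a zero phase count is an extra precaution for the case where a progress event arrives before the phase count is known.

```javascript
// Hypothetical helper mirroring the formula used in the event handlers:
// (cache percentage + execute percentage) / number of apply phases.
function overallProgress(cachePercent, executePercent, phaseCount) {
  if (phaseCount <= 0) {
    return 0; // phase count not known yet, avoid dividing by zero
  }
  return Math.floor((cachePercent + executePercent) / phaseCount);
}

// Install with both a cache phase and an execute phase.
console.log(overallProgress(100, 50, 2)); // 75

// Uninstall with only an execute phase: dividing by two would report 20.
console.log(overallProgress(0, 40, 1)); // 40
```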

Passing install path as an argument to burn bootstrapper

I had to add this extra little thing to my Burn bootstrapper EXE where I had to enable the user to pass the installation location as a command line argument. So here is how to do it.

First of all, in the Chain element of the bootstrapper project’s bundle.wxs, add an MsiProperty element, which allows us to pass a value to an MSI property. Below is an example of such an element.


<Chain>
<MsiPackage SourceFile="Awesome1.msi">
<MsiProperty Name="InstallLocation" Value="[InstallerPath]" />
</MsiPackage>
</Chain>

Inside the MSI package’s setup project, add the directory ID to be “InstallLocation” and have it defined as a property as shown below.

<Property Id="InstallLocation"/>

<Directory Id="TARGETDIR" Name="SourceDir">

<Directory Id="InstallLocation" Name="My Program">

Now back in the bundle.wxs file, add the BalExtension name space as shown below.


xmlns:bal="http://schemas.microsoft.com/wix/BalExtension"

Now declare the variable which is going to hold the install path that would be passed as a command line argument.

<Variable Name="InstallerPath" bal:Overridable="yes"/>

bal:Overridable should be set to yes for every variable that gets its value from a command line argument. Now just run the EXE, passing the value.


BootstrapperSetup.exe /i InstallerPath=G:\

That’s all folks! Now you have a bootstrapper installer that takes the install path as a command line argument!

 

Burn Bootstrapper installer major upgrade doesn’t uninstall previous version

This post provides the solution for one of the worst nightmares I’ve ever had! I created a burn bootstrapper installer setup which installs and uninstalls properly, but behaves abnormally during a major upgrade. That is, when you perform a major upgrade, the previously installed version isn’t uninstalled and the new version is installed side by side. If this is your issue, then you’re at the right place!

First of all, make sure you’ve done the major upgrade the way it is expected to be done. If you have missed any of the following steps, then take a deep breath and just do it!

  • Change the product element’s ID attribute to a new GUID
  • Increment the product element’s version attribute
  • Add and configure a major upgrade element, which would look like this:

 


<MajorUpgrade DowngradeErrorMessage="A newer version of [ProductName] is already installed" />

But in my case, I had done all this and was still facing the issue. I wasted a lot of time on it, as I couldn’t find an answer in any blogs or Stack Overflow questions. I turned to my installer logs and this is what I found there.

[0980:3888][2016-04-22T16:49:19]i100: Detect begin, 2 packages
[0980:3888][2016-04-22T16:49:19]i102: Detected related bundle: {f57e276b-2b99-4f55-9566-88f47c0a065c}, type: Upgrade, scope: PerMachine, version: 1.0.1.0, operation: None
[0980:3888][2016-04-22T16:49:19]i103: Detected related package: {8C442A83-F559-488C-8CC4-21B1626F4B8E}, scope: PerMachine, version: 1.0.1.0, language: 0 operation: Downgrade
[0980:3888][2016-04-22T16:49:19]i103: Detected related package: {8201DD23-40A5-418B-B016-4D29BE6F010B}, scope: PerMachine, version: 1.0.1.0, language: 0 operation: Downgrade
[0980:3888][2016-04-22T16:49:19]i101: Detected package: KubeUpdaterServiceInstallerId, state: Obsolete, cached: Complete
[0980:3888][2016-04-22T16:49:19]i101: Detected package: MosquittoInstallerId, state: Obsolete, cached: Complete
[0980:3888][2016-04-22T16:49:19]i199: Detect complete, result: 0x0
[0980:3888][2016-04-22T16:51:43]i500: Shutting down, exit code: 0x0

As you can see, it just stopped at the detect complete state. It was supposed to begin the planning phase but it didn’t! I spent a lot of time trying to find a solution and in the end arrived at one!

There is an event called “DetectComplete” which is raised at the end of the detect phase. So I hooked onto it and called the plan phase manually. Now the upgrade works like a charm! It smoothly installs the new version while removing any previous contents! Below is the implementation.


void DetectComplete(object sender, DetectCompleteEventArgs e)
{
    Bootstrapper.Engine.Log(LogLevel.Verbose, "fired! but does that give you any clue?! idiot!");
    if (LaunchAction.Uninstall == Bootstrapper.Command.Action)
    {
        Bootstrapper.Engine.Log(LogLevel.Verbose, "Invoking automatic plan for uninstall");
        Bootstrapper.Engine.Plan(LaunchAction.Uninstall);
    }
}

Hope this helps someone else looking for a solution for this same issue!