Diagnosing performance issues in production environments – Part II of II

It has been a hectic few days and at last I found some time to discuss the potential solutions. In the previous post we had discussed the importance of metrics and how it can help us diagnose performance issues in production environment. We were left with the question as to whether we should implement the metrics logic on our own. The simple answer is, NO. The developer community has already come up with effective libraries that provide us with a variety of features. In this post, let’s look at the potential solutions we have for C#, Java and JavaScript languages.

The project in which I had to integrate performance metrics was Java and therefore I’ll start by discussing the Java solutions first. Once we chose to go ahead with metrics, we found out that there are numerous implementations available for metrics in Java. In order to choose a library we had to make a rational decision, for which we came up with the idea of preparing a DAR report. DAR (Decision Analysis and Resolution)  is a formal process where decisions are made using a formal evaluation process after careful consideration of the identified alternatives based on established criteria. Out of the available Java libraries, we short listed a few which were obviously having the edge over the others based on popularity, features and support.

  • Java Metrics
  • Perf4J
  • JAMon
  • Java Simon

The following criteria (In the order of the weights assigned) were drawn in order to choose the best solution from the above list of available alternatives.

  • Ability to create custom counters and timers for monitoring and measuring performance of code blocks.
  • Ability to visually evaluate the performance results which would help in finding trends and patterns.
  • Free and open source software license considering the cost and ability to do modifications as and when required.
  • Ability to enable or disable the features on demand.
  • Active support provided through mailing list or by any other means.

Java Metrics (http://metrics.dropwizard.io/3.2.0/) library easily outplayed the rest with a total of 47 points based on the criteria weights assigned. It satisfies the performance counter creation requirement with features to create meters, gauges, counters, timers and health checks. Visualization requirement is satisfied through reporting capabilities with Graphite and Ganglia. Licensing requirement is fulfilled with Apache License 2.0.Active support is available through the means of an active mailing list and dedicated Stack overflow tags. Enabling or disabling the features is not provided as a result of which we chose to wrap this library and provide it on our own! Perf4J was the second best option with 34 points.

We also had a C# component which resulted in the DAR report for the list of available C# libraries. Out of all the available libraries, the following three were shortlisted for the DAR analysis.

  • Metrics.NET (etishor)
  • Statsd
  • Serilog Metrics

The same criteria were considered and Metrics.NET was chosen ahead of the rest with 50.5 points. It satisfies the performance counter creation requirement by allowing to create gauges, counters, meters, timers and histograms.Visualization requirement is satisfied through graphite reporting along with other sources of reporting such as HTTP endpoints, influx DB and elastic search. Licensing requirement is fulfilled with Apache Software License 2.0. Unlike the Java solution, Metrics.NET provides with the disabling and enabling feature through a configuration file.This means that the metrics reporting could be completely disabled when not needed. Unlike the Java solution, support is available only through the GitHub page. The second best option we had is statsd with 41.5 points. Metrics.NET could be marked down as an equivalent implementation of the Java Metrics library.

Now let us have a look at the possible JavaScript solutions that are out there.

  • metrics
  • measured
  • node-monitor
  • appmetrics

The same set of criteria were considered and “metrics” library was easily the better option compared to the rest, with 47.5 points. It satisfies the performance counter creation requirement by providing features such as meters, gauges, counters and timers.The visualization requirement has been satisfied through the reporting capability provided through Graphite. CSV and console reporting are among the other options available for reporting of the metrics. Licensing requirements are satisfied with MIT License though configuration to enable or disable the features is not provided. Support is available only through the GitHub page. This could also be identified as an equivalent implementation of the Java Metrics library.

Anyone reading through the previous post would’ve got the doubt, “What do we do when we don’t need the metrics logic to be executed?! “. Reading through the solutions in this post, you would’ve already got the answer. Most of the libraries provide enable/disable feature through configuration and even if it is not present, we could still enable/disable it by wrapping the library to implement it on our own. Another popular question would’ve been “What about the overhead?!“. We carried out a simple test based on our application’s normal flow, with and without the metrics library integration and no visible performance issues were noticed. Besides, the library websites and GitHub pages do provide the guarantee for a very minimal overhead that could simply be ignored. Okay but still, “Won’t having this code integrated into the business logic cause any confusion and reduce readability?!“. Well that is inevitable. But you could isolate the metrics code through the usage of design patterns. Observer pattern could be an option where the business logic could act as the Subject and Metrics logic could act as the Observer.

So the next time you are starting over with a new project, just consider the world of possibilities that capturing performance metrics could present you with. No more dependence on any developer for finding bottlenecks, no need to replicate the production environment data to debug and find the issue and above all, no need to squeeze your brain to think of the critical areas where it could’ve gone wrong!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s