Developers Need Smart Application Metrics

Developers Need Smart Application Metrics, Not Server Monitoring

Knowing whether your server is up or down and how much CPU it is using is simply not enough for today’s applications. If you really want to monitor the performance and health of your applications, there are plenty of application metrics available from a wide variety of sources, but viewing all of them in one place has long been a problem. Retrace solves this problem by combining several tools in one: server monitoring, application framework metrics, custom metrics, error tracking, log monitoring, and full code-level performance statistics. The result is comprehensive application monitoring.

Monitor everything about your application

Find application problems instantly by having a single monitoring platform for all of your application metrics. Retrace provides robust application metrics monitoring.

Monitor your key app metrics, not just your servers

Analyze, trend, and chart all of your metrics

Powerful charting and dashboards make it easy to analyze and trend your software metrics.
  • Built-in dashboards
  • Interactive charting
  • Easily compare metrics across multiple servers
  • See trends at a glance with sparklines

Application Framework Metrics

Depending on which programming language you are using, its application framework can provide a wealth of information that you should monitor about your app, including:

  • garbage collection statistics
  • context switches
  • requests queued
  • requests per second and others

These application performance metrics can be very helpful when trying to identify application performance problems.

ASP.NET Performance Counters

The .NET Framework provides a wealth of Windows performance counters about your .NET application process, IIS application pool, and more. Microsoft recommends monitoring several Windows performance counters for your ASP.NET applications. These counters are created automatically and are always available on any Windows server. Retrace has fantastic support for monitoring common .NET performance counters by default and can monitor any other Windows performance counter.

Monitoring the Java JVM & Managed Beans

Java uses Java Management Extensions (JMX) managed beans (MBeans) to provide statistics about your Java application. Depending on whether you are using Tomcat, JBoss, or another application server, the server itself may also provide useful MBeans to monitor. Retrace can connect via JMX and monitor any managed bean.

Web Server Performance Stats

Depending on which web server you are using, it can provide a wealth of different metrics. Some are provided via Windows performance counters or Java MBeans. Others, like Apache or NGINX, may also have ways to monitor various performance statistics.

Create Your Own Custom Application Metrics

Sometimes it makes the most sense to create your own metrics that are specific to your application. For example, perhaps you want to track the batch size of some incoming data when it is uploaded to a REST API that you provide. You could implement this via a Windows performance counter or a custom MBean. Another option is to use the Retrace .NET or Java libraries. Creating a custom metric is really simple!
StackifyLib.Metrics.Average("Incoming Data", "Batch Size", 25);
Retrace can also collect metrics that are reported as MBeans or Windows performance counters, but you may find that our API is much simpler to use and provides more functionality as well. With our API, your custom metrics just work and will show up no matter where you deploy your app. If Retrace has to collect the MBeans or counters, you have to configure that manually in your Retrace monitoring configuration for every app.
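To make the idea of an "average" metric concrete, here is a minimal Python sketch of what such an aggregation does conceptually. It is not the Retrace API: the `AverageMetric` class and its `record` and `average` methods are hypothetical names used only for illustration, and the batch sizes are made-up sample values.

```python
from collections import defaultdict


class AverageMetric:
    """Hypothetical in-process aggregator; not part of any Retrace library."""

    def __init__(self):
        # (category, name) -> [running sum, sample count]
        self._totals = defaultdict(lambda: [0.0, 0])

    def record(self, category, name, value):
        """Add one observation to the named metric."""
        bucket = self._totals[(category, name)]
        bucket[0] += value
        bucket[1] += 1

    def average(self, category, name):
        """Return the mean of all recorded values, or 0.0 if there are none."""
        total, count = self._totals.get((category, name), (0.0, 0))
        return total / count if count else 0.0


metrics = AverageMetric()
for batch_size in (25, 40, 10):  # made-up batch sizes
    metrics.record("Incoming Data", "Batch Size", batch_size)

print(metrics.average("Incoming Data", "Batch Size"))  # 25.0
```

A real metrics library would typically also flush these aggregates to a backend on a timer; that part is left out here.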
See how easy it is to start tracking valuable metrics in your application now. Sign up for a free trial of Retrace!

Tracking critical error rates

Tracking all of the errors in your application is a good first line of defense. It helps you find both the little bugs in your software that happen occasionally and the big problems that are happening thousands of times a minute. Performance problems can also be caused by very high error rates.

There are three different types of error rates you should track: HTTP errors, the total number of exceptions thrown in your code, and the number of errors you handle and log via your application logging. All three are critical application metrics that should be tracked and monitored.
1. HTTP error %
You can monitor your web server for how many 500-level HTTP errors are occurring. If you are using an APM solution like Retrace, it can help you track this as well. For ASP.NET there is a counter called "Errors Total/Sec" for your specific application that you can track. It is even better if you can express the error rate as a percentage of all HTTP requests. Retrace provides this automatically.
2. Total exceptions thrown counts
For .NET you should track the counter called ".NET CLR Exceptions – # of Exceps Thrown". This number includes all exceptions, even those that are caught and discarded. Sometimes your app may seem to be functioning correctly while this number is in the thousands, and that is really bad. It will have a big performance impact, and there could be hidden problems.
3. Logged exceptions

In your code you should have good application logging to a framework like log4net, log4j, etc. You should forward all of those errors to an error or bug tracking system like Retrace. From there you can see how many errors per minute your code is logging. These types of error tracking systems can also send you an alert whenever a new type of error is found. This is highly valuable!
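As a rough illustration of turning raw HTTP errors into a percentage of requests, the following Python sketch computes the share of 500-level responses in a list of status codes. The function name `http_error_percent` and the sample data are assumptions made for this example; they do not describe how Retrace computes the value.

```python
def http_error_percent(status_codes):
    """Return the share of requests that ended in a 5xx status, as a percentage."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return 100.0 * errors / len(status_codes)


# Hypothetical sample: 8 requests, 2 of which failed with a 500-level status.
sample = [200, 200, 500, 404, 200, 503, 302, 200]
print(http_error_percent(sample))  # 25.0
```

Note that the 404 is not counted: it is a client error, not a 500-level server error.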

Monitor application availability

Another key software metric to track is your availability or service-level agreement (SLA) %. In larger corporations you probably have contractual obligations to have your software working 99+% of the time or some similar sort of guarantee. It may even cost you money every minute your application is down!

Tracking application availability is a complicated subject that we can't fully cover here. For your application you need to decide whether simply being online counts towards your SLA, or whether it is more complex and also depends on things like how fast transactions are processed. A company like Visa or Mastercard would need an SLA that requires credit card processing not only to be online, but also to respond within a certain number of seconds.

Retrace currently tracks SLA in a simple manner, by pinging a certain HTTP endpoint to make sure it responds. However, by combining this with a measurement of "user satisfaction", which is calculated from application response times, you can come close to covering more complex scenarios.
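The arithmetic behind a simple ping-based availability figure can be sketched in a few lines of Python. Both function names and the 30-day period are assumptions chosen for illustration, and the check results are made-up data, not output from Retrace.

```python
def availability_percent(checks):
    """checks: iterable of booleans, True when the endpoint responded."""
    checks = list(checks)
    if not checks:
        return 0.0
    return 100.0 * sum(checks) / len(checks)


def allowed_downtime_minutes(sla_percent, period_days=30):
    """Minutes of downtime that an SLA target permits over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


# Hypothetical data: 1,000 endpoint checks, one of which failed.
checks = [True] * 999 + [False]
print(round(availability_percent(checks), 1))    # 99.9
print(round(allowed_downtime_minutes(99.9), 1))  # 43.2
```

Seen this way, a 99.9% target over 30 days leaves roughly 43 minutes of downtime in total.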

Measuring performance & satisfaction

Retrace uses the popular Apdex formula to calculate a simple-to-understand performance score. It works by specifying a goal for how long a transaction should take and then sorting requests into buckets of satisfied (fast) and tolerating (slow) requests; in the standard Apdex definition, anything slower still counts as frustrated.

Apdex = (satisfied requests + tolerating requests / 2) / total requests

The result of this formula is a number between 0 and 1, which makes for an easily tracked application metric that you can use uniformly across all of your applications. Without it, tracking performance by raw response times is difficult, because response times vary widely and what counts as good or bad is relative to each application.

Retrace uses this methodology to calculate user satisfaction for your entire application and for individual web requests or transactions.
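For readers who want to see the calculation spelled out, here is a small Python sketch of the standard Apdex score. It assumes the usual Apdex convention that a request is "tolerating" when it takes between one and four times the target threshold; the exact bucket boundaries Retrace uses are not stated here, and the response times are made-up sample values.

```python
def apdex(response_times, threshold):
    """Apdex score for response times (seconds) given a target threshold T.

    Assumes the standard convention: satisfied <= T, tolerating <= 4T,
    and everything slower is frustrated (contributes nothing to the score).
    """
    if not response_times:
        return None
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical response times in seconds, with a 0.5 s goal.
times = [0.2, 0.4, 0.5, 0.3, 1.1, 1.9, 2.5, 3.0]
print(apdex(times, threshold=0.5))  # 0.625
```

In this sample, four requests are satisfied, two are tolerating, and two are frustrated, giving (4 + 2/2) / 8 = 0.625.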

Monitor application dependency performance metrics

Today's applications utilize a wide variety of external dependencies and services like SQL databases, Redis, Elasticsearch, MongoDB, queues, external HTTP web services and more. It is important to monitor these services to ensure they are working correctly and to understand the performance impact they are having on your application.

Monitoring SQL queries can help you understand which queries are being used the most or which are the slowest. By monitoring external HTTP calls you can also identify if a 3rd-party service is causing your application performance problems.
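One simple way to picture dependency monitoring is to time each external call and group the durations by dependency. The Python sketch below does this with a context manager. The names `timed`, `timings`, and `slowest` are hypothetical, and the `time.sleep` call merely stands in for a real query; an APM tool such as Retrace collects this kind of data without manual instrumentation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# dependency name -> list of call durations in seconds
timings = defaultdict(list)


@contextmanager
def timed(dependency):
    """Record how long the wrapped block takes under the given dependency name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[dependency].append(time.perf_counter() - start)


def slowest(recorded):
    """Return the dependency with the highest average call duration."""
    return max(recorded, key=lambda name: sum(recorded[name]) / len(recorded[name]))


with timed("redis"):
    pass  # stand-in for a fast cache lookup

with timed("sql"):
    time.sleep(0.02)  # stand-in for a slow query

print(slowest(timings))  # sql
```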

Basic server metrics

It is important to monitor whether your servers are online. Server CPU and memory usage are also important statistics to monitor. You should monitor the CPU and memory usage of your specific application as well, not just of the server itself.

You may also find it useful to monitor network and disk performance, as well as disk space. However, if you are like us and host your apps via a service like Azure, you may not even care about these server metrics.

Turning application logs into metrics

Sometimes your application or server logs contain really valuable information that can be useful for application monitoring. You could change your application to report custom metrics, but you could instead set up a query to monitor your log data and achieve the same goal. A good example of this would be searching all of your logs, across all servers in your application, for a specific log message that denotes an important event. You could then turn this into a metric that can be tracked, charted, and alerted on with Retrace.
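The idea of deriving a metric from log data can be sketched in Python as a count of matching log lines per minute. The log format, the `count_per_minute` function, and the sample lines are all assumptions made for this example; they do not reflect Retrace's log query syntax.

```python
import re
from collections import Counter

# Assumed log format: "<ISO timestamp> <LEVEL> <message>".
# Group 1 captures the timestamp truncated to the minute, group 2 the message.
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\s+\w+\s+(.*)$")


def count_per_minute(lines, needle):
    """Count log lines whose message contains `needle`, grouped by minute."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and needle in match.group(2):
            counts[match.group(1)] += 1
    return dict(counts)


# Made-up log lines gathered from several servers.
logs = [
    "2024-01-01T10:00:05 INFO Order submitted id=1",
    "2024-01-01T10:00:40 INFO Order submitted id=2",
    "2024-01-01T10:00:41 DEBUG Cache refreshed",
    "2024-01-01T10:01:02 INFO Order submitted id=3",
]
print(count_per_minute(logs, "Order submitted"))
# {'2024-01-01T10:00': 2, '2024-01-01T10:01': 1}
```

The resulting per-minute counts are exactly the kind of series that can be charted or used as an alert threshold.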

Retrace combines application metrics, monitoring & alerts

Tracking and monitoring all of these types of metrics may seem overwhelming. Retrace tracks many of them automatically, including key application framework performance metrics, error rates, user satisfaction scores, and basic server metrics. Others are more advanced features that you can implement over time.