Advanced Error Monitoring, Tracking & Reporting with Retrace

Are Important Application Errors Hiding in your Logs?

Using a logging framework and monitoring application errors is critical. The problem is that most developers only log errors to a text file or database table. If you wait until your users report application errors, you then have to go back through your log files and try to find the related errors. It makes a lot more sense to constantly monitor your applications for errors so you can identify and fix problems ASAP.

Why Developers Need an Error Tracking Service

To make the most of your application errors, you should use an error reporting service. It can collect all of your errors and provide valuable real-time insights about application problems:
  1. Real-time alerts - know immediately when a new error happens
  2. Centralized repository - one place your team can access everything
  3. Error rates - quickly identify large spikes in error rates
  4. Improve productivity - find root cause of problems much faster

How to Identify Root Cause Instantly

Knowing that errors are happening is much better than nothing, but it is only the start. With Retrace, you can go from seeing just an error message to also seeing the complete context of what your code was doing. Understanding what your entire application stack is doing is critical to identifying root cause. Additional details include:
  • Web request details
  • Related log messages
  • SQL queries being called
  • Interaction with other dependencies (caching, queueing, etc)
Seeing your application errors within the context of code level transaction traces allows you to see the full picture of what the user was doing.

Three Types of Application Errors

Normally, developers have to specifically catch and log errors in their code so they can find them later in their log files. If you rigorously follow good practices, always using "try catch" blocks and logging errors, you should collect most of your errors. There are potentially three types of errors that can occur within your application.
  1. Errors caught in your code that are logged or ignored. This is the normal scenario. You use "try catch" blocks and do a good job of logging your exceptions, or at least handling them so your users never see them.
  2. Errors caught outside of your own code, in another library, and handled somehow. These are the types of errors that occur within the .NET framework or JVM and are caught and handled within it. The errors are happening, but you never even see them.
  3. Uncaught errors that bubble up to your users. If you don't properly catch errors, your app may return a 500 level HTTP status code and an error to your users. These are the worst kind of errors.
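The three cases can be shown in a few lines. This is only a minimal sketch in Python (which spells "try catch" as `try`/`except`); the function names and the pricing scenario are made up for illustration and have nothing to do with Retrace itself.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def parse_quantity(raw: str) -> int:
    """Type 1: the error is caught in our own code and logged."""
    try:
        return int(raw)
    except ValueError:
        logger.exception("Invalid quantity %r, defaulting to 0", raw)
        return 0


def lookup_price(prices: dict, sku: str) -> float:
    """Type 2 (simulated): a helper catches the error internally,
    so the calling code never finds out it happened."""
    try:
        return prices[sku]
    except KeyError:
        return 0.0  # swallowed silently: nothing is logged


def order_total(prices: dict, sku: str, raw_quantity: str) -> float:
    """Type 3: nothing here catches a TypeError (for example when
    prices is None), so it bubbles up to whoever called us."""
    return lookup_price(prices, sku) * parse_quantity(raw_quantity)
```

Calling `order_total(None, "sku-1", "2")` raises an uncaught `TypeError`, which in a web application would typically surface to the user as a 500 response.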

Track Application Errors Without Logging Them

Retrace can collect errors in multiple ways. When using our full APM with code profiling, we are able to collect exceptions directly from .NET or Java, without any code changes! Retrace also accepts any errors reported directly from your code via your logging libraries. Our users typically find all kinds of errors they never knew existed. Often they find errors that have existed for a long time and were being caught and thrown away by their code. These errors could be hidden problems, or they could simply be causing unnecessary performance overhead. Note: Retrace's functionality differs by programming language.
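To give a feel for what "reporting errors via your logging library" means, here is a rough sketch using Python's standard `logging` module. The `ErrorCollector` class is hypothetical and only stores errors in memory; it is not Retrace's API, and a real integration would forward the records to the tracking service instead.

```python
import logging


class ErrorCollector(logging.Handler):
    """Hypothetical handler that keeps ERROR-and-above records in memory."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.errors = []

    def emit(self, record: logging.LogRecord) -> None:
        # exc_info is a (type, value, traceback) tuple when an exception
        # was attached to the log call, otherwise None.
        has_exc = record.exc_info and record.exc_info[0]
        self.errors.append({
            "logger": record.name,
            "message": record.getMessage(),
            "exception": record.exc_info[0].__name__ if has_exc else None,
        })


collector = ErrorCollector()
app_logger = logging.getLogger("checkout")  # hypothetical logger name
app_logger.addHandler(collector)
app_logger.propagate = False  # keep the demo output out of the root logger

try:
    {}["missing"]
except KeyError:
    # The application code does not change: it logs as it always did.
    app_logger.exception("Cart lookup failed")
```

The point of the design is that the application keeps calling its logger as usual, and the handler decides where errors end up.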

Retrace Error Monitoring Features

View Web Request Details

For every error that you log, Retrace will attempt to collect any details about the current web request. These details are very valuable to get more context.
  • URL
  • Full stack trace
  • User
  • Header, session, and cookie values

Identify Unique Errors

One of the key features of any error tracking system is the ability to de-duplicate and uniquely identify errors. This enables you to track how often each individual error occurs. You can also reliably set up alerts for when a new error is found. You could send all of your errors to Elasticsearch, a database, or a log management system. Unfortunately, those systems don't understand how to uniquely identify errors.
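One common way to de-duplicate errors is to compute a fingerprint for each one. The sketch below is an assumption about how such grouping could work, not a description of Retrace's actual algorithm: it hashes the exception type together with the function names in the stack trace, so the same failure with a different message still counts as the same error. The `fingerprint` and `track` helpers are hypothetical.

```python
import hashlib
import traceback
from collections import Counter


def fingerprint(exc: BaseException) -> str:
    """Hash the exception type plus the function names in its traceback.
    The message is deliberately left out because it often contains
    variable data such as IDs."""
    frames = traceback.extract_tb(exc.__traceback__)
    parts = [type(exc).__name__] + [frame.name for frame in frames]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:12]


counts = Counter()


def track(exc: BaseException) -> bool:
    """Count the error and return True the first time it is seen."""
    key = fingerprint(exc)
    is_new = key not in counts
    counts[key] += 1
    return is_new


def load_user(user_id):
    raise ValueError(f"user {user_id} not found")


for user_id in (1, 2, 3):
    try:
        load_user(user_id)
    except ValueError as exc:
        track(exc)

print(len(counts), sum(counts.values()))  # 1 3
```

Three occurrences collapse into one unique error with a count of three, and the `True` returned by `track` on the first occurrence is the natural hook for a "new error" alert.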

Ignore Specific Errors

Some errors happen all the time and are essentially noise. They could be errors that you can't fix in your code, or they could be random SQL timeouts or similar errors. With Retrace you can ignore specific errors. They still get tracked if you need to see them, but they get suppressed from normal reporting so they don't skew your numbers.
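The idea of "tracked but suppressed" can be sketched in a few lines. The `ErrorLog` class below is purely illustrative and assumes errors are ignored by type name; it is not how Retrace is configured.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorLog:
    """Hypothetical store: every error is kept, ignored ones are only
    hidden from the default report."""
    ignored_types: set = field(default_factory=set)
    all_errors: list = field(default_factory=list)

    def record(self, error_type: str, message: str) -> None:
        self.all_errors.append((error_type, message))

    def reportable(self) -> list:
        return [e for e in self.all_errors if e[0] not in self.ignored_types]


log = ErrorLog(ignored_types={"SqlTimeout"})
log.record("SqlTimeout", "query timed out")
log.record("NullReference", "customer was null")

print(len(log.all_errors), len(log.reportable()))  # 2 1
```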

View Related Log Messages

Since Retrace provides both error tracking and log management in one solution, you can view all of the log messages that are related to an error. Being able to see the related log messages provides much more context about what your code was doing.

Monitor Error Rates

One of the important things you can monitor about your application is its error rate. If your application has a lot of traffic, it is pretty normal for it to have occasional errors, including some normal transient exceptions. Monitoring error rates is critical because if you start having problems with a database, external web service, or some other service, you can quickly identify the issue from a big spike in errors.
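As a rough illustration of spike detection, the sketch below buckets error timestamps into minutes and flags any minute that is well above the typical level. The threshold rule (three times the median, with a minimum count) is an assumption chosen for the example, not Retrace's alerting logic.

```python
from collections import Counter
from statistics import median


def errors_per_minute(timestamps):
    """Bucket error timestamps (in seconds) into per-minute counts."""
    return Counter(int(ts // 60) for ts in timestamps)


def find_spikes(per_minute, factor=3.0, min_errors=5):
    """Return the minutes whose error count exceeds factor x the median.
    Minutes with no errors at all are not part of the baseline."""
    if not per_minute:
        return []
    baseline = median(per_minute.values())
    return sorted(
        minute for minute, count in per_minute.items()
        if count >= min_errors and count > factor * baseline
    )


# Five minutes of traffic with 1, 2, 1, 2, and then 20 errors.
timestamps = []
for minute, n in enumerate([1, 2, 1, 2, 20]):
    timestamps += [minute * 60 + i for i in range(n)]

print(find_spikes(errors_per_minute(timestamps)))  # [4]
```

The `min_errors` floor keeps a quiet application from alerting when it goes from one error to four.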

Analyze Errors Multiple Ways

Retrace collects a lot of context data about what is occurring when an error is thrown. This data can be very useful for error reporting purposes. Quickly get a list of your customers or users being affected by an error. See every unique stack trace, URL, or method name impacted by the error.

Watch Server Errors in Real Time

With Retrace you can tail any type of log message, including errors, to see them in real time. Watching errors happen in your system has never been this awesome!

Error Tracking is Your Best Friend After a Production Deployment

One of the best use cases for an error tracking system is watching for errors during and right after a deployment. You are likely to see some errors that are just noise due to the deployment. Aborted requests, request timeouts, and other issues are normal. You may even have some SQL query issues if you also have to deploy SQL schema changes. Unfortunately, you are also likely to see some new errors that were introduced by the release. Watch very closely for new errors after the deployment and be prepared to hotfix them quickly!

Retrace Error Tracking

3 Types of Error Monitoring

Monitoring your applications for errors is really important. There are three different types of error monitoring that we think you should consider. Retrace can help you with all three.
  1. HTTP Requests with 500 Status Codes. Anytime your web server responds with a 500 internal server error, it is a bad thing. We would suggest tracking this based on a % and/or # per minute. You can then monitor that as a key application performance metric. Retrace APM automatically tracks this as "HTTP Error %" for each application.
  2. Error Rates of Tracked Exceptions. This represents how many exceptions are being logged and tracked by your error tracking system. Retrace provides this via its error rate monitoring functionality. It is an essential way to monitor for spikes in errors in your code.
  3. Count All Thrown Exceptions. Sometimes errors happen and you have no way of knowing it. There is a Windows Performance Counter for CLR Exceptions Thrown per minute. This is a great metric to track. We have seen many instances of this being thousands per minute but the application works fine. These types of errors cause big performance problems or hide underlying issues.
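The first metric is simple to compute yourself. The sketch below assumes that an "HTTP error %" means the share of responses with a 5xx status code in a given window; the exact definition Retrace uses for its "HTTP Error %" metric is not stated here, and the function name is hypothetical.

```python
def http_error_percent(status_codes):
    """Percentage of responses in the window with a 5xx status code."""
    if not status_codes:
        return 0.0
    server_errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return 100.0 * server_errors / len(status_codes)


# Status codes observed during one minute (made-up sample data).
last_minute = [200, 200, 500, 404, 503]
print(http_error_percent(last_minute))  # 40.0
```

Note that the 404 is not counted: it is a client error, not a server failure.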

Application Errors Cause Performance Overhead

Throwing and catching exceptions is expensive. It causes your app to pause its thread while it walks back the stack to collect a stack trace and do other things. A few exceptions per minute are not a huge problem, but it is best practice to avoid them wherever you can. Whatever you do, don't purposely throw exceptions to control logic flow within your app. It is a much better idea to reorganize the code to return some sort of failure status and change the behavior of your code based on that instead.
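Here is a small sketch of the "return a failure status" approach. It is written in Python for illustration only; the actual cost of exceptions varies by runtime, and the `try_parse_port` helper and the default port are hypothetical.

```python
from typing import Optional


def try_parse_port(raw: str) -> Optional[int]:
    """Return the port number, or None when the input is not a valid port.
    Validation happens up front, so no exception is thrown for bad input."""
    if raw.isascii() and raw.isdigit() and 0 < int(raw) < 65536:
        return int(raw)
    return None


def choose_port(raw: str, default: int = 8080) -> int:
    """Branch on the failure status instead of catching an exception."""
    port = try_parse_port(raw)
    return default if port is None else port


print(choose_port("443"))   # 443
print(choose_port("80a"))   # 8080
```

The caller decides what to do with a failed parse through an ordinary `if`, which keeps the expected "bad input" path out of the exception machinery entirely.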

How to Find Application Errors Before Production

The best place to find application errors is before they get to production. You can get the complete functionality of Retrace, including APM, error tracking, and log management for just $10 a month for your QA servers. Want to find bugs on your dev box? Try Prefix for free!