Error Reporting and Forwarding

Overview

Error reporting is often underestimated because everyone assumes it is obvious how it works. In practice the real requirements are rarely clear, and the result often does not work as it should. This article describes the needs and goals of good error reporting, how it is usually done today, and how it relates to logging. Especially in distributed systems, error reporting has to be defined anew.

Reporting

Why do we report errors? We want to let someone or something know that something went wrong.

This gives the first requirement for error reporting: we need to report what is wrong, precisely enough to understand the problem or to analyse it further.

To do this, a static error message for the case is necessary, plus dynamic data that describes the specific instance. A stack trace should also be delivered to show the exact line of code where the error occurred. The stack trace is not considered further here because how it works depends on the underlying programming language. For security reasons it is usually not delivered to a user interface, but it should be logged together with the error message.

message : String
parameters : Array of String
stacktrace: Array of elements

The message is technical information about what happened.

Moreover, if we show the error to a human in a user interface, it must be presentable: it should be possible to translate it and to display it as text. The message can therefore include placeholders that mark where the parameters are inserted. Since the parameters are not part of the message itself and the message is a static string, it is possible to create translation tables for the supported languages. This applies even to English, to turn the technical message into well-readable information. The technical message itself is only the last option, the fallback.

message : String - original technical message with placeholders
messageString : String - Translated and substituted message
parameters : Array of String
stacktrace: Array of elements
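The structure above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `ErrorReport`, the `TRANSLATIONS` table and the sample message are hypothetical, and `messageString` is modelled as a method rather than a stored field.

```python
import re
from dataclasses import dataclass, field

# Hypothetical translation table: static technical message -> readable text.
TRANSLATIONS = {
    "en": {
        "File {1} not found in {2}":
            'The file "{1}" does not exist in folder "{2}".',
    },
}


@dataclass
class ErrorReport:
    message: str                                     # static technical message with placeholders
    parameters: list = field(default_factory=list)   # dynamic values for this case
    stacktrace: list = field(default_factory=list)   # logged, not shown in the UI

    def message_string(self, language="en"):
        """Translate the static message, then substitute the parameters."""
        # Fall back to the technical message when no translation exists.
        template = TRANSLATIONS.get(language, {}).get(self.message, self.message)

        def substitute(match):
            index = int(match.group(1)) - 1          # placeholders are 1-based
            if 0 <= index < len(self.parameters):
                return str(self.parameters[index])
            return match.group(0)                    # leave unknown placeholders untouched

        return re.sub(r"\{(\d+)\}", substitute, template)
```

Because the lookup key is the static message, the translation table does not grow with the number of distinct parameter values.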

Another important use of error messages is searching for them in a log system. This is often one of the hardest things to do, especially if the error message includes dynamic parts. It would also be great if the dynamic parts could be searched separately. This is essential for operating a software system.

An example to understand the case:

Access to a ticket queue is not granted to a user. A traditional way to throw and log the error would be "Access for user 'u123456' is not allowed for ticket queue 'devops'."

To find this in a log, a pattern like 'Access for user % is not allowed for ticket queue %' must be searched. This is inefficient and may also match other log entries.

Now change the log entry as defined below:

message: "Access for user {1} is not allowed for ticket queue {2}"
parameters[1]: u123456
parameters[2]: devops

Now it is possible to search for the exact message "message:Access for user {1} is not allowed for ticket queue {2}" and, if needed, also for the exact user "parameters:u123456" or queue "parameters:devops". Searching for exact values is also more efficient than pattern matching.
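The following sketch illustrates the idea with JSON log lines and an in-memory search. It is not a real log system: the helper names `log_entry` and `search`, the third sample record and the JSON layout are assumptions made for this example.

```python
import json


def log_entry(message, parameters):
    """Build one structured log line; the message stays static."""
    return json.dumps({"message": message, "parameters": parameters})


def search(lines, message=None, parameter=None):
    """Return records matching an exact message and/or an exact parameter value."""
    hits = []
    for line in lines:
        record = json.loads(line)
        if message is not None and record["message"] != message:
            continue
        if parameter is not None and parameter not in record["parameters"]:
            continue
        hits.append(record)
    return hits


ACCESS_DENIED = "Access for user {1} is not allowed for ticket queue {2}"

log = [
    log_entry(ACCESS_DENIED, ["u123456", "devops"]),
    log_entry(ACCESS_DENIED, ["u777777", "billing"]),
    log_entry("Queue {1} not found", ["devops"]),   # hypothetical second message
]

denied = search(log, message=ACCESS_DENIED)         # every access error, any user
for_queue = search(log, parameter="devops")         # everything that mentions 'devops'
```

Both searches compare exact values, so no wildcard pattern is needed and unrelated entries cannot match by accident.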

The example shows that error reporting is closely related to logging, which is a topic for another post. It also shows that separating the static message from the dynamic parameters pays off. It is not necessary to log the translation.

Error Codes

For technical reasons it is useful to ship an error code with the error message. The error code should not describe the error in detail but the class of error. HTTP status codes are a similar system: the code describes the kind of error, e.g. server error or client error. Since errors are often reported as HTTP results, for example for RESTful calls, it is smart to use the same codes as error codes. This does require mapping error events to static codes.

An important property of an error code is that it indicates whether the call should be retried. A 'syntax error' will never succeed on retry, but a 'service unavailable' may succeed after a while.

errorCode: integer
message : String - original technical message with placeholders
messageString : String - Translated and substituted message
parameters : Array of String
stacktrace: Array of elements
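A minimal sketch of how a caller might use such a code. The set of retryable codes is an assumption for illustration; which codes count as temporary is a decision each system has to make for itself.

```python
from dataclasses import dataclass, field

# Assumed choice: HTTP status codes that usually describe a temporary condition.
RETRYABLE_CODES = {408, 429, 502, 503, 504}


@dataclass
class ErrorReport:
    error_code: int                                  # HTTP-style status code
    message: str                                     # static technical message
    parameters: list = field(default_factory=list)

    def error_class(self):
        """Describe the kind of error, not the error itself."""
        if 400 <= self.error_code < 500:
            return "client error"
        if 500 <= self.error_code < 600:
            return "server error"
        return "unknown"

    def should_retry(self):
        """True if repeating the same call may succeed later."""
        return self.error_code in RETRYABLE_CODES


syntax_error = ErrorReport(400, "syntax error")
unavailable = ErrorReport(503, "service unavailable")
```

Here `syntax_error.should_retry()` is false because the same request will fail again, while `unavailable.should_retry()` is true.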

Forwarding

In distributed systems, reporting errors to the UI is more complex than in monolithic applications.

An example to illustrate the case:

There is a service that creates a customer in the database and is called from an external UI. The create customer service accepts a list of addresses such as owner, contact and shipping address. To validate each address, a second service is called. That service also checks the ZIP code against the area code of the phone number.

If an address is invalid because of the ZIP and area code check, the error is "ZIP and phone area code do not match". Another reason could be "family name is too long, maximum is a length of 70 characters". The create customer service then fails with "shipping address is not valid".

The error message tells us which address causes the problem, but the user has no idea what the problem is. Forwarding the original error is therefore desirable. Forwarding it alone is not useful either, because then the information about which address is affected is lost. A combination helps: "shipping address is not valid, ZIP and phone area code do not match".

Encapsulated error messages should be forwarded to the caller. There are a few options for handling root causes:
  • Adopt the root cause: if a root cause is given, then override the current message with it
  • Append the root cause: add the root cause at the end as part of the error
  • Ignore the root cause
A root cause can also be the result of another REST call. It is stored as an error object in the parameters.

The translation is done on delivery to the UI.
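The three options could be sketched as follows, with errors represented as plain dictionaries. The names `RootCausePolicy` and `forward` are hypothetical, and the exact behaviour of "adopt" (take over message and parameters, keep the own error code) is an assumption, since the text only says that the message is overridden.

```python
from enum import Enum


class RootCausePolicy(Enum):
    ADOPT = "adopt"
    APPEND = "append"
    IGNORE = "ignore"


def forward(error, root_cause, policy):
    """Return a new error dict that handles root_cause according to policy."""
    if root_cause is None or policy is RootCausePolicy.IGNORE:
        return dict(error)

    if policy is RootCausePolicy.ADOPT:
        # Assumption: take over message and parameters, keep the own error code.
        return {
            **error,
            "message": root_cause["message"],
            "parameters": list(root_cause.get("parameters", [])),
        }

    # APPEND: store the root cause as an error object in the parameters
    # and reference it with a new 1-based placeholder at the end.
    parameters = list(error.get("parameters", [])) + [root_cause]
    placeholder = "{%d}" % len(parameters)
    return {
        **error,
        "message": error["message"] + ", " + placeholder,
        "parameters": parameters,
    }
```

The function never modifies its inputs, so the same root cause can be forwarded through several service layers.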

message: "shipping address is not valid, {1}"
errorCode: 400
parameters[1]: {
  message: "ZIP {1} and phone area code {2} do not match"
  errorCode: 405
  parameters[1]: "12345"
  parameters[2]: "55555"
}

The result is error code '400 Bad Request' and the translated message:

Shipping address is not valid, ZIP "12345" and phone area code "55555" do not match.
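How the nested error might be turned into a single text can be sketched with a small recursive function. The function name `render` and the dictionary layout (using the `parameters` field defined earlier) are assumptions. The translation step is left out, so the output keeps the technical wording without the capital letter and the full stop of the translated text.

```python
import re


def render(error):
    """Substitute 1-based placeholders; nested error objects are rendered recursively."""
    parameters = error.get("parameters", [])

    def substitute(match):
        index = int(match.group(1)) - 1
        if not 0 <= index < len(parameters):
            return match.group(0)          # leave unknown placeholders untouched
        value = parameters[index]
        if isinstance(value, dict):        # a forwarded root cause
            return render(value)
        return '"%s"' % value              # plain values are shown in quotes

    return re.sub(r"\{(\d+)\}", substitute, error["message"])


error = {
    "message": "shipping address is not valid, {1}",
    "errorCode": 400,
    "parameters": [
        {
            "message": "ZIP {1} and phone area code {2} do not match",
            "errorCode": 405,
            "parameters": ["12345", "55555"],
        }
    ],
}

text = render(error)
```

Because each level only substitutes its own parameters, the chain can be arbitrarily deep without the outer service knowing anything about the inner messages.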

Conclusion

Error reporting can become a real improvement if there is the will to think it through. Then it is a helpful tool that supports users and operators in their daily tasks.







