Error Reporting Strategy

By "error reporting strategy", we mean here: the strategy put in place in a program to catch and report any error occurring during its execution.

A good error reporting strategy starts with analyzing the needs:

  • Inform the application user of what went wrong with a user-friendly message in the user's own language. The message should give as much relevant information as possible, so that the user can understand the problem and how to solve it or work around it.
  • If the exception is not relevant to the user (because it comes from a technical error), don't try to explain it. A message like 'Internal error, please report to support' (translated into the user's language) is enough.
  • Ensure that exception reports give the developer as much information as possible, so that it is easy to figure out what happened. For developer messages, plain English text is enough.

This is based on a fundamental distinction between two types of exceptions that can happen inside a program:

  • Logical errors = errors that shouldn't happen in a correct program because they come from programming mistakes. They are typically unchecked exceptions like NullPointerException or ArrayIndexOutOfBoundsException. The development process will try to find and eliminate all those errors, but the application must always be ready to catch and report them.
  • Environment errors = errors caused by transient external conditions (bad input, communication errors, file errors, database down ...). Those are usually checked exceptions and should be explained to the user so that appropriate action can be taken.

We also have the following technical needs:

  • We want to spend minimal effort coding and maintaining the error reporting inside an application. New error messages should be easy to add.
  • The error support should not break the modularity of the application.

Problems

The mapping between checked/unchecked exceptions and these two types of errors is not clear-cut; in addition, some libraries wrap checked exceptions in unchecked exceptions. So, a good exception reporting strategy cannot rely on that criterion. For example, if you parse a user input string into an Integer, you should report the parse failure with full details so that the user can correct the input, even though the thrown NumberFormatException is a RuntimeException.
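
As a minimal sketch of that last case (the class, method and exception names are hypothetical, and the message is hard-coded in English only to keep the example short), the parse failure below is turned into something the user can act on, even though NumberFormatException is unchecked:

```java
// Hypothetical helper: all names here are illustrative only.
class QuantityInput {

    /** Thrown when the user input is invalid; carries a message meant for the user. */
    static class UserInputException extends Exception {
        UserInputException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Parses a quantity typed by the user, reporting bad input with full details. */
    static int parseQuantity(String rawInput) throws UserInputException {
        try {
            return Integer.parseInt(rawInput.trim());
        } catch (NumberFormatException nfe) {
            // Unchecked, yet caused by the user: keep the offending text in the
            // message and chain the original exception for the developer.
            throw new UserInputException(
                    "'" + rawInput + "' is not a valid whole number.", nfe);
        }
    }
}
```

Calling QuantityInput.parseQuantity("12x") fails with a message naming the offending text, while the chained NumberFormatException stays available in the stack trace.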

In most APIs (including some Java base APIs) there is no exhaustive list of the exception cases that are worth reporting to users. For example, if a method declares that it throws an IOException, you cannot tell whether it might throw a "file not found", a "disk full" or a "file locked by another application". Those are examples of exception cases that are worth reporting to the user, but there is no way to know how and when they will be thrown ("file not found" has its own IOException subclass, FileNotFoundException, but the two others don't). So, in practice, you will only discover the various errors that can happen at runtime (in test or production).
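
The following sketch shows what this looks like in code; ConfigProbe and describeRead are hypothetical names. Only the "file not found" case can be selected by exception type; anything else arrives as a plain IOException whose message text depends on the platform:

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: names are illustrative only.
class ConfigProbe {

    /** Describes what happened when trying to read the first character of a file. */
    static String describeRead(String path) {
        try (Reader reader = new FileReader(path)) {
            reader.read();
            return "ok";
        } catch (FileNotFoundException fnfe) {
            // Note: FileReader also throws this when the path is a directory
            // or cannot be opened for another reason.
            return "file not found: " + path;
        } catch (IOException ioe) {
            // "Disk full", "file locked" and the like all end up here;
            // only the message text tells them apart.
            return "I/O error: " + ioe.getMessage();
        }
    }
}
```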

When your application is internationalized, it is usually hard to decide where the error messages should be built. The place where the exception is first caught (usually in the inner business layer) is the best place to create a detailed message of what went wrong, because all the context of the failed action is available. On the other hand, the upper GUI layer is the place where the user language and the user action are known (the user action is more global than the precise action that failed, but it is the only thing the user understands). In fact, you need both to give a complete error message, but in well-designed systems the GUI and the business layers must stay separated.

A typical example

Let's take the following error case:

A user wants to read his task list using a newly installed task-management application. Doing so connects for the first time to the database containing the tasks (which this user knows as the 'work schedule database'), so the application starts to set up the connection but fails because a configuration file is missing.

In this case, the ideal error message should be something like:

        Unable to read the task list!
        Cause: the configuration file (c:/myApp/conf/db.props) 
               of the database 'Work Schedule' cannot be found.
        

Of course, this is my opinion of the ideal error message. Some will find it too technical and prefer a second part saying just "database error". The exception support should allow you to choose the appropriate level of reporting.

The point of this example is that, in general, a good error report should merge the information of the upper GUI layer (the user was just trying to read his task list) and of a lower technical layer (the DB configuration file named xxx is missing). And, of course, the whole report should be in the user's language. In particular, some contextual element names (here, the name of the database for which the connection failed) should also be translated.

Legacy solutions

One common solution to that problem (coming from legacy C or C++ applications) is to say: the lower business layer throws exceptions containing no message but an error code, and an exhaustive list of error codes is provided. The upper GUI layer can then use the code to generate a meaningful error message.

In practice, this doesn't work well because:

  • An error code does not have enough structure to encode complex messages. Most of the time, you want more contextual information. Example: when your database says "Error 52004: Table or view does not exist", it is frustrating because you want to know which table or view was missing.
  • Such codes are carried in an ad hoc manner, varying from a specific exception field to a global variable. This makes it very difficult to pass the information across technical layers.
  • You enter an endless and painful process of error code list maintenance. This process is usually heavy because it involves different actors like developers, support team, technical writers and (if you are very unlucky) managers.

We can do far better in terms of results and ease of maintenance with Java exceptions and a bit of organization.

SFaC solution

The idea is to make smart use of the Java exception chain (you can chain exceptions by passing the cause to the exception constructor or by calling the initCause(Throwable) method). Any layer can catch an exception, create a new exception providing more context information about what happened, chain the lower-level exception to it and throw this new exception.
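
A minimal sketch of this catch, wrap and rethrow pattern follows. The class and method names are hypothetical; only the cause-taking constructor and Throwable.getCause() come from the standard library:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layers: names are illustrative only.
class ChainingSketch {

    /** Lowest layer: fails with a purely technical exception. */
    static void readConfig(String path) {
        throw new IllegalStateException("cannot read " + path);
    }

    /** Middle layer: adds what it knows (which database) and chains the cause. */
    static void openDatabase(String databaseId, String configPath) {
        try {
            readConfig(configPath);
        } catch (RuntimeException e) {
            // The second constructor argument links the lower-level exception.
            throw new RuntimeException("cannot open database " + databaseId, e);
        }
    }

    /** Collects the messages of an exception chain, outermost first. */
    static List<String> messagesOf(Throwable top) {
        List<String> messages = new ArrayList<>();
        for (Throwable t = top; t != null; t = t.getCause()) {
            messages.add(t.getMessage());
        }
        return messages;
    }
}
```

A top-level handler that catches the exception thrown by openDatabase can call messagesOf to see the context added by each layer, from the most general to the most specific.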

When there is nothing relevant for the user (logical exception), the new exception can be of any exception class. When something is to be reported to the user, the exception must implement the MultiLingualMessage interface.

The MultiLingualMessage interface (and the corresponding MultiLingualException class) lets you encode a structured message by giving a bundle key with optional parameters. A parameter can itself be a MultiLingualMessage, so this recursive structure allows complex messages.
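
To illustrate the idea of a bundle key with recursive parameters, here is a sketch built only on java.util.ResourceBundle and java.text.MessageFormat. KeyedMessage and DemoBundle are hypothetical stand-ins, not the SFaC classes, whose actual signatures may differ:

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical stand-in for a multilingual message: a bundle key plus parameters. */
class KeyedMessage {
    final String key;
    final Object[] params;

    KeyedMessage(String key, Object... params) {
        this.key = key;
        this.params = params;
    }

    /** Resolves the key in the given bundle; nested messages are resolved first. */
    String resolve(ResourceBundle bundle) {
        Object[] resolved = new Object[params.length];
        for (int i = 0; i < params.length; i++) {
            Object p = params[i];
            resolved[i] = (p instanceof KeyedMessage)
                    ? ((KeyedMessage) p).resolve(bundle)
                    : p;
        }
        return MessageFormat.format(bundle.getString(key), resolved);
    }
}

/** In-memory bundle standing in for a translated .properties file. */
class DemoBundle extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            {"WORK_SCHEDULE_DB", "Work Schedule"},
            {"DATABASE_CONFIG_FILE_NOT_FOUND",
             "The configuration file ({1}) of the database \"{0}\" cannot be found."},
        };
    }
}
```

The message is created without any knowledge of the language; the bundle is chosen only when resolve is called, which is what allows the business layer to build the message and the GUI layer to translate it.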

With those classes you can encode any complex message anywhere without worrying about the current user language. Then, at the very top of the method invocation chain, you need a last catch block that gets the exception and reports it to the user using the ExceptionDialog. This catch block is usually in the method handling the Swing action, so it is typically in a GUI component that knows what the user was doing and can therefore supply the final part of the error message.

The ExceptionDialog will build the final error message by combining the message given directly and all the messages provided by the MultiLingualException instances in the exception chain.
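
The combination step can be pictured with the sketch below. It is not the ExceptionDialog implementation: UserVisible, UserVisibleException and ReportBuilder are hypothetical names, the "Cause:" prefix is an assumption, and the messages are assumed to be already translated:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker for exceptions that carry a message meant for the end user. */
interface UserVisible {
    String userMessage();
}

/** Hypothetical exception whose message may be shown to the user. */
class UserVisibleException extends RuntimeException implements UserVisible {
    private final String userMessage;

    UserVisibleException(String userMessage, Throwable cause) {
        super(userMessage, cause);
        this.userMessage = userMessage;
    }

    @Override
    public String userMessage() {
        return userMessage;
    }
}

class ReportBuilder {
    /** Combines the GUI-level message with every user-visible message in the chain. */
    static String buildReport(String topMessage, Throwable error) {
        List<String> lines = new ArrayList<>();
        lines.add(topMessage);
        for (Throwable t = error; t != null; t = t.getCause()) {
            // Purely technical exceptions are skipped: they are for the log only.
            if (t instanceof UserVisible) {
                lines.add("Cause: " + ((UserVisible) t).userMessage());
            }
        }
        return String.join("\n", lines);
    }
}
```

Exceptions in the chain that do not implement the marker interface contribute nothing to the user report, which is how technical details stay out of the dialog text while remaining in the stack trace.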

Example

To return to the database configuration example above, we can solve it as follows:

First, always write your Swing action handlers with a try/catch, as follows:

    void refreshTaskList() {
        try {
            DbConnection conn=getDatabaseConnection("WORK_SCHEDULE_DB");
            refreshTaskList(conn);
        } catch (Exception e) {
            log.error("Unable to refresh task list for user "+getUser(), e);
            ExceptionDialog.showExceptionDialog(this, "ERROR", "TASK_LIST_REFRESH_ERROR", e);
        }
    }

The important points are:

  • There should always be a try/catch at the top of the call stack (just after the Swing action invocation).
  • The caught exception is reported using the ExceptionDialog, with a message describing the high-level context of the problem.
  • Don't forget to also log the exception in your logging system (Java Logger, log4j ...). In this case, provide as much context information as possible for the developer.
  • The caught exception might contain a MultiLingualException in its exception chain; in that case, the message it provides is also displayed (in the correct language) by the ExceptionDialog.

Let's see how getDatabaseConnection can provide a meaningful error message:

    private DbConnection getDatabaseConnection(String databaseId) {
        // try-with-resources closes the reader even when loading fails
        try (FileReader reader = new FileReader(dbConfigFilePath)) {
            Properties props = new Properties();
            props.load(reader);
            DbConnection conn = ... (The connection setup using properties)
            return conn;
        } catch (FileNotFoundException fnfe) {
            Object[] params = new Object[] { new MultiLingualTextImpl(databaseId), dbConfigFilePath };
            throw new MultiLingualException("DATABASE_CONFIG_FILE_NOT_FOUND", params, fnfe);
        } catch (IOException ioe) {
            Object[] params = new Object[] { new MultiLingualTextImpl(databaseId), dbConfigFilePath };
            throw new MultiLingualException("DATABASE_CONFIG_FILE_ERROR", params, ioe);
        } catch (Exception e) {
            Object[] params = new Object[] { new MultiLingualTextImpl(databaseId) };
            throw new MultiLingualException("CANNOT_OPEN_CONNECTION", params, e);
        }
    }
    

Here you can see that you simply have to wrap a caught exception into a MultiLingualException and throw the result. This makes it very simple to add information about an error from any point in the call stack.

Of course, you have to provide the translation strings for the messages you use. See the Language Support documentation for more information. Provided that your bundles contain the following:

   ERROR=Error 
   TASK_LIST_REFRESH_ERROR=Unable to read the task list!
   WORK_SCHEDULE_DB=Work Schedule
   DATABASE_CONFIG_FILE_NOT_FOUND=The configuration file ({1})\n of the database "{0}" cannot be found.
    

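You will get a dialog that combines both messages: "Unable to read the task list!" followed by the configuration file message for the "Work Schedule" database.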

That's it!

The "Details" button of the dialog lets you show the raw stack trace of the exception chain. This is useful for investigating unexpected exceptions. Each time you encounter a new exception case (either in test or production), you have to decide whether to fix it (because it's a logical error), to devise a new error message or to rely on the default error message.

This way, you tune your error reporting with minimal maintenance effort.

Conclusion

Of course, you need to catch and wrap exceptions smartly in your code and also to maintain your bundles for message translations. There's no way to avoid that if you want the correct error message at the end.

This solution has the following advantages:

  • It relies on standard Java mechanisms: exception chains and resource bundles. It's very easy to add, modify or remove exception messages.
  • You can mix standard exceptions (carrying information for developers) with MultiLingualException providing information for the end user.
  • The information is added in the layer where it is available, adding no coupling between the layers.
  • The coverage and granularity of reported errors can easily be extended during test phases.
  • Intermediate layers are not forced to catch the exception. If they have no information to add, they can let it pass through.
  • It works equally well with checked and runtime exceptions. You are not forced into one exception-handling style.
  • The resulting message is as complete as possible and self-explanatory (it's not a code referring to an external explanation). You can save a lot of documentation effort, both internal (for developers) and external (for end users), because you are not forced to maintain endless lists of error descriptions.