Class SystemFailure

  • java.lang.Object
    • org.apache.geode.SystemFailure

  • @Deprecated
    public final class SystemFailure
    extends java.lang.Object
    Deprecated.
    since Geode 1.11 because it is potentially counterproductive to try to mitigate a VirtualMachineError since the JVM (spec) makes no guarantees about the soundness of the JVM after such an error. In the presence of a VirtualMachineError, the simplest solution is really the only solution: exit the JVM as soon as possible.
    Catches and responds to JVM failure

    This class represents a catastrophic failure of the system, especially the Java virtual machine. Any class may, at any time, indicate that a system failure has occurred by calling initiateFailure(Error) (or, less commonly, setFailure(Error)).

    In practice, the most common type of failure that is likely to be reported by an otherwise healthy JVM is OutOfMemoryError. However, GemFire will report any occurrence of VirtualMachineError as a JVM failure.

    When a failure is reported, you must assume that the JVM has broken its fundamental execution contract with your application. No programming invariant can be assumed to be true, and your entire application must be regarded as corrupted.

    Failure Hooks

    GemFire uses this class to disable its distributed system (group communication) and any open caches. It also provides a hook that lets you respond after GemFire has disabled itself.

    Failure WatchDog

    When startThreads() is called, a "watchdog" Thread is started that periodically checks to see if system corruption has been reported. When system corruption is detected, this thread proceeds to:
    1. Close GemFire -- Group communication is ceased (this cache member recuses itself from the distributed system) and the cache is further poisoned (it is pointless to try to cleanly close it at this point).

    2. After this has successfully ended, the watchdog launches a failure action, a user-defined Runnable set with setFailureAction(Runnable). By default, this Runnable performs nothing. If you feel you need to perform an action before exiting the JVM, this hook gives you a means of attempting some action. Whatever you attempt should be extremely simple, since your Java execution environment has been corrupted.

      GemStone recommends that you employ Java Service Wrapper to detect when your JVM exits and to perform appropriate failure and restart actions.

    3. Finally, if the application has granted the watchdog permission to exit the JVM (via setExitOK(boolean)), the watchdog calls System.exit(int) with an argument of 1. If you have not granted this class permission to close the JVM, you are strongly advised to call System.exit(int) yourself in your failure action (in the previous step).

    Each of these actions will be run exactly once in the order described above. However, if any step throws any type of error (Throwable), the watchdog will assume that the JVM is still under duress (esp. an OutOfMemoryError), will wait a bit, and then retry the failed action.

    It bears repeating that you should be very cautious of any Runnables you ask this class to run. By definition the JVM is very sick when failure has been signalled.
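
The ordering and retry behavior described above can be sketched in isolation. This is a minimal model and not Geode's implementation: WatchdogSketch, runInOrder and the retry pause are hypothetical names introduced here for illustration only.

```java
import java.util.List;

/** Hypothetical stand-in for the watchdog's step runner; not Geode code. */
final class WatchdogSketch {

  /**
   * Runs each step in list order. A step that throws any Throwable is
   * retried after a short pause; a step that returns normally is never
   * run again, so each step completes successfully exactly once.
   */
  static void runInOrder(List<Runnable> steps, long retryPauseMillis) {
    for (Runnable step : steps) {
      boolean done = false;
      while (!done) {
        try {
          step.run();
          done = true;
        } catch (Throwable t) {
          // Assume the JVM is still under duress: wait, then retry this step.
          try {
            Thread.sleep(retryPauseMillis);
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return; // stop retrying if this thread is asked to stop
          }
        }
      }
    }
  }
}
```

In this model the three steps would be "close GemFire", "run the failure action" and "exit the JVM", passed as three Runnables in that order. A later step never starts until the earlier one has returned normally.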

    Failure Proctor

    In addition to the failure watchdog, startThreads() creates a second thread (the "proctor") that monitors free memory. It does this by examining free memory, total memory and maximum memory. If the amount of available memory stays below a given threshold for more than MEMORY_MAX_WAIT seconds, the watchdog is notified.

    Note that the proctor can be effectively disabled by setting the failure memory threshold to a negative value.

    The proctor is a second line of defense, attempting to detect OutOfMemoryError conditions in circumstances where nothing alerted the watchdog. For instance, a third-party jar might incorrectly handle this error and leave your virtual machine in a "stuck" state.

    Note that the proctor does not relieve you of the obligation to follow the best practices in the next section.
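
The proctor's check can be modeled as a small state machine. The sketch below is an assumption about how such a check might combine free, total and maximum memory; ProctorSketch, available and sample are hypothetical names, and the formula is not taken from Geode's source.

```java
/** Hypothetical model of the proctor's memory check; not Geode code. */
final class ProctorSketch {
  private final long thresholdBytes;
  private final long maxWaitMillis;
  private boolean low;          // true while memory is below the threshold
  private long lowSinceMillis;  // time at which the current low period began

  ProctorSketch(long thresholdBytes, long maxWaitMillis) {
    this.thresholdBytes = thresholdBytes;
    this.maxWaitMillis = maxWaitMillis;
  }

  /** Memory still obtainable: free heap now plus room the heap may grow by. */
  static long available(long free, long total, long max) {
    return free + (max - total);
  }

  /**
   * Records one observation. Returns true once available memory has stayed
   * below the threshold for longer than the allowed wait. A negative
   * threshold can never be undershot, which effectively disables the check.
   */
  boolean sample(long free, long total, long max, long nowMillis) {
    if (available(free, total, max) >= thresholdBytes) {
      low = false; // memory recovered, so the clock is reset
      return false;
    }
    if (!low) {
      low = true;
      lowSinceMillis = nowMillis;
    }
    return nowMillis - lowSinceMillis > maxWaitMillis;
  }

  /** Convenience overload that reads the live JVM figures. */
  boolean sampleNow() {
    Runtime rt = Runtime.getRuntime();
    return sample(rt.freeMemory(), rt.totalMemory(), rt.maxMemory(),
        System.currentTimeMillis());
  }
}
```

Separating sample from sampleNow keeps the decision logic free of clock and Runtime calls, so it can be exercised with fixed numbers.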

    Best Practices

    Catch and Handle VirtualMachineError

    If you feel obliged to catch either Error or Throwable, you must also check for VirtualMachineError like so:
            catch (VirtualMachineError err) {
              SystemFailure.initiateFailure(err);
              // If this ever returns, rethrow the error.  We're poisoned
              // now, so don't let this thread continue.
              throw err;
            }
     

    Periodically Check For Errors

    Check for serious system errors at appropriate points in your algorithms. You may elect to use the checkFailure() utility function, but you are not required to (you could just see if getFailure() returns a non-null result).

    A job-processing loop is a good candidate. For instance, org.apache.org.jgroups.protocols.UDP#run(), which implements Thread.run(), checks at the top of each iteration:

             for (;;)  {
               SystemFailure.checkFailure();
               if (mcast_recv_sock == null || mcast_recv_sock.isClosed()) break;
               if (Thread.currentThread().isInterrupted()) break;
              ...
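
// A self-contained variant of the loop above, polling for a non-null failure
// instead of calling checkFailure(). PollingLoopSketch, its failure field and
// drain are hypothetical stand-ins, not Geode API.

```java
import java.util.Queue;

/** Hypothetical worker loop that polls a shared failure flag; not Geode code. */
final class PollingLoopSketch {
  /** Stand-in for the state that SystemFailure.getFailure() would expose. */
  static volatile Error failure;

  /**
   * Runs queued jobs until the queue is empty, the thread is interrupted,
   * or a failure has been reported. Returns the number of jobs that ran.
   */
  static int drain(Queue<Runnable> jobs) {
    int processed = 0;
    for (;;) {
      if (failure != null) break; // checked once per iteration
      if (Thread.currentThread().isInterrupted()) break;
      Runnable job = jobs.poll();
      if (job == null) break;
      job.run();
      processed++;
    }
    return processed;
  }
}
```

// The check sits at the top of the loop so that no new job starts after a
// failure has been reported, even if the job that reported it returned normally.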
     

    Catches of Error and Throwable Should Check for Failure

    Keep in mind that peculiar or flat-out impossible exceptions may ensue after a VirtualMachineError has been thrown anywhere in your virtual machine. Whenever you catch Error or Throwable, you should also make sure that you aren't dealing with a corrupted JVM:
           catch (Throwable t) {
             // Whenever you catch Error or Throwable, you must also
             // catch VirtualMachineError (see above).  However, there is
             // _still_ a possibility that you are dealing with a cascading
             // error condition, so you also need to check to see if the JVM
             // is still usable:
             SystemFailure.checkFailure();
             ...
           }
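
The two catch patterns from this section can be combined in one try statement. The sketch below uses hypothetical stand-ins (CatchOrderSketch and its local initiateFailure and checkFailure) so that it compiles without Geode. The stand-ins throw a plain Error, whereas the Throws lists on this page name InternalGemFireError.

```java
/** Hypothetical illustration of catch ordering; not Geode code. */
final class CatchOrderSketch {
  /** Stand-in for the failure recorded by SystemFailure. */
  static volatile Error failure;

  /** Stand-in for initiateFailure(Error): records, then always throws. */
  static void initiateFailure(Error e) {
    if (failure == null) {
      failure = e;
    }
    throw new Error("JVM failure reported", e);
  }

  /** Stand-in for checkFailure(): throws if a failure was recorded. */
  static void checkFailure() {
    Error f = failure;
    if (f != null) {
      throw new Error("JVM failure reported", f);
    }
  }

  static String runGuarded(Runnable task) {
    try {
      task.run();
      return "ok";
    } catch (VirtualMachineError err) {
      // The more specific catch must come first.
      initiateFailure(err);
      // If that ever returns, rethrow: this thread must not continue.
      throw err;
    } catch (Throwable t) {
      // Even an ordinary exception may be a symptom of a corrupted JVM.
      checkFailure();
      return "handled: " + t.getMessage();
    }
  }
}
```

Once a failure has been recorded in this model, every later call to runGuarded that catches anything will rethrow instead of reporting the exception as handled.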
     
    Since:
    GemFire 5.1
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static java.lang.Error failure
      Deprecated.
      The underlying failure. This is usually an instance of VirtualMachineError, but it is not required to be such.
      static long MEMORY_MAX_WAIT
      Deprecated.
      This is the maximum amount of time, in seconds, that the proctor thread will tolerate seeing free memory stay below setFailureMemoryThreshold(long), after which point it will declare a system failure.
    • Method Summary

      All Methods Static Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      static void checkFailure()
      Deprecated.
      Utility function to check for failures.
      static void emergencyClose()
      Deprecated.
      Attempt to close any and all GemFire resources.
      static java.lang.Error getFailure()
      Deprecated.
      Returns the catastrophic system failure, if any.
      static void initiateFailure​(java.lang.Error f)
      Deprecated.
      Signals that a system failure has occurred and then throws an AssertionError.
      static boolean isJVMFailureError​(java.lang.Error err)
      Deprecated.
      Returns true if the given Error is fatal to the JVM and it should be shut down.
      static void loadEmergencyClasses()
      Deprecated.
      Since it requires object memory to unpack a jar file, make sure this JVM has loaded the classes necessary for closure before it becomes necessary to use them.
      protected static void logFine​(java.lang.String name, java.lang.String s)
      Deprecated.
      Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored.
      protected static void logInfo​(java.lang.String name, java.lang.String s)
      Deprecated.
      Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored.
      protected static boolean logWarning​(java.lang.String name, java.lang.String s, java.lang.Throwable t)
      Deprecated.
      Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored.
      static boolean setExitOK​(boolean newVal)
      Deprecated.
      Indicate whether it is acceptable to call System.exit(int) after failure processing has completed.
      static void setFailure​(java.lang.Error failure)
      Deprecated.
      Set the underlying system failure, if not already set.
      static java.lang.Runnable setFailureAction​(java.lang.Runnable action)
      Deprecated.
      Sets a user-defined action that is run in the event that failure has been detected.
      static long setFailureMemoryThreshold​(long newVal)
      Deprecated.
      Set the memory threshold under which system failure will be notified.
      static void signalCacheClose()
      Deprecated.
      Should be invoked when GemFire cache is closing or closed.
      static void signalCacheCreate()
      Deprecated.
      Should be invoked when GemFire cache is being created.
      static void startThreads()
      Deprecated.
      This starts up the watchdog and proctor threads.
      static void stopThreads()
      Deprecated.
      This stops the threads that implement this service.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • failure

        protected static volatile java.lang.Error failure
        Deprecated.
        The underlying failure. This is usually an instance of VirtualMachineError, but it is not required to be such.
        See Also:
        getFailure(), initiateFailure(Error)
      • MEMORY_MAX_WAIT

        public static final long MEMORY_MAX_WAIT
        Deprecated.
        This is the maximum amount of time, in seconds, that the proctor thread will tolerate seeing free memory stay below setFailureMemoryThreshold(long), after which point it will declare a system failure. The default is 15 sec. This can be set using the system property gemfire.SystemFailure.MEMORY_MAX_WAIT.
        See Also:
        setFailureMemoryThreshold(long)
    • Method Detail

      • setExitOK

        public static boolean setExitOK​(boolean newVal)
        Deprecated.
        Indicate whether it is acceptable to call System.exit(int) after failure processing has completed.

        This may be dynamically modified while the system is running.

        Parameters:
        newVal - true if it is OK to exit the process
        Returns:
        the previous value
      • isJVMFailureError

        public static boolean isJVMFailureError​(java.lang.Error err)
        Deprecated.
        Returns true if the given Error is fatal to the JVM and it should be shut down. Code should call initiateFailure(Error) or setFailure(Error) if this returns true.
        Parameters:
        err - an Error
        Returns:
        whether the given error is fatal to the JVM
      • signalCacheCreate

        public static void signalCacheCreate()
        Deprecated.
        Should be invoked when GemFire cache is being created.
      • signalCacheClose

        public static void signalCacheClose()
        Deprecated.
        Should be invoked when GemFire cache is closing or closed.
      • loadEmergencyClasses

        public static void loadEmergencyClasses()
        Deprecated.
        Since it requires object memory to unpack a jar file, make sure this JVM has loaded the classes necessary for closure before it becomes necessary to use them.

        Note that just touching the class in order to load it is usually sufficient, so all an implementation needs to do is to reference the same classes used in emergencyClose(). Just make sure to do it while you still have memory to succeed!
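
One way to "touch" classes ahead of time is to load them by name while memory is still plentiful. This is a sketch under assumptions: PreloadSketch and preload are hypothetical names, and the page does not say whether Geode loads classes by name or by direct reference.

```java
/** Hypothetical eager class loader; not Geode code. */
final class PreloadSketch {

  /**
   * Loads and initializes each named class now, so that no class loading
   * (and no jar unpacking) is needed later. Returns how many succeeded.
   */
  static int preload(String... classNames) {
    ClassLoader loader = PreloadSketch.class.getClassLoader();
    int loaded = 0;
    for (String name : classNames) {
      try {
        Class.forName(name, true, loader);
        loaded++;
      } catch (ClassNotFoundException | LinkageError e) {
        // Best effort: a missing class is skipped rather than reported.
      }
    }
    return loaded;
  }
}
```

A caller would pass the names of every class that its emergency close path uses, for example the socket and channel classes it intends to close.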

      • emergencyClose

        public static void emergencyClose()
        Deprecated.
        Attempt to close any and all GemFire resources. The contract of this method is that it should not acquire any synchronization mutexes nor create any objects.

        The former is because the system is in an undefined state and attempting to acquire the mutex may cause a hang.

        The latter is because the likelihood is that we are invoking this method due to memory exhaustion, so any attempt to create an object will also cause a hang.

        This method is not meant to be called directly, although nothing prevents it. It is public to document the contract that is implemented by emergencyClose in other parts of the system.

      • checkFailure

        public static void checkFailure()
                                 throws InternalGemFireError,
                                        java.lang.Error
        Deprecated.
        Utility function to check for failures. If a failure is detected, this method throws an InternalGemFireError.
        Throws:
        InternalGemFireError - if the system has been corrupted
        java.lang.Error - if the system has been corrupted and a thread-specific AssertionError cannot be allocated
        See Also:
        initiateFailure(Error)
      • initiateFailure

        public static void initiateFailure​(java.lang.Error f)
                                    throws InternalGemFireError,
                                           java.lang.Error
        Deprecated.
        Signals that a system failure has occurred and then throws an AssertionError.
        Parameters:
        f - the failure to set
        Throws:
        java.lang.IllegalArgumentException - if f is null
        InternalGemFireError - always; this method does not return normally.
        java.lang.Error - if a thread-specific AssertionError cannot be allocated.
      • setFailure

        public static void setFailure​(java.lang.Error failure)
        Deprecated.
        Set the underlying system failure, if not already set.

        This method does not generate an error, and should only be used in circumstances where execution needs to continue, such as when re-implementing ThreadGroup.uncaughtException(Thread, Throwable).
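
The ThreadGroup case mentioned above might look like the following sketch. RecordingThreadGroup and its static setFailure and getFailure are hypothetical stand-ins for the corresponding SystemFailure methods, written so the example compiles without Geode.

```java
/** Hypothetical thread group that records fatal errors; not Geode code. */
final class RecordingThreadGroup extends ThreadGroup {
  /** Stand-in for the failure held by SystemFailure. */
  private static volatile Error failure;

  RecordingThreadGroup(String name) {
    super(name);
  }

  /** Records the failure if none is set yet; never throws for non-null input. */
  static void setFailure(Error newFailure) {
    if (newFailure == null) {
      throw new IllegalArgumentException("failure must not be null");
    }
    if (failure == null) {
      failure = newFailure; // unsynchronized on purpose: no locks, no allocation
    }
  }

  static Error getFailure() {
    return failure;
  }

  @Override
  public void uncaughtException(Thread t, Throwable e) {
    if (e instanceof VirtualMachineError) {
      // Record only. Throwing from an uncaught-exception handler is
      // pointless, because the thread is already terminating.
      setFailure((Error) e);
      return;
    }
    super.uncaughtException(t, e);
  }
}
```

Threads created with new Thread(group, runnable) report their uncaught errors to the group, so a fatal error in any of them is recorded without interrupting the handler itself.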

        Parameters:
        failure - the system failure
        Throws:
        java.lang.IllegalArgumentException - if you attempt to set the failure to null
      • getFailure

        public static java.lang.Error getFailure()
        Deprecated.
        Returns the catastrophic system failure, if any.

        This is usually (though not necessarily) an instance of VirtualMachineError.

        A return value of null indicates that no system failure has yet been detected.

        Object synchronization can implicitly require object creation (fat locks in JRockit for instance), so the underlying value is not synchronized (it is a volatile). This means the return value from this call is not necessarily the first failure reported by the JVM.

        Note that even if it were synchronized, it would only be a proximal indicator near the time that the JVM crashed, and may not actually reflect the underlying root cause that generated the failure. For instance, if your JVM is running short of memory, this Throwable is probably an innocent victim and not the actual allocation (or series of allocations) that caused your JVM to exhaust memory.

        If this function returns a non-null value, keep in mind that the JVM is very limited. In particular, any attempt to allocate objects may fail if the original failure was an OutOfMemoryError.

        Returns:
        the failure, if any
      • setFailureAction

        public static java.lang.Runnable setFailureAction​(java.lang.Runnable action)
        Deprecated.
        Sets a user-defined action that is run in the event that failure has been detected.

        This action is run after the GemFire cache has been shut down. If it throws any error, it will be reattempted indefinitely until it succeeds. This action may be dynamically modified while the system is running.

        The default action prints the failure stack trace to System.err.

        Parameters:
        action - the Runnable to use
        Returns:
        the previous action
        See Also:
        initiateFailure(Error)
      • setFailureMemoryThreshold

        public static long setFailureMemoryThreshold​(long newVal)
        Deprecated.
        Set the memory threshold under which system failure will be notified. This value may be dynamically modified while the system is running. The default is 1048576 bytes. This can be set using the system property gemfire.SystemFailure.chronic_memory_threshold.
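
A setter with a system-property default and a "returns the old value" contract could be written as below. This is only a sketch: ThresholdSketch and its members are hypothetical, and reading the property with Long.getLong is an assumption about the mechanism, not something this page states.

```java
/** Hypothetical threshold holder; not Geode code. */
final class ThresholdSketch {
  static final String PROPERTY = "gemfire.SystemFailure.chronic_memory_threshold";

  /** Read once at class initialization; 1048576 bytes if the property is unset. */
  private static volatile long threshold = Long.getLong(PROPERTY, 1048576L);

  /** Replaces the threshold and returns the previous value. */
  static long setThreshold(long newVal) {
    long old = threshold;
    threshold = newVal; // not atomic with the read above; acceptable for a sketch
    return old;
  }

  static long getThreshold() {
    return threshold;
  }
}
```

Returning the old value lets a caller disable the check temporarily by passing a negative value and later restore the original setting.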
        Parameters:
        newVal - threshold in bytes
        Returns:
        the old threshold
        See Also:
        Runtime.freeMemory()
      • logWarning

        protected static boolean logWarning​(java.lang.String name,
                                            java.lang.String s,
                                            java.lang.Throwable t)
        Deprecated.
        Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored.
        Parameters:
        name - the name of the logger
        s - string to print
        t - the call stack, if any
        Returns:
        true if the warning got printed
      • logInfo

        protected static void logInfo​(java.lang.String name,
                                      java.lang.String s)
        Deprecated.
        Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored.
        Parameters:
        name - the name of the logger
        s - string to print
      • logFine

        protected static void logFine​(java.lang.String name,
                                      java.lang.String s)
        Deprecated.
        Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored.
        Parameters:
        name - the name of the logger
        s - string to print
      • startThreads

        public static void startThreads()
        Deprecated.
        This starts up the watchdog and proctor threads. This method is called when a Cache is created.
      • stopThreads

        public static void stopThreads()
        Deprecated.
        This stops the threads that implement this service. This method is called when a Cache is closed.