Fault Management

Fault management systems use automated on-board processes to detect anomalous behavior, isolate the problem to one or more components or subsystems, and initiate a response. The goals of the fault management system design are to ensure system safety and, if possible, to recover from the fault and restore the system to full operability. A traditional fault management response is to transition the spacecraft to safe mode and allow ground operators to resolve the problem. Depending on spacecraft tracking coverage and ground system staffing schedules, anomaly response times are often measured in hours or days. In a hostile environment, this traditional approach may not be sufficient because an immediate response is required. What is needed is an automated fault management system that can interpret a set of faults as potentially being the result of a hostile action.

Vestigo Aerospace is developing a pioneering “Self-Monitoring, Assessment, and Response-to-Threat” (SMART) fault management system, which provides robust fault detection, isolation, and recovery in the presence of uncertainty. Machine learning algorithms, trained with lab-based testing data as well as on-orbit datasets, may be used to identify faults through off-nominal telemetry or unexpected state transitions. Root cause determination in the presence of uncertainty may be implemented using probabilistic graphical models, such as Bayesian networks. Decision theory provides a framework for selecting the appropriate fault responses within the context of uncertainty, even in situations where probabilities of events are not well known.
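As a rough illustration of how probabilistic root cause determination and decision-theoretic response selection can fit together, the following Python sketch uses a minimal two-layer Bayesian network (one root cause node with conditionally independent symptom nodes). It is not the SMART implementation: the cause names, symptom names, probabilities, loss values, and function names are all hypothetical and chosen only for illustration. The minimax rule shown at the end is one common option for the case where event probabilities are not well known.

```python
# Illustrative sketch only. All names and numbers below are assumptions,
# not values from any real spacecraft or from the SMART system.

# Assumed prior probability of each hypothetical root cause.
PRIORS = {
    "component_failure": 0.70,
    "radiation_upset": 0.25,
    "hostile_action": 0.05,
}

# Assumed P(symptom observed | cause). Symptoms are treated as
# conditionally independent given the cause (a naive Bayes structure).
LIKELIHOODS = {
    "component_failure": {"power_drop": 0.6, "comm_interference": 0.05, "attitude_error": 0.2},
    "radiation_upset": {"power_drop": 0.2, "comm_interference": 0.1, "attitude_error": 0.3},
    "hostile_action": {"power_drop": 0.5, "comm_interference": 0.9, "attitude_error": 0.7},
}

# Assumed loss of taking each response when a given cause is the true one
# (lower is better).
LOSSES = {
    "safe_mode": {"component_failure": 2, "radiation_upset": 3, "hostile_action": 9},
    "reset_subsystem": {"component_failure": 4, "radiation_upset": 1, "hostile_action": 10},
    "threat_response": {"component_failure": 6, "radiation_upset": 6, "hostile_action": 1},
}


def posterior(observed):
    """Return P(cause | observed symptoms) using Bayes' rule.

    `observed` maps each symptom name to True (seen) or False (not seen).
    """
    unnormalized = {}
    for cause, prior in PRIORS.items():
        p = prior
        for symptom, seen in observed.items():
            p_symptom = LIKELIHOODS[cause][symptom]
            p *= p_symptom if seen else (1.0 - p_symptom)
        unnormalized[cause] = p
    total = sum(unnormalized.values())
    return {cause: p / total for cause, p in unnormalized.items()}


def min_expected_loss_response(post):
    """Pick the response with the lowest expected loss under the posterior."""
    def expected_loss(response):
        return sum(post[cause] * LOSSES[response][cause] for cause in post)
    return min(LOSSES, key=expected_loss)


def minimax_response():
    """Pick the response with the smallest worst-case loss.

    This rule uses no probabilities at all, so it can serve as a fallback
    when the probabilities of events are not well known.
    """
    return min(LOSSES, key=lambda response: max(LOSSES[response].values()))


if __name__ == "__main__":
    all_symptoms = {"power_drop": True, "comm_interference": True, "attitude_error": True}
    post = posterior(all_symptoms)
    print(post)
    print(min_expected_loss_response(post))
    print(minimax_response())
```

With these assumed numbers, observing all three symptoms together shifts most of the posterior probability onto the hostile action hypothesis despite its low prior, and the expected-loss rule selects the threat response. Observing only a power drop leaves component failure as the most likely cause, and the same rule selects safe mode.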

The SMART fault management system provides the resiliency and rapid response capability needed in a contested space environment.