Exploring Program Locality for Efficient Online Fault Detection

As technology scales down, challenges in fabrication, thermal stress, and in-field degradation put the reliability of processors at risk. Among the different fault types, transient faults occur frequently because of high chip density, aggressive voltage scaling, and high clock frequencies. Several dependable processor architectures have been proposed to counter these faults by integrating online mechanisms for error detection and recovery. Recently proposed techniques, including perturbation-based fault screening and anomaly detection based on ternary content-addressable memory (TCAM), exploit locality in memory addresses and values to detect transient faults. However, their fault coverage comes at a high energy cost and with numerous false positives. This dissertation addresses the efficiency problem of such fault detectors. We exploit locality in memory strides, rather than in the references themselves, to reduce the amount of data needed for fault detection. We propose using Bloom filters to store memory patterns in hashed form, instead of storing them in their original form in a TCAM, to reduce hardware and energy cost. We also explore program phase-level locality and propose a framework that customizes the fault detectors for the current phase. Additionally, we present detector designs for both the processor backend and the frontend that achieve high fault coverage (80%) at a low false positive rate (<1%). This greatly improves the resilience of the processor to soft errors while limiting the energy and performance overheads.
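
To make the stride-plus-Bloom-filter idea concrete, the following is a minimal software sketch, not the hardware design from the dissertation. It assumes that a "stride" is the difference between consecutive memory addresses, that the filter is trained on a fault-free trace, and that any stride absent from the filter is flagged as a possible transient fault. The names BloomFilter, strides, train, and screen, the filter size, and the hash function are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes and hash choice are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            data = f"{seed}:{item}".encode()
            digest = hashlib.blake2b(data, digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(self.bits[pos] for pos in self._positions(item))


def strides(addresses):
    """Differences between consecutive addresses (assumed stride definition)."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


def train(bloom, addresses):
    """Record every stride seen in a trace that is assumed fault-free."""
    for s in strides(addresses):
        bloom.add(s)


def screen(bloom, addresses):
    """Return indices of accesses whose incoming stride was never recorded."""
    return [i + 1 for i, s in enumerate(strides(addresses)) if s not in bloom]


# Train on a regular array walk with an 8-byte stride.
detector = BloomFilter()
clean_trace = [0x1000 + 8 * i for i in range(64)]
train(detector, clean_trace)

# Same access pattern at a different base, with one address bit flipped.
faulty_trace = [0x2000, 0x2008, 0x2010 ^ (1 << 20), 0x2018]
flagged = screen(detector, faulty_trace)
```

Because only the stride is stored, a single filter entry covers the whole array walk regardless of its base address, which is the sense in which strides need less data than raw references. The flipped bit disturbs two consecutive strides, so both the corrupted access and the one after it would normally be flagged; a Bloom filter can occasionally miss such an anomaly through a hash collision, which in this setting corresponds to reduced fault coverage rather than a false alarm.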