Exploring Program Locality for Efficient Online Fault Detection
As technology scales down, challenges in fabrication, thermal stress, and in-field degradation have put the reliability of processors at risk. Among the different fault types, transient faults manifest frequently because of high chip density, aggressive voltage scaling, and high clock frequency. Several dependable processor architectures have been proposed to counter these faults by integrating various online solutions for error detection and recovery. Recently proposed techniques, including perturbation-based fault screening and ternary content-addressable memory (TCAM) anomaly detection, exploit locality in memory addresses and values for transient fault detection. Their fault coverage comes at a high energy cost and with numerous false positives. This dissertation addresses the fault detector's efficiency problem. We exploit the locality in memory strides, instead of references, to reduce the amount of data needed for fault detection. We propose using Bloom filters to store the hashed form of memory patterns, instead of their original form in TCAM, to reduce the hardware and energy cost. We also explore program phase-level locality and propose a framework to customize the fault detectors for the current phase. Additionally, we present the detector design for both the processor backend and the frontend to achieve high fault coverage (80%) at a low false positive rate (<1%). This greatly improves the resilience of the processor to soft errors while limiting the energy and performance overheads.
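The idea of recording memory strides in a Bloom filter can be illustrated with a minimal software sketch. This is not the dissertation's hardware design, which this record does not describe: the class names `BloomFilter` and `StrideDetector`, the filter size, the number of hash probes, and the use of SHA-256 as a stand-in for a hardware hash are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Bit array with k hash probes: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustrative values, not taken from the dissertation.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the probe index
        # (a software stand-in for whatever hash a hardware detector would use).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class StrideDetector:
    """Hypothetical detector: flags an access whose stride was never observed."""

    def __init__(self):
        self.seen = BloomFilter()
        self.last_addr = None

    def _stride(self, addr):
        # Stride = difference between consecutive addresses; None for the first access.
        stride = None if self.last_addr is None else addr - self.last_addr
        self.last_addr = addr
        return stride

    def train(self, addr):
        stride = self._stride(addr)
        if stride is not None:
            self.seen.add(stride)

    def check(self, addr):
        """Return True if the stride to addr was not recorded (a possible fault)."""
        stride = self._stride(addr)
        return stride is not None and stride not in self.seen
```

In this sketch, a loop that walks an array with a fixed stride stores a single entry regardless of how many addresses it touches, which is the sense in which strides need less data than raw references. An address with a flipped bit produces an unfamiliar stride and is flagged, unless the filter happens to report a false positive for that stride.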
- Alternate Identifier
  - http://dissertations.umi.com/northwestern:15089
  - etdadmin_upload_742292
Items
Title | Date Uploaded | Visibility |
---|---|---|
Lu_northwestern_0163D_15089.pdf | 2021-01-21 | Public |