Reliability, Availability, and Serviceability with On-Chip Monitoring
Proactive Failure Avoidance
Predict failures before they occur instead of reacting to them.
Accurate Fault Detection
Pinpoint the developing fault at the individual circuit level.
Reliability Monitoring
Identify intrinsic and extrinsic faults to avoid failures.
Unmatched Resiliency
Prevent failures from propagating into system level errors.
Avoid Functional Failures
Prevent Silent Data Corruption
Eliminate System-Wide Errors
Predicting failures before they occur
proteanTecs RTHM (Real-Time Health Monitoring) is a real-time, workload- and reliability-aware health monitoring application for advanced electronics. It provides predictive performance indexes and prescriptive maintenance to avoid failures and assure reliability, availability, and serviceability (RAS) in high-compute, high-scale applications.
Monitoring from within, for always-on visibility
Finding an issue in the data center today is like finding a needle in a haystack. Pinpointing a developing fault at the individual component level, with zero latency, allows precise failure mitigation instead of blanket solutions.
Performance Index
Quantifies how close the device currently is to failure
Predictive Maintenance
Monitors how close the device is to timing failures
Prescriptive Maintenance
Real-time operational system adjustments to avoid failures
Failure Detection
Alerts on imminent failures so the system can move to a safe state
In this white paper, we introduce Real-Time Health Monitoring, a proactive solution designed to predict and prevent failures before they occur.
The paper explores the challenges posed by advanced electronics and demonstrates how RTHM can enhance reliability, availability, and serviceability (RAS) in high-performance datacenters, making them resilient to the demands of modern cloud computing, AI, and high-performance workloads while minimizing the risk of costly system failures.
RTHM's approach is based on monitoring timing margins in millions of logic paths within each chip. By identifying and quantifying performance degradation, RTHM enables early detection of potential failures, allowing timely mitigation and preventing costly downtime.
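To make the idea of quantifying margin degradation concrete, here is a minimal sketch in Python. It is not the RTHM algorithm, which the text does not describe in detail. It assumes a hypothetical feed of worst-case timing-margin samples (the `MarginSample` type, the picosecond units, and the function names are all illustrative), expresses the remaining margin as a fraction of the fresh-silicon margin, and uses a plain least-squares line to estimate when the margin would cross an alert threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MarginSample:
    """Hypothetical sample: worst-case timing margin over the monitored paths."""
    hours: float      # hours in operation when the sample was taken
    margin_ps: float  # remaining timing margin, in picoseconds (assumed unit)


def performance_index(margin_ps: float, fresh_margin_ps: float) -> float:
    """Fraction of the original margin still available, clamped to [0, 1]."""
    if fresh_margin_ps <= 0:
        raise ValueError("fresh_margin_ps must be positive")
    return max(0.0, min(1.0, margin_ps / fresh_margin_ps))


def hours_until_threshold(
    samples: Sequence[MarginSample], threshold_ps: float
) -> Optional[float]:
    """Extrapolate a least-squares line through the samples to the threshold.

    Returns the estimated hours remaining after the last sample, or None when
    there are too few samples or the margin is not trending downward.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(s.hours for s in samples) / n
    mean_m = sum(s.margin_ps for s in samples) / n
    var_t = sum((s.hours - mean_t) ** 2 for s in samples)
    if var_t == 0:
        return None  # all samples taken at the same time; no trend to fit
    slope = sum(
        (s.hours - mean_t) * (s.margin_ps - mean_m) for s in samples
    ) / var_t
    if slope >= 0:
        return None  # margin is stable or improving
    intercept = mean_m - slope * mean_t
    crossing_hours = (threshold_ps - intercept) / slope
    return max(0.0, crossing_hours - samples[-1].hours)
```

For example, margins of 50, 48, and 46 ps sampled at 0, 100, and 200 hours degrade at 0.02 ps per hour, so a 40 ps threshold would be reached about 300 hours after the last sample. Real degradation is rarely linear, so a production system would likely use a workload- and temperature-aware model instead of a straight line.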
Silent data corruption is on the rise in advanced electronics. The explosion in AI is driving growing complexity and diversity in hardware systems, which increases the risk to data integrity. Undetected manufacturing defects, accelerated aging, and environmental factors can lead to data corruption, while traditional approaches fail to adequately address this rising challenge. This paper explores a two-stage detection approach for different stages of the lifecycle: ML-powered Outlier Detection for semiconductor defect detection at test, and Real-Time Health Monitoring for in-field predictive and prescriptive maintenance.
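The first stage, outlier detection at test, can be illustrated with a small Python sketch. The text refers to an ML-powered method without describing it, so this example swaps in a much simpler statistical technique: a robust z-score based on the median and the median absolute deviation (MAD). The function name, the per-chip readings dictionary, and the 3.5 cutoff are assumptions chosen for illustration only.

```python
from statistics import median
from typing import Dict, List


def robust_outliers(readings: Dict[str, float], cutoff: float = 3.5) -> List[str]:
    """Return the IDs of chips whose reading is far from the population median.

    Uses the modified z-score 0.6745 * (x - median) / MAD, which is less
    sensitive to the outliers themselves than a mean/standard-deviation score.
    """
    values = list(readings.values())
    if not values:
        return []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the readings equal the median, so the score is
        # undefined. This sketch flags nothing rather than guess.
        return []
    return sorted(
        chip_id
        for chip_id, value in readings.items()
        if abs(0.6745 * (value - med) / mad) > cutoff
    )
```

Called with readings such as `{"a": 10.0, "b": 10.2, "c": 9.9, "d": 10.1, "e": 10.0, "f": 6.0}`, the function flags only chip `"f"`. A chip can pass every individual test limit and still stand out from its population in this way, which is the kind of latent defect that outlier screening at test aims to catch.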
proteanTecs is the leading provider of deep data analytics for advanced electronics monitoring. Trusted by global leaders in the datacenter, automotive, communications and mobile markets, the company provides system health and performance monitoring, from production to the field. By applying machine learning to novel data created by on-chip monitors, the company’s deep data analytics solutions deliver unparalleled visibility and actionable insights—leading to new levels of quality and reliability. proteanTecs is headquartered in Israel with offices in the USA, India, South Korea and Taiwan.