Safety & Security: Making sense of functional safety in automotive image sensors

[Diagram: System-based fault detection time versus sensor-based fault detection time]

Giri Venkat, who is responsible for image sensor technical marketing at On Semiconductor, evaluates functional safety in automotive image sensors

As adas features such as lane keeping assist, adaptive cruise control and automated braking for collision avoidance evolve into true autonomy, additional cameras are making their way into production vehicles.

The primary sensor in almost all adas deployments is the image sensor. As adas progresses from assistance to automation, the safe operation of the vehicle will depend more and more on the reliability of the imaging subsystem.

Underlying this is the fact that, to ensure safe operation of adas and autonomous systems, the image sensor must be treated as a critical component in the system's overall functional safety.

With the introduction of ISO 26262, the concept of automotive safety integrity levels (Asils) has been defined. Asils range from Asil-A (lowest) to Asil-D (highest). An Asil is determined by three factors: the severity of a failure, the probability of the failure occurring (exposure) and the extent to which the effect of the failure can be controlled (controllability). The key metrics that affect the safety performance of the system include detection, delay, efficiency and effect.
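To make the three factors concrete, here is a minimal Python sketch of the risk graph defined in ISO 26262-3. It uses a summation shorthand that reproduces the standard's lookup table; the integer encoding is illustrative, not part of the standard itself.

```python
# Illustrative risk-graph lookup: severity S1-S3, exposure E1-E4 and
# controllability C1-C3 are passed as small integers. The sum rule below
# is a shorthand that reproduces the table published in ISO 26262-3.

def asil(severity: int, exposure: int, controllability: int) -> str:
    if not (1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3):
        raise ValueError("expected S in 1-3, E in 1-4, C in 1-3")
    rank = severity + exposure + controllability - 6   # 1=A, 2=B, 3=C, 4=D
    return {1: "Asil-A", 2: "Asil-B", 3: "Asil-C", 4: "Asil-D"}.get(rank, "QM")

print(asil(3, 4, 3))   # Asil-D: severe, highly probable and uncontrollable
print(asil(3, 4, 2))   # Asil-C: the same hazard, but normally controllable
```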


Image sensors

Image sensors are a core component of an adas and the primary source of all vision system data. They provide the raw data that the rest of the system uses to analyse the environment and then make operational decisions in the vehicle. In effect, image sensors are the eyes of the autonomous vehicle. Other sensors such as radar and lidar may also be used, but the primary sources of data are the image sensors. In addition to the sensors, an adas includes components that perform the functions of image processing, analysis and decision making.

The number of image sensors in a typical adas is rapidly growing. From a single forward looking camera, to full surround view systems, the number of cameras in a vehicle can be anywhere from one to over ten. The effect of failures in the sensor depends on the nature of the failure and can range from insignificant to critical. The ability of a system to detect, protect against and correct individual failures in the image sensor has significant ramifications for overall safety and reliability.

At its core, a cmos image sensor is a rectangular array of photo-sensitive pixels organised in rows and columns. These pixels convert the incident light into voltage or current with a per-pixel analogue circuit. The resulting voltage or current is then converted into digital values, typically in a row-by-row order. Additional digital logic enables the data to be stored, processed and transmitted to other devices in the system for subsequent processing and analysis.

The data captured by the image sensor in an adas application are typically used by the system to make decisions that affect the operation of the vehicle. As adas has increased in complexity, these decisions have advanced from generating simple audible and visual warnings to much more complex decisions including braking, acceleration and steering, and in future will progress to completely autonomous driving.


Failures

A very conservative view of a failure in an image sensor would be to define an unsafe fault as any output that differs from a fault-free model or known-good device output. At a granular level, this would imply that errors even at the pixel level could constitute a failure. At higher levels, row, column and frame errors could also constitute a failure.
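That conservative definition can be expressed directly as a comparison against a fault-free reference frame. The following minimal sketch (Python with numpy; the frame size is invented for the example) classifies a deviation by the coarsest granularity at which it appears:

```python
import numpy as np

# Illustrative classification of a deviation from a known-good (golden)
# output, at the coarsest granularity at which it appears.

def classify_fault(frame: np.ndarray, golden: np.ndarray) -> str:
    diff = frame != golden
    if not diff.any():
        return "fault-free"
    if diff.all():
        return "frame-level fault"
    if diff.all(axis=1).any():      # an entire row differs
        return "row-level fault"
    if diff.all(axis=0).any():      # an entire column differs
        return "column-level fault"
    return "pixel-level fault"

golden = np.zeros((4, 6), dtype=np.uint16)
bad = golden.copy()
bad[2, :] = 1023                    # simulate a stuck row
print(classify_fault(bad, golden))  # row-level fault
```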

Also implied are any problems in the internal operation of the device, either analogue or digital, that could manifest themselves as pixel, row, column or frame errors. Finally, errors in the physical transmission of data from the sensor to the rest of the system present another potential cause of failure. Due to the dynamic nature of video, faults can be static, that is permanent or fixed, or dynamic, varying both spatially and temporally.

Taking this conservative definition of failure in an image sensor, the challenge to the designer is the detection of the presence of a failure. Additionally, the system may take measures to protect against the occurrence of a system failure or to take corrective action in the presence of an individual sensor error.

Faults that affect individual pixels may appear to have minimal impact on an adas. However, given that many of the most advanced object detection algorithms can detect objects smaller than ten by ten pixels in the image, individual pixel errors, and certainly clusters of errors, might affect an object identification algorithm. Also, failures that contribute to pixel errors are likely to affect some proportion of pixels across the array.

Since a pixel output is converted to a digital value representing the intensity of light at a given position, a failure can be considered to be any error or corruption that causes any incorrect value, whether static or dynamic. Factors including power delivery, device defects, excessive noise or even ambient radiation could cause errors.

Due to the array nature of image sensors, logic associated with the row-column structure of the array may also contribute to device faults. Missing or duplicated rows and/or columns can result in loss of information or incorrect representation of the scene. Obvious errors, such as a repeating frame in a rear-view system, could lead to catastrophic consequences under autonomous, semi-autonomous and human driving alike.

Even if all elements of the image frame, pixel, row-column and frame data are error free, transmission errors can cause corruption of the data before they reach the intended receiver device. These transmission errors can be caused by any number of natural phenomena that are undetectable by the system.

These failures are general categories, each comprising hundreds of individual failure modes. In fact, there are literally thousands of individual failures in an image sensor that could lead to incorrect data being received by downstream devices. Decisions based on the incorrect data could lead to a safety risk. Ultimately, the system must be able to identify and detect the occurrence of these failures to take risk mitigating actions.


Fault detection

Detection of faults in an image sensor is a non-trivial exercise. The nature and complexity of the image sensor result in a staggering number of failure modes that could occur. The mix of analogue and digital circuitry further aggravates the problem.

The pixel structure and associated charge transfer and readout circuits are analogue in nature. Faults associated with analogue circuits have different behaviour than those in digital circuits.

During operation, a pixel may suffer from a fault similar to a digital stuck-at fault, which occurs when a logic node becomes stuck at a high or low value. Detecting a fault such as a stuck pixel may appear to be trivial on a host processor. But as sensor resolutions increase to 8Mpixels and above, checking every pixel for any of several fault conditions on every frame for a given window of time can begin to consume a significant number of processor cycles and memory.
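To see where the cycles go, consider a naive host-side stuck-pixel check, sketched below; the frame geometry, frame rate and observation window are assumptions for illustration. Even this single test implies one compare-and-update operation per pixel per frame:

```python
import numpy as np

# Naive host-side stuck-pixel check: a pixel whose value never changes
# across a window of frames in a live scene is suspect.

def update_stuck_mask(prev, cur, unchanged, window=30):
    """Count consecutive unchanged frames per pixel; flag full-window pixels."""
    unchanged = np.where(cur == prev, unchanged + 1, 0)
    return unchanged, unchanged >= window

prev = np.arange(16, dtype=np.uint16).reshape(4, 4)
cur = (prev + 1).astype(np.uint16)         # normal scene: every pixel changes
cur[2, 2] = prev[2, 2]                     # one pixel fails to update
counts, stuck = update_stuck_mask(prev, cur, np.zeros((4, 4), int), window=1)
print(int(stuck.sum()))                    # 1 pixel flagged

# The cost at automotive scale: every pixel checked on every frame.
h, w, fps = 3840, 2160, 30                 # an ~8Mpixel sensor at 30 frames/s
print(f"{h * w * fps / 1e9:.2f} billion pixel checks per second")
```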

Detecting some types of pixel faults, for example noise outside the specified limits, may not even be achievable at the system level. Detecting faults in the analogue-to-digital conversion stage, faults that include missing codes, noise and non-linearities, may also be prohibitive or impossible to perform on a host processor or at the system level.

In addition to analogue faults at the pixel level, the system must contend with digital faults at the pixel level as well. If pixel data are affected by digital errors that cause bits to be shifted, higher level processing may be unable to even detect these errors.

Similarly, while some types of colour space errors may be easily visible to the human eye, computing devices may be unable to detect such faults. Systemic faults in the image processing and transmission pipeline can cause widely ranging error behaviour that may or may not be detectable by the system.

Spatial errors such as row or column addressing errors that result in repeated rows could be detected at the system level but at a cost of CPU cycles and memory. The system has no guarantee that the sensor is even sending the rows and columns in the correct order and virtually no way to verify it.
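A hypothetical system-level check for repeated rows might hash each row and compare neighbours, as in the sketch below. It costs a full pass over every frame, and a match is ambiguous: a genuinely uniform scene produces the same signature as a row-addressing fault:

```python
import hashlib
import numpy as np

# Hypothetical system-level check for repeated rows: hash each row and
# flag consecutive duplicates.

def repeated_rows(frame: np.ndarray) -> list[int]:
    digests = [hashlib.blake2b(row.tobytes(), digest_size=8).digest()
               for row in frame]
    return [i for i in range(1, len(digests)) if digests[i] == digests[i - 1]]

frame = np.arange(8 * 16, dtype=np.uint16).reshape(8, 16)   # distinct rows
frame[5] = frame[4]                        # simulate a row-addressing fault
print(repeated_rows(frame))                # [5]
```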

There may be generic methods to determine that consecutive images are similar to prior images, but these may only indicate gross failures in the sensor. More subtle failures are still beyond the scope of the system to detect. Even in cases where detection at the system level is possible, accounting for the vast number of failure modes that are possible and performing the analysis required to detect them would be prohibitive in terms of compute power as well as being incomplete in coverage.

The last three failure modes to consider are probably more commonly encountered in other digital circuits. The first is ensuring that the data transmitted by the sensor have not been corrupted prior to being received as the data may have to traverse long and noisy transmission media. The second is ensuring that the memories and registers within the sensor are functional and that faults can be detected and/or corrected. The third is a failure in the internal logic or state machines of the sensor.

The first may be solved by using transmitters and receivers with built in error checking and/or error correcting coding. At the very least this adds cost to the system. The second can be solved by the system periodically checking the register and memory contents of the sensor, but this consumes resources. The third could cause issues ranging from catastrophic corruption of image data to more insidious changes that gradually corrupt frame data over the course of many frames. The former type could be easily identified while the latter may be completely invisible to any system-level checking.
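The second measure might be sketched as a periodic readback of the sensor's configuration registers against a shadow copy held by the host; the register addresses and values below are invented for illustration:

```python
# Hypothetical periodic readback of sensor configuration registers against
# a shadow copy held by the host; a mismatch indicates register corruption.
# Register addresses and values are invented for illustration.

SHADOW = {0x3000: 0x0A5C, 0x3002: 0x0780, 0x3004: 0x0438}   # addr -> expected

def corrupted_registers(read_register) -> list[int]:
    """Return the addresses whose current value differs from the shadow copy."""
    return [addr for addr, expected in SHADOW.items()
            if read_register(addr) != expected]

device = dict(SHADOW)                      # simulated sensor register map
device[0x3002] ^= 0x0100                   # a single corrupted bit
print([hex(a) for a in corrupted_registers(device.get)])    # ['0x3002']
```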

Another factor to consider is the delay between the occurrence of the fault and its detection. Commonly referred to as fault detection time interval (FDTI), the detection delay has a significant impact on the overall time between the occurrence of the fault and the transition of the system into a safe state before a hazardous event occurs, or fault tolerant time interval (FTTI), as shown in the diagram.

In the case where the system is required to perform some or all of the fault detection, the overall FDTI includes the time for the sensor to transmit the data to the next stage, as well as the time required for the system to receive, analyse and detect the presence of a fault.
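A worked example shows how system-side detection eats into the FTTI. Every number below is an illustrative assumption rather than a figure from any datasheet or standard:

```python
# Illustrative FTTI budget; every number here is an assumption. The fault
# must be detected (FDTI) and the safe state reached before the fault
# tolerant time interval (FTTI) expires.

ftti_ms = 100.0                    # assumed fault tolerant time interval
frame_time_ms = 33.3               # one frame period at 30 frames/s

# System-based detection: the faulty frame must be transmitted, received
# and analysed before the fault becomes visible to the host.
system_fdti_ms = frame_time_ms + 5.0 + 20.0    # + link latency + analysis
# Sensor-based detection: a status flag can be raised within the same frame.
sensor_fdti_ms = frame_time_ms

for name, fdti in (("system", system_fdti_ms), ("sensor", sensor_fdti_ms)):
    print(f"{name}-based FDTI of {fdti:.1f}ms leaves "
          f"{ftti_ms - fdti:.1f}ms of the FTTI for the safe-state reaction")
```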


Sensor-based functional safety

Sensors today offer test capabilities integrated into the device. Some image sensors provide the ability to transmit a defined test frame. Performing a CRC check on the data could indicate a possible fault in transmission. This is a good first step towards fault detection, but often the test frame does not exercise any significant portion of the actual image capture pipeline, especially the analogue portions.
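The receiver side of such a check is a few lines of code: recompute a checksum over the known test pattern and compare it with the expected value. The pattern below is invented for illustration:

```python
import zlib

# Hypothetical test-frame check: the sensor transmits a fixed pattern whose
# checksum is known in advance; a mismatch at the receiver points to a fault
# in the transmission path, not in the pixel array or analogue front end.

TEST_PATTERN = bytes(range(256)) * 64                # invented golden pattern
EXPECTED_CRC = zlib.crc32(TEST_PATTERN)

def test_frame_ok(received: bytes) -> bool:
    return zlib.crc32(received) == EXPECTED_CRC

corrupted = bytearray(TEST_PATTERN)
corrupted[100] ^= 0x01                               # single bit error in transit
print(test_frame_ok(TEST_PATTERN))                   # True
print(test_frame_ok(bytes(corrupted)))               # False
```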

This type of check typically only indicates faults in the transmission data path and not in the sensor itself. Additionally, the faults caught by this method tend to be static failures.

Finally, the generation of a test frame also takes the sensor, and therefore the entire system, offline for a finite period. All these drawbacks point to the need for a real-time method of detecting possible faults at the pixel level of the image sensor.

When considering image sensors for an adas or autonomous vehicle system, analogue fault coverage should be a serious consideration. More advanced sensors offer significant functional safety mechanisms that provide diagnostics of the analogue portion of the sensor, which in most modern sensors occupies more than half of the total circuit area. A high level of analogue diagnostic coverage is essential to robust image sensor functional safety.

A simple metric to differentiate sensors can be the number of analogue safety mechanisms supported by the sensor. While certain analogue safety mechanisms may require some additional computation, a key factor will be the amount of additional processing required to detect the fault. More advanced safety mechanisms will require less computation, often limited to bounds checking, while less sophisticated mechanisms will require more elaborate, compute intensive processing.
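On the host side, bounds checking of this kind reduces to comparing a handful of diagnostic readings against documented limits, as in this sketch. The measurement names and limits are hypothetical; a real sensor specifies them in its safety documentation:

```python
# Hypothetical bounds check on diagnostics reported by on-sensor analogue
# safety mechanisms. Names and limits are invented for illustration.

LIMITS = {
    "pixel_supply_v": (2.7, 3.0),      # analogue pixel supply rail
    "adc_ref_v":      (1.15, 1.25),    # ADC reference voltage
    "temp_c":         (-40.0, 105.0),  # junction temperature
}

def out_of_bounds(readings: dict[str, float]) -> list[str]:
    """Return the diagnostics that fall outside their documented limits."""
    return [name for name, value in readings.items()
            if not (LIMITS[name][0] <= value <= LIMITS[name][1])]

print(out_of_bounds({"pixel_supply_v": 2.81, "adc_ref_v": 1.19, "temp_c": 48.0}))
print(out_of_bounds({"pixel_supply_v": 2.45, "adc_ref_v": 1.19, "temp_c": 48.0}))
```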

Another step towards safety is the inclusion of a frame counter within the sensor. This allows the system to detect when capture has failed for some reason. Counting pixels and lines can provide even better fault coverage by detecting that the sensor is transmitting the correct number of rows and columns per frame. This may capture dynamic failures, but the detection of missing columns or rows indicates that the fault is fairly severe and renders the frame unusable.
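At the receiver, these counters reduce to a few integer comparisons per frame, as in the hypothetical sketch below (the counter width and frame geometry are assumptions):

```python
# Hypothetical receiver-side use of an embedded frame counter plus row and
# column counts. Counter width and frame geometry are assumptions.

EXPECTED_ROWS, EXPECTED_COLS = 1080, 1920

def validate_frame(counter, last_counter, rows, cols):
    faults = []
    if counter != (last_counter + 1) % 256:          # 8-bit rolling counter
        faults.append("frame counter discontinuity: capture failure?")
    if rows != EXPECTED_ROWS or cols != EXPECTED_COLS:
        faults.append("row/column count mismatch: frame unusable")
    return faults

print(validate_frame(counter=7, last_counter=6, rows=1080, cols=1920))  # []
print(validate_frame(counter=9, last_counter=6, rows=1079, cols=1920))
```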

These failure modes produce errors that vary in nature from randomly distributed errors to repetitive or fixed errors that require varying levels of computing power and memory to detect. Individually, detection of a given error could be efficiently performed by any adas processor.

While detecting any given type of fault may be possible with some backend processing, detecting every possible type of fault in every frame becomes a monumental task even for the highest performance processors. Having on-sensor functional safety mechanisms that perform the bulk of the fault detection could reduce this computing demand to the simple checking of status or health indicator bits or registers, which consumes virtually no system resources.

In addition to reducing computational demands on the system, sensor-based diagnostic coverage can also significantly reduce the FDTI.

The most advanced functional safety devices today take into consideration the wide range of failure modes that could occur in an image sensor and offer three key advantages: first, the ability to achieve the lowest latency from failure to notification; secondly, the ability to provide safety notification in real time, without affecting the operation, quality or performance of the sensor; and, finally, to offer the highest fault coverage at the lowest computation and cost.


Fault coverage

Many image sensor vendors make bold claims of high fault coverage including Asil-B and Asil-C support, but how can tier-one manufacturers and OEMs verify these claims? Another important factor is the ability to develop a system with higher fault coverage incorporating a sensor with a lower Asil, that is Asil decomposition.

Typically, diagnostic coverage is based on guidelines given in ISO 26262. However, many sensor manufacturers quote these numbers solely based on the type of test implemented, with little or no consideration of the details of the implementation or any of the other variations in ISO 26262. This usually results in artificially high diagnostic coverage estimates and, to the benefit of the sensor vendor, an equally artificially high Asil rating. This raises the question of how to determine the diagnostic coverage of safety mechanisms accurately.

The best way to determine diagnostic coverage is through actual fault injection to determine if a given fault is detected by a safety mechanism. However, with the number of gates in a typical image sensor being in excess of 1.5 million, exhaustive fault injection is practically infeasible. In addition, automotive image sensors today can contain over eight million pixels in addition to other analogue circuitry.

To address this, statistical methods can be employed that enable the calculation of diagnostic coverage within a given margin of error. Statistical fault injection can be effectively used to achieve margins of error of less than five per cent. This gives the ability to calculate diagnostic coverage to within a few per cent.
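The required number of injected faults follows from the standard normal-approximation formula for estimating a proportion, n = z²·p(1−p)/ε². As the sketch below shows, a five per cent margin of error at 95 per cent confidence needs only a few hundred randomly selected faults, regardless of gate count:

```python
import math

# Sample size for statistical fault injection, using the standard
# normal-approximation bound for a proportion. The worst case p = 0.5 is
# assumed when the true coverage is unknown.

def faults_to_inject(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[confidence]
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))

print(faults_to_inject(0.05))   # 385 injected faults for a 5% margin of error
print(faults_to_inject(0.01))   # 9604 for a 1% margin
```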


Conclusion

When considering the overall safety of an autonomous vehicle, understanding the diagnostic coverage of an image sensor to a high level of accuracy is vital. An image sensor whose diagnostic coverage is estimated only from recommendations and guidelines creates a high degree of uncertainty when performing the safety analysis of the overall system.

Conversely, having an image sensor whose diagnostic coverage is known to be accurate to within a few per cent gives high confidence in the overall safety of the autonomous system. Documents such as the FMEDA (failure modes, effects and diagnostics analysis) can give a clear picture of how the safety mechanisms are tested and how diagnostic coverage is calculated.

Giri Venkat is responsible for image sensor technical marketing at On Semiconductor

www.onsemi.com
