Test principle and key point of SSD Power-off Protection

ERIC

Because an SSD relies on the FTL (flash translation layer) to translate between logical and physical addresses, an abnormal power-off during normal operation (read, program, or erase) can cause the loss of mapping-table updates that have not yet completed, leaving the SSD undetectable by the host.


Meanwhile, to improve read/write performance, SDRAM is commonly used as a buffer on the SSD board. An abnormal power-off can also cause the loss of data or mapping-table updates that have not yet been written from SDRAM into NAND flash.



SSD failures caused by abnormal power-off


1. The SSD cannot be detected by the OS after a sudden power-off; it becomes usable again only after the mapping table is rebuilt or after replanting.

2. A large number of new bad blocks are generated after repeated abnormal power-offs.

3. The mechanism behind the new bad blocks is as follows: when the SSD cannot successfully read, program, or erase a block, that block is marked as bad. However, these are not real bad blocks, but a misjudgment by the controller caused by the abnormal power-off. In addition, the data held in SDRAM is lost.



Typical methods of power-failure protection


1. Protect the integrity of the data in SDRAM


This method must ensure that all the data in SDRAM is successfully written into NAND flash. In general, the SDRAM capacity is set to about one-thousandth of the total SSD capacity. For a small-capacity SSD, the amount of data held in SDRAM is therefore limited, and it can be written out after power-off using power supplied by a super-capacitor or tantalum capacitors. For a high-capacity SSD such as 8TB, however, the amount of data to be written from SDRAM into NAND is huge. In that case, suppliers that still rely on super-capacitors or tantalum capacitors have to solve the following problems:


(1) More tantalum capacitors are required. In practice, this is difficult to achieve because of limits on thickness, standard form-factor size, and board space.

(2) Even if there are enough capacitors to supply power, another problem arises: the SSD cannot boot normally when a “reboot” is executed. It has to stay powered off for a while and then be booted again, because the SSD can be detected only after the tantalum capacitors have fully discharged.

(3) Tantalum capacitors and super-capacitors age over years of use. When the power they can supply falls below the initial default value, there is again a risk of data loss or of the SSD failing to be detected after an abnormal power-off.
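
The scaling problem described above can be made concrete with a rough energy budget. The sketch below assumes the worst case, in which the whole SDRAM buffer is unsaved, and uses the one-thousandth sizing rule from the text. The write rate, drive power, and capacitor voltage window are illustrative assumptions, not figures from any particular drive, and the function names are hypothetical.

```python
# Back-of-the-envelope hold-up budget for flushing SDRAM to NAND after power loss.
# Every electrical and bandwidth figure used here is an illustrative assumption.

def sdram_bytes(ssd_capacity_bytes: int) -> int:
    """SDRAM sized at about one-thousandth of the SSD capacity (rule from the text)."""
    return ssd_capacity_bytes // 1000


def flush_time_s(dirty_bytes: int, nand_write_bytes_per_s: float) -> float:
    """Time needed to write the buffered data into NAND at a sustained rate."""
    return dirty_bytes / nand_write_bytes_per_s


def usable_energy_j(capacitance_f: float, v_start: float, v_min: float) -> float:
    """Energy a capacitor bank releases while its voltage falls from v_start to v_min."""
    return 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)


def required_capacitance_f(power_w: float, time_s: float,
                           v_start: float, v_min: float) -> float:
    """Capacitance whose usable energy equals power_w * time_s."""
    return 2.0 * power_w * time_s / (v_start ** 2 - v_min ** 2)


if __name__ == "__main__":
    WRITE_RATE = 1e9            # assumed sustained NAND write rate, bytes/s
    POWER_W = 5.0               # assumed drive power while flushing, watts
    V_START, V_MIN = 5.0, 3.0   # assumed usable capacitor voltage window, volts
    for capacity in (256 * 10**9, 8 * 10**12):
        dirty = sdram_bytes(capacity)  # worst case: the whole buffer is unsaved
        t = flush_time_s(dirty, WRITE_RATE)
        c = required_capacitance_f(POWER_W, t, V_START, V_MIN)
        print(f"{capacity / 1e12:g} TB: flush {t:.3f} s, about {c:.2f} F")
```

Because the buffer size, the flush time, and the required capacitance all scale linearly with capacity in this model, an 8TB drive needs about 31 times the capacitance of a 256GB drive under the same assumptions.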


2. Save only the data in SDRAM, not the mapping table


This method reduces the amount of SDRAM or the number of tantalum capacitors needed. “Not saving the mapping table” does not mean that the mapping table is lost; it only means that the most recent mapping-table updates are not saved.

At the next power-on, the SSD finds the data that was programmed after the last saved mapping table and rebuilds the mapping table from it. The disadvantage of this method is that, without a well-designed mechanism, rebuilding the mapping table takes a long time before the SSD returns to regular use.
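
The rebuild step can be sketched as follows. This is a minimal illustration, not a description of any specific controller: it assumes that every programmed page carries its logical address and a monotonically increasing sequence number as metadata, and that the saved mapping table records the sequence number it was valid up to. `PageMeta` and `rebuild_mapping` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class PageMeta:
    """Hypothetical per-page metadata stored alongside the user data in NAND."""
    physical_addr: int
    logical_addr: int
    sequence: int  # assumed to increase with every page program


def rebuild_mapping(saved_table: Dict[int, int],
                    saved_sequence: int,
                    pages: Iterable[PageMeta]) -> Dict[int, int]:
    """Replay pages programmed after the last saved table, oldest first."""
    table = dict(saved_table)  # start from the last saved logical-to-physical map
    newer = sorted((p for p in pages if p.sequence > saved_sequence),
                   key=lambda p: p.sequence)
    for page in newer:
        # A later program of the same logical address supersedes earlier ones.
        table[page.logical_addr] = page.physical_addr
    return table


if __name__ == "__main__":
    saved = {0: 100, 1: 101}          # table as of sequence number 10
    scanned = [
        PageMeta(100, 0, 9),
        PageMeta(101, 1, 10),
        PageMeta(102, 0, 11),         # rewrite of logical address 0
        PageMeta(103, 2, 12),         # new logical address 2
        PageMeta(104, 0, 13),         # second rewrite of logical address 0
    ]
    print(rebuild_mapping(saved, 10, scanned))  # {0: 104, 1: 101, 2: 103}
```

In this model the rebuild time grows with the number of pages programmed since the last saved table, which is one reason the saving mechanism has to be designed carefully.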


For an SSD designed without SDRAM, all data is written directly into NAND flash. In that case, any data that has not yet been written into NAND flash at the moment of an abnormal power-off is reported to the host as a failed program operation, so no extra data needs to be saved. Therefore, for applications that need high reliability, the best design is one without SDRAM. The only disadvantage is lower read/write performance. In practice, however, most applications do not pursue the best performance, but sufficient performance with high reliability.
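
The difference between a buffered design and a design without SDRAM can be illustrated with a toy model. The classes below are hypothetical and deliberately simplified: the buffered drive acknowledges a write as soon as it reaches SDRAM, while the direct drive acknowledges it only once it is in NAND and reports a failure otherwise.

```python
from typing import Dict, Iterable, List


class BufferedDrive:
    """Toy model of an SSD with a volatile SDRAM write buffer."""

    def __init__(self) -> None:
        self.nand: Dict[int, bytes] = {}
        self.sdram: Dict[int, bytes] = {}

    def write(self, lba: int, data: bytes) -> bool:
        self.sdram[lba] = data      # acknowledged before the data reaches NAND
        return True

    def flush(self) -> None:
        self.nand.update(self.sdram)
        self.sdram.clear()

    def power_loss(self) -> None:
        self.sdram.clear()          # volatile contents disappear


class DirectDrive:
    """Toy model of an SSD without SDRAM: every write goes straight to NAND."""

    def __init__(self) -> None:
        self.nand: Dict[int, bytes] = {}
        self.powered = True

    def write(self, lba: int, data: bytes) -> bool:
        if not self.powered:
            return False            # reported to the host as a failed program
        self.nand[lba] = data
        return True

    def power_loss(self) -> None:
        self.powered = False


def lost_acknowledged(acked: Iterable[int], nand: Dict[int, bytes]) -> List[int]:
    """Logical addresses the host believes were written but NAND does not hold."""
    return sorted(lba for lba in acked if lba not in nand)


if __name__ == "__main__":
    buffered = BufferedDrive()
    acked = [lba for lba in (0, 1) if buffered.write(lba, b"data")]
    buffered.flush()
    if buffered.write(2, b"data"):
        acked.append(2)
    buffered.power_loss()
    print("buffered, lost:", lost_acknowledged(acked, buffered.nand))

    direct = DirectDrive()
    acked = [lba for lba in (0, 1, 2) if direct.write(lba, b"data")]
    direct.power_loss()
    print("direct, lost:", lost_acknowledged(acked, direct.nand))
```

In the buffered model, the write that was acknowledged but not yet flushed is silently lost; in the direct model, the host never holds an acknowledgement for data that is not in NAND.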




Test method and theory



In practical testing, the SSD should be tested in two configurations: as the master disk and as a slave disk. When it is tested as the master disk, power is cycled for the whole PC; when it is tested as a slave disk, power is cycled only for the SSD.


(1) Run 3000 cycles of abnormal power-off testing separately with the SSD blank and with it filled to 25%, 50%, 85%, and 100% with data. The interval between power-on and power-off is 3s. The reason for testing at different fill levels is that the disk runs GC (garbage collection) once it holds a certain amount of data. GC means data movement, and each data move involves an update of the mapping table. An abnormal power-off during this period can cause the SSD to fail.
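
The cycling procedure in (1) can be sketched as a small harness. The three callbacks `power_on`, `power_off`, and `is_detected` are hypothetical hooks for whatever power-control rig and host check are available; they are not part of any real library. The sketch also assumes one reading of the 3s interval, namely 3s powered on followed by 3s powered off.

```python
import time
from typing import Callable, List

FILL_LEVELS_PERCENT = [0, 25, 50, 85, 100]  # blank, 25%, 50%, 85%, full
CYCLES_PER_LEVEL = 3000
INTERVAL_S = 3.0


def run_power_cycles(power_on: Callable[[], None],
                     power_off: Callable[[], None],
                     is_detected: Callable[[], bool],
                     cycles: int = CYCLES_PER_LEVEL,
                     interval_s: float = INTERVAL_S,
                     sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Cycle power and return the cycle numbers on which the SSD was not detected."""
    failures: List[int] = []
    for cycle in range(1, cycles + 1):
        power_on()
        sleep(interval_s)           # powered-on window before the abrupt cut
        if not is_detected():
            failures.append(cycle)
        power_off()                 # abnormal power-off: no shutdown command first
        sleep(interval_s)
    return failures


if __name__ == "__main__":
    # Stand-in rig for a dry run: detection fails on the third power-on.
    state = {"power_ons": 0}

    def fake_power_on() -> None:
        state["power_ons"] += 1

    def fake_power_off() -> None:
        pass

    def fake_is_detected() -> bool:
        return state["power_ons"] != 3

    failed = run_power_cycles(fake_power_on, fake_power_off, fake_is_detected,
                              cycles=5, sleep=lambda seconds: None)
    print(failed)  # [3]
```

A full run would repeat `run_power_cycles` once per entry in `FILL_LEVELS_PERCENT`, after filling the drive to that level.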

(2) Abnormal power-off test during normal programming (writing data to the SSD). When data is written through the file system in Windows, the following 8 operations are executed:


[Figure: flow of the 8 operations executed when data is written through the file system in Windows]

From the above flow, we can see that the process of writing data is also the process of updating the mapping table. An abnormal power-off during this period can therefore prevent the mapping-table update from completing.

(3) Abnormal power-off during data erase. Erasing data in Windows also executes the above 8 operations, so the theory is the same as for creating a file: the mapping table needs to be updated.

(4) Run 3000 cycles of abnormal power-off while the SSD is performing read operations. Set the interval between power-on and power-off to 3s.

(5) Run 3000 cycles of abnormal power-off during normal startup and shutdown.

(6) Run 3000 cycles of abnormal power-off during normal OS boot. For industrial-grade or military-grade SSDs, the above tests should be run in high- and low-temperature environments.

