I. Introduction
Designers of electronic systems for space applications must take into account a number of critical constraints imposed by the harsh environment in which these systems operate. In space, electronic components are stressed by physical phenomena such as mechanical stress, ionizing radiation, and extreme thermal conditions. The typical approach to these issues has been the development of space-qualified electronic devices based on special and expensive technology processes. Such components have two main drawbacks: high cost (due to the special technology and the low number of parts produced) and lower performance than commercial off-the-shelf (COTS) components (due to limiting factors in radiation-hardened technology) [3]. The use of COTS components, on the other hand, calls for a design based on suitable system-level methodologies in order to meet the severe reliability requirements of space applications.
A typical case in which this approach is exploited is the design of space-borne mass memories. The rapid growth in the capacity of semiconductor memory devices permits the development of solid-state mass memories (SSMMs), which are competitive with tape recorders thanks to their higher reliability, comparable density, and better performance. SSMMs have no moving parts, and their operational flexibility makes them suitable for many applications. Moreover, the requirements of low latency, high throughput, and high storage capability cannot be satisfied by space-qualified components, so the choice of COTS components is mandatory. The fault-tolerant solid-state mass memory (FTSSMM) presented here is based on COTS components. In [4], different coding schemes based on fixed Reed-Solomon (RS) codes and different hardware configurations for protecting data stored in COTS-based memories are compared.
However, in our architecture the high reconfigurability of RS codes is exploited to obtain a trade-off between reliability, data integrity, and overhead. This reconfigurability is also used to achieve graceful degradation of the system, and discriminating between permanent and transient faults reduces the use of spare elements. Moreover, in our architecture a number of SpaceWire data links [5] access the memory banks through a crossbar switch matrix [6]. This solution has many advantages over a bus-based architecture in terms of bandwidth, latency, and reconfiguration capability. In particular, the failure of a connection does not compromise the connectivity of the entire network but only the access to a specific node. To improve both fault tolerance and memory usage, a distributed file system has been implemented. Most of the functions performed by the file system are hardware based and handled locally on each memory module.
This paper is organized as follows. Section II illustrates the design methodology, and Section III describes the FTSSMM architecture in detail. The evaluation of reliability, data integrity, and graceful degradation and the simulation methodology are reported in Sections IV and V, respectively. Section VI describes the prototype setup and the fast prototyping methodology used. Conclusions are drawn in Section VII.