Position Paper for the Workshop on Strategic Directions in Computing Research
Working Group "Parallel and Distributed Computation"
Subgroup "Dependable Computing"

L. Simoncini *, A. Bondavalli **
* University of Pisa, Italy
** CNUCE-CNR, Italy

Dependability is defined as the trustworthiness of a computer system such that reliance can justifiably be placed on the service it delivers. The focus is on the service provided by the computer system, understood as the behaviour of the system as perceived by its users, and on the justification for relying on that service. With respect to dependability it is possible to identify:

1) impairments to dependability, further divided into failures (the delivered service does not comply with the specification of the expected service), errors (the part of the system state liable to lead to a subsequent failure) and faults (the adjudged or hypothesised cause of an error);
2) means for dependability, further divided into means for procuring dependability, namely fault prevention (how to prevent fault occurrence or introduction) and fault tolerance (how to provide a service complying with the specification in spite of faults), and means for dependability validation, namely fault removal (how to reduce the presence of faults) and fault forecasting (how to estimate the present number, the future incidence, and the consequences of faults);
3) attributes of dependability, which enable the properties expected from the system to be expressed, and allow the system quality resulting from the impairments and the means opposing them to be assessed.
These attributes, depending on the type of service the system is intended to deliver, are: availability, with respect to readiness for usage; reliability, with respect to continuity of service; safety, with respect to the avoidance of catastrophic consequences on the environment the system is intended to serve; and security, with respect to the prevention of unauthorised access to and/or handling of information [1].

For each of the three main groups indicated above, it is important to assess the state of the art and the open problems faced today in designing systems that provide dependable services whose dependability can be justifiably assessed. The main difficulties are the pervasiveness of computing systems, their increasing complexity and the provision of services in critical applications.

For the impairments to dependability: the process by which hardware faults are generated, and their modelling and evaluation, are better understood, although in some cases it may be difficult to distinguish between hardware operational faults and design faults. Very often the design of a system takes into account slight deviations from the expected behaviour of the components, and some tolerance is included; it is then difficult to judge whether failures caused by special operating conditions and/or deviations at the limit of the allowed tolerances are due to hardware or design faults. Besides this type of difficulty, design (i.e. software) faults still present open problems. Among the most severe are those concerning the various forms of correlation, which may affect a combination of components both at the same time and in sequential executions. Investigation is needed here to understand the process by which faults are created as well as their consequences for systems, such as propagation or the exploitation of other, otherwise dormant, faults.
In a system there are generally more software than hardware components, and the software organisation is by far more complex, dynamic and flexible than the hardware one. This makes integration faults another considerable and insufficiently understood source of dependability impairment.

For the means for procuring dependability: the old, recurrent dispute on the effectiveness of fault tolerance vs. fault prevention is still unresolved. There is in fact no evidence of a general superiority of either method over the other, and no case studies are available on strategies for assigning budget to these activities that can be expected to optimise the final result. As far as fault tolerance in general is concerned, the open problems are: the coverage of faults by the various fault tolerance techniques, mastering the added complexity introduced by redundancy, and the design of adjudication mechanisms. Software fault tolerance raises additional problems, such as identifying a proper set of diversity-enforcing rules and integrating measures for ensuring diversity into the design process.

For the means of dependability validation: the use of formal methods is starting to be requested by regulatory agencies for the certification of systems for critical applications, yet the use of these methods in industrial environments is far from being widely accepted by design engineers. Different types of software testing are still under evaluation regarding their effectiveness and cost of development. Finally, the problem of validating the validation tools and strategies themselves further complicates an overall dependability assessment of systems.

Dependability forecasting offers many open problems, the most difficult of which is the evaluation of 'ultra dependability'. The requirements for the most critical applications are so strict that no methodologies are known that provide confidence in such high dependability figures.
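To give a feeling for why such requirements are out of reach of direct testing, the following minimal sketch computes how many failure-free operating hours would have to be observed before a given failure-rate bound could be claimed with a given statistical confidence. It assumes a constant failure rate (exponentially distributed time to failure) and no other source of evidence. The function name failure_free_hours_needed, the 99% confidence level and the target rates, including 10^-9 failures per hour, are illustrative assumptions and are not figures taken from this paper.

```python
import math

HOURS_PER_YEAR = 24 * 365.25  # average calendar year, in hours


def failure_free_hours_needed(target_rate_per_hour: float, confidence: float) -> float:
    """Failure-free hours needed to claim rate <= target at the given confidence.

    Assumption (not from the paper): constant failure rate, so the probability
    of observing no failure in t hours at rate r is exp(-r * t). If the true
    rate exceeded the target, a failure-free run of length t would occur with
    probability below exp(-target * t); requiring that to be at most
    (1 - confidence) gives t >= -ln(1 - confidence) / target.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if target_rate_per_hour <= 0.0:
        raise ValueError("target rate must be positive")
    return -math.log(1.0 - confidence) / target_rate_per_hour


# Hypothetical targets, chosen only to show how the test time scales.
for rate in (1e-4, 1e-6, 1e-9):
    hours = failure_free_hours_needed(rate, confidence=0.99)
    print(f"rate <= {rate:g}/h: {hours:.3g} hours (~{hours / HOURS_PER_YEAR:.3g} years)")
```

Under these assumptions, a bound of 10^-9 failures per hour at 99% confidence would need about 4.6 x 10^9 failure-free hours, that is, more than five hundred thousand years of operation of a single unit. The required time grows in inverse proportion to the target rate, which is why evidence from testing alone cannot support claims of this order.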
Several disparate sources of evidence are sought and ways to combine them must be devised. This problem is in any case only the most apparent one; many others remain to be solved: analysis methods for coping with large numbers of states, reduction and abstraction methods for making the models manageable, non-standard types of distributions, links to experimental forecasting methodologies such as fault injection and testing, confidence in the models built, and confidence that the necessary assumptions do not affect the results obtained beyond the desired precision.

Finally, the production process of a dependable system, from requirements to specification, to design decisions, to prototype implementation and up to implementation of the first release, requires continuous interaction between the decisions taken in each step and the validation and verification of that step, taking into account the problems that may arise in design integration and consolidation. This implies preliminary management decisions on how many resources and funds are to be allocated to the different efforts so as to produce a balanced design process for a dependable system: what percentage is to be allocated to techniques for procuring dependability and what to dependability validation, what percentage of validation must be based on formal or on experimental techniques, and finally how to validate a product-in-a-process design for a system.

[1] J.C. Laprie (ed.), "Dependability: Basic Concepts and Terminology", Dependable Computing and Fault-Tolerant Systems, Vol. 5, Springer-Verlag, 1992.