Title: High-Performance Parallel and Distributed Computing for the Masses
Author: Satoshi Matsuoka, The Univ. of Tokyo

To focus the discussion, I will concentrate on high-performance parallel and distributed computing (mostly, though not necessarily, numerical applications), and will not touch upon areas that are not performance-critical, including typical Internet services such as e-mail and the World-Wide Web. I will also introduce some of our ongoing projects.

The ongoing evolution of computer architecture has brought about a qualitative change in high-performance parallel and distributed computing. Except for very special vector pipeline processors, one can (almost) purchase some of the highest-performance microprocessors off the shelf at the local neighborhood computer store. Although fast processor interconnects have not been as widely available, near-commodity interconnects such as Fibre Channel and Myrinet have finally brought gigabit interconnect performance to workstations and PCs at reasonable prices. The promise of wide-area gigabit networking is bringing the dream of 'World-Wide Networked High-Performance Computing' closer and closer to reality. (Indeed, `distributed' high-performance computing was almost a misnomer in the past because of the lack of communication performance, but this is changing.)

There are skeptics who claim that, with current technology, high performance cannot be reached through high processor performance alone. As CPU processing power increases, the peripheral circuitry that supports the corresponding massive data bandwidth needs to keep pace.
Indeed, Hitachi's SR2201, the fastest MPP to date, with 1000 modified PA-RISC processors (to be upgraded to 2000 in autumn 1996) and over 220 LINPACK Gigaflops, sustains its computational bandwidth with cache-bypassing prefetch/poststore into its 128 FP registers, supported by a 16-way interleaved main memory system and a hyper-crossbar interprocessor interconnect with a point-to-point bandwidth of over 300 MBytes/sec. This allows a full QCD calculation with a grid size of $32^4$ to be completed in 244 days. However, given the growing demand for higher performance, especially demand driven by multimedia applications, including those delivered over the Internet, one could claim otherwise: memory and I/O systems that sustain ultra-high data bandwidth will also become commodity technologies.

As high-performance parallel and distributed computing becomes less confined to large-scale vector processors and MPPs and more available at commodity levels, the end-users of the technology will likewise become ordinary, commodity-level users. For example, a research group consisting of ourselves, several other universities, and the Electrotechnical Laboratory (ETL) in Tsukuba, Japan, is currently engaged in the "Ninf project", which makes high-performance computing servers, such as the Cray C90/J90, the Fujitsu AP1000+ MPP, and the SR2201, available over the Internet and easily accessible from end-user programs through simple RPC protocols. Users can easily utilize high-quality numerical libraries such as LAPACK without worrying about installation, version management, etc. There are projects with similar objectives, such as the Legion project at the Univ. of Virginia and NetSolve at the Univ. of Tennessee. If any or all of these projects succeed, then accessing Gigaflops of computational resources from one's desktop will become a reality, just as the World-Wide Web has made desktop access to global information resources commonplace.
Even without such global resource access, the aggregate computing power available over a local network will be enormous; such 'clusters of workstations' are being actively investigated in research projects such as the NOW project at U.C. Berkeley, the COW project at Wisconsin, and the CCOS project at the Real-World Computing (RWC) partnership in Japan. The CCOS project is especially ambitious in its objectives, as it aims for Gigabyte/sec bandwidth and sub-microsecond latency over clusters of workstations within a building.

Thus, I strongly suspect that high-performance parallel and distributed computing will be available to the `masses', both implicitly, as part of common applications, and explicitly, for average programmers to exploit. As a result, software targeted at such environments will be more abundant and less specialized. In other words, grinding out highly specialized parallel code via fine tuning in a `barren' programming environment (as is done today), without any concern for good software engineering disciplines such as reusability, compositionality, and portability, should become a thing of the past. Rather, the techniques, practices, and realities of current mainstream commercial software development should become prevalent in high-performance computing. That is to say, high-performance parallel and distributed computing will have to come into the mainstream of software development.

For example, high-performance parallel application software should be developed using some kind of object-oriented design methodology and design patterns, programmed using parallel object-oriented languages and frameworks under those design constraints, and finally tested and debugged using tools that support both parallel-OOAD and parallel-OO programming. The trouble is that none of the current 'real' (object-oriented, in this case) systems supports such an ideal to the full extent.
For such an ideal situation to come about, researchers in parallel and distributed software, including language and systems design as well as compiler and software tool construction, must unite, break out of the mold, and reestablish the position that the chief purpose of their research is to serve the objective of `high-performance parallel distributed computing for the masses'. In that setting, developing high-performance parallel programs needs to be as easy, and as integrated into the everyday computing environment, as developing Applets for Java. At the same time, for developing large systems, we need a framework that supports software engineering techniques intended for large-scale developments consisting of millions of lines of code.

We can no longer indulge ourselves in developing `toy' software systems that can only run small programs very slowly, in unusable concurrency models that ignore the cost of computing, or in approaches that ignore the software engineering disciplines employed in real software for practical use. One must look at the needs of real applications, determine the true cost of parallel operations, and furthermore allow programmers to develop high-performance parallel programs as an everyday software development effort. We can no longer be restricted to some 'ideal' model of computation. Rather, integration with other languages, systems, and tools will be equally important, in order to maximize acceptance and the utilization of current software resources.

Our research group, in collaboration with other universities and research institutions such as ETL and RWC, is investigating several research frontiers in this regard. All are aimed at bringing parallel computing to mass software development using sophisticated software techniques:

* The Ninf Project: World-Wide, High-Performance, Heterogeneous Network Computing
* Parallel Computing Support via Metaprogramming
* Standard Parallel Template Library for parallel C++ programming
* Optimization of MPI message passing programs via compiler support
* Declarative visualization of parallel programs

In addition, in order to support widespread domestic research and development of a new generation of practical parallel and distributed software, a research consortium called PDC (Parallel and Distributed Consortium) has been established in Japan. Its members include over 60 Japanese university researchers and over 10 major computer and electronics companies, including IBM, TI, Sun Microsystems, NEC, Hitachi, Toshiba, Fujitsu, Mitsubishi, Matsushita, Sharp, and Sanyo. The research interests cover high-performance computer architectures, operating systems and distributed systems, parallel programming languages and compilers, high-performance parallel applications and libraries, and finally, real-time and multimedia systems. Coordinated joint research between the universities and the private sector is expected to produce truly beneficial parallel and distributed software systems that could become industry standards in the 21st century.