Parallel Computing
Hardware 
(brief review of the world's resources)

Hardware
The physical, touchable, material parts of a computer or other system. The term is used to distinguish these fixed parts of a system from the more changeable software or data components which it executes, stores, or carries.
Computer hardware typically consists chiefly of electronic devices (CPU, memory, display) with some electromechanical parts (keyboard, printer, disk drives, tape drives, loudspeakers) for input, output, and storage, though completely non-electronic (mechanical, electromechanical, hydraulic, biological) computers have also been conceived of and built. 
Source: http://wfn-shop.princeton.edu/foldoc/
 

Supercomputers and High-Performance Servers

 
Avalon A12
Alex AVX Series 3
C-DAC OpenFrame 9000
C-DAC PARAM 8600
C-DAC PARAM 8000
Connection Machines CM-5
Cray Origin 2000
Cray T3E
Cray J90
Cray T90
Digital AlphaServers, Clusters
Fujitsu AP3000
Fujitsu VX-E, VPP300-E, VPP700
Fujitsu VPP500
Fujitsu GS8600
Fujitsu GS8400
Hewlett Packard HP9000 V-class Server
HP/Convex Exemplar SPP1200/XA
HP/Convex Exemplar SPP1200/CD
Hitachi SR2201
IBM RS/6000 SP
IBM S/390 G4 Enterprise Server
IBM S/390 G4 Multiprise 2000
Intel ASCI Option Red
Intel Paragon XP/S
Kendall Square Research KSR1
MasPar MP-1/MP-2
Matsushita ADENART
Meiko CS-2HA
nCUBE 2S
NEC SX-4 Supercomputer
Parsys TA9000
Parsytec CCe Series
Parsytec CCi Series
Parsytec PowerMouse
Scali Computer HS Series
Scali US and PII Series
Siemens Pyramid Reliant RM1000
Siemens Pyramid RM1000 Enterprise Server
Siemens Pyramid RM600 E Series
Sun HPC10000
Sun HPC6500
Sun HPC5500
Sun HPC4500
Sun HPC3500
Supercomputing Systems AG GigaBooster
Tera MTA
 


Supercomputer
A broad term for one of the fastest computers currently available. Such computers are typically used for number crunching including scientific simulations, (animated) graphics, analysis of geological data (e.g. in petrochemical prospecting), structural analysis, computational fluid dynamics, physics, chemistry, electronic design, nuclear energy research and meteorology. 
Source: http://wfn-shop.princeton.edu/foldoc/
 


Avalon Computer Systems, Inc.
 
Series A12 Parallel Supercomputers
The A12 is a distributed-memory multi-processor system with 12 - 1680 DEC Alpha 21164 RISC processors.
Each CPU card is available with either a 1 or 4 Megabyte tertiary cache and employs two techniques to improve memory bandwidth: bank interleaving and cache-synchronized, page-mode access. The bandwidth to/from memory is 400 MB/s or one 64-bit word every 8 cycles. Standard 72-pin SIMMs are allowed. A processor card comes with 32 Megabytes of memory in the base configuration and expands to 1 Gigabyte per processor. 
The architecture of the supercomputer is based on a scalable array of modules. Each module is capable of holding up to 12 computational nodes and provides a peak performance of 9.6 Gigaflops. Connections between modules can provide a variety of configurations, such as a 2-D grid, 3-D mesh, cascaded crossbar, parallel cascaded crossbar and special configurations.
The system maximum performance is 1.3 TFlops.
The operating system is an AVALON microkernel-based Unix.
The year of introduction is 1996.
See also: The NetLib short description
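The 1.3 TFlops system figure is consistent with the per-module peak quoted above. A minimal Python sketch of that arithmetic follows, assuming peak performance scales linearly with the number of nodes; the function and constant names are illustrative and do not come from Avalon documentation.

```python
# Sketch: scale the quoted per-module peak to a full A12 configuration.
MODULE_PEAK_MFLOPS = 9600   # 9.6 Gigaflops per module (from the text)
NODES_PER_MODULE = 12       # a module holds up to 12 computational nodes
MAX_PROCESSORS = 1680       # largest configuration quoted in the text


def system_peak_gflops(processors: int) -> float:
    """Peak in GFlops, assuming every node contributes an equal share."""
    per_node_mflops = MODULE_PEAK_MFLOPS / NODES_PER_MODULE  # 800 MFlops
    return processors * per_node_mflops / 1000


if __name__ == "__main__":
    print(system_peak_gflops(NODES_PER_MODULE))  # one full module
    print(system_peak_gflops(MAX_PROCESSORS))    # roughly 1.3 TFlops
```

With 1680 processors the result is 1344 GFlops, which rounds to the 1.3 TFlops stated for the system.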
 

Alex Informatics
 
AVX Series 3 High Performance Computer
The AVX Series 3 is a scalable distributed-memory multiprocessor system with 1 - 32 Motorola PowerPC 604 RISC processors.
Each node includes its own configuration of processor, memory, disk storage, and I/O facilities. Nodes communicate with each other using on-board Fast Ethernet links to a network interconnect or by using the Alex high-bandwidth, SHARC-based interconnect system. 
On-board Fast Ethernet links provide 100 Mbit/sec point-to-point internodal communications. 
Each processor has a 16+16 KB level-1 cache and a 512 KB level-2 cache. The overall system memory capacity is 32 - 256 MB.
The system maximum performance is about 4 GFlops.
The operating systems are AIX 3.2.5 or Windows NT.
See also: VMP - AVX Series 2 - home page
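As a rough illustration of what the 100 Mbit/sec Fast Ethernet rate means for internodal messages, the following Python sketch converts the link rate into a lower bound on transfer time. It assumes decimal units (1 Mbit = 10^6 bits) and ignores latency and protocol overhead; the function name is hypothetical and not part of any Alex software.

```python
def transfer_seconds(message_bytes: int, link_mbit_per_s: float = 100.0) -> float:
    """Lower bound on the time to move a message over one link.

    Only the raw link rate is considered; start-up latency and protocol
    overhead (not given in the text) would add to this figure.
    """
    bits = message_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000)


if __name__ == "__main__":
    # A 1,000,000-byte message over a 100 Mbit/sec link.
    print(transfer_seconds(1_000_000))
```

At this rate a link carries at most 12.5 million bytes per second, so a 1,000,000-byte message needs at least 0.08 seconds.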
 

C-DAC
 
PARAM OpenFrame 9000
The C-DAC PARAM OpenFrame 9000 is a distributed-memory multi-computer system with 1 - 1024 SuperSPARC II processors.
The system has a scalable communication bandwidth of 10 Mbytes/sec to 40 Mbytes/sec full duplex per node.
The C-DAC OpenFrame is configurable as a massively parallel system, a cluster of workstations or a heterogeneous metacomputer.
The system maximum performance is about 300 GFlops.
The operating system is a PARAS microkernel-based Unix (compatible with Sun's Solaris).
The year of introduction is 1996.
See also: The NetLib short description
 
 
PARAM 8600
The C-DAC PARAM 8600 is a multi-computer system based on Intel i860 microprocessors.
The system has a statically configurable interconnection network with T80x transputers as communication engines.
The operating system is a PARAS microkernel-based Unix (compatible with Sun's Solaris).
 
 
PARAM 8000
The C-DAC PARAM 8000 is a massively parallel processing machine with 16 - 256 INMOS T800/T805 microprocessors.
Each node of the system consists of a Transputer and its associated memory.
The system has a statically configurable interconnection network.
The 256-node PARAM 8000 has a peak computing power exceeding 1 GigaFLOPS.
 

Connection Machines Services, Inc.
 
CM-5
The CM-5 is a massively parallel computer system with a maximum of 16000 nodes.
Each node of the CM-5 is a 22-Mips RISC Sun SPARC microprocessor that has four vector pipes and a 64 KB cache and is capable of 128 Mflops peak speed. Each node has a total of 32 MBytes of memory and is connected to two inter-processor communication networks, the data network and the control network.
The system maximum performance is about 1 TFlops.
The operating system is CMOST.
See also: CM-5 Guide
 

Cray Research
 



CRAY Origin 2000
The CRAY Origin2000 is a distributed-memory multi-computer available with 33 to 128 processors.
The primary component of the CRAY Origin2000 system is the Origin2000 Module, a modular building block supporting two to eight MIPS® R10000 processors, 512MB to 16GB memory, and up to 6.4GB/second of peak I/O bandwidth.
The architecture of the system incorporates two connection types. First, the 4 CPUs on two node cards can communicate directly with each other's memory partitions via the hub, a 4-ported non-blocking crossbar. Second, hubs can be coupled to other hubs in a hypercube fashion.
The system overall memory capacity is up to 256 GB.
The system maximum performance is about 49.9 GFlops.
The operating system is IRIX.
The year of introduction is 1996.
See also: The NetLib short description, Press Release (October, 1996).
 
CRAY T3E
The CRAY T3E is a distributed-memory multi-computer available with 6 - 2048 processors.
There are three models in the CRAY T3E series: the T3E with the DEC Alpha 21164 RISC processor, and the T3E-900 and T3E-1200 with the Alpha 21164A RISC processor. The maximal performance of a node is 600, 900 and 1200 MFlops respectively.
The architecture of the system is a 3-D torus.
The system overall memory capacity is up to 4 TB.
The system maximum performance is 2458 GFlops.
The operating system is UNICOS/mk (microkernel-based UNIX).
The year of introduction is 1996.
See also: The NetLib short description
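In a 3-D torus each node is linked to six neighbours, two along each axis, with the links wrapping around at the edges. The Python sketch below shows how neighbour coordinates can be computed. The 8 x 16 x 16 arrangement of 2048 processors is a hypothetical example, not a documented T3E layout, and the function name is illustrative.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbours(node: Coord, shape: Coord) -> List[Coord]:
    """Return the six neighbours of `node` in a 3-D torus of size `shape`.

    Each axis wraps around, so a node on one face of the grid is linked
    to the node on the opposite face.
    """
    neighbours = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % shape[axis]
            neighbours.append(tuple(coord))
    return neighbours


if __name__ == "__main__":
    # Hypothetical 8 x 16 x 16 arrangement of 2048 processors.
    print(torus_neighbours((0, 0, 0), (8, 16, 16)))
```

The wraparound is what distinguishes a torus from a plain mesh: the corner node (0, 0, 0) still has six neighbours, including (7, 0, 0), (0, 15, 0) and (0, 0, 15).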
 
CRAY J90
The CRAY J90 is a shared-memory multi-vector supercomputer with 4 - 32 processors.
The system has one multiply and add vector pipe set per CPU at a clock cycle of 10 ns, which results in a theoretical peak performance of 200 MFlops per CPU.
The CRAY J90se has scalar-enhanced processors.
The system main memory capacity is 4 GB.
The system maximum performance is 6.4 GFlops.
The operating system is UNICOS (Cray Unix variant).
The year of introduction is 1994.
See also: The NetLib short description
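The J90 peak figures follow directly from the clock cycle. The Python sketch below reproduces them, assuming that one multiply-and-add pipe set delivers two floating-point results per cycle; the function name is illustrative.

```python
def peak_mflops_per_cpu(clock_ns: float, flops_per_cycle: int = 2) -> float:
    """Peak MFlops of one CPU from its clock period in nanoseconds.

    Assumption: one multiply-and-add pipe set yields two floating-point
    results per cycle (one multiply, one add).
    """
    clock_mhz = 1000.0 / clock_ns   # a 10 ns cycle corresponds to 100 MHz
    return clock_mhz * flops_per_cycle


if __name__ == "__main__":
    per_cpu = peak_mflops_per_cpu(10.0)
    print(per_cpu)                  # MFlops per CPU
    print(32 * per_cpu / 1000)      # GFlops for a 32-CPU system
```

A 10 ns cycle gives 200 MFlops per CPU, and 32 CPUs give the 6.4 GFlops system maximum quoted above.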
 
CRAY T90
The CRAY T90 is a shared-memory multi-vector supercomputer with 1 - 32 processors.
The system has one multiply and add vector pipe set per CPU at a clock cycle of 2.2 ns which results in a theoretical peak performance of 1.8 GFlops.
In the CRAY T90 system there is no separate scalar processor, so scalar and vector code have to share the same functional units. A small scalar cache is added to speed up scalar calculations.
The system main memory capacity is 8 GB.
The system maximum performance is 58 GFlops.
The operating system is UNICOS (Cray Unix variant).
The year of introduction is 1995.
See also: The NetLib short description
 

Digital Equipment Corporation
 
AlphaServers 8400, 8200, Clusters
The AlphaServers are symmetric multi-processing systems based on the Alpha 21064A processor.
The AlphaServer 8200 contains up to 6 processors; the AlphaServer 8400 contains up to 14 processors. AlphaServer Clusters contain up to 8 nodes and up to 96 processors.
The connection structure for model 8400 and model 8200 is a crossbar. The processor/memory bandwidth for each model is 1.87 GB/s.
AlphaServers can be clustered using PCI bus Memory Channel link cables that are connected to a hub. The systems need not be of the same model. The bandwidth of this interconnect is 100 MB/s. To support this kind of cluster computing, HPF and optimised versions of PVM and MPI are available.
The overall memory capacity is up to 12 GB for AlphaServer 8200, up to 28 GB for AlphaServer 8400 and up to 112 GB for AlphaServer Cluster.
The system maximum performance is about 7.3 GFlops for AlphaServer 8200, about 17.2 GFlops for AlphaServer 8400 and about 68.6 GFlops for AlphaServer Cluster.
The operating systems are Digital UNIX, Windows NT and OpenVMS.
The year of introduction is 1997.
See also: The NetLib short description, Technical Specifications (October, 1997).
 

Fujitsu
 




AP3000
Fujitsu's Parallel Server AP3000 is a distributed memory scalar parallel server based on 64-bit UltraSPARC-I technology. 
The AP3000 system consists of 4 - 1024 nodes which are connected in a 2-D torus structure with a bi-directional bandwidth of 200 MB/s. Each node can have 1 or 2 CPUs, which share onboard memory when 2 CPUs are present. The nodes are connected by a high-speed network (AP-net) and controlled by a control workstation through a control network.
HPF is available, and the machine can also be used with a message-passing model, as customised MPI/AP and PVM/AP are offered.
The overall memory capacity is 256 GB - 2 TB; the internal disk capacity is 8.4 GB - 8.4 TB.
The system maximum performance is 614 GFlops.
The operating systems are Cell OS (transparent to the user) and Solaris.
The year of introduction is 1996.
See also: The NetLib short description, AP3000 Product description
 
VX-E Series/VPP300-E Series/VPP700 Series

All these Fujitsu systems are distributed-memory vector multiprocessors. The architecture of the system is based on 1 - 256 Processing Elements linked via a high-speed crossbar network. Each processing element consists of a Scalar Unit and a Vector Unit, can have up to 2 GB of main memory and can achieve a maximum vector performance of 2.4 GFlops. The maximum 256 PE configuration offers an overall vector performance of 614.4 GFlops together with 512 GB of main memory. The point-to-point communication bandwidth is 570 MB/s.
The VX-E Series system contains 1 - 4 processing elements, with an overall performance of 9.2 GFlops and a maximal memory capacity of 8 GB.
The VPP300-E Series system contains 1 - 16 processing elements, with an overall performance of 36.5 GFlops and a maximal memory capacity of 32 GB.
The VPP700-E Series system contains 8 - 256 processing elements, with an overall performance of 614.4 GFlops and a maximal memory capacity of 512 GB.
The operating system is UXP/V (based on the System V release 4 variant of Unix).
The year of introduction is 1995 for VX, VPP300 and 1996 for VPP700.
See also: The NetLib short description, Technical Specifications

 
VPP500 Series

The VPP500 system is a distributed-memory vector parallel multiprocessor. It consists of processing elements (PEs) that perform arithmetic operations, control processors (CPs) that perform system control, and a crossbar network that links these together. The VPP500 system links up to 222 PEs. Each PE can perform high-speed vector operations at 1.6 GFlops of theoretical peak performance. The system peak performance is 355.2 GFlops.
The point-to-point communication bandwidth is up to 800 MB/s.
The system main memory capacity is 1 - 222 GB.

 
GS8600

The GS8600 is a multiprocessor system that consists of a number of clusters. It provides an Extended Virtual Machine (EVM) facility. In conjunction with the AVM/EX system software, it allows up to six operating systems to be run per cluster. The EVM facility limits the need for additional channels by sharing an OCLINK channel among a number of operating systems without any increase in overhead.
Each single-cluster processing unit contains 1 - 8 CPUs and a main storage unit with a capacity of 256 - 8192 MB.

 
GS8400
The GS8400 is a multiprocessor system that consists of a number of clusters. It provides an Extended Virtual Machine (EVM) facility. In conjunction with the AVM/EX system software, it allows up to six operating systems to be run per cluster. The EVM facility limits the need for additional channels by sharing an OCLINK channel among a number of operating systems without any increase in overhead.
Each single-cluster processing unit contains 1 - 4 CPUs and a main storage unit with a capacity of 256 - 1024 MB.
The system can have up to 8 clusters with 64 - 4096 MB of main storage capacity per cluster.
 

Hewlett Packard
 
HP 9000 V-class Enterprise Server
The V2200 and V2250 servers are 1 to 16 processor symmetric multiprocessor (SMP) systems. The foundation of the V-Class architecture is a memory crossbar technology called HyperPlane[tm] combined with 200/240 MHz PA-8200 RISC processors. 
A V-Class system contains up to 8 memory boards and 8 I/O channels with up to 24 PCI I/O controllers. Each of the eight crossbar ports on the processor side connects to a single agent. Each agent supports a pair of PA-8200 processors and a 240 Mbytes/second I/O channel bandwidth. On the memory side of the crossbar, each port connects to a 4-way interleaved memory board. The system supports up to 16 GB of synchronous DRAM.
The V2250 system's peak TPC-C performance is 52,117 tpmC; its peak floating-point performance is 15.360 GFlops.
The V-Class can also act as a node within a HyperClass cluster.
The operating system is HP-UX V11.0 (Hewlett Packard version of UNIX).
The year of introduction is 1998.
See also: HP 9000 V-Class Enterprise Server Overview White Paper 
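The quoted floating-point peak can be related to the processor count and clock rate. The Python sketch below derives the number of floating-point operations per clock cycle implied by those figures. It assumes the 15.360 GFlops peak refers to 16 processors running at 240 MHz; that pairing is an assumption, since the text does not state it explicitly, and the function name is illustrative.

```python
def implied_flops_per_cycle(peak_mflops: float, processors: int, clock_mhz: float) -> float:
    """Floating-point operations per cycle per processor implied by a peak."""
    return peak_mflops / (processors * clock_mhz)


if __name__ == "__main__":
    # 15.360 GFlops = 15360 MFlops, assumed to be 16 CPUs at 240 MHz.
    print(implied_flops_per_cycle(15360, 16, 240))
```

Under these assumptions the quoted peak corresponds to four floating-point operations per cycle per processor.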
 
 
HP/Convex Exemplar SPP1200/XA

The Exemplar SPP1200/XA is a scalable parallel computing system with 8 - 128 Hewlett Packard PA7200 RISC processors. 
The system architecture is scalable MIMD with two-level coherent memory/interconnect hierarchy, global shared memory and message passing (MP) programming models. 
The peak performance of the system is 30.7 GFlops (for 128 CPUs); the maximal memory capacity is 32 Gbytes, with 4 Gbytes/sec peak I/O bandwidth.
The operating system is SPP-UX.
See also: Convex Division Data Sheets, SPP1200/XA Convex Data Sheet (gzipped ps-file).

 
 
HP/Convex Exemplar SPP1200/CD
The Exemplar SPP1200/CD is a scalable parallel computing system with 2 - 16 Hewlett Packard PA7200 RISC processors. 
The system architecture is scalable MIMD with two-level coherent memory/interconnect hierarchy, global shared memory and explicit message passing (EMP) programming models. 
The peak performance of the system is 3.84 GFlops (for 16 CPUs); the maximal memory capacity is 4 Gbytes, with 500 Mbytes/sec peak I/O bandwidth.
The operating system is SPP-UX.
See also: Convex Division Data Sheets, SPP1200/CD Convex Data Sheet (gzipped ps-file).
 

Hitachi
 
SR2201 Massively Parallel Processor
The Hitachi SR2201 high-end model is a massively parallel computing system that ranges from 32 to 2,048 processors.
The SR2201 uses Hitachi's new high-performance RISC chips. Each chip has a peak theoretical performance of 0.3 GFlops, addressable memory of 64 - 1024MB, a 16KB/16KB primary cache (instructions/data) and a 512KB/512KB secondary cache (instructions/data).
The SR2201 high-end system peak performance is 614.4 GFlops.
The system maximal storage capacity is 2TB.
The interprocessor network is a 2- or 3-dimensional crossbar network with a PE-to-PE transfer speed of 300MB/s.
The system supports Express, PVM and MPI programming models.
The operating system is UNIX-based HI-UX/MPP.
The system was introduced in December 1996.
See also: Netlib short description
 

IBM
 
RS/6000 SP System
The IBM RS/6000 SP is a scalable parallel computing system with up to 128 processor nodes (up to 512 by special request). 
The processor node is the basic SP building block. It consists of a POWER2SC microprocessor or PowerPC 604e symmetric multiprocessor (SMP), memory, PCI or Micro Channel(R) expansion slots for I/O and connectivity, and disk devices. The three sizes of nodes (thin, wide, and high) may be mixed in a system and are housed in short or tall system frames. These frames can be interconnected to form a system with up to 128 nodes (512 by special request). A maximum of 64 SMP (high) nodes can be installed per system.
Each node provides from 32KB data/32KB instruction to 128KB data/32KB instruction L1 cache and from 256KB to 2MB of L2 cache per processor, a RAM capacity from 64MB to 256MB, and internal disk storage of 4.5GB. Expansion options allow up to 4GB of RAM with 18GB of disk storage, or up to 3GB of RAM with 36.4GB of disk storage.
The SP Switch is used for internode communication and provides a bi-directional data-transfer rate of 122MB/second between each node pair.
The RS/6000 system peak performance is 0.528 GFlops.
The system supports PVM and MPI programming models.
The operating system is AIX Version 4.
The year of introduction is 1997.
See also: SP hints, tips & white papers, RS/6000 White Paper: Node Selection For The System, Netlib short description of the 9076 SP2 System
 
 
S/390 G4 Enterprise Server

The S/390 G4 Enterprise Server is a multiprocessor computing system with up to 10 processors.
The system processor storage is 512MB - 16GB.
The operating systems are OS/390, VM and VSE.

 
 
S/390 Multiprise 2000 Server

The S/390 G4 Multiprise 2000 Server is a multiprocessor computing system with up to 10 processors.
The system processor storage is 256MB - 4GB.
The operating systems are OS/390, VM and VSE.

 

Intel
 
Intel ASCI Option Red Supercomputer
The ASCI TFLOPS is a Massively Parallel Processor (MPP) with a distributed-memory Multiple-Instruction, Multiple-Data (MIMD) architecture.
The system's 9,216 Pentium® Pro processors with 596 Gbytes of RAM are connected through a 38 x 32 x 2 mesh. The system has a peak computation rate of 1.8 TFLOPS and a cross-section bandwidth (measured across the two 32 x 38 planes) of over 51 GB/sec.
The system contains 4,536 computing nodes called Eagle nodes. Each node includes two 200 MHz Pentium Pro processors, up to 256 MB of DRAM and two L2 caches. The processor-memory bandwidth is 533MB/sec. The compute node peak performance is 400 MFLOPS.
The bi-directional node-to-node bandwidth is 800MB/sec; the bi-directional cross-section bandwidth is 51.6GB/sec.
The system peak performance is 1.8 TFLOPS, the RAID I/O bandwidth (per subsystem) is 1GB/sec and the RAID storage (per subsystem) is 1TB.
The system supports MPI and NX (for Paragon developed applications) programming models.
The ASCI Option Red supercomputer has a multiple-operating-system configuration. The service and I/O partitions use the TFLOPS OS, a port of the Paragon OS (Intel's version of UNIX). The compute partition uses Cougar, a version of the SUNMOS operating system (used on Intel Paragon XP/S supercomputers).
The year of introduction is 1997.
See also: A brief description at Sandia National Laboratories, The Overview of Intel TFLOPS Supercomputer, Overview of ASCI Red
 
 
Intel Paragon XP/S Supercomputer
The Intel Paragon XP/S is a scalable distributed multicomputer system with up to 1984 nodes based on the Intel i860XP RISC processor.
The Paragon's processing nodes are arranged in a two-dimensional rectangular grid. The system memory is distributed among the nodes. There are three groups of nodes: compute nodes, service nodes and I/O nodes. The node-to-node bandwidth is 175 MB/s.
Each node contains three processors: two application processors and one message processor. All processors share the node memory, which has a capacity of 16 - 32 MB. Each processor has a theoretical peak performance of 75 MFLOPS.
All communication between nodes in the grid is provided by Paragon Mesh Routing Chips (iMRCs), each connected to its neighbours by 16-bit channels. One node may be attached to each iMRC.
The system peak performance (with 1984 compute nodes) is 148.8 GFLOPS.
The system supports MPI, SVM and NX (for Paragon developed applications) programming models.
The operating system is OSF/1.
The year of introduction is 1994.
See also: Rudolf Berrendorf, Heribert C. Burg, Ulrich Detert, Rüdiger Esser, Michael Gerndt, Renate Knecht, "Intel Paragon XP/S - Architecture, Software, Environment and performance" (paper in .ps file), Oliver A. McBryan's Intel Paragon Overview paper
 

Kendall Square Research
(this company no longer exists)
 
KSR1

The KSR1 is a massively parallel computer system designed to be scalable from 8 to 1088 processors. 
Each processor is a RISC-style superscalar 64-bit unit operating at 20 MIPS and 40 MFLOPS (peak). This unit is a four chip set implemented in 1.2 micron CMOS: basic control unit (CEU), floating point unit (FPU), integer and logical operations unit (IPU) and I/O channel unit (XIU). 
All processors share a memory of 1TB, using the ALLCACHE technique.
The system peak performance ranges from 320 to 43,520 MFLOPS.
The KSR1 uses a shared-memory programming model. The operating system is KSR OS (an extension of OSF/1).
See also: Hardware description of the KSR1-64

 

MasPar Computer Corporation
 
MP-1/MP-2

The MP-1/MP-2 machine is a massively parallel shared-memory SIMD supercomputer which consists of 1024 - 16384 processing elements.
Each processing element is a RISC-like unit custom-designed by MasPar; the elements are grouped on chips into clusters of 16. Each cluster has the PE memories and a connection to the communication network.
The topology is a 2-D grid with connections to the 8 nearest neighbours.
The data transfer speed is 20 GBytes/s between nearest neighbours and 1.3 GBytes/s when using the network router.
The system peak performance is 1.2 GFLOPS.
The system memory capacity is 1TB.
The maximum system storage capacity is 132GB.
See also: Netlib short description, MasPar MP-1 at the SCL 
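A 2-D grid with 8 nearest-neighbour connections links each processing element to the elements directly beside it and diagonally adjacent to it. The Python sketch below enumerates those neighbours. It assumes a grid that does not wrap around at the edges and uses a hypothetical 32 x 32 arrangement of 1024 elements; neither detail is taken from the description above.

```python
from typing import List, Tuple

# Offsets to the eight nearest neighbours: N, NE, E, SE, S, SW, W, NW.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def grid_neighbours(row: int, col: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """Neighbours of (row, col) in a rows x cols grid with 8-way links.

    Assumption: the grid does not wrap around, so edge and corner
    elements have fewer than eight neighbours.
    """
    result = []
    for d_row, d_col in OFFSETS:
        r, c = row + d_row, col + d_col
        if 0 <= r < rows and 0 <= c < cols:
            result.append((r, c))
    return result


if __name__ == "__main__":
    print(len(grid_neighbours(5, 5, 32, 32)))  # interior element
    print(grid_neighbours(0, 0, 32, 32))       # corner element
```

If the physical grid does wrap around, the bounds check would be replaced by modular arithmetic on the row and column indices.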

 

Matsushita
 
ADENART
The ADENART is a distributed-memory multi-computer system with 64 - 256 processors.
Each processor has 2MB of memory, 20 MBytes/s communication bandwidth and a peak performance of 10 MFLOPS.
The connection structure is multiplane and has two types of connections: through one crossbar between two processors in the same plane, and through two corresponding crossbars between processors placed on different planes.
The system main memory capacity is 0.5GB; the maximum performance is 2.56 GFlops.
The operating systems are an internal OS (transparent to the user) and SunOS on the front-end computer.
See also: The NHSE Review short article
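The two connection types can be summarised by counting the crossbars on the path between two processors. The Python sketch below assumes processors are numbered consecutively and grouped into planes of equal size. The plane size of 16 and the function names are assumptions made for illustration, not figures from the description above.

```python
def plane_of(processor: int, per_plane: int) -> int:
    """Plane index of a processor, assuming consecutive numbering."""
    return processor // per_plane


def crossbars_traversed(src: int, dst: int, per_plane: int = 16) -> int:
    """Number of crossbars a message passes through from src to dst.

    One crossbar links processors in the same plane; processors on
    different planes are reached through two corresponding crossbars.
    """
    if src == dst:
        return 0
    same_plane = plane_of(src, per_plane) == plane_of(dst, per_plane)
    return 1 if same_plane else 2


if __name__ == "__main__":
    print(crossbars_traversed(0, 5))    # same plane
    print(crossbars_traversed(0, 16))   # different planes
```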
 

Meiko
 
CS-2HA
The Meiko CS-2HA is a distributed-memory multi-processor system with 8 - 1024 processor elements.
The system may contain an appropriate mix of scalar and vector RISC-based processing elements. Each scalar processing element has two 90 MHz Sparc processors, 512KB cache, 32 - 512MB main memory and 180 MFLOPS peak performance. Each vector processing element has one Sparc and two vector processors, 256KB cache, 128MB main memory and 200 MFLOPS peak performance.
The interprocessor communication network is a multistage crossbar organized by communication processors. The bidirectional bandwidth is 100 MBytes/s over a physical link operating at 0.6 GBit/s in each direction. See also Communication Network overview.
The system peak performance is 204.8 GFlops.
The system supports two styles of parallel programming: a strongly synchronous SPMD model and a loosely synchronous message-passing model (based on the libraries Express, PARMACS, P4, PVM, etc.).
The operating systems are an internal OS (transparent to the user) and Solaris on the front-end system.
The year of introduction is 1994.
See also: Meiko World, Netlib short description
 

nCUBE
 
nCUBE 2S
The nCUBE 2S is a massively parallel distributed-memory multiprocessor system with 8 - 8192 processors.
The nCUBE3 processor has 8 KBytes of instruction cache and 8 KBytes of data cache and up to 64 MBytes of memory, can address up to 1 GByte of memory and has a theoretical peak performance of 3.0 MFLOPS.
The nCUBE 2S maximal memory capacity is 256 GBytes.
The system peak performance is 19.7 GFLOPS.
The connection structure is a hypercube.
The operating systems are an internal OS (transparent to the user) and SunOS on the front-end system.
The year of introduction is 1993.
See also: Netlib short description, Alabama Supercomputer Network Tour (nCUBE 2S Model 10). 
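In a hypercube of dimension d there are 2^d nodes, and two nodes are directly linked when their binary addresses differ in exactly one bit. The Python sketch below illustrates this for the smallest and largest processor counts quoted above; the function names are illustrative and are not taken from nCUBE software.

```python
from typing import List


def hypercube_dimension(nodes: int) -> int:
    """Dimension d of a hypercube with 2**d nodes."""
    if nodes < 1 or nodes & (nodes - 1):
        raise ValueError("node count must be a power of two")
    return nodes.bit_length() - 1


def hypercube_neighbours(node: int, dimension: int) -> List[int]:
    """Nodes whose binary address differs from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimension)]


if __name__ == "__main__":
    print(hypercube_dimension(8), hypercube_dimension(8192))  # 3 13
    print(hypercube_neighbours(5, 3))                         # [4, 7, 1]
```

An 8-processor system is therefore a 3-dimensional cube, while 8192 processors form a 13-dimensional hypercube in which every node has 13 direct links.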
 

NEC
 
SX-4 Supercomputer
The SX-4 is a distributed memory scalable parallel vector supercomputer with 1 - 512 processors. 
The SX-4 Series central processor unit combines a 2 GigaFLOPS eight-pipeline vector unit, one RISC-architecture superscalar unit, a main memory unit, an extended memory unit and an input/output processor.
The maximal main memory capacity is 256 GB with an aggregate maximum bandwidth of 1 TBytes/s; the extended memory capacity is 512 GBytes with a maximum bandwidth of 192 GBytes/s.
The system peak performance is 1 TeraFLOPS.
The operating system is EWS-UX/V (a Unix version based on Unix System V.4).
The year of introduction is 1995.
See also: Netlib short description
 

Parsys
 
TA 9000
The Parsys TA9000 is a distributed-memory multi-processor system with up to 512 DEC Alpha 21066 processors.
The TA9000 maximal memory capacity is up to 32 GBytes; the peak memory bandwidth is 25 MBytes/s per link.
The connection structure is a multi-stage crossbar.
The system peak performance is 119.3 GFLOPS.
The operating system is Idris (a real-time sub-unix variant).
The year of introduction is 1995.
See also: NHSEReview paper
 

Parsytec
 
Parsytec CCe
The Parsytec CCe is a distributed-memory multi-processor system with 8 nodes based on the PowerPC 604e processor.
Each node has a 32/32KB L1 cache, up to 1MB of L2 cache and up to 512MB of EDO DRAM.
The connection structure is a full 8x8 crossbar with a router bandwidth of 8x2x40 MBytes/s. The network router allows processing boards to be connected in any topology.
The system peak performance is 3.6 GFLOPS.
The system supports PVM and MPI programming models.
The operating system is Embedded PARIX - EPX (Embedded Parallel Extensions to UNIX).
The year of introduction is 1996.
 
 
Parsytec CCi
The Parsytec CCi is a distributed-memory multi-processor system based on the Intel Pentium Pro processor. The largest configuration installed is a 144-node system.
The connection structure is common to all CC-series systems and is based on a crossbar with a router bandwidth of 8x2x40 MBytes/s.
Each node's peak performance is 200 MFLOPS, and the overall 144-node theoretical peak performance is 28.8 GFLOPS.
The operating systems are Windows NT 4.0 and ParsyFRame (a UNIX environment is optional).
The year of introduction is 1996.
 
 
Parsytec PowerMouse
The Parsytec PowerMouse is a distributed-memory multi-processor system based on the PowerPC 604e processor. The system has unlimited scalability within a grid topology. Each node has a 32/32KB L1 cache, 64MB of SDRAM with 422 MBytes/s bandwidth and 400 MFLOPS peak performance.
The system supports PVM and MPI programming models.
The operating system is PARIX/PowerTools.
The Parsytec PowerMouse is an extension of the well-known PowerXplorer system.
 

Scali Computer
 
HS Series
The HS Series employs hyperSPARC processors in SMP nodes, each consisting of a motherboard with 4 hyperSPARC processors from ROSS Technology. The processors run at a 180MHz clock speed and come with 512 kBytes of L2 cache.
The HS-400 is the smallest configuration and has a peak performance of 79.7 SPEC-fp95 or 2.9 GFLOPS, while the full SingleTower configuration peaks at 159.4 SPEC-fp95 or 5.8 GFLOPS. The maximum main memory capacity is 4 Gbytes for the HS-400 and 8 Gbytes for the HS-800.
The Scali Internode Communication Channel - ICC - provides a large selection of interconnect topologies. The basic HS-400 model uses a single ring, while the largest HS-800 uses a combination of switches and counter-rotating rings. The SCI provides 200 MByte/s interconnect bandwidth per SCI ringlet used in the HS-400 and 400 MByte/s for the dual ringlets used in the HS-800 system. The bisection bandwidth for the HS-800 ICC is 2.4 GByte/s.
The Scali parallel programming paradigm is MPI. Code for MPI can be generated automatically from a sequential source by using tools from the Scali Software Platform (SSP). These tools include compilers, debuggers and performance monitoring tools for writing, porting or implementing parallel programs. By using the Apogee series of compilers, the Scali® HS accommodates automatic parallelization, processor- and cache-specific optimizations and loop optimizations.
The Scali HS series runs the Solaris operating environment, a UNIX®-based operating system. The Scali HS series complies with the SCD-1-1 and SCD-2-1 requirements.
The HS series systems may be scaled up to 32 SMP nodes with 128 processors, with a performance of 637.4 SPEC-fp95 (or 23 GFLOPS).
 
 
US and PII Series
The Scali US and PII series are mid-range computer systems. The US and PII system architecture consists of clustered SMP nodes that use the SCI interconnect standard. The SMP nodes use the UltraSPARC (US model) or Pentium II (PII model) processor architecture, with up to 2 processors per node. Models range from the smallest dual-node systems to multi-tower configurations of 4 or 8 nodes per tower. The system scales to up to 128 nodes, so that the capacity of the individual SMP nodes, the interconnect and the overall system, as well as system availability, can be matched to requirements.
Scali US and PII systems offer a high degree of parallel program execution based on the UltraSPARC and Scali's native, high-performance implementation of MPI 1.1, ScaMPI®. Using the Apogee series of compilers, the Scali US and PII provide automatic parallelization, processor- and cache-specific optimizations and loop optimizations through unrolling, interchanging and strip mining.
Scali US and PII systems offer a large selection of interconnect topologies, from simple rings to 2D meshes and combinations of ring and switch topologies. The interconnect has a data transfer rate of 400 Mbytes/sec between SMP nodes and allows several SMP nodes to communicate at the same time. The bisectional bandwidth is 32 Gbyte/sec for an 8 by 8 node system.
The Scali PII series runs the SolarisX86 operating environment. Windows NT is planned for Q2 1998.
 

Siemens Pyramid
 
Reliant RM1000
The Reliant RM1000 is scalable to 768 processing nodes, 128 Reliant RM1000 cells, up to 768 GB of memory, a maximum of 28 TB of disk capacity, and supports more than 10 TB of on-line information. 
Each processor board contains its own 64-bit microprocessor, copy of the operating system, memory, cache, communications interface, and two fast-and-wide SCSI-2 I/O controllers. 
The Reliant RM1000 is based on modular cabinets called cells. Cells can be stacked two high and connected side-by-side to build larger configurations. Each cell supports a maximum of six single-processor nodes, up to 24 hot-pluggable disk nodes, control and environment nodes for the service network, redundant cooling, four internal dual-hosted SCSI buses and four external SCSI buses, six Ethernet interfaces and an optional QIC-320 tape drive. Each cell also features redundant Mesh Interconnects, N+1 power supplies and redundant disk fans, hot-swappable power supplies, fans, processor, disk, control and environmental nodes, RAID 5 and/or mirror disks, and redundant disk paths.
 
 
RM1000 Enterprise Server

The Reliant RM1000 enterprise cluster server offers a three-in-one combination of SMP, MPP and clustering. It is configured with 214 hot-swappable Reliant RM1000 MPP nodes, 32 Reliant RM1000 cells, 2 RM600 E's starting at 16-CPU SMP nodes, 7.8 TB hot-swappable disk storage and up to 214 GB of memory. It also has high-availability cluster features with no single point of failure, and an integrated, configurable data cache to reduce database I/O.
The Reliant RM1000 enterprise cluster server integrates SMP and MPP nodes in a cluster. 

 
 
RM600 E Server

The system architecture for the RM600 E is a dual bus architecture. The first is the Cluster Processor (CP) bus which groups up to four R10000 CPUs together with NUMA memory on one processor board. The RM600 E20 can support up to 2 processor boards (8 CPUs) and 8 GB of memory while the RM600 E60 can support up to 6 processor boards (24 CPUs) and 24 GB of memory. The second bus comprises the system backplane. The system backplane is constructed around a highly parallel bus called the SP bus. This component is a split-transaction bus, which separates the different system actions to run in parallel and which uses buffering techniques to maximize that parallelism. 
The new RM600 E uses cache-coherent non-uniform memory access (cc-NUMA). This memory technique is stated to be twice as fast as a more conventional memory access methodology. With cc-NUMA, each CPU can access every local or remote memory location. The architecture also retains the concept of a single global shared memory for the applications on a specific platform. 
The operating system is Reliant UNIX.

 

Sun Microsystems
 
HPC10000
The HPC10000 is a distributed-memory multicomputer system with 16 to 64 processors. The system is based on 336 MHz UltraSPARC-II microprocessors with 4MB of external L2 cache. The CPU interface is provided by 64-bit Ultra Port Architecture (UPA) slots.
The system contains a maximum of 16 boards; the minimum configuration contains 4 boards. Each board holds up to 4 processors, up to 4 SBus cards, and a memory module with 4 banks of 8 SIMMs each. 
The main memory capacity is from 2GB to 64GB per system. There are also 512MB and 2GB memory expansion options (each a group of 16 SIMMs) and up to 2 memory expansion options per system board. 
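The processor and memory ranges quoted for the HPC10000 follow from the board counts. The sketch below reproduces them; the function name hpc10000_limits is illustrative, and the reading that the 2GB minimum corresponds to one 512MB option on each of the 4 minimum boards is an assumption, not something the source states.

```python
# Figures quoted in the text above for the HPC10000.
CPUS_PER_BOARD = 4
MIN_BOARDS, MAX_BOARDS = 4, 16
OPTIONS_PER_BOARD = 2            # memory expansion options per system board
OPTION_SIZES_GB = (0.5, 2.0)     # the 512MB and 2GB expansion options


def hpc10000_limits(boards: int) -> dict:
    """Upper limits for a system with `boards` fully populated boards."""
    if not MIN_BOARDS <= boards <= MAX_BOARDS:
        raise ValueError("the HPC10000 takes 4 to 16 boards")
    return {
        "max_cpus": boards * CPUS_PER_BOARD,
        "max_memory_gb": boards * OPTIONS_PER_BOARD * max(OPTION_SIZES_GB),
    }


print(hpc10000_limits(MIN_BOARDS)["max_cpus"])       # 16
print(hpc10000_limits(MAX_BOARDS)["max_cpus"])       # 64
print(hpc10000_limits(MAX_BOARDS)["max_memory_gb"])  # 64.0
# Assumed minimum: one 512MB option on each of the 4 minimum boards.
print(MIN_BOARDS * min(OPTION_SIZES_GB))             # 2.0
```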
The system supports MPI and PVM parallel programming models. 
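MPI and PVM are both message-passing models: processes own their data and cooperate only by sending and receiving messages. The sketch below imitates that style with Python threads and queues from the standard library. It is a stand-in for illustration only and does not use the MPI or PVM APIs; the names worker and parallel_sum are hypothetical.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of numbers and send back (rank, partial sum)."""
    chunk = inbox.get()              # blocking receive
    outbox.put((rank, sum(chunk)))   # send the result to the root


def parallel_sum(data, n_workers=4):
    """Scatter `data` to the workers, then gather and add the partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])   # scatter: round-robin slices
    total = sum(results.get()[1] for _ in range(n_workers))  # gather
    for t in threads:
        t.join()
    return total


print(parallel_sum(list(range(1, 101))))  # 5050
```

On a real distributed-memory machine the workers would be separate processes on different nodes, and the queues would be replaced by the send and receive calls of the message-passing library.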
The operating system is Solaris 2.5.1.
See HPC10000 specifications for more details. 
 
 
HPC6500
The HPC6500 is a distributed-memory multicomputer system with 1 to 30 UltraSPARC-II microprocessors running at 336 MHz. Each microprocessor has 1 to 4MB of external L2 cache. The CPU interface is provided by 1 to 30 128-bit Ultra Port Architecture (UPA) slots.
The system interconnect is Gigaplane(TM) with 2.6GB/s sustained (at 84 MHz) and 2.7GB/s peak (at 84 MHz) bandwidth. 
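The peak figure can be related to the bus clock with a one-line calculation: peak bandwidth is the data-path width multiplied by the clock rate. The sketch below assumes a 256-bit (32-byte) data path, which is not stated in this entry; with that assumption the 84 MHz clock gives roughly the quoted 2.7GB/s peak.

```python
def peak_bandwidth_gb_s(width_bytes: int, clock_mhz: float) -> float:
    """Peak bandwidth in GB/s for a bus moving `width_bytes` per clock cycle."""
    return width_bytes * clock_mhz * 1e6 / 1e9


# Assumption: 32-byte (256-bit) data path; the clock rate is from the text.
peak = peak_bandwidth_gb_s(32, 84)
print(round(peak, 1))        # 2.7
# Ratio of the quoted sustained figure to the quoted peak figure.
print(round(2.6 / 2.7, 2))   # 0.96
```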
The system consists of a maximum of 16 boards, with a minimum of 1 CPU/Memory board and 1 I/O channel. Each CPU/Memory board holds up to 2 processors and 16 memory SIMM slots; each SBus I/O board offers 2 SBus channels, 3 SBus slots, SunFastEthernet(TM), fast/wide SCSI, and 2 fibre channel sockets.
The main memory capacity is 256MB - 30GB per system. 256MB and 1GB memory expansion options (each a group of 8 SIMMs) and up to 2 memory expansion options per board are available. 
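The 30 UPA slots and the 30GB memory limit are consistent with the board budget if one board position is reserved for I/O. The sketch below works this out; treating the required "1 I/O channel" as one full I/O board is an assumption, and the function name max_config is illustrative.

```python
def max_config(total_boards, cpus_per_board=2, options_per_board=2,
               option_gb=1, min_io_boards=1):
    """Return (max CPUs, max memory in GB) once the I/O boards are set aside."""
    cpu_boards = total_boards - min_io_boards
    max_cpus = cpu_boards * cpus_per_board
    max_memory_gb = cpu_boards * options_per_board * option_gb
    return max_cpus, max_memory_gb


print(max_config(16))  # (30, 30)
```

With 8 or 5 boards the same arithmetic gives 14 and 8, matching the processor and memory limits quoted for the smaller models further down.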
Up to 2 disk boards with a capacity of 8.4GB may be installed in the server chassis. More than 375GB of storage can be rackmounted in the system cabinet. The system supports over 10TB of external storage.
The system supports MPI and PVM parallel programming models. 
The operating system is Solaris 2.6.
See HPC6500 specifications for more details. 
 
 
HPC5500
The HPC5500 is a distributed-memory multicomputer system with 1 to 14 UltraSPARC-II microprocessors running at 336 MHz. Each microprocessor has 1 to 4MB of external L2 cache. The CPU interface is provided by 1 to 14 128-bit Ultra Port Architecture (UPA) slots.
The system interconnect is Gigaplane(TM) with 2.6GB/s sustained (at 84 MHz) and 2.7GB/s peak (at 84 MHz) bandwidth. 
The system consists of a maximum of 8 boards, with a minimum of 1 CPU/Memory board and 1 I/O channel. Each CPU/Memory board holds up to 2 processors and 16 memory SIMM slots; each SBus I/O board offers 2 SBus channels, 3 SBus slots, SunFastEthernet(TM), fast/wide SCSI, and 2 fibre channel sockets.
The main memory capacity is 256MB - 14GB per system. 256MB and 1GB memory expansion options (each a group of 8 SIMMs) and up to 2 memory expansion options per board are available. 
Up to 2 disk boards with a capacity of 8.4GB may be installed in the server chassis. More than 500GB of storage can be rackmounted in the system cabinet. The system supports over 6TB of external storage.
The system supports MPI and PVM parallel programming models. 
The operating system is Solaris 2.6.
See HPC5500 specifications for more details. 
 
 
HPC4500
The HPC4500 is a distributed-memory multicomputer system with 1 to 14 UltraSPARC-II microprocessors running at 336 MHz. Each microprocessor has 1 to 4MB of external L2 cache. The CPU interface is provided by 1 to 14 128-bit Ultra Port Architecture (UPA) slots.
The system interconnect is Gigaplane(TM) with 2.6GB/s sustained (at 84 MHz) and 2.7GB/s peak (at 84 MHz) bandwidth. 
The system consists of a maximum of 8 boards, with a minimum of 1 CPU/Memory board and 1 I/O channel. Each CPU/Memory board holds up to 2 processors and 16 memory SIMM slots; each SBus I/O board offers 2 SBus channels, 3 SBus slots, SunFastEthernet(TM), fast/wide SCSI, and 2 fibre channel sockets.
The main memory capacity is 256MB - 14GB per system. 256MB and 1GB memory expansion options (each a group of 8 SIMMs) and up to 2 memory expansion options per board are available. 
Up to 2 disk boards with a capacity of 8.4GB may be installed in the server chassis. The system supports over 4TB of external storage.
The system supports MPI and PVM parallel programming models. 
The operating system is Solaris 2.6.
See HPC4500 specifications for more details. 
 
 
HPC3500
The HPC3500 is a distributed-memory multicomputer system with 1 to 8 UltraSPARC-II microprocessors running at 336 MHz. Each microprocessor has 1 to 4MB of external L2 cache. The CPU interface is provided by 1 to 8 128-bit Ultra Port Architecture (UPA) slots.
The system interconnect is Gigaplane(TM) with 2.6GB/s sustained (at 84 MHz) and 2.7GB/s peak (at 84 MHz) bandwidth. 
The system consists of a maximum of 5 boards, with a minimum of 1 CPU/Memory board and 1 I/O channel. Each CPU/Memory board holds up to 2 processors and 16 memory SIMM slots; each SBus I/O board offers 2 SBus channels, 3 SBus slots, SunFastEthernet(TM), fast/wide SCSI, and 2 fibre channel sockets.
The main memory capacity is 256MB - 8GB per system. 256MB and 1GB memory expansion options (each a group of 8 SIMMs) and up to 2 memory expansion options per board are available. 
A maximum of 8 hot-swappable FC-AL disk drives with dual connections may be installed in the server chassis. The system supports over 2TB of external storage.
The system supports MPI and PVM parallel programming models. 
The operating system is Solaris 2.6.
See HPC3500 specifications for more details. 
 

Supercomputing Systems AG
 
GigaBooster
The GigaBooster is a distributed-memory multicomputer system with seven 233 MHz DEC Alpha 21066 microprocessors. 
The system has one root node with 32MB of standard memory, expandable to a maximum of 256MB, plus SCSI-2 and 4xPCI interfaces. The other nodes have 16MB of standard memory, expandable to a maximum of 128MB, and SCSI-2.
The root node has 2GB of standard internal hard-disk mass storage, expandable to a maximum of 6.4GB. The other nodes have 1GB standard and 6.4GB maximum.
The interconnection structure is a broadcast bus with 160MB/s bandwidth.
The system supports ICI, PVM and MPI programming models. 
The operating system is Digital UNIX V3.2C.
See Technical description for more details. 
 

Tera Computer Company
 
Tera MTA
The Tera computer system is a shared-memory multiprocessor with 16 to 256 custom microprocessors. 
Each microprocessor is multithreaded and has 31 general-purpose 64-bit registers, a 333MHz clock speed, and 1GFLOPS peak performance. The peak memory bandwidth is 2.67GB/s.
The interconnection network is a three-dimensional packet-switched network nominally containing p^(3/2) nodes, where p is the number of processors. These nodes are toroidally connected in three dimensions to form a p^(1/2)-ary three-cube, and processor and memory resources are attached to some of the nodes. 
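To make the network size concrete: in a k-ary three-cube with k equal to the square root of p, there are k^3 = p^(3/2) nodes. The sketch below computes this for a given p and lists the neighbours of a node, assuming a fully populated torus in which every node has wraparound links in all three dimensions. The function names torus_side and neighbours are illustrative.

```python
import math


def torus_side(p: int) -> int:
    """Side length k of the k-ary three-cube, where k is the square root of p."""
    k = math.isqrt(p)
    if k * k != p:
        raise ValueError("this sketch assumes p is a perfect square")
    return k


def neighbours(node, k):
    """Return the six wraparound neighbours of `node` in a k x k x k torus."""
    x, y, z = node
    return [
        ((x + 1) % k, y, z), ((x - 1) % k, y, z),
        (x, (y + 1) % k, z), (x, (y - 1) % k, z),
        (x, y, (z + 1) % k), (x, y, (z - 1) % k),
    ]


k = torus_side(256)
print(k)                               # 16
print(k ** 3)                          # 4096, which is 256^(3/2)
print(len(set(neighbours((0, 0, 0), k))))  # 6
```

For the largest configuration, p = 256, the network nominally has 4096 nodes, of which only some carry processor or memory resources.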
The memory system is implemented as either 2p or 4p memory units distributed around the network. 
The maximum I/O bandwidth in a p-processor system is 200p megabytes per second in each direction via p duplex HIPPI channels. The maximum main memory capacity is 16GB.
Maximum Strategy Gen5 XL RAIDs are used, with a sustained bandwidth of about 130MB/s each. At least p/16 disk arrays must be configured in a p-processor system. The maximum capacity per disk array is about 360GB, so system disk capacity can approach 300p GB.
The system supports a thread-based programming model. 
The operating system is UNIX BSD4.4.
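The I/O figures above all scale with the processor count p. As a rough sketch, the function below evaluates them for the minimum disk configuration of p/16 arrays (rounded up); the function name tera_io and the dictionary keys are illustrative, and larger capacities, such as the 300p GB mentioned above, would require more arrays than this minimum.

```python
import math


def tera_io(p: int) -> dict:
    """I/O figures for a p-processor system with the minimum number of arrays."""
    arrays = math.ceil(p / 16)           # at least p/16 disk arrays
    return {
        "hippi_mb_s_each_direction": 200 * p,
        "min_disk_arrays": arrays,
        "min_disk_bandwidth_mb_s": arrays * 130,   # about 130MB/s sustained each
        "min_config_capacity_gb": arrays * 360,    # about 360GB per array
    }


for name, value in tera_io(256).items():
    print(name, value)
```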
See also: Tera Press Releases
NHSE Review paper
 

Copyright (C) 1998 Anton V. Selikhov, Supercomputer Software Department RAS
The page was last updated on June 3, 1998.