An Overview of Grid Technology

What is Grid Technology?

“Grid Computing” is a term that covers several aspects of distributed computing: sharing computationally intensive operations, sharing and managing data, and even sharing access to a physical resource such as a telescope. A Grid is made up of resources called Nodes, which make processing power and disk storage available through software running on the host computer. A node may be anything from a laptop on somebody’s desk to a powerful computer in a server room.

Why do we need Grids?

Traditionally, academic and military scientists have relied on very costly supercomputers to process the vast amounts of data and solve the complex mathematical problems involved in weather forecasting, climate research, molecular modelling and physical simulations of things like wind tunnels and nuclear detonations. Supercomputers have many processors closely linked together to achieve astonishing number-crunching potential. However, they are expensive to build, expensive to operate and can quickly be made obsolete by emerging technologies.

A Grid, on the other hand, may be built from a large number of disparate commodity computers connected by a private network or the Internet. A Grid can therefore be improved continuously by adding or upgrading nodes. Because the computing resources are often offered voluntarily, it can also be a very economical way for an academic to access substantial computing power.

Disadvantages

Because Grid users generally have no control over the computers that are handling their data and computations, Grid designers need to introduce measures to prevent malfunctioning or malicious participants from producing false or erroneous results. There is also no guarantee that a node will remain available to complete the task assigned to it, so it must be possible to reassign work units to other nodes when a given node fails to return the expected results.
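
The two safeguards described here can be illustrated with a minimal sketch. It assumes, purely for illustration, that each work unit is sent to several nodes and a result is accepted only when enough of them agree, and that work held by a failed node is handed to spare nodes in turn. The function names, the quorum of two and the node names are hypothetical and are not taken from any particular Grid software.

```python
from collections import Counter


def accept_result(results, quorum=2):
    """Return the value reported by at least `quorum` nodes, else None.

    `results` maps a node name to the value that node returned
    for the same work unit.
    """
    if not results:
        return None
    value, votes = Counter(results.values()).most_common(1)[0]
    return value if votes >= quorum else None


def reassign(assignments, failed_node, spare_nodes):
    """Return a copy of `assignments` (work unit -> node) in which every
    unit held by `failed_node` is moved to a spare node, round-robin."""
    if not spare_nodes:
        raise ValueError("no spare nodes available")
    updated = dict(assignments)
    orphaned = [unit for unit, node in assignments.items() if node == failed_node]
    for i, unit in enumerate(orphaned):
        updated[unit] = spare_nodes[i % len(spare_nodes)]
    return updated
```

For example, accept_result({"a": 42, "b": 42, "c": 7}) accepts 42 because two nodes agree, while a single dissenting or faulty node cannot push its result through on its own.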

Grids Today

A well-known and successful example of Grid computing is the SETI@Home project, which uses the spare processing cycles on computers that might otherwise be displaying a screen saver of a simulated fish tank. What for? To sift through radio telescope data in the hope of finding signs of extraterrestrial intelligence. SETI@Home has been growing since 1999 and, as of May 2008, has more than 350,000 active computers and the ability to compute over 480 TeraFLOPS (TFLOPS). By comparison, the world’s current fastest supercomputer, IBM’s Blue Gene, peaks at just over 596 TFLOPS.
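
To put those figures in perspective, here is a rough back-of-the-envelope calculation. It treats the quoted numbers as exact, although the text gives them only as lower bounds ("more than", "over"), so the results are approximate. The variable names are illustrative.

```python
# Figures as quoted in the text (May 2008); treated as exact for this estimate.
seti_tflops = 480.0
seti_computers = 350_000
blue_gene_peak_tflops = 596.0

# 1 TFLOPS = 1000 GFLOPS
per_computer_gflops = seti_tflops * 1000 / seti_computers
share_of_blue_gene = seti_tflops / blue_gene_peak_tflops

print(f"{per_computer_gflops:.2f} GFLOPS per computer")   # about 1.37
print(f"{share_of_blue_gene:.0%} of Blue Gene's peak")     # about 81%
```

In other words, each volunteer machine contributes only a little over one GFLOPS on average, yet together they reach roughly four fifths of the supercomputer's peak.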

Another Grid project has been making waves in the press recently. This summer, the CERN laboratory in Switzerland will switch on its Large Hadron Collider particle accelerator, which is estimated to produce enough data to fill over 50 million CDs annually. In anticipation of this deluge of data, the scientists at CERN have built a Grid that uses dedicated fibre optic cables to distribute the data to some 55,000 servers worldwide without swamping the Internet.

Grid Architecture

Grid architecture can be described as a stack of layers, each providing specific functions. The highest layer, the Application and Serviceware layer, contains all of the software that the user will see and interact with.

The serviceware provides general management functions such as tracking who is providing resources and who is using them – and how much. This enables accounting and billing in a commercial environment.
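
As a minimal sketch of that bookkeeping, the class below keeps a running total of who provided resources and who used them, and derives a simple bill. Measuring usage in CPU-hours and charging a flat rate are assumptions made for illustration, and the class and method names are hypothetical.

```python
from collections import defaultdict


class UsageLedger:
    """Tracks resources provided and consumed, in CPU-hours (an assumed unit)."""

    def __init__(self):
        self.provided = defaultdict(float)  # CPU-hours contributed, per provider
        self.consumed = defaultdict(float)  # CPU-hours used, per user

    def record(self, provider, user, cpu_hours):
        """Note that `user` ran `cpu_hours` of work on `provider`'s resource."""
        if cpu_hours < 0:
            raise ValueError("cpu_hours must not be negative")
        self.provided[provider] += cpu_hours
        self.consumed[user] += cpu_hours

    def invoice(self, user, rate_per_cpu_hour):
        """Amount owed by `user` at a flat rate per CPU-hour."""
        return self.consumed.get(user, 0.0) * rate_per_cpu_hour
```

A commercial Grid would need far more detail (storage, bandwidth, priorities), but the principle is the same: every unit of work is recorded against both the provider and the consumer.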

It is important to remember that normal applications which run on a standalone PC will not be able to benefit from the power of Grid resources without modification. Just as applications must be “webified” to run in a web browser, Grid users must “gridify” their applications to run on a Grid.

Next is the Middleware layer. This software defines the communication and authentication protocols that enable all of the various elements of the Grid to participate and interact with each other. The middleware makes authentication and authorisation decisions, locates available resources, provides monitoring and diagnostic functions, allocates and schedules work to resources, and so on. The middleware is considered to be a hidden layer because it is not something that users should have to see or interact with.
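
The sketch below shows, in a much simplified form, three of the middleware duties listed above: authorising the user, locating a resource with enough capacity, and allocating the work to it. It is not how any real middleware does it. The Node class, the schedule function and the rule of picking the node with the most free CPUs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpus: int


def schedule(job_cpus, user, authorised_users, nodes):
    """Authorise `user`, then allocate the job to the node with the most
    free CPUs. Returns the chosen node's name, or None if nothing fits."""
    # Authorisation decision
    if user not in authorised_users:
        raise PermissionError(f"{user} is not authorised to use this Grid")
    # Resource discovery: which nodes could take the job right now?
    candidates = [n for n in nodes if n.free_cpus >= job_cpus]
    if not candidates:
        return None
    # Allocation: pick the least loaded candidate and reserve its CPUs
    chosen = max(candidates, key=lambda n: n.free_cpus)
    chosen.free_cpus -= job_cpus
    return chosen.name
```

With a two-CPU laptop and a sixteen-CPU server on the Grid, a four-CPU job from an authorised user would be sent to the server, and the user would never need to know which machine was chosen.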

Below the middleware lies the Resource layer, which contains the actual physical resources, such as computers, storage systems and sensors, that are connected to the network.

Underlying everything is the Network layer connecting all of the Grid resources. The physical infrastructure of the Grid in the bottom two layers is often referred to as the “Fabric”.
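
The four layers described in this section can be summarised as a small data structure. This is only an illustrative model: the layer names come from the text, while the numeric ordering and the helper functions are assumptions added for the example.

```python
from enum import IntEnum


class GridLayer(IntEnum):
    """Grid layers, numbered from the bottom of the stack upwards."""
    NETWORK = 1
    RESOURCE = 2
    MIDDLEWARE = 3
    APPLICATION_AND_SERVICEWARE = 4


# The bottom two layers together form the physical "Fabric".
FABRIC = {GridLayer.NETWORK, GridLayer.RESOURCE}


def is_fabric(layer):
    """True if `layer` is part of the Grid's physical infrastructure."""
    return layer in FABRIC


def layers_top_down():
    """Layers in the order they are described: user-facing layer first."""
    return sorted(GridLayer, reverse=True)
```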

Beacon Computer Technology’s Grid Development

Over the last year, Beacon have been working with the Research and Development department of a global broadcasting organisation to see how Grid Computing can help meet their forecast need to distribute and share very large amounts of video and audio data over the Internet.

Using Debian GNU/Linux, we built a Grid of nodes running the Globus Toolkit and tested various options for securing access to the Grid using Public Key Infrastructure (PKI). We have written software in Java to integrate with the Globus Toolkit that allows the upload and download of media to and from the Beacon Grid, along with meaningful metadata to describe and index the content.

To test the viability of using Grid Technology for on-demand video delivery, we have built a prototype Set-Top Box. The Set-Top Box software, which is platform independent, has been written to find, play and share video data via a basic user interface driven by a remote control.

It is clear that Grids have potential beyond scientific research, and we may one day soon be participating in powerful distributed computing systems from the comfort of our sofas!