February 23, 2011

PoD 3.4

I'm very glad to announce a new stable release of PoD.

PoD version 3.4 has now been released.

Highlights of this release
  • Starting with version 3.4, PoD's version numbering scheme has changed slightly to reflect the fact that PoD is both a production system and a research project. PoD now uses odd minor version numbers to denote development releases and even minor version numbers to denote stable releases.
  • Implemented a completely new "pod-info" command, which now works as a client for "pod-agent" and can be used to request information from remote PoD servers.
  • The PoD server starter script now uses the new "pod-info" and has been significantly improved. It now takes less than a second to start a PoD server.
  • The PoD worker package is updated automatically if any of its components is updated.
  • The SSH plug-in now spawns PROOF workers automatically according to the number of CPU cores on each worker node if a user doesn't request a specific number of workers.
  • Updated User's Manual.
  • Many minor bugfixes and further improvements were made.
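
The odd/even numbering rule from the first highlight can be sketched in a few lines of Python. This is only an illustration and not part of PoD: the function name release_kind is hypothetical, and the rule is assumed to apply to versions from 3.4 onward, as stated above.

```python
def release_kind(version: str) -> str:
    """Classify a version string by the parity of its minor number.

    Hypothetical helper, not part of PoD. Even minor -> stable release,
    odd minor -> development release (scheme announced with PoD 3.4).
    """
    parts = version.lstrip("v").split(".")
    minor = int(parts[1])  # second component is the minor version
    return "stable" if minor % 2 == 0 else "development"


print(release_kind("3.4"))  # stable
print(release_kind("3.5"))  # development
```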

February 21, 2011

A PROOF cluster in 7 seconds

Anar here...

Can anybody install a PROOF cluster (dynamically, from zero, on requested machines) in 7 seconds?
With PoD anybody can do that!


A bit of propaganda... ;)

I have just finished some small improvements to PoD's SSH plug-in and decided to give it a bit of a propaganda push once again.

Among other changes, I made the plug-in a bit faster. Also, if a user doesn't request a specific number of PROOF workers for a worker node, pod-ssh will now automatically set up PROOF workers on that node according to its number of CPU cores. This means that the last parameter in the pod-ssh configuration file is no longer required and can be left empty.

The screenshot below is just a demonstration of how easy and fast pod-ssh is.

In the demo, I have used the following configuration file:

r1, manafov@lxi020.gsi.de , , /misc/manafov/tmp/test,
r2, manafov@lxi021.gsi.de , , /misc/manafov/tmp/test,
r3, manafov@lxb333.gsi.de , , /misc/manafov/tmp/test,
r4, manafov@lxir036.gsi.de, , /misc/manafov/tmp/test,
r5, manafov@lxir037.gsi.de, , /misc/manafov/tmp/test,
r6, manafov@lxir038.gsi.de, , /misc/manafov/tmp/test,
#r_32bit, manafov@lxi009.gsi.de, , /misc/manafov/tmp/test, 

in order to use all described machines as my PROOF slaves.
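
To make the layout of this file easier to see, here is a minimal Python sketch that reads such entries. It is not PoD's own parser. The field names (entry_id, address, extra, path, n_workers) are my assumptions: the post only says that the last parameter is the number of PROOF workers and may be left empty, the meaning of the third field is not described here, and treating lines starting with "#" as comments is inferred from the commented-out entry above.

```python
from typing import List, NamedTuple, Optional


class SshWorkerEntry(NamedTuple):
    """One line of the configuration file (field names are assumptions)."""
    entry_id: str
    address: str              # user@host
    extra: str                # third field; its meaning is not described in the post
    path: str                 # looks like a working directory on the worker
    n_workers: Optional[int]  # last field; None when left empty


def parse_pod_ssh_config(text: str) -> List[SshWorkerEntry]:
    """Parse comma-separated entries, skipping blank and '#' lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {raw!r}")
        entry_id, address, extra, path, n = fields
        entries.append(
            SshWorkerEntry(entry_id, address, extra, path, int(n) if n else None)
        )
    return entries


def workers_to_start(entry: SshWorkerEntry, cpu_cores: int) -> int:
    """Use the requested number if given, otherwise one worker per CPU core."""
    return entry.n_workers if entry.n_workers is not None else cpu_cores
```

With the last field empty, as in the file above, workers_to_start falls back to the core count of the node, which mirrors the behaviour described for pod-ssh.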

For the sake of demonstration, I wrote a simple script that just shows a timestamp for the execution of each PoD command required to set up a cluster:

There is nothing special in the script, just the usual PoD command sequence:
pod-server start
pod-ssh (in case of SSH. For other RMS plug-ins it is "pod-submit")
and
pod-info
(refer to PoD user's manual for more information)
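
A timing wrapper of this kind could look roughly like the Python sketch below. It is not the script used for the demo. The helper name run_timed and the POD_SEQUENCE list are hypothetical, and the commands are listed exactly as named above, without any of the options they may need (see the PoD User's Manual for those).

```python
import subprocess
import time
from datetime import datetime

# Command sequence named in the post. Options are deliberately left out;
# each command may need arguments described in the PoD User's Manual.
POD_SEQUENCE = [
    ["pod-server", "start"],
    ["pod-ssh"],
    ["pod-info"],
]


def run_timed(commands):
    """Run commands in order, printing a timestamp before each one.

    Returns the total elapsed time in seconds. Stops with an exception
    if a command exits with a non-zero status.
    """
    started = time.monotonic()
    for cmd in commands:
        stamp = datetime.now().strftime("%H:%M:%S")
        print(stamp, "$", " ".join(cmd), flush=True)
        subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - started
    print(f"{datetime.now():%H:%M:%S} done in {elapsed:.1f} s")
    return elapsed
```

Calling run_timed(POD_SEQUENCE) on a machine with PoD installed would print one timestamp per step, which is all the demo needs.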

The screenshot below shows the output of the script:



It took me exactly 7 seconds to:
1. Start a PoD server (PROOF master).
2. Submit PoD jobs to 6 worker nodes using ssh.
3. Set up 44 PROOF workers out of these 6 PoD workers.



These worker nodes are just interactive machines or desktops standing somewhere in my institution. These boxes have no pre-installed PROOF software and are configured on the fly by PoD.
At night I can count on many more machines that nobody is using. With PoD and the SSH plug-in I can make a dynamic PROOF cluster out of them at any time, in just seconds :)

If you want to try the latest PoD features, then use the latest beta.

February 12, 2011

PoD at CERN's lxplus. Part 2.

Anar here...

PoD at CERN's lxplus. Part 1.

Today I've been testing a new beta (v3.2.61.geeb0) of PoD and, as always, I have been playing with it at CERN's lxplus (LSF) just to check whether this version behaves properly on AFS.

It is really surprising to see how effectively CERN's LSF works. At GSI we have a special preemptive queue for PoD, in order to provide as much interactivity as possible, so I and other PoD users always get PoD workers very quickly there. At CERN, however, I used the standard "1nh" queue, which was full of pending jobs. Since my share at CERN's LSF should be good (I hardly use the cluster, only running my tests from time to time), my PoD jobs got through within just 40 seconds, almost a record :)
Simply put, I got a dynamic PROOF cluster of 37 workers in just 40 seconds ;)
Ufff... Obviously CERN's LSF fairshare works perfectly.
If I used LSF at CERN more intensively, I would get my workers up and running a bit later, I guess, depending on how intensively I used it. There is no magic, just fair share. This is why, if you want to provide maximum interactivity for PoD users on your cluster, you need to tune your cluster a bit or set up dedicated queues. This is a good trade for a fast dynamic PROOF cluster, which gives resources back to batch users as soon as nobody is using it.
Anyway, as we can see, even without any pre-configuration of resource management clusters, PoD is very fast and more than usable ;)

I would be very grateful if somebody at CERN who uses LSF more intensively than I do would test PoD and report back how fast he/she gets PoD workers online.

BTW, since PoD is now in a redesign stage, I would recommend that PoD users use the PoD CLI instead of the PoD GUI. I am currently working on a new GUI, which will also allow working with remote PoD servers and will reflect the latest developments in PoD.

Some screenshots of my CERN tests of today:

1. start PoD server:

2. submit PoD jobs to CERN's LSF:

Just ~40! seconds later:
3. check how many workers we already have and which workers they are:

4. we have our dynamic PROOF cluster and now we can process a PROOF analysis as usual:


Here is another test I did. Just for the sake of demonstration, I used the "date" command to show the current time. Of course, I also have PoD logs as evidence :D

1. at 11:41:14 requested 40 PoD workers:

2. at 11:41:50 I got my first 18 workers.

3. at 11:42:10 I got all requested workers:

So, in the second test it took me ~36 seconds to get roughly the first half of the workers, and in ~1 min I got my last requested worker online.
Actually, I could have started my analysis as soon as I had a reasonable number of workers, for example more than 30, and that was just about 40 seconds of waiting after I requested them. The rest of the workers are connected to PoD automatically as soon as they are online and ready. Also, at any time I can submit more workers and they will be connected to my cluster automatically as well. ;) PoD is very flexible.
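
The waiting times quoted for the second test follow directly from the "date" timestamps. As a quick check, here is a small Python sketch; the helper name seconds_between is hypothetical, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> int:
    """Return whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


requested = "11:41:14"  # 40 PoD workers requested
print(seconds_between(requested, "11:41:50"))  # 36 (first 18 workers)
print(seconds_between(requested, "11:42:10"))  # 56 (all requested workers)
```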


Meet us on http://pod.gsi.de

February 10, 2011

Printer-friendly User's Manual

Anar here...

Given the number of requests, I've just tuned the build system and tool chain of the PoD User's Manual so that the manual is now also automatically converted to printer-friendly PDF and PS formats, in addition to its HTML version.
It was easy to do anyway, since the manual is written in DocBook ;)

From now on PoD's manual will be distributed in HTML, PDF and PS formats.