November 15, 2011

PoD 3.8

Anar here...

Finally, I can announce that PoD version 3.8 is released and ready for consumption.

Highlights of this release

I've mostly been concentrating on pod-ssh and pod-remote during this development cycle. As a result:
  • The pod-ssh command got several new features and many bug fixes.
  • Users can now define an environment script for WNs directly in pod-ssh configuration files, a so-called inline environment script (a sketch follows below).
  • The pod-remote command has been revamped and is now fully functional (including the ssh open domain option).
  • PoD has been successfully tested on Mac OS X 10.7.
  • PoD WNs now also support older versions of Mac OS X (10.5 is the minimum).
  • Updated User's Manual.
  • Many minor bugfixes and further improvements were made.
Please check the release notes for more details.
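As a quick illustration of the inline environment script, here is a minimal pod-ssh configuration sketch. The @bash_begin@/@bash_end@ tags are as I understand them from the User's Manual, and the host, directory and worker count are purely illustrative:

@bash_begin@
# this block is executed on each WN before the PoD worker starts;
# source your experiment software or export env. variables here
source /opt/root/bin/thisroot.sh
@bash_end@
wn1, user@worker1.example.org, , /tmp/pod_wrk_dir, 4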

September 26, 2011

PoD at CERN's LSF

Anar here...

Many PoD users are facing a problem running PoD at CERN on LSF, because for some reason a gLite environment is set up on the worker nodes. I have therefore just created a known-issue topic describing how to work around this problem.

July 7, 2011

PoD 3.6

I'm very glad to announce a new stable release of PoD.

PoD version 3.6 has now been released.

Highlights of this release
  • A great new piece of functionality: the pod-remote command.
  • PoD now officially supports the LoadLeveler (IBM Tivoli Workload Scheduler) plug-in.
  • The PoD WN has been ported to Mac OS X (10.6). PoD (UI/Server/WN) now runs on Linux and Mac OS X.
  • Each PoD WN now starts its own xproofd daemon, which lets us control and handle each PoD worker individually.
  • The pod-ssh command got several major improvements.
  • The build of pod-console is disabled by default. At the moment the CLI is much more advanced and powerful, and is recommended over the GUI. The GUI is being redesigned; this work is in progress.
  • Updated User's Manual.
  • Many minor bugfixes and further improvements were made.

June 27, 2011

PoD in PROOF connection string

Anar here...

I am very glad to confirm that PoD is now officially supported as one of the PROOF connection strings.
To connect to a local or remote (managed by the new pod-remote) PoD cluster, you can just enter:

TProof::Open("pod://");

This feature is already in the trunk of ROOT and will be released with the next production ROOT 5.30.
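For completeness, a minimal session sketch from the ROOT prompt (nothing here is PoD-specific except the connection string; the GetParallel() call is just one standard way to check how many workers the session got):

root [0] TProof *proof = TProof::Open("pod://");
root [1] proof->GetParallel()   // number of active PROOF workers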

June 14, 2011

pod-remote


Anar here...

It has been a long time since my last post... :) 

I am very excited right now, because in this post I am going to introduce a new PoD development to you. Another big step forward for PoD!
This is fresh stuff, not completely finished yet, but already in good and usable shape. ;)

So, it is pod-remote, a new PoD command.

For a long time I've been dreaming about being able to control PoD servers from a remote machine. I mean, I have all my work, development and analysis scripts on a laptop, and I want to use some remote machines as my PoD/PROOF servers - some with LSF, some with PBS, some with SSH plug-ins.
But my work stays on my laptop (or another dedicated desktop machine) and shouldn't have to be copied whenever I want to use a different PoD/PROOF server.
  
PoD UI and PROOF UI are lightweight entities and they don't do much. I might even want to have my UI on an iOS device ;)
With pod-remote my dream is coming to reality and starting to take shape...

Using pod-remote it is possible to start/restart/stop remote PoD servers and to submit jobs from them. Most importantly, pod-remote automatically creates and handles SSH tunnels for remote PoD servers, so that these servers can be used even when an SSH connection is the only way to reach them - from outside a firewall, for example. Another important feature of pod-remote is its integration into PoD.

There is no documentation for pod-remote yet, but I can give you a simple use case so that you get the picture.

Let us say I have two machines, A and B.
A (host: demac012) is my laptop.
B (host: lxg0527) is a machine standing somewhere at my workplace. From this machine I can submit jobs to my batch system (LSF) or use it as a server for the PoD SSH plug-in.

On both machines I have PoD installed.

I do all of the following from my laptop (machine A, "demac012").

  1. Start the remote PoD server:  
    pod-remote --start --remote manafov@lxg0527.gsi.de:/misc/manafov/PoD/3.5.75.gbecd4 --env ../GSI_env_5_27.sh

    --start - start the server
    manafov@lxg0527.gsi.de:/misc/manafov/PoD/3.5.75.gbecd4 - a connection string to the server, including a full path to the PoD installation
    --env - an environment file, which will be executed on the remote end before PoD starts the server. In my case I just source a proper ROOT version in it. You can set more env. variables there if needed.

    pod-remote remembers your settings, so the next time you want to stop/start/restart this server you can just write pod-remote --start (or --stop/--restart)
    without --remote and --env. The command always uses the most recently given settings.
    If you want to change the server, just provide new argument values.

    If everything is OK and the remote server has started, pod-remote will create and manage special SSH tunnels from machine A to machine B, so the whole PoD communication and all PROOF requests go via these tunnels.
    The tunnels stay alive as long as the remote server is alive, or until you restart/stop it via pod-remote. The pod-remote command creates a background daemon, which regularly checks the status of the remote PoD server and manages the tunnels.


  2. Submit PoD jobs from the remote server (in the case of RMS plug-ins):
    pod-remote --command "pod-submit -r lsf -n 50 -q my_lsf_queue"
    or, in the case of the SSH plug-in:
    pod-remote --command "pod-ssh -c --submit"

    Using --command, you can execute any command via SSH on the remote server.
    I am going to keep it that way for the time being, and see whether it is useful to wrap "submit commands" into sub-options or to keep it as generic as it is. I like having it this way: users keep the flexibility to execute any remote command and to wrap commands for their special needs/environments.
  3. Now, you can just use pod-info as usual, as if everything were running locally:
    pod-info -s
    pod-info -c
    pod-info -n
    ...
    pod-info automatically detects that there is a pod-remote-managed server and gathers the information directly from it via the SSH tunnels.
    It means, of course, that to connect from your local machine (a laptop) to your remote PoD/PROOF cluster you just need to use:
    TProof::Open(gSystem->GetFromPipe("pod-info -c"));

    That's it!
The next time I want to use the remote server, I just have to execute the following on my local machine:
pod-remote --start (note: no other arguments; pod-remote remembers my settings)
pod-info -n (note: no need for --remote here; the pod-info command handles pod-remote automatically)
root[] TProof::Open(gSystem->GetFromPipe("pod-info -c"));

To stop the server: pod-remote --stop
Otherwise, the standard PoD idle timeouts will stop it automatically...
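Putting the whole use case together, here is a minimal helper-script sketch built only from the commands shown above. It is just a sketch: the submit command and the worker threshold are illustrative, and it assumes, as in step 3, that pod-info -n prints the current number of PROOF workers:

#!/usr/bin/env bash
# start the previously configured remote PoD server, submit workers,
# then print the PROOF connection string for TProof::Open(...)
pod-remote --start                                        # reuses the stored --remote/--env settings
pod-remote --command "pod-submit -r lsf -n 50 -q my_lsf_queue"
until [ "$(pod-info -n)" -ge 30 ]; do sleep 5; done       # wait for a reasonable number of workers
pod-info -c                                               # the PROOF connection string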

You can use your laptop or your development machine as a command center.
There is no need to copy analysis code around or to keep different machines for different PROOF clusters.
There is no need to handle SSH tunnels by hand, and so on and so on...

Please give it a try and send me your feedback. The pod-remote command is available in PoD starting from v3.5.75.
It will be part of the upcoming stable release, PoD v3.6.

If you want to try this build, please download the latest Beta: http://pod.gsi.de/download.html
In case of problems: http://pod.gsi.de/support.html

P.S. As soon as PoD v3.6 is out, I am going to record a screencast demonstrating pod-remote use cases ;)

May 25, 2011

LoadLeveler plug-in

Anar here...

I've just released a new PoD beta. In this version we have implemented a new job manager plug-in.
Now PoD supports IBM Tivoli Workload Scheduler LoadLeveler!

Special thanks to Simon Heisterkamp (Copenhagen, Denmark), who helped a lot with the development of the plug-in.


If you want to try this build, please download the latest Beta: http://pod.gsi.de/download.html
In case of Problems: http://pod.gsi.de/support.html

May 3, 2011

New PoD WN

Anar here...

I've just released a new beta build of PoD. This version got three major improvements:

  1. The PoD WN has now been officially ported to Mac OS X. That means PoD now fully supports Linux and Mac: you can run the PoD Server, PoD UI or PoD WN on Mac OS X (and/or Linux). While testing this feature today, I set up a dynamic PROOF cluster of 46 WNs using the SSH plug-in, spanning different Linux versions (Debian Etch, Debian Lenny) and two Mac OS X workers. :D Thanks to PROOF and PoD, you are now able to combine different OSs and architectures in your dynamic PROOF clusters.
  2. Each PoD WN now starts its own xproofd daemon. This makes it easier to handle PoD workers individually. The feature is also important for the PoD and AliEn integration.
  3. The start-up time of PoD WNs is significantly improved in this version of PoD.
If you want to try this build, please download the latest Beta: http://pod.gsi.de/download.html
In case of Problems: http://pod.gsi.de/support.html

April 28, 2011

Video Tutorials

A short news about PoD website update.

PoD's home website got a "Video Tutorials" page, where all screencasts about PoD will be collected.

April 27, 2011

PoD Screencasts. Part 2.

Part 1.

Anar here...


I have just recorded another screencast. This time it is about how to set up a dynamic PROOF cluster on a resource management system. I used CERN's LSF farm as a playground, together with PoD 3.4.

Enjoy:



April 7, 2011

PoD Screencasts

Anar here...

I think you would agree that watching a live demo is much better than just reading presentation slides or something ;)
This is why I decided to start recording screencasts about PoD. In future screencasts you will see how to install PoD, how to use the different PoD plug-ins, and so on and so on.

I've just started with the simplest screencast - on how to install PoD (see below).
This is the very first public screencast of my life, so, please, be gentle ;)





February 23, 2011

PoD 3.4

I'm very glad to announce a new stable release of PoD.

PoD version 3.4 has now been released.

Highlights of this release
  • Starting with version 3.4, PoD's version numbering scheme has changed a bit. It reflects the fact that PoD is both a production system and a research project: PoD now uses odd minor version numbers to denote development releases and even minor version numbers to denote stable releases.
  • Implemented a completely new "pod-info" command, which now works as a client for "pod-agent" and can be used to request information from remote PoD servers.
  • The PoD server starter script now uses the new "pod-info" and got significant improvements. It now takes less than a second to start a PoD server.
  • The PoD worker package is updated automatically if any of its components is updated.
  • The SSH plug-in now spawns PROOF workers automatically according to the number of CPU cores on each worker node, if the user doesn't request a specific number of workers.
  • Updated User's Manual.
  • Many minor bugfixes and further improvements were made.

February 21, 2011

A PROOF cluster in 7 seconds

Anar here...

Can anybody install a PROOF cluster (dynamically, from zero, on requested machines) in 7 seconds?
With PoD anybody can do that!


A bit of propaganda... ;)

I have just finished some small improvements to PoD's SSH plug-in and decided to give it a bit of a propaganda push once again.

Among other changes, I made the plug-in a bit faster. Also, if a user doesn't request a specific number of PROOF workers for some worker node, pod-ssh now automatically sets up PROOF workers on that node according to the number of CPU cores. This means the last parameter in the pod-ssh configuration file is not required anymore and can be left empty.

The screenshot below is just a demonstration of how easy and fast pod-ssh works.

In the demo, I have used the following configuration file:

# fields: id, SSH connection string, additional SSH options, remote working dir, number of workers (empty = one per CPU core)
r1, manafov@lxi020.gsi.de , , /misc/manafov/tmp/test,
r2, manafov@lxi021.gsi.de , , /misc/manafov/tmp/test,
r3, manafov@lxb333.gsi.de , , /misc/manafov/tmp/test,
r4, manafov@lxir036.gsi.de, , /misc/manafov/tmp/test,
r5, manafov@lxir037.gsi.de, , /misc/manafov/tmp/test,
r6, manafov@lxir038.gsi.de, , /misc/manafov/tmp/test,
#r_32bit, manafov@lxi009.gsi.de, , /misc/manafov/tmp/test, 

in order to use all of the described machines as my PROOF slaves.

For the sake of demonstration, I wrote a simple script that just prints a timestamp around the execution of all the PoD commands required to set up a cluster.

There is nothing special in the script, just the usual PoD command sequence:
pod-server start
pod-ssh (in the case of SSH; for the other RMS plug-ins it is "pod-submit")
and
pod-info
(refer to the PoD User's Manual for more information)
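A minimal sketch of what such a timing script could look like (assuming pod-ssh takes the configuration file via -c; pod_ssh.cfg is an illustrative name):

#!/usr/bin/env bash
# print a timestamp around each step of the usual PoD start-up sequence
date
pod-server start
date
pod-ssh -c pod_ssh.cfg --submit     # submit PoD workers via the SSH plug-in
date
pod-info -n                         # how many PROOF workers are online already
date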

The screenshot below shows the output of the script:



It took me exactly 7 seconds to
1. Start a PoD server (PROOF master).
2. Submit PoD jobs to 6 worker nodes using SSH.
3. Set up 44 PROOF workers on these 6 PoD worker nodes.



These worker nodes are just interactive machines or desktops standing somewhere in my institution. These boxes have no pre-installed PROOF software; they are configured on the fly by PoD.
At night I could count many more machines which nobody uses. With PoD and the SSH plug-in I can make a dynamic PROOF cluster out of them at any time, in just seconds :)

If you want to try the latest PoD features, then use the latest beta.

February 12, 2011

PoD at CERN's lxplus. Part 2.

Anar here...

PoD at CERN's lxplus. Part 1.

Today I've been testing a new beta (v3.2.61.geeb0) of PoD and, same as always, I have been playing with it at CERN's lxplus (LSF), just to check whether this version behaves properly on AFS.

It is really surprising to see how effectively CERN's LSF works. At GSI we have a special queue for PoD - a preemptive queue - in order to provide as much interactivity as possible, so I and other PoD users always get our PoD workers very quickly at GSI. But at CERN I used the standard "1nh" queue, which was full of pending jobs, and since my share at CERN's LSF should be good (I almost never use the cluster, only doing my tests from time to time) I got my PoD jobs through within just 40 seconds - almost a record :)
So, simply put, I got a dynamic PROOF cluster of 37 workers in just 40 seconds ;)
Ufff... Obviously CERN's LSF fair share works perfectly correctly.
If I used LSF at CERN more intensively, then I guess I would get my workers up and running a bit later, depending on how intensively I used it. There is no magic - it's fair share. This is why, if you want to provide maximum interactivity for PoD users on your cluster, you need to tune your cluster a bit or create dedicated queues. This is a good trade for a fast dynamic PROOF cluster, which gives the resources back to batch users as soon as nobody uses them.
Anyway, as we can see, even without any pre-configuration of the resource management cluster, PoD is very fast and more than usable ;)

I would be very grateful if somebody at CERN who uses LSF more intensively than I do would test PoD and report back how fast his/her PoD workers come online.

BTW, since the PoD GUI is now in a redesign stage, I would recommend PoD users to use the PoD CLI instead of the PoD GUI. I am currently working on a new GUI, which will also allow working with remote PoD servers and will reflect the latest PoD developments.

Some screenshots from today's CERN tests:

1. start PoD server:

2. submit PoD jobs to CERN's LSF:

Just ~40 (!) seconds later:
3. check how many workers we already have, and which workers they are:

4. we have our dynamic PROOF cluster, and now we can process a PROOF analysis as usual:
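In command form, the sequence behind these screenshots is roughly the following (a sketch; the queue name and worker count match the test described above):

pod-server start                                             # step 1: start the PoD server
pod-submit -r lsf -q 1nh -n 40                               # step 2: submit 40 PoD jobs to the 1nh LSF queue
pod-info -n                                                  # step 3: how many workers are online already
root [0] TProof::Open(gSystem->GetFromPipe("pod-info -c"));  # step 4: connect and analyse as usual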


Here is another test I did; just for the sake of demonstration, I used the "date" command to show the current time. Of course, I also have the PoD logs as evidence :D

1. at 11:41:14 requested 40 PoD workers:

2. at 11:41:50 I got my first 18 workers.

3. at 11:42:10 I got all requested workers:

So, in the second test it took ~36 seconds to get the first half of the workers, and within ~1 minute my last requested worker was online.
Actually, I could have started my analysis as soon as I had a reasonable number of workers, say more than 30, which was only about 40 seconds after I requested them. The rest of the workers are connected to PoD automatically as soon as they are online and ready. Also, if I want, I can submit more workers at any time, and they will also be connected to my cluster automatically. ;) PoD is very flexible.


Meet us on http://pod.gsi.de

February 10, 2011

Printer-friendly User's Manual

Anar here...

Given the number of requests, I've just tuned the PoD User's Manual build system and tool chain so that the manual is now automatically converted to printer-friendly PDF and PS formats as well, in addition to its HTML version.
It was easy to do anyway, since the manual is written in DocBook ;)

From now on PoD's manual will be distributed in HTML, PDF and PS formats.

January 26, 2011

The pod-info command got revamped

Anar here...

Many PoD customers like to use the pod-info command, to get a PROOF connection string, for example. You can also use it to get other kinds of information. But the command had one big disadvantage - it could only be used on the machine where the PoD server is running. Crap! Isn't it?! :)
Users want to start a PoD server on one machine, but process an analysis and connect to the master from another - from a laptop, for example.

A laptop is a perfect PROOF user interface, but a bad PROOF master when we talk about hundreds of workers. You need good machines for PoD servers, since the PROOF master runs there.
This is exactly the direction in which PoD is developing.
My goal is to have PoD user interfaces completely separated, if I want, from PoD servers, so that I can start/stop/submit PoD jobs and completely control my PoD server from a user interface, possibly on a remote machine and behind a firewall.

Well, I am glad to introduce a revised version of pod-info. This is another step towards disentangling the PoD UI and the PoD server. The new pod-info command addresses many issues, including the one I described above.

The new pod-info has been rewritten from scratch and now acts as a client to pod-agent (the server), which makes it a lot more accurate, powerful and, especially, more flexible compared to the old command.


By default, pod-info now tries to find and connect to a local PoD server. A PoD server is considered local if pod-info and the PoD server run under the same user ID - either on the same machine, or on different machines with a shared home file system.

When you want to retrieve information about a remote PoD server, you need to use the --ssh option. With this option you specify an SSH connection string for the host where the remote PoD server is running. The pod-info command will first try to find the running PoD server on that host and then process user requests on that server.
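For example, a sketch of such a call (the host is illustrative; -c requests the PROOF connection string, as elsewhere in these posts):

pod-info --ssh manafov@lxg0527.gsi.de -c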

Interested? Want to try the new command?
If YES, then
check the User's Manual of the current nightly build for more information,
and get the latest PoD nightly build, which is shipped with the new command.

You could really help to improve the product by testing it in your environment and with your use cases.
Please use our issue tracker to report bugs or feature requests.

January 13, 2011

PoD 3.1.3

Anar here...

While working on a new PoD version, I noticed an unpleasant bug which was introduced with some new code in PoD 3.1. I decided to fix it not only in the PoD master branch, but also to back-port the fixes to PoD 3.1.

The bug is not critical, but I would strongly recommend that all users of PoD 3.1 upgrade to PoD 3.1.3, which has just been released. Otherwise, in some cases, PoD could leave xproof processes behind and slowly pollute the user's environment with unwanted daemons.

Some details about the bug can be found here and here.

I am very sorry for any inconvenience this may have caused.

PoD is constantly being tested in different environments, and as soon as a critical bug is found, a patch is released immediately.

If you find a bug or have a feature request, please use our new issue tracker to report it.

January 7, 2011

PoD 3.1

Finally a new PoD stable version is ready for consumption.

PoD version 3.1 has now been released.

Those who have been following PoD development know that the shared installation feature and the Condor plug-in are the main topics of this release.

The Condor plug-in has just joined the long list of supported job manager plug-ins.
PoD is now shipped with SSH, gLite, LSF, PBS Pro/OpenPBS/Torque, Grid Engine and Condor plug-ins.

Highlights of this release
  • PoD now supports Condor and is shipped with the Condor plug-in.
  • PoD now supports shared installations.
  • PoD server and PoD worker instances now clean up precisely their own processes only. This makes it possible to start several different PoD worker/server sessions under the same user ID (important for the AliEn integration).
  • The PoD server starter script is now a factor of 3 faster.
  • Improved idle time-out handling in PoD servers.
  • Updated User's Manual.
  • Many minor bugfixes and further improvements were made.