Message: Re: superslow make on a cluster over the network

Forum: Installation and Configuration
Re: superslow make on a cluster over the network (Valery Taranenko)
Re: Re: superslow make on a cluster over the network (Ben Morgan)
Date: 04 Nov, 2009
From: Valery Taranenko

The compute node where make runs is otherwise completely idle. I even use -j8 (it is an 8-core machine), but all 8 cc1plus processes sit idle almost all the time, waiting for something.

Our cluster admin sees no problem on the Gigabit Ethernet network or with I/O on the node, and the NFS daemons show no sign of stress.

I'm not familiar with the intricate G4 make system, but if I look at the intermediate files that make produces for the user code in $G4WORKDIR/tmp/Linux/usercode, I see dependency (.d) files that list all the G4 header files as well as the OS(!) headers. Some of those files hold 500 or even 800 dependencies. Now imagine how make builds those: it must be opening each source or header file, parsing it for includes, opening those includes, and so on. One poor G4 user-code analysis file has 500 dependencies, which is pretty crazy and sounds like overkill. The bottleneck must be I/O on the disk system, as the make process constantly opens hundreds of those files while building dependencies.
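To put numbers on this, the dependency counts can be tallied with a short script. This is only a rough sketch: it assumes the .d files use the usual make rule syntax ("targets: prerequisites", with backslash line continuations), and the function names parse_dep_text and count_deps are made up for illustration, not part of the G4 build system.

```python
from pathlib import Path


def parse_dep_text(text):
    """Return the prerequisite paths listed in make-style dependency text.

    Assumes rules of the form "targets: prereq prereq ..." with optional
    backslash-newline continuations (an assumption about the .d format).
    """
    # Join continued lines so that each rule sits on a single line.
    text = text.replace("\\\n", " ")
    deps = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # not a rule line
        _targets, _, prereqs = line.partition(":")
        deps.extend(prereqs.split())
    return deps


def count_deps(dep_dir):
    """Map each .d file found under dep_dir to its number of prerequisites."""
    return {
        str(path): len(parse_dep_text(path.read_text()))
        for path in Path(dep_dir).rglob("*.d")
    }
```

Calling count_deps() on the directory that holds the .d files and sorting the result by count would show which source files drag in the most headers.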

There is no point in copying just the user code to the local node and compiling there. I would need to copy the whole G4 system (300 MB for the built system) plus(!) all the system gcc includes etc., because they are mentioned in those dependency files. That means all the devel files would have to be on the compute node.

Note that our cluster's disk system is remote: it is not on the head node but somewhere outside. When I build on another cluster that has a local disk system, the build takes 1 min. When I build on this cluster with the remote disk system, make requests a tiny file to be delivered to the local node, and it must be the latency of the remote disk system plus the network that slows everything down. That is why we see the cc1plus processes idling all the time: they are waiting for tiny header and .cpp files to be delivered, which are parsed in nanoseconds, and then further requests fly off to the disk system again.
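The latency argument can be checked with a back-of-the-envelope estimate: total wait is roughly (number of source files) x (dependencies per file) x (time per file access). Below is a minimal sketch, assuming each dependency costs one round trip to the disk system and that nothing is served from a cache. The names mean_stat_seconds and estimated_wait_seconds are hypothetical helpers, and a measured stat() time is probably a lower bound, since attributes may be cached on the client.

```python
import os
import time


def mean_stat_seconds(paths):
    """Average wall-clock time of one os.stat() call over the given paths."""
    start = time.perf_counter()
    checked = 0
    for path in paths:
        try:
            os.stat(path)
        except OSError:
            pass  # a missing file still costs a lookup
        checked += 1
    elapsed = time.perf_counter() - start
    return elapsed / checked if checked else 0.0


def estimated_wait_seconds(n_sources, deps_per_source, seconds_per_access):
    """Rough total wait if every dependency costs one uncached access."""
    return n_sources * deps_per_source * seconds_per_access
```

Running mean_stat_seconds() over the same list of headers on the remote disk system and on a local disk, then feeding both results into estimated_wait_seconds(), would show whether per-file latency alone can account for the difference between the two clusters.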

Prestaging the G4 system on the compute node should eliminate about half of the waiting time, since I estimate that about half of the files in the dependency files are native G4 and the rest come from the Linux OS.
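The "about half" estimate could be checked by splitting a list of dependency paths into those under the G4 installation and everything else. A minimal sketch follows; split_by_prefix is a hypothetical helper, and the prefix passed in would be wherever G4 is actually installed on the cluster.

```python
import os


def split_by_prefix(paths, prefix):
    """Split paths into (inside, outside) relative to a directory prefix."""
    prefix = os.path.normpath(prefix)
    inside, outside = [], []
    for path in paths:
        norm = os.path.normpath(path)
        # Compare on a path-component boundary so that "/opt/g4x" does not
        # count as being inside "/opt/g4".
        if norm == prefix or norm.startswith(prefix + os.sep):
            inside.append(path)
        else:
            outside.append(path)
    return inside, outside
```

The ratio len(inside) / len(paths) then gives the fraction of dependencies that prestaging G4 alone would move to local disk.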

Does that make sense?
