Browsing projects by Tag(s)


LLNL SLURM SPANK plugins

This package contains several SLURM SPANK plugins developed and used at LLNL. These are all drop-in plugins that enhance and extend the functionality of SLURM for users and administrators, and are hosted here not only so that others may use these plugins directly, but also as good examples of the kinds of features and functionality that may be developed with the SPANK plugin framework for SLURM. Some of the plugins in this package are LLNL-specific, in that they require LLNL-only modifications to the software stack. However, the code for these plugins is still provided for demonstrative purposes. Also see the top-level README and the latest NEWS. Some further information can be found in the wiki:

  - Generic description of SPANK plugins
  - The use-env plugin
  - The SLURM cpuset plugin

0
 
  0 reviews  |  2 users  |  11,920 lines of code  |  0 current contributors  |  Analyzed 5 days ago
 
 

SLURM: A Highly Scalable Resource Manager

SLURM is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work. SLURM's design is very modular, with dozens of optional plugins. In its simplest configuration, it can be installed and configured in a couple of minutes.

5.0
 
  0 reviews  |  1 user  |  1,024,968 lines of code  |  55 current contributors  |  Analyzed about 18 hours ago
 
 

io-watchdog is a facility for monitoring user applications and parallel jobs for "hangs", which typically have the side effect of ceasing all IO in a cyclic application (i.e., one that writes something to a log or data file during each cycle of computation). The io-watchdog attempts to watch all IO coming from an application and triggers a set of user-defined actions when IO has stopped for a configurable timeout period. Read the full IO Watchdog README.

NEWS

io-watchdog v0.7 released 2009-11-17

The major feature in this new version of io-watchdog is the addition of a client API to get and set the current watchdog timeout. See the io-watchdog(3) manpage for more information about using the libio-watchdog library. Other changes in this release include:

  - Add -q, --quiet option to io-watchdog.
  - Set IO_WATCHDOG_TIMEOUT and IO_WATCHDOG_TARGET in the environment of action scripts.
  - Fix for hangs in io-watchdog on exec(2) failures.
  - The io-watchdog now exits by default after the first timeout event is reached. This means that if the monitored process is not terminated by any action script, then it will continue to run unmonitored by the watchdog. There is a new option to io-watchdog(1), --persistent, which restores the old behavior.
  - The spec file included with the distribution now splits the io-watchdog RPM into a main package and -libs, -devel, and -slurm subpackages.
  - A small testsuite is now included with the io-watchdog source.
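The timeout behavior described above can be illustrated with a small Python model. This is only a sketch of the idea, not io-watchdog's implementation or its client API: the class `IoWatchdog` and its methods `note_io` and `check` are hypothetical names, time is passed in explicitly instead of being read from a clock, and the way the `persistent` mode re-arms the timer is an assumption.

```python
from typing import Callable, List


class IoWatchdog:
    """Toy model of an IO watchdog (hypothetical, not the io-watchdog API)."""

    def __init__(self, timeout: float, actions: List[Callable[[float], None]],
                 persistent: bool = False, start: float = 0.0) -> None:
        self.timeout = timeout        # seconds of IO silence that count as a hang
        self.actions = actions        # user-defined callbacks, run in order
        self.persistent = persistent  # keep monitoring after the first timeout
        self.last_io = start          # time of the most recent IO
        self.active = True

    def note_io(self, now: float) -> None:
        """Record that the monitored application performed IO at time `now`."""
        self.last_io = now

    def check(self, now: float) -> bool:
        """Run the actions if no IO was seen for at least `timeout` seconds.

        Returns True if the actions fired on this call.
        """
        idle = now - self.last_io
        if not self.active or idle < self.timeout:
            return False
        for action in self.actions:
            action(idle)
        if self.persistent:
            # Assumption: persistent mode restarts the timer after each event.
            self.last_io = now
        else:
            # Default mode: stop watching after the first timeout event.
            self.active = False
        return True
```

With `IoWatchdog(timeout=10.0, actions=[print])`, a call to `check` fires the action once the time since the last `note_io` reaches ten seconds. Later calls return False unless `persistent=True` was given, mirroring the default-exit behavior described in the v0.7 notes.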

0
 
  0 reviews  |  0 users  |  35,458 lines of code  |  0 current contributors  |  Analyzed 8 days ago
 
 

The sqlog package contains a set of scripts useful for creating, populating, and issuing queries to a SLURM job log database and/or the queue of running jobs.

SUMMARY

sqlog was designed as a simple but powerful and easy-to-use tool to query information about the job history on clusters running SLURM. Queries using sqlog can be composed of simple job information such as jobid, job name, nodes on which the job ran, job completion state, username, number of nodes, number of CPUs, and start, end, and total runtime. The sqlog(1) utility uses perl's Date::Manip module, and thus supports a powerful range of date and time input formats.

REQUIREMENTS

The sqlog package requires a mysql database, as well as the perl modules Date::Manip for date parsing, and DBI, DBD::mysql and Digest::SHA1 for database access.

COMPONENTS

  sqlog              The "SLURM Query Log" utility. Provides a single interface to query jobs from the SLURM job log database and/or current list of running jobs.
  skewstats          The "SLURM Queue Stats" utility. Reports simple SLURM job statistics, such as machine utilization, number and size of jobs, and job completion stats. Uses sqlog to query historical job data.
  slurm-joblog       Logs completed jobs to the job log database using SLURM's jobcomp/script interface. Optionally logs jobs to an additional text file.
  sqlog-db-util      Administration utility for creation and update of the SLURM joblog. Also provides an interface to "backfill" the database using existing SLURM joblog files created by the jobcomp/filetxt plugin.
  sqlog.conf         World-readable config file.
  slurm-joblog.conf  Private config file. Contains DB password, etc.

CONFIGURATION

See the sqlog README for information about sqlog configuration.

EXAMPLES

Display the job or jobs that were running on host55 on July 19, 4:00pm:

    sqlog --time="July 19, 4pm" --nodes=host55

Display at most 25 jobs that were running at midnight yesterday:

    sqlog --time=yesterday,midnight

Display all jobs that failed between 8:00AM and 9:00AM this morning, sorted by descending endtime:

    sqlog --all --end=8am..9am --states=F --sort=-end

Display all jobs that started today:

    sqlog --start=+midnight --all

Display all jobs that have run between 3 and 4 hours on the nodes host30 through host65, and that didn't complete normally:

    sqlog -L 0 -T=3h..4h -n 'host[30-65]' -xs completed

Display all jobs that were running yesterday with 1000 nodes or greater and completed normally:

    sqlog -t yesterday,12am..12am -s CD -N +1000

List current queue, sorted by number of nodes (ascending):

    sqlog --all --no-db --sort=nnodes

List the top 10 longest running jobs, and then the 5 oldest jobs:

    sqlog --sort=runtime --limit=10
    sqlog --sort=-start --limit=5

USAGE

Usage: sqlog OPTIONS...

Query information about jobs from the SLURM job log database and/or the current queue of running jobs.

  -j, --jobids=LIST...      Comma-separated list of jobids.
  -J, --job-names=LIST...   Comma-separated list of job names.
  -n, --nodes=LIST...       Comma-separated list of nodes or node lists.
  -p, --partitions=LIST...  Comma-separated list of partitions.
  -s, --states=LIST...      Comma-separated list of job states. Use '--states=list' to list valid state names.
  -u, --users=LIST...       Comma-separated list of users.
      --regex               Enable regular expression matching for the above.
  -x, --exclude             Exclude the following list of jobids, users, states, partitions, or nodes.

  -N, --nnodes=N            List all jobs that ran on N nodes. N may be specified using the RANGE syntax described below.
      --minnodes=N          Explicitly specify the minimum number of nodes.
      --maxnodes=N          Explicitly specify the maximum number of nodes.
  -C, --ncores=N            List all jobs that ran on N cores. N may be specified using the RANGE syntax described below.
      --mincores=N          Explicitly specify the minimum number of cores.
      --maxcores=N          Explicitly specify the maximum number of cores.

  -T, --runtime=DURATION    List jobs that ran for DURATION, e.g., '4:30:00' or '4h30m'. RANGE operators apply.
      --mintime=DURATION    Explicitly specify the minimum runtime.
      --maxtime=DURATION    Explicitly specify the maximum runtime.

  -t, --time, --at=TIME     List jobs which were running at a particular date and time, e.g., '04/14 13:30:00' or 'today,2pm'. TIME may also be specified using RANGE operators.
  -S, --start=TIME          List all jobs that started at TIME.
      --start-before=TIME   Explicitly specify maximum start time.
      --start-after=TIME    Explicitly specify minimum start time.
  -E, --end=TIME            List all jobs that ended at TIME.
      --end-before=TIME     Explicitly specify maximum end time.
      --end-after=TIME      Explicitly specify minimum end time.

  -X, --no-running          Don't query running jobs (no current queue).
      --no-db               Include only running jobs (don't query joblog DB).
  -H, --no-header           Don't print a header row.
  -o, --format=LIST         Specify a list of format keys to display or a format type, or both using the form 'TYPE:keys,..' Use --format=list to list valid keys and types.
  -P, --sort=LIST           Specify a list of keys to sort output.
  -L, --limit=N             Limit the number of records to report (default=25).
  -a, --all                 Report all matching records (same as --limit=0).
  -h, --help                Display this message.
  -v, --verbose             Increase output verbosity.
      --dry-run             Don't actually do anything.

TIME, DURATION, and NUMERIC arguments may optionally use one of the RANGE operators +, -, or '..', where

  +N         N or more (at N or later)
  -N         N or less (at N or earlier)
  N..M       Between N and M, inclusive
  N or @N    Exactly N (exactly at N) (use @ if 'N' begins with '+' or '-')

LIST refers to a comma-separated list of words. All options except --format which take a LIST argument may also be specified multiple times (e.g. --users=sally,tim --users=frank). Node lists may be specified using the host list form, e.g. "host34-36,67".

TIME arguments are parsed using the perl Date::Manip(3pm) package, and thus may be specified in one of many formats. Examples include '12pm', 'yesterday,noon', '12/25-15:30:33', and so on. See the Date::Manip(3pm) manpage for more examples.
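The RANGE operators in the usage text can be read as shorthand for a pair of inclusive bounds. The following Python sketch illustrates that reading for plain integer arguments only (such as the value given to --nnodes). It is not taken from sqlog, which is written in perl and also accepts TIME and DURATION values; the names `parse_range` and `in_range` are hypothetical.

```python
from typing import Optional, Tuple

# (low, high) inclusive bounds; None means "unbounded on that side".
Bounds = Tuple[Optional[int], Optional[int]]


def parse_range(text: str) -> Bounds:
    """Translate a RANGE expression over integers into inclusive bounds."""
    text = text.strip()
    if not text:
        raise ValueError("empty RANGE expression")
    if ".." in text:                      # N..M  between N and M, inclusive
        low_s, high_s = text.split("..", 1)
        low, high = int(low_s), int(high_s)
        if low > high:
            raise ValueError(f"lower bound exceeds upper bound: {text!r}")
        return low, high
    if text.startswith("@"):              # @N    exactly N, even if N is signed
        n = int(text[1:])
        return n, n
    if text.startswith("+"):              # +N    N or more
        return int(text[1:]), None
    if text.startswith("-"):              # -N    N or less
        return None, int(text[1:])
    n = int(text)                         # N     exactly N
    return n, n


def in_range(value: int, bounds: Bounds) -> bool:
    """Return True if `value` lies within the inclusive bounds."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)
```

Under this reading, the `-N +1000` argument from the examples corresponds to `parse_range("+1000")`, which yields a lower bound of 1000 and no upper bound.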

0
 
  0 reviews  |  0 users  |  3,079 lines of code  |  0 current contributors  |  Analyzed 2 days ago
 
 
 
 
