CPU activity and NI errors on heron

Wed Jul 14 16:05:26 CDT 2004

(Current load average: 18.01    Free memory: 12550928)

Program(s) assigned to each processor and Network Interconnect (NI) error counts
Processor rows: CPU, username, process group, command, CPU time
Hub rows: Retries, Serial Number errors (SN errs), CheckBit errors (CB errs), each with its rate per minute
A dash (-) marks a processor with no process explicitly assigned to it.

Module 1, CPUs 0 - 7
  CPU 0   kuczera   3440    charmm      6:07:11
  CPU 1   shi       49237   dynamics    0:04
    Hub slot 1:  Retries 3530 (1/Min)  SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 2   shi       49237   dynamics    0:04
  CPU 3   shi       49237   dynamics    0:04
    Hub slot 2:  Retries 733 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 4   -
  CPU 5   -
    Hub slot 3:  Retries 531 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 6   httpd     1497    ps          0:00
  CPU 7   shi       49237   dynamics    0:04
    Hub slot 4:  Retries 249 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)

Module 2, CPUs 8 - 15
  CPU 8   kuczera   3440    charmm      6:07:15
  CPU 9   kuczera   3440    charmm      6:06:56
    Hub slot 1:  Retries 805 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 10  sangkil   43044   floquet_H   2:55:44
  CPU 11  kuczera   3440    charmm      6:06:59
    Hub slot 2:  Retries 995 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 12  -
  CPU 13  -
    Hub slot 3:  Retries 457 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 14  -
  CPU 15  -
    Hub slot 4:  Retries 206 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)

Module 3, CPUs 16 - 23
  CPU 16  -
  CPU 17  -
    Hub slot 1:  Retries 288 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 18  -
  CPU 19  -
    Hub slot 2:  Retries 207 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 20  root      1552    mediad      1:16
  CPU 21  -
    Hub slot 3:  Retries 116 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 22  -
  CPU 23  -
    Hub slot 4:  Retries 76 (0/Min)    SN errs 0 (0/Min)   CB errs 0 (0/Min)

Module 4, CPUs 24 - 31
  CPU 24  -
  CPU 25  -
    Hub slot 1:  Retries 166 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 26  -
  CPU 27  -
    Hub slot 2:  Retries 133 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 28  kuczera   41878   charmm      5:11:48
  CPU 29  kuczera   41878   charmm      5:11:55
    Hub slot 3:  Retries 45 (0/Min)    SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 30  kuczera   41878   charmm      5:11:53
  CPU 31  kuczera   41878   charmm      5:11:56
    Hub slot 4:  Retries 45 (0/Min)    SN errs 0 (0/Min)   CB errs 0 (0/Min)

Module 5, CPUs 32 - 39
  CPU 32  shi       49237   dynamics    0:04
  CPU 33  shi       49237   dynamics    0:04
    Hub slot 1:  Retries 474 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 34  shi       49237   dynamics    0:04
  CPU 35  shi       49237   dynamics    0:04
    Hub slot 2:  Retries 419 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 36  kuczera   7762    charmm      1:09:50
  CPU 37  kuczera   7762    charmm      1:09:47
    Hub slot 3:  Retries 588 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 38  kuczera   7762    charmm      1:09:49
  CPU 39  kuczera   7762    charmm      1:09:50
    Hub slot 4:  Retries 405 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)

Module 6, CPUs 40 - 47
  CPU 40  -
  CPU 41  -
    Hub slot 1:  Retries 500 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 42  -
  CPU 43  -
    Hub slot 2:  Retries 1007 (0/Min)  SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 44  -
  CPU 45  -
    Hub slot 3:  Retries 396 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 46  -
  CPU 47  -
    Hub slot 4:  Retries 431 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)

Module 7, CPUs 48 - 55
  CPU 48  -
  CPU 49  -
    Hub slot 1:  Retries 129 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 50  -
  CPU 51  -
    Hub slot 2:  Retries 101 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 52  -
  CPU 53  -
    Hub slot 3:  Retries 85 (0/Min)    SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 54  -
  CPU 55  -
    Hub slot 4:  Retries 83 (0/Min)    SN errs 0 (0/Min)   CB errs 0 (0/Min)

Module 8, CPUs 56 - 63
  CPU 56  -
  CPU 57  -
    Hub slot 1:  Retries 744 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 58  -
  CPU 59  -
    Hub slot 2:  Retries 865 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 60  kuczera   41878   charmm      5:11:52
  CPU 61  kuczera   41878   charmm      5:11:53
    Hub slot 3:  Retries 466 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)
  CPU 62  kuczera   41878   charmm      5:11:53
  CPU 63  kuczera   41878   charmm      5:11:52
    Hub slot 4:  Retries 353 (0/Min)   SN errs 0 (0/Min)   CB errs 0 (0/Min)


The table above shows every process that the ps command reports as explicitly
assigned to a particular processor. Processes that are migrating from one
processor to another do not appear in the table.
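A minimal sketch of how such a per-processor table could be assembled, assuming each process record carries the CPU field shown in the process listing on this page ("*" when the process is not tied to one processor). The Proc record and the assigned_by_cpu function are hypothetical; the monitor's own implementation is not shown on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Proc:
    # Hypothetical record mirroring the columns of the process listing.
    user: str
    pgid: int
    pid: int
    cpu_time: str
    cpu: str      # processor number, or "*" if not tied to one processor
    command: str


def assigned_by_cpu(procs):
    """Map each processor number to the processes explicitly assigned to it.

    Processes whose CPU field is "*" (for example, processes migrating
    between processors) are left out, as in the table above.
    """
    table = defaultdict(list)
    for p in procs:
        if p.cpu != "*":
            table[int(p.cpu)].append(p)
    return dict(table)


sample = [
    Proc("kuczera", 3440, 41011, "6:07:11", "0", "charmm"),
    Proc("shi", 49237, 49243, "0:04", "1", "dynamics"),
    Proc("shi", 49237, 49246, "0:00", "*", "dynamics"),
]

for cpu, entries in sorted(assigned_by_cpu(sample).items()):
    for p in entries:
        print(cpu, p.user, p.pgid, p.command, p.cpu_time)
```

The sample rows are taken from the listing; process 49246 is dropped because its CPU field is "*".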

Processes with the same process group number are parallel processes.
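Time values give the CPU time used so far, in the format days-hours:minutes:seconds, with leading fields shown only when needed; for example, 0:04 is 4 seconds and 6:07:11 is 6 hours, 7 minutes and 11 seconds.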
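To compare or total these values it helps to convert them to seconds. A minimal sketch, assuming the days field is separated from the hours by "-" as the stated format suggests (no value on this page includes days); the function name cpu_seconds is hypothetical.

```python
def cpu_seconds(value: str) -> int:
    """Convert a [[days-]hours:]minutes:seconds string to seconds."""
    days = 0
    if "-" in value:
        # Assumed days prefix, e.g. "1-02:03:04".
        day_part, value = value.split("-", 1)
        days = int(day_part)
    seconds = 0
    for field in value.split(":"):
        # Each colon-separated field is worth 60 of the next field.
        seconds = seconds * 60 + int(field)
    return days * 86400 + seconds


print(cpu_seconds("6:07:11"))  # 22031
```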

The error statistics shown above are taken from the linkstat command. Each hub is connected to the pair of processors listed immediately above it, so NI errors on a hub may be generated by the programs running on those two processors.
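The hub-to-processor relationship and the error entries can also be checked in code. A minimal sketch, assuming 8 CPUs per module numbered consecutively from module 1, with hub slots 1-4 each serving the next pair of CPUs in the order displayed. The helpers hub_cpus and parse_entry are hypothetical, and the parser reads the entry format shown on this page, not raw linkstat output.

```python
import re

# Matches entries as displayed on this page, e.g. "Retries 3530 (1/Min)".
ENTRY = re.compile(r"^(Retries|SN errs|CB errs) (\d+) \((\d+)/Min\)$")


def hub_cpus(module: int, slot: int) -> tuple[int, int]:
    """CPUs attached to a hub, assuming 8 CPUs per module and 2 per slot."""
    first = (module - 1) * 8 + (slot - 1) * 2
    return first, first + 1


def parse_entry(line: str):
    """Return (counter name, total count, rate per minute) for one entry."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


name, total, rate = parse_entry("Retries 3530 (1/Min)")
if rate > 0:
    # A nonzero per-minute rate means the counter is still climbing.
    print(f"{name} count increasing on the hub serving CPUs {hub_cpus(1, 1)}")
```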

The table below lists every process that belongs to the same process group as a process assigned to a particular CPU in the table above. Process groups with multiple PIDs are parallel processes.
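Selecting these processes amounts to collecting the process group of every CPU-assigned process and keeping all processes in those groups. A minimal sketch over (user, process group, PID, CPU, command) tuples drawn from the process listing; the tuple layout and the function names related_processes and multi_pid_groups are hypothetical.

```python
from collections import Counter

# (user, pgid, pid, cpu, command); cpu is "*" when not tied to a processor.
procs = [
    ("kuczera", 7762, 37273, "39", "charmm"),
    ("kuczera", 7762, 37281, "*", "mpirun"),
    ("kuczera", 7762, 7762, "*", "res"),
    ("sangkil", 43044, 43044, "10", "floquet_H"),
    ("root", 1497, 1497, "*", "httpd"),
]


def related_processes(procs):
    """Keep every process whose group contains a CPU-assigned process."""
    groups = {pgid for _, pgid, _, cpu, _ in procs if cpu != "*"}
    return [p for p in procs if p[1] in groups]


def multi_pid_groups(procs):
    """Process groups with more than one PID (parallel jobs, per the text)."""
    counts = Counter(pgid for _, pgid, _, _, _ in procs)
    return sorted(g for g, n in counts.items() if n > 1)


print([p[2] for p in related_processes(procs)])  # [37273, 37281, 7762, 43044]
print(multi_pid_groups(procs))  # [7762]
```

In this sample the root httpd process is excluded because no process in its group is bound to a CPU.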

User      Process group  PID     Memory   CPU time  CPU number  Command
httpd     1497           14646   3424     0:01      *           httpd
                         14647   3424     0:01      *           httpd
                         14648   3424     0:01      *           httpd
                         14650   3424     0:01      *           httpd
                         14651   3424     0:01      *           httpd
                         14652   3440     0:01      *           httpd
                         14653   3424     0:01      *           httpd
                         1513    3440     0:02      *           httpd
                         1514    3424     0:02      *           httpd
                         1515    3424     0:02      *           httpd
                         1518    3424     0:01      *           httpd
                         1520    3424     0:02      *           httpd
                         2082    3424     0:02      *           httpd
                         2751    3424     0:02      *           httpd
                         2946    3440     0:01      *           httpd
                         2947    3424     0:02      *           httpd
                         2948    3440     0:02      *           httpd
                         49247   3344     0:00      *           cpu-and-e
                         49252   1936     0:00      6           ps
kuczera   3440           3440    2976     0:01      *           res
                         3442    672      0:00      *           108964798
                         3445    592      0:00      *           108964798
                         40985   2528     0:07      *           mpirun
                         41007   2009984  6:06:59   11          charmm
                         41009   2009296  0:12      *           charmm
                         41011   2010048  6:07:11   0           charmm
                         41013   2009984  6:07:15   8           charmm
                         41014   2009984  6:06:56   9           charmm
kuczera   41878          41833   -613056  5:11:56   31          charmm
                         41878   2656     0:06      *           mpirun
                         41879   -613056  5:11:53   62          charmm
                         41881   -613744  0:11      *           charmm
                         41883   -613056  5:11:52   60          charmm
                         41884   -612992  5:11:53   61          charmm
                         41885   -613056  5:11:52   63          charmm
                         41886   -606688  5:11:53   30          charmm
                         41887   -613056  5:11:55   29          charmm
                         41888   -613056  5:11:48   28          charmm
kuczera   7762           37273   2009984  1:09:50   39          charmm
                         37276   2009984  1:09:49   38          charmm
                         37278   2010048  1:09:47   37          charmm
                         37280   2009984  1:09:50   36          charmm
                         37281   2528     0:07      *           mpirun
                         37285   2009296  0:14      *           charmm
                         7762    2976     0:01      *           res
                         7764    672      0:00      *           108966580
                         7767    608      0:00      *           108966580
root      1497           1497    3296     0:09      *           httpd
root      1552           1552    3216     1:16      20          mediad
sangkil   43044          43044   531520   2:55:44   10          floquet_H
shi       49237          49238   144752   0:04      32          dynamics
                         49239   144752   0:04      33          dynamics
                         49240   144752   0:04      34          dynamics
                         49241   144752   0:04      35          dynamics
                         49242   144752   0:04      2           dynamics
                         49243   144752   0:04      1           dynamics
                         49244   144752   0:04      3           dynamics
                         49245   144752   0:04      7           dynamics
                         49246   144752   0:00      *           dynamics

A process that was not assigned to a specific CPU when the listing was taken has an asterisk (*) in the CPU number column. Memory appears to be reported in kilobytes. This page updates itself every 30 seconds.

An example page from a version of this monitor without NI error reporting may be found at http://raven.cc.ku.edu/~grobe/history/cpu-use-output.html

For information about this service contact Michael Grobe at grobe@ku.edu. Copyright 1997, The University of Kansas.