Installing atop Tool To Monitor the System Process in Linux
atop is an ASCII full-screen interactive performance monitor, similar to the top command, for viewing the load on a Linux system. It shows the occupation of the most critical hardware resources (from a performance point of view) at the system level: CPU, memory, disk, and network. atop also lets us identify which processes are responsible for the indicated load. In brief, atop can report the activity of all processes, including those that have already completed.
Installing Atop Monitoring Tool on Linux:
This guide explains how to install and start atop on Debian/Ubuntu-based Linux systems, and walks through atop’s system-level and process-level information so that you can monitor and understand your system’s processes.
atop can be installed from the default repositories with the following command:
sudo apt-get install atop
You can open the atop main window by typing atop at the shell prompt.
Always remember to exit atop by pressing ‘q’ or by sending it kill -15. If atop is stopped in any other way (for example with kill -9), it cannot switch off the process accounting mechanism, which then keeps writing to an ever-growing file on disk.
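The kill -15 advice above can be wrapped in a small helper. This is only a sketch: the function name stop_gracefully is hypothetical, and the usage line assumes pgrep is available and that atop is running.

```shell
# Hypothetical helper: ask a process to exit with SIGTERM (signal 15),
# never SIGKILL (signal 9), so that it can clean up before exiting.
stop_gracefully() {
    pid="$1"
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        kill -15 "$pid"
    else
        echo "no such process: $pid" >&2
        return 1
    fi
}

# Usage (assumption: pgrep is installed and atop is running):
#   stop_gracefully "$(pgrep -o -x atop)"
```

The existence check keeps the helper from printing a confusing kill error when atop has already exited.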
The atop window has two parts. The first part shows system-level information and the second part shows process-level information. Let us look at both in more depth.
The system-level information consists of the following output lines:
# PRC (process level totals):
This line shows the total CPU time consumed in system mode (‘sys’) and in user mode (‘user’), the total number of processes (‘#proc’), the total number of threads in state ‘running’ (‘#trun’), ‘sleeping interruptible’ (‘#tslpi’) and ‘sleeping uninterruptible’ (‘#tslpu’), the number of zombie processes (‘#zombie’), the number of clone system calls (‘clones’), and the number of processes that ended during the interval (‘#exit’, which shows ‘?’ if process accounting is not used).
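To get a feel for the thread-state counters, a rough approximation can be built from ps output. This is a sketch under assumptions, not how atop computes its figures: the function name count_states is hypothetical, and the usage line assumes the procps version of ps.

```shell
# Hypothetical helper: read one state string per line (as printed by ps)
# and count entries by their first letter.
count_states() {
    awk '{ c[substr($1, 1, 1)]++ }
         END {
             printf "running=%d sleeping=%d uninterruptible=%d zombie=%d\n",
                    c["R"], c["S"], c["D"], c["Z"]
         }'
}

# Usage (assumption: procps ps; -L lists one line per thread):
#   ps -eL -o stat= | count_states
```

The numbers will not match the PRC line exactly, because ps takes a single snapshot while atop reports over an interval.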
# CPU (CPU utilization):
The total occupation of all CPUs together is shown in this line (there can be more than one CPU line): the percentage of CPU time spent in kernel mode by all active processes (‘sys’), the percentage consumed in user mode by all active processes (‘user’), the percentage spent on interrupt handling (‘irq’), and the percentage of unused CPU time while at least one process was waiting for disk I/O (‘wait’). For virtual machines, the steal percentage (‘steal’) shows the share of CPU time that was stolen by other virtual machines running on the same hardware. The average frequency (‘avgf’) and the average scaling percentage (‘avgscal’) are also displayed, as are the current frequency (‘curf’) and the current scaling percentage (‘curscal’).
# CPL (CPU load information):
The CPL line contains the load average figures, which reflect the number of threads that are available to run on a CPU or that are waiting for disk I/O. These figures are averaged over 1 (‘avg1’), 5 (‘avg5’) and 15 (‘avg15’) minutes.
Also, the number of context switches (‘csw’), the number of serviced interrupts (‘intr’), and the number of available CPUs are displayed.
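The same three load averages are exposed by the kernel in /proc/loadavg, so they can be cross-checked outside atop. A minimal sketch follows; the function name parse_loadavg is hypothetical, and the labels simply reuse atop’s field names.

```shell
# Hypothetical helper: $1 is one line in /proc/loadavg format,
# i.e. "avg1 avg5 avg15 runnable/total last_pid".
parse_loadavg() {
    printf '%s\n' "$1" |
        awk '{ print "avg1=" $1 " avg5=" $2 " avg15=" $3 }'
}

# Usage on a live Linux system:
#   parse_loadavg "$(cat /proc/loadavg)"
```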
# MEM (Memory occupation):
The total amount of physical memory (‘tot’), the amount of memory that is currently free (‘free’), the amount of memory in use as page cache (‘cache’), the amount of memory used for filesystem metadata (‘buff’), and the amount of memory used for kernel mallocs (‘slab’) are displayed in the MEM line.
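Comparable figures can be read from /proc/meminfo. The sketch below is an approximation: the function name mem_summary is hypothetical, the mapping of meminfo fields to atop’s labels is an assumption, and atop may account for some of these values differently.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the fields that roughly correspond to atop's MEM line (values in kB).
mem_summary() {
    awk '
        $1 == "MemTotal:" { tot   = $2 }
        $1 == "MemFree:"  { free  = $2 }
        $1 == "Cached:"   { cache = $2 }
        $1 == "Buffers:"  { buff  = $2 }
        $1 == "Slab:"     { slab  = $2 }
        END {
            printf "tot=%d free=%d cache=%d buff=%d slab=%d\n",
                   tot, free, cache, buff, slab
        }'
}

# Usage on a live Linux system:
#   mem_summary < /proc/meminfo
```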
# SWP (Swap occupation and overcommit info):
The total amount of swap space on disk (‘tot’) and the amount of free swap space (‘free’) are shown in this line, together with the committed virtual memory space (‘vmcom’) and the maximum limit of the committed space (‘vmlim’), which is by default the swap size plus 50% of the memory size.
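The default limit described above is simple arithmetic and can be sketched as follows. The function name vmlim_kib is hypothetical, and the formula is the simplified "swap plus 50% of memory" rule from the text; the kernel’s own figure also depends on settings such as the overcommit ratio and huge pages.

```shell
# Hypothetical helper: $1 = swap size in KiB, $2 = memory size in KiB.
# Prints swap + 50% of memory, using integer arithmetic.
vmlim_kib() {
    echo $(( $1 + $2 / 2 ))
}

# Example: 2 GiB of swap and 8 GiB of memory
#   vmlim_kib 2097152 8388608
# The kernel's own values can be compared with:
#   grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
```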
# PAG (Paging frequency):
This line contains the number of pages scanned (‘scan’) because free memory dropped below a particular threshold, and the number of times the kernel tried to reclaim pages due to an urgent need (‘stall’).
Also, the number of memory pages the system read from swap space (‘swin’) and the number of memory pages the system wrote to swap space (‘swout’) are shown.
# LVM/MDD/DSK (Logical volume/multiple device/disk utilization):
This line shows the portion of time that the unit was busy handling requests (‘busy’), the number of read requests issued (‘read’), the number of write requests issued (‘write’), the number of KiBytes per read (‘KiB/r’), the average queue depth (‘avq’), and the average number of milliseconds needed by a request (‘avio’) for seek, latency, and data transfer.
# NET (Network utilization):
There are three kinds of NET lines: one for the transport layer (TCP and UDP), one for the IP layer, and one per active interface.
- Transport layer
- IP layer
- Active Interface
The transport layer line shows the number of received TCP segments including those received in error (‘tcpi’), transmitted TCP segments excluding those containing only retransmitted octets (‘tcpo’), UDP datagrams received (‘udpi’), UDP datagrams transmitted (‘udpo’), active TCP opens (‘tcpao’), passive TCP opens (‘tcppo’), TCP output retransmissions (‘tcprs’), TCP input errors (‘tcpie’), TCP output resets (‘tcpor’), UDP no ports (‘udpnp’), and UDP input errors (‘udpie’).
The IP layer line shows the number of IP datagrams received from interfaces, including those received in error (‘ipi’), IP datagrams that local higher-layer protocols offered for transmission (‘ipo’), received IP datagrams that were forwarded to other interfaces (‘ipfrw’), IP datagrams delivered to local higher-layer protocols (‘deliv’), received ICMP datagrams (‘icmpi’), and transmitted ICMP datagrams (‘icmpo’).
For every active network interface:
The number of received packets (‘pcki’), transmitted packets (‘pcko’), the effective amount of bits received per second (‘si’), the effective amount of bits transmitted per second (‘so’), collisions (‘coll’), received multicast packets (‘mlti’), errors while receiving a packet (‘erri’), errors while transmitting a packet (‘erro’), received packets dropped (‘drpi’), and transmitted packets dropped (‘drpo’) are shown in this line.
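A bits-per-second rate like ‘si’ or ‘so’ is derived from two readings of a byte counter taken some seconds apart. The sketch below shows that arithmetic only; the function name bits_per_sec is hypothetical, eth0 is a placeholder interface name, and this is not necessarily how atop samples its counters.

```shell
# Hypothetical helper:
#   $1 = earlier byte counter, $2 = later byte counter,
#   $3 = seconds between the two readings.
# Prints the average rate in bits per second (integer arithmetic).
bits_per_sec() {
    echo $(( ($2 - $1) * 8 / $3 ))
}

# Usage on a live Linux system (eth0 is a placeholder):
#   a=$(cat /sys/class/net/eth0/statistics/rx_bytes); sleep 10
#   b=$(cat /sys/class/net/eth0/statistics/rx_bytes)
#   bits_per_sec "$a" "$b" 10
```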
Process level information:
The process-level information consists of the following columns:
PID (Process-id): Shows ‘?’ when a process was started and finished during the last interval, because the process-id is not part of the standard process accounting record.
SYSCPU: The CPU time consumed by this process in system mode (kernel mode), usually due to system call handling.
USRCPU: The CPU time consumed by this process in user mode, due to processing its own program text.
RGROW: The amount of resident memory by which the process has grown during the last interval.
VGROW: The amount of virtual memory by which the process has grown during the last interval.
EXC: The exit code of a terminated process is shown.
THR: The total number of threads within this process is shown.
S: The current state of the main thread of the process: ‘R’ for running (currently processing or in the run queue), ‘S’ for sleeping interruptible (waiting for an event to occur), ‘D’ for sleeping non-interruptible, ‘Z’ for zombie, ‘T’ for stopped, ‘W’ for swapping, and ‘E’ (exit) for processes that finished during the last interval.
CPUNR: The identification of the CPU on which the main thread of the process is running or has most recently run.
CPU: The occupation percentage of this process relative to the available capacity for this resource at the system level.
CMD: The name of the process that is running or that finished during the last interval.