FUSE stands for Filesystem in Userspace. It is a software interface that allows non-privileged applications to provide their own filesystem and mount it within the Linux file namespace. The FUSE module (which is a kernel module) provides a software bridge to the kernel interfaces.
VFS, or Virtual File System, is a component of the Linux kernel that provides the filesystem interface to userspace programs. The VFS is what implements open, stat, chmod, and other similar filesystem-related system calls. The pathnames passed to these calls are used by the VFS to look up entries in the directory entry cache (also known as the dentry cache or dcache). This allows very fast lookups of dentries without needing to reference the backing filesystem.
procfs is a special filesystem maintained by the Linux kernel that allows you to inspect the state of running processes.
$ ls -lah /proc/21/
total 0
dr-xr-xr-x 9 root root 0 Oct 28 17:16 .
dr-xr-xr-x 238 root root 0 Oct 28 17:16 ..
dr-xr-xr-x 2 root root 0 Dec 8 15:43 attr
-rw-r--r-- 1 root root 0 Dec 8 15:43 autogroup
-r-------- 1 root root 0 Dec 8 15:43 auxv
-r--r--r-- 1 root root 0 Dec 8 15:43 cgroup
--w------- 1 root root 0 Dec 8 15:43 clear_refs
There are a lot of useful bits here. For example, you can inspect all open file descriptors for a process. In fact, this is what lsof uses to show open files:
$ ls -l /proc/79808/fd
lr-x------ 1 ltclipp ltclipp 64 Dec 8 15:44 0 -> /dev/null
lrwx------ 1 ltclipp ltclipp 64 Dec 8 15:44 1 -> 'socket:[126462260]'
lrwx------ 1 ltclipp ltclipp 64 Dec 8 15:44 10 -> /tmp/foo.txt
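The same information can be read programmatically. The following is a minimal sketch in Python, assuming a Linux system with procfs mounted at /proc. The helper name `list_open_fds` and its `proc_root` parameter are illustrative choices, not part of lsof or any other existing tool.

```python
import os


def list_open_fds(pid="self", proc_root="/proc"):
    """Return {fd_number: link_target} for the entries in <proc_root>/<pid>/fd."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    fds = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink named after the descriptor number.
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except (OSError, ValueError):
            # The descriptor was closed while iterating, or the entry
            # is not a numeric symlink. Skip it either way.
            continue
    return fds
```

Calling `list_open_fds()` with no arguments inspects the calling process itself. Reading another user's process typically requires elevated privileges.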
A process can view its own information through /proc/self:
$ ls -l /proc/self/
dr-xr-xr-x 2 ltclipp ltclipp 0 Dec 8 15:47 attr
-rw-r--r-- 1 ltclipp ltclipp 0 Dec 8 15:47 autogroup
-r-------- 1 ltclipp ltclipp 0 Dec 8 15:47 auxv
-r--r--r-- 1 ltclipp ltclipp 0 Dec 8 15:47 cgroup
--w------- 1 ltclipp ltclipp 0 Dec 8 15:47 clear_refs
-r--r--r-- 1 ltclipp ltclipp 0 Dec 8 15:47 cmdline
-rw-r--r-- 1 ltclipp ltclipp 0 Dec 8 15:47 comm
-rw-r--r-- 1 ltclipp ltclipp 0 Dec 8 15:47 coredump_filter
The file located at /sys/bus/pci/devices/*/resource provides ASCII text that describes the host addresses of PCI resources for that device. For each region, there is a corresponding /sys/bus/pci/devices/*/resourceN file (resource0, resource1, and so on) that exposes the contents of that region. You must memory-map (mmap) this file in order to access the region.
For example, using lspci we can introspect the regions:
# lspci -n -s 0000:01:00.1 -vv
0000:01:00.1 0200: 8086:10c9 (rev 01)
Subsystem: 10a9:8028
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 40
Region 0: Memory at b2140000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at b2120000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at 2000 [size=32]
Region 3: Memory at b2240000 (32-bit, non-prefetchable) [size=16K]
These regions can be device memory, IO ports, or other resources. The exact contents of the memory are specific to the device in question.
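The resource file itself is easy to parse: each line holds three hexadecimal values (start address, end address, and flags), one line per region. Below is a rough sketch of a parser. The function name `parse_pci_resource` is made up for illustration, and the flag constants are assumed to match the kernel's IORESOURCE_IO and IORESOURCE_MEM bits.

```python
# Assumed to mirror the kernel's resource type bits (include/linux/ioport.h).
IORESOURCE_IO = 0x100   # region is I/O port space
IORESOURCE_MEM = 0x200  # region is memory-mapped


def parse_pci_resource(text):
    """Parse the contents of a sysfs PCI `resource` file into a list of regions."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # not a "start end flags" line
        start, end, flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0 and flags == 0:
            continue  # unused region slot
        if flags & IORESOURCE_IO:
            kind = "io"
        elif flags & IORESOURCE_MEM:
            kind = "mem"
        else:
            kind = "other"
        regions.append(
            {"index": index, "start": start, "size": end - start + 1, "kind": kind}
        )
    return regions
```

For a device like the one shown by lspci, region 0 would come back with start 0xb2140000 and a size of 128 KiB, provided the file contains the matching start and end addresses.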
A DRAC (Dell Remote Access Controller) is a hardware unit within a server chassis that is capable of monitoring, deploying, and interacting with the main server hardware and host independently of the host's kernel. It's often integrated into the motherboard itself, and acts as a standalone computer that you can log into and issue commands to.
The main benefits of a DRAC are being able to control the host independently of its kernel (either through a console or through hardware power-cycling commands), to monitor the health of hardware components, and to configure hardware, the BIOS, the host OS, and various other facets.
An "adaptive-tick" CPU is one where the kernel can temporarily disable the scheduling clock ticks if there is only one runnable task on the core. This is useful in realtime or latency-sensitive applications that must not be interrupted for scheduling work. The nohz_full kernel boot parameter specifies which cores should be the adaptive-tick cores.
nohz_full=4-7
Separately, an idle core that the kernel's runtime logic has stopped sending scheduling-clock interrupts to is considered to be "dyntick-idle":
Quote
An idle CPU that is not receiving scheduling-clock interrupts is said to be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running tickless". The remainder of this document will use "dyntick-idle mode".
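To check which cores a running system was booted with as adaptive-tick cores, you can look for nohz_full in the kernel command line (exposed at /proc/cmdline). The sketch below parses such a string. Both function names are hypothetical, and only the simple "N" and "N-M" list forms are handled, not the kernel's stride syntax.

```python
import re


def expand_cpu_list(spec):
    """Expand a CPU list such as "0,2-3" into a sorted list of core numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def nohz_full_cpus(cmdline):
    """Return the cores named by nohz_full= in a kernel command line, or []."""
    match = re.search(r"(?:^|\s)nohz_full=(\S+)", cmdline)
    return expand_cpu_list(match.group(1)) if match else []
```

On a live system you would pass in the contents of /proc/cmdline, for example `nohz_full_cpus(open("/proc/cmdline").read())`.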
Security-Enhanced Linux (SELinux) is a Linux security module that provides the ability to implement access control policies. In Linux, the default access control mechanism is what's called Discretionary Access Control (DAC). The granularity of DAC is based only on user, group, and "other" permissions, which are applied to specific files.
SELinux implements Mandatory Access Control (MAC). System resources have what's called an SELinux context. The context, otherwise known as an SELinux Label, abstracts away the underlying resources and instead focuses on only the security properties of the underlying object.
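As an illustration, an SELinux context is usually written as colon-separated fields in the form user:role:type, optionally followed by a level. The sketch below splits such a label into its parts. The `SELinuxContext` type and `parse_selinux_context` function are invented for this example, and the labels in the usage note are typical-looking samples rather than output from a real system.

```python
from typing import NamedTuple, Optional


class SELinuxContext(NamedTuple):
    user: str
    role: str
    type: str
    level: Optional[str]


def parse_selinux_context(label):
    """Split "user:role:type[:level]" into its fields."""
    # The level itself may contain colons (e.g. "s0-s0:c0.c1023"),
    # so split at most three times.
    parts = label.strip().split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"not an SELinux context: {label!r}")
    user, role, type_ = parts[:3]
    level = parts[3] if len(parts) == 4 else None
    return SELinuxContext(user, role, type_, level)
```

For example, `parse_selinux_context("system_u:object_r:httpd_sys_content_t:s0").type` gives `"httpd_sys_content_t"`, which is the field most policy rules are written against.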
The /etc/sudoers file on Linux systems describes which commands users are allowed to run through sudo, typically as the root user. The man page describes in depth the details of this file:
SUDOERS(5) File Formats Manual SUDOERS(5)
NAME
sudoers - default sudo security policy plugin
DESCRIPTION
The sudoers policy plugin determines a user's sudo privileges. It is the default sudo policy plugin. The policy is driven by the /private/etc/sudoers file or, optionally, in LDAP. The policy format is described in detail in the
SUDOERS FILE FORMAT section. For information on storing sudoers policy information in LDAP, see sudoers.ldap(5).
An ACL, or Access Control List, commonly refers to extra access policies that are applied to specific files or directories. You can view ACLs in most POSIX-compatible filesystems using the getfacl command.
Huge pages are a memory optimization technique whereby you grant your application memory space that uses larger memory page allocation sizes. The typical page size is 4096 bytes, but by enabling hugepages, you can get much larger page sizes. This improves performance in workloads that use large blocks of memory because fewer page table entries are needed to cover the same amount of memory, which reduces TLB (translation lookaside buffer) misses.
You can also link these mappings to a named file descriptor on the hugetlbfs filesystem. Hugepages are drawn from a pool of allocated pages. The size of this pool can be modified.
The hugepages parameter can be provided to the kernel to reserve a pool of huge pages. This can also be allocated at runtime using the procfs or sysfs interface.
Specifying the kernel command-line parameter is the more reliable method of allocating hugepage pools, as memory has not yet become fragmented. It's possible hugepage allocation can fail at runtime due to fragmentation.
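To see whether a pool was actually reserved, you can read the hugepage counters from /proc/meminfo. This is a small sketch of that. The function names are illustrative, and it assumes the usual HugePages_* and Hugepagesize field names.

```python
def hugepage_stats(meminfo_text):
    """Extract the hugepage counters from the text of /proc/meminfo."""
    stats = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("HugePages_") or key == "Hugepagesize":
            # Counters are plain page counts; Hugepagesize is reported in kB.
            stats[key] = int(value.split()[0])
    return stats


def read_hugepage_stats(path="/proc/meminfo"):
    """Read and parse the hugepage counters of the running system."""
    with open(path) as meminfo:
        return hugepage_stats(meminfo.read())
```

If HugePages_Total is lower than the number you requested at runtime, the kernel most likely could not find enough contiguous memory, which is the fragmentation problem described above.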
Kickstart is an installation mechanism provided by Red Hat that allows you to install and configure operating systems in an automated fashion. Cobbler can be used to automate the kickstart configuration process.
Forcefully terminate a program. This signal is not catchable.
| Number | Signal | Default action | Keyboard | Description |
| --- | --- | --- | --- | --- |
| 15 | SIGTERM | Terminate | | Gracefully terminate a program. This is similar in behavior to SIGINT, but it cannot be sent from the keyboard. Parent processes typically send this signal to their children upon termination. |
| 19, 18, 25 | SIGCONT | Continue | | Continue execution of a process that was stopped by SIGSTOP. You can also use the bg bash command to continue the process in the background. See Backgrounding a Terminal Process for more details. |
| 17, 19, 23 | SIGSTOP | Stop | Ctrl+Z (sends SIGTSTP) | Stop execution of a process, but allow it to be resumed through SIGCONT. SIGSTOP itself cannot be caught or ignored; the Ctrl+Z key combination sends its catchable variant, SIGTSTP. |
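The difference between catchable and uncatchable signals can be checked from Python. The helper below (`can_install_handler` is a name invented for this sketch) tries to install a do-nothing handler and reports whether the kernel accepted it. It assumes it is called from the main thread, since Python only allows handlers to be installed there.

```python
import signal


def can_install_handler(signum):
    """Return True if a handler can be installed for signum, False otherwise."""

    def _noop(received_signum, frame):
        pass

    try:
        previous = signal.signal(signum, _noop)
    except (OSError, ValueError):
        # The kernel rejects handlers for SIGKILL and SIGSTOP.
        return False
    # Put the original disposition back so the check has no lasting effect.
    signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return True
```

SIGTERM and SIGCONT should report True, while SIGKILL and SIGSTOP should report False, which is why a graceful shutdown hook has to be attached to SIGTERM.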
Kernel Bypass is a technology implemented in Linux (and often other kernels as well) that allows network processing to happen in userspace. This often leads to a huge performance improvement for network-bound applications as the traffic does not have to pass through the kernel-userspace boundary.
Allows the kernel to allocate a circular buffer that is mapped into userspace so that applications can read packets from it directly, instead of making one system call per packet.
PF_RING is a type of network socket, originally developed by ntop, that provides a circular ring buffer of the network traffic. It is implemented as a kernel module that you must load. The kernel module polls packets from the NIC through Linux NAPI and copies them into the ring buffer, which lives in kernel space. The user application then mmaps this kernel buffer. PF_RING is capable of delivering packets to multiple ring buffers, which allows each application to be isolated from others.
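To make the ring buffer idea concrete, here is a toy model in pure Python. It is only a conceptual sketch and not how PF_RING or the kernel implements it: the `PacketRing` class is hypothetical, and a real ring lives in shared memory with the producer and consumer running concurrently. It does show the two properties that matter: the consumer drains many packets in one pass, and a full ring drops new packets rather than blocking the producer.

```python
class PacketRing:
    """Fixed-size circular buffer with one producer and one consumer."""

    def __init__(self, slots):
        self._slots = [None] * slots
        self._head = 0   # next slot the producer writes to
        self._tail = 0   # next slot the consumer reads from
        self._count = 0  # number of occupied slots
        self.dropped = 0

    def push(self, packet):
        """Producer side: store a packet, or drop it if the ring is full."""
        if self._count == len(self._slots):
            self.dropped += 1
            return False
        self._slots[self._head] = packet
        self._head = (self._head + 1) % len(self._slots)
        self._count += 1
        return True

    def drain(self):
        """Consumer side: take every queued packet in arrival order."""
        packets = []
        while self._count:
            packets.append(self._slots[self._tail])
            self._slots[self._tail] = None
            self._tail = (self._tail + 1) % len(self._slots)
            self._count -= 1
        return packets
```

The drop-on-full behavior is a design choice of this sketch. It mirrors what capture pipelines generally do when the application cannot keep up with the NIC.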
This is a networking framework for Lua applications that allows the app to completely control a network card. The user application acts as a hardware driver. This is done at the PCI device level by mmapping the device registers through sysfs.
ionice determines the IO scheduling class and priority of a program. The classes that can be used are:
Idle: the program with idle IO priority will only get disk time if nothing else is using the disk.
Best effort: This is the default scheduling class. Programs running best-effort are served in a round-robin fashion.
Realtime: The program with realtime class priority will be given first access to the disk. This must be used with care as there is a potential for realtime processes to starve other processes of disk IO.
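If you launch jobs from a script, the class can be selected by prefixing the command with ionice. The sketch below only builds the argument list. `ionice_command` and `IO_CLASSES` are names made up for this example, and it relies on ionice's `-c` (class: 1 realtime, 2 best-effort, 3 idle) and `-n` (priority level 0-7) options.

```python
# Class numbers as accepted by `ionice -c`.
IO_CLASSES = {"realtime": 1, "best-effort": 2, "idle": 3}


def ionice_command(io_class, command, level=None):
    """Build an argv list that runs `command` under the given IO class."""
    argv = ["ionice", "-c", str(IO_CLASSES[io_class])]
    if level is not None:
        if io_class == "idle":
            raise ValueError("the idle class does not take a priority level")
        if not 0 <= level <= 7:
            raise ValueError("priority level must be between 0 and 7")
        argv += ["-n", str(level)]
    return argv + list(command)
```

The result can be handed to `subprocess.run`, for example `subprocess.run(ionice_command("idle", ["tar", "czf", "backup.tgz", "data"]), check=True)` to keep a backup from competing with other disk users.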
The lscpu command will show you which cores are on which NUMA node. If possible, applications should be given CPU affinities that stay within a single NUMA node, to avoid slower memory accesses to a remote node.
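An application can also pin itself. The following sketch assumes a Linux system whose sysfs exposes one cpuN entry per core under /sys/devices/system/node/nodeN. The helper names `cpus_on_node` and `pin_to_node` are illustrative only.

```python
import os
import re


def cpus_on_node(node, sysfs_root="/sys/devices/system/node"):
    """Return the set of core numbers listed under the given NUMA node."""
    node_dir = os.path.join(sysfs_root, f"node{node}")
    cpus = set()
    for name in os.listdir(node_dir):
        # Match "cpu0", "cpu12", ... but not "cpulist" or "cpumap".
        match = re.fullmatch(r"cpu(\d+)", name)
        if match:
            cpus.add(int(match.group(1)))
    return cpus


def pin_to_node(node, pid=0):
    """Restrict a process (0 means the caller) to the cores of one NUMA node."""
    cpus = cpus_on_node(node)
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus
```

Note that this only constrains where threads run. Keeping memory allocations on the same node is a separate concern handled by the kernel's NUMA memory policy.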
udev is a replacement for the Device File System (DevFS) starting with the Linux 2.6 kernel series. It allows you to identify devices based on their properties, like vendor ID and device ID, dynamically. udev runs in userspace (as opposed to devfs which was executed in kernel space).
udev allows for rules that specify what name is given to a device, regardless of which port it is plugged into. For example, a rule to always mount a hard drive with manufacturer "iRiver" and device code "ABC" as /dev/iriver is possible. This consistent naming of devices guarantees that scripts dependent on a specific device's existence will not be broken.
sendfile is an efficient way to copy data between two file descriptors. Because the copying is done entirely in kernel space, it avoids moving the data into and out of a userspace buffer, and the extra system calls that a loop of read followed by write would need.
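Python exposes this system call as os.sendfile. Below is a minimal sketch of a file copy built on it, assuming a reasonably modern Linux kernel where the destination descriptor may be a regular file (on some other platforms it must be a socket). The function name `copy_with_sendfile` and the `chunk_size` parameter are illustrative.

```python
import os


def copy_with_sendfile(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path inside the kernel; return the bytes copied."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        offset = 0
        while remaining > 0:
            # sendfile may transfer fewer bytes than requested, so loop.
            sent = os.sendfile(
                dst.fileno(), src.fileno(), offset, min(chunk_size, remaining)
            )
            if sent == 0:
                break  # source was truncated while copying
            offset += sent
            remaining -= sent
    return offset
```

The data never passes through a Python bytes object, which is the point of the technique. The same call is commonly used to stream a file to a network socket.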