Multiprocessing by forking child processes

Published on 2024-12-29

Could be Interesting!

Parallel processing (concurrency) with child processes

System calls

System calls are implemented as functions within the kernel, the core of the operating system, and they are how a program asks the kernel to do work on its behalf. For example, open, close, read and write are four system calls for working with files. The easiest way to learn more about these functions is to read their manual pages by typing man 2 open in the terminal (man stands for manual, and section 2 of the manual covers system calls).

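#include <fcntl.h>   // open and the O_* flags
#include <unistd.h>  // close, read, write
int open(const char *pathname, int flags, mode_t mode); // e.g. int fd = open(argv[1], O_WRONLY | O_CREAT | O_EXCL, 0644);
int close(int fd);                                       // returns 0 on success, -1 on error
ssize_t read(int fd, void *buf, size_t count);           // returns bytes read, 0 at end of file, -1 on error
ssize_t write(int fd, const void *buf, size_t count);    // returns bytes written, -1 on error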
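The open function above returns a file descriptor, a small non-negative integer that refers to the opened file, or -1 on error. The touch command, which creates a file if it does not exist, uses the open system call to create it. Note: file descriptors 0, 1 and 2 are normally already open when a program starts: 0 is standard input (stdin), 1 is standard output (stdout) and 2 is standard error (stderr). Since open returns the lowest unused descriptor, the first file a program opens usually gets 3.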
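As a minimal sketch (assuming a POSIX system such as Linux), the four calls above can be combined into one round trip: create a file with the same flags as the example, write to it, then reopen it and read the contents back. The function name create_write_read and its parameters are hypothetical, chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open and the O_* flags */
#include <stdio.h>     /* perror */
#include <stdlib.h>
#include <string.h>    /* strlen */
#include <unistd.h>    /* read, write, close */

/* Create `path` (failing if it already exists, because of O_EXCL), write
 * `msg` into it, then reopen it and read the contents back into `buf`
 * (bufsz must be at least 1). Returns the number of bytes read, or -1. */
ssize_t create_write_read(const char *path, const char *msg, char *buf, size_t bufsz) {
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1) { perror("open for writing"); return -1; }
    /* fd is usually 3 here: 0, 1 and 2 are already taken by stdin, stdout and stderr */

    size_t len = strlen(msg);
    /* a short message to a regular file is normally written in one call; robust code would loop */
    if (write(fd, msg, len) != (ssize_t)len) { perror("write"); close(fd); return -1; }
    close(fd);

    fd = open(path, O_RDONLY);              /* the mode argument is only needed with O_CREAT */
    if (fd == -1) { perror("open for reading"); return -1; }
    ssize_t n = read(fd, buf, bufsz - 1);   /* leave room for the terminating '\0' */
    if (n >= 0) buf[n] = '\0';
    else perror("read");
    close(fd);
    return n;
}
```

Calling it twice with the same path makes the second call fail with -1, because O_EXCL refuses to open a file that already exists.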

Fork

Now that we have that out of the way: the fork system call creates a second process that is a clone of the first. The clone is called the child, and the process it was copied from is its parent. fork returns twice: 0 in the child, and the child's PID in the parent (or -1 in the parent if no child could be created). The child gets its own copy of the parent's virtual address space. To make this cheap, the kernel initially lets both processes share the same physical pages and only copies a page when one of them writes to it. This technique is called copy-on-write, a form of lazy copying. Note: the idea of forking is not to split the same task, but rather to perform different tasks in parallel.
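A small experiment makes the separate-memory behaviour visible. This is a sketch assuming a POSIX system; the function name parent_value_after_child_writes is hypothetical. The child changes its copy of a variable and passes the new value back as its exit status, while the parent's copy stays unchanged. Copy-on-write itself is not observable at this level: the program only shows that writes in one process do not reach the other.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child overwrites its copy of `counter` and reports the new value
 * through its exit status; the function returns the parent's copy. */
int parent_value_after_child_writes(int *child_value) {
    int counter = 1;
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {               /* child */
        counter = 42;             /* changes only the child's copy */
        _exit(counter);           /* exit status (0..255) carries the value to the parent */
    }
    int status;                   /* parent */
    if (waitpid(pid, &status, 0) == -1) return -1;
    *child_value = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return counter;               /* still 1: the child's write did not reach the parent */
}
```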

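pid_t pidOrZero = fork();              // needs <unistd.h>; returns -1 if no child was created
if (pidOrZero == 0) {
  execvp(argv[0], argv);               // child only: replace this process with the program in argv (e.g. {"ls", "-l", NULL})
  perror("execvp");                    // reached only if execvp failed
  _exit(127);                          // terminate the child, otherwise it would go on running the parent's code below
} else if (pidOrZero > 0) {
  int status;
  waitpid(pidOrZero, &status, 0);      // parent only (<sys/wait.h>): wait for the child and reap it, so it doesn't stay a zombie
} else {
  perror("fork");
}
// Reached only by the parent: the child either became the new program or exited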
A child process can also call fork to spawn its own child process.
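The fork/execvp/waitpid pattern above can be packaged into one reusable helper. This is a minimal sketch assuming a POSIX system; run_command is a hypothetical name, and returning 127 when execvp fails mirrors the exit code shells conventionally use for "command not found".

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec argv[0] in the child, and wait for it in the parent.
 * Returns the child's exit status, or -1 if fork/waitpid failed or the
 * child was killed by a signal. */
int run_command(char *const argv[]) {
    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return -1; }
    if (pid == 0) {                 /* child */
        execvp(argv[0], argv);      /* on success this never returns */
        perror("execvp");
        _exit(127);                 /* conventional "command not found" code */
    }
    int status;                     /* parent */
    if (waitpid(pid, &status, 0) == -1) { perror("waitpid"); return -1; }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit rather than exit so that it does not flush stdio buffers it inherited from the parent, which could print the same output twice. Because run_command is an ordinary function, a child process could call it too, which would create a grandchild.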

Pipes and signals

Pipes and signals are mechanisms we can use for communication between a parent and its children, and between sibling processes.

int pipe(int fds[2]); // fds[0] is the read end, fds[1] the write end; returns 0 on success, -1 on error
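How? pipe gives us two file descriptors: data written to fds[1] can be read from fds[0]. Because the pipe is created before fork, the child inherits both ends. If one process's standard output (fd 1) is redirected to the write end and another process's standard input (fd 0) to the read end, typically with dup2, the output of the first becomes the input of the second. That is how this command works: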
cat file.txt | uniq | sort
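To see a pipe and the stdin/stdout redirection in action with just two processes, here is a sketch (assuming a POSIX system; capture_output is a hypothetical helper name) in which a child's stdout is redirected into a pipe with dup2 and the parent reads what the child prints. A shell does the same thing on both sides of every |.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>      /* perror */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>     /* pipe, fork, dup2, execvp, read, close */

/* Run argv in a child whose stdout (fd 1) is the write end of a pipe, and
 * read what it prints from the read end in the parent. Stores at most
 * bufsz - 1 bytes plus a '\0' in buf. Returns the byte count, or -1. */
ssize_t capture_output(char *const argv[], char *buf, size_t bufsz) {
    int fds[2];                          /* fds[0]: read end, fds[1]: write end */
    if (bufsz == 0 || pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {                      /* child: the writer */
        close(fds[0]);                   /* the child never reads from the pipe */
        dup2(fds[1], STDOUT_FILENO);     /* stdout now refers to the pipe's write end */
        close(fds[1]);                   /* fd 1 keeps the pipe open */
        execvp(argv[0], argv);
        _exit(127);                      /* only reached if execvp failed */
    }

    close(fds[1]);    /* parent: the reader. Without this, read() would never see end of file */
    size_t total = 0;
    ssize_t n;
    while (total < bufsz - 1 && (n = read(fds[0], buf + total, bufsz - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);               /* reap the child */
    return (ssize_t)total;
}
```

Closing the unused ends matters: if the parent kept its copy of fds[1] open, read() would never return 0 (end of file), because a write end would still exist.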
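A process can send a signal to another process (for example with the kill system call). Each signal has a default action, such as terminating, stopping or continuing the receiving process, and a process can install a handler to catch most signals, which is somewhat similar to event-driven programming. There are many predefined signals: SIGINT (sent by Ctrl+C; terminates the process by default), SIGSTOP (stops the process), SIGKILL (kills the process; like SIGSTOP, it cannot be caught or ignored), SIGCONT (resumes a stopped process), SIGCHLD (when a child changes state, the kernel sends a SIGCHLD signal to its parent) and many more. But signals are asynchronous, which means they can interrupt a program at any point in its execution, and this can lead to unintended consequences.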
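The sketch below (assuming a POSIX system; terminate_child_with, wait_for_sigchld and on_sigchld are hypothetical names) shows both sides: sending a signal to a child with kill(), and catching SIGCHLD with a handler installed via sigaction. Because a signal can arrive at any moment, the handler only sets a flag, and SIGCHLD is blocked until the parent is ready to wait for it with sigsuspend.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* 1) Sending a signal: fork a child that sleeps until a signal arrives,
 *    send it `sig` with kill(), and return the signal that ended it (or -1). */
int terminate_child_with(int sig) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        for (;;) pause();                  /* child: wait for signals forever */
    }
    kill(pid, sig);                        /* default action of SIGTERM/SIGKILL: terminate */
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

/* 2) Catching a signal: the handler only sets a flag, because it can run
 *    at any point in the program. */
static volatile sig_atomic_t got_sigchld = 0;
static void on_sigchld(int sig) { (void)sig; got_sigchld = 1; }

/* Returns 1 once SIGCHLD has been received for a child that exited, -1 on error. */
int wait_for_sigchld(void) {
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1) return -1;

    /* Block SIGCHLD before forking so it cannot slip in between the flag check and the sleep. */
    sigset_t block, old_mask, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    got_sigchld = 0;
    pid_t pid = fork();
    if (pid == 0) _exit(0);                /* child: exit at once; the kernel then sends SIGCHLD */
    if (pid > 0) {
        while (!got_sigchld) sigsuspend(&wait_mask);  /* unblock SIGCHLD and sleep, atomically */
        waitpid(pid, NULL, 0);             /* the signal only notifies; the parent still reaps */
    }
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);     /* restore the previous disposition */
    return pid > 0 ? 1 : -1;
}
```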

An Example

cat file.txt | uniq | sort
1. Read loop: The shell runs in a loop, waiting for the user to type a command on its standard input (STDIN) and press Enter.
2. Parsing the input: The shell splits the line on the '|' delimiter into separate commands (and each command into its arguments). Each command's output will be connected to the next command's input.
3. Pipeline: The shell creates the pipes with the pipe() system call, each one before forking the children that use it, so that they inherit it. The output of cat is written to Pipe 1; uniq reads from Pipe 1 and writes to Pipe 2; sort reads from Pipe 2 and writes the final output to the terminal. In terms of file descriptors: cat (STDOUT) --> uniq (STDIN), uniq (STDOUT) --> sort (STDIN), and sort's STDOUT is simply the shell's own STDOUT, i.e. the terminal.
4. Forking: The shell forks one child per command. Each child attaches its STDIN/STDOUT to the right pipe ends with dup2, then calls execvp to replace itself with the command, which terminates when it is done. The shell waits for all the children with waitpid.
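The four steps can be sketched as a tiny pipeline runner. This is an illustration under clear assumptions, not how any particular shell is implemented: commands are split on '|' and then on spaces or tabs (no quoting, redirection or error recovery), the limits MAX_CMDS and MAX_ARGS are arbitrary, and the names parse_pipeline and run_pipeline are hypothetical. The out_fd parameter, also an assumption, lets the caller choose where the last command writes; passing STDOUT_FILENO gives the shell behaviour described above.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r, fork, dup2, ... */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CMDS 16   /* arbitrary limits for this sketch */
#define MAX_ARGS 32

/* Step 2: split `line` in place on '|' into commands, and each command on
 * spaces/tabs into a NULL-terminated argv. Returns the number of commands,
 * or -1 on a blank command or when a limit is exceeded. */
static int parse_pipeline(char *line, char *argvs[][MAX_ARGS]) {
    int ncmds = 0;
    char *save_cmd, *save_arg;
    for (char *cmd = strtok_r(line, "|", &save_cmd); cmd != NULL;
         cmd = strtok_r(NULL, "|", &save_cmd)) {
        if (ncmds == MAX_CMDS) return -1;
        int nargs = 0;
        for (char *arg = strtok_r(cmd, " \t\n", &save_arg); arg != NULL;
             arg = strtok_r(NULL, " \t\n", &save_arg)) {
            if (nargs == MAX_ARGS - 1) return -1;   /* keep room for the NULL */
            argvs[ncmds][nargs++] = arg;
        }
        if (nargs == 0) return -1;                  /* e.g. the middle of "a | | b" */
        argvs[ncmds][nargs] = NULL;                 /* execvp needs a NULL-terminated argv */
        ncmds++;
    }
    return ncmds;
}

/* Steps 3 and 4: one pipe between each pair of neighbouring commands, one
 * child per command. The last command writes to out_fd. Returns the exit
 * status of the last command, or -1 on error. */
int run_pipeline(char *line, int out_fd) {
    char *argvs[MAX_CMDS][MAX_ARGS];
    int n = parse_pipeline(line, argvs);
    if (n <= 0) return -1;

    pid_t pids[MAX_CMDS];
    int in_fd = STDIN_FILENO;                /* the first command reads the shell's stdin */
    for (int i = 0; i < n; i++) {
        int fds[2] = {-1, -1};
        if (i < n - 1 && pipe(fds) == -1) { perror("pipe"); return -1; }
        int write_fd = (i < n - 1) ? fds[1] : out_fd;

        pids[i] = fork();
        if (pids[i] == -1) { perror("fork"); return -1; }
        if (pids[i] == 0) {                  /* child i */
            if (in_fd != STDIN_FILENO) { dup2(in_fd, STDIN_FILENO); close(in_fd); }
            if (write_fd != STDOUT_FILENO) { dup2(write_fd, STDOUT_FILENO); close(write_fd); }
            if (fds[0] != -1) close(fds[0]); /* that read end belongs to the next command */
            execvp(argvs[i][0], argvs[i]);
            perror("execvp");
            _exit(127);
        }
        /* Parent: close its copies so each reader sees end of file once its writer exits. */
        if (in_fd != STDIN_FILENO) close(in_fd);
        if (fds[1] != -1) close(fds[1]);
        in_fd = fds[0];                      /* the next command reads from this pipe */
    }

    int status, last = -1;
    for (int i = 0; i < n; i++) {            /* reap every child; keep the last one's status */
        if (waitpid(pids[i], &status, 0) == -1) { perror("waitpid"); return -1; }
        if (i == n - 1) last = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
    return last;
}
```

For example, run_pipeline(line, STDOUT_FILENO) with line holding "cat file.txt | uniq | sort" runs the example above (line must be a writable char array, since strtok_r modifies it). Keep in mind that uniq only removes adjacent duplicate lines, so the more common idiom for removing all duplicates is sort | uniq.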

Note: we can't predict the order in which the child processes run. That is decided by the kernel's scheduler and can differ from run to run. Here, the children wait for each other implicitly because of their dependence on the pipes: a read() on an empty pipe blocks until data arrives (or until every write end is closed, which signals end of file). So uniq can't do anything until cat writes data, and likewise sort can't do anything until uniq writes data. In other words, the data flow dictates the flow of execution. The point of this example is not concurrency itself, but to demonstrate pipe communication between processes.
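The unpredictable ordering can be observed by forking a few children that do nothing but exit and letting the parent reap them with wait(), which returns whichever child has finished. This is a sketch assuming a POSIX system; reap_in_finish_order is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n children (1 <= n <= 255); child i simply exits with status i.
 * The parent reaps them with wait() and records their indices in the
 * order they were reaped. Returns how many children were reaped, or -1
 * if a fork failed. */
int reap_in_finish_order(int n, int order[]) {
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1) return -1;          /* sketch: earlier children are left unreaped here */
        if (pid == 0) _exit(i);            /* child i */
    }
    int reaped = 0, status;
    while (reaped < n && wait(&status) > 0) {   /* wait() returns any finished child */
        if (WIFEXITED(status)) order[reaped++] = WEXITSTATUS(status);
    }
    return reaped;
}
```

Printing order[] over several runs may or may not show different orders; the only guarantee is that each child is reaped exactly once.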