Tuesday, July 29, 2025

CST334: Week 5 (Week 29)

This week we learned about threads, concurrency, and locks.


We learned that a process has at least one thread and can have multiple threads, each with its own stack. Concurrency is when multiple threads are able to time share on a CPU. On a single core the threads aren't actually running simultaneously, since only one runs at a time, but they can interleave. Sharing resources between threads while avoiding data corruption and undefined behavior can be a challenge, and we learned how locking is a powerful solution to this problem. In one of our simple examples we discussed how, with two threads reading, writing, incrementing, and decrementing a global variable in a process, the variable's state can become unpredictable. If a mutex is defined such that each thread grabs the lock before its read, write, or modify operations, the value of the shared variable will be predictable and accurate.


Locks avoid race conditions, where the value of a shared resource like a variable depends on which thread accesses or modifies it first. Determinism means a program's output or behavior can be predicted from its inputs and its code, and locks help ensure this, as the race condition example demonstrated. Atomicity is implemented at the hardware level and is necessary for locks to work. An operation is atomic if it executes on the CPU as one uninterruptible operation. We learned that the test-and-set instruction is one such operation. Without this hardware/ISA-supported operation, mutexes would not work.


Lastly we learned about critical sections and considerations around blocking. Critical sections are the portions of code executed between the lock and unlock operations. Ideally they should be as small as possible so that a thread does not hold the lock longer than necessary and keep other threads waiting. Blocking happens when a thread needs to wait for some resource to become available or for a task to complete. For example, a thread may block while waiting for data to be returned by a disk read operation.


We're well beyond the days of single-core CPUs being dominant in most devices, and a lot of the applications that we use every day run with multiple threads. This section was very helpful in demystifying what happens under the hood.

Monday, July 21, 2025

CST334: Week 4 (Week 28)

This week we learned about paging, swapping, caching, and translation lookaside buffers. Though paging seems slightly more complex than segmentation, it strikes me as a better approach if a system can only use one or the other. Segmentation seems more intuitive since it involves logical units that come up a lot in classes outside of OS and compiler courses - the code segment, stack segment, and heap segment - and it focuses on those segments, their bases, and their offsets. That said, this week we learned how paging uses a fixed block (page) size in virtual and physical memory addressing which applies to the entirety of the process's memory - code, stack, and heap. Having fixed page sizes can lead to some internal fragmentation, but it eliminates external fragmentation; by design, pages are sized small enough to keep internal fragmentation low yet large enough for even a single page to be useful. Other than that, it takes practice to understand virtual-to-physical address translation and to do it by hand, but it comes quickly enough.

We also covered swapping, which allows pages to be moved between main memory and secondary storage (e.g. disks and flash). Since memory is expensive and limited, since many processes can be present in memory at once, and since an entire process does not need to be in main memory for the entire time that it is running, the goal of swapping is to keep the relevant data and code in main memory and swap additional data in only when it is needed. Knowing how much slower even an NVMe SSD is than RAM, ideally when configuring a system we want to pay attention to how much RAM we equip it with.

We also learned about caching. Cache is another type of volatile, on-chip memory: faster than RAM, but not in the datapath like actual CPU registers. Depending on the algorithm, data that has been accessed recently, or that is adjacent to recently accessed memory, is kept in cache. Lastly, we learned about translation lookaside buffers. TLBs store recently used virtual-to-physical address mappings; on a TLB miss, a full address translation has to be performed, and that is a slower operation.

The last two weeks have been focused on process memory addressing, protection, speed, and sizing concerns. We only scratched the surface of this topic compared to the available implementations and the body of work on the subject but I feel like I now have a much better grasp of how it works.

Sunday, July 13, 2025

CST334: Week 3 (Week 27)

 This week we spent time looking at memory and some of the challenges and solutions to maintaining performance and safety on systems that need to multi-task with limited resources. We learned how process address spaces are defined, how address space separation is maintained, how multiple processes can co-exist in memory, and how the memory hierarchy can be used to maximize resource sharing for many processes with finite system resources.

We learned about virtual memory and address translation, which are central to how modern general-purpose operating systems function. A system has a finite amount of RAM/memory but has many processes active at the same time. To maintain the illusion that each process has all of its resources available in memory, virtual addresses and address translation are used to employ different layers of the memory hierarchy. Whereas it appears to a process that all of its resources exist in RAM, they may actually map to flash memory, for example. Virtual addresses, bases, bounds, and offsets are what enable this approach. Highlighting the benefit of having an MMU was important since it explains how address translations can be executed quickly in hardware, how an extra layer of security is enforced, and why Linux, Windows, and macOS can run on an ARM Cortex-A but not on an ARM Cortex-M.

We learned how process memory is broken down into segments - code, stack, and heap. The clear benefit here is that a process doesn't require a single contiguous block of memory to run. This would be prohibitive. Instead, smaller portions can be allocated and distributed on the memory hierarchy using virtual memory. 

We learned about different memory allocation schemes and their trade-offs. Best fit, first fit, worst fit, etc. Some approaches are better at avoiding fragmentation, some are faster, some are less likely to result in failed allocations. There is no perfect scheme in most cases since in any non-trivial application, it isn't possible to predict how much memory will be requested. I was able to look at the Linux kernel's approach to memory allocation and learned that those simple approaches are still an important part of how deployed, production OSes manage free memory.

Tuesday, July 8, 2025

CST334: Week 2 (Week 26)

This week we covered scheduling, processes, and the C process APIs (fork, yield, etc). We learned about the relationship between parent and child processes and how a child starts with a copy of the parent's memory space rather than sharing it. We also learned about software threads and how they are functional execution units of processes.

With regard to the scheduling algorithms, we were able to see how starvation could happen. For example, in a preemptive scheduling system with a simplistic shortest-duration scheduler, as long as newly arriving tasks have shorter durations, a longer task could starve. We were also able to see that scheduling algorithms necessarily make trade-offs. An algorithm like round robin buys us some level of fairness, but at the cost of longer turnaround times. The shortest-remaining-time-first algorithm cuts down on average turnaround time, but as I mentioned before, there is the risk of process starvation.

Lastly, we spent some time learning about limited direct execution, where even though applications run on the CPU directly, they do not have access to all of the system's memory. We learned about user mode and kernel mode privileges. If an application wants to access hardware, it needs to make a system call, which triggers a software trap. That allows kernel mode software to perform the low level operations before returning control to the application. Context switches come with a time penalty since they require saving the CPU state to memory and restoring it when the task is complete. Still, it is worth it, since limited direct execution makes systems more secure and stable: it prevents bad actors from accessing sensitive areas of memory, and it keeps faults and bugs in application code from putting the whole system into a fault state.


Tuesday, July 1, 2025

CST334: Week 1 (Week 25)

 We started off by learning/reviewing basic computer architecture and laying the groundwork for this class. That included things like the memory hierarchy and trade-offs, what is meant by instruction set architecture, what makes an operation atomic, etc.

We learned about how to deploy a docker container on our system as a way to run a lightweight OS environment within the host OS environment. In this case it was a good way to distribute a sandbox environment to everyone in the class.


We learned about Windows PowerShell, which has many of the commands that are so useful on *nix systems. To use those commands, we learned about man (manual) pages and common command line switches for common command line utilities. For example, the "-h" or "--help" option almost always displays the help page for a utility. For the ls command, "-l" (long format) is useful for displaying more data about directory contents, and "-a" (all) enables displaying hidden files.


Those commands are useful in shell scripts, which we also learned about. Shell scripts can be as simple as just being a list of commands in a text file. They are a good way to automate sequences of commands or operations that you might normally do by hand - typing in each command at the prompt.

We also got some exposure to C and learned about how it is different from C++ and Java. One key difference is that it is not an object-oriented language. It also handles references differently, and by default function parameters are passed by value. Another distinction is that there is no automatic garbage collection: memory allocated on the heap must be freed manually, which has to be remembered to prevent memory leaks.


We learned a bit about Makefiles and the GNU debugger, gdb. Compiling with the debug flag (-g) makes sure that debug symbols are built into the output executable. These symbols allow us to do things like set breakpoints and step through the code.


This week was about pinning down the fundamentals.


CST370: Week 7 (Week 58)

 This week we covered non-comparison sorting, dynamic programming, Warshall's algorithm, Floyd's algorithm, Greedy Technique, and Pr...