fblktrace - Trace I/O on the file block level

Originally published on 2018-11-12.

When Brendan Gregg gave his "Performance Analysis Superpowers with Linux BPF" talk at the Open Source Summit in Los Angeles last year, he wasn't messing around. Using the new eBPF tracing tools really feels like gaining x-ray vision powers: suddenly, opening the program's hood is no longer necessary to see in detail how it is behaving internally.

I had the chance to apply these newly gained powers last month, when I was asked to develop a small tool to trace when, and which parts of, each file were being accessed for the first time during a program's initialization. This request came with a few constraints: the information had to be available regardless of the buffered I/O method being used (synchronous, aio, memory-mapping); it had to be trivial to correlate the block data with the files being accessed; and, most importantly, the tracing code could not impose a large performance impact on the observed system.


Posted Sun Apr 21 16:40:31 2019

Performance Analysis in Linux - Part 2

Originally published on 2017-09-19.

This blog post is based on the talk I gave at the Open Source Summit North America 2017 in Los Angeles. Let me start by thanking my employer, Collabora, for sponsoring my trip to LA.

Last time I wrote about performance assessment, I discussed how an apparently naive code snippet can hide major performance drawbacks. In that example, the issue was caused by the randomness of the conditional branch direction, triggered by our unsorted vector, which thoroughly confused the branch predictor inside the processor.

An important thing to mention before we start is that performance issues arise in many forms and may have several root causes. While this series has focused on processor corner cases, those are in fact a tiny sample of how things can go wrong for performance. Many other factors matter, particularly well-thought-out algorithms and good hardware. Without a well-crafted algorithm, no compiler optimization or quick hack can improve the situation.

In this post, I will show one more example of how easy it is to disrupt the performance of a modern CPU, and also take a quick look at why performance matters - as well as at a few cases where it shouldn't.


Posted Sun Apr 21 16:35:20 2019

Tracing the user space and Operating System interactions

Originally published on 2017-03-31.

Like the bug that no one can solve, many issues occur at the interface between the user application and the operating system. But even in the good Open Source world, understanding what is happening at these interfaces is not always easy. In this article, we review some of the tools available to trace the calls being made among the kernel, libraries and user applications.

Tracing System Calls with strace

=strace= traces both directions of the interaction between the kernel and the evaluated application: it records when the application executes a system call, and when the operating system delivers a signal to the process.
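As a quick illustration, the two invocations below show the most common modes of use; the traced commands are arbitrary stand-ins:

```shell
# Log every file-related system call made by the command (and its
# children, because of -f) into trace.log
strace -f -e trace=file -o trace.log cat /etc/hostname

# Print a summary table of call counts and time spent per system
# call, instead of logging each call individually
strace -c true
```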


Posted Sun Apr 21 16:28:52 2019

Linux Block I/O tracing

Originally published on 2017-03-28.

Like starting a car with the hood open, sometimes you need to run your program with certain analysis tools attached to get a full sense of what is going wrong - or right. Whether to debug an issue, or simply to learn how a program works, these probing tools can provide a clear picture of what is going on inside the CPU at a given moment.

In kernel space, the challenge of debugging a subsystem is much greater. You cannot always insert a breakpoint, step through each instruction, print values and dereference structures to understand the problem, as you would with GDB in user space. Sometimes attaching a debugger requires additional hardware, or you may not be able to stop at a specific region at all because of the overhead it would create. Much like the rule of never calling printk() inside the printk code itself, debugging at early boot time or analyzing very low-level code poses challenges of its own, and more often than not you may find yourself with a locked-up system and no meaningful debug data.

With that in mind, the kernel includes a variety of tools to trace general execution, as well as very specialized mechanisms that allow the user to understand what is happening in specific subsystems. From tracing I/O requests to snooping network packets, these mechanisms give developers deep insight into the system when an unexpected event occurs, allowing them to understand issues without relying on the very primitive printk() debugging method.

So large is the variety of tools specialized for each subsystem that discussing them all in a single post would be counter-productive. The challenges and design choices behind each part of the Linux kernel are so diverse that we eventually need to focus our efforts on specific subsystems to fully understand the code, instead of looking for a one-size-fits-all tool. In this article, we explore the Linux block I/O subsystem, in an attempt to understand what kind of information is available and what tools we can use to retrieve it.


Posted Sun Apr 21 16:23:37 2019

Performance Analysis in Linux

Dynamic profilers are tools that collect statistics about applications while they are running, with minimal intrusion on the application being observed.

The kind of data that profilers can collect varies widely, depending on the requirements of the user. For instance, one may be interested in the amount of memory used by a specific application, or maybe the number of cycles the program executed, or even how long the CPU was stalled waiting for data to be fetched from the disks. All this information is valuable when tracking down performance issues, allowing the programmer to identify bottlenecks in the code, or even to learn how to tune an application for a specific environment or workload.

In fact, maximizing performance, or even understanding what is slowing your application down, is a real challenge on modern computer systems. A modern CPU carries so many hardware techniques to optimize performance for the most common usage case that, if an application doesn't intentionally exploit them, or worse, accidentally falls into the special uncommon case, it may end up experiencing terrible results without doing anything apparently wrong.

As an example, let's take one quite non-obvious way in which things can go wrong.


Posted Sun Apr 21 16:08:30 2019

Using the ArchC ARMv7 platform simulator

ArchC is a framework for creating single-process virtual machines by describing the modeled architecture in a high-level description language. The number and size of registers, the instruction encoding, memory devices and several other architectural details can be described using ArchC.

ArchC was, however, limited to describing the processor core and a few other components, and was not capable of generating more complex platform virtual machines, able to execute a full operating system, for example.

In this project, we used an ARMv7 core modeled in ArchC and implemented the several other peripherals and SoC components required to fully simulate a Freescale iMX53 Quick Start Board. These include functional modules, such as memory management units, UARTs, storage devices and buses, as well as modules that improve simulator performance, such as simulated TLBs and a cache for already-decoded instructions. Our goal was to boot a full GNU/Linux system in the simulated environment.

The simulator has been used since 2013 to teach computer architecture and Assembly language to undergraduate students at UNICAMP. In this course, students are required to implement several parts of an operating system, such as device drivers, schedulers and syscall handlers, as well as some userland programs, all in ARM Assembly. Their code runs on the simulator, allowing them to collect information and data about their programs that would not be available otherwise.

The development of this first ArchC-based platform simulator allowed us to identify some deficiencies in ArchC when describing complex models. As a result, several modules were added to the ArchC framework, such as a new decoder unit and a cache for decoded instructions with support for self-modifying code.

All of the simulator code, the ROM code that performs bootstrapping, and the small operating system we created for use in the classroom are available under the GPLv3 license on my Gitweb page.


Posted Sun Apr 21 15:53:22 2019

SINAR - SINAR Is Not A Radar

Lack of attention, combined with driving at high speed, is considered by the World Health Organization to be one of the main causes of car accidents worldwide. Alongside traffic signals and the visual pollution of big cities, several gadgets have been introduced into vehicle panels, such as GPS devices, parking assistants and rear-view cameras. All these new technologies contribute to taking the driver's attention away from the road, which might lead to several kinds of accidents. SINAR is a low-cost, real-time embedded system designed to redirect drivers' attention back to their own speed.

SINAR is a Free Software tool that helps enforce respect for speed limits by providing drivers with a visual warning of their current speed, in a way that does not divert their attention from the main task of safely driving the vehicle. It is built from a video camera capable of capturing images of street lanes, a panel of LEDs to display the vehicle's speed, and a simple embedded computer running the SINAR application on a GNU/Linux distribution.


Posted Sun Apr 21 15:46:17 2019

This blog is powered by ikiwiki.