Gnarly CMake Heisenbug – resolved thanks to strace.

I’m working on a relatively complex project with many dependencies. It is organised as a git super project that aggregates several tens of git submodules. The author has chosen to implement a strategy of performing the submodule work directory management through cmake functions as opposed to using recursive git clone techniques.

Following latest best practice, I am applying continuous integration to this project, but hit a few problems with gitlab authentication. To figure out where those problems were, I started instrumenting the cmake functions to capture the standard output and standard error streams when the git commands were invoked. Since I wanted to be able to see this clearly for each of the large number of submodules, I chose to use the OUTPUT_FILE and ERROR_FILE options of the cmake execute_process function.

When I wrote the cmake functions, I was far enough removed from the actual calls that would be made that I thought of the submodule name passed to each function as a simple token.

When triggering the CI, the build was failing with incomplete working directories for the submodules. The last output I saw from cmake was a bare “No such file or directory”, with little additional context to infer which missing file or directory was causing the problem.

Eventually, I reached for my most trusted debug tool for when I am having difficulty resolving errors from an interacting set of shells. In this case the combination was CI spinning up a docker container, running some kind of bash shell, driven by a .gitlab-ci.yml file, running a git clone followed by a cmake instruction, triggering a cascade of subsidiary CMakeLists.txt scripts calling cmake helper functions, calling git. Keeping track of the current working directory and the various flavours of text variables, with much substitution, makes the thread of control difficult to follow.

So, simply running “strace -f -o /tmp/trace.output cmake ../” in my own instance of the docker container helped me see the problem immediately: there was an open call trying to access a filename of the form “git.output.foo/Bar/Baz”, and its parent directory clearly did not exist, which was what cmake was unhappy about.

The problem was obvious in hindsight. For top level submodules with a simple name, “git.output.foo” would be a perfectly valid name for OUTPUT_FILE. For a nested submodule whose name includes a path separator character, however, I was asking cmake to create an output file with an invalid path. What confused me was that I was focussed on the error from the git command being invoked, and was ignoring the fact that the cmake machinery might have a separate internal error. Trying to reproduce the error by running the git command directly in an interactive shell failed because the failure was never in git: it was in cmake opening the output file.
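One way to avoid this class of failure is to flatten the submodule name before using it in a log file name. The sketch below is in Python rather than cmake, purely to illustrate the idea; the function name log_path_for, the “__” replacement and the “logs” directory are all hypothetical and not taken from the project. In cmake the same effect could presumably be had with string(REPLACE ...) on the submodule name before building the OUTPUT_FILE argument.

```python
from pathlib import Path


def log_path_for(submodule: str, log_dir: Path, stream: str = "output") -> Path:
    """Return a flat log file path for a possibly nested submodule name.

    Hypothetical helper: "foo/Bar/Baz" becomes "git.output.foo__Bar__Baz",
    so the name can never introduce directories that do not exist.
    """
    # Normalise Windows-style separators, drop leading/trailing slashes,
    # then replace every remaining separator with a harmless token.
    flat = submodule.replace("\\", "/").strip("/").replace("/", "__")
    return log_dir / f"git.{stream}.{flat}"


# A simple name is left alone; a nested name is flattened.
print(log_path_for("foo", Path("logs")).name)
print(log_path_for("foo/Bar/Baz", Path("logs"), stream="error").name)
```

The alternative would be to keep the nested name and create the parent directories first; flattening just keeps all the logs in one place.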

When in doubt, strace. The system calls and the data being passed from user space to the kernel will tell you a lot about many processes, and make it really clear where some classes of fault reside.

STS : Git Hooks

Today’s attempt to “sharpen the saw” (STS) is to get into the world of git hooks to improve my productivity. In particular, to try and keep track of what I’ve been working on, where and when. Conceptually simple, but necessary, as I work on around 50 different machines, so the number of accounts, filesystems, operating systems and so forth that I am involved with can make it difficult to context switch efficiently.

The first concept I want to work on is inspired by some Emacs lisp extensions I wrote 15 years ago to keep track of which files I had been working on. This used emacs hooks to make log entries on file open and file save events. See my-emacs-stuff.

My first attempt at a git post-commit logging script generates two lines of information. The first identifies the working context, such as account, hostname, working directory and working repository. The second summarises the commit. These lines are appended to $GITLOG. The code can be found at my-git-hooks.
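For illustration, a post-commit hook along these lines might look like the sketch below. This is not the script from my-git-hooks: the line layout, the helper names and the fallback of ~/.gitlog when $GITLOG is unset are all assumptions made for the example.

```python
#!/usr/bin/env python3
import getpass
import os
import socket
import subprocess
from datetime import datetime, timezone


def format_entry(when, user, host, cwd, repo, summary):
    """Build the two log lines: working context first, commit summary second."""
    context = f"{when} {user}@{host} cwd={cwd} repo={repo}"
    return f"{context}\n    {summary}\n"


def git(*args):
    """Run a git command and return its stripped standard output."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def main():
    entry = format_entry(
        when=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        user=getpass.getuser(),
        host=socket.gethostname(),
        cwd=os.getcwd(),
        repo=git("rev-parse", "--show-toplevel"),
        summary=git("log", "-1", "--format=%h %s"),
    )
    # Assumption of this sketch: fall back to ~/.gitlog if GITLOG is unset.
    log_file = os.environ.get("GITLOG", os.path.expanduser("~/.gitlog"))
    with open(log_file, "a", encoding="utf-8") as handle:
        handle.write(entry)
```

To try it, save it as .git/hooks/post-commit in a repository, add a call to main() at the bottom and make the file executable.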

As with many git features, this is both really powerful, and yet not supported in a way that would be fully useful. What I mean by this is that hook scripts get installed in the .git/hooks folder of a project, and since this location is intrinsically part of the git metadata infrastructure, git does not allow the hook files there to be added to the project’s own repository. Googling will find at least a dozen third party solutions and tool extensions to solve the problem.

Ubuntu 18.04 X on XPS 15 9560

A short note on the steps I needed to go through to successfully get Ubuntu 18.04 dual booting with Windows 10 on a Dell XPS …

Initial installation from USB was relatively pain free, though I had to iterate to install the grub bootloader on the main disk /dev/nvme0n1 rather than on the boot partition I had selected of /dev/nvme0n1p5.

Brief note on grub if you get dropped to a grub shell. The shell is quite useful, but do note that in grub terminology the root device is the partition upon which a grub/grub.cfg may be located and hence if you are using separate /boot and / (root) partitions for a linux install, the grub root is the same as the linux /boot.

After resolving the bootloader issue, the next problem was that although the live installer X session had performed well, and a gdm graphical login page came up, after logging in, the machine simply shut down.

Using a text TTY login, I first ran dpkg-reconfigure console-setup and selected the largest possible (16×32) VGA fonts so that I could work comfortably on the 4k screen (Google hidpi). I ran an apt-get update to update packages, but this did not resolve the issue.

Shutting down the GDM server (/etc/init.d/gdm3 stop) and trying to generate a new Xorg.conf file (Xorg -configure) just resulted in the machine shutting down.

The next gambit was to check out the graphics hardware and consider whether alternate drivers or a different X configuration (wayland/…) would be helpful. From the Windows device manager I could see display hardware including Intel HD graphics 630 complemented by NVidia GeForce GTX 1050. Note to self, the GTX 1050 supports CUDA and this would be interesting to try.

From linux, had I not had a working Windows 10 install, “apt install hwinfo” and “hwinfo --gfxcard --short” were the recommended probe commands to find out more, but very surprisingly, the hwinfo command launched X… and killed the machine. On login, Ubuntu puts up a text message that “Your Hardware Enablement Stack (HWE) is supported until April 2023”. Must find out what that means. “lshw -C display” also triggers the start of an X server, so I clearly need to identify the hooks and take them out.

From the Nvidia site, the latest Linux 64 bit compatible driver for the GTX 1050 notebook card is the 440.31 series, released 2019/11/4 (the year comes first, so this reads as 4 November 2019).

“apt search nvidia-driver” gives the closest equivalent as nvidia-driver-435 which the Nvidia information confirms is also compatible and appears to be an equivalent up to date release but with a shorter intended support life. It was released 29/08/2019. So “apt-get install nvidia-driver-435”, “reboot” and we’re done.

Root filesystem rescues

My Ubuntu 18.04 distribution on a laptop locked up. A power cycle later and I was dropped into a minimal busybox shell. Oops. Time to order a new hard disk and rebuild everything? Perhaps not.

First issue was that the Ubuntu setup had a cryptfs setup, so it was not immediately obvious how to check the filesystem. First attempt – to boot to the secondary Centos7 distribution on the second disk. However, this bailed out at an early stage, leaving a slightly more functional root shell.

The Centos7 install used xfs filesystems on top of LVM2. So, from the root shell:

vgscan # List the volume groups, checking that the expected one is found

vgchange -ay # Activate all volumes in the VG

lvs # List the volumes, or alternatively ls -l /dev/<VGNAME>

mkdir -p /mnt/myvg/{root,home}

mount /dev/myvg/root /mnt/myvg/root

mount /dev/myvg/home /mnt/myvg/home

Looking around, the filesystems seemed to be in reasonable shape. I was also able to verify that the filesystems were xfs with a couple of “stat -f -c %T /path” commands. I then unmounted both filesystems and checked them with “xfs_repair -n /dev/myvg/root” and similarly for home. Apparently no problems.

Studying the output from the boot more closely, it became apparent that the kernel command line root attribute had been passed incorrectly. root=/dev/dm-1 was present, but in fact the root filesystem for the Centos7 install was on /dev/dm-0. Correcting this from the command line brought up a functional Centos7. Looking back at the boot partition on the first disk confirmed that the incorrect root device path was in the grub.cfg.

However, Centos7 had no cryptfs tools installed, and reading about the packaging, I discovered that even persuading it to install a cryptfs-utils package would be fruitless without a corresponding kernel module supporting encryption. I tried booting to a rescue distro on a USB, but that lacked cryptfs tools as well.

Booting the first disk again and poking around from the busybox shell, I was able to see that the basic decryption of the disk had proceeded successfully, as I was able to mount the root and home partitions via the paths /dev/mapper/volume-name. Again, “stat -f -c %T /path” let me verify that the Ubuntu system was on standard ext partitions. Unmounting again, I ran some fsck commands and sure enough, there were some filesystem corruptions that required correction. Fortunately after that, I was back in business and needed only to reboot Ubuntu and then correct the bootloader in case I want to boot Centos7 again at any time in the future.

All in all, a useful half hour revising basic disk recovery procedures and I learned a few handy commands. Plus, during my google searches for how to install various packages on Centos7, I came across an interesting, if rather contentious set of exchanges between the systemd developers and the linux kernel crew. This was very informative about how to constructively approach kernel/user-space interactions.

The second disk boot issue with grub seems to be a bug in grub2 or the related os-prober tool, as discussed to some extent here https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1065196

To do when time permits: follow up on this issue and see if I can figure out how to contribute a fix (assuming it hasn’t been fixed in the meantime, in which case I will just need to upgrade).

Aerobatic Experience

A fun afternoon thanks to the team at https://www.ultimateaerobatics.co.uk/ who definitely deserve the title Ultimate.


20 minute aerobatic flight in an Extra 330LX at White Waltham Airfield, Maidenhead. Thanks to Simon for teaching me to loop and roll, and for demonstrating some of what this incredible machine can do. Highly recommended experience.

The plane, weight around 650kg

https://www.ultimateaerobatics.co.uk/extra-330lx

360 view of the cockpit

https://www.ultimateaerobatics.co.uk/extra-360

The engine, horizontally opposed 6 cylinder 300hp

https://en.wikipedia.org/wiki/Lycoming_IO-580

Power to weight ratio: 300hp is about 224kW, so at the 650kg quoted above that is around 340W/kg, or nearer 280W/kg if you take 800kg. F1 is 1500W/kg. However, this is irrelevant since when F1 cars go up in the air, it is (a) rarely fun and (b) they don’t manage to stay up very long.
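For anyone wanting to check the sums, the arithmetic is just horsepower converted to watts and divided by mass. The 300hp, 650kg and 800kg figures are the ones quoted in this post; the conversion of roughly 745.7W per mechanical horsepower is the standard one, and the function name is only for illustration.

```python
HP_TO_WATTS = 745.7  # one mechanical horsepower in watts (approximate)


def power_to_weight(power_hp: float, mass_kg: float) -> float:
    """Return the power to weight ratio in W/kg."""
    return power_hp * HP_TO_WATTS / mass_kg


for mass_kg in (650, 800):
    ratio = power_to_weight(300, mass_kg)
    print(f"{mass_kg} kg: {ratio:.0f} W/kg")
# 650 kg: 344 W/kg
# 800 kg: 280 W/kg
```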

Flight manual here https://docs.wixstatic.com/ugd/dbea38_3f49032053e24cf9a18c991f46204a40.pdf

My pilot – Simon Abbott

5 minute aerobatic demo on youtube with this plane

https://www.youtube.com/watch?v=2fXpDJTwrMQ

More Mandarin Resources

Been catching up on my Mandarin homework and trying to get my writing up to scratch. First task – put together some tools to capture short phrases for reading/writing exercises via worksheets. Already having done some work to capture my lecture notes via LaTeX, it seemed appropriate to knock something together with a Python/LaTeX combination.

So – the initial sketch is done and I’ve put it all on my mandarin repo at github. I’ve also copied the TeX and PDF files for anyone without a Linux setup ready to reproduce the whole workflow for themselves.

One itch left to scratch – I had to drop back to Emacs to edit this, since I’ve got the input tools set up nicely to switch via Ctrl+8/Ctrl+9 between English and Pinyin input. Works flawlessly for Emacs. However, getting Vim to play ball with its modal editing style is a more interesting challenge. Number of keen Vim users that have English as a native language and want to write in Chinese? Relatively few that I have found so far.

Clang Powertools

Context : following episode 103 of Jason Turner on ‘Learning Modern C++’ on a Windows 10 laptop. In theory, having installed MS Visual Studio 2017 Community Edition, plus the LLVM tools, plus the Clang Power tools, cppcheck and a couple of other extensions, typing C++ errors should automatically give clang-tidy warnings and fixes. Nope.

Just get a bunch of errors in the output window on triggering clang-tidy, stating that the .ps1 (power shell?) script in the psClang extensions project cannot overwrite variable HOME because it is read-only or constant. Googling doesn’t get me very far. Also – the features that were in the GUI when JT’s video was made have shifted.

Just another day in the open source software world, where features evolve fast but it is not easy to make sure they keep working.

Could be a setup/permissions issue to do with my work laptop, so next task is to try and reproduce on another Windows 10 box, ideally one with a standard installation. Can we do this from Linux with Virtual Box (at least for the purposes of testing)? Perhaps – but a task for another day.

First check was to dust off an old Acer laptop with an (albeit very slow) standard Windows 10. Repeated the installs of MS Visual Studio, clang powertools, clang format, cppcheck, plus LLVM. This time I get the warnings, but not the fix option, and more significantly, don’t get nearly as many of the warnings as Jason did.

Time to see if I can run the clang tools independently for linting from the command line on linux (I assume the clang powertools and clang format VS code extensions are just wrappers over clang CLI tools).
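As a starting point for that experiment, here is a small Python wrapper sketch around the clang-tidy command line. The check groups (modernize-*, readability-*) and the C++17 standard are assumptions picked for the example, not the settings used in the video, and the function names are hypothetical.

```python
import shutil
import subprocess


def clang_tidy_command(source, checks="modernize-*,readability-*",
                       std="c++17", fix=False):
    """Build the clang-tidy argument list for a single source file."""
    cmd = ["clang-tidy", source, f"-checks={checks}"]
    if fix:
        # Ask clang-tidy to apply the suggested fixes in place.
        cmd.append("-fix")
    # Everything after "--" is passed to the compiler, not to clang-tidy.
    cmd += ["--", f"-std={std}"]
    return cmd


def run_clang_tidy(source, **options):
    """Run clang-tidy on one file and return its exit code, or None if absent."""
    if shutil.which("clang-tidy") is None:
        print("clang-tidy not found on PATH")
        return None
    return subprocess.run(clang_tidy_command(source, **options)).returncode
```

Calling run_clang_tidy("example.cpp") should then print whatever warnings clang-tidy produces for that file, which makes it easier to compare against what the Visual Studio extension reports.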