I have been recently watching The Americans, a decade-old TV series about undercover KGB agents living disguised as a normal American family in Reagan’s America in a paranoid period of the Cold War. I was not expecting this weekend to be reading mailing list posts of the same type of operation being performed on open source maintainers by agents with equally shadowy identities (CVE-2024-3094).
As The Grugq explains, “The JK-persona hounds Lasse (the maintainer) over multiple threads for many months. Fortunately for Lasse, his new friend and star developer is there, and even more fortunately, Jia Tan has the time available to help out with maintenance tasks. What luck! This is exactly the style of operation a HUMINT organization will run to get an agent in place. They will position someone and then create a crisis for the target, one which the agent is able to solve.”
The operation played out over two years, getting the agent in place, setting up the infrastructure for the attack, hiding it from various tools, and then rushing to get it into Linux distributions before some recent changes in systemd were shipped that would have stopped this attack from working.
An equally unlikely accident resulted when Andres Freund, a Postgres maintainer, discovered the attack before it had reached the vast majority of systems, from a probably accidental performance slowdown. Andres says, “I didn’t even notice it while logging in with SSH or such. I was doing some micro-benchmarking at the time and was looking to quiesce the system to reduce noise. Saw sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd. Which showed lots of cpu time in code with perf unable to attribute it to a symbol, with the dso showing as liblzma. Got suspicious. Then I recalled that I had seen an odd valgrind complaint in my automated testing of Postgres, a few weeks earlier, after some package updates were installed. Really required a lot of coincidences.”
It is hard to overstate how lucky we were here, as there are no tools that will detect this vulnerability. Even ex-post it is not possible to detect externally as we do not have the private key needed to trigger the vulnerability, and the code is very well hidden. While Linus’s law has been stated as “given enough eyeballs all bugs are shallow,” we have seen in the past this is not always true, or there are just not enough eyeballs looking at all the code we consume, even if this time it worked.
In terms of immediate actions, the attack appears to have been targeted at subset of OpenSSH servers patched to integrate with systemd. Running SSH servers in containers is rare, and the initial priority should be container hosts, although as the issue was caught early it is likely that few people updated. There is a stream of fixes to liblzma, the xz compression library where the exploit was placed, as the commits from the last two years are examined, although at present there is no evidence that there are exploits for any software other than OpenSSH included. In the Docker Scout web interface you can search for “lzma” in package names, and issues will be flagged in the “high profile vulnerabilities” policy.
So many commentators have simple technical solutions, and so many vendors are using this to push their tools. As a technical community, we want there to be technical solutions to problems like this. Vendors want to sell their products after events like this, even though none even detected it. Rewrite it in Rust, shoot autotools, stop using GitHub tarballs, and checked-in artifacts, the list goes on. These are not bad things to do, and there is no doubt that understandability and clarity are valuable for security, although we often will trade them off for performance. It is the case that m4 and autotools are pretty hard to read and understand, while tools like ifunc allow dynamic dispatch even in a mostly static ecosystem. Large investments in the ecosystem to fix these issues would be worthwhile, but we know that attackers would simply find new vectors and weird machines. Equally, there are many naive suggestions about the people, as if having an identity for open source developers would solve a problem, when there are very genuine people who wish to stay private while state actors can easily find fake identities, or “just say no” to untrusted people. Beware of people bringing easy solutions, there are so many in this hot-take world.
Where can we go from here? Awareness and observability first. Hyper awareness even, as we see in this case small clues matter. Don’t focus on the exact details of this attack, which will be different next time, but think more generally. Start by understanding your organization’s software consumption, supply chain, and critical points. Ask what you should be funding to make it different. Then build in resilience. Defense in depth, and diversity — not a monoculture. OpenSSH will always be a target because it is so widespread, and the OpenBSD developers are doing great work and the target was upstream of them because of this. But we need a diverse ecosystem with multiple strong solutions, and as an organization you need second suppliers for critical software. The third critical piece of security in this era is recoverability. Planning for the scenario in which the worst case has happened and understanding the outcomes and recovery process is everyone’s homework now, and making sure you are prepared with tabletop exercises around zero days.
This is an opportunity for all of us to continue working together to strengthen the open source supply chain, and to work on resilience for when this happens next. We encourage dialogue and discussion on this within Docker communities.
Learn more
- Docker Scout dashboard: https://scout.docker.com/vulnerabilities/id/CVE-2024-3094
- NIST CVE: https://nvd.nist.gov/vuln/detail/CVE-2024-3094