Broadcom eCos | Firmware Analysis with Ghidra
In this post I’ll share tools, tips, and tricks to help you reverse engineer an eCos firmware image dumped from a Broadcom eCos BFC cable modem. I assume that you have an extracted firmware image at hand and the latest version of Ghidra installed.
If you don’t know how to dump a firmware image, head over to Writing a device profile for bcm2-utils.
Extracting ProgramStore Images
Broadcom uses a custom format called “ProgramStore” to store firmware images. This format is composed of a header followed by the actual firmware content, compressed with one of five supported compression methods. The compression mode is set in the file header.
Broadcom released sources related to ProgramStore when they released their bootloader for the Zephyr platform. You can find the code at https://github.com/Broadcom/aeolus.
Let’s take a look at the header definition first:
We can visually represent this structure as follows:
Using the ProgramStore utility, it’s easy to decompress the actual firmware from a ProgramStore file:
You can then import the resulting file into Ghidra.
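To make the header more concrete, here is a minimal Python sketch that parses it and exposes the load address and control field. The field layout (a 92-byte big-endian structure) and the field names are assumptions modeled on ProgramStore.h in the aeolus repository; check them against that file before relying on the offsets.

```python
import struct
from collections import namedtuple

# ASSUMED layout: big-endian, 92 bytes. Verify against ProgramStore.h.
HEADER_FORMAT = ">HHHHIII48s8sIIHHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

# Field names are illustrative, not copied from the Broadcom sources.
ProgramHeader = namedtuple("ProgramHeader", [
    "signature", "control", "major_revision", "minor_revision",
    "build_time", "compressed_length", "load_address", "filename",
    "pad", "compressed_length1", "compressed_length2",
    "hcs", "reserved", "crc",
])


def parse_header(data):
    """Parse the (assumed) ProgramStore header found at the start of data."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least %d bytes" % HEADER_SIZE)
    header = ProgramHeader._make(
        struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))
    # The file name is stored as a NUL-terminated string in a fixed-size field.
    name = header.filename.split(b"\x00", 1)[0].decode("ascii", "replace")
    return header._replace(filename=name)
```

Under these assumptions, `hex(parse_header(open(path, "rb").read()).load_address)` gives the value to enter as the base address when importing the decompressed image into Ghidra.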
Loading firmware in Ghidra
All the Broadcom-based eCos cable modems run on 32-bit big-endian MIPS chipsets, so you can set the target architecture to that without hesitation. You also have to set the right load address, which we read from the ProgramStore file header (here it’s 0x80004000).
Ghidra provides an interesting feature called FunctionID. It is similar to what IDA provides under the FLIRT name, or to what Binary Ninja calls “Signature Libraries”.
I won’t go into the details, but basically it computes instruction patterns for each function currently defined in the binary you’re analyzing and saves these patterns in a database. When you load another binary, you can run FunctionID, which matches these patterns against all functions and renames them accordingly.
If you want to dig deeper into the subject, I recommend you go to the Hex-Rays blog and specifically read IDA F.L.I.R.T. Technology: In-Depth.
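The general idea can be illustrated with a toy Python sketch. This is not Ghidra's actual FunctionID algorithm: it simply hashes the MIPS32 big-endian instruction words of a function after clearing bits that usually change with the link address (here, only `j`/`jal` targets and `lui` immediates, which is a simplifying assumption), then looks the hash up in a dictionary.

```python
import hashlib
import struct


def signature(code):
    """Hash MIPS32 big-endian code with address-dependent bits masked out."""
    if len(code) % 4:
        raise ValueError("MIPS32 code must be a multiple of 4 bytes")
    digest = hashlib.sha256()
    for (word,) in struct.iter_unpack(">I", code):
        opcode = word >> 26
        if opcode in (0x02, 0x03):    # j / jal: drop the 26-bit jump target
            word &= 0xFC000000
        elif opcode == 0x0F:          # lui: drop the 16-bit immediate
            word &= 0xFFFF0000
        digest.update(struct.pack(">I", word))
    return digest.hexdigest()


def build_database(named_functions):
    """named_functions: {name: code bytes} taken from known object files."""
    return {signature(code): name for name, code in named_functions.items()}


def identify(database, code):
    """Return the known name for code, or None if no signature matches."""
    return database.get(signature(code))
```

Two copies of the same library function linked at different addresses differ only in the masked bits, so they hash to the same signature.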
Ideally, what we want is a database of patterns matching all the eCos standard library functions. To generate such a database, we will follow these steps:
- Download the eCos source code
- Cross-compile each eCos subsystem to MIPS32 big-endian ELF object files
- Load all object files into a dedicated Ghidra project subdirectory
- Run FunctionID analysis on all loaded object files
- Export the FunctionID database
The FunctionID auto-analysis is largely inspired by threatrack’s work at https://blog.threatrack.de/2019/09/20/ghidra-fid-generator/
Downloading eCos Source Code
While eCos sources can be downloaded from their original repositories, two cable modem vendors (Technicolor and Netgear) released the specific versions they use in order to honor the GPL.
There are no differences whatsoever between the code released by these vendors. It’s eCos version 2.0 with three different profiles: BCM33 chipsets, BCM33 chipsets with IPv6 support, BCM33 chipsets with SMP support.
You can take a look yourself by checking out these sources:
So we can assume that the Broadband Foundation Classes provided by Broadcom under the “BFC” name are all based on eCos version 2.0. The standard eCos packages and libraries will therefore be the same between every firmware that is built for Broadcom chipsets.
The eCos license allows commercial users of eCos not to release the code they built on top of eCos, so we are missing the actual BFC libraries, which are closed source.
Cross-compilation of shared object files
Getting the proper toolchain to build eCos is a real pain. The instructions are clear, but they rely on quite old software tools. I finally managed to get everything right by doing everything on a CentOS box using Vagrant.
Once everything is done, you’re left with a bunch of .o files that we can import into Ghidra.
For the next steps, I’ll be relying on scripts developed by Threatrack to auto-import and auto-generate FIDB files. These scripts expect the imported files to follow a specific structure (root/library_name/version/variant/file.o).
I developed the Python script below to move and rename files around so that our generated eCos object files would follow this naming scheme.
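A minimal sketch of such a reorganization step (not the script used for this post) could look like the following. The source layout (one directory per library directly under the build directory), the default version string `2.0`, and the variant name `bcm33` are assumptions chosen for illustration.

```python
import shutil
from pathlib import Path


def reorganize(build_dir, root, version="2.0", variant="bcm33"):
    """Copy build_dir/<library>/**/x.o to root/<library>/<version>/<variant>/.

    ASSUMPTION: the first directory below build_dir names the library.
    """
    build_dir = Path(build_dir)
    copied = []
    for obj in sorted(build_dir.rglob("*.o")):
        rel = obj.relative_to(build_dir)
        if len(rel.parts) < 2:
            continue  # object file is not inside a library directory
        library = rel.parts[0]
        # Flatten nested directories into the file name to avoid collisions.
        flat_name = "_".join(rel.parts[1:])
        dest = Path(root) / library / version / variant / flat_name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(obj, dest)
        copied.append(dest)
    return copied
```

Copying instead of moving keeps the original build tree intact, which is convenient if the layout needs to be regenerated with a different version or variant name.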
The result once imported:
Auto-import and analysis of shared object files
Here I’m relying on two scripts from threatrack.
After all these steps we’re left with a FIDB file holding 2180 function signatures spanning 26 standard libraries. This will be super helpful during our analysis process.
The FIDB can be downloaded here.
Setting Up Memory Mappings
The first step is to identify the different memory regions using the script we developed when we reverse engineered the eCos memory layout.
Click on ‘Window’ -> ‘Memory Map’, select the RAM line and click on the ‘Split’ icon.
Here we split RAM in two regions: .text (code) and .data (data):
Once that’s done, we can add new regions. We can add the BSS as an overlay:
The stack, also as an overlay:
You can define locations of vectors related to interrupt and exception handling:
This should be what you’re left with:
A nice addition would be to dump the section from 0x80000000 to 0x80004000 and append it to our image, in order to get exception and interrupt handlers in our analysis view.
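One way to combine the two is to place the low-memory dump in front of the decompressed image, so that a single file can be loaded at 0x80000000. The sketch below assumes the dump covers exactly the range 0x80000000 to 0x80004000 and that the image loads at 0x80004000, as in this post; the function name is hypothetical.

```python
LOW_BASE = 0x80000000      # start of the exception/interrupt vector area
LOAD_ADDRESS = 0x80004000  # load address read from the ProgramStore header


def prepend_low_memory(low_dump, image,
                       low_base=LOW_BASE, load_address=LOAD_ADDRESS):
    """Return (base_address, blob) with low_dump placed before image."""
    expected = load_address - low_base
    if len(low_dump) != expected:
        raise ValueError("low-memory dump must be %#x bytes, got %#x"
                         % (expected, len(low_dump)))
    # The firmware bytes keep their original addresses because the dump
    # fills the gap between low_base and load_address exactly.
    return low_base, low_dump + image
```

The combined blob is then imported with the returned base address instead of the load address from the header.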
Automated Function Renaming
When reversing my first firmware images, I identified tracing functions left by Broadcom. The first one logs a message in the form -<%s>-\t Entering func \n, with the name of the function it’s called from as the first format parameter.
This means we can trace all calls to that logging function and use its third argument to rename the function it’s called from:
Two other functions are called almost always in pairs. The first one logs the function name while the other logs “Entering” or “Leaving” then the function name:
The third one I identified seems to set these strings into the C++ class definition.
Similar conventions were observed for other logging functions:
To take advantage of that, I wrote a custom Ghidra script that given a logging function would:
- get a list of all functions calling that logging function (cross-references)
- for each call, get the pointer value that is put into $a1, $a2, or $a3 depending on the logging function parameters
That script is available on its dedicated repository
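The renaming decision itself can be sketched in plain Python, independently of the Ghidra API. The sketch assumes the cross-references have already been resolved into (caller name, logged string) pairs; it only renames functions that still carry Ghidra's default `FUN_` prefix, and it skips callers that log several different names. The sample names in the usage note are illustrative.

```python
import re
from collections import defaultdict

# Accept C or C++ style names such as "func" or "Class::Method".
NAME_RE = re.compile(r"^[A-Za-z_~][\w:~]*$")


def plan_renames(call_sites):
    """call_sites: iterable of (caller_name, logged_string) pairs.

    Returns {current_name: new_name} for unambiguous candidates only.
    """
    candidates = defaultdict(set)
    for caller, logged in call_sites:
        if caller.startswith("FUN_") and NAME_RE.match(logged):
            candidates[caller].add(logged)
    return {caller: next(iter(names))
            for caller, names in candidates.items()
            if len(names) == 1}
```

For example, a caller `FUN_80010000` that always logs `NonVolSettings::WriteTo` would be renamed, while a caller that logs two different names is left for manual review.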
Some stats on current projects:
- ASKEY: 54667 functions identified by Ghidra, 3179 auto-renamed with the script, 1972 identified with the eCos FIDB (5151 functions named in total, roughly 9.4% of the functions in the binary).
- Netgear: 50138 functions identified by Ghidra, 2603 auto-renamed with the script, 1972 identified with the eCos FIDB (4575 functions named in total, roughly 9.1% of the functions in the binary).
Automated VTable Identification
By looking at the function names observed in logging calls, we see the “classname::function_name” nomenclature, which indicates usage of C++.
If you look at constructor functions - considering you set the function calling convention to “this call” - you’ll see the this pointer set to a specific address:
That address is the class virtual table, which holds pointers to the class functions:
In the screenshots, everything is already named correctly, but this is actually the work of another Ghidra script that I wrote. The script goes over all the ‘PTR_FUN’ labels and checks the name of the function each one points to. If the function name follows the C++ naming convention, the script renames the label to class_name::vftable.
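The naming rule can be sketched in plain Python. This is a simplified model rather than the Ghidra script itself: it assumes the pointer labels have already been collected into a dictionary mapping each label address to the name of the function it points to.

```python
def vtable_label(function_name):
    """Return 'Class::vftable' for 'Class::method', or None otherwise."""
    class_name, separator, method = function_name.rpartition("::")
    if not separator or not class_name or not method:
        return None  # not a C++ style qualified name
    return class_name + "::vftable"


def plan_vtable_labels(pointer_targets):
    """pointer_targets: {label_address: name of the pointed-to function}.

    Returns {label_address: new_label} for pointers to C++ methods.
    """
    plan = {}
    for address, function_name in pointer_targets.items():
        label = vtable_label(function_name)
        if label is not None:
            plan[address] = label
    return plan
```

Using `rpartition` keeps nested scopes intact, so a name like `Outer::Inner::Method` yields `Outer::Inner::vftable`.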
This is super helpful because now we have even more context such as inheritance and implementation of classes. On top of that we can derive other function names based on the structure of the class that implements it.
An excellent example is the set of NonVolSettings classes, which implement specific sections of nonvol settings. Each of these classes follows pretty much the same structure, with WriteTo, ReadFrom, ReadFromImpl and WriteToImpl functions. Even if a class that inherits from NonVolSettings did not implement verbose logging calls, we can still derive its function names, given that we know the vtable structure of other NonVolSettings classes.
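As a rough sketch of that derivation, the function below lines up the vtable of a fully named sibling class with the vtable of a class whose methods are still unnamed, and proposes names slot by slot. The slot order and the class name used in the usage note are hypothetical; only the four method names come from the firmware.

```python
def derive_names(reference_slots, target_class, target_slots):
    """Propose names for unnamed functions in a sibling class vtable.

    reference_slots: method names, by slot, from a fully named class.
    target_slots: current function names, by slot, in the target class.
    Returns {current_name: proposed_name} for default-named functions only.
    """
    if len(reference_slots) != len(target_slots):
        raise ValueError("vtables differ in size; layouts may not match")
    proposals = {}
    for method, current in zip(reference_slots, target_slots):
        # Slots that already have a name (for example, methods inherited
        # from the base class) are left untouched.
        if current.startswith("FUN_"):
            proposals[current] = "%s::%s" % (target_class, method)
    return proposals
```

Given a reference layout of `WriteTo`, `ReadFrom`, `ReadFromImpl`, `WriteToImpl` and a target class named `HypotheticalSettings`, an unnamed function in the first slot would be proposed as `HypotheticalSettings::WriteTo`. The proposals are worth reviewing by hand, since equal vtable sizes do not guarantee identical layouts.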
Importing Data Structures from C Headers
Theoretically, you could import data structures from C headers coming from the eCos source code. However, I don’t think this brings a lot of added value at the moment. This would only concern standard library function calls, and would not help you with custom code coming from either Broadcom or the vendor they partnered with. I think it’s more efficient to check the eCos documentation whenever you’re tracing such a call than to go through the painful process of importing these structures. On top of that, Ghidra does not support having multiple names for the same data type (as in, multiple typedefs of the same type), which pretty much leads to losing all context.
In this article we demonstrated how to properly extract and load a ProgramStore image into Ghidra, set memory mappings, and perform auto-identification of functions and classes by taking advantage of Ghidra scripting capabilities.
My final objective with this is to implement a complete Ghidra loader that would execute all these actions in one go, but I still have to find the time to do that.
At this point, all that is left from a vulnerability research perspective is to identify dangerous function calls (using Rhabdomancer for example), and work your way from there by using the context gained from the automated renaming scripts. The heavy lifting is done; now all you have to do is rename or re-type a few things here and there to get the bigger picture.
As always, if you have any questions, feel free to contact me via Twitter or email.