Reversing ESP8266 Firmware (Part 4)
Writing an IDA loader
So, why a loader? The main reason was that I wanted something I could re-use when reversing future ESP8266 firmware dumps.
Our loader will be quite simple. IDA loaders typically define the following functions:
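```python
def accept_file(li, n):
    """Return a format name if li is a file this loader handles, otherwise 0."""


def load_file(li, neflags, format):
    """Load li into the database; return 1 on success, 0 on failure."""
```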
The first, accept_file, is responsible for identifying an applicable file based on its signature, and is executed when you open a file in IDA for analysis. The second, load_file, is responsible for interpreting the file: setting the processor and entry points, and loading and naming segments accordingly. Our loader won’t perform any sanity checking, but it should be able to load an image for us.
My loader is derived from the existing loaders shipped with IDA and, of course, is built to take into account the format we’ve dissected above. It attempts to identify the firmware image by its signature (the image magic), then loads each of the segments into memory, guessing the name and type of each segment from its load address.
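To make the format concrete outside IDA, here is a minimal plain-Python sketch of the walk the loader performs: an 8-byte header (magic, segment count, flash mode, flash size/frequency, entry point), followed by each segment’s load address, size and data. The names `parse_image`, `ImageHeader` and `Segment` are hypothetical helpers of my own, not part of IDA or any ESP8266 tool, and the sketch assumes the same little-endian layout that the loader below unpacks.

```python
import struct
from typing import List, NamedTuple, Tuple

ESP_IMAGE_MAGIC = 0xE9  # first byte of an ESP8266 image header


class ImageHeader(NamedTuple):
    magic: int
    segments: int          # number of segments following the header
    flash_mode: int
    flash_size_freq: int
    entrypoint: int


class Segment(NamedTuple):
    addr: int    # load address in the ESP8266 memory map
    size: int    # length of the segment data in bytes
    offset: int  # file offset where the segment data starts


def parse_image(data: bytes, base: int = 0) -> Tuple[ImageHeader, List[Segment]]:
    """Parse the image header at `base`, then walk its segment headers."""
    header = ImageHeader(*struct.unpack_from("<BBBBI", data, base))
    if header.magic != ESP_IMAGE_MAGIC:
        raise ValueError("bad magic 0x%02x at offset 0x%x" % (header.magic, base))
    segments = []
    pos = base + 8  # skip the 8-byte image header
    for _ in range(header.segments):
        addr, size = struct.unpack_from("<II", data, pos)
        pos += 8  # skip the (load address, size) segment header
        segments.append(Segment(addr, size, pos))
        pos += size  # the next segment header follows this segment's data
    return header, segments


# Usage: a synthetic two-segment image (not real firmware)
image = struct.pack("<BBBBI", 0xE9, 2, 0, 0x20, 0x40100004)
image += struct.pack("<II", 0x40100000, 4) + b"\x00" * 4
image += struct.pack("<II", 0x3FFE8000, 2) + b"\x11\x22"
header, segments = parse_image(image)
```

For a dump like ours, `parse_image(data)` would cover the boot image at offset 0 and `parse_image(data, 0x1000)` the user image, mirroring the `li.seek(0x1000, 0)` in the loader.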
Below is the Python code for our loader, which lives in IDA’s loaders directory:
```python
#!/usr/bin/python
# IDA 6.8 (IDAPython, Python 2.7) loader for ESP8266 firmware dumps
import struct
from struct import unpack_from

import idaapi
from idaapi import *
from idc import *  # SetProcessorType() and SETPROC_ALL are idc helpers


def accept_file(li, n):
    retval = 0
    if n == 0:
        li.seek(0)
        # image magic 0xE9, followed by a segment count of 1
        if li.read(2) == "e901".decode("hex"):
            retval = "ESP8266 firmware"
    return retval


def load_file(li, neflags, format):
    li.seek(0)

    # set processor type (doesn't appear to work)
    SetProcessorType("xtensa", SETPROC_ALL)

    # load ROM segment
    (magic, segments, flash_mode, flash_size_freq, entrypoint) = struct.unpack('<BBBBI', li.read(8))

    print "Reading ROM boot firmware"
    print "Magic: %x" % magic
    print "Segments: %x" % segments
    print "Entry point: %x" % entrypoint
    print "\n"

    (rom_addr, rom_size) = unpack_from("<II", li.read(8))
    # segment data starts at offset 16: 8-byte image header + 8-byte segment header
    li.file2base(16, rom_addr, rom_addr + rom_size, True)
    add_segm(0, rom_addr, rom_addr + rom_size, ".boot_rom", "CODE")
    idaapi.add_entry(0, entrypoint, "rom_entry", 1)

    print "Reading boot loader code"
    print "ROM address: %x" % rom_addr
    print "ROM size: %x" % rom_size
    print "\n"

    # Go to user ROM code
    li.seek(0x1000, 0)

    # load ROM segment
    (magic, segments, flash_mode, flash_size_freq, entrypoint) = struct.unpack('<BBBBI', li.read(8))
    idaapi.add_entry(1, entrypoint, "user_entry", 1)

    print "Reading user firmware"
    print "Magic: %x" % magic
    print "Segments: %x" % segments
    print "Entry point: %x" % entrypoint
    print "\n"

    print "Reading user code"
    for k in xrange(segments):
        (seg_addr, seg_size) = unpack_from("<II", li.read(8))
        file_offset = li.tell()

        # guess the segment's name and type from its load address
        if seg_addr == 0x40100000:
            seg_name = ".user_rom"
            seg_type = "CODE"
        elif seg_addr == 0x3FFE8000:
            seg_name = ".user_rom_data"
            seg_type = "DATA"
        elif seg_addr <= 0x3FFFFFFF:
            seg_name = ".data_seg_%d" % k
            seg_type = "DATA"
        elif seg_addr > 0x40100000:
            seg_name = ".code_seg_%d" % k
            seg_type = "CODE"
        else:
            seg_name = ".unknown_seg_%d" % k
            seg_type = "CODE"

        print "Seg name: %s" % seg_name
        print "Seg type: %s" % seg_type
        print "Seg address: %x" % seg_addr
        print "Seg size: %x" % seg_size
        print "\n"

        li.file2base(file_offset, seg_addr, seg_addr + seg_size, True)
        add_segm(0, seg_addr, seg_addr + seg_size, seg_name, seg_type)
        # skip past this segment's data to the next segment header
        li.seek(file_offset + seg_size, 0)

    return 1
```
As you can see, the user segment loading loop, which iterates over each of the segments within ROM 1, attempts to perform some basic classification and naming based on the load address of the given segment, per our rules mentioned earlier.
```python
if seg_addr == 0x40100000:
    seg_name = ".user_rom"
    seg_type = "CODE"
elif seg_addr == 0x3FFE8000:
    seg_name = ".user_rom_data"
    seg_type = "DATA"
elif seg_addr <= 0x3FFFFFFF:
    seg_name = ".data_seg_%d" % k
    seg_type = "DATA"
elif seg_addr > 0x40100000:
    seg_name = ".code_seg_%d" % k
    seg_type = "CODE"
else:
    seg_name = ".unknown_seg_%d" % k
    seg_type = "CODE"
```
With this loader in use, IDA now recognises our firmware image:
Our segments look a lot tidier:
And we have an entry point! (of the user ROM):
Whilst we’re in a good state to perform cursory analysis, we don’t have any function names to base our analysis on. Ideally, we’d like to identify the routine(s) responsible for connecting to a given port, locate the references to that function and make sense of any other library function calls. This would allow us to discover which ports are knocked on, as well as the order in which the knocking should take place.
Performing library recognition
There are known, documented methods for identifying library functions within a statically linked, stripped image. The best known is to use IDA’s Fast Library Acquisition for Identification and Recognition (FLAIR) tools, which create Fast Library Identification and Recognition Technology (FLIRT) signatures.
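As a rough illustration of the idea behind FLIRT, here is a simplified sketch of how a signature’s leading bytes can be compared against the start of a candidate function: each byte is written as two hex digits, and bytes that vary between builds (relocated addresses, for example) are wildcarded as `..`. This is a toy under stated assumptions, not Hex-Rays’ implementation; real FLIRT matching also checks a CRC16 over the bytes that follow the prefix, among other fields, and walks a tree of patterns rather than testing them one at a time. `pattern_matches` is a hypothetical helper.

```python
def pattern_matches(pattern: str, data: bytes) -> bool:
    """Compare the bytes at a candidate function start with a FLIRT-style
    leading pattern: two hex digits per byte, '..' for a variable byte."""
    if len(pattern) % 2:
        raise ValueError("pattern must contain whole bytes")
    for i in range(0, len(pattern), 2):
        token = pattern[i:i + 2]
        if token == "..":
            continue  # wildcard: any byte (or no byte) is accepted here
        pos = i // 2
        if pos >= len(data) or data[pos] != int(token, 16):
            return False
    return True


# Usage: the leading pattern of a 7-byte function, padded to 32 bytes with '..'
short_func = "31FFFF39020DF0" + ".." * 25
print(pattern_matches(short_func, bytes.fromhex("31FFFF39020DF0")))
print(pattern_matches(short_func, bytes.fromhex("31FFFF39030DF0")))
```

Note that any other function consisting of the same seven bytes would match this pattern just as well, which is exactly the situation that produces collisions later on.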
The process of creating FLIRT signatures usually requires a number of prerequisite conditions to exist:
- A pattern file must be created via pelf (or a similar parser) and then compiled into a signature file with sigmake
- A compiled, relocatable library containing the functions (and their names) to generate signatures for must exist
- The library must be in a recognised format and use a supported instruction set
This poses two problems: first, we don’t have such a library available to us at present; second, Xtensa is not a supported processor type, as shown below.
```
josh@ioteeth:/tmp/flair68/bin/linux$ ./pelf
ELF parser. Copyright (c) 2000-2015 Hex-Rays SA. Version 1.16
Supported processors: MIPS, I960, ARM, IBM PC, M6812, SuperH
Usage: ./pelf [-switch or @file or $env_var] file [pattern-file] (wildcards are allowed)
```
The result is that we can’t create pattern files using IDA’s traditional toolset.
The solutions to these problems, which we’ll tackle in a moment (and which come with their own obstacles), are as follows:
- We need to install a suitable IDE capable of compiling code for the ESP8266
- We need to write code that, hopefully, uses the same libraries as our target
- We need to compile our code into an ELF file that is statically linked, unstripped and with debug info.
- We need to find a way to create signatures from said ELF file
The first step is involved and beyond the scope of this blog post. I’ve opted to use the Arduino IDE, configured to compile for a generic ESP8266 module with verbose compiler output enabled.
With our environment configured, we can look up example sketches for the ESP8266; we want to find one that performs a similar function to our target. Fortunately, a GitHub repository of example code exists, which can help us.
Searching the repository, we see a promising file, WiFiClient.ino, which contains the following code:
```cpp
/*
 *  This sketch sends data via HTTP GET requests to data.sparkfun.com service.
 *
 *  You need to get streamId and privateKey at data.sparkfun.com and paste them
 *  below. Or just customize this script to talk to other HTTP servers.
 */

#include <ESP8266WiFi.h>

const char* ssid     = "your-ssid";
const char* password = "your-password";

const char* host = "data.sparkfun.com";
const char* streamId   = "....................";
const char* privateKey = "....................";

void setup() {
  Serial.begin(115200);
  delay(10);

  // We start by connecting to a WiFi network

  Serial.println();
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(ssid);

  /* Explicitly set the ESP8266 to be a WiFi-client, otherwise, it by default,
     would try to act as both a client and an access-point and could cause
     network-issues with your other WiFi-devices on your WiFi-network. */
  WiFi.mode(WIFI_STA);
  WiFi.begin(ssid, password);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.println("");
  Serial.println("WiFi connected");
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());
}

int value = 0;

void loop() {
  delay(5000);
  ++value;

  Serial.print("connecting to ");
  Serial.println(host);

  // Use WiFiClient class to create TCP connections
  WiFiClient client;
  const int httpPort = 80;
  if (!client.connect(host, httpPort)) {
    Serial.println("connection failed");
    return;
  }

  // We now create a URI for the request
  String url = "/input/";
  url += streamId;
  url += "?private_key=";
  url += privateKey;
  url += "&value=";
  url += value;

  Serial.print("Requesting URL: ");
  Serial.println(url);

  // This will send the request to the server
  client.print(String("GET ") + url + " HTTP/1.1\r\n" +
               "Host: " + host + "\r\n" +
               "Connection: close\r\n\r\n");
  unsigned long timeout = millis();
  while (client.available() == 0) {
    if (millis() - timeout > 5000) {
      Serial.println(">>> Client Timeout !");
      client.stop();
      return;
    }
  }

  // Read all the lines of the reply from server and print them to Serial
  while (client.available()) {
    String line = client.readStringUntil('\r');
    Serial.print(line);
  }

  Serial.println();
  Serial.println("closing connection");
}
```
Based on the included files, we can see that this code uses the ESP8266WiFi library, which also appeared in our strings output earlier:
```
josh@ioteeth:/tmp/reversing$ strings recovered_file | grep -i wifi
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/libraries/ESP8266WiFi/src/include/DataSource.h
[...]
ap_probe_send over, rest wifi status to disassoc
WiFi connected
```
This is a good sign: it indicates that, at the very least, we’re compiling a sketch that uses the same or similar libraries as our target firmware image (there may be version discrepancies). This increases the likelihood that the signatures we obtain will identify functions successfully.
Compiling the above sketch results in the following notable compiler output:
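For reference, the `strings recovered_file | grep -i wifi` check used above can be approximated in plain Python, which is handy if you want to script the comparison between a dump and a freshly compiled sketch. This sketch only looks for runs of printable ASCII (0x20-0x7E), which is similar, but not identical, to what GNU `strings` prints by default; `ascii_strings` is a hypothetical helper name.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Yield runs of at least `min_len` printable ASCII bytes (0x20-0x7E),
    roughly what `strings` prints by default."""
    run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in run.finditer(data):
        yield match.group().decode("ascii")


# Usage: a small synthetic blob standing in for a firmware dump
blob = b"\x00\x01WiFi connected\x00ab\x00/tmp/ESP8266WiFi/DataSource.h\xff"
hits = [s for s in ascii_strings(blob) if "wifi" in s.lower()]
print(hits)
```

The list comprehension plays the role of `grep -i`; against a real dump you would pass the file’s bytes (read in binary mode) instead of `blob`.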
```
"/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/tools/esptool/esptool" -eo "/tmp/esp8266/arduino-1.8.5/hardware/esp8266com/esp8266/bootloaders/eboot/eboot.elf" -bo "/tmp/arduino_build_867542/sketch_may24a.ino.bin" -bm qio -bf 40 -bz 512K -bs .text -bp 4096 -ec -eo "/tmp/arduino_build_867542/sketch_may24a.ino.elf" -bs .irom0.text -bs .text -bs .data -bs .rodata -bc -ec
```
This reveals the ELF file that exists prior to its transformation into firmware, which file describes as follows:
```
josh@ioteeth:/tmp/reversing$ file /tmp/arduino_build_867542/sketch_may24a.ino.elf
/tmp/arduino_build_867542/sketch_may24a.ino.elf: ELF 32-bit LSB executable, Tensilica Xtensa, version 1 (SYSV), statically linked, with debug_info, not stripped
```
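The key facts in that output (32-bit, little-endian, executable, Xtensa) come straight from the ELF header, so they can also be checked in a script. The sketch below is a hypothetical `describe_elf` helper; it assumes only the standard ELF identification bytes plus `e_type` and `e_machine` at offsets 16 and 18, with `EM_XTENSA` being machine number 94. Confirming that the file is unstripped and carries debug info would require walking the section headers, which this sketch does not do.

```python
import struct

EM_XTENSA = 94  # ELF e_machine value for Tensilica Xtensa


def describe_elf(header: bytes) -> dict:
    """Summarise the ELF identification fields plus e_type and e_machine."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]  # ELFCLASS32 == 1, ELFDATA2LSB == 1
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine sit at offsets 16 and 18 in both 32- and 64-bit ELF
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),
        "little_endian": ei_data == 1,
        "executable": e_type == 2,  # ET_EXEC
        "xtensa": e_machine == EM_XTENSA,
    }


# Usage: a hand-built 20-byte header prefix (not a complete ELF file)
fake = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 2, EM_XTENSA)
info = describe_elf(fake)
print(info)
```

For the real file, passing the first 20 bytes of sketch_may24a.ino.elf (read in binary mode) should report the same attributes that file prints.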
Loading this ELF file into IDA, we can see that we’ve got sensible function names, as depicted below:
So, how can we generate a pattern file from the above ELF to create a FLIRT signature? After much research, I found FireEye’s IDB2PAT tool, created by FireEye’s FLARE team.
This tool is described as follows:
This script allows you to easily generate function patterns from an existing IDB database that can then be turned into FLIRT signatures to help identify similar functions in new files. More information is available at: https://www.fireeye.com/blog/threat-research/2015/01/flare_ida_pro_script.html
Fixing IDB2PAT
After I installed this plugin, it initially didn’t work at all with my version of IDA (6.8). This appeared to be because later versions of IDA (7.x) use Qt5 rather than PySide, and the plugin had been migrated to support IDA 7.x rather than 6.8.
Scrolling through the plugin’s known issues, I found that someone had pointed out the above and recommended using an earlier version, which worked with IDA 6.8. I checked out an earlier commit: no more IDA plugin errors.
Did the plugin work? No. It got stuck in an infinite loop as soon as it was launched. It turned out that the version I had contained a bug whereby functions shorter than 32 bytes caused an infinite loop. To fix this, I downloaded the latest version of the individual script file, in which the bug had apparently been fixed.
The result? Yet another issue:
This was seemingly due to a version discrepancy between the installed IDA SDK and the one the script targeted. I fixed the plugin by changing the relevant function call “get_name(…)” to “GetFunctionName(…)”. I also added code to ignore functions whose names start with “sub_”, as these are IDA’s auto-generated names for functions without a symbol and weren’t useful to me.
```python
# ported from IDB2SIG plugin updated by TQN
def make_func_sig(config, func):
    """
    type config: Config
    type func: idc.func_t
    """
    logger = logging.getLogger("idb2pat:make_func_sig")

    if func.endEA - func.startEA < config.min_func_length:
        logger.debug("Function is too short")
        raise FuncTooShortException()

    ea = func.startEA
    publics = []  # type: idc.ea_t
    refs = {}  # type: dict(idc.ea_t, idc.ea_t)
    variable_bytes = set([])  # type: set of idc.ea_t

    # added: skip functions that still carry IDA's auto-generated "sub_" names
    if GetFunctionName(ea).startswith("sub_"):
        logger.info("Ignoring %s", GetFunctionName(ea))
        raise FuncTooShortException()

    # [...] remainder of make_func_sig
```
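The `variable_bytes` set collected in `make_func_sig` is what ends up as wildcards in the pattern file. As a minimal sketch of that step (not IDB2PAT’s actual code), the hypothetical `leading_pattern` helper below renders a function’s first 32 bytes as uppercase hex, writes `..` for variable bytes and pads short functions with `..`. It assumes variable bytes are given as offsets from the function start, whereas IDB2PAT tracks addresses, and it leaves out the CRC16, length and name fields of a real pattern line.

```python
def leading_pattern(func_bytes: bytes, variable_offsets: set, length: int = 32) -> str:
    """Render a function's first `length` bytes as a FLIRT-style pattern prefix:
    uppercase hex per byte, '..' for variable bytes and for padding."""
    out = []
    for i in range(length):
        if i >= len(func_bytes) or i in variable_offsets:
            out.append("..")  # relocated/variable byte, or past the function end
        else:
            out.append("%02X" % func_bytes[i])
    return "".join(out)


# Usage: a 7-byte function with no variable bytes
print(leading_pattern(bytes.fromhex("31FFFF39020DF0"), set()))
```

The output has the same form as the destructor entries in the collision listing further down: the function’s 14 hex digits followed by dots up to 64 characters.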
Generating our pattern file
With these changes made, the plugin appeared to work:
```
[...]
INFO:idb2pat:make_func_sigs:[ 38 / 1888 ] RC_GetAckTime 0x401010b8L
INFO:idb2pat:make_func_sigs:[ 39 / 1888 ] RC_GetCtsTime 0x401010ccL
INFO:idb2pat:make_func_sigs:[ 40 / 1888 ] RC_GetBlockAckTime 0x40101104L
INFO:idb2pat:make_func_sigs:[ 41 / 1888 ] sub_40101144 0x40101144L
INFO:idb2pat:make_func_sig:Ignoring sub_40101144
INFO:idb2pat:make_func_sigs:[ 42 / 1888 ] sub_40101178 0x40101178L
INFO:idb2pat:make_func_sig:Ignoring sub_40101178
INFO:idb2pat:make_func_sigs:[ 43 / 1888 ] sub_4010122C 0x4010122cL
INFO:idb2pat:make_func_sig:Ignoring sub_4010122C
INFO:idb2pat:make_func_sigs:[ 44 / 1888 ] sub_4010125C 0x4010125cL
INFO:idb2pat:make_func_sig:Ignoring sub_4010125C
INFO:idb2pat:make_func_sigs:[ 45 / 1888 ] sub_401012F4 0x401012f4L
INFO:idb2pat:make_func_sig:Ignoring sub_401012F4
INFO:idb2pat:make_func_sigs:[ 46 / 1888 ] rcUpdateTxDone 0x40101350L
[...]
```
Finally, we had a generated pattern file that we could run sigmake against.
```
josh@ioteeth:/tmp/flair68/bin/linux$ ./sigmake ../../../reversing/sketch_may24a.ino.pat /tmp/reversing/esp_lib_sigs.sig
/tmp/reversing/esp_lib_sigs.sig: modules/leaves: 1538/1547, COLLISIONS: 6
See the documentation to learn how to resolve collisions.
```
We can see that six collisions have occurred. In this context, a collision occurs when sigmake encounters the same signature for more than one function. When this happens, sigmake generates a .exc file listing the collisions, which we can edit to tell sigmake which function to use for a given signature, for example.
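To make that definition concrete, here is a simplified sketch that treats the columns sigmake prints in the .exc file (CRC length, CRC16 and leading pattern) as the signature and groups function names by it; any group holding more than one name is a collision. `find_collisions` is a hypothetical helper, and it ignores whatever additional information sigmake may use internally to tell modules apart.

```python
from collections import defaultdict


def find_collisions(entries):
    """Group (name, crc_len, crc16, pattern) entries by everything except the
    name; groups holding more than one name are collisions."""
    groups = defaultdict(list)
    for name, crc_len, crc16, pattern in entries:
        groups[(crc_len, crc16, pattern)].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Usage: entries shaped like .exc lines (patterns shortened; the last name is made up)
lines = [
    "wifi_register_rfid_locp_recv_cb 00 0000 12C1F002....",
    "wifi_unregister_rfid_locp_recv_cb 00 0000 12C1F002....",
    "_ZN5Print5printEPKc 00 0000 12C1F009....",
    "udp_new_ip_type 00 0000 12C1F009....",
    "some_unique_function 00 0000 ABCD....",
]
collisions = find_collisions(line.split() for line in lines)
print(collisions)
```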
```
josh@ioteeth:/tmp/flair68/bin/linux$ cat /tmp/reversing/esp_lib_sigs.exc
;--------- (delete these lines to allow sigmake to read this file)
; add '+' at the start of a line to select a module
; add '-' if you are not sure about the selection
; do nothing if you want to exclude all modules
wifi_register_rfid_locp_recv_cb 00 0000 12C1F00261008548EA02210012C110800000............................
wifi_unregister_rfid_locp_recv_cb 00 0000 12C1F00261008548EA02210012C110800000............................
_ZN5Print5printEPKc 00 0000 12C1F0093185FCFF083112C1100DF0..................................
udp_new_ip_type 00 0000 12C1F0093185FCFF083112C1100DF0..................................
pgm_read_byte_inlined 00 0000 2030143022C02802D033110003402030913020740DF0....................
pgm_read_byte_inlined_0 00 0000 2030143022C02802D033110003402030913020740DF0....................
_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE0EED2Ev 00 0000 31FFFF39020DF0..................................................
_ZN10DataSourceD2Ev 00 0000 31FFFF39020DF0..................................................
_ZN14HardwareSerialD2Ev 00 0000 31FFFF39020DF0..................................................
_ZN2fs8FileImplD2Ev 00 0000 31FFFF39020DF0..................................................
_ZN2fs7DirImplD2Ev 00 0000 31FFFF39020DF0..................................................
glue2git_err 00 0000 32C2101C047C3237340D21FCFF3A322203008022012028310DF0............
git2glue_err 00 0000 32C2101C047C3237340D21FCFF3A322203008022012028310DF0............
system_rtc_mem_write 00 0000 52A0BF2735149C130C37306014CCA6E0921182A3009088C047A8030C020DF047
system_rtc_mem_read 00 0000 52A0BF2735149C130C37306014CCA6E0921182A3009088C047A8030C020DF047
```
I’ve edited my exclusion file as follows:
```
+wifi_register_rfid_locp_recv_cb 00 0000 12C1F00261008548EA02210012C110800000............................
wifi_unregister_rfid_locp_recv_cb 00 0000 12C1F00261008548EA02210012C110800000............................
+_ZN5Print5printEPKc 00 0000 12C1F0093185FCFF083112C1100DF0..................................
udp_new_ip_type 00 0000 12C1F0093185FCFF083112C1100DF0..................................
+pgm_read_byte_inlined 00 0000 2030143022C02802D033110003402030913020740DF0....................
pgm_read_byte_inlined_0 00 0000 2030143022C02802D033110003402030913020740DF0....................
_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE0EED2Ev 00 0000 31FFFF39020DF0..................................................
_ZN10DataSourceD2Ev 00 0000 31FFFF39020DF0..................................................
_ZN14HardwareSerialD2Ev 00 0000 31FFFF39020DF0..................................................
+_ZN2fs8FileImplD2Ev 00 0000 31FFFF39020DF0..................................................
-_ZN2fs7DirImplD2Ev 00 0000 31FFFF39020DF0..................................................
+glue2git_err 00 0000 32C2101C047C3237340D21FCFF3A322203008022012028310DF0............
git2glue_err 00 0000 32C2101C047C3237340D21FCFF3A322203008022012028310DF0............
+system_rtc_mem_write 00 0000 52A0BF2735149C130C37306014CCA6E0921182A3009088C047A8030C020DF047
system_rtc_mem_read 00 0000 52A0BF2735149C130C37306014CCA6E0921182A3009088C047A8030C020DF047
```
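With only six collisions, editing by hand is fine, but larger libraries can produce many more. The sketch below is a hypothetical `select_first` helper that assumes the .exc layout shown earlier (sigmake’s `;` comment header followed by one `name crc_len crc16 pattern` entry per line, with colliding entries on consecutive lines). It strips the comment header and prefixes `+` to the first entry of each group.

```python
def select_first(exc_text: str) -> str:
    """Drop sigmake's ';' comment header and prefix '+' to the first entry of
    each run of consecutive entries that share the same signature columns."""
    out = []
    prev_sig = None
    for line in exc_text.splitlines():
        if line.startswith(";"):
            continue  # sigmake won't read the file while these lines remain
        fields = line.split()
        sig = tuple(fields[1:]) if fields else None
        if sig is not None and sig != prev_sig:
            line = "+" + line  # first entry of a new collision group
        out.append(line)
        prev_sig = sig
    return "\n".join(out)


# Usage: a shortened excerpt in the same layout as the .exc file
exc = "\n".join([
    ";--------- (delete these lines to allow sigmake to read this file)",
    "; add '+' at the start of a line to select a module",
    "glue2git_err 00 0000 32C2101C....",
    "git2glue_err 00 0000 32C2101C....",
    "system_rtc_mem_write 00 0000 52A0BF27....",
    "system_rtc_mem_read 00 0000 52A0BF27....",
])
print(select_first(exc))
```

Choosing the first entry is arbitrary: the selections above were made by hand, and for pairs such as system_rtc_mem_write and system_rtc_mem_read a wrong choice will mislabel functions, so the output still needs reviewing.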
Re-running sigmake, correcting one last collision and running it once more finally results in a usable signature file.
Applying these signatures to our firmware file resolves many of the library functions present, including the connect() call (which we hope is the one we want):