
The user spotlight database

On macOS, the spotlight database is a central database holding metadata for all files/folders that macOS indexes. It is always located at the root of a disk under /.Spotlight-V100

However, while browsing the folders on my macOS 10.14 (Mojave) image, I found a folder that contains yet another spotlight database. It appears that there is now more than one spotlight database on a single disk. There is one for each user, located at:
~/Library/Metadata/CoreSpotlight/index.spotlightV3/
As with the other spotlight database, the files that hold the information are store.db and .store.db.

Mojave (10.14) isn't the first version of macOS to include this database. This appeared first in High Sierra (10.13).

What is in it?

The per-user database stores metadata for items that aren't files or folders. Items seen so far are:
  • Safari browser history (web pages visited)
  • Safari browser bookmarks
  • News App history (web pages visited)
  • Notes App notes
  • Maps App data (locations?)
I would speculate that emails would also be seen here, as a number of email related fields are present too:
  • kMDItemRecipientEmailAddresses
  • kMDItemPrimaryRecipientEmailAddresses
  • kMDItemAdditionalRecipientEmailAddresses
  • kMDItemHiddenAdditionalRecipientEmailAddresses

However, no email metadata was seen with a single IMAP account configured in the Mail app.

Since this is a test environment with very little activity and almost no apps other than those that come with macOS, there is likely to be a lot more metadata from different apps in this database on real world systems.

Why a separate database?

Apple's documentation (here and here) suggests that this is the implementation of functionality that allows app developers to provide in-app content search, including the ability to define custom metadata for it. Indexed items are not required to be files.

Prior to this (10.12 and below) there existed a folder at
~/Library/Caches/Metadata/Safari/

which housed all the *.webhistory files. See pic below.

Figure 1 - webhistory files in ~/Library/Caches/Metadata/Safari/

The individual files were plists which were then indexed by spotlight.

Figure 2 - webhistory file content
That folder no longer exists, and in its place we have the new spotlight database.

Parsing the data

Using mac_apt's single_plugin script to run only the spotlight plugin over individual store.db files, we can easily parse the database.

Figure 3 - mac_apt_singleplugin to parse the store.db file

There are quite a few fields, some more important than others. Below are some screenshots showing selected data (notes, safari history, and news). mac_apt gives you the data as a spreadsheet, sqlite db and a flat text file too (similar to mdls output).

Figure 4 - notes metadata from store.db (not all fields shown here)

Figure 5 - safari history from store.db (not all fields shown here)

Figure 6 - A single entry from News app showing all metadata (parsed from store.db)

mac_apt's spotlight plugin has now been updated to automatically process these user spotlight databases.


The ._ (dot-underscore) file format

If you've ever looked at removable media and found several hidden files that start with ._ , with one for almost every file (or folder) on the disk, this is the result of that media having been used on macOS.

macOS keeps a copy of file metadata in a separate area known as Extended Attributes (xattr) on HFS+ or APFS. However, when writing to external media not formatted as HFS+ or APFS (and thus lacking the capability to store extended attributes), it writes this information out as a separate file with the same name, just prefixed with dot-underscore ._ , as seen in the screenshot below.

Figure 1 - Screenshot showing exFAT volume on External USB disk
While this has been well known for many years, this information is often overlooked in forensic investigations. On media that has interacted with both macOS and Windows (or even Linux), macOS creates these files and deletes them when the original file is deleted. However, if the file is deleted or renamed on Windows or Linux, the dot-underscore files are left behind untouched. A while back, Lee Whitfield touched upon this here, specifically pointing out its use for knowing the date & time that a file was copied onto the media. However, there is useful information inside the file too.

This file can contain useful metadata such as kMDItemWhereFroms (URL of file if downloaded from internet) and kMDItemDownloadedDate (Date & Time when it was downloaded) among other extended attributes.

After a bit of reverse engineering, I wrote an 010 hex editor template to parse this information. It is available at https://github.com/ydkhatri/MacForensics/blob/master/DotUnderscore_macos.bt
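The overall container layout is the standard AppleDouble format (big-endian, magic 0x00051607). As a minimal sketch, assuming a well-formed file, the header entries can be enumerated in python like this (the xattr names/values themselves live inside the Finder info entry and need the extra parsing the template does):

import struct

def list_appledouble_entries(path):
    """Walk the header entries of a '._' (AppleDouble) file. All fields big-endian."""
    with open(path, 'rb') as f:
        magic, version = struct.unpack('>II', f.read(8))
        if magic != 0x00051607:
            raise ValueError('not an AppleDouble (._) file')
        f.read(16)                              # filler
        (num_entries,) = struct.unpack('>H', f.read(2))
        for _ in range(num_entries):
            # entry id 2 = resource fork, 9 = Finder info (xattrs follow it)
            entry_id, offset, length = struct.unpack('>III', f.read(12))
            print('entry {} at offset {}, {} bytes'.format(entry_id, offset, length))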

In the screenshot below, you can see it being run on one such file. 

Figure 2 - DotUnderscore_macOS.bt template output

Here is an analysis of the extracted data:


kMDItemWhereFroms
  Value:   https://upload.wikimedia.org/wikipedia/commons/3/3c/Thiruvalluvar_Statue_at_Kanyakumari_beach.jpg
  Meaning: URL from where the file was downloaded

kMDItemDownloadedDate
  Value:   0x41BFB51D1CFFA4F8 (11/09/2017 23:32:44)
  Meaning: Timestamp when the file was downloaded

com.apple.quarantine
  Value:   0083;5a04e59c;Safari;A451620D-2B49-49BD-ADC1-88DEBEA66582
  Meaning: File was downloaded using the Safari browser*

kMDItemDownloadedDate is stored in a plist as a date value: a 64-bit double holding the number of seconds since 01/01/2001.
The template does not parse the embedded plist for you; you can export it and open it in any plist viewer to see the human-readable date value.
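If you want to convert such a raw value by hand, here is a quick python sketch (the value is a big-endian 64-bit double counting seconds since 01/01/2001):

import struct
from datetime import datetime, timedelta

raw = 0x41BFB51D1CFFA4F8                                   # value from the table above
(seconds,) = struct.unpack('>d', struct.pack('>Q', raw))   # reinterpret the bits as a double
print(datetime(2001, 1, 1) + timedelta(seconds=seconds))   # 2017-11-09 23:32:44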

*The 3rd item in com.apple.quarantine's value (separated by ;) is the Application (agent) name that downloaded the file. For more details on this, read Howard Oakley's blog post

$Recycle bin and Undo operations


This week Phill Moore made an excellent finding (link here), one that most of us have seen for years but not investigated. Those $I files that seem orphaned/abandoned without explanation now have one. Phill notes that every time a file/folder is deleted and then restored, the $I file stays behind in the recycle bin. This post is about some follow-up testing and results.

I did some quick tests, deleting and restoring files in a few different ways, checking $I & $R file creation every time. I'm not even looking at indexes or timestamps, just file creation/deletion. My testing was on Windows 10 (32-bit) version 10.0.10586.106.

I'm not reproducing all the output here, only the summary. Here are a few ways to send files to the recycle bin and restore them. Remember you cannot do any of these operations using the command line (as that operates directly at the file system level and does not use the recycle bin abstraction).

First we delete a file (right-click and select Delete). Let's try to restore now using any one of the following ways:

1. Right-click the file, click Restore on file in the recycle bin
2. Cut file from recycle bin and Paste elsewhere
3. Drag the file from recycle bin and into another folder
4. Undo the last operation using Ctrl-Z  OR  right-click & select Undo Delete


Only the 4th method (Undo Delete) results in deletion of the $I file. The other methods leave it behind.

I believe this to be a Windows bug. Interestingly, when you Restore a file from the bin (right-click & Restore), then right-click on the desktop (or in any folder), the context menu has a new item called 'Undo Move'. It does NOT say 'Redo Delete'. For every other action, you will see a 'Redo ACTION' in the menu (see figure 2). So I believe when you restore a file, Windows just performs a file move (on the $R file) and thus marks it as a move in the last performed action.

Figure 1 - 'Undo Move' seen after restoring file
Now, clicking on Undo Move will send the file back to the recycle bin, but as a new $I & $R pair, as if it were an entirely new delete operation.

Figure 2 - 'Redo Rename' seen after using 'Undo Rename' from an unrelated file rename operation
In figure 3 below, you can see multiple abandoned $I files after doing this multiple times (delete, then restore).

Figure 3 - Recycle bin folder after multiple deletes and restores showing abandoned $I files
Let us know (comments or twitter) if you know of more ways to delete/restore.

Making NSKeyedArchives human readable

If you've been doing macOS analysis, you are definitely familiar with the (now not so new) serialized plist format also known as an NSKeyedArchive. There are parsers available to extract data from this format, such as the ccl_bplist from Alex Caithness. I've been using this library in mac_apt and other projects too. That is all old news.

Need for a human readable version

I always like to manually verify the results of my code by looking at the raw plist values, to make sure my analysis programs parsed the right values and did not miss anything. That isn't possible with NSKeyedArchives.

The ccl_bplist parser is great if you want to explore the structure of a plist interactively with python, or programmatically. What it does not do is auto-generate and save the deserialized version of that plist as a new human-readable plist file. So that is what I set out to do yesterday.

Long story short, the code is available on github here

Below are screenshots showing an NSKeyedArchive sfl2 file (Figure 1) and its deserialized human readable form (Figure 2).
Figure 1 - NSKeyedArchive


Figure 2 - Deserialized form of NSKeyedArchive


Using ccl_bplist and biplist, this should have been a 2-5 line program: ccl_bplist to deserialize and biplist to write out the new plist. However, it turned out to be a few lines more than that, as I had to write a short recursive function to process the plist data, because ccl_bplist also includes $class information which needed to be stripped out. Also, as I found out, this will only work with Python 3, because Python 2 does not distinguish between the types str and bytes. In Python 2, all of these (below) are the same:
 '\x12\x34'
 b'\x12\x34'
 bytes('\x12\x34') 

This creates a problem: biplist cannot distinguish between binary blobs and strings (they all appear to be strings), and it fails when it encounters a byte that cannot be encoded as a string. There is no such problem with Python 3, as these are distinct types there.
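The recursion itself is short. Here is a minimal sketch of the idea (illustrative only, simplified from the actual script on github):

def strip_class_info(obj):
    """Recursively drop the '$class' entries that deserialization leaves behind."""
    if isinstance(obj, dict):
        return {k: strip_class_info(v) for k, v in obj.items() if k != '$class'}
    if isinstance(obj, list):
        return [strip_class_info(v) for v in obj]
    return obj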

The code is here.

Alternatives

Since writing this, I did find another library (bpylist2) that has similar functions, reading/writing binary plists as well as creating/reading keyed archives. I haven't tested it yet.

ADB keyvalue backups and the .data format

The ADB backup has been a very useful tool for getting data from Android phones, particularly those phones/devices that are otherwise not accessible due to lack of support by forensic software vendors or hardware/software issues with other methods.

There is however one feature which I do not see being used by any of the vendors, FOSS tools, or other guides out there. I am specifically talking about ADB's feature to back up key-value pairs. According to one source, since Oreo (8.0), key-value backups are available via adb backup.

To get key-value pairs included in the backup, add the -keyvalue parameter to the adb backup command. I like to use:

adb backup -all -shared -system -keyvalue -f file.adb

Keyvalue backups contain some very good information not otherwise available in an adb backup.

So, where are the key-value backups located?


When viewing the adb tar archive, you will find one or more folders under each app's folder with names like k, sp, db, etc. The k folder holds the key-value backup, containing file(s) ending in the extension .data.
Figure 1 - Folders holding key-value .data files (this isn't all, there are many more)
The *.data files in the k folders usually have the same name as the package, like com.android.calendar.data.

Parsing .data files

Each .data file consists of a series of records, each starting with 'Data' and holding a key (name) and value (data). The format is as follows; all data in this structure is stored little-endian:

Position        Type               Description
00              char[4]            'Data'
04              uint               key_size
08              uint               data_size
12              char[key_size]     key_name
12 + key_size   char[]             pad to 4 byte boundary
..              char[data_size]    data
..              char[]             pad to 4 byte boundary
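Based on the layout above, a minimal python reader might look like this (an illustrative sketch, not the 010 template; assumes a well-formed file):

import struct

def read_data_records(path):
    """Parse 'Data' records from an ADB key-value .data file (little-endian)."""
    with open(path, 'rb') as f:
        buf = f.read()
    records, pos = [], 0
    while pos + 12 <= len(buf) and buf[pos:pos+4] == b'Data':
        key_size, data_size = struct.unpack_from('<II', buf, pos + 4)
        key = buf[pos+12 : pos+12+key_size].decode('utf-8', 'replace')
        pos = (pos + 12 + key_size + 3) & ~3      # skip key, pad to 4-byte boundary
        value = buf[pos : pos+data_size]
        pos = (pos + data_size + 3) & ~3          # skip value, pad again
        records.append((key, value))
    return records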


The value field can hold different types depending on the data/database being backed up; it differs per package. You can find XML files, entire SQLite databases, and also single-byte true/false type settings in there.

In the screenshot below, you can see the key-value records as parsed out for com.android.vending.data. The 010 template for this is available here.

Figure 2 - Hex editor view of com.android.vending.data, showing 'Data' records parsed out using an 010 template

In the above example, the value types are mostly True/False. But most other databases have custom structures embedded in the values, which need further parsing.

In part 2 of this ADB series, we shall explore the formats of call logs and other databases that are backed up. Part 2 will be published soon..

Part 2 - ADB keyvalue backups - Call Logs

This is Part 2 of the continuing blog series on ADB keyvalue backups. Today we focus on Call Log Backups. 

Call logs are backed up under 
<Backup.adb>/apps/com.android.calllogbackup/k/com.android.calllogbackup.data
They are backed up only if you specified the -keyvalue option and are available on non-rooted devices too.

This file follows the Key-Value Data format outlined earlier in part 1. The Keys here are the call ids, or serial numbers of the calls, starting at 1 and rising sequentially. The Values are the individual call log records.

Here are the structures used in the Call Log record. All fields here are stored as Big Endian.

1. Text_Record

Position   Type                 Description
00         ushort               field_length (in bytes)
02         char[field_length]   field data (text)

2. Call_Log

Position   Type          Description
00         uint          version, 0x03EF (1007) or 1005 seen
04         int64         timestamp
12         uint64        call duration in seconds
20         byte          is_phone_number_present
21         Text_Record   present if is_phone_number_present = 1
..         uint          call type
                           1 = Incoming
                           2 = Outgoing
                           3 = Missed
                           4 = Voicemail
                           5 = Rejected / Declined
                           6 = Blocked
                           7 = Answered_Externally
..         uint          number presentation
                           1 = Allowed
                           2 = Restricted
                           3 = Unknown
                           4 = Payphone
..         byte          is_servicename_present
..         Text_Record   present if is_servicename_present = 1
..         byte          is_iccid_present
..         Text_Record   present if is_iccid_present = 1
..         byte          is_own_num_present
..         Text_Record   present if is_own_num_present = 1
..         byte[12]      unknown bytes, always 0
..         Text_Record   oem namespace string
..         byte[18]      unknown bytes
..         uint          block reason (only on version 1007)
                           1 = Screening service
                           2 = Direct to voicemail
                           3 = Blocked number
                           4 = Unknown number
                           5 = Restricted number
                           6 = Payphone
                           7 = Not in contacts
..         byte[18]      unknown bytes (only on version 1007)
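Since every variable-length string in the record is a Text_Record, a small big-endian helper like this (an illustrative sketch) covers most of the string parsing:

import struct

def read_text_record(buf, pos):
    """Read one Text_Record: big-endian ushort length, then that many bytes of text.
       Returns the text and the new position."""
    (length,) = struct.unpack_from('>H', buf, pos)
    text = buf[pos+2 : pos+2+length].decode('utf-8', 'replace')
    return text, pos + 2 + length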

The screenshot below shows a raw record in the hex editor.

Figure 1 - Call log record showing some important fields

Using an 010 template to parse this information, it looks like this (below).

Figure 2 - Call log record data parsed in 010 editor
The level of detail in these records is great. There are call status codes known as Call Type (Missed, Incoming, Outgoing, ..), as well as a number Presentation code, which is usually 1 (Allowed), although there are a few other values. Calls that show up on your phone as 'Private' numbers will have presentation code 2 (Restricted). If you have enabled any call blocking features, then those show up too on blocked calls (as the block reason).

Code to automate this parsing

A python script has been created to parse call log records from the com.android.calllogbackup.data file, available here. The 010 template can be downloaded here.

Forensic Gems - Detecting Deleted call records

Since each call record has a key which is the call id (the serial number of the call), I performed an experiment to see if deleting call records in the middle of the sequence would change these numbers. It turns out the numbering does not change, in effect allowing us to detect deleted call records. This is visible in the screenshot below, where you can see call IDs (serial numbers) 1 through 8, but 4 and 7 are missing. Those are the ones I had manually deleted from the call logs on the phone, through the available feature in the Phone app. This was also tested on a real phone with several hundred call records going back several months, and it appears to hold true there too.

Figure 3 - Output of callparser.py, made pretty in Excel showing missing call ids.
This can be useful, as it tells us when records are missing, perhaps intentionally deleted.

Stay tuned for Part 3, there is more good stuff in these key-value backups.

Part 3 - ADB keyvalue backups - Wifi and System settings

This is Part 3 of the continuing blog series on ADB keyvalue backups. Today we focus on Wifi settings and other system configuration available from:
<Backup.adb>/apps/com.android.providers.settings/k/com.android.providers.settings.data
They are backed up only if you specified the -keyvalue option and are available on non-rooted devices too.

This file follows the Key-Value Data format as outlined earlier in part 1. There are 8 different types of data seen here. The Key name represents the type of data and Value represents either a single structure or a set of name-value pairs (both name and value are strings). The table below shows the data seen here.

Key Name            Description
system              settings for font sizes, screen brightness, hearing aids, haptic feedback, among others
secure              more system settings on gestures, button behaviors, spell checker, screensaver, accessibility, etc.
global              boolean settings that enable/disable options like wifi wakeup, auto_time, sounds enabled, call auto-retry, etc.
locale              a locale string like 'en-US'
lock_settings       owner info for display on screen, if enabled
softap_config       access point settings for Mobile hotspot
network_policies    unknown
wifi_new_config     XML data holding wifi settings for connected access points

Perhaps the most interesting aspect here is the presence of wifi passwords (WPA pre-shared keys) in the wifi_new_config data, as seen in the screenshot below. Yes, you can now get wifi passwords from an adb backup!

Figure 1 - Snippet of Wifi saved settings from com.android.providers.settings.data showing SSIDs & passwords
Here is a python script to read com.android.providers.settings.data and export the information to json files. Below you can see some of the data parsed by this script for one of my test devices.


Figure 2 - Data from 'global' key
Figure 3 - Data from 'system' key


Figure 4 - Data from 'secure' key





Figure 5 - Data from 'softap_config' key
Figure 6 - Data from 'lock_settings' key

macOS 10.15 Volumes & Firmlink magic

With macOS 10.15 (Catalina), Apple has introduced a change in the way system and user data is stored on disk. In prior versions, the root '/' volume was a single volume, usually named 'Macintosh HD'. This did not change with the move to APFS. However, with Catalina, there are now two distinct volumes -
  • Macintosh HD
  • Macintosh HD - Data
The screenshot below shows the two different volumes -

Figure 1 - diskutil output showing a split Macintosh HD volume into two

The Macintosh HD volume stores the system files and is mounted read-only, while the Macintosh HD - Data volume has all the other files on your system, which include user profiles, system and user data, and user-installed Applications.

However, when booted, only a single logical volume is presented (as root /) that combines the contents of both. This is enabled through APFS using its Volume Role feature. This is mentioned in Apple's official APFS documentation, but its usage and inner workings are not documented. Each volume can be assigned a role in its volume superblock structure (apfs_superblock_t). The following roles are documented. From the Apple docs:

  #define APFS_VOL_ROLE_NONE      0x0000
  #define APFS_VOL_ROLE_SYSTEM    0x0001
  #define APFS_VOL_ROLE_USER      0x0002
  #define APFS_VOL_ROLE_RECOVERY  0x0004
  #define APFS_VOL_ROLE_VM        0x0008
  #define APFS_VOL_ROLE_PREBOOT   0x0010
  #define APFS_VOL_ROLE_INSTALLER 0x0020
  #define APFS_VOL_ROLE_DATA      0x0040
  #define APFS_VOL_ROLE_BASEBAND  0x0080

The SYSTEM volume contains the folders /bin, /sbin and most of /usr and /System. A few subfolders of /usr and /System are on the DATA volume. The volumes are joined using a new construct that Apple calls firmlinks, which they describe as a 'Bi-directional wormhole in path traversal'. Firmlinks are used on the system volume to point to the user data on the data volume.

They are somewhat similar to Unix symlinks and hardlinks, but only directories can be linked (from one volume to another). The file that defines/lists the firmlinks resides on the SYSTEM volume at /usr/share/firmlinks. The following paths are defined by default (SYSTEM path on the left, DATA path on the right):

  /AppleInternal                          AppleInternal
  /Applications                           Applications
  /Library                                Library
  /System/Library/Caches                  System/Library/Caches
  /System/Library/Assets                  System/Library/Assets
  /System/Library/PreinstalledAssets      System/Library/PreinstalledAssets
  /System/Library/AssetsV2                System/Library/AssetsV2
  /System/Library/PreinstalledAssetsV2    System/Library/PreinstalledAssetsV2
  /System/Library/CoreServices/CoreTypes.bundle/Contents/Library
              System/Library/CoreServices/CoreTypes.bundle/Contents/Library
  /System/Library/Speech                  System/Library/Speech
  /Users                                  Users
  /Volumes                                Volumes
  /cores                                  cores
  /opt                                    opt
  /private                                private
  /usr/local                              usr/local
  /usr/libexec/cups                       usr/libexec/cups
  /usr/share/snmp                         usr/share/snmp

The linked volumes have distinct inode numbers for files/folders. The only common inode numbers are 1 (parent of root), 2 (root) and 3 (private-dir). All other inodes will be unique; a simple but clever scheme is used to ensure that. For the SYSTEM volume, every inode number allocated is OR'd with 0x0FFFFFFF00000000. Take a look at the inode numbers in the combined volume in the screenshot below; the very large numbers are the files that reside on the SYSTEM volume, due to the upper bits being set by the mask.

Figure 2 - Contents of root showing files from both SYSTEM and DATA
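A trivial check, useful when processing inode numbers from a combined Catalina volume (based on the mask described above):

SYSTEM_INODE_MASK = 0x0FFFFFFF00000000

def volume_of(inode):
    """Guess which Catalina volume an inode belongs to, based on its upper bits."""
    return 'SYSTEM' if (inode & SYSTEM_INODE_MASK) == SYSTEM_INODE_MASK else 'DATA'

print(volume_of(0x0FFFFFFF0000CE84))  # SYSTEM (upper bits set by the mask)
print(volume_of(404125))              # DATA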
If you try to create a file or folder on the root volume (or one of its owned folders), it fails with an error: Read-only volume.

For accessing most files and folders, there should be no problem, as the stitched/combined volume works seamlessly and programs should not notice any difference. However, there are situations where you might want to explicitly access a folder from a specific volume, especially for forensics. For example, if you wanted to access /.fseventsd, you would always get the read-only volume's .fseventsd folder, which won't be too interesting, as it is a read-only volume! To get the one on the DATA volume, there is still a way: Apple has also made the DATA volume available (mounted) at the mountpoint /System/Volumes/Data. This also means that if you have scripts that run across all files, they will need to be made aware that this location should be avoided to prevent duplication.

According to Apple, you cannot opt-out of this, and it is a required feature for macOS 10.15. Forensic tools that operate on full disk images will have to adapt for this change, and so I've updated mac_apt to support macOS Catalina. If you use it, let me know of bugs/issues.


Usagestats on Android 10 (Q)


UsageStats

If you are unfamiliar with this artifact, Alex Brignoni explains the UsageStats artifact in his blog post here. Located at /data/system/usagestats/, this information can be useful in cases. Up until Android 9 (Pie), this was in XML format; however, since Android 10 (Q), it is in a different format. So the tool written by Alex didn't work for me or my students investigating this artifact a couple of months back.

The file name has the same format (unix millisecond time as integer) and below you can see what the new data looks like.

Figure 1 - File 1572840777639 - raw hex view (complete file not shown)
It appeared to be some sort of binary format, but without a standard consistent header (after I compared a few files). Taking a cue from fellow DFIR researchers (Sarah Edwards and Phill Moore), I tested if this was a protocol buffer. If you aren't familiar with Protocol Buffers, read these posts from Sarah and Phill. This is a Google creation, and as they describe it - ...a language-neutral, platform-neutral extensible mechanism for serializing structured data.

To test for protocol buffer presence (on windows), you will need to download protoc.exe from here. Run protoc.exe as shown below; here 1572840777639 is the filename. If you get output, it's a protobuf.

W:\usagestats\0\daily>protoc --decode_raw < 1572840777639
1: 1862148
3: 1
4: 1
2 {
  1: 74
  2: "com.google.android.youtube"
  2: "com.google.android.ext.services"
  2: "com.android.providers.telephony"
  2: "com.android.dynsystem"
  2: "com.android.settings.CryptKeeper"
...
...output snipped...
...
22 {
  2: 23
  4: 60
  5: 1249881
  7: 23
  14: 92830887
  15: 23
  16: 60
}

OK, so we got some decoded data back. But it still does not look like anything we are used to seeing (see the XML below).
Figure 2 - XML usagestats snippet
The way protocol buffers work, you need a .proto file that defines the structure and data types of the data contained in the buffer. So to decode this, we need the .proto file!

Since Android is open source, why not peek at the source code of AOSP? To avoid downloading the entire source code, just browse the aosp-mirror on github.
Figure 3 - aosp source code on github

After a bit of searching, we find the file we are looking for at:
platform_frameworks_base/core/proto/android/server/usagestatsservice.proto

Figure 4 - usagestatsservice.proto file snippet
As seen above, the file references other .proto files too. So we must get those too, and any dependencies in those as well (recursively). We eventually end up with 7 files:

  • usagestatsservice.proto
  • configuration.proto
  • privacy.proto
  • locale.proto
  • rect.proto
  • protobuf_descriptor.proto
  • window_configuration.proto

Next, we need to transform (Google says compile) our .proto files into python libraries. Use protoc.exe to do so. The syntax is:

protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/your_proto_file.proto

Do this for every .proto file. It will generate a .py file for each one. For example, the usagestatsservice.proto compiles to usagestatsservice_pb2.py. Now all that remains is to use these generated python files to read our raw protocol buffer from file. We will need to write some code to do so.

Peeking into the usagestatsservice.proto file, you get some idea of how this might work. I constructed a basic python script to read this (below).
import usagestatsservice_pb2

input_path = "W:\\usagestats\\0\\daily\\1572840777639"
stats = usagestatsservice_pb2.IntervalStatsProto()

with open(input_path, 'rb') as f:
    stats.ParseFromString(f.read())

# GET PACKAGES
for usagestat in stats.packages:
    print('package = ' + stats.stringpool.strings[usagestat.package_index - 1])
    print(usagestat)

# GET CONFIGURATIONS
for conf in stats.configurations:
    print(conf)

# GET EVENT LOGS
for event in stats.event_log:
    print(event)

You can check for the existence of a field using the HasField() function. So here is what a package object consists of:
package = com.android.settings
package_index: 58
last_time_active_ms: 663647
total_time_active_ms: 4897
app_launch_count: 3
last_time_service_used_ms: -1572840673324
last_time_visible_ms: 673237
total_time_visible_ms: 25221
A configuration object consists of:
config {
  font_scale: 1.0
  locales {
    language: "en"
    country: "US"
  }
  screen_layout: 268435794
  color_mode: 5
  touchscreen: 3
  keyboard: 2
  keyboard_hidden: 1
  hard_keyboard_hidden: 1
  navigation: 1
  navigation_hidden: 2
  orientation: 1
  screen_width_dp: 411
  screen_height_dp: 659
  smallest_screen_width_dp: 411
  density_dpi: 560
  window_configuration {
    app_bounds {
      right: 1440
      bottom: 2392
    }
    windowing_mode: 1
    bounds {
      right: 1440
      bottom: 2560
    }
  }
}
last_time_active_ms: 662163
total_time_active_ms: 37
count: 1

An event log object contains:
package = com.google.android.apps.nexuslauncher
class = com.google.android.apps.nexuslauncher.NexusLauncherActivity
task root package = com.google.android.apps.nexuslauncher
task root class = com.google.android.apps.nexuslauncher.NexusLauncherActivity
type = MOVE_TO_FOREGROUND
time_ms: 34440
So now our protobuf is parsed, and the file is read and interpreted successfully! That's it for now. On to the next artifact..

Google Search & Personal Assistant data on android

The Google app, previously known as Google Now, is installed by default on most phones. From the app's description -

The Google app keeps you in the know about things that matter to you. Find quick answers, explore your interests, and stay up to date with Discover. The more you use the Google app, the better it gets.

Search and browse:
- Nearby shops and restaurants
- Live sports scores and schedules
- Movies times, casts, and reviews
- Videos and images
- News, stock information, and more
- Anything you’d find on the web


It is that ubiquitous bar/widget, sometimes called the Google Assistant Search Bar or just the google Search widget, found on the phone's home screen.

Figure 1 - Google Search / Personal Assistant Bar 

The internal package goes by the name com.google.android.googlequicksearchbox. Its artifacts are found at /data/data/com.google.android.googlequicksearchbox/

There are many files and folders here, but the most interesting data is in the sub-folder files/recently

Your recent searches, along with full-screen screenshots of some search results, are stored here. Screenshots are saved with a .jpg extension but are actually in .webp format. The unique number in a screenshot's name is referenced by the data in the protobuf file (whose file name is the email address of the logged-in user account). If you are not logged in, nothing is populated in this folder. See screenshots below.

 
Figure 2 - Folder 'recently' has no entries when no account was logged on.

Figure 3 - Folder 'recently' has files when searches were performed after logging in


The protobuf file (jokergogo54@gmail.com in this case) when decoded has entries that look like this (see below) for a typical search. If you aren't familiar with protobuf decoding, read this.

1 {
  1: 15485946382791341007
  3: 0
  4: 1585414188066
  5: "dolphin"
  8 {
    1: "web"
    2: "google.com"
  }
  9: 10449902870035666886
  17: 1585413397978
}

In the protobuf data (decoded using protoc.exe), as seen above, we can easily distinguish the relevant fields:

Item   Description
1      session id
4      timestamp1 (unix epoch)
5      search query
8      dictionary:
         1 = type of search (web, video, ..)
         2 = search engine
9      screenshot-id (needs conversion to int from uint)
17     timestamp2 (unix epoch)


Here is the corresponding screenshot saved in the same folder -
Figure 4 - Screenshot of search for "dolphin"

If you clicked on a recent news story in the app, the protobuf entry looks like this (below):

1 {
  1: 9016892896339717414
  3: 1
  4: 1572444614834
  5: ""
  7 {
    1: "https://9to5mac.com/2019/10/30/photos-of-airpods-pro/"
    2: "9to5mac.com"
    3: "Photos of AirPods Pro arriving in stores around the world - 9to5Mac"
  }
  9: 9016892896339717414
  10: 9
  17: 1572444614834
}
Figure 5 - Screenshot for news article clicked from link in google app


Last week, I added a plugin for ALEAPP to read these recent search artifacts. This isn't all, there is actually more data to be read here.

The search widget can be used to make any kind of query, which may then be forwarded to the web browser or Android Auto or the Email or Messaging apps depending on what was queried for. This makes for an interesting artifact.

From my test data, all searches are stored in the app_session folder as protobuf files having the extension .binarypb. See the screenshot below.

Figure 6 - .binarypb files
Each of these files is a protobuf that stores a lot of data about the searches. This includes searches from Android Auto too. Josh Hickman did some excellent research on Android Auto and addressed some of this briefly in his talk here. A parser is not available to read these files, as the format of the data contained in the protobufs is unknown. I've attempted to reverse-engineer parts of it, enough to get the useful bits of information out, such as the search queries. There are also mp3 recordings of the replies from google assistant stored in some of them. These are being added to ALEAPP to parse.

The format here is a bit too much to write about. Below is the raw protobuf structure (sans the binary blobs, replaced by ...). The search term here was "tom and jerry".

{
  1: 0x00000053b0c63c1b
  2: 0x11f0299e
  3: "search"
  132242267: ""
  132264001 {
    1: "..."
    2: 0x00000000
    3: 0
    4: 0x00000000000e75fe
  }
  132269388 {
    2: 0x0000000000000040
    3 {
      1: "..."
      2: ""
      3: "and.gsa.launcher.allapps.appssearch"
    }
  }
  132269847 {
    1 {
      1: "..."
      2: ""
      3: "and.gsa.launcher.allapps.appssearch"
    }
    2 [
      0: "...",
      1: "... tom and jerry ..."
      2: "..."
      3: 1
    ]
  }
  146514374 {
    1: "and.gsa.launcher.allapps.appssearch"
  }
  206022552 {
    1: 0
  }
}

After studying this and several other samples, here are the important pieces in the parsed protobuf dictionary:


Item         Description
1            session id (same number as in filename)
3            type of query (search, car_assistant, opa)
               car_assistant = Android Auto
               opa = personal assistant
132269388    dictionary:
               1 = mp3 recording of response
132269847    dictionary:
               1 = dictionary:
                     2 = last query
               2 = list of session queries (in blobs)



For more details, refer to the module googleQuickSearchbox.py in ALEAPP. Below is a screenshot of the parsed out data.
Figure 7 - ALEAPP output showing Google App / Personal assistant queries

Parsing unknown protobufs with python

Protocol Buffers are quite popular; more and more apps and system files store data in this format on both iOS and Android. If you aren't familiar with Protocol Buffers, read this post. There I use the protoc.exe utility (by Google), as does everyone else who needs to view this data without the corresponding .proto file.

This is great! But the raw view/output has one big disadvantage. While this approach (--decode_raw) works fine if you just want to see the text strings stored in your data, it does not always provide the correct conversions for all the raw data types! 

According to Google, when a message (data) is encoded, only 6 different wire types are allowed. These are shown below.

Figure - Allowed wire types from https://developers.google.com/protocol-buffers/docs/encoding#structure

Unless you have the .proto file, you really don't know what the original data type may be. Even protoc.exe just makes a best guess. For instance, all binary blobs are converted to strings by protoc, as both the string and bytes types use the Length-delimited wire type. There is also no way to tell if a number is to be interpreted as signed or unsigned, because both use the same underlying wire type (varint)!
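To illustrate the signed/unsigned ambiguity: a negative int64 is encoded as its two's-complement uint64, so raw decoders print a huge positive number. Recovering the signed value is a one-liner (a sketch for plain int64 fields, not zigzag-encoded sint64):

def varint_as_signed(u):
    """Reinterpret a raw uint64 varint as a signed int64 (two's complement)."""
    return u - (1 << 64) if u >= (1 << 63) else u

print(varint_as_signed(18446744073709551594))  # -22 (see the id64 field in the outputs below)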

Now to raw-decode a protobuf in python, there are a couple of libraries I have seen so far that do a decent job. I will list out the libraries, then demonstrate parsing with them, and compare.

protobuf-decoder - This seems to be more than 4 years old and is not maintained any more. It is also python2-only (a python3 port exists somewhere). It makes several assumptions regarding data types and attempts to produce output similar to protoc.

blackboxprotobuf - This is a more mature library that provides much more functionality. It makes relatively few assumptions about data types. In addition to parsing the protobuf and returning a dictionary object, it also provides a type definition dictionary for the parsed data.

To demonstrate what I am talking about, I created a demo protocol buffer file called addressbook.proto and defined a protobuf message as shown below.

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
  required int64 id64 = 4;
  required uint64 uid64 = 5;
  optional double double = 6;
  optional bytes bytes = 7;
}

Then compiled it using protoc.exe.

protoc --python_out=. addressbook.proto

Then I used a python script, importing the compiled python protobuf module, to generate a binary protobuf file called tester_pb. The data contained in it is shown below.

Actual data
[
  name: "John Doe"
  id: 1234
  email: "jdoe@example.com"
  id64: -22
  uid64: 13360317589766481554
  double: 4.5566
  bytes: b'\x00\x124V'
]

Protoc output  (protoc --decode_raw < ..\tester_pb)
1 {
  1: "John Doe"
  2: 1234
  3: "jdoe@example.com"
  4: 18446744073709551594
  5: 13360317589766481554
  6: 0x401239f559b3d07d
  7: "\000\0224V"
}

protobuf-decoder output
{
  '01:00:string': 'John Doe',
  '02:01:Varint': 1234,
  '03:02:string': 'jdoe@example.com',
  '04:03:Varint': 18446744073709551594,
  '05:04:Varint': 13360317589766481554,
  '06:05:64-bit': 4.5566,
  '07:06:string': '\x00\x124V'
}

blackboxprotobuf output (includes data dictionary and types dictionary)
{'1':
 {
  '1': b'John Doe',
  '2': 1234,
  '3': b'jdoe@example.com',
  '4': -22,
  '5': -5086426483943070062,
  '6': 4616816293942907005,
  '7': b'\x00\x124V'
 }
}
{'1': {'type': 'message', 'message_typedef':
 {
  '1': {'type': 'bytes', 'name': ''},
  '2': {'type': 'int', 'name': ''},
  '3': {'type': 'bytes', 'name': ''},
  '4': {'type': 'int', 'name': ''},
  '5': {'type': 'int', 'name': ''},
  '6': {'type': 'fixed64', 'name': ''},
  '7': {'type': 'bytes', 'name': ''}
 }, 'name': ''}
}

As seen in the outputs above, each decoder makes some default assumptions about the data types encountered. The items highlighted in red are the ones interpreted using incorrect types. I like blackboxprotobuf because it lets you specify the real types via a types dictionary similar to the one it outputs. So once we have figured out the correct types, we can pass them into the decode_message() function to get the correct output. See the code snippet below.


import blackboxprotobuf

with open('tester_pb', 'rb') as f:
    pb = f.read()
    types = {'1': {'type': 'message', 'message_typedef':
              {
               '1': {'type': 'str', 'name': 'name'},

               '2': {'type': 'int', 'name': 'id'},
               '3': {'type': 'str', 'name': 'email'},
               '4': {'type': 'int', 'name': 'id64'},
               '5': {'type': 'uint', 'name': 'uid64'},
               '6': {'type': 'double', 'name': 'double'},
               '7': {'type': 'bytes', 'name': 'bytes'}
              }, 'name': ''}
         }
    values, _ = blackboxprotobuf.decode_message(pb, types)
    print(values)



That produces the desired output -


{'1':
 {
  'name': 'John Doe',
  'id': 1234,
  'email': 'jdoe@example.com',
  'id64': -22,
  'uid64': 13360317589766481554,
  'double': 4.5566,
  'bytes': b'\x00\x124V'
 }
}


In summary, I recommend using the blackboxprotobuf library; however, note that it is not exactly plug and play. Since it is not on pypi, you have to use it from code. Also, to use it with python3, I had to make one small tweak, and I added the 'str' type decode, as that was not available. Since then I have tested this with numerous protobuf streams and it has not failed me! Get my updated version of the library here.

Screentime Notifications in Catalina (10.15)

If you routinely perform mac forensics, you've probably done a few macOS Catalina (10.15) examinations already. And if you are the kind that verifies your data, you may have noticed that for ScreenTime notifications, the database does not contain the same strings that you see in the actual displayed notification, and several forensic tools don't show them either.


Let's explore why.

To start with, let's review the format of the Notifications database. For macOS High Sierra (10.13) and above, it is located at:

/private/var/folders/<xx>/<yyyyyyy>/0/com.apple.notificationcenter/db2/db

where the <xx>/<yyyyyyy> portion represents what might appear to be random strings, but they are not random. This folder path represents the DARWIN_USER_DIR for a specific user. For more details on this, read my old post here.

Inside the database, the record table holds the actual notification data (title, sub-title, body) and date of notification among other fields. A simple database query can get the useful data.

SELECT 
  (SELECT identifier FROM app WHERE app.app_id=record.app_id) as app,
  uuid, data, presented, delivered_date
FROM record

The actual notification data is within a plist stored in the column data. Inside this plist, you can easily navigate to the items titl, subt and body to get the title, sub-title and body. However, for ScreenTime notifications, the data looks different: instead of individual strings in these values, there are lists.

Figure 2 - Embedded plist for screentime notification
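For reference, here is a minimal sketch of pulling the notification plists out of a copied db file. The exact nesting of titl/subt/body inside the plist is an assumption based on my test data (they sit under the req dictionary there):

import plistlib
import sqlite3

conn = sqlite3.connect('db')  # a local copy of the db2/db file
query = """SELECT (SELECT identifier FROM app WHERE app.app_id=record.app_id) as app,
                  uuid, data, delivered_date FROM record"""
for app, uuid, data, delivered_date in conn.execute(query):
    plist = plistlib.loads(data)       # embedded binary plist
    req = plist.get('req', {})         # assumption: titl/subt/body live under 'req'
    print(app, req.get('titl'), req.get('subt'), req.get('body'))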
ScreenTime uses format strings plus a list of data, which need to be put back together. This is similar to how Event Logs work in Windows, or Unified Logging in macOS. The format strings are located at the paths shown below (for English) and are available in other languages too:

/System/Library/UserNotifications/Bundles/com.apple.ScreenTimeNotifications.bundle/Contents/Resources/en.lproj/Localizable.strings
/System/Library/UserNotifications/Bundles/com.apple.ScreenTimeNotifications.bundle/Contents/Resources/en.lproj/InfoPlist.strings

These files are plists, each consisting of a single dictionary. So WeeklyReportNotificationNegativeDeltaBody, seen in the plist above, resolves to the message:
"Your screen time was down %@ last week, for an average of %@ a day."
Each %@ is then replaced with the data provided (15% and 6 hours, 24 minutes), becoming:
"Your screen time was down 15% last week, for an average of 6 hours, 24 minutes a day."

Figure 3 - Snippet of Localizable.strings plist
Similarly, WeeklyReportNotificationTitle becomes 'Weekly Report Available'. So now we are able to reconstruct the complete original message.

mac_apt's NOTIFICATIONS plugin has now been updated with this functionality.

KTX to PNG in Python for iOS snapshots

App snapshots on iOS are stored as KTX files; this is fairly well known at this point, thanks to the research by Geraldine Blay (@i_am_the_gia) and Alex Brignoni (@AlexisBrignoni) here and here. They even came up with a way to collect and convert them to PNG format. However, that solution was macOS-only, and hence this research..

KTX

KTX is a file format used to store textures, commonly used by OpenGL programs. The KTX file format is documented and available here. There aren't many standalone utilities that work with KTX files, as the format is mostly used in games and not for reading/distributing standalone textures. There are also no readily available python libraries to read it! The Khronos group that created the format distributes libktx, but it is C++ only. Even so, it would not be able to read iOS-created ktx files (for the reasons mentioned below). The few Windows applications I could find, like PicoPixel, would not recognize Apple-created KTX files as valid files.

So what is different here? A quick glance over the file in a hex editor showed that the texture data is stored in LZFSE compressed form, which currently only macOS/iOS can read.

Figure - Ascii view of 010 hex editor with ktx template

Now, using pyliblzfse, I could decompress the data and recreate a new KTX file with raw texture data. Even so, it would not render in KTX viewers other than macOS's Finder/Quick Look and Preview. So I tried a few different approaches to get to the data.

Attempt 1 - Rendering & Export

Textures are different from 2D images, and there is hence no direct conversion from a texture to an image format. From available research, it seemed like the easiest way to do this would be to use OpenGL to render the texture data (extracted from the KTX file), then use OpenGL to save a 2D image of the rendered texture. There is some sample code available on the internet, but in order to get it to work, one would need to know how to use OpenGL to render textures, a learning curve that was too steep for me..

After spending several hours trying to get this to work in Python, I ultimately gave up. Python is not the platform where major OpenGL development takes place, therefore there is little to no support, and the libraries are platform dependent. I barely got one of the libraries to install correctly in Linux, and every step of the way I got more errors than I wanted to debug. Ultimately I threw in the towel.

Attempt 2 - Convert texture data to RAW image data

Reading the KTX file header, the glInternalFormat field is 0x93B0 for all iOS-produced KTX files (as seen in the screenshot above). This value is the enumeration for COMPRESSED_RGBA_ASTC_4x4. So now we know the format is ASTC (Adaptive Scalable Texture Compression), a lossy compressed format for storing texture data, which here uses a block size of 4x4 pixels. That simplifies our task to finding a way to convert ASTC data to raw image data. A bit of searching led me to the python library astc_decomp, which does precisely that. So what I needed now was to put the pieces together as follows:

  1. Read KTX file and parse format to get LZFSE compressed data, and parameters of Width and Height
  2. Decompress LZFSE to get ASTC data
  3. Convert ASTC to RAW image stream
  4. Save RAW image as PNG using PIL library
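Here is a minimal sketch of steps 2-4, assuming the LZFSE blob and the width/height have already been read from the KTX header (pyliblzfse imports as liblzfse; astc_decomp registers an 'astc' decoder with PIL that takes block width, block height and an sRGB flag):

import liblzfse        # pip install pyliblzfse
import astc_decomp     # pip install astc-decomp (registers the 'astc' PIL decoder)
from PIL import Image

def ktx_texture_to_png(lzfse_blob, width, height, out_path):
    """Decompress LZFSE, decode the ASTC 4x4 texture, save as PNG."""
    astc_data = liblzfse.decompress(lzfse_blob)
    img = Image.frombytes('RGBA', (width, height), astc_data,
                          'astc', (4, 4, False))
    img.save(out_path, 'PNG')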

Combining this together, we are able to create a python script that can convert KTX files to PNG files. Get it here:
https://github.com/ydkhatri/MacForensics/tree/master/IOS_KTX_TO_PNG

There is also a Windows-compiled executable there, if you need to do this on windows without python. Alex Brignoni was helpful in sending me samples of KTX files from multiple images to work with. The code also works with KTX files that are not really KTX, i.e., they have the .ktx extension but the header is 'AAPL'. The format is similar, however, and my code will parse them out too. If you do come across a file that does not work, send it to me and I can take a look.

A point to note is that not all KTX files use the COMPRESSED_RGBA_ASTC_4x4 format, only the iOS created ones do. So you may come across many KTX files deployed or shipped with apps that can't be parsed with this tool, as it only handles ASTC 4x4 format.

Enjoy!

Introducing ios_apt - iOS Artifact Parsing Tool


ios_apt is the new shiny companion to mac_apt

ios_apt is not a separate project; it's part of the mac_apt framework and serves as a launch script that processes iOS/iPadOS artifacts.

Why yet another iOS parsing tool, don't we already have too many?

In addition to paid tools, we have iLEAPP, APOLLO and a few others, and I am an active contributor to some of them. ios_apt isn't meant to compete with them; rather, it utilizes the mac_apt framework to prevent duplication of work.

Many artifacts on iOS and macOS share common backend databases, configuration and artifact types. Among the artifacts that are almost identical are -

  • Spotlight
  • UnifiedLogging logs
  • Network usage database
  • Networking artifacts like hardware info and last IP leases
  • Safari
  • Notes
  • FSevents
  • ScreenTime

There are a few others too that aren't listed here, but you get the picture. Since mac_apt already parses all of them, it made sense to just create an ios variant that parses these from iOS extractions.

Also many of these artifacts are fairly complex and other FOSS tools don't have the architecture needed to handle them. APOLLO only gathers information from SQLite databases. iLEAPP is geared towards single artifact parsing per plugin. It is not designed for multiple layers of parsing where information parsed from one artifact/file may be used as a key to jump to an artifact elsewhere on disk.

Limitations

In its first version, ios_apt only works on full file system images extracted out to a folder. No support yet for zip/tar/dar/7z/other archives.

Available Plugins / Modules

The following Plugins are available as of now -

  • APPS
  • BASICINFO
  • FSEVENTS
  • NETUSAGE
  • NETWORKING
  • NOTES
  • SAFARI
  • SCREENTIME
  • SPOTLIGHT
  • TERMSESSIONS
  • WIFI
Download the latest version of mac_apt to get ios_apt.

iOS Application Groups & Shared data


Background

Tracking down an iOS application's Data folder, aka SandboxPath, in iOS is fairly easy. One simply needs to look at the applicationState.db sqlite database located under /private/var/mobile/Library/FrontBoard/. This is well known.

However, locating the sandbox folders for its AppGroups (and Extensions) is not so straightforward. The method suggested by Scott Vance here, and recommended by a few others too, is to look for the .com.apple.mobile_container_manager.metadata.plist file under each of the UUID folders:

  • /private/var/containers/Shared/SystemGroup/UUID/
  • /private/var/mobile/Containers/Shared/AppGroup/UUID/
  • /private/var/mobile/Containers/Data/InternalDaemon/UUID/
  • /private/var/mobile/Containers/Data/PluginKitPlugin/UUID/

As noted by Scott, the iLEAPP tool does this too, reading all the plists and listing out the path and its group name. For manual analysis, this works out great, as you can visually make out the app name from the group name. For example, the Notes app has bundle_id com.apple.mobilenotes and one of its shared groups (where the actual Notes db is stored!) has the id group.com.apple.notes.

The Problem

For automated analysis, this approach does not work, as each app follows its own naming convention for ids. A program cannot know that group.com.apple.notes corresponds to com.apple.mobilenotes. Hence we search for something with a more direct reference connecting Shared Containers to their Apps. Before we proceed further, it's important to understand the relationships between extensions, apps and shared containers. The diagram below does a good job of summarizing this. The shared containers are identified by AppGroups.

Figure 1 - iOS App, Extension, container relationships - Source:  https://medium.com/@manibatra23/sharing-data-using-core-data-ios-app-and-extension-fb0a176eaee9


The Solution

Fortunately, there is a database that tracks container information on iOS. It is located at /private/var/root/Library/MobileContainerManager/containers.sqlite3

It precisely lists all Apps, their extensions, AppGroups and Entitlements. As far as I can tell, this is the only place where this information is stored (apart from caches and logs). It does not have information about UUIDs. This database is listed in the SANS smartphone forensics poster, but I couldn't find any details on it elsewhere. 

The database structure is simple with just 3 main tables (and an sqlite_sequence one). 


Figure 2 - containers.sqlite database tables

The child_bundles table lists extensions and their owner Apps. In figure below, you can see the extensions for the com.apple.mobilenotes app.

Figure 3 - child_bundles table, filtered on 'notes'

Or one could write a small query to list all apps with their extension names like shown below.

Figure 4 - App & Extensions - query and output


Information about AppGroups is found in the data field of the code_signing_data table as a BLOB, which stores a binary plist.

Figure 5 - Plist (for com.apple.mobilenotes - cs_info_id 456) from 'code_signing_data.data'

The Entitlements dictionary has a lot of information in it. If this App creates a shared AppGroup, then it will show up under com.apple.security.application-groups. There may also be groups under com.apple.security.system-groups.

Figure 6 - AppGroup information in Entitlements section (in plist)

So from the above data, we know that the Notes App has 5 extensions and 2 AppGroups, and we have the exact string names (aka ids) too: group.com.apple.notes and group.com.apple.notes.import. Correlating this data with the information we found in the .com.apple.mobile_container_manager.metadata.plist files (from each UUID folder earlier), we can programmatically link the two as being part of the same App, based on the container id (AppGroup name).

Figure 7 - AppGroup/UUID folder showing plist's content and Container owner id
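A minimal sketch of that correlation: walk the UUID folders, read each hidden metadata plist, and map the container id (the MCMMetadataIdentifier value) to its UUID folder, which can then be matched against the AppGroup ids from containers.sqlite3:

import pathlib
import plistlib

def map_appgroups(shared_root):
    """Map AppGroup id -> UUID folder name using the hidden metadata plists."""
    mapping = {}
    for uuid_dir in pathlib.Path(shared_root).iterdir():
        meta = uuid_dir / '.com.apple.mobile_container_manager.metadata.plist'
        if meta.is_file():
            with open(meta, 'rb') as f:
                plist = plistlib.load(f)
            mapping[plist.get('MCMMetadataIdentifier')] = uuid_dir.name
    return mapping

print(map_appgroups('/private/var/mobile/Containers/Shared/AppGroup'))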

This methodology is implemented in the APPS plugin for ios_apt, which now lists every App, its AppGroups, SystemGroups, Extensions, and all the relationships. So you don't have to do any of it manually now. Enjoy!


Figure 8 - Apps Table from ios_apt output (not all columns are shown here)

Figure 9 - AppGroupInfo Table from ios_apt output



Gboard has some interesting data..


Gboard, the Google Keyboard, is the default keyboard on Pixel devices, and overall has been installed over a billion times according to the Play Store.

Although not the default on most non-Google brands, it is a popular app with foreign-language users because of its good support and ease of use, particularly with dozens of Asian and Indian languages.

As a keyboard app, it monitors and analyzes your keystrokes, offering suggestions and corrections for spelling and grammar, sentence completion and even emoji suggestions. 

Now for the interesting part. For the last few versions (seen at least since the Jan 2020 version, v8.3.x), it also retains a lot of data (ie, user keystrokes!) in its cache. From a DFIR perspective, that is GOLD. For a forensic examiner, this can possibly show you data that was typed by the user into an app that is now deleted, or show messages that were typed and then deleted, or messages from apps that have the disappearing message feature turned on! Or data entered into fields on web pages/online apps (that wouldn't be stored locally at all). Also, for some apps that don't track when a particular item was created/modified, this could be useful.

Note - The Signal app wasn't specifically tested to see if data from that app is retained, but based on what we can see here, it seems likely those messages would end up here too. All testing was on a Pixel 3 running the latest Android 11, using the default keyboard and default settings. This was also verified on other, earlier images. Josh Hickman's Android 10 Pixel 3 image was also used, and Josh was able to verify that Telegram and WhatsApp sent messages were present here. The specific versions of the Gboard databases studied were:

  • 8.3.6.250752527 (on Android 10)
  • 8.8.10.277552084 (on Android 10)
  • 10.0.02.338070508 (on Android 11)

Location

Gboard's app data (sandbox) folder is located here:

/data/data/com.google.android.inputmethod.latin/databases/

Here you might see a number of databases that start with trainingcache*. These are the files that contain the caches.

Figure 1 - Contents of Gboard's databases folder (v 10.0.02.338070508)

In different versions of the app, the database formats and names have changed a bit. Of these, useful data can be found in trainingcache2.db, trainingcache3.db and trainingcachev2.db. Let's examine some of them now.

trainingcache2.db (v 10.0.02.338070508)

The table training_input_events_table contains information about the application in focus, its field name (where input was sent), the timestamp of event and a protobuf BLOB stored in _payload field, as shown in screenshot below.

Figure 2 - training_input_events_table (not all columns shown)

The highlighted entry above is from an app that has since been deleted. The _payload BLOB is decoded in the screenshot below, highlighting the text typed by the user in the Email input field. The protobuf also contains all of the data from the other columns in the table.

Figure 3 - Decoded Protobuf from _payload column

In most instances, however, the protobuf looks like the screenshot below, where the input needs to be put back together as shown. Here you can see the words the user typed as well as suggestions offered by the app. Suggestions can be for spelling, grammar, contact names, or something else.
Figure 4 - Decoded protobuf - reconstructing user input

Above, you can see the words typed and suggestions offered. On an Android device, the suggestions appear as shown below while typing.

Figure 5 - Android keyboard highlighting suggested words

trainingcache3.db (v 10.0.02.338070508)

In version 8.x, this same database is named trainingcache2.db and follows the exact same format. The table s_table looks similar to the training_input_events_table seen earlier. However, the _payload field does not store the keystrokes here.

Figure 6 - s_table

Figure 7 - _payload protobuf decoded from s_table

Keystroke data is stored in the table tf_table. Here, most entries are a single key press, and to read this, it again needs to be put back together as shown below.

Figure 8 - tf_table entries

All keystrokes from the same session share the same f1 value (a timestamp-like field, but not used as a timestamp). The order of the keys pressed is stored in f4. Assuming they are all in order, we can run a short query to concatenate the f3 column values for easy reading (shown below). This isn't perfect, as group_concat() doesn't guarantee the order of concatenation, but it seems to work for now!

Figure 9 - Reading keystroke sessions from tf_table
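For reference, a query along those lines can be run with Python's built-in sqlite3 module. This is a sketch of the approach rather than the exact query used above; the ORDER BY in the subquery nudges group_concat() into the right order, which works in practice with SQLite even though it isn't guaranteed:

import sqlite3

query = '''
SELECT f1, group_concat(f3, '') AS typed_keys
FROM (SELECT f1, f3, f4 FROM tf_table ORDER BY f1, f4)
GROUP BY f1
'''
db = sqlite3.connect('trainingcache3.db')
for session, typed in db.execute(query):
    print(session, typed)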

We can combine (join) this data with that from s_table to recreate the same data we got from training_input_events_table earlier.

Figure 10 - joined tables

In the screenshot shown above, you can even see data being typed into a Google Doc, which is not saved locally. Only a snippet is shown above, but if you want to see the full parsed data, get Josh's Android image(s) and the latest version of ALEAPP (code), which now parses this out. Below is a preview (from a different image my students might recognize).

Figure 11 - ALEAPP output showing trainingcache parsed output


Cached keystroke data can also be seen and reconstructed from trainingcachev2.db, whose format is a bit different (not discussed here). Nothing of significance was found in trainingcache4 or the other databases. 

Observations

As expected, keystrokes from password fields are not stored or tracked.

In data reconstructed from tf_table, you can see all the spelling mistakes a user made while typing! Any corrections made in the middle of a word/sentence will appear at the end (because we are getting the raw keystrokes in the order the keys were pressed), so some input might be difficult to read. Also, if a user types something into a field, then deletes a word (or words) and retypes, you won't see the final edited (clean) version, as backspaces (deletes) are not tracked. You can see some of this in the output above (Figure 9).

The caches are periodically deleted (and likely size limited too), and so you shouldn't expect to find all user typed data here. 

Reading OneDrive Logs

Due to the popularity of OneDrive, it has become an important source of evidence in forensics. Last week, Brian Maloney posted about his research on reconstructing the folder tree from the usercid.dat files, and also provided a script to do so. In this brief post, we explore the format of OneDrive Logs, and provide a tool to parse them. In subsequent posts, I will showcase use case scenarios. 

Where to find them?

OneDrive logs have the extension .odl and are found in the user's profile folder at the following locations:

On Windows - 

C:\Users\<USER>\AppData\Local\Microsoft\OneDrive\logs\

On macOS - 

/Users/<USER>/Library/Logs/OneDrive/

At these locations, there are usually 3 folders - Common, Business1 and Personal - each containing logs. As the name suggests, Business1 is for the OneDrive for Business version.

Figure 1 - Contents of Business1 folder

The .odl file is the currently active log, while the .odlgz files are older logs that have been compressed. Depending on which folder (and OS) you are in, you may also see .odlsent and .aodl files, which have a similar format.

What is in there?

These are binary files, and cannot be directly viewed in a text editor. Here is what a .odl file looks like in a hex editor.

Figure 2 - .odl file in a hex editor

It is a typical binary file format with a 256 byte header followed by data blocks. Upon first inspection, it appears to be a log of all important function calls made by the program. Having this low level run log can be useful in scenarios where you don't have other logging and need to prove the upload/download or synchronisation of files/folders, or even the discovery of items which no longer exist on disk or in the cloud.

You do notice some funny looking strings (highlighted in red). More on that later.. 

The Format

With a bit of reverse engineering, the header format is worked out as follows:

struct {
    char     signature[8]; // EBFGONED
    uint32   unk_version;  // value seen = 2
    uint32   unknown2;
    uint64   unknown3;     // value seen = 0
    uint32   unknown4;     // value seen = 1
    char     one_drive_version[0x40];
    char     os_version[0x40];
    byte     reserved[0x64];
} Odl_header;

The structures for the data blocks are as follows:

struct {
    uint64     signature; // CCDDEEFF 00000000
    uint64     timestamp; // Unix Millisecond time
    uint32     unk1;
    uint32     unk2;
    byte       unk3_guid[16];
    uint32     unk4;
    uint32     unk5;  // mostly 1
    uint32     data_len;
    uint32     unk6;  // mostly 0
    byte       data[data_len];
} Data_block;

struct {
    uint32    code_file_name_len;
    char      code_file_name[code_file_name_len];
    uint32    unknown;
    uint32    code_function_name_len;
    char      code_function_name[code_function_name_len];
    byte      parameters[];
} Data;

In case of .odlgz files, the Odl_header is the same, followed by a single gzip compressed blob. The blob can be uncompressed to parse the Data_block structures.
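As a quick sanity check, the header and the compressed blob can be read with a few lines of Python. This is a minimal sketch based on the structures above (only the version strings are pulled out, and error handling is kept to a bare minimum):

import gzip
import struct

ODL_HEADER = struct.Struct('<8sIIQI64s64s100s')  # 256 bytes, per Odl_header above

def read_odl(path):
    with open(path, 'rb') as f:
        raw = f.read()
    sig, _, _, _, _, od_ver, os_ver, _ = ODL_HEADER.unpack_from(raw, 0)
    if sig != b'EBFGONED':
        raise ValueError('Not an ODL file')
    print('OneDrive version:', od_ver.rstrip(b'\x00').decode())
    print('OS version      :', os_ver.rstrip(b'\x00').decode())
    # .odlgz files hold a single gzip blob after the header
    body = raw[256:]
    return gzip.decompress(body) if path.endswith('.odlgz') else body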

Now, we can try to interpret the data. Leaving aside the few unknowns, a data block mainly consists of a timestamp (when the event occurred), the name of the function that was called, the code file that function resides in, and the parameters passed to the function. The parameters can be of various types like int, char, float, etc., and that part hasn't been fully reverse engineered yet, but simply extracting the strings gives us a lot of good information. However, the strings are obfuscated!

Un-obfuscating the strings

Since Microsoft uploads these logs to their servers for telemetry and debugging, they obfuscate anything that is part of a file/folder name, URL or username. File extensions, however, are not obfuscated. The way this works is that any data identified for obfuscation is replaced by a word, which is stored in a dictionary. The dictionary is available as the file ObfuscationStringMap.txt, usually in the same folder as the .odl files. To un-obfuscate, one simply needs to find and replace the strings with their original versions.

Figure 3 - Snippet of ObfuscationStringMap.txt

This is a tab separated file, stored as either UTF-8 or UTF-16LE depending on whether you are running macOS or Windows. 

Now, referring back to the original funny looking string in Figure 2 - 
/LeftZooWry/LogOneMug/HamUghVine/MuchDownRich/QuillRodEgg/KoiWolfTad/LawFlyOwl.txt
  .. after un-obfuscating becomes ..
/Users/ykhatri/Library/Logs/OneDrive/Business1/telemetry-dll-ramp-value.txt

It is important to note that since extensions are not obfuscated, they can still provide valuable intel even if some or all of the parts in the path cannot be decoded.

Now this process seems easy; however, not all obfuscated strings are file/folder names or parts of paths/URLs - some are multi-line strings. Another problem is that the words (the keys to the dictionary) are reused! So you might see the same key several times in the ObfuscationStringMap. The thing to remember is that new entries get added at the top of this file, not at the bottom, so when reading the file, the first occurrence of a key should be treated as the latest one. Sometimes a key is not found at all, as entries are cleaned out after a period of time. There is also no way to tell whether a dictionary entry is stale or valid for the specific log file being parsed. All of this just means that the decoded strings need to be taken with a grain of salt.
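To illustrate the 'first occurrence wins' rule, here is a minimal sketch of building the lookup dictionary (multi-line values and UTF-16 detection are deliberately left out for brevity):

def load_obfuscation_map(path, encoding='utf-8'):
    mapping = {}
    with open(path, 'r', encoding=encoding) as f:
        for line in f:
            parts = line.rstrip('\r\n').split('\t', 1)
            if len(parts) == 2:
                # new entries are prepended, so keep the FIRST one seen
                mapping.setdefault(parts[0], parts[1])
    return mapping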

Based on the above, a python script to parse the ODL logs is available here. A snippet of the output produced by the script is shown below.

Figure 4 - output snippet


In a subsequent post, we'll go through the items of interest in these logs like Account linking/unlinking, uploads, downloads, file info, etc.. 

Reading OneDrive Logs Part 2


In the last OneDrive blog post, I outlined how the ODL file format is structured. A working version of an ODL parser was also created to read these files. One key detail was how personal file/folder, location or credential identifying strings were obfuscated with the original values stored in the ObfuscationStringMap.txt file. 

However, some time in April 2022, Microsoft changed the way the obfuscation works, and the parser stopped working.

What changed?

OneDrive now appears to encrypt the data, and the ObfuscationStringMap.txt is no longer used. The file may still exist on older installations, but newer ones include a different file.

Figure 1 - Contents of  \AppData\Local\Microsoft\OneDrive\logs\Business1 folder

As seen in Figure 1 above, there is a new file called general.keystore. This file is JSON, easily read, and apparently holds the key used to decrypt the encrypted content, stored as a base64 encoded string.

Figure 2 - Sample general.keystore contents


Time for some Reverse Engineering

With a little bit of digging around in IDA Pro on the LoggingPlatform.dll file from OneDrive, we can see the BCrypt Windows APIs being used in this file. Note, this is not the bcrypt password hashing algorithm that bears the same name!

Figure 3 - BCrypt* Imports in LoggingPlatform.dll

Jumping to where these functions are used, it is quickly apparent that the encryption used is AES in CBC (Cipher Block Chaining) mode with a key size of 128 bits. 


Figure 4 - IDA Pro Disassembly

In the above snippet, we can see the call to BCryptAlgorithmProvider and then if successful, a call to BCryptSetProperty function which has the following syntax:

NTSTATUS BCryptSetProperty(
  [in, out] BCRYPT_HANDLE hObject,
  [in]      LPCWSTR       pszProperty,
  [in]      PUCHAR        pbInput,
  [in]      ULONG         cbInput,
  [in]      ULONG         dwFlags
);

Without delving into too many boring assembly details, I'll skip to the relevant parts...

For each string to be encrypted, OneDrive initialises a new encryption object with the key stored in the general.keystore file, encrypts the string, and then disposes of the encryption object. The encrypted blob is base64 encoded and written out to the log as the obfuscated string. There are a few other quirks along the way, such as the replacement of the characters / and + with _ and - respectively, as the former can appear in base64 text but are also used in URLs; the substitution keeps the log parseable later.
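Putting that together, decrypting a single obfuscated string looks roughly like the sketch below (using pycryptodome). Treat it as illustrative rather than the exact routine from the parser: the keystore layout (a JSON list whose first entry holds the base64 key in a 'Key' field) and the zero IV (general.keystore stores only a key) are assumptions here.

import base64
import json
from Crypto.Cipher import AES  # pycryptodome

def unobfuscate(obfuscated, keystore_path):
    with open(keystore_path) as f:
        key = base64.b64decode(json.load(f)[0]['Key'])  # 'Key' field name assumed
    # undo the URL-friendly character swaps and restore base64 padding
    b64 = obfuscated.replace('_', '/').replace('-', '+')
    b64 += '=' * (-len(b64) % 4)
    blob = base64.b64decode(b64)
    plain = AES.new(key, AES.MODE_CBC, iv=b'\x00' * 16).decrypt(blob)  # zero IV assumed
    return plain[:-plain[-1]].decode('utf-8', 'ignore')  # strip PKCS#7 padding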


Why the change?

In the previous iteration of ODL (when the ObfuscationStringMap was used), there were instances where the same key (3-word combination) was repeated in the file, making it difficult or impossible to know which value to use as its replacement to recover the original string.

Encrypting strings in place instead of using a lookup table does appear to be a more robust scheme, and it eliminates the above issue. It does use some more disk space, as the encrypted blob will always be a multiple of 16 bytes (128 bits) with block based encryption. In other words, it's inefficient for small text (less than 10 bytes).


Updated code

The python ODL parser has been updated to accommodate this new format, and works with both the old and new versions. It is available here.

NSKeyedArchive Deserializer update


A long time ago I wrote some code to make NSKeyedArchives (NSKA) human readable, basically de-serializing the data. It was then converted to a library for use in other projects like iLeapp and mac_apt. I revisited this last week and found and fixed a minor bug. While at it, I also added an extra capability, mostly for the folks who prefer not to touch code.

Previously, this library only worked with NSKA files. If a file was a normal plist, it would raise an exception complaining about not being able to find the '$archiver' element in the plist. But what if you had files that were normal plists (not serialised) but had nested NSKA plists as data blobs within? There are actually quite a few on iOS/macOS. To make them human-readable, you would have to write code to extract the blobs and run them through the library. The previous code also did not handle recursive deserializing even within NSKA archives.

Now with the latest update (version 1.4.0), there is an extra parameter in the deserialize_plist(...) and deserialize_plist_from_string(...) functions to unlock this functionality, which also performs full recursive deserializing of all nested blobs.

def deserialize_plist(path_or_file, full_recurse_convert_nska=False)

By default, the value is False, emulating the old behaviour. However, when set to True, the function will no longer raise an exception for non-NSKA (unserialized or normal) plists and will always return a plist. If a data (binary blob) element anywhere in the tree contains a valid header for an NSKA plist, it will be replaced with a tree branch representing the deserialized version of the NSKA data.

Figure 1 - NSKA plist deserialized with old code vs new
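As an example, here is the new flag in use (the file name is just a placeholder):

import nska_deserialize as nd

with open('sample.plist', 'rb') as f:
    deserialized = nd.deserialize_plist(f, full_recurse_convert_nska=True)
print(deserialized)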


If you are using the nska_deserialize dependency in any project, update to the latest:

pip3 install nska_deserialize --upgrade


The old compiled exe has been updated too (with the flag set to True). It is also very convenient to use with drag and drop, as shown here.

mac_apt update to BTM processing


This post highlights improvements to the AUTOSTART plugin in mac_apt.

Since macOS 13 (Ventura), Login items and Background tasks are managed and tracked via .BTM files, located at the path:

/private/var/db/com.apple.backgroundtaskmanagement/BackgroundItems-v<xx>.btm

where <xx> is the version number, currently 13 on macOS 15.2

Much of this information (but not all!) is visible to the end user via the Login items & Extensions page under System Settings as shown below.

Figure 1 - Login items & Extensions from System Settings

mac_apt's AUTOSTART plugin already processed BTM files; however, this has now been significantly improved. Previously, BTM specific parameters were not being parsed, and developer entries (which are not autostart items) were also included, which made the output difficult to read and interpret while also missing some key information.

BTM files are NSKeyedArchives which, when deserialised, contain dictionaries of items (login items and background tasks) per user.

Figure 2 - Snippet of single item from .BTM file

How these are interpreted and transformed into the nice GUI view seen above depends mostly on the parameters 'type' and 'disposition'. The following values have been observed for these fields:

DispositionValues = {
    0x01: 'Enabled',
    0x02: 'Allowed',
    0x04: 'Hidden',
    0x08: 'Notified'
}

TypeValues = {
    0x00001: 'user item',
    0x00002: 'app',
    0x00004: 'login item',
    0x00008: 'agent',
    0x00010: 'daemon',
    0x00020: 'developer',
    0x00040: 'spotlight',
    0x00800: 'quicklook',
    0x80000: 'curated',
    0x10000: 'legacy'
}

The 'type' value indicates if this item is an agent, daemon, app, user defined item or a spotlight or quicklook extension. 

When a user toggles the option to OFF for an item under the "Allow in the Background" setting, the 'Allowed' bit in the Disposition flag is cleared, thereby indicating 'Not Allowed'.
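As a quick illustration, the disposition flags can be rendered readable with a couple of lines of Python (using the observed values above):

def decode_disposition(value):
    flags = {0x01: 'Enabled', 0x02: 'Allowed', 0x04: 'Hidden', 0x08: 'Notified'}
    return ', '.join(name for bit, name in flags.items() if value & bit)

print(decode_disposition(0x03))  # Enabled, Allowed
print(decode_disposition(0x0D))  # Enabled, Hidden, Notified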

mac_apt now reads, interprets and shows the BTM parameters for disposition, type, container, developer and executableModifiedDate. The following output snippet, filtered for "Not Allowed", shows the same items as the System Settings GUI. As seen in Fig 1 above (and Fig 4 below), 2 Citrix items are toggled to OFF, resulting in the 6 apps belonging to these items being in the 'Not Allowed' group.


Figure 3 - Snippet of AUTORUNS output from mac_apt, filtered on BackgroundTask items and 'Not Allowed' disposition

Figure 4 - Disabled items from System Settings

This greatly simplifies the review of background applications. If the app itself disables a startup item, the 'Enabled' flag is cleared and will therefore be missing from the BTM_Disposition column; mac_apt will also populate the Disabled column with the value '1' to indicate this.

Also added is an 'AppArguments' column, which should populate the full command line arguments from all processed files (BTM and plists).

Be aware that mac_apt will process all encountered .btm files, so you may see repeated data, as there are likely older .btm files (vestigial artefacts from previous macOS versions) on the system. On my test system, I've got BackgroundItems-v9.btm and BackgroundItems-v13.btm. This may be useful from a forensics perspective, to look at the autostarts from that point in time. You will have to filter on the 'Source' column in the output if you wish to see only current data.
