Export health data as an XML file

The Health app on iOS lets you export all of your health data. The steps are straightforward:

  1. Open the Health app and tap your profile picture in the top-right corner.
  2. Select [Export All Health Data], which appears at the bottom of the menu.
  3. After a few minutes, you can send the exported export.zip to your computer via, e.g., AirDrop.

After unzipping export.zip, you will find several types of files. For instance, workout routes are stored as .gpx files, which contain GPS tracking information. The data needed for plotting the sleep schedule is in export.xml.
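If you would rather unpack the archive from Python than by hand, the standard-library zipfile module can do it. The following is only a minimal sketch: the function name extract_export and the paths passed to it are hypothetical, and it assumes the archive unpacks into a subfolder (apple_health_export in this article), which is why it searches recursively.

```python
import zipfile
from pathlib import Path


def extract_export(zip_path, out_dir):
    """Unpack a Health export archive and return the paths of all .xml files found.

    extract_export is a hypothetical helper name used for illustration.
    """
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    # Search recursively, since the files may sit inside a subfolder.
    return sorted(out_dir.rglob("*.xml"))
```

Calling extract_export("export.zip", ".") should then list export.xml among the returned paths, next to any other XML files the archive happens to contain.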

Parse the XML file in Python

You can inspect the export.xml file by opening it in a text editor.

The root tag is <HealthData>, and there are different kinds of tags under it. In particular, we want the data in the <Record> tags that belong to SleepAnalysis, i.e., those with type="HKCategoryTypeIdentifierSleepAnalysis", for example:

<Record type="HKCategoryTypeIdentifierSleepAnalysis"
        sourceName="MyAppleWatch"
        sourceVersion="2021111"
        creationDate="2021-12-01 17:14:48 -0500"
        startDate="2021-12-01 08:38:00 -0500"
        endDate="2021-12-01 08:43:59 -0500"
        value="HKCategoryValueSleepAnalysisAsleep"
/>

To do so, we first use Python's built-in xml package to parse the XML file:

import xml.etree.ElementTree as ET

tree = ET.parse("apple_health_export/export.xml")
root = tree.getroot()

You can check the tag of root, which is “HealthData”, as we saw in the XML file:

>>> root.tag
'HealthData'

To collect the attributes of every Record tag under root into a list, we can use a list comprehension:

records = [i.attrib for i in root.iter("Record")]
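To see what this list comprehension produces without loading a full export, here is a small self-contained sketch. The SAMPLE_XML string is a hand-written stand-in for export.xml, not real data, and it is parsed with ET.fromstring rather than ET.parse because it lives in memory instead of in a file.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written stand-in for export.xml (illustrative, not real data).
SAMPLE_XML = """
<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount"
          startDate="2021-12-01 09:00:00 -0500"
          endDate="2021-12-01 09:05:00 -0500"
          value="120"/>
  <Record type="HKCategoryTypeIdentifierSleepAnalysis"
          startDate="2021-12-01 08:38:00 -0500"
          endDate="2021-12-01 08:43:59 -0500"
          value="HKCategoryValueSleepAnalysisAsleep"/>
  <Workout workoutActivityType="HKWorkoutActivityTypeRunning"/>
</HealthData>
"""

root = ET.fromstring(SAMPLE_XML)

# .attrib is a plain dict holding the XML attributes of each <Record> element.
records = [elem.attrib for elem in root.iter("Record")]

print(len(records))          # 2, because the <Workout> element is not a Record
print(records[1]["value"])   # HKCategoryValueSleepAnalysisAsleep
```

Each entry in records is a dictionary keyed by attribute name, which is the shape pandas needs to build one row per record.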

Then, we can convert records into a pandas.DataFrame and cast the string-type date/time columns to the datetime data type:

import pandas as pd

records_df = pd.DataFrame(records)
date_col = ['creationDate', 'startDate', 'endDate']
records_df[date_col] = records_df[date_col].apply(pd.to_datetime)
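The point of the cast is that date arithmetic and the .dt accessor only work on datetime columns, not on strings. The sketch below illustrates this on a one-row frame built from the sample record shown earlier; the variable name duration is introduced here purely for illustration. Because the strings carry a UTC offset, the resulting columns are timezone-aware.

```python
import pandas as pd

# One row copied from the sample <Record> shown earlier in the article.
records_df = pd.DataFrame({
    "creationDate": ["2021-12-01 17:14:48 -0500"],
    "startDate": ["2021-12-01 08:38:00 -0500"],
    "endDate": ["2021-12-01 08:43:59 -0500"],
})

date_col = ["creationDate", "startDate", "endDate"]
records_df[date_col] = records_df[date_col].apply(pd.to_datetime)

# After the cast, subtraction yields a Timedelta instead of raising an error.
duration = records_df["endDate"] - records_df["startDate"]
print(duration.iloc[0])                      # 0 days 00:05:59

# The .dt accessor exposes the local wall-clock fields of each timestamp.
print(records_df["startDate"].dt.hour.iloc[0])   # 8
```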

Now, we can select only the Sleep Analysis data in our records:

sleeps_df = records_df.query("type == 'HKCategoryTypeIdentifierSleepAnalysis'")

Cut each overnight record into two records

Some sleep records start on one day but end on the next. To make the plotting process easier, we can cut those overnight records into two:

  1. Separate the sleep records into two pandas.DataFrame objects, no_cross and cross, where cross holds the overnight records.
  2. Make two copies of cross, c1 and c2.
  3. Set the end datetime of each record in c1 to 23:59:59 on its start date, and set the start datetime of each record in c2 to 00:00:00 on its end date.
  4. Concatenate no_cross, c1, and c2, and sort them by the start datetime.

no_cross = sleeps_df[sleeps_df["startDate"].dt.day == sleeps_df["endDate"].dt.day]
cross = sleeps_df[sleeps_df["startDate"].dt.day != sleeps_df["endDate"].dt.day]

c1 = cross.copy()
c2 = cross.copy()
c1["endDate"] = c1["startDate"].apply(lambda x: x.replace(hour=23, minute=59, second=59))
c2["startDate"] = c2["endDate"].apply(lambda x: x.replace(hour=0, minute=0, second=0))

sleeps_splitted_df = pd.concat([no_cross, c1, c2]).sort_values("startDate")
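The splitting steps can also be wrapped in a function and tried on a couple of hand-made rows. This is a sketch of a variation, not the notebook's exact code: the name split_overnight is hypothetical, it compares full calendar dates with .dt.date instead of the day of the month, and it uses .dt.normalize() to get midnight rather than replace(). The sample timestamps are invented and timezone-naive to keep the example short.

```python
import pandas as pd


def split_overnight(df):
    """Split records whose start and end fall on different calendar days.

    split_overnight is a hypothetical helper name used for illustration.
    """
    same_day = df["startDate"].dt.date == df["endDate"].dt.date
    no_cross = df[same_day]
    c1 = df[~same_day].copy()
    c2 = df[~same_day].copy()
    # First half: keep the start, end at the last second of the start date.
    c1["endDate"] = c1["startDate"].dt.normalize() + pd.Timedelta(
        hours=23, minutes=59, seconds=59
    )
    # Second half: start at midnight of the end date, keep the end.
    c2["startDate"] = c2["endDate"].dt.normalize()
    return pd.concat([no_cross, c1, c2]).sort_values("startDate")


# Invented sample: one overnight record and one afternoon nap.
sleeps = pd.DataFrame({
    "startDate": pd.to_datetime(["2021-12-01 23:10:00", "2021-12-02 13:00:00"]),
    "endDate": pd.to_datetime(["2021-12-02 06:30:00", "2021-12-02 13:40:00"]),
})

result = split_overnight(sleeps)
print(len(result))   # 3: the overnight record became two rows
```

After the split, every row starts and ends on the same calendar day, so each one can be drawn as a single rectangle in that day's column.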

Plotting using Matplotlib

Finally, we can make our plot using matplotlib. Some points are worth mentioning:

  1. We use matplotlib.patches.Rectangle((x, y), width, height) to draw each sleep record.
  2. We use pandas.DataFrame.itertuples() to iterate over the records.
  3. Matplotlib represents dates on its axes as floating-point numbers. We use matplotlib.dates.date2num to convert our datetime data into that representation.
  4. In our experience, if the time span on an axis is too small (e.g., our y-axis covers only 24 hours), we need to divide the values by 1000 to get the correct tick labels. That is why /1000 appears in several places in the code.
  5. We found the Awake data not particularly useful for our application, so we skip it in the plot.
  6. The reference for tick formatters and locators can be found in the documentation of matplotlib.dates. The format strings used in DateFormatter, such as %I%p, are strftime format strings, which are documented in Python's datetime module documentation.
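To make the first three points concrete, here is a deliberately simplified sketch. It is not the notebook's plotting code: it puts the hour of day on the y-axis as a plain number, so it does not need the /1000 scaling mentioned above, and the two sample rows are invented stand-ins for sleeps_splitted_df.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Rectangle

# Invented rows standing in for sleeps_splitted_df (already split at midnight).
sample = pd.DataFrame({
    "startDate": pd.to_datetime(["2021-12-01 23:10:00", "2021-12-02 00:00:00"]),
    "endDate": pd.to_datetime(["2021-12-01 23:59:59", "2021-12-02 06:30:00"]),
})

fig, ax = plt.subplots()
for row in sample.itertuples():
    # x: the calendar day, converted to Matplotlib's floating-point date.
    day = mdates.date2num(row.startDate.normalize().to_pydatetime())
    # y: the hour of day at which the record starts.
    start_hour = row.startDate.hour + row.startDate.minute / 60
    # height: the duration of the record in hours.
    hours = (row.endDate - row.startDate).total_seconds() / 3600
    ax.add_patch(Rectangle((day, start_hour), 1, hours))

# Set the limits explicitly rather than relying on autoscaling.
x_min = mdates.date2num(pd.Timestamp("2021-12-01").to_pydatetime())
ax.set_xlim(x_min, x_min + 2)
ax.set_ylim(24, 0)  # inverted, so midnight sits at the top
ax.xaxis_date()     # interpret x values as dates
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
ax.set_ylabel("Hour of day")
```

Each rectangle is one day wide, so a night's sleep shows up as a bar at the bottom of one column that continues at the top of the next. Call fig.savefig(...) or plt.show() to view the result.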

Again, the Full Source Code

Again, the complete IPython Notebook used to make the plot is available in my GitHub repo: https://github.com/c0rychu/apple-sleep

Hope you’ll enjoy~
