Researchers Blame ‘Black Boxes’ for Unreliable Apple Watch Health Data

Apple Watch is known for its health-centric features. Over the past few years, there have been several instances in which an Apple Watch helped detect health conditions early. Features such as ECG, fall detection, blood oxygen sensing, and exercise tracking have aided health studies. Now, researchers from Harvard and the University of Michigan have raised red flags about relying on the Apple Watch for such studies.

The researchers' concern is that the algorithms Apple Watch uses to process health data are “black boxes.” JP Onnela, associate professor of biostatistics, found that heart rate variability data exported from an Apple Watch was inconsistent from one export to the next. Such unreliable data could also undermine the findings of other health studies that rely on the Apple Watch.

Apple updates its health algorithms regularly, and each change can alter the data the watch reports. In some cases, the changes could affect an entire health study. In other words, “the data from the same time period can change without warning.”

“These algorithms are what we would call black boxes: they’re not transparent. So it’s impossible to know what’s in them,” JP Onnela, associate professor of biostatistics at the Harvard T.H. Chan School of Public Health and developer of the open-source data platform Beiwe, told The Verge.

Devices like the Apple Watch are aimed at consumers and are not a great fit for research. For this reason, Onnela uses research-grade devices that supply raw output data. By contrast, the data obtained from an Apple Watch is processed through algorithmic filters and is likely to vary significantly.

To test this, Onnela and his collaborator Hassan Dawood, a research fellow at Brigham and Women’s Hospital, examined heart rate data that Dawood had exported from his own Apple Watch. Dawood exported his daily heart rate variability data twice: once on September 5th, 2020 and a second time on April 15th, 2021. For the experiment, they looked at data collected over the same period, from early December 2018 to September 2020.

Unacceptable Variances in Apple Watch Data

Onnela compared the same data exported at two different times to highlight the problem. The two exports had very different variances, and their linear correlation was an unimpressive 0.67.

“To be clear, these data cover the same date range, so they should be identical. In fact, their means are very similar, 52 vs. 55 for the first and second export, respectively, but their variances are very different: 1240 vs. 572. To get some further insight into this, I made a scatter plot of the values of one time series against the other. The dashed identity line is where we’d like to see the points fall if they were identical, as we’d hope. Instead, there’s a lot of scatter in the data, and their Pearson linear correlation coefficient is just 0.67. That’s not a very high correlation,” Onnela explained.
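The comparison described above (means, variances, and the Pearson correlation of two exports covering the same dates) can be sketched in a few lines of Python. This is only an illustration, not Onnela's actual analysis: the function names `pearson` and `compare_exports` are hypothetical, and the sample values are made up, not real Apple Watch data.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = mean(x), mean(y)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compare_exports(first, second):
    """Summarize two exports that are supposed to describe the same days."""
    return {
        "mean_first": mean(first),
        "mean_second": mean(second),
        "var_first": variance(first),    # sample variance
        "var_second": variance(second),
        "pearson_r": pearson(first, second),
    }


# Made-up daily values for illustration only (NOT real watch data).
first_export = [30.0, 88.0, 24.0, 66.0, 47.0]
second_export = [48.0, 71.0, 44.0, 60.0, 52.0]

summary = compare_exports(first_export, second_export)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If the two exports were truly identical, the means and variances would match exactly and the correlation would be 1. Any gap between the exports points to the data having been reprocessed between them.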

Onnela notes that Apple Watch data can still help people who want to track their own health. For research, however, differences this large are unacceptable. Reliability is even more questionable when study participants wear smartwatches from different brands. University of Michigan sleep researcher Olivia Walch agreed with Onnela.

Constantly changing algorithms make it almost prohibitively difficult to use commercial wearables for sleep research, Walch says. Sleep studies are already expensive. “Are you going to be able to strap four FitBits on someone, each running a different version of the software, and then compare them? Probably not.”

The report underlines the risk of using the Apple Watch or other wearables for research purposes. Apple has yet to comment on the matter.

[via The Verge]