UK startup Babylon Health has dug into app usage data from a critical user to compile a press release publicly attacking the UK doctor who has spent years raising patient safety concerns about its symptom triage chatbot service.
In the press release, published late Monday, Babylon refers to Dr. David Watkins – via his Twitter handle – as a "troll", claiming he has "targeted employees, partners, customers, regulators and journalists and tweeted defamatory content about us".
Watkins is also said to have carried out "hundreds of hours" and 2,400 tests of its service in a bid to discredit it – while raising "fewer than 100 test results which he considered concerning".
Babylon's PR further claims that Watkins found "real mistakes in our AI" in just 20 cases, couching the other instances as "misrepresentations" or "mistakes" – per an unnamed "panel of high-level clinicians" which, according to the startup's PR, "examined and re-validated each one" – suggesting the error rate Watkins identified was just 0.8% (20 out of the claimed 2,400 tests).
Responding to the attack in a phone interview with theinformationsuperhighway, Watkins described Babylon's claims as "utter nonsense" – saying, for example, that he hasn't even carried out 2,400 tests of its service. "There are certainly not 2,400 completed triage assessments," he told us. "Absolutely not."
Asked how many tests he thinks he has completed, Watkins suggested the figure is likely between 800 and 900 runs of "full triages" (some of which, he points out, were repeat tests to check whether the company had fixed problems he'd flagged earlier).
He said he identified problems in around one in two or one in three instances of testing the bot – though he found far more problems in 2018, claiming the rate was closer to "one in one" at that point, for an earlier version of the app.
Watkins suggests that, to get to 2,400, Babylon is most likely counting instances where he was unable to complete a triage because the service lagged or glitched. "They have manipulated data to discredit someone who raised patient safety concerns," he said.
"Obviously I'm testing in a manner where I know what I'm looking for – because I've been doing this for the past three years, and I'm looking for the same issues I've previously reported, to see whether they've fixed them. To try to suggest that my testing is in any way representative of the chatbot is absurd in itself," he added.
In another pointed attack, Babylon writes that Watkins has "posted over 6,000 misleading attacks" – without specifying what kind of attacks it's referring to (or where they were posted).
Watkins told us he hasn't even tweeted 6,000 times since joining Twitter four years ago – despite spending the past three years using the platform to raise concerns about diagnostic failings by Babylon's chatbot.
Such as this series of tweets, in which he shows the bot's triage failing to pick up on a possible heart attack in a patient presentation.
Watkins told us he has no idea what the 6,000 figure refers to, accusing Babylon of having a culture of "trying to silence criticism" rather than engaging with genuine clinical concerns.
"Not once has Babylon actually turned to me and said," Hey, Dr. Murphy – or Dr. Watkins – what you tweeted there is misleading, "he added." Not even. "
Instead, he said, the startup has consistently taken a "dismissive approach" to the safety concerns he has raised. "My overarching concern about the way they've approached this is that they have, once again, been dismissive of criticism and have again tried to smear and discredit the person raising concerns," he said.
Watkins, a consultant oncologist at The Royal Marsden NHS Foundation Trust, spent several years tweeting videos of Babylon's chatbot triage under the online (Twitter) moniker @DrMurphy11 – illustrating the bot failing to correctly identify patient presentations – before going public with his identity on Monday, when he took part in a debate at the Royal Society of Medicine.
There he gave a presentation calling for less hype and more independent verification of claims like Babylon's, as such digital systems continue to make their way into the healthcare space.
In Babylon's case, the app has an important cheerleader in the current UK health secretary, Matt Hancock, who has said he's a personal user of the app.
Hancock is, at the same time, pushing the National Health Service to overhaul its infrastructure so that "healthtech" apps and services can plug in – so you can see the political synergies.
Watkins argues the sector needs to focus on gathering robust evidence and independent testing, rather than leaning on vacuous ministerial support and partnership "endorsements" as a stand-in for due diligence.
He points to the example of Theranos – the disgraced blood-testing startup whose founder now stands accused of fraud – saying it should be a glaring red flag for why "novel" claims about health products demand independent testing.
"(About product hyping) is a problem in the technology industry that unfortunately seems to have infected the healthcare industry in some situations," he said, referring to the startup book "Fake it til you make it" for hype marketing and -Scaling without waiting for external review of heavily marketed receivables.
In Babylon's case, he argues the company has failed to back up its puffy marketing with evidence of the kind of extensive clinical testing and validation he believes should be required before a health app gets used in the wild by patients. (The academic studies it references have also stopped short of giving outsiders access to data so they can verify its claims, he adds.)
"They are supported by all of these people – the founders of Google DeepMind, Bupa, Samsung, Tencent and the Saudis have given them hundreds of millions and they are a billion dollar company. They have the support of Matt Hancock. I have one Deal with Wolverhampton. Everything looks trustworthy, "continued Watkins." But there is no basis for this trustworthiness. You base trustworthiness on a company's ability to become a partner. And you assume that these partners are a due Have done diligence. "
For its part, Babylon claims the opposite – saying its app complies with existing regulatory standards, and pointing to high "patient satisfaction ratings" and a lack of reported user harm as evidence of safety, writing in the same PR that attacks Watkins:
Our track record speaks for itself: our AI has been used millions of times, and not a single patient has reported any harm (a far better safety record than any other health service in the world). Our technology meets strict regulatory standards in five different countries and has been validated as a safe service by the NHS on ten occasions. When the NHS reviewed our symptom checker, health check and clinical portal, they said our method of validation "was completed with a robust, high-standard evaluation method". As for patient satisfaction, over 85% of our patients give us 5 stars (and 94% four or five stars), and the Care Quality Commission recently rated us "outstanding" for our leadership.
Suggesting the efficacy of a health service be assessed by patients' ability to complain when something goes wrong seems, to say the least, an unorthodox approach – flipping the Hippocratic principle of "first, do no harm" on its head. (Someone who is dead would, in theory, be literally unable to complain – which could put a rather large hole in any safety bar claimed via such an assessment method.)
On regulation, Watkins argues the current UK regime is simply not set up to respond intelligently to a development like AI chatbots – and that robust enforcement is lacking for this novel category.
Complaints made to the MHRA (the Medicines and Healthcare products Regulatory Agency) have resulted in Babylon being asked to work on problems, with little or no follow-up, he says.
Meanwhile, he notes, confidentiality clauses limit what the regulator can disclose.
Of course, all of this could look like a plum opportunity for a certain type of startup disruptor.
And Babylon's app is one of a number now applying AI technologies to offer chatbot-style diagnostic aid across several global markets. Users are typically asked to respond to questions about their symptoms; at the end of the triage process they're presented with possible causes – though Babylon's PR materials are careful to include a footnote noting that its AI tools "do not provide a medical diagnosis, nor are they a substitute for a doctor".
Yet, according to Watkins, reading certain headlines and claims made about the company's product in the media could leave you with a very different impression – and it's exactly this level of hype that worries him.
He suggests other chatbots are available that attract less hype, pointing to Berlin-based Ada Health as an example of a more thoughtful approach.
Asked whether there's any testing Babylon could do that would stand up its hype, Watkins told us the starting point would be putting the technology it's making claims about actually into the public domain.
Notably, the European Commission is working on a risk-based framework for regulating AI applications – including use cases in sectors such as healthcare – which would require such systems to be "transparent, traceable and under human control", and to use unbiased data for training their AI models.
"Because of the hyperbolic claims that were previously released about Babylon, there is a major problem here. How do you roll back and make it safe? You can do this by giving certain warnings about what it is used for." Watkins said, expressing concerns about the wording used in the app, "Because it is a diagnosis for patients and suggests what to do to bring out this disclaimer, which states that this is not healthcare information, but is just information – it doesn't make sense. "I don't know what a patient should think of it."
"Babylon always presents itself as very patient-focused – we listen to patients, we hear their feedback. If I'm a patient and I've got a chatbot telling me what to do and giving me a suggested diagnosis – while at the same time telling me 'ignore this, don't use it' – what is it?" he added. "What is its purpose?"
"There are other chatbots that I think have been defined much more clearly – where they are very clear in their intent that we are not here to advise you on health care; we provide you with information that will be forwarded to your doctor so that you can have a more informed decision discussion with him. And if you put it in that context, I think it makes sense as a patient. This machine will give me information so that I can have a more informed discussion with my doctor. Fantastic. So there are simple things they just haven't done. And it drives me crazy. I'm an oncologist – I shouldn't be doing that. "
Watkins suggested Babylon's response to his good-faith patient safety concerns is symptomatic of a deeper malaise within the company culture. It has also had a negative impact on him personally, making him a target for parts of the right-wing media.
"What they did, although it may not be user health data, they tried to use data to intimidate an identifiable person," he said of the company's attack on him. "As a result of pursuing this threatening approach and trying to intimidate other parties, we have bundled and attacked this guy. So it's the damage it does. You chose a person to be attacked."
"I am concerned that there are clinicians in this company who, when they see this, do not raise concerns – because they are only discredited in the organization." And that's really dangerous in healthcare, ”added Watkins. "You have to be able to express yourself when you see concerns, otherwise patients are at risk of injury and nothing changes. You have to learn from mistakes when you see them. You can't just do the same thing over and over again . "
Others in the medical community were quick to criticize Babylon for targeting Watkins in such a personal way – and for disclosing details of his use of its (medical) service in the process.
Sam Gallivan, also a doctor, wrote:
Can other high-frequency @babylonhealth users look forward to their private medical queries being released in a press release?
– Sam Gallivan (@samgal), February 25, 2020
The episode certainly raises questions about Babylon's approach to sensitive health data, if it's willing to dip into patient queries in a bid to steamroller informed criticism.
Of course, we've seen similarly ugly stuff in tech before – such as when Uber was found to have a "God View" on its ride-hailing service, which it used to keep tabs on critical journalists. In that case, the misuse of platform data pointed to a toxic culture problem that Uber had to sweat to turn around in subsequent years (including by changing its CEO).
Babylon's selective data dump on Watkins also offers a vivid illustration of a digital service's ability to access and shape individual data at will – underlining the power asymmetries between these data-mining technology platforms, which are gaining ever more agency over our decisions, and their users, who get only narrow, hyper-controlled access to the databases they feed.
For example, Watkins told us he can no longer access his query history in the Babylon app, providing a screenshot of an error screen (below) he now sees when trying to open the chat history in the app. He said he doesn't know why he can no longer access his historical usage information – but noted he had been using it as a reference to help carry out further testing (which he now can't).
If it's a bug, it's a handy one for Babylon's PR…
We contacted Babylon to ask it to respond to criticism of its attack on Watkins. The company defended its use of app data to compile the press release, arguing that the "volume" of queries he ran means the usual privacy rules don't apply – and further claiming it only "shared non-personal statistical information", despite referring to Watkins in the PR by his Twitter identity (which, since Monday, has been linked to his real name).
In a statement, a Babylon spokesperson told us:
When it comes to claims made about the safety of our technology, our clinicians need to address these issues to ensure the accuracy and safety of our products. The usage data that was recently released makes clear, given the sheer volume of use, that this was theoretical data (part of an accuracy testing exercise) rather than a patient's real health concerns. Given the volume of use, and the way the data was being presented publicly, we felt we had to address the question of accuracy and use the information to reassure our users. The data we shared was non-personal statistical information, and Babylon has complied with its data protection obligations throughout. Babylon does not publish real, individualized user health data.
We also asked the UK's data protection watchdog about the episode, and about Babylon making Watkins' app usage public. The ICO told us: "People have the right to expect that organisations will handle their personal information responsibly and securely. If anyone has concerns about how their data has been handled, they can contact the ICO and we will look into the details."
Babylon's director of clinical innovation, Dr. Keith Grimes, took part in the same Royal Society of Medicine debate as Watkins this week. The event, entitled "Latest developments in AI and digital health 2020", was billed as a session that would cut through the hype surrounding AI.
So it hardly seems a coincidence that the press release attacking Watkins dropped just ahead of a presentation that had been publicly scheduled since at least last December – one in which he argued that, where AI chatbots are concerned, "validation is more important than evaluation".
Last summer, Babylon announced a $550 million Series C raise at a valuation of more than $2 billion.
The company's investors include Saudi Arabia's Public Investment Fund, an undisclosed U.S. health insurance company, Munich Re's ERGO Fund, Kinnevik, Vostok New Ventures and DeepMind co-founder Demis Hassabis, to name a few – plenty of capital, in other words, to help fund the marketing.
"You came up with a story," said Watkins of Babylon's message to the Royal Society. "The debate was not particularly educational or constructive. And I'm just saying that because Babylon came with a story and they wanted to stick to it. The story should avoid any discussion of security concerns or the fact that there were problems, and simply consider them as describe safely. "
The clinician's counter-message to the event posed a question that EU policymakers are only just starting to grapple with: he called on the AI maker to show the evidence that backs up its safety claims.