DeepMind’s new AI can spot breast cancer just as well as your doctor

PA Images / Kristan Lieb/Chicago Tribune/TNS/Sipa USA

One in eight women will be diagnosed with breast cancer during their lifetime. In an effort to help with quicker detection, researchers have trained a deep-learning algorithm to spot breast cancer in screening scans as accurately as, or better than, a radiologist.

While still at an early stage, the research could eventually help reduce incorrect results in the US and help alleviate the shortage of radiologists in the UK. As early detection is key to treatment, women over the age of 50 are tested in the US and UK even if they don’t show signs of the disease. False negatives, when cancer is present but not spotted, can prove deadly, while false positives can be distressing.

Google-owned DeepMind has already worked with NHS organisations to develop AI to read eye scans and to spot head and neck cancer. Over the past two years, researchers from the Cancer Research UK Imperial Centre, Northwestern University, Royal Surrey County Hospital, and Google Health have used a deep-learning system developed by DeepMind on two different datasets of breast scans, one from the US and one from the UK, suggesting the AI could help read mammograms accurately.

“This is another step along the way of trying to answer some of the questions that will be critical for us to actually deploy this in the real world,” says Dominic King, director and UK lead of Google Health. “This is another step closer to trying to deploy this type of technology safely and effectively.”

The system was first trained on de-identified mammograms from 76,000 British women, using Cancer Research UK’s OPTIMAM dataset, as well as 15,000 scans from the US. Once trained, the algorithm was tested on 25,000 scans in the UK, and a further 3,000 in the US. Four images from each mammogram were fed into a neural network, which produced a score between zero and one from each of three different models, with a score close to one indicating a high risk of cancer.
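The scoring step described above can be illustrated with a toy sketch. The function names, the averaging rule, and the threshold below are assumptions for illustration only, not DeepMind’s actual method: each model emits a score between zero and one, and the scores are combined into a single risk estimate.

```python
# Toy illustration of ensemble risk scoring (assumed structure, not
# DeepMind's actual code): three models each emit a score in [0, 1],
# and the mean is taken as the final cancer-risk estimate.

def combine_scores(model_scores):
    """Average per-model scores into one risk score in [0, 1]."""
    if not model_scores:
        raise ValueError("need at least one model score")
    for s in model_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return sum(model_scores) / len(model_scores)

def flag_for_review(risk, threshold=0.5):
    """Scores near one indicate high risk; flag those above a threshold."""
    return risk >= threshold

scores = [0.82, 0.74, 0.91]   # hypothetical outputs from three models
risk = combine_scores(scores)
print(round(risk, 2), flag_for_review(risk))  # 0.82 True
```

In practice the combination rule and threshold would be tuned on held-out data; the mean and 0.5 cut-off here are placeholders.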

The AI’s conclusions were then compared against subsequent real-world outcomes, as well as what radiologists said at the time, says Christopher Kelly, clinician scientist at Google Health, and co-author of the research, which has been published in the journal Nature. “Our ground truth was based on biopsy results and follow-ups, so if they had a normal screen two or three years later,” he explains.

The research says the AI model could predict breast cancer with the same level of accuracy as a single expert radiologist. Compared to human experts, the system reduced false positives by 5.7 per cent in the US and 1.2 per cent in the UK, and false negatives by 9.4 per cent in the US and 2.7 per cent in the UK.

However, those results don’t necessarily reflect how such scans are read in real life. In the US, breast scans are normally checked by a single radiologist, while in the NHS mammograms are checked by a minimum of two radiologists. If those two “readers” disagree on the result, the scan is checked by a third and potentially even a fourth.

The study claims the DeepMind algorithm performs better than a single radiologist, and is “non-inferior” compared with two. “The model performs better than an individual radiologist in both the UK and the US,” Kelly says. “In the UK we have this double reading system, where two radiologists or maybe three or four look at each scan… we’re statistically the same as that, but not better than that.”

However, the Royal College of Radiologists says its workforce modelling research suggests the UK is short of at least 1,104 radiologists; there are currently 542 expert breast radiologists in the UK, but eight per cent of hospital posts for such roles are unfilled.

If the role of the second reader could be partially replaced by AI, that could alleviate some of the staff shortages, notes King. Indeed, he says radiologists asked Google Health to look into AI for screening scans for just this reason. “We had a group of senior breast radiologists in the UK contact us three or four years ago to say that this was an area they felt was amenable to artificial intelligence but also it was critical to start thinking about how technology could start supporting the sustainability of the service, because currently there can be very lengthy delays,” he says.

To test that idea, the researchers ran a side project, simulating how the algorithm could work with a human radiologist. The AI and the human radiologists agreed 88 per cent of the time, meaning only 12 per cent of scans would then have to be read by another radiologist. However, the reader study was run with a more limited dataset and only six radiologists, all of whom were US trained and only two of whom had fellowship-level training in breast imaging. “We’d like to test this with further work, but as a kind of simulation, it was quite exciting to see this as a suggestion towards a potential system in the future,” says Kelly.
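That simulated workflow can be sketched roughly as follows. The function names and the agreement rule are illustrative assumptions, not the study’s actual protocol: the AI stands in as the second reader, and another human is called in only when the first reader and the AI disagree.

```python
# Illustrative simulation of AI-assisted double reading (assumed
# workflow, not the study's protocol): the AI replaces the second
# reader, and a human arbiter is consulted only on disagreement.

def double_read(first_reader_flag, ai_flag, arbiter):
    """Return (decision, human_reads_used) for one mammogram."""
    if first_reader_flag == ai_flag:
        return first_reader_flag, 1      # consensus: one human read suffices
    return arbiter(), 2                  # disagreement: escalate to a human

# Hypothetical batch: (first reader's call, AI's call) per scan.
cases = [(True, True), (False, False), (True, False)]
arbiter = lambda: True                   # stand-in human arbiter
results = [double_read(h, a, arbiter) for h, a in cases]
extra_reads = sum(1 for _, reads in results if reads == 2)
print(extra_reads)  # 1 of 3 cases needed a second human reader
```

Under the study’s reported 88 per cent agreement rate, this arrangement would send roughly 12 per cent of scans to a second human reader.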

Regardless of the success of such research, radiologists can’t be fully replaced by AI – but they could be assisted, stresses Caroline Rubin, vice president for Clinical Radiology at the Royal College of Radiologists, who was not involved in the research. “Like the rest of the health service, breast imaging – and UK radiology more widely – is under-staffed and desperate for help,” she says. “AI programmes will not solve the human staffing crisis – as radiologists and imaging teams do far more than just look at scans – but they will undoubtedly help by acting as a second pair of eyes and a safety net.”

Alongside the accuracy checks and the reader simulation, the researchers also examined whether the system could be generalised – that is, trained on a single dataset and used everywhere. To test this, they ran the algorithm trained on UK data on the American scans. The results were not as good as those from the system trained on US data – a 3.5 per cent reduction in false positives, versus 5.7 per cent using local data – but they were still positive, suggesting some generalisation may be possible, though training on a localised dataset remains preferable.

While the results are positive, the researchers stress the work remains in the early stages. The Google researchers say they would like to see more research done not on retrospective, historical data, but with current patients. “Prospective studies are the only way you find out how these things perform in the real world,” Kelly says, in particular how clinicians would interact with the system. Plans for such a project are in the works. “That’s a different programme of research that we’re now excited to be exploring,” King says.

The RCR’s Rubin agrees, calling for rigorous testing and careful regulation. “The next step for promising [breast screening AI] products is for them to be used in clinical trials, evaluated in practice and used on patients screened in real-time, a process that will need to be overseen by the UK public health agencies that have overall responsibility for the breast screening programmes,” she says.

Thanks in part to the Royal Free debacle, concerns around data privacy have stalked Google’s latest foray into medical research. The researchers stressed that the mammograms were de-identified, adding that the algorithm looked only at the scans and no other patient information. The UK data was sourced from a set collected specifically for research by Cancer Research UK.

Another concern with AI development is the spectre of bias, and the paper says its checks found none. That suggests the algorithm should work equally well regardless of the specific details of the individual being scanned (it is, if anything, better at spotting invasive cancers, which is a plus, as humans find those harder to catch). To check, Kelly says the team looked at metadata associated with each image to ensure the AI wasn’t “underperforming” on minority subgroups, adding that more in-depth analysis can and should be done to rule out bias.
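A minimal version of that kind of subgroup audit can be sketched as follows. The record fields, grouping key, and margin are assumptions for illustration, not the team’s actual analysis: compute sensitivity separately for each metadata subgroup and flag any group that lags the overall figure.

```python
# Toy subgroup audit (assumed fields and margin, purely illustrative):
# compute per-group sensitivity and flag groups that underperform the
# overall rate by more than a chosen margin.
from collections import defaultdict

def sensitivity(records):
    """Fraction of cancer-positive cases the model flagged."""
    positives = [r for r in records if r["has_cancer"]]
    if not positives:
        return None
    return sum(r["model_flagged"] for r in positives) / len(positives)

def audit_subgroups(records, group_key, margin=0.05):
    """Return {group: sensitivity} for groups lagging the overall rate."""
    overall = sensitivity(records)
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r)
    flagged = {}
    for g, rs in groups.items():
        s = sensitivity(rs)
        if s is not None and s < overall - margin:
            flagged[g] = s
    return flagged

records = [
    {"group": "A", "has_cancer": True, "model_flagged": True},
    {"group": "A", "has_cancer": True, "model_flagged": True},
    {"group": "B", "has_cancer": True, "model_flagged": False},
    {"group": "B", "has_cancer": True, "model_flagged": True},
]
print(audit_subgroups(records, "group"))  # {'B': 0.5}
```

A real audit would also need confidence intervals, since small subgroups make point estimates like these unreliable.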

Andrew Holding, senior research associate at Cancer Research UK’s Cambridge Institute, who was not involved with the research, says the best way to avoid such bias is training with a diverse data set. “In this study data from the US and the UK was used in recognition of these challenges, nonetheless, we’re still a long way from representing the full diversity of people who present in the clinic,” he says.

“A clinician would rapidly adapt to something as simple as skin pigmentation by drawing on their wider life experiences, but an AI having never seen it would diagnose in an unpredictable manner. Similar problems could occur because one hospital uses a slightly older piece of equipment to take the mammogram, and that might lead to different patient outcomes. These problems aren’t unsolvable, but they do present a huge challenge.”

One bias to be considered in future research is the manufacturer of the scanning machines. The study happened to use scans predominantly produced by Hologic machines, and future work should ensure the algorithm works as well with other scanners.

Holding raises one concern with the paper: the code behind the algorithm has not been released. “The code used for training the models has a large number of dependencies on internal tooling, infrastructure and hardware, and its release is therefore not feasible,” the paper notes, saying it is described in enough detail in the supplementary materials to be replicated with non-proprietary libraries.

“The work presents a fantastic effort, but it’s a shame that the authors have decided only to include instructions on how the AI was built and not provide the source code,” Holding says, pointing to a campaign for reproducible science. “Including source code is vital for increasing the impact of the research. It allows other scientists to build on the work rather than having to start again from scratch and provides for a better understanding of how the results were obtained. It also helps the researchers who do the work. By working reproducibly, you avoid costly mistakes and help your own research group build on your results.”

Beyond research itself, Holding argues researchers owe it to patients whose data they use to release such information freely. “If patients are generous enough to consent to their data to be used by companies like Google for research purposes, ideally the results and methods generated from the data should be available to them for free,” he says. “The research simply isn’t possible without that consent.”

Google Health’s Kelly admits that black-box algorithms are less useful in clinical settings, as physicians find it helpful to see their workings. This particular system is made up of three different models, which are combined into a single score. One of those, the local model, is perhaps the most important to clinicians as it highlights areas of concern. “When you look at workflows, localisation is actually really important to a radiologist… they look at the images and draw on the area that is suspicious,” says Kelly.

“When that goes to a consensus process, they compare all these diagrams.” That means that while the “global model”, which includes the local and the other data, has the most accurate cancer predictions, it may not be as useful to human radiologists. “Although it might perform the best, it’s a black box.” That said, Holding adds, we don’t always know how radiologists or physicians make their decisions, either. “While it’s in my nature to be sceptical of ‘black-box’ software, there is the counterpoint that we don’t really know how each individual clinician pulls together years of experience to make a decision about a patient.”

Work remains on this particular project as well as in the wider field of AI to read medical images, but such progress could not only cut costs but also improve care, Holding says. “These studies provide the path to a second digital pair of eyes that are never tired and see every patient,” he says. “By then using these AIs to catch potential mistakes, we can avoid concerns of putting all our faith in software, and still apply the technology to give better patient outcomes. And that is really exciting.”
