How Cloud Computing & Big Data Could Help Cure Cancer
Tuesday, September 3, 2013
Computer clouds have been credited with making the workplace more efficient, giving consumers anytime-anywhere access to emails, photos, documents, and music, and helping companies crunch through masses of data to gain business intelligence.
Now it looks like the cloud might help cure cancer too.
The National Cancer Institute plans to sponsor three pilot computer clouds filled with genomic cancer information that researchers across the country will be able to access remotely and mine for information.
The program is based on a simple revelation, George Komatsoulis, interim director and chief information officer of the National Cancer Institute’s Center for Biomedical Informatics and Information Technology, told Nextgov. It turns out the gross physiological characteristics we typically use to describe cancer — a tumor’s size and its location in the body — often say less about the disease’s true character and the best course of treatment than genomic data buried deep in cancer’s DNA.
That’s sort of like saying you’re probably more similar to your cousin than to your neighbor, even though you live in New York and your cousin lives in New Delhi. It means treatments designed for one cancer site might be useful for certain tumors at a different site, but, in most cases, we don’t know enough about those tumors’ genetic similarities yet to make that call.
The largest barrier to gaining that information isn’t medical but technical, said Komatsoulis, who’s leading the cancer institute’s cloud initiative. The National Cancer Institute is part of the National Institutes of Health.
The largest source of data about cancer genetics, the cancer institute’s Cancer Genome Atlas, contains half a petabyte of information now, he said, or the equivalent of about 5 billion pages of text. Only a handful of research institutions can afford to store that amount of information on their servers, let alone manipulate and analyze it.
By 2014, officials expect the atlas to contain 2.5 petabytes of genomic data drawn from 11,000 patients. Just storing and securing that information would cost an institution $2 million per year, presuming the researchers already had enough storage space to fit it in, Komatsoulis told a meeting of the institute’s board of advisers in June.
To download all that data at 10 gigabits per second would take 23 days, he said. If five or 10 institutions wanted to share the data, download speeds would be even slower; it could take longer than six months to share all the information.
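Those figures hold up to a back-of-the-envelope check. The sketch below assumes a single, fully saturated 10 gigabit-per-second link with no protocol overhead, which is the most generous case; real-world transfers would only be slower.

```python
# Back-of-the-envelope transfer time for the atlas figures above.
# Assumes one fully saturated 10 Gb/s link and no protocol overhead.
DATA_BYTES = 2.5e15           # 2.5 petabytes of genomic data
LINK_BITS_PER_SEC = 10e9      # 10 gigabits per second

seconds = DATA_BYTES * 8 / LINK_BITS_PER_SEC
days = seconds / 86_400

print(f"One full copy: about {days:.0f} days")                    # ~23 days
print(f"Ten copies, one after another: about {days * 10 / 30:.1f} months")
```

Run one after another, ten such downloads would tie up the link for the better part of eight months, which lines up with the six-plus-month figure Komatsoulis cites.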
That’s where computer clouds — the massive banks of computer servers that can pack information more tightly than most conventional data centers and make it available remotely over the Internet — come in. If the genomic information contained inside the atlas could be stored inside a cloud, he said, researchers across the world would be able to access and study it from the comfort of their offices. That would provide significant cost savings for researchers. More importantly, he said, it would democratize cancer genomics.
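To make the contrast concrete, here is a minimal sketch of what working where the data lives can look like, assuming the atlas were kept in an S3-style object store. The bucket and file names are hypothetical placeholders, not real National Cancer Institute resources; the point is that a researcher pulls only the slice of data a given analysis needs instead of copying the entire archive first.

```python
# Hypothetical sketch: read a 1 MB slice of one cloud-hosted file rather
# than downloading the whole dataset. Bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")
response = s3.get_object(
    Bucket="example-cancer-genome-atlas",    # hypothetical bucket
    Key="tumor-samples/sample-0001.vcf.gz",  # hypothetical object key
    Range="bytes=0-1048575",                 # first megabyte only
)
chunk = response["Body"].read()
print(f"Fetched {len(chunk):,} bytes without copying the full archive")
```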
“As one reviewer from our board of scientific advisers put it, this means a smart graduate student someplace will be able to develop some new, interesting analytic software to mine this information and they’ll be able to do it in a reasonable timeframe,” Komatsoulis said, “and without requiring millions of dollars of investment in commodity information technology.”
It’s not clear where all this genomic information will ultimately end up. If one or more of the pilots prove successful, a private sector cloud vendor may be interested in storing the information and making it available to researchers on a fee-for-service basis, Komatsoulis said. This is essentially what Amazon has done for basic genetic information captured by the international Thousand Genomes Project.
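That Amazon arrangement is already something researchers can try for themselves. The sketch below lists a few files from the publicly hosted 1000 Genomes data on Amazon S3; it assumes the public bucket is still named 1000genomes, lives in the us-east-1 region, and allows anonymous reads, any of which could change over time.

```python
# Sketch: browse the publicly hosted 1000 Genomes data on Amazon S3.
# Assumes the bucket is named "1000genomes", sits in us-east-1, and
# permits anonymous (unsigned) access; hosting details may change.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client(
    "s3",
    region_name="us-east-1",                    # assumed region
    config=Config(signature_version=UNSIGNED),  # no AWS account needed
)

page = s3.list_objects_v2(Bucket="1000genomes", MaxKeys=5)
for obj in page.get("Contents", []):
    print(obj["Key"], obj["Size"])
```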
A private sector cloud provider will have to be convinced that there’s a substantial enough market for genomic cancer information to make storing the data worth its while, Komatsoulis said. The vendor will also have to adhere to rigorous privacy standards, he said, because all the genomic data was donated by patients who were promised confidentiality.
One or more genomic cancer clouds may also be managed by university consortiums, he said, and it’s possible the government may have an ongoing role.
The cancer institute is seeking public input on the cloud through the crowdsourcing website Ideascale. The University of Chicago has already launched a cancer cloud to store some of that information. It’s not clear yet whether the university will apply to be one of the institute’s pilot clouds.
Because the types of data and the tools used to mine it differ so greatly, it’s likely there will have to be at least two cancer clouds after the pilot phase is complete, Komatsoulis said. As genomic research into other diseases progresses, it’s possible that information could be integrated into the cancer clouds as well, he said.
“Cancer research is on the bleeding edge of really large-scale data generation,” he said. “So, as a practical matter, cancer researchers happen to be the first group to hit the point where we need to change the paradigm by which we do computational analysis on this data ... But much of the data that I think we’re going to incorporate will be the same or similar as in other diseases.”
As scientists’ ability to sequence and understand genes improves, genome sequencing may one day become part of standard care for patients diagnosed with cancer, heart problems and other diseases with a genetic component, Komatsoulis said.
“As we learn more about the molecular basis of diseases, there’s every reason to believe that in the future if you present with cancer, the tumor will be sequenced and compared against known mutations and that will drive your physician’s treatment decisions,” he explained. “This is a very forward-looking model but, at some level, the purpose of things like The Cancer Genome Atlas is to develop a knowledge base so that kind of a future is possible.”
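Stripped of the science, the comparison step Komatsoulis describes is a lookup: does the tumor carry a mutation the knowledge base already knows something about? The toy sketch below illustrates only that shape; the gene names, mutations, and therapy labels are placeholders, not medical guidance.

```python
# Toy illustration of "sequence the tumor, compare against known mutations."
# All gene, mutation, and therapy names are placeholders, not real findings.
KNOWN_ACTIONABLE = {
    ("GENE_A", "p.V600E"): "therapy-1",   # hypothetical knowledge-base entry
    ("GENE_B", "p.T790M"): "therapy-2",   # hypothetical knowledge-base entry
}

tumor_variants = [("GENE_A", "p.V600E"), ("GENE_C", "p.R132H")]

for gene, mutation in tumor_variants:
    therapy = KNOWN_ACTIONABLE.get((gene, mutation))
    if therapy:
        print(f"{gene} {mutation}: known mutation, candidate option {therapy}")
    else:
        print(f"{gene} {mutation}: no match in the knowledge base")
```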