
Bio-IT World Review: BIG Data; BIG Promise; BIG Challenges

Earlier this week, I had the privilege of attending the tenth annual “Bio-IT World Conference and Expo,” at which some 2,500 information technology professionals participated in a 12-track program featuring more than 200 presentations on scientific and technological developments.

From keynote speakers Jill Mesirov, PhD, and Martin Leach, PhD, respectively the Associate Director and Chief Information Officer of the Harvard-MIT Broad Institute, I learned that exponential increases in computing power promise to bring personalized medicine, which allows highly individualized diagnosis and treatment, to doctors' offices within ten years. I also learned how hard it is to keep track of the petabytes (a petabyte, or PB, is a unit of information equal to one quadrillion bytes, or 1,000 terabytes) used to keep it all going.
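For readers who want to check the petabyte arithmetic: the figure is easy to misstate because decimal (SI) prefixes and binary prefixes give slightly different answers. The short Python sketch below is only an illustration; the constant and function names are my own and were not part of either keynote.

```python
# Decimal (SI) versus binary prefixes for large data sizes.
# All names here are illustrative assumptions, not from the conference talks.

BYTES_PER_TB = 10**12    # terabyte (SI): one trillion bytes
BYTES_PER_PB = 10**15    # petabyte (SI): one quadrillion bytes
BYTES_PER_TIB = 2**40    # tebibyte (binary)
BYTES_PER_PIB = 2**50    # pebibyte (binary)


def terabytes_per_petabyte() -> int:
    """Number of SI terabytes in one SI petabyte."""
    return BYTES_PER_PB // BYTES_PER_TB


def tebibytes_per_pebibyte() -> int:
    """Number of binary tebibytes in one binary pebibyte."""
    return BYTES_PER_PIB // BYTES_PER_TIB


def pebibytes_to_petabytes(pib: float) -> float:
    """Convert a size in pebibytes (binary) to petabytes (decimal)."""
    return pib * BYTES_PER_PIB / BYTES_PER_PB


if __name__ == "__main__":
    print(terabytes_per_petabyte())       # SI ratio
    print(tebibytes_per_pebibyte())       # binary ratio
    print(pebibytes_to_petabytes(1.0))    # one pebibyte expressed in petabytes
```

In other words, "one quadrillion bytes" goes with 1,000 terabytes, while the 1,024 figure belongs to the binary units (one pebibyte is 1,024 tebibytes, or about 1.126 quadrillion bytes).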

Mesirov announced the upcoming launch of “Genome Space,” a new Web-based technology to help scientists make sense of and collaborate in using such data.

And in a talk entitled “BIG,” Leach described the difficulty of defining “big data,” because the amount of available information is growing so rapidly. He described an event held recently at the Broad to celebrate the Institute’s ability to store and analyze ten petabytes of data, his glee soon tempered by his recollection that in 1993, NIH’s Institute of Medicine was thrilled with its ability to store 16 gigabytes, which anyone can now do on a cell phone.

Today, Leach said, we are seeing “increasing big data with a decreasing footprint” [that is, smaller systems needed for gathering and retrieval].

Mentioning that he has an autistic son and would like to be able to figure out what causes the disorder, Leach asked, “Why is there no Google search for data, no way to access thousands of data repositories?”

“We need a new application ecosystem and a breed of data scientist who knows how and where to push this data,” he said. He predicted that there will soon be 50,000 jobs in the “big data” arena.

In the exhibit hall, I was pleased to see that Wingu, headquartered in the Cambridge Innovation Center, where I work, had been nominated for a Best of Show award for its pharmaceutical, contract research and academic collaboration software.

The winners, announced last night, were Recentris, Opscode, Clear Trial, and Cambridge Semantics. [More info at http://www.bio-itworld.com/2012/04/26/2012-best-of-show-winners.html]. Best Practices Grand Prizes went to three big pharma companies, Merck, Pfizer, and Merck KGaA (Germany), and to two genomics organizations, BGI Shenzhen and the University of Utah/Omicia. [More info at http://www.bio-itworld.com/2012/04/25/bio-it-world-announces-winners-2012-best-practices-awards.html].

Bio-IT World is sponsored by Insight Pharma Reports, Samsung, and the Portland Group. It runs through April 27, 2012.

—Anita M. Harris

 




Broad Institute Launches Collaborative Genomics “Cloud” Tool for Scientists

In an effort to harness and allow sharing of exponentially growing genetic data, the Broad Institute will launch “Genome Space,” a cooperative Web-based tool aimed at “frictionless” data transfer, later this week.

So said Jill Mesirov, PhD, the Broad’s Associate Director and Chief Informatics Officer, in a keynote speech at the opening of the Bio-IT World conference yesterday in Boston. The Broad is a Harvard-MIT research center located in Kendall Square, Cambridge.

In her talk, Mesirov pointed out that just ten years ago, scientists announced that they had identified all of the genes present in human beings. Since then, researchers have discovered 30 million genetic variations among 1,000 different individuals, 3,000 genetically related disease traits, and a multitude of cancer types. Such findings are now being used to determine the genetic bases of many diseases, to develop treatments for those diseases, and to determine for which patients particular treatments are likely to be effective. In another ten years, she said, such “personalized medicine” will be commonly used by doctors in clinics.

These advances are due in large part to less expensive, increasingly sophisticated and sensitive computer technologies that have led to an “explosion” of data, to less “noisy” data, and to new, international ways of reviewing the data, Mesirov explained. Scientists can now buy the technology and carry out sequencing in their own labs, and “computing is now integral to every aspect of biomedical research.”

But these developments also mean that there are now seven to ten thousand bioinformatics tools available for download on the Web and five thousand databases, many of which are “out of reach” for scientists who do not have sophisticated programming skills.

The new tool “bridges the gaps between bioinformatics tools, making it possible for [scientists] to move data smoothly between these tools, leveraging the available analyses and visualizations in each of these tools,” according to the Genome Space Web site.

Genome Space also allows for data storage in the Amazon cloud [a computing platform of Amazon.com] and “provides necessary file format transformations whenever a scientist selects an analysis or visualization within one of the tools.”

The GenomeSpace project is a collaboration of the Mesirov and Regev laboratories at the Broad Institute; the Chang laboratory at Stanford University; the Ideker laboratory at the University of California, San Diego; the Nekrutenko laboratory at Pennsylvania State University; the Segal laboratory at the Weizmann Institute of Science; and the Haussler and Kent laboratories at the University of California, Santa Cruz. GenomeSpace is funded by the National Human Genome Research Institute, with additional support from Amazon Web Services, according to the Genome Space Web site.

The Bio-IT World Conference and Expo 2012 runs through April 26.

–Anita M. Harris