All posts in “Data”

Your smartphone could help power future cancer cures

In the field of potentially life-saving cancer research, data is more than just a buzzy term deployed by marketers — it’s a fundamental part of the search for answers.

Computing power, says Dr Warren Kaplan, the Chief of Informatics at the Garvan Institute of Medical Research, is quickly emerging as a precious resource in the quest to tackle cancer and other complex diseases.

DreamLab, a mobile app and initiative dreamed up by The Vodafone Foundation Australia, is just one example of how data can make a difference. Instead of fundraising in the most literal sense, the app collects a different type of donation: your data.

Below are a few eye-opening facts about data’s role in cancer research and how DreamLab is making an impact.

The amount of data associated with cancer research is staggering

To grasp the sheer amount of data involved in cancer research such as the work being done at the Garvan Institute, it helps to think in familiar terms. For example, according to Kaplan, sequencing one person’s genome — the three billion base pairs (or DNA letters) that act as the instruction manual for our body — requires roughly 500 gigabytes of data. That’s equivalent to about half a million minutes of streaming music.

Sequencing one person’s genome requires roughly 500 gigabytes of data.

Multiply that figure by many thousands — the number of individuals whose genomes must be analysed to gain meaningful insights into cancer — and you get a sense of the data processing power it takes to begin making a dent.
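The streaming-music comparison holds up to a back-of-envelope check. Assuming a typical 128 kbps audio stream (an assumption, not a figure from Kaplan), 500 gigabytes works out to roughly half a million minutes:

```python
# Back-of-envelope check of the "half a million minutes of music" claim.
# The 128 kbps streaming bitrate is an assumed typical value.
GENOME_BYTES = 500 * 10**9                # ~500 GB per sequenced genome
STREAM_BYTES_PER_MIN = 128_000 / 8 * 60   # 128 kbps -> bytes per minute

minutes = GENOME_BYTES / STREAM_BYTES_PER_MIN
print(f"{minutes:,.0f} minutes")          # roughly half a million minutes
```

At higher streaming bitrates the figure shrinks proportionally, but stays in the hundreds of thousands of minutes.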

“Increasingly, we researchers are depending on supercomputers to crunch immense amounts of data in order to learn more about cancer and other serious illnesses,” says Kaplan. “A choke point in this research has been the sheer quantities of computing power required. The more computing power that’s available, the faster genomes can be analysed and potential new treatments discovered.”

Donate data simply by charging your device  

Millions of us today are walking around with tiny, powerful computers inside our pockets. Now, we can put those devices to use for the greater good.

Supporting the research being conducted by Kaplan and his colleagues is as simple as downloading DreamLab and performing an action you already do dozens of times every week — plugging in your device.

DreamLab is simple to use: You download it, choose a cancer research project you’d like to support and then select how much data to donate. (The mobile data to use the app itself is free if you’re a customer of Vodafone Australia). Then, whenever you charge your phone, the app downloads small bits of information from the cloud about specific types of cancer.

Kaplan elaborates on the app’s process: “Using your phone’s computer processor, the app then compares these genetic profiles to identify similarities and differences between different cancers and sends the answer back to our team at the Garvan Institute.”
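The charge-time cycle Kaplan describes is a classic volunteer-computing pattern: fetch a small work unit, crunch it locally, send the answer back. A very loose sketch of that loop — with entirely hypothetical function and field names, not DreamLab’s real API:

```python
# Hypothetical sketch of a DreamLab-style volunteer-computing loop.
# None of these names come from the actual app.

def compare_profiles(profile_a, profile_b):
    """Toy comparison: count positions where two genetic profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)

def charging_loop(fetch_unit, upload_result, is_charging):
    """Crunch small downloaded work units only while the phone is plugged in."""
    while is_charging():
        unit = fetch_unit()                # small chunk pulled from the cloud
        if unit is None:                   # no more work available
            break
        answer = compare_profiles(unit["a"], unit["b"])
        upload_result(unit["id"], answer)  # result goes back to the researchers
```

The key design point is that each unit is tiny, so an individual phone contributes a sliver of work and the server aggregates results from millions of devices.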

“DreamLab provides dedicated, free access to what is essentially a smartphone supercomputer,” says Kaplan. “By harnessing this power, complex data can be crunched faster and research completed sooner — speeding up the chance of making discoveries to improve and save lives.”  

Download the DreamLab app now on iOS from the App Store or on Android from Google Play to help fight cancer.

Disclaimer: Downloading DreamLab uses data. DreamLab can be used when your device is charging and has mobile network or WiFi connectivity. Mobile data to use DreamLab is free for Vodafone Australia customers on the Vodafone Australia network. Roaming incurs international rates. 

Runners in the Shanghai marathon are getting a gorgeous 3D data souvenir

Runners in this past weekend’s Shanghai marathon are taking home a unique personalised souvenir of their run.

The 30,000 runners have been invited to feed their run data from the event into a platform that will produce a 3D visualisation of how they performed.

The colourful chart can be panned through 360 degrees on a WebGL-capable mobile phone, and reflects how fast each runner went over different stretches of the race.

The platform is the brainchild of ad agency Wieden+Kennedy Shanghai, who produced it for BMW. It will accept data from popular Chinese fitness apps CoDoon, JoyRun, and Rejoice. Here’s an example of what runners will get on their phones.

If you pan the graphic up, some additional stats will appear:

Image: BMW

BMW’s website says participants can expect their personalised charts next week, on Nov. 25.

Wieden+Kennedy just released this video explaining the project:

[embedded content]

This sure beats receiving a regular medal for participation.


Primer helps governments and corporations monitor and understand the world’s information


When Google was founded in 1998, its goal was to organize the world’s information. And for the most part, mission accomplished — but in 19 years the goalpost has moved forward and indexing and usefully presenting information isn’t enough. As machine learning matures, it’s becoming feasible for the first time to actually summarize and contextualize the world’s information. With two and a half years of R&D under its belt, Primer aims to do just that for governments, corporations and financial institutions.

Primer has collected a total of $14.7 million across seed and Series A investment rounds led by Data Collective. Lux Capital, Amplify Partners and In-Q-Tel, an investment firm supporting the CIA, have also provided capital to Primer. The team of 36 employees has been able to close initial customers including In-Q-Tel, Walmart and Singapore’s sovereign wealth fund.

Using a mixture of supervised and unsupervised machine learning models, Primer can ingest unstructured data and produce insights — think scouring the web for news related to a specific company and then organizing it into key themes. The general concept gives us flashbacks to the early days of Palantir. But Sean Gourley, founder and CEO of Primer, is quick to point out the difference between the two ambitious companies.
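Primer hasn’t disclosed its methods, but the “organize news into key themes” idea can be illustrated with a deliberately simple toy: represent each snippet as a bag of words and greedily group snippets whose vocabulary overlaps. This is an illustration of the concept, not Primer’s technique:

```python
# Toy theme grouping via bag-of-words cosine similarity.
# Not Primer's actual method; real systems use far richer models.
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a snippet."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def group_by_theme(snippets, threshold=0.3):
    """Greedy single pass: attach each snippet to the first cluster whose
    seed is similar enough, otherwise start a new cluster."""
    clusters = []
    for text in snippets:
        vec = vectorize(text)
        for cluster in clusters:
            if cosine(vec, cluster["seed"]) >= threshold:
                cluster["members"].append(text)
                break
        else:
            clusters.append({"seed": vec, "members": [text]})
    return [c["members"] for c in clusters]
```

Two headlines about the same product launch share vocabulary and land in one cluster; an unrelated finance story starts its own.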

“If you want to charge high amounts of money, you can solve valuable problems infrequently or you can solve problems everyone has to deal with on a daily basis and it feels like infrastructure,” Gourley asserted.

He’s of course pointing out the notion that much of Palantir’s business model has focused on higher cost consulting services to help corporations and governments meet incredibly ambitious goals. Much hullabaloo has been made over rumors that Palantir’s software played a role in tracking down bin Laden. But Primer is promising considerably less in hopes of an even higher payoff.

Primer’s software is being positioned to augment low-level analyst work that manifests itself most commonly in the intelligence community and at big banks. By monitoring large quantities of information semiautonomously, Primer can potentially speed up the process by which research is gathered and presented.

Because Primer is being targeted at the intelligence community, Gourley wasn’t able to go into specifics about its capabilities in a military context. However, I was able to watch a demo of Primer Science, a version of Primer adapted to help academics monitor the regular release of new papers.

The version I saw was able to identify key machine learning papers published on ArXiv and contextualize them alongside social media postings and news reports. The platform made it easy to identify work completed by key research groups and quickly locate papers focusing on specific sub-topics like machine translation.

Collecting information in a single place is really the only way to consider events from every angle. Unfortunately, piles of information can quickly overwhelm even the best human analysts and critical details can go unnoticed.

Gourley gave me the example of news coverage to make a similar point. Press in every country cover events differently, and it’s advantageous for someone tasked with gathering intelligence to look for disparities. The graphic below (and an interactive visualization here) shows contrasts in the coverage of terror events by U.S. and Russian media.

Alastair Dant, Primer

For less high-stakes corporate and financial users, Primer can classify common events like regulatory changes, product launches and M&A transactions. More long tail concerns are still flagged as interesting and human analysts can provide feedback, further training models to provide better insights the next time a similar event happens.

Training models can be a challenge for companies that operate in industries where data is very tightly controlled. Gourley explained to me that most customers are OK with using at least some of their data to improve models, but that things can get tricky when it comes to feedback on the significance of findings.

The hope is that Primer can some day assist in making predictions by looking for statistical correlations between events. Such an effort would undoubtedly require strong synergies between human analysts and Primer’s technology.

Featured Image: John Mannes, Philippe Intraligi, RedlineVector/Getty Images

Nearly 200 million voters exposed in GOP data leak, proving all political parties are susceptible to being hacked

Image: Shutterstock / Barbara Kalbfleisch

Data on registered U.S. voters dating back more than a decade has been exposed in what’s believed to be the largest leak of voter information in history.

A data analytics contractor hired by the Republican National Committee (RNC) left databases containing information about 198 million potential voters open to the public for download without a password, according to a ZDNet report.

The leak helps prove that any political party is susceptible to cybersecurity vulnerabilities, despite the GOP’s insistence that it ran a more secure 2016 presidential campaign than the rival Democratic National Committee (DNC).

The exposed databases belonged to the contractor Deep Root Analytics and contained about 25 terabytes of data on an Amazon S3 storage server that could be viewed without logging in. In theory, this means that anyone who knew where to look could have viewed, downloaded, and potentially used the information for malicious purposes.
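What makes such a misconfiguration so dangerous is how little it takes to exploit: listing a public S3 bucket is a plain unauthenticated HTTPS GET against the ListObjectsV2 endpoint. The bucket name below is a made-up placeholder, not Deep Root’s real bucket:

```python
# Why an unsecured S3 bucket is trivially readable: when public access is
# enabled, anyone can list its contents with an unauthenticated GET request.
# "example-voter-data" is a fictional placeholder bucket name.

def s3_listing_url(bucket, region="us-east-1"):
    """Virtual-hosted-style URL for S3's ListObjectsV2 REST call."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/?list-type=2"

print(s3_listing_url("example-voter-data"))
```

A public bucket answers such a request with HTTP 200 and an XML listing of its objects; a properly locked-down bucket returns 403 Access Denied.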

The RNC worked closely with Deep Root Analytics during the 2016 election and paid the company $983,000 between January 2015 and November 2016, according to an AdAge report.

The RNC’s remarkably bad security was first discovered by researcher Chris Vickery of the security firm UpGuard. The firm responsibly disclosed the vulnerability to the RNC, and the server was secured last week, before the news was made public today.

This vast exposure of voter information highlights the growing risk of data-driven campaigning used by both the DNC and RNC. The data in this case contained models of voters’ positions on different issues, including how likely it is that they voted for Obama in 2012 and whether they were likely to agree with Trump’s “America First” foreign policy talking point.

The leak has essentially exposed more than half of the U.S. population, trouncing the second-largest leak of voter information, the 2016 exposure of 93.4 million Mexican voters.

Perhaps the worst part about all of this is there’s very little voters can do to ensure their information is stored privately and securely. Mashable has reached out to the RNC and Deep Root Analytics for comment, and will update when we hear back.


IBM turns to artificial intelligence to solve poverty, hunger, and illiteracy

IBM is channeling its science and tech expertise into tackling some of the world’s biggest problems.

On Wednesday, the tech giant announced the launch of Science for Social Good, a new program that partners IBM researchers with postdoctoral academic fellows and nonprofits to take on societal issues through data.

With the new initiative, IBM announced 12 projects planned for 2017. Each Science for Social Good project aligns with one or more of the 17 Sustainable Development Goals, the United Nations’ blueprint to address some of the globe’s biggest inequalities and threats by the year 2030.

Science for Social Good covers issues like improving emergency aid and combating the opioid crisis, and the projects all use data science, analytics, and artificial intelligence to develop solutions.  

“The projects chosen for this year’s Social Good program cover predicting new diseases, alleviating illiteracy and hunger, and helping people out of poverty.”

One project, called Emergency Food Best Practice: The Digital Experience, plans to compile emergency food distribution best practices and share them with nonprofits through an interactive digital tool. IBM will partner with the nonprofit St. John’s Bread & Life to develop the tool based on the organization’s distribution model, which helps it serve more than 2,500 meals each day in New York City.

Another project, called Overcoming Illiteracy, will use AI to help low-literate adults “navigate the information-dense world with confidence.” The project aims to decode complex texts (such as product descriptions and manuals), extract the basic message, and present it to users through visuals and simple spoken messages. While this project won’t solve the global literacy crisis, it will allow low-literate adults to engage with text independently.

“The projects chosen for this year’s Social Good program cover an important range of topics — including predicting new diseases, promoting innovation, alleviating illiteracy and hunger, and helping people out of poverty,” Arvind Krishna, director of IBM Research, said in a statement. “What unifies them all is that, at the core, they necessitate major advances in science and technology. Armed with the expertise of our partners and drawing on a wealth of new data, tools and experiences, Science for Social Good can offer new solutions to the problems our society is facing.”

IBM hopes the initiative will build off the success of the company’s noted supercomputer, Watson, which has helped address health care, education, and environmental challenges since its development. 

Six pilot projects were conducted in 2016 in order to develop the Science for Social Good initiative. These projects covered a broad range of topics, such as health care, humanitarian relief, and global innovation. 

A particularly successful project used machine learning techniques to better understand the spread of the Zika virus. Using complex data, the team developed a predictive model that identified which primate species should be targeted for Zika virus surveillance and management. The results of the project are now leading new testing in the field to help prevent the spread of the disease.
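IBM hasn’t published the model’s internals here, but the general shape of such a surveillance-prioritisation model can be sketched as scoring species on ecological features and ranking them. The features, weights, and species below are all invented for illustration, not the project’s actual data:

```python
# Toy sketch of ranking species for virus surveillance by a feature score.
# Features, weights, and species names are all hypothetical illustrations,
# not IBM's actual Zika model or data.

def risk_score(features, weights):
    """Linear score: higher means higher surveillance priority."""
    return sum(weights[k] * features.get(k, 0.0) for k in weights)

WEIGHTS = {"urban_overlap": 0.5, "mosquito_exposure": 0.3, "known_host": 0.2}

SPECIES = {
    "species_a": {"urban_overlap": 0.9, "mosquito_exposure": 0.8, "known_host": 1.0},
    "species_b": {"urban_overlap": 0.2, "mosquito_exposure": 0.4, "known_host": 0.0},
}

ranked = sorted(SPECIES, key=lambda s: risk_score(SPECIES[s], WEIGHTS),
                reverse=True)
```

Real models learn such weightings from field data rather than hand-setting them, but the output is the same kind of priority ranking that guides where to test first.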

To learn more about current and past projects, visit the Science for Social Good website.
