ASU researcher shifts big data computing into high gear
Every second, approximately 6,000 tweets are posted on Twitter. Every minute, 360,000 tweets. Every hour, 22 million tweets. Every day, more than 500 million tweets. That’s a significant amount of data — and it represents only one social media platform out of hundreds.
Social media offers an enormous volume of unstructured data that, once analyzed, can yield knowledge and support better decisions on a larger scale. Humans are clearly efficient data generators, but computers are having a difficult time processing and analyzing the sheer volume of data.
Arizona State University Associate Professor Ming Zhao has taken the driver's seat in developing the Energy-Efficient Big Data Research System, called GEARS. This new computing infrastructure was created by a consortium of interdisciplinary researchers who are turning the noise of social media into useful data sources. Those sources can improve machine learning and help detect, in real time, security threats or important but hidden incidents such as disease outbreaks or crimes.
But GEARS' functionality isn't limited to social media. The team is ready to take on the increasingly challenging and diverse big data applications of today's world, which is full of sensors and internet of things devices generating data that ranges from brain signals to activity in deep space.
“Scientific discoveries are driven by data, not just experimentation anymore,” says Zhao, a faculty member in the ASU Ira A. Fulton Schools of Engineering. “But how do we make use of that data?”
To support new discoveries through data, researchers need new, higher-performance systems. However, power consumption has become a limiting factor for big data systems, so improvements in energy efficiency are also important.
Zhao's efforts to solve both the performance and energy-efficiency challenges of big data technologies, in a project titled "GEARS – An Infrastructure for Energy-Efficient Big Data Research on Heterogeneous and Dynamic Data," are funded by the National Science Foundation through a three-year, $750,000 grant.
More diverse hardware for more diverse big data tasks
Zhao aims to meet these performance and efficiency goals through heterogeneous computing, which combines multiple processor and storage types. Though the term isn't widely known, heterogeneous computing is already found in many everyday devices.
One example of heterogeneous computing is the iPhone X's processor, which has two cores optimized for performance and four cores optimized for power efficiency. These general-purpose cores can run a variety of apps and operating system duties. The processor also features a dual-core neural engine that is specialized for machine learning tasks and powers Face ID, a face-recognition application that requires significant computing horsepower to run quickly.
A general-purpose processor core has a one-size-fits-most design that is good enough for simple applications. The neural engine's circuitry, by contrast, is specially designed to handle the complex computation involved in recognizing a user's face.
Heterogeneous computing on the storage side can also be seen in many computers, which often include both hard disk drives (HDDs) for inexpensive, high-capacity storage and solid-state drives (SSDs) for fast access.
GEARS is taking a similar approach but at a much larger scale. The system goes beyond general-purpose computer processors (central processing units) by incorporating accelerators: graphics processing units and field-programmable gate arrays, or FPGAs. It also integrates a deep hierarchy of storage tiers: dynamic random-access memory, or DRAM; non-volatile memory, or NVM; HDDs; and various SSD technologies. Each of these has its own advantages and disadvantages, depending on a given application's characteristics.
The inclusion of heterogeneous hardware such as FPGAs and NVM allows GEARS to tackle tough big data problems, such as problems that cannot easily be parallelized or that are sensitive to delays. In many cases, these less traditional hardware designs also consume less power, contributing to the energy-efficiency goals of GEARS.
Easy-to-use software for heavy-duty hardware
GEARS incorporates software components that help optimize the use of the various processor and accelerator types and storage resources.
"It's easy to buy the heterogeneous hardware [components] and put them together, but it's up to the software system to make good use of the devices," says Zhao. He is director of the Research Laboratory for Virtualized Infrastructures, Systems and Applications, which started the development of the underlying technology for GEARS.
While some researchers on the GEARS research team are focusing on developing the hardware and software infrastructure, others are developing new algorithms to make efficient use of the infrastructure and to make it user-friendly for other data scientists.
“Usability is important, so we want to make it really easy for users to develop applications for the heterogeneous hardware of GEARS,” Zhao says.
One way GEARS researchers are achieving this is by developing extensions to popular data analytics platforms such as Apache Spark. Data scientists can develop an application with Spark as they normally would, then run it on the high-performance, energy-efficient GEARS infrastructure.
The team is also extending widely used machine learning platforms such as TensorFlow, so that researchers can conveniently deploy their algorithms on GEARS and benefit from its heterogeneous computing power.
GEARS wants your big data challenges
Now that they have the system in place, the GEARS team is eager to take on diverse big data challenges beyond the realm of computer science.
“Essentially, anything that requires big data could potentially benefit from GEARS,” Zhao says.
So far, GEARS has helped with several interdisciplinary projects in collaboration with researchers at ASU, other universities and companies across the country, and even around the globe. These include projects related to neuroscience, sustainability, medicine, aerospace, botany and geography.
For example, Assistant Professor Fengbo Ren, a co-principal investigator of the GEARS project, is helping researchers from ASU's School of Geographical Sciences and Urban Planning use GEARS to develop a deep learning system. The system automatically classifies terrain features using an unstructured big data source of remote sensor data and photographs. Outside the university, researchers from Phoenix Children's Hospital are working with another GEARS co-PI, Professor K. Selçuk Candan, to develop deep phenotyping for physiologic biomarkers of post-traumatic epilepsy in children.
Zhao says the team is happy to support anyone at ASU by hosting their big data applications on the GEARS hardware in the ASU Research Computing high-performance computing data center. For collaborators outside ASU, the team is glad to share the GEARS technology and open source software.
Zhao's team plans to transfer this new technology beyond ASU to benefit a wider community of data scientists. The team will also engage with industry partners through ASU's Center for Assured and SCAlable Data Engineering, an Industry-University Cooperative Research Center.
“People can learn from our technologies and our lessons and experiences we got from building GEARS to make their own version of GEARS,” Zhao says.
Meet the GEARS research team
Associate Professor Ming Zhao’s research team is building a system of hardware, software and applications research projects to develop a higher-performing and more energy-efficient big data analysis platform: GEARS. Each level builds upon the work of the previous level of research, beginning with the underlying hardware infrastructure built through the overall project’s National Science Foundation grant funding.
Hardware (data-intensive systems)
Building on the basic hardware infrastructure, Zhao and Assistant Professor Fengbo Ren are creating processing and storage systems that work together. These systems deliver the performance that big data challenges demand while remaining energy efficient, drawing on the strengths of each type of hardware at their disposal.
Software (data management and visualization)
To optimize the use of the hardware systems Ren and Zhao are developing, Professor K. Selçuk Candan and Associate Professor Ross Maciejewski are developing software and algorithms that manage data and let data scientists visualize GEARS' output.
Applications (data sciences)
Professor Huan Liu, Associate Professor Jingrui He, Associate Professor Hanghang Tong, Associate Professor Hasan Davulcu and Professor Baoxin Li are using the GEARS framework to solve big data challenges in understanding social media, networks of people and web-scale images and videos.