Title: Smarter Storage for Big Data Analytics: Architecture and Systems
Speaker: Prof. Yu Hua
Time: May 15, 2016, 10:00
Venue: Academic Lecture Hall 301, Information Science Building
Host: Prof. Nan Guan
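Speaker Bio: Yu Hua is a professor and doctoral supervisor at Huazhong University of Science and Technology. He is a senior member of the IEEE and the China Computer Federation (CCF), a corresponding member of the CCF Academic Affairs Committee, and a member of the CCF Technical Committees on Information Storage, High Performance Computing, and Computer Architecture. He was previously a postdoctoral researcher at the University of Nebraska-Lincoln in the United States. His research focuses on semantic data management in large-scale storage systems, data deduplication, and the architecture of approximate storage systems. He has led or participated in several major projects funded by the National 973 and 863 Programs, the National Natural Science Foundation of China, and the Ministry of Education's Innovative Research Team program; both the completed 973 project and the Ministry of Education Innovative Research Team project were rated "Excellent". He has published papers in international journals including TC and TPDS and at international conferences including USENIX FAST, USENIX ATC, INFOCOM, SC, HPDC, ICDCS, MSST, and DATE, which have been cited more than 550 times. He has served on the organizing or program committees of more than 30 international conferences, including INFOCOM, RTSS, ICDCS, ICNP, MSST, IWQoS, and ICPP, and serves as an editor of the international journals Frontiers of Computer Science (FCS) and Journal of Communications and Networks (JCN). He has been recognized as a supervisor of an Outstanding Master's Thesis of Hubei Province, was selected for the Hubei Province Chenguang Program for young scientists, and received the Second Prize of the Chinese Institute of Electronics Award for Electronic Information Science and Technology.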
Abstract: In the era of big data, the explosive growth in data volume and complexity calls for highly efficient, searchable data analytics. Existing cloud storage systems largely fail to offer adequate support for real-time big data analytics. To support cost-efficient, real-time data analytics services, we propose a smarter, searchable data analytics methodology. The idea is to explore and exploit the semantic correlation within and among datasets via correlation-aware hashing and manageable flat-structured addressing, significantly reducing processing latency while incurring an acceptably small loss of data-search accuracy. This near-real-time property enables rapid identification of correlated files and significantly narrows the scope of data to be processed. The proposed scheme supports several types of data analytics and can be implemented in existing searchable storage systems.
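The abstract does not specify how correlation-aware hashing is realized. As an illustration only, the sketch below assumes that each file is described by a set of attributes (keywords or metadata values) and that correlation means Jaccard similarity between those sets. It uses locality-sensitive hashing (MinHash with banding), a standard technique swapped in here, not necessarily the speaker's scheme. The function names, the `bands`/`rows` parameters, and the sample files are all hypothetical.

```python
import hashlib
from collections import defaultdict


def _hash(seed: int, token: str) -> int:
    # Deterministic 64-bit hash of `token` under hash function number `seed`.
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(features: set[str], num_hashes: int) -> list[int]:
    # MinHash: for each hash function, keep the minimum hash over the feature set.
    # Two signatures agree at a given position with probability equal to the
    # Jaccard similarity of the underlying feature sets.
    if not features:
        raise ValueError("feature set must be non-empty")
    return [min(_hash(i, f) for f in features) for i in range(num_hashes)]


def lsh_buckets(files: dict[str, set[str]], bands: int = 8, rows: int = 2) -> dict[tuple, list[str]]:
    # LSH banding: split each signature into `bands` bands of `rows` values.
    # Files whose signatures match on an entire band land in that band's bucket.
    buckets: dict[tuple, list[str]] = defaultdict(list)
    for name, features in files.items():
        sig = minhash_signature(features, bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].append(name)
    return buckets


def correlated_files(files: dict[str, set[str]], query: str, bands: int = 8, rows: int = 2) -> set[str]:
    # Candidate correlated files: every file sharing at least one bucket with `query`.
    # Only these candidates need further processing, narrowing the search scope.
    result: set[str] = set()
    for members in lsh_buckets(files, bands, rows).values():
        if query in members:
            result.update(members)
    result.discard(query)
    return result


if __name__ == "__main__":
    # Hypothetical per-file attribute sets.
    files = {
        "a.log": {"sensor", "2016", "wuhan", "temperature"},
        "b.log": {"sensor", "2016", "wuhan", "temperature"},
        "c.jpg": {"cat", "photo", "png"},
    }
    print(sorted(correlated_files(files, "a.log")))  # → ['b.log']
```

With Jaccard similarity s, two files share at least one bucket with probability 1 - (1 - s^rows)^bands. For example, with 8 bands of 2 rows and s = 0.5 this is about 0.90. Some correlated files can therefore be missed and some unrelated files returned, which illustrates the kind of latency-for-accuracy trade-off the abstract describes.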