Information Extraction

We work on extracting entities and the relations between them from encyclopedias, semi-structured data, and free text. The area has many open problems, and newer tools such as convolutional neural networks (CNNs) and crowdsourcing can be applied to them.

Members: 胡猛,沈永新,何莹,李茂龙

  • Extracting structured data from free-text records
  • Detecting semantic drift and cleaning extraction errors
Knowledge Graph

We are building a large Chinese-English knowledge graph from encyclopedias and from copies of HTML pages in English or Chinese from the past several decades. We also want to detect events with spatial-temporal information in order to construct a spatial-temporal event graph.

Members: 李茂龙,何莹,郝茂祥,钱大伟,丁鹏飞

  • Constructing a Chinese Knowledge Graph
  • Constructing a Spatial-Temporal Event Graph
Data Cleaning

We work on improving data quality in every aspect, including data integration across multiple data sources, schema mapping, record matching, entity resolution, missing-data imputation, detection and correction of erroneous data, and data provenance.

Members: 杨强,顾斌斌,单双利,王怡婷

  • Data Integration
  • Data Imputation
  • Data Cleaning
Auto Answer Machine

We work on building an auto answer machine for specific domains such as libraries. This direction offers plenty of challenges and opportunities. We would like to build a good one based on a domain knowledge graph.

Members: 李茂龙

  • A FAQ auto-answer machine for libraries
Recommendation System

We work on building a recommendation system for specific domains that recommends not only items closely related to people's previous activities, but also items that we believe they would be interested in. We also do recommendation across multiple domains.

Members: 杨佳莉

  • Efficient recommendation with consolidated information