Qun Chen 陈群
Professor
Email: chenbenben@nwpu.edu.cn
Office: Room 316, School of Computer Science
Address: Northwestern Polytechnical University
Xi'an, Shaanxi, China

Introduction

  • My research interests are closely related to big data and artificial intelligence, specifically Risk Analysis for AI and Gradual Machine Learning.
  • Many of my pioneering research works have been published in top conferences and journals such as SIGMOD, VLDB, ICDE, WWW, TKDE, CIKM, ICDM and ICDCS.

Bio

  • Sep 1993 - Jul 1998. Undergraduate, Management Information System, Tsinghua University.
  • Aug 1999 - Oct 2003. Ph.D., Computer Science, National University of Singapore.
  • Nov 2003 - Feb 2007. Postdoc, Hong Kong University of Science and Technology.
  • Mar 2007 - Now. Professor and doctoral supervisor, Northwestern Polytechnical University.

Research Interests

Because deep neural networks (DNNs) are uncertain and hard to interpret, I focus on risk analysis for deep AI models, i.e., analyzing and evaluating the risk that an AI model mislabels a target instance in a classification problem. Risk analysis is by itself an important and interesting research problem. Moreover, it can have a profound impact on the design and implementation of core machine learning operations, e.g., active selection of training instances, model training and model selection.

Even though deep learning has achieved tremendous success, its efficacy usually relies on a large amount of accurately labeled training data. Unfortunately, high-quality labeled data may not be readily available in real AI applications.

I have proposed a new non-i.i.d. paradigm of machine learning, namely Gradual Machine Learning (GML). Given a classification task, GML begins with the easy instances, which can usually be automatically labeled by the machine with high accuracy, and then gradually labels more challenging instances based on evidential certainty by iterative factor inference. Compared with traditional i.i.d. learning (e.g., deep learning), GML is more interpretable and requires less or even no manually labeled data.
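
The easy-first labeling order described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the GML implementation: the iterative factor inference is replaced here by a simple distance-weighted vote over the instances that are already labeled, each instance is reduced to a single similarity score assumed to lie in [0, 1], and the names `gradual_label`, `low`, `high` and `bandwidth` are hypothetical.

```python
import math


def gradual_label(scores, low=0.2, high=0.8, bandwidth=0.1):
    """Label every instance as 1 or 0, easiest first.

    scores: one similarity score per instance, assumed to lie in [0, 1].
    Returns (labels, order), where order lists the instance indices in
    the sequence in which they were labeled.
    """
    labels = {}
    # Stage 1: easy instances (extreme scores) are labeled directly.
    for i, s in enumerate(scores):
        if s >= high:
            labels[i] = 1
        elif s <= low:
            labels[i] = 0
    if not labels:
        raise ValueError("no easy instances; adjust the low/high thresholds")
    order = sorted(labels)

    # Stage 2: repeatedly label the unlabeled instance whose evidence from
    # the labeled instances is the most certain. The weighted vote below is
    # a stand-in for factor graph inference (an assumption of this sketch).
    unlabeled = [i for i in range(len(scores)) if i not in labels]
    while unlabeled:
        best = None
        for i in unlabeled:
            pos = neg = 0.0
            for j, y in labels.items():
                weight = math.exp(-abs(scores[i] - scores[j]) / bandwidth)
                if y == 1:
                    pos += weight
                else:
                    neg += weight
            p = pos / (pos + neg)        # estimated probability of label 1
            certainty = abs(p - 0.5)     # distance from a coin flip
            if best is None or certainty > best[0]:
                best = (certainty, i, 1 if p >= 0.5 else 0)
        _, chosen, label = best
        labels[chosen] = label           # newly labeled instance becomes evidence
        order.append(chosen)
        unlabeled.remove(chosen)
    return labels, order


if __name__ == "__main__":
    labels, order = gradual_label([0.95, 0.05, 0.7, 0.3, 0.9, 0.1])
    print(labels)
    print(order)
```

Each newly labeled instance joins the evidence pool for the remaining ones, which is the part of the sketch that mirrors the gradual, non-i.i.d. character of the paradigm.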

Recent Publications

Towards Interpretable and Learnable Risk Analysis for Entity Resolution. International Conference on Management of Data (SIGMOD), 2020.
Zhaoqiang Chen, Qun Chen, Boyi Hou, Tianyi Duan, Zhanhuai Li and Guoliang Li

Machine-learning-based entity resolution has been widely studied. However, some entity pairs may be mislabeled by machine learning models, and existing studies do not address the risk analysis problem: predicting and interpreting which entity pairs are mislabeled. In this paper, we propose an interpretable and learnable framework for risk analysis, which aims to rank the labeled pairs based on their risks of being mislabeled. We first describe how to automatically generate interpretable risk features, and then present a learnable risk model and its training technique. Finally, we empirically evaluate the performance of the proposed approach on real data. Our extensive experiments have shown that the learning risk model can identify the mislabeled pairs with considerably higher accuracy than the existing alternatives.

@article{chen2019towards,
title={Towards Interpretable and Learnable Risk Analysis for Entity Resolution},
author={Chen, Zhaoqiang and Chen, Qun and Hou, Boyi and Duan, Tianyi and Li, Zhanhuai and Li, Guoliang},
journal={arXiv preprint arXiv:1912.02947},
year={2019}
}

Gradual Machine Learning for Entity Resolution. WWW 2019.
Boyi Hou, Qun Chen, Jiquan Shen, Xin Liu, Ping Zhong, Yanyan Wang, Zhaoqiang Chen, Zhanhuai Li

Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most notably deep neural networks), which require lots of accurately labeled training data. Unfortunately, high-quality labeled data usually require expensive manual work, and are therefore not readily available in many real scenarios. In this demo, we propose a novel learning paradigm for ER, called gradual machine learning, which aims to enable effective machine labeling without the requirement for manual labeling effort. It begins with some easy instances in a task, which can be automatically labeled by the machine with high accuracy, and then gradually labels more challenging instances based on iterative factor graph inference. In gradual machine learning, the hard instances in a task are gradually labeled in small stages based on the estimated evidential certainty provided by the labeled easier instances. Our extensive experiments on real data have shown that the proposed approach performs considerably better than its unsupervised alternatives, and its performance is also highly competitive compared to the state-of-the-art supervised techniques. Using ER as a test case, we demonstrate that gradual machine learning is a promising paradigm potentially applicable to other challenging classification tasks requiring extensive labeling effort.

@inproceedings{hou2019gradual,
title={Gradual machine learning for entity resolution},
author={Hou, Boyi and Chen, Qun and Shen, Jiquan and Liu, Xin and Zhong, Ping and Wang, Yanyan and Chen, Zhaoqiang and Li, Zhanhuai},
booktitle={The World Wide Web Conference},
pages={3526--3530},
year={2019},
organization={ACM}
}

Joint Inference for Aspect-Level Sentiment Analysis by Deep Neural Networks and Linguistic Hints. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019.
Yanyan Wang, Qun Chen, Murtadha Ahmed, Zhanhuai Li, Wei Pan, and Hailong Liu

The state-of-the-art techniques for aspect-level sentiment analysis focused on feature modeling using a variety of deep neural networks (DNN). Unfortunately, their performance may still fall short of expectation in real scenarios due to the semantic complexity of natural languages. Motivated by the observation that many linguistic hints (e.g., sentiment words and shift words) are reliable polarity indicators, we propose a joint framework, SenHint, which can seamlessly integrate the output of deep neural networks and the implications of linguistic hints in a unified model based on Markov logic network (MLN). SenHint leverages the linguistic hints for multiple purposes: (1) to identify the easy instances, whose polarities can be automatically determined by the machine with high accuracy; (2) to capture the influence of sentiment words on aspect polarities; (3) to capture the implicit relations between aspect polarities. We present the required techniques for extracting linguistic hints, encoding their implications as well as the output of DNN into the unified model, and joint inference. Finally, we have empirically evaluated the performance of SenHint on both English and Chinese benchmark datasets. Our extensive experiments have shown that compared to the state-of-the-art DNN techniques, SenHint can effectively improve polarity detection accuracy by considerable margins.

@article{wang2019joint,
title={Joint Inference for Aspect-level Sentiment Analysis by Deep Neural Networks and Linguistic Hints},
author={Wang, Yanyan and Chen, Qun and Ahmed, Murtadha and Li, Zhanhuai and Pan, Wei and Liu, Hailong},
journal={IEEE Transactions on Knowledge and Data Engineering},
year={2019},
publisher={IEEE}
}

Improving Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective. International Workshop on Real-Time Business Intelligence and Analytics, 2018.
Zhaoqiang Chen, Qun Chen, Boyi Hou, Murtadha Ahmed, Zhanhuai Li

Pure machine-based solutions usually struggle in challenging classification tasks such as entity resolution (ER). To alleviate this problem, a recent trend is to involve the human in the resolution process, most notably the crowdsourcing approach. However, it remains very challenging to effectively improve machine-based entity resolution with limited human effort. In this paper, we investigate the problem of human and machine cooperation for ER from a risk perspective. We propose to select the machine-labeled instances at high risk of being mislabeled for manual verification. For this task, we present a risk model that takes into consideration the human-labeled instances as well as the output of machine resolution. Finally, we evaluate the performance of the proposed risk model on real data. Our experiments demonstrate that it can pick up the mislabeled instances with considerably higher accuracy than the existing alternatives. Provided with the same human cost budget, it can also achieve better resolution quality than the state-of-the-art approach based on active learning.

@inproceedings{chen2018risker,
title={Improving Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective},
author={Chen, Zhaoqiang and Chen, Qun and Hou, Boyi and Ahmed, Murtadha and Li, Zhanhuai},
booktitle={Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics},
series={BIRTE'18},
numpages={5},
year={2018},
doi={10.1145/3242153.3242156},
publisher={ACM},
}

r-HUMO: A Risk-aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.
Boyi Hou, Qun Chen, Zhaoqiang Chen, Youcef Nafa, Zhanhuai Li

Even though many approaches have been proposed for entity resolution (ER), it remains very challenging to enforce quality guarantees. To this end, we propose a risk-aware HUman-Machine cOoperation framework for ER, denoted by r-HUMO. Built on the existing HUMO framework, r-HUMO similarly enforces both precision and recall guarantees by partitioning an ER workload between the human and the machine. However, r-HUMO is the first solution that optimizes the process of human workload selection from a risk perspective. It iteratively selects human workload by real-time risk analysis based on the human-labeled results as well as the pre-specified machine metric. In this paper, we first introduce the r-HUMO framework and then present the risk model to prioritize the instances for manual inspection. Finally, we empirically evaluate r-HUMO's performance on real data. Our extensive experiments show that r-HUMO is effective in enforcing quality guarantees, and compared with the state-of-the-art alternatives, it can achieve desired quality control with reduced human cost.

@article{hou2018rhumo,
title={r-HUMO: A Risk-aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees},
author={Hou, Boyi and Chen, Qun and Chen, Zhaoqiang and Nafa, Youcef and Li, Zhanhuai},
journal={IEEE Transactions on Knowledge and Data Engineering},
year={2018},
doi={10.1109/TKDE.2018.2883532},
publisher={IEEE},
}

SenHint: A Joint Framework for Aspect-level Sentiment Analysis by Deep Neural Networks and Linguistic Hints. WWW 2018.
Yanyan Wang, Qun Chen, Xin Liu, Murtadha Ahmed, Zhanhuai Li, Wei Pan, Hailong Liu

The state-of-the-art techniques for aspect-level sentiment analysis focus on feature modeling using a variety of deep neural networks (DNN). Unfortunately, their practical performance may fall short of expectations due to the semantic complexity of natural languages. Motivated by the observation that linguistic hints (e.g., explicit sentiment words and shift words) can be strong indicators of sentiment, we present a joint framework, SenHint, which integrates the output of deep neural networks and the implications of linguistic hints into a coherent reasoning model based on Markov Logic Network (MLN). In SenHint, linguistic hints are used in two ways: (1) to identify easy instances, whose sentiment can be automatically determined by the machine with high accuracy; (2) to capture implicit relations between aspect polarities. We also empirically evaluate the performance of SenHint on both English and Chinese benchmark datasets. Our experimental results show that SenHint can effectively improve accuracy compared with the state-of-the-art alternatives.

@inproceedings{DBLP:conf/www/WangCLALPL18,
author={Wang, Yanyan and Chen, Qun and Liu, Xin and Ahmed, Murtadha and Li, Zhanhuai and Pan, Wei and Liu, Hailong},
title = {SenHint: {A} Joint Framework for Aspect-level Sentiment Analysis by
Deep Neural Networks and Linguistic Hints},
booktitle = {Companion of the The Web Conference 2018 on The Web Conference 2018,
{WWW} 2018, Lyon, France, April 23-27, 2018},
pages = {207--210},
year = {2018},
crossref = {DBLP:conf/www/2018c},
url = {http://doi.acm.org/10.1145/3184558.3186980},
doi = {10.1145/3184558.3186980},
timestamp = {Tue, 24 Apr 2018 14:09:22 +0200},
biburl = {https://dblp.org/rec/bib/conf/www/WangCLALPL18},
bibsource = {dblp computer science bibliography, https://dblp.org}
}

Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework. ICDE 2018.
Zhaoqiang Chen, Qun Chen, Fengfeng Fan, Yanyan Wang, Zhuo Wang, Youcef Nafa, Zhanhuai Li, Hailong Liu, Wei Pan

Even though many machine algorithms have been proposed for entity resolution, it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel Human and Machine cOoperation (HUMO) framework for entity resolution (ER), which divides an ER workload between the machine and the human. HUMO enables a mechanism for quality control that can flexibly enforce both precision and recall levels. We introduce the optimization problem of HUMO, minimizing human cost given a quality requirement, and then present three optimization approaches: a conservative baseline one purely based on the monotonicity assumption of precision, a more aggressive one based on sampling and a hybrid one that can take advantage of the strengths of both previous approaches. Finally, we demonstrate by extensive experiments on real and synthetic datasets that HUMO can achieve high-quality results with reasonable return on investment (ROI) in terms of human cost, and it performs considerably better than the state-of-the-art alternatives in quality control.

@INPROCEEDINGS{chen2018humo,
author={Z. Chen and Q. Chen and F. Fan and Y. Wang and Z. Wang and Y. Nafa and Z. Li and H. Liu and W. Pan},
booktitle={2018 IEEE 34th International Conference on Data Engineering (ICDE)},
title={Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework},
year={2018},
pages={1156--1167},
doi={10.1109/ICDE.2018.00107},
month={April},
}

A Human-and-Machine Cooperative Framework for Entity Resolution with Quality Guarantees. ICDE 2017.
Zhaoqiang Chen, Qun Chen, Zhanhuai Li

For entity resolution, it remains very challenging to find a solution with quality guarantees as measured by both precision and recall. In this demo, we propose a HUman-and-Machine cOoperative framework, denoted by HUMO, for entity resolution. Compared with the existing approaches, HUMO enables a flexible mechanism for quality control that can enforce both precision and recall levels. We also introduce the problem of minimizing human cost given a quality requirement and present corresponding optimization techniques. Finally, we demonstrate that HUMO achieves high-quality results with reasonable return on investment (ROI) in terms of human cost on real datasets.

@inproceedings{DBLP:conf/icde/ChenCL17,
author = {Chen, Zhaoqiang and Chen, Qun and Li, Zhanhuai},
title = {A Human-and-Machine Cooperative Framework for Entity Resolution with
Quality Guarantees},
booktitle = {33rd {IEEE} International Conference on Data Engineering, {ICDE} 2017,
San Diego, CA, USA, April 19-22, 2017},
pages = {1405--1406},
year = {2017},
crossref = {DBLP:conf/icde/2017},
url = {https://doi.org/10.1109/ICDE.2017.197},
doi = {10.1109/ICDE.2017.197},
timestamp = {Wed, 24 May 2017 11:31:57 +0200},
biburl = {https://dblp.org/rec/bib/conf/icde/ChenCL17},
bibsource = {dblp computer science bibliography, https://dblp.org}
}