Selecting Data for Labeling under Cost/Resource Constraint (2/3)

Project Details

Description

With the progress of technology and the tide of big data, people increasingly value "information". Many scholars have therefore turned to data mining, hoping to uncover the value behind data and to develop innovative applications. One example, among many, is using classifiers to assign data to categories. In general, two factors must be considered when selecting data for labeling in order to build a good classifier. First, the data selected for labeling should be representative of the original data, because a more representative training set lets the trained classifier achieve higher accuracy. Second, when the training data are unlabeled, experts must be asked to label them. Because each data item has different characteristics, experts may spend different amounts of cost or resources to label different items. This immediately raises a problem: how can we select a set of data to label from the training data under a given cost/resource constraint so that the classifier built from the selected data achieves the best possible accuracy? In this project, we make three different assumptions about the cost/resource constraint, which lead to three different problems. Since this is a three-year project, we aim to solve one problem in each year. The three problems are as follows.

1. Assume that the cost of labeling each data item is the same. The problem is then how to select K items to label from the training data so that the classifier built from the selected data achieves the best accuracy.
2. Assume that the cost of labeling each data item differs. Specifically, let c_i be the labeling cost of item i, and let C be the total cost constraint. The problem is then how to select a set of items to label from the training data under the total cost constraint C so that the classifier built from the selected data achieves the best accuracy.
3. Assume that labeling a data item requires different amounts of several types of resources, and that each type of resource has its own limit. The problem is then how to select a set of items to label from the training data under these multiple resource constraints so that the classifier built from the selected data achieves the best accuracy.
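The three problems differ only in their constraint, so one selection routine can express all of them. The sketch below is a hypothetical illustration, not this project's method. It assumes that representativeness is measured by a facility-location coverage score with an exp(-squared distance) similarity, and it greedily picks the item with the largest coverage gain per unit of normalized cost. Problem 1 corresponds to one resource, a cost of 1 per item, and a limit of K. Problem 2 uses one resource with cost c_i and limit C. Problem 3 gives each item a cost vector and each resource type its own limit. The names `select_for_labeling`, `coverage`, and `similarity` are hypothetical.

```python
import math
from typing import List, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity: exp(-squared Euclidean distance)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))


def coverage(selected: List[int], sim: List[List[float]]) -> float:
    """Facility-location score: each item is credited with its best match in `selected`."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def select_for_labeling(points: List[Sequence[float]],
                        costs: List[Sequence[float]],
                        limits: Sequence[float]) -> List[int]:
    """Greedy cost-benefit selection under one or more resource limits.

    points: feature vectors of the unlabeled training data
    costs:  costs[i][r] = amount of resource r needed to label item i
    limits: limits[r]   = total amount of resource r available (assumed > 0)
    """
    n, m = len(points), len(limits)
    sim = [[similarity(points[i], points[j]) for j in range(n)] for i in range(n)]
    used = [0.0] * m
    selected: List[int] = []
    current = 0.0
    while True:
        best, best_ratio, best_score = None, 0.0, current
        for i in range(n):
            if i in selected:
                continue
            if any(used[r] + costs[i][r] > limits[r] for r in range(m)):
                continue  # labeling item i would exceed some resource limit
            score = coverage(selected + [i], sim)
            # Each resource is weighted by the fraction of its limit that item i consumes.
            weight = sum(costs[i][r] / limits[r] for r in range(m))
            ratio = (score - current) / weight if weight > 0 else math.inf
            if best is None or ratio > best_ratio:
                best, best_ratio, best_score = i, ratio, score
        if best is None:  # nothing else fits within the remaining resources
            return selected
        selected.append(best)
        current = best_score
        for r in range(m):
            used[r] += costs[best][r]


if __name__ == "__main__":
    data = [[0.0], [0.1], [5.0], [5.1]]  # two small clusters
    # Problem 1: K = 2 items, each costing one unit of a single resource.
    print(select_for_labeling(data, [[1]] * 4, [2]))
    # Problem 2: per-item costs c_i with total budget C = 3.
    print(select_for_labeling(data, [[1], [1], [10], [10]], [3]))
    # Problem 3: two resource types, each with its own limit.
    print(select_for_labeling(data, [[1, 5], [1, 5], [1, 1], [1, 1]], [2, 6]))
```

In the Problem 1 call, the routine should pick one item from each cluster, because a second item from an already covered cluster adds almost no coverage. Used on its own, the gain-per-cost rule has no worst-case guarantee. In the submodular-optimization literature it is usually run alongside a plain greedy pass, and the better of the two selections is kept.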
Status: Finished
Effective start/end date: 1/08/19 to 31/07/20

UN Sustainable Development Goals

In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This project contributes towards the following SDG(s):

  • SDG 4 - Quality Education
  • SDG 11 - Sustainable Cities and Communities
  • SDG 17 - Partnerships for the Goals

Keywords

  • Classification
  • Unsupervised instance selection
  • Cost
  • Resource
  • Classifiers
