Data Mining For Very Busy People

Data Mining For Very Busy People is an article by Menzies (West Virginia University) and Hu (University of British Columbia). It appeared in IEEE Computer, November 2003, pp. 22-29.

In it, the authors argue that accessing data is not the problem facing the data mining community; the problem is ignoring irrelevant data. Their approach is to search large data sets for the smallest subset of the most relevant data. According to the authors, this approach of "learning the least" from a data set is a departure from the norm: they state that most data miners are concerned with discovering a data summary with fine-grained details.

The authors describe the TAR2 treatment learner, a data mining tool that searches for "the minimal difference set between things." They claim that TAR2 produces data models that are easier for humans to understand, because the models are presented as a list of essential differences instead of a highly detailed summary.

The TAR2 treatment learner takes a large amount of data and produces a few simple rules instead of the complex tree produced by many data miners. It uses the three major concepts of treatment learning: lift, minimum best support, and the small treatment effect.
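The article summary above names lift and minimum best support without defining them. The following is a minimal sketch, assuming the definitions commonly given for treatment learning: each class is assigned a numeric score, the worth of a set of rows is its average class score, lift is the worth of the treated rows divided by the worth of all rows, and best support is the fraction of best-class rows that the treatment keeps. The function names, the dictionary row layout, and the toy data are hypothetical and are not taken from TAR2 itself.

```python
def matches(row, treatment):
    """True if the row has every attribute value the treatment requires."""
    return all(row.get(attr) == value for attr, value in treatment.items())


def worth(rows, scores):
    """Average class score over a non-empty list of rows (assumed definition)."""
    return sum(scores[row["class"]] for row in rows) / len(rows)


def lift(rows, treatment, scores):
    """Worth of the treated subset relative to the worth of all rows."""
    treated = [row for row in rows if matches(row, treatment)]
    if not treated:
        return 0.0  # a treatment that selects nothing offers no improvement
    return worth(treated, scores) / worth(rows, scores)


def best_support(rows, treatment, best_class):
    """Fraction of best-class rows that survive the treatment."""
    best = [row for row in rows if row["class"] == best_class]
    if not best:
        return 0.0
    return sum(1 for row in best if matches(row, treatment)) / len(best)


# Made-up toy data: one discretized attribute and a two-valued class.
rows = [
    {"reviews": "high", "class": "good"},
    {"reviews": "high", "class": "good"},
    {"reviews": "low", "class": "bad"},
    {"reviews": "low", "class": "good"},
]
scores = {"bad": 1, "good": 4}  # hypothetical class scores
treatment = {"reviews": "high"}

print(lift(rows, treatment, scores))          # 4 / 3.25, about 1.23
print(best_support(rows, treatment, "good"))  # 2 / 3, about 0.67
```

Under these assumptions, a learner would keep only treatments whose best support exceeds a chosen minimum and would rank the survivors by lift. Restricting `treatment` to one or two attributes reflects the small treatment effect.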

What the authors claim TAR2 offers over other treatment learners is superior heuristics for finding treatments. TAR2 uses three key heuristics. First, it chunks continuous attributes into separate bins of values. For example, instead of spanning a range of continuous values, the data would be separated into low, medium, and high values. Second, TAR2 assumes the small treatment effect and always works with a small number of attributes. Finally, TAR2 looks only at treatments with high ranges. It assumes that people are only interested in seeing positive results.

To support their claims of being able to derive simplified and more representative models for large data sets, the authors present three case studies in the domains of software risk estimation, software inspection policies, and requirements engineering.

Menzies and Hu apply the TAR2 treatment learner to several data sets in these domains, and demonstrate improved results using their methods.

Data mining and treatment learning

The article also includes a sidebar in which Menzies and Hu describe some of the common methods used in both data mining and treatment learning. These include decision tree learning, association rule learning, wrappers, genetic algorithms, and simulated annealing algorithms.

