Explore Your Data: Cases, Variables, Types of Variables

A data set contains information about a sample and consists of cases. A case is an experimental unit: one of the individuals from which data are collected. When data are collected from humans, the cases are sometimes called participants; when data are collected from animals, the term subjects is often used. In short, cases are the objects in the collection.

Each case has one or more attributes or qualities, called variables. A variable is a characteristic of a case that is measured and can take on different values. In other words, it is something that varies between cases.

This is in contrast to a constant, which is the same for all cases in a research study.

Case

An experimental unit from which data are collected

Variable

Characteristic of cases that can take on different values (in other words, something that can vary)

Constant

Characteristic that is the same for all cases in a study
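
The difference between a variable and a constant can be checked mechanically. The following is a minimal sketch in Python, assuming a tiny made-up study in which every case is stored as a dictionary; the field names and values are hypothetical and only serve as illustration.

```python
# Hypothetical study: each case is one patient, stored as a dict.
cases = [
    {"patient_id": 1, "age": 34, "diagnosis": "breast cancer"},
    {"patient_id": 2, "age": 51, "diagnosis": "breast cancer"},
    {"patient_id": 3, "age": 47, "diagnosis": "breast cancer"},
]


def split_variables_and_constants(cases):
    """Separate characteristics that vary between cases from those that do not."""
    variables, constants = [], []
    for name in cases[0]:
        distinct_values = {case[name] for case in cases}
        if len(distinct_values) > 1:
            variables.append(name)   # differs between cases
        else:
            constants.append(name)   # same value for every case
    return variables, constants


variables, constants = split_variables_and_constants(cases)
print(variables)  # ['patient_id', 'age']
print(constants)  # ['diagnosis']
```

Here the diagnosis is a constant because every case in this imagined study has the same value, while age differs from case to case and is therefore a variable.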

Example:

Suppose you are collecting information about breast cancer patients. For each patient, you want to record the following information:

  • Sample code number: id number
  • Clump Thickness: 1 – 10
  • Uniformity of Cell Size: 1 – 10
  • Uniformity of Cell Shape: 1 – 10
  • Marginal Adhesion: 1 – 10
  • Single Epithelial Cell Size: 1 – 10
  • Bare Nuclei: 1 – 10
  • Bland Chromatin: 1 – 10
  • Normal Nucleoli: 1 – 10
  • Mitoses: 1 – 10
  • Class: (2 for benign, 4 for malignant)

These features are taken from the UCI Breast Cancer Wisconsin (Original) data set, which you can find here:

https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)

In this example, the breast cancer patients are the cases, and all these characteristics of the patients are variables.
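
One common way to picture this is a table in which every row is a case and every column is a variable. The sketch below assumes that layout and uses plain Python. The variable names follow the list above, but the three rows are made-up values for illustration, not records from the real data set.

```python
# Variable names follow the feature list above.
variable_names = [
    "sample_code_number", "clump_thickness", "uniformity_of_cell_size",
    "uniformity_of_cell_shape", "marginal_adhesion",
    "single_epithelial_cell_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "class",
]

# Made-up rows (NOT real data): one tuple per case, one value per variable.
rows = [
    (101, 5, 1, 1, 1, 2, 1, 3, 1, 1, 2),
    (102, 8, 10, 10, 8, 7, 10, 9, 7, 1, 4),
    (103, 3, 1, 1, 1, 2, 2, 3, 1, 1, 2),
]

# Each case becomes a mapping from variable name to that case's value.
cases = [dict(zip(variable_names, row)) for row in rows]

print(len(cases))           # number of cases: 3
print(len(variable_names))  # number of variables: 11
print(cases[1]["class"])    # 4, the code for malignant
```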

In a study, cases can be many different things. They can be individual patients or groups of patients, but they can also be, for instance, companies, schools or countries.

We can have many different kinds of variables, representing different characteristics. For this reason, there are different levels of measurement, or types of variables.

As noted above, a variable is a characteristic that can be measured and that can assume different values. Height, age, income, province or country of birth, grades obtained at school and type of housing are all examples of variables. Variables may be classified into two main categories: categorical and numeric.

Each category is then divided into two subcategories: nominal or ordinal for categorical variables, and discrete or continuous for numeric variables. These types are briefly outlined in this section.
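
The two-level classification can be written down as a small lookup structure. This is only a sketch: the `VariableType` enum and the `examples` mapping are names introduced here for illustration, and the example variables are the ones this article uses in the sections that follow.

```python
from enum import Enum


class VariableType(Enum):
    # Each member stores (main category, subcategory).
    NOMINAL = ("categorical", "nominal")
    ORDINAL = ("categorical", "ordinal")
    DISCRETE = ("numeric", "discrete")
    CONTINUOUS = ("numeric", "continuous")

    @property
    def category(self):
        """Main category: 'categorical' or 'numeric'."""
        return self.value[0]


# Example variables taken from this article.
examples = {
    "type of dwelling": VariableType.NOMINAL,
    "behaviour (Good, Very good, Excellent)": VariableType.ORDINAL,
    "Uniformity of Cell Size (1 to 10)": VariableType.DISCRETE,
    "height": VariableType.CONTINUOUS,
}

for name, variable_type in examples.items():
    print(f"{name}: {variable_type.category}, {variable_type.value[1]}")
```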

Categorical Variables:

A categorical variable (also called a qualitative variable) refers to a characteristic that cannot be quantified. Categorical variables can be either nominal or ordinal.

Nominal Variable:

A nominal variable is made up of categories that have no natural order: it describes a name, label or category. Sex and type of dwelling are examples of nominal variables.

Example:

The gender of a patient (Male or Female) and the state where a patient lives are nominal variables. The categories differ from each other, but there is no ranking order. Similarly, the variable “mode of transportation for travel to work” is nominal.
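
Because nominal categories have no order, the natural way to summarize them is by counting how often each category occurs. A minimal sketch, assuming a handful of hypothetical survey answers for the mode of transportation (the category names are made up):

```python
from collections import Counter

# Hypothetical answers to "mode of transportation for travel to work".
transport_modes = ["car", "bus", "bicycle", "car", "walk", "car", "bus"]

# Counting categories is meaningful for a nominal variable.
counts = Counter(transport_modes)
most_common_mode, frequency = counts.most_common(1)[0]

print(counts["bus"])                # 2
print(most_common_mode, frequency)  # car 3
```

Counts and the most frequent category make sense here; ranking the categories or averaging them would not, since "car" is neither more nor less than "bus".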

Ordinal Variable:

The second level of measurement is the ordinal level. An ordinal variable is a variable whose values are defined by an order relation between the different categories. The categories are not only different; they also have an order. An example might be employee pay classified as Lowest paid, Average paid and Highest paid.

For example, a variable “behaviour” is ordinal because the category “Excellent” is better than the category “Very good,” which is better than the category “Good,” and so on. There is a natural ordering, but it is limited, since we do not know by how much “Excellent” behaviour is better than “Very good” behaviour.
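
In code, an ordinal variable can be represented by listing its categories from lowest to highest and comparing positions in that list. The sketch below assumes only the three behaviour categories named above, and the ratings themselves are hypothetical.

```python
# Categories listed from lowest to highest. Only the order is meaningful,
# not the distance between neighbouring categories.
BEHAVIOUR_LEVELS = ["Good", "Very good", "Excellent"]
RANK = {level: position for position, level in enumerate(BEHAVIOUR_LEVELS)}

# Hypothetical ratings for five cases.
ratings = ["Very good", "Excellent", "Good", "Very good", "Excellent"]

# Sorting and taking the middle value rely only on the order.
sorted_ratings = sorted(ratings, key=RANK.get)
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print(RANK["Excellent"] > RANK["Very good"])  # True
print(median_rating)                          # Very good
```

The ranks 0, 1 and 2 are just positions. Subtracting or averaging them would suggest equal gaps between categories, which is exactly what an ordinal variable does not tell us.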

Quantitative/Numerical Variables:

A numeric variable (also called a quantitative variable) is a quantifiable characteristic whose values are numbers (except numbers that are codes standing in for categories). Numeric variables may be either continuous or discrete.
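
The exception for codes matters in practice. In the breast cancer data, Class is stored as 2 or 4, but these numbers only stand in for the categories benign and malignant. A short sketch, using hypothetical class codes, shows why such a column should be treated as categorical:

```python
from collections import Counter

# Coding taken from the feature list: 2 for benign, 4 for malignant.
CLASS_LABELS = {2: "benign", 4: "malignant"}

# Hypothetical class codes for five cases.
class_codes = [2, 4, 2, 2, 4]

# Arithmetic on codes runs without error but has no meaning:
# 2.8 is not a valid class.
mean_code = sum(class_codes) / len(class_codes)

# Recoding to labels makes the categorical nature explicit.
labels = [CLASS_LABELS[code] for code in class_codes]
label_counts = Counter(labels)

print(mean_code)                  # 2.8
print(label_counts["benign"])     # 3
print(label_counts["malignant"])  # 2
```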

Continuous Variable:

A variable is continuous if its possible values form an interval. An example is, again, the height of a patient. Someone can be 172 centimeters or 174 centimeters tall, but also, for instance, 170.2461 centimeters. The possible values are not a set of separate numbers but a continuum containing infinitely many values.

Discrete Variable:

A variable is discrete if its possible values form a set of separate numbers.

In the breast cancer data above, Uniformity of Cell Size (1 – 10) is an example of a discrete variable.
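
One practical consequence of the distinction is how the values are tabulated. Discrete values can be counted one by one, whereas continuous values are usually grouped into intervals first. The sketch below assumes hypothetical cell size scores and patient heights; the helper `interval_label` and the interval width of 5 centimeters are arbitrary choices made for illustration.

```python
from collections import Counter

# Hypothetical discrete scores on the 1 to 10 scale.
cell_size_scores = [1, 4, 1, 8, 1, 10, 1, 2]

# Hypothetical heights in centimeters (continuous).
heights_cm = [172.0, 174.0, 170.2461, 168.5]

# Discrete values: each separate number can be counted directly.
score_counts = Counter(cell_size_scores)


def interval_label(value, width=5):
    """Return the half-open interval [lower, lower + width) containing value."""
    lower = int(value // width) * width
    return f"[{lower}, {lower + width})"


# Continuous values: group into intervals before counting.
height_counts = Counter(interval_label(height) for height in heights_cm)

print(score_counts[1])              # 4
print(height_counts["[170, 175)"])  # 3
print(height_counts["[165, 170)"])  # 1
```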
