Explore Your Data: Cases, Variables, Types of Variables

A data set contains information about a sample and consists of cases. Cases are the objects in the collection. Each case has one or more attributes or qualities, called variables, which are characteristics of the cases.

Example:

Suppose you are collecting information about breast cancer patients. For each patient, you want to record the following information:

  1. Sample code number: id number
  2. Clump Thickness: 1 – 10
  3. Uniformity of Cell Size: 1 – 10
  4. Uniformity of Cell Shape: 1 – 10
  5. Marginal Adhesion: 1 – 10
  6. Single Epithelial Cell Size: 1 – 10
  7. Bare Nuclei: 1 – 10
  8. Bland Chromatin: 1 – 10
  9. Normal Nucleoli: 1 – 10
  10. Mitoses: 1 – 10
  11. Class: (2 for benign, 4 for malignant)

These features were taken from the UCI Breast Cancer Wisconsin (Original) data set, which you can find here:

https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)

In this example, the breast cancer patients are the cases, and the characteristics recorded for each patient are the variables.
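To make the idea of cases and variables concrete, here is a minimal Python sketch. The two records are made-up illustrations, not real rows from the UCI file, and the key names (`sample_code_number`, `clump_thickness`, and so on) are hypothetical spellings of the feature names listed above.

```python
# Two made-up cases (NOT real records from the UCI data set).
# Each dict is one case; each key is one variable.
cases = [
    {"sample_code_number": 1, "clump_thickness": 5, "mitoses": 1, "class": 2},
    {"sample_code_number": 2, "clump_thickness": 8, "mitoses": 3, "class": 4},
]

# The variables are the characteristics recorded for every case.
variables = list(cases[0].keys())
n_cases = len(cases)

# Reading one variable across all cases gives one column of the data set.
clump_thickness = [case["clump_thickness"] for case in cases]

print(n_cases, variables)
print(clump_thickness)
```

Each row (dict) corresponds to a patient, and each column (key) corresponds to a variable.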

In a study, cases can be many different things. They can be individual patients or groups of patients, but they can also be, for instance, companies, schools, or countries.

We can have many different kinds of variables, representing different characteristics. For this reason, there are different levels of measurement, that is, different types of variables.

 

Categorical Variables:

Both nominal and ordinal variables can be called categorical variables.

1. Nominal Variable:

A nominal variable is made up of categories that have no order.

Example:

The gender of a patient (Male or Female) and the state where a patient lives are nominal variables. The categories differ from each other, but there is no ranking order.
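As a rough illustration, the only meaningful operations on a nominal variable are checking whether two values are equal and counting how often each category occurs. The sketch below uses made-up state values and Python's standard `collections.Counter`.

```python
from collections import Counter

# Made-up values for a nominal variable (state of residence).
states = ["Texas", "Ohio", "Texas", "Utah"]

# Counting categories is meaningful; sorting or averaging them is not.
counts = Counter(states)

# The most frequent category (the mode) is a valid summary for nominal data.
mode = counts.most_common(1)[0][0]

print(counts)
print(mode)
```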

2. Ordinal Variable:

The second level of measurement is the ordinal level. There is not only a difference between the categories of a variable; there is also an order. An example might be employees classified as highest paid, average paid, or lowest paid.
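One possible way to represent an ordinal variable in Python is an `IntEnum`, where the integer codes carry the order. The class name `PayLevel` and the codes 1 to 3 are assumptions made for this sketch; the codes express rank only, so the distance between them has no meaning.

```python
from enum import IntEnum

class PayLevel(IntEnum):
    """Hypothetical ordinal scale: the numbers encode order, not amounts."""
    LOWEST = 1
    AVERAGE = 2
    HIGHEST = 3

# Made-up pay levels for three employees.
employees = [PayLevel.AVERAGE, PayLevel.HIGHEST, PayLevel.LOWEST]

# Because the categories are ordered, sorting and comparing make sense.
ranked = sorted(employees)

print([level.name for level in ranked])
print(PayLevel.HIGHEST > PayLevel.AVERAGE)
```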

 

Quantitative/Numerical Variables:

1. Continuous Variable:

A variable is continuous if its possible values form an interval. An example is the height of a patient. Someone can be 172 centimeters tall or 174 centimeters tall, but also, for instance, 170.2461 centimeters. We don’t have a set of separate numbers, but an infinite range of values.

2. Discrete Variable:

A variable is discrete if its possible values form a set of separate numbers.

In the breast cancer data above, Uniformity of Cell Size (1 – 10) is an example of a discrete variable.
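The difference between discrete and continuous variables can be sketched as follows. The helper `is_valid_cell_size` is a hypothetical name introduced here, and the heights are the example values from the text.

```python
# Discrete: Uniformity of Cell Size can only take the whole numbers 1 to 10.
cell_size_values = set(range(1, 11))

def is_valid_cell_size(value):
    """Return True if value is one of the ten allowed scores."""
    return value in cell_size_values

# Continuous: between any two heights there is always another possible height.
height_a, height_b = 172.0, 174.0
midpoint = (height_a + height_b) / 2

print(is_valid_cell_size(7))    # a whole number in the range 1 to 10
print(is_valid_cell_size(7.5))  # falls between two allowed scores
print(midpoint)
```

A value such as 7.5 is not a possible cell size score, while any height between 172 and 174 centimeters is a possible height.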
