In R, factors are used to represent categorical data. They store both the values and the possible categories (called levels) that the values can take.
Factors are especially useful in statistical modeling and data analysis, where categorical data is often encountered.
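In this tutorial, we will cover:
- What a factor is in R
- Creating factors
- Accessing factor levels
- Modifying factor levels
- Ordered factors
- Converting factors
- Factors in data frames
- Common factor functions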
Let’s explore each topic with detailed examples.
1. What is a Factor in R?
A factor in R is used to store categorical variables. Categorical variables can be either:
- Nominal: Categories without any inherent order (e.g., “apple”, “banana”, “cherry”).
- Ordinal: Categories with a specific order (e.g., “low”, “medium”, “high”).
Internally, a factor stores each value as an integer code that indexes into its set of distinct categories (levels). This keeps the representation compact and lets modeling and summary functions recognize the variable as categorical.
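To make the idea of values plus levels concrete, here is a minimal sketch that inspects a small factor. The vector name `sizes` and its contents are made up for illustration and do not appear elsewhere in this tutorial.

```r
# Hypothetical example data (an assumption, not part of the tutorial's examples)
sizes <- factor(c("small", "large", "small", "medium"))

# The levels are the distinct categories, sorted alphabetically by default
levels(sizes)

# Each element is stored as an integer code that indexes into the levels
as.integer(sizes)

# unclass() shows both parts at once: the codes and the "levels" attribute
unclass(sizes)
```

Here “small” is the third level alphabetically, so it is stored as code 3.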
2. Creating Factors
You can create a factor using the factor() function. This function converts a vector into a factor by assigning levels to the unique values.
Example (Creating a Factor):
    # Create a character vector
    fruits <- c("apple", "banana", "cherry", "apple", "cherry", "banana")
    # Convert the character vector to a factor
    fruit_factor <- factor(fruits)
    print(fruit_factor)
Output:
    [1] apple  banana cherry apple  cherry banana
    Levels: apple banana cherry
- The unique values in the vector become the levels of the factor, sorted alphabetically by default. In this case, the levels are “apple”, “banana”, and “cherry”.
Example (Specifying Custom Levels):
You can also specify custom levels using the levels argument.
    # Create a factor with specified levels
    fruit_factor <- factor(fruits, levels = c("banana", "apple", "cherry"))
    print(fruit_factor)
Output:
    [1] apple  banana cherry apple  cherry banana
    Levels: banana apple cherry
- The levels are set in the order specified (“banana”, “apple”, “cherry”).
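One consequence of the levels argument is worth seeing in code: any value that is not listed among the levels is stored as NA. The sketch below adds a made-up value, “kiwi”, to show this.

```r
# Example data with one value ("kiwi") that is deliberately left out of the levels
fruits <- c("apple", "banana", "cherry", "kiwi")

# "kiwi" is not among the declared levels, so it is stored as NA
restricted <- factor(fruits, levels = c("banana", "apple", "cherry"))
print(restricted)

# Count how many values fell outside the declared levels
sum(is.na(restricted))
```

Checking for NA values after calling factor() with explicit levels is a cheap way to catch misspelled or unexpected categories.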
3. Accessing Factor Levels
You can retrieve the levels of a factor using the levels() function.
Example:
    # Get the levels of the factor
    fruit_levels <- levels(fruit_factor)
    print(fruit_levels)  # Outputs: "banana" "apple" "cherry"
- The levels() function returns the unique categories in the factor.
Example (Checking the Number of Levels):
You can use the nlevels() function to check how many unique levels a factor has.
    # Get the number of levels in the factor
    num_levels <- nlevels(fruit_factor)
    print(num_levels)  # Outputs: 3
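Note that levels() and nlevels() describe the categories a factor can take, not only the ones currently present. The following sketch, which rebuilds a small fruit factor so that it runs on its own, shows that subsetting keeps unused levels and that droplevels() removes them.

```r
fruit_factor <- factor(c("apple", "banana", "cherry", "apple"))

# Keep only the elements that are not "cherry"
no_cherry <- fruit_factor[fruit_factor != "cherry"]

# The unused level "cherry" is still part of the factor
nlevels(no_cherry)

# droplevels() removes levels that no longer occur in the data
trimmed <- droplevels(no_cherry)
levels(trimmed)
```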
4. Modifying Factor Levels
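You can rename the levels of a factor by assigning to levels(), and reorder them by calling factor() again with a new levels argument. Assignment to levels() is positional: the first new name replaces the first existing level, and so on, so it changes labels rather than order.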
Example (Renaming Levels):
    # Rename the levels of the factor
    levels(fruit_factor) <- c("B", "A", "C")
    print(fruit_factor)
Output:
    [1] A B C A C B
    Levels: B A C
- The levels “banana”, “apple”, and “cherry” are renamed to “B”, “A”, and “C”.
Example (Reordering Levels):
    # Reorder the levels of the factor
    fruit_factor <- factor(fruit_factor, levels = c("A", "B", "C"))
    print(fruit_factor)
Output:
    [1] A B C A C B
    Levels: A B C
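- The factor levels are reordered while the values stay the same. The order matters in practice: modeling functions such as lm() use the first level as the reference category by default, and tables and plots list levels in this order.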
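Two related idioms are sketched below on a freshly built factor, so the block runs on its own: renaming a single level by name instead of by position, and using relevel() to move one level to the front. The new label “sour cherry” is made up for illustration.

```r
fruit_factor <- factor(c("apple", "banana", "cherry", "apple"))

# Rename one level by name rather than by position
levels(fruit_factor)[levels(fruit_factor) == "cherry"] <- "sour cherry"
levels(fruit_factor)

# relevel() moves one level to the front, e.g. to make it the reference level
fruit_factor <- relevel(fruit_factor, ref = "banana")
levels(fruit_factor)
```

Renaming by name avoids the risk of mislabeling data when the current level order is not what you expect.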
5. Ordered Factors
Ordered factors are used when the categories have a specific order, such as “low”, “medium”, and “high”. You can create ordered factors by setting ordered = TRUE.
Example (Creating an Ordered Factor):
    # Create a vector with ordinal data
    performance <- c("low", "medium", "high", "medium", "low", "high")
    # Convert it into an ordered factor
    performance_factor <- factor(performance, levels = c("low", "medium", "high"), ordered = TRUE)
    print(performance_factor)
Output:
    [1] low    medium high   medium low    high
    Levels: low < medium < high
- The levels “low”, “medium”, and “high” are ordered, which allows for comparisons like “low” < “medium”.
Example (Comparing Ordered Factors):
    # Compare ordered factor values
    print(performance_factor[1] < performance_factor[2])  # Outputs: TRUE
- Since the factor is ordered, you can compare its values. In this case, “low” is less than “medium”.
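Comparisons on ordered factors also work element-wise against a level name, and functions such as max() and sort() follow the level order. The sketch below rebuilds the performance data under the name `performance` so that it is self-contained.

```r
performance <- factor(
  c("low", "medium", "high", "medium", "low", "high"),
  levels = c("low", "medium", "high"),
  ordered = TRUE
)

# Compare every element against a level name
at_least_medium <- performance >= "medium"
print(at_least_medium)

# max() respects the level order
max(performance)

# sort() arranges values by level order, not alphabetically
sort(performance)
```

The logical vector from the comparison can be used directly for filtering, for example `performance[at_least_medium]`.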
6. Converting Factors
Sometimes, you may want to convert a factor back to its original form, such as a character or numeric vector.
Example (Converting a Factor to Character):
    # Convert a factor to a character vector
    char_vec <- as.character(fruit_factor)
    print(char_vec)
Output:
[1] "A" "B" "C" "A" "C" "B"
- The factor is converted to a character vector.
Example (Converting a Factor to Numeric):
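Calling as.numeric() directly on a factor returns its internal integer codes, not its labels. If the levels are numeric strings (such as “10” and “20”), convert to character first and then to numeric to recover the original values. If the levels are not numeric, as with performance_factor, as.numeric(as.character(...)) produces NA values with a warning; in that case the integer codes are usually what you want, and they give each value's position in the level order.
    # Convert an ordered factor to its integer level codes
    num_vec <- as.integer(performance_factor)
    print(num_vec)  # Outputs: 1 2 3 2 1 3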
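The difference between internal codes and labels is easiest to see side by side. The sketch below uses a made-up factor, `readings`, whose levels are numeric strings; the name and values are assumptions for illustration only.

```r
# Hypothetical factor whose labels happen to be numbers
readings <- factor(c("10", "20", "10", "30"))

# Direct conversion returns the internal codes, not the labels
codes <- as.numeric(readings)
print(codes)

# Going through character recovers the labelled values
values <- as.numeric(as.character(readings))
print(values)

# Equivalent idiom that converts each level only once
values_alt <- as.numeric(levels(readings))[readings]
print(values_alt)
```

The last form indexes the converted levels by the factor itself, which can be slightly faster on long vectors because only the distinct levels are parsed.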
7. Factors in Data Frames
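Factors are commonly used in data frames, especially when handling categorical data. Before R 4.0.0, data.frame() converted character columns into factors by default (unless stringsAsFactors = FALSE was specified). Since R 4.0.0 the default is stringsAsFactors = FALSE, so character columns stay as character unless you wrap them in factor() or pass stringsAsFactors = TRUE.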
Example (Factors in Data Frames):
    # Create a data frame with factors
    df <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Gender = factor(c("Female", "Male", "Male")),
      Score = c(85, 92, 88)
    )
    print(df)
Output:
         Name Gender Score
    1   Alice Female    85
    2     Bob   Male    92
    3 Charlie   Male    88
- The Gender column is stored as a factor with levels “Female” and “Male”.
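As a variation on the example above, the sketch below supplies Gender as plain strings and passes stringsAsFactors = TRUE explicitly, which makes the result independent of the R version's default. It then converts a free-text column back to character and summarizes a numeric column by category. The variable name `mean_by_gender` is introduced here for illustration.

```r
# Same example data, but with Gender supplied as plain character strings
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Gender = c("Female", "Male", "Male"),
  Score = c(85, 92, 88),
  stringsAsFactors = TRUE  # request the conversion explicitly
)

# Both character columns are now factors
sapply(df, class)

# Convert a column back to character when it is really free text
df$Name <- as.character(df$Name)

# Summaries per category work directly on the factor column
mean_by_gender <- tapply(df$Score, df$Gender, mean)
print(mean_by_gender)
```

Converting only the columns that are truly categorical, rather than every string column, tends to avoid surprises later on.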
8. Common Factor Functions
Here are some of the common functions used when working with factors:
levels()
Retrieves or sets the levels of a factor.
levels(factor_name)
nlevels()
Returns the number of levels in a factor.
nlevels(factor_name)
table()
Generates a frequency table of factor values.
    # Create a frequency table
    freq_table <- table(fruit_factor)
    print(freq_table)
Output:
    fruit_factor
    A B C
    2 2 2
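Because table() counts by level, a level that is declared but never observed still appears with a count of zero. The sketch below builds a small factor of its own to show this, and uses prop.table() to turn the counts into proportions.

```r
# Declare a level ("cherry") that never occurs in the data
fruit_factor <- factor(
  c("apple", "banana", "apple"),
  levels = c("apple", "banana", "cherry")
)

# table() reports every level, including those with zero observations
counts <- table(fruit_factor)
print(counts)

# prop.table() turns the counts into proportions
shares <- prop.table(counts)
print(shares)
```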
is.factor()
Checks if an object is a factor.
    # Check if an object is a factor
    is_factor <- is.factor(fruit_factor)
    print(is_factor)  # Outputs: TRUE
as.factor()
Converts a vector into a factor.
    # Convert a numeric vector to a factor
    num_vec <- c(1, 2, 3, 1, 2)
    factor_vec <- as.factor(num_vec)
    print(factor_vec)
Summary of Common Factor Functions
Function | Description |
---|---|
factor() | Creates a factor from a vector. |
levels() | Retrieves or sets the levels of a factor. |
nlevels() | Returns the number of levels in a factor. |
factor(x, ordered = TRUE) | Creates an ordered factor. |
as.factor() | Converts a vector to a factor. |
as.character() | Converts a factor to a character vector. |
table() | Creates a frequency table of factor values. |
is.factor() | Checks if an object is a factor. |
Conclusion
Factors are essential for handling categorical data in R. Understanding how to create, modify, and manipulate factors is crucial for tasks such as statistical modeling, where categorical data plays a key role. Here’s what we covered:
- Creating factors with the factor() function.
- Accessing and modifying factor levels.
- Working with ordered factors.
- Converting factors back to character or numeric data.
- Using factors effectively in data frames.
By mastering these operations, you will be able to efficiently handle categorical data and make the most of R’s powerful data analysis capabilities.