R Factors Tutorial with Examples

In R, factors are used to represent categorical data. They store both the values and the possible categories (called levels) that the values can take.

Factors are especially useful in statistical modeling and data analysis, where categorical data is often encountered.

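In this tutorial, we will cover:

  • What a factor is
  • Creating factors
  • Accessing factor levels
  • Modifying factor levels
  • Ordered factors
  • Converting factors
  • Factors in data frames
  • Common factor functions

Let’s explore each topic with detailed examples.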

1. What is a Factor in R?

A factor in R is used to store categorical variables. Categorical variables can be either:

  • Nominal: Categories without any inherent order (e.g., “apple”, “banana”, “cherry”).
  • Ordinal: Categories with a specific order (e.g., “low”, “medium”, “high”).

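Internally, a factor stores each value as an integer code that points into the set of distinct categories (the levels). This keeps the allowed categories and their order explicit, which is what statistical modeling and tabulation functions rely on when they treat a variable as categorical.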
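To see what storing codes and levels means in practice, the sketch below inspects a small factor built from hypothetical fruit data. It uses only base R functions (typeof(), as.integer(), levels()); the variable names are illustrative.

```r
# Hypothetical sample data
fruits <- c("apple", "banana", "cherry", "apple")
f <- factor(fruits)

# A factor's underlying storage type is integer
typeof(f)                 # "integer"

# Each value is stored as a code pointing into the levels
as.integer(f)             # 1 2 3 1
levels(f)                 # "apple" "banana" "cherry"

# Indexing the levels by the codes recovers the original labels
levels(f)[as.integer(f)]  # "apple" "banana" "cherry" "apple"
```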

2. Creating Factors

You can create a factor using the factor() function. This function converts a vector into a factor; by default, the levels are the sorted unique values of the vector.

Example (Creating a Factor):

# Create a character vector
fruits <- c("apple", "banana", "cherry", "apple", "cherry", "banana")

# Convert the character vector to a factor
fruit_factor <- factor(fruits)
print(fruit_factor)

Output:

[1] apple  banana cherry apple  cherry banana
Levels: apple banana cherry
  • The unique values in the vector become the levels of the factor. In this case, the levels are “apple”, “banana”, and “cherry”.
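Because the default levels come from sorting the unique values, text levels end up in alphabetical order, which can be surprising when numbers are stored as strings. A minimal sketch with made-up vectors:

```r
x <- c("zebra", "apple", "mango", "apple")
levels(factor(x))                              # "apple" "mango" "zebra"
identical(levels(factor(x)), sort(unique(x)))  # TRUE

# Digits stored as text are compared character by character
levels(factor(c("10", "9", "100")))            # "10" "100" "9"
```

If you need a different order, such as numeric order for those strings, pass it explicitly with the levels argument.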

Example (Specifying Custom Levels):

You can also specify custom levels using the levels argument.

# Create a factor with specified levels
fruit_factor <- factor(fruits, levels = c("banana", "apple", "cherry"))
print(fruit_factor)

Output:

[1] apple  banana cherry apple  cherry banana
Levels: banana apple cherry
  • The levels are set in the order specified (“banana”, “apple”, “cherry”).
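The levels argument also acts as a filter. The sketch below reuses the fruits vector and shows two behaviours worth knowing: a level with no matching values is still kept, and a value that is missing from levels becomes NA. The extra "date" level is purely illustrative.

```r
fruits <- c("apple", "banana", "cherry", "apple", "cherry", "banana")

# "date" never occurs, but it is still a level
with_unused <- factor(fruits, levels = c("banana", "apple", "cherry", "date"))
nlevels(with_unused)        # 4

# "cherry" is not listed, so those values become NA
missing_level <- factor(fruits, levels = c("banana", "apple"))
sum(is.na(missing_level))   # 2
```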

3. Accessing Factor Levels

You can retrieve the levels of a factor using the levels() function.

Example:

# Get the levels of the factor
fruit_levels <- levels(fruit_factor)
print(fruit_levels)  # Outputs: "banana" "apple" "cherry"
  • The levels() function returns the unique categories in the factor.

Example (Checking the Number of Levels):

You can use the nlevels() function to check how many unique levels a factor has.

# Get the number of levels in the factor
num_levels <- nlevels(fruit_factor)
print(num_levels)  # Outputs: 3
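levels() and nlevels() report every defined level, including levels that no longer appear in the data, which is common after subsetting. The sketch below uses base R's droplevels() to remove such empty levels; the apples subset is illustrative.

```r
fruits <- c("apple", "banana", "cherry", "apple", "cherry", "banana")
fruit_factor <- factor(fruits, levels = c("banana", "apple", "cherry"))

# nlevels() is equivalent to length(levels())
nlevels(fruit_factor) == length(levels(fruit_factor))  # TRUE

# Keeping only the apples still leaves three levels
apples <- fruit_factor[fruit_factor == "apple"]
nlevels(apples)              # 3

# droplevels() removes levels with no remaining values
levels(droplevels(apples))   # "apple"
```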

4. Modifying Factor Levels

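You can modify the levels of a factor by assigning to levels(). This assignment renames levels by position: the first new name replaces the first existing level, the second replaces the second, and so on. To change the order of the levels without renaming them, call factor() again with a new levels argument.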

Example (Renaming Levels):

# Rename the levels of the factor
levels(fruit_factor) <- c("B", "A", "C")
print(fruit_factor)

Output:

[1] A B C A C B
Levels: B A C
  • The levels “banana”, “apple”, and “cherry” are renamed to “B”, “A”, and “C”.
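Because positional renaming depends on the current level order, it is easy to mislabel categories. Base R's levels<- method for factors also accepts a named list, where each name is a new label and each element is the old level (or levels) it replaces. A sketch using the same fruit data:

```r
fruits <- c("apple", "banana", "cherry", "apple", "cherry", "banana")
f <- factor(fruits, levels = c("banana", "apple", "cherry"))

# Named list: new label = old level, independent of the current level order
levels(f) <- list(A = "apple", B = "banana", C = "cherry")
levels(f)          # "A" "B" "C"
as.character(f)    # "A" "B" "C" "A" "C" "B"
```

A list element can also name several old levels, which merges them into a single new level.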

Example (Reordering Levels):

# Reorder the levels of the factor
fruit_factor <- factor(fruit_factor, levels = c("A", "B", "C"))
print(fruit_factor)

Output:

[1] A B C A C B
Levels: A B C
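  • The values are unchanged; only the order of the levels changes. Level order matters in analysis: tables and plots list categories in level order, and with R’s default treatment contrasts for unordered factors, the first level is the reference category in models such as lm().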
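When you only need to change which level comes first, for example to choose the reference category in a model, base R's relevel() is shorter than listing every level. A sketch on the renamed fruit codes:

```r
f <- factor(c("A", "B", "C", "A", "C", "B"), levels = c("A", "B", "C"))

# Move "C" to the front; the other levels keep their relative order
f_ref_c <- relevel(f, ref = "C")
levels(f_ref_c)                                     # "C" "A" "B"

# The values themselves are unchanged
identical(as.character(f_ref_c), as.character(f))   # TRUE
```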

5. Ordered Factors

Ordered factors are used when the categories have a specific order, such as “low”, “medium”, and “high”. You can create ordered factors by setting ordered = TRUE.

Example (Creating an Ordered Factor):

# Create a vector with ordinal data
performance <- c("low", "medium", "high", "medium", "low", "high")

# Convert it into an ordered factor
performance_factor <- factor(performance, levels = c("low", "medium", "high"), ordered = TRUE)
print(performance_factor)

Output:

[1] low    medium high   medium low    high  
Levels: low < medium < high
  • The levels “low”, “medium”, and “high” are ordered, which allows for comparisons like “low” < “medium”.

Example (Comparing Ordered Factors):

# Compare ordered factor values
print(performance_factor[1] < performance_factor[2])  # Outputs: TRUE
  • Since the factor is ordered, you can compare its values. In this case, “low” is less than “medium”.
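Ordered factors also support comparison against a level label, and functions that depend on order, such as sort() and max(), follow the level order. An unordered factor, by contrast, returns NA with a warning for <. A sketch using the same performance data:

```r
performance <- c("low", "medium", "high", "medium", "low", "high")
p <- factor(performance, levels = c("low", "medium", "high"), ordered = TRUE)

# Compare every element with a level label
p >= "medium"          # FALSE TRUE TRUE TRUE FALSE TRUE

# sort() and max() follow the level order, not alphabetical order
as.character(sort(p))  # "low" "low" "medium" "medium" "high" "high"
as.character(max(p))   # "high"

# "<" is not meaningful for an unordered factor: NA plus a warning
u <- factor(performance)
suppressWarnings(u[1] < u[2])   # NA
```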

6. Converting Factors

Sometimes, you may want to convert a factor back to its original form, such as a character or numeric vector.

Example (Converting a Factor to Character):

# Convert a factor to a character vector
char_vec <- as.character(fruit_factor)
print(char_vec)

Output:

[1] "A" "B" "C" "A" "C" "B"
  • The factor is converted to a character vector.

Example (Converting a Factor to Numeric):

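When converting a factor to numeric, remember that as.numeric() returns the internal integer codes (each value's position among the levels), not the labels. For an ordered factor such as performance_factor, those codes are often exactly what you want. If the labels themselves are numbers (such as "2010" or "2020"), convert to character first and then to numeric; otherwise you get the codes instead of the values. Converting performance_factor through character would not work, because labels like "low" cannot be parsed as numbers and become NA.

# Integer codes of the ordered factor (low = 1, medium = 2, high = 3)
num_vec <- as.numeric(performance_factor)
print(num_vec)  # Outputs: 1 2 3 2 1 3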
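The codes-versus-values pitfall is easiest to see with a factor whose labels happen to be numbers. The years vector below is hypothetical; the as.numeric(levels(f))[f] idiom converts each level once and then indexes by the codes.

```r
# A factor whose labels happen to be numbers
years <- factor(c(2020, 2010, 2020, 2015))
levels(years)                     # "2010" "2015" "2020"

# Pitfall: as.numeric() returns the codes, not the years
as.numeric(years)                 # 3 1 3 2

# Convert via character to recover the original values
as.numeric(as.character(years))   # 2020 2010 2020 2015

# Equivalent idiom: convert the levels once, then index by the codes
as.numeric(levels(years))[years]  # 2020 2010 2020 2015
```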

7. Factors in Data Frames

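Factors are commonly used in data frames to hold categorical columns. Since R 4.0.0, data.frame() keeps character columns as character vectors by default (stringsAsFactors = FALSE). In earlier versions the default was TRUE, so character columns were converted to factors automatically. To get a factor column today, wrap the values in factor(), as below, or pass stringsAsFactors = TRUE.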

Example (Factors in Data Frames):

# Create a data frame with factors
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Gender = factor(c("Female", "Male", "Male")),
  Score = c(85, 92, 88)
)

print(df)

Output:

     Name Gender Score
1   Alice Female    85
2     Bob   Male    92
3 Charlie   Male    88
  • The Gender column is stored as a factor with levels “Female” and “Male”.
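The sketch below shows the stringsAsFactors = TRUE route and one common use of a factor column: grouping a numeric column by its levels with base R's tapply(). The data reuse the example above.

```r
df <- data.frame(
  Name   = c("Alice", "Bob", "Charlie"),
  Gender = c("Female", "Male", "Male"),
  Score  = c(85, 92, 88),
  stringsAsFactors = TRUE   # character columns become factors
)
sapply(df, is.factor)       # Name TRUE, Gender TRUE, Score FALSE

# Mean score for each level of Gender
tapply(df$Score, df$Gender, mean)   # Female 85, Male 90
```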

8. Common Factor Functions

Here are some of the common functions used when working with factors:

levels()

Retrieves or sets the levels of a factor.

levels(factor_name)

nlevels()

Returns the number of levels in a factor.

nlevels(factor_name)

table()

Generates a frequency table of factor values.

# Create a frequency table
freq_table <- table(fruit_factor)
print(freq_table)

Output:

fruit_factor
A B C 
2 2 2
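table() reports a count for every level, including levels with no observations, and base R's prop.table() turns counts into proportions. A sketch with illustrative data:

```r
fruit_factor <- factor(c("A", "B", "C", "A", "C", "B"))

# Proportions instead of raw counts
counts <- table(fruit_factor)
prop.table(counts)       # each level: 0.3333333

# An unused level still appears, with a count of 0
f <- factor(c("A", "A", "B"), levels = c("A", "B", "C"))
table(f)                 # A: 2, B: 1, C: 0
```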

is.factor()

Checks if an object is a factor.

# Check if an object is a factor
is_factor <- is.factor(fruit_factor)
print(is_factor)  # Outputs: TRUE

as.factor()

Converts a vector into a factor.

# Convert a numeric vector to a factor
num_vec <- c(1, 2, 3, 1, 2)
factor_vec <- as.factor(num_vec)
print(factor_vec)
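as.factor() and factor() differ in one useful way: as.factor() returns an existing factor unchanged, while factor() rebuilds it and, by default, drops unused levels. A sketch with illustrative vectors:

```r
num_vec <- c(1, 2, 3, 1, 2)
factor_vec <- as.factor(num_vec)
levels(factor_vec)            # "1" "2" "3"

# as.factor() leaves an existing factor (and its unused levels) alone
f <- factor(c("a", "b"), levels = c("a", "b", "z"))
identical(as.factor(f), f)    # TRUE

# factor() re-creates it and drops the unused level "z"
levels(factor(f))             # "a" "b"
```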

Summary of Common Factor Functions

Function          Description
factor()          Creates a factor from a vector.
levels()          Retrieves or sets the levels of a factor.
nlevels()         Returns the number of levels in a factor.
ordered()         Creates an ordered factor (same as factor(..., ordered = TRUE)).
as.factor()       Converts a vector to a factor.
as.character()    Converts a factor to a character vector.
table()           Creates a frequency table of factor values.
is.factor()       Checks if an object is a factor.

Conclusion

Factors are essential for handling categorical data in R. Understanding how to create, modify, and manipulate factors is crucial for tasks such as statistical modeling, where categorical data plays a key role. Here’s what we covered:

  • Creating factors with the factor() function.
  • Accessing and modifying factor levels.
  • Working with ordered factors.
  • Converting factors back to character or numeric data.
  • Using factors effectively in data frames.

By mastering these operations, you will be able to efficiently handle categorical data and make the most of R’s powerful data analysis capabilities.

 
