2 min read · Mar 8, 2025

In Ruby on Rails, some methods appear similar at first glance but reach the same goal in quite different ways. A good example is .distinct versus .uniq for removing duplicates. .distinct is an ActiveRecord query method that works in the database, while .uniq is a core Ruby Array method that works in memory.

Let’s imagine you’re working with the products of a store in a Rails app, where each product belongs to a category. The rows might look like this (written here as an array of hashes for readability):

products = [
  {id: 1, category: "electronics"},
  {id: 2, category: "clothing"},
  {id: 3, category: "electronics"},
  {id: 4, category: "home appliances"},
  {id: 5, category: "clothing"}
]

Now, suppose you want to generate a drop-down list of unique product categories, so customers can filter products by category. To do this, you need to extract the categories from the products and remove any duplicates. You can do this in two main ways. The calls below assume Product is the ActiveRecord model behind the products table, because a plain Ruby array has no .distinct method.

1. Using .distinct:

Product.distinct.pluck(:category)

2. Using .uniq:

Product.pluck(:category).uniq

What’s the Difference?

While both methods accomplish the same goal of removing duplicates, they do so in different ways:

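.distinct: This is an ActiveRecord method, not a core Ruby one. When you call it on a model or relation (here, a Product model assumed to sit on top of the products table), Rails adds DISTINCT to the SQL it generates. The database returns only the unique categories, so the duplicates are never sent to your application. This is efficient because it avoids loading unnecessary data.

Product.distinct.pluck(:category)
# SQL (roughly): SELECT DISTINCT category FROM products
# Output: ["electronics", "clothing", "home appliances"]
# The order is up to the database unless you add .order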

.uniq: In contrast, this is Ruby’s Array#uniq. Here pluck first retrieves every category value from the database, including duplicates, and returns them as a Ruby array. Then .uniq walks that array in memory and removes the duplicates. This approach is less efficient because the duplicate values are transferred and loaded before being thrown away.

Product.pluck(:category).uniq
# pluck returns: ["electronics", "clothing", "electronics", "home appliances", "clothing"]
# uniq returns:  ["electronics", "clothing", "home appliances"]
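When the data is already a plain array of hashes, the same in-memory de-duplication works without Rails. This is a minimal sketch using the sample data from earlier; map stands in for pluck here, since pluck comes from Rails (ActiveRecord and ActiveSupport) rather than core Ruby.

```ruby
# Sample data from the article, held in memory as an array of hashes.
products = [
  { id: 1, category: "electronics" },
  { id: 2, category: "clothing" },
  { id: 3, category: "electronics" },
  { id: 4, category: "home appliances" },
  { id: 5, category: "clothing" }
]

# Step 1: collect every category value, duplicates included.
all_categories = products.map { |product| product[:category] }

# Step 2: Array#uniq keeps the first occurrence of each value, in order.
unique_categories = all_categories.uniq

p all_categories
p unique_categories
# => ["electronics", "clothing", "home appliances"]
```

Both steps run entirely in your application's memory, which is exactly the part that .distinct hands over to the database.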

Major Differences at a Glance:

Filtering Location:

  • .distinct: Filters out duplicates at the database level (when querying).
  • .uniq: Filters out duplicates after the data has been retrieved into memory.

Efficiency:

  • .distinct: More efficient as it reduces the data size before retrieval.
  • .uniq: Less efficient as it fetches all data first and then eliminates duplicates.

Memory Usage:

  • .distinct: More memory efficient because it avoids loading duplicates.
  • .uniq: Uses more memory because it processes duplicates after loading the entire dataset.

Performance Considerations

If you’re working with large datasets, .distinct tends to be more efficient because it eliminates duplicates directly in the database query. This reduces both memory usage and the amount of data transferred from the database to the application.

On the other hand, .uniq first retrieves all the data and then removes duplicates in memory, which can become slower with larger datasets.
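To get a feel for the gap, here is a rough plain-Ruby simulation with no database involved. The row count and the variable names are made up for illustration; all_values stands in for what a pluck without DISTINCT would hand back to the application.

```ruby
categories = ["electronics", "clothing", "home appliances"]

# Hypothetical table of 100,000 rows that cycle through three categories.
# This array plays the role of the result of pluck(:category) without DISTINCT.
all_values = Array.new(100_000) { |i| categories[i % categories.size] }

# De-duplicating in memory still means all 100,000 values were loaded first.
unique_values = all_values.uniq

puts "values loaded before uniq: #{all_values.size}"   # 100000
puts "values left after uniq:    #{unique_values.size}" # 3
```

With DISTINCT in the query, only those three values would need to cross the wire in the first place.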

When to Use Which Method:

  • Use .distinct when you're querying a database through ActiveRecord. It is more efficient because the database removes the duplicates before any data is transferred.
  • Use .uniq when you already have an array in memory and need to remove duplicates. Plain Ruby arrays have no .distinct method, so .uniq is the tool for data that’s not coming from a database.
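For the in-memory case, Array#uniq also accepts a block, which helps when you want one whole product per category rather than just the category names. The sketch below uses the sample data and keeps the first product seen for each category; it also checks that a plain array does not respond to distinct.

```ruby
products = [
  { id: 1, category: "electronics" },
  { id: 2, category: "clothing" },
  { id: 3, category: "electronics" },
  { id: 4, category: "home appliances" },
  { id: 5, category: "clothing" }
]

# With a block, uniq compares the block's return value instead of the
# whole element, keeping the first element for each distinct value.
first_per_category = products.uniq { |product| product[:category] }

first_per_category.each do |product|
  puts "#{product[:id]}: #{product[:category]}"
end
# 1: electronics
# 2: clothing
# 4: home appliances

# A plain Ruby array has no distinct method.
puts products.respond_to?(:distinct) # false
```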

Conclusion

When you’re handling large datasets or working with database queries, .distinct is generally the better choice because it filters duplicates at the database level, which usually means less data transferred and faster performance. Use .uniq when working with arrays in memory, especially if you're not querying a database. Both methods produce the same set of unique values, but understanding how they work under the hood can help you choose the best approach for your situation.

Written by Umar Farooque Khan

Experienced software developer with a passion for clean code and problem-solving. Full-stack expertise in web development. Lifelong learner and team player.
