Sort Dataframe by Column in R

Using dplyr's arrange function to order our data frame based on a columns values.

By Daniel D. Bonneau in R Basics

June 25, 2022


The Data Set

For this project, we’ll use a data set that comes pre-loaded in R - mtcars. If you haven’t used one of these data sets before, you can check out what data sets are present by using the data() command and scrolling through the suggested options. Additionally, we’ll be using dplyr, as we’re going to use the arrange() function within the package to order our dataframe by a column.

library(dplyr)
data("mtcars")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Order a Dataframe by Column in R

For us to order the data frame, we only need a few pieces of information: Our data frame, the column we want to sort by, and the direction we want to sort in. For reference, ascending order refers to values that increase as you move down the data set (i.e. the lowest values are at the top), and descending order sorts with the highest values at the top.

For this, we’ll start by just looking at ordering in ascending order.

Sort Dataframe in Ascending Order in R

To do this, all we need to do is pipe the arrange() function from our data set and pass in the column name. We could also omit the pipe, but I want to display it here as you should become comfortable with ‘piping’ your data if you aren’t yet.

We’ll be using the weight (wt) column for the original sort.

mtcars %>%
  arrange(wt) %>%
  head()
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1

As you can see from the above output, we now have the lowest weight at the top of our data set and the weight increases as we move through it.

If we wanted to arrange in descending order, we have two options which I’ll demonstrate below.

Sort Dataframe in Descending Order in R

First, we can use the desc() function around our column name within our arrange() function. As the name suggests, this will arrange our data in descending order. However, we can also do the same thing with a little less code by placing a minus sign (-) in front of our column name.

Here’s a demonstration of each way:

mtcars %>%
  arrange(desc(wt)) %>%
  head()
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
mtcars %>%
  arrange(-wt) %>%
  head()
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4

With either method, we see that we now have the heaviest cars at the top of our data set. For the rest of this demonstration, I’ll be using the desc() function (to see primary usage of the -, I use that exclusively within the YouTube video)

Sorting a Data Frame by Multiple Columns

If we wanted to present our data so it was sorted by multiple columns, we can just pass an additional column into the arrange() function. For example, let’s say that we were interested in ordering our data set not just by weight, but by the number of cylinders (cyl) as well.

library(tidyr) # for slice 
mtcars %>%
  arrange(cyl, desc(wt)) %>%
  slice(1:15)
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1

Here, what we did was ordered our data frame so cyl was ordered in ascending order, and wt is in descending order. In other words, we’re ordering so we have the fewest cylinders and after we make that sorting, we then order the cars within each cylinder group so they are ordered from heaviest to lightest.

So, we see in the output above it gives us all 4 cylinder cars first, ordered by heaviest to lightest weight. Then, it moves on to 6 cylinder cars and orders the weight similarly.

This feels like sorting within groups. But typically, whenever we’re working within dplyr and want to preserve group structures, we’ll use the formal group_by() command. If we use this in the code above, let’s take a look at what happens.

Sorting Grouped Data Frame in R

mtcars %>%
  group_by(cyl) %>%
  arrange(desc(wt)) %>%
  print(n = 20)
## # A tibble: 32 × 11
## # Groups:   cyl [3]
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  10.4     8  460    215  3     5.42  17.8     0     0     3     4
##  2  14.7     8  440    230  3.23  5.34  17.4     0     0     3     4
##  3  10.4     8  472    205  2.93  5.25  18.0     0     0     3     4
##  4  16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
##  5  19.2     8  400    175  3.08  3.84  17.0     0     0     3     2
##  6  13.3     8  350    245  3.73  3.84  15.4     0     0     3     4
##  7  15.2     8  276.   180  3.07  3.78  18       0     0     3     3
##  8  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
##  9  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
## 10  15       8  301    335  3.54  3.57  14.6     0     1     5     8
## 11  15.5     8  318    150  2.76  3.52  16.9     0     0     3     2
## 12  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
## 13  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
## 14  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
## 15  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
## 16  15.2     8  304    150  3.15  3.44  17.3     0     0     3     2
## 17  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
## 18  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
## 19  15.8     8  351    264  4.22  3.17  14.5     0     1     5     4
## 20  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
## # … with 12 more rows

Here, we can see that our data frame is simply ordered by weight and cylinders seem to be in no particular order. Well, let’s now try to see if we add cylinders into our arrange if that solves it.

mtcars %>%
  group_by(cyl) %>%
  arrange(cyl, desc(wt)) %>%
  print(n = 20)
## # A tibble: 32 × 11
## # Groups:   cyl [3]
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
##  2  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
##  3  21.4     4 121     109  4.11  2.78  18.6     1     1     4     2
##  4  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
##  5  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
##  6  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
##  7  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
##  8  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
##  9  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
## 10  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
## 11  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
## 12  18.1     6 225     105  2.76  3.46  20.2     1     0     3     1
## 13  19.2     6 168.    123  3.92  3.44  18.3     1     0     4     4
## 14  17.8     6 168.    123  3.92  3.44  18.9     1     0     4     4
## 15  21.4     6 258     110  3.08  3.22  19.4     1     0     3     1
## 16  21       6 160     110  3.9   2.88  17.0     0     1     4     4
## 17  19.7     6 145     175  3.62  2.77  15.5     0     1     5     6
## 18  21       6 160     110  3.9   2.62  16.5     0     1     4     4
## 19  10.4     8 460     215  3     5.42  17.8     0     0     3     4
## 20  14.7     8 440     230  3.23  5.34  17.4     0     0     3     4
## # … with 12 more rows

Here, everything seems to be as it should. We have 3 separate groups, and our data set is ordered by both cylinders and weight. However, there is a better way to do this. The arrange() function includes a default argument: .by_group = FALSE. With this, it is ignoring any grouping that occurs before we call our arrange() function. If we simply change that argument to TRUE instead, then we can omit cyl from our arrange() function.

Note: You can shorten TRUE to T and FALSE to F.

mtcars %>%
  group_by(cyl) %>%
  arrange(desc(wt), .by_group = T) %>%
  print(n = 20)
## # A tibble: 32 × 11
## # Groups:   cyl [3]
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
##  2  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
##  3  21.4     4 121     109  4.11  2.78  18.6     1     1     4     2
##  4  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
##  5  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
##  6  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
##  7  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
##  8  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
##  9  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
## 10  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
## 11  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
## 12  18.1     6 225     105  2.76  3.46  20.2     1     0     3     1
## 13  19.2     6 168.    123  3.92  3.44  18.3     1     0     4     4
## 14  17.8     6 168.    123  3.92  3.44  18.9     1     0     4     4
## 15  21.4     6 258     110  3.08  3.22  19.4     1     0     3     1
## 16  21       6 160     110  3.9   2.88  17.0     0     1     4     4
## 17  19.7     6 145     175  3.62  2.77  15.5     0     1     5     6
## 18  21       6 160     110  3.9   2.62  16.5     0     1     4     4
## 19  10.4     8 460     215  3     5.42  17.8     0     0     3     4
## 20  14.7     8 440     230  3.23  5.34  17.4     0     0     3     4
## # … with 12 more rows

But you might be wondering - they both do the same thing and it’s easier to remember to just pass the grouping column into the arrange() function as well. Well, you’d be correct. However, it does limit the flexibility of your code. For example, if I wanted to transform this piece of code into a function where the user can pass in the column they want to group by and order by, this code would be much more prepared to handle the change into a function.

But don’t worry, we’ll cover that complexity at a later date!