Filtering and Sorting with OpenRefine
Overview
Teaching: 15 min
Exercises: 10 min
Questions
How can we select only a subset of our data to work with?
How can we sort our data?
Objectives
Filter to a subset of rows by text filter or include/exclude.
Sort table by a column.
Sort by multiple columns.
Lesson
Filtering
In addition to faceting, you can subset your data and work on just that subset using filters.
- Click the down arrow next to "trench" > "Text filter". A "trench" text filter box will appear in the left margin.
- Type in "1" and press return. Only rows whose trench value contains 1 remain visible.
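Conceptually, a text filter keeps only the rows whose cell in the chosen column contains the typed text. The Python sketch below models that idea on a few made-up rows. The rows and the `text_filter` helper are hypothetical illustrations, not part of OpenRefine, and the case-insensitive matching assumes the filter's "case sensitive" option is left unchecked.

```python
# Hypothetical rows standing in for an OpenRefine project.
rows = [
    {"trench": "1", "find": "pottery"},
    {"trench": "2", "find": "bone"},
    {"trench": "10", "find": "flint"},
    {"trench": "", "find": "shell"},
]

def text_filter(rows, column, text, case_sensitive=False):
    """Keep rows whose cell in `column` contains `text` anywhere (substring match)."""
    needle = text if case_sensitive else text.lower()
    kept = []
    for row in rows:
        cell = row.get(column) or ""  # treat missing cells as blank
        if not case_sensitive:
            cell = cell.lower()
        if needle in cell:
            kept.append(row)
    return kept

matches = text_filter(rows, "trench", "1")
print([r["trench"] for r in matches])  # ['1', '10']
```

In this sketch "10" also matches, because the filter looks for the text anywhere in the cell rather than requiring the whole cell to equal it.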
Include/exclude and invert
Any filter can be inverted by clicking the "invert" button at the top of the filter box. Similarly, facets can be set to either "include" or "exclude" specific entries. These tools can be combined to drill further down into your data.
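One way to picture how these tools combine is to treat each filter or facet as a yes/no test on a row, where a row stays visible only if it passes every active test. The Python sketch below is a conceptual model under that assumption; the helper names `contains`, `invert`, `facet` and `select` are hypothetical and do not correspond to any OpenRefine API.

```python
# Hypothetical rows standing in for an OpenRefine project.
rows = [
    {"trench": "1", "find": "pottery"},
    {"trench": "2", "find": "bone"},
    {"trench": "10", "find": "flint"},
    {"trench": "3", "find": "pottery"},
]

def contains(column, text):
    """Text-filter test: the cell contains `text` (case-insensitive)."""
    return lambda row: text.lower() in row[column].lower()

def invert(test):
    """Inverted filter: keep exactly the rows the original test would drop."""
    return lambda row: not test(row)

def facet(column, include=(), exclude=()):
    """Facet test: keep included values (all values if none chosen), drop excluded ones."""
    def keep(row):
        value = row[column]
        if include and value not in include:
            return False
        return value not in exclude
    return keep

def select(rows, *tests):
    """Combined filters and facets keep only the rows that pass every test."""
    return [row for row in rows if all(test(row) for test in tests)]

not_trench_1 = select(rows, invert(contains("trench", "1")))
print([r["trench"] for r in not_trench_1])  # ['2', '3']

pottery_outside_1 = select(rows, invert(contains("trench", "1")),
                           facet("find", include={"pottery"}))
print([r["trench"] for r in pottery_outside_1])  # ['3']
```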
Faceting and filtering look very similar. A useful distinction is that faceting gives you an overview of all the data that is currently selected, while filtering lets you select a subset of your data for analysis.
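The distinction can be sketched in Python: a facet summarises the values in whatever rows are currently selected, while a filter changes which rows are selected, so the facet's counts shrink once a filter is applied. The rows and the `facet_counts` helper below are hypothetical, and counting values with `collections.Counter` is only an analogy for what a text facet displays.

```python
from collections import Counter

# Hypothetical rows standing in for an OpenRefine project.
rows = [
    {"trench": "1", "find": "pottery"},
    {"trench": "1", "find": "bone"},
    {"trench": "2", "find": "pottery"},
    {"trench": "10", "find": "pottery"},
]

def facet_counts(rows, column):
    """Facet-style overview: how many selected rows hold each value."""
    return dict(Counter(row[column] for row in rows))

# Filter: keep rows whose trench value contains "1".
selected = [row for row in rows if "1" in row["trench"]]

print(facet_counts(rows, "find"))      # {'pottery': 3, 'bone': 1}
print(facet_counts(selected, "find"))  # {'pottery': 2, 'bone': 1}
```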
Remove the filter before moving on so that you again have the full dataset.
Sort
You can sort the data by a column by using the drop-down menu in that column.
There you can sort by text, numbers, dates or booleans (TRUE or FALSE values). You can also specify where blanks and errors are placed in the sorted results.
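Choosing the right sort type matters: as text, "12" comes before "3" because strings are compared character by character, while as numbers 3 comes before 12. The Python sketch below shows that difference and models placing blank cells at one end of the result. The `sort_numbers` helper and its `blanks_last` parameter are hypothetical illustrations, and error cells are left out for simplicity.

```python
values = ["12", "", "3", "7", None]  # hypothetical cells; "" and None are blanks

def is_blank(value):
    return value in (None, "")

def sort_numbers(values, blanks_last=True, descending=False):
    """Sort non-blank cells by numeric value, putting blanks at one end."""
    blanks = [v for v in values if is_blank(v)]
    numbers = sorted((v for v in values if not is_blank(v)),
                     key=float, reverse=descending)
    return numbers + blanks if blanks_last else blanks + numbers

as_text = sorted(v for v in values if not is_blank(v))
print(as_text)                                   # ['12', '3', '7']
print(sort_numbers(values))                      # ['3', '7', '12', '', None]
print(sort_numbers(values, blanks_last=False))   # ['', None, '3', '7', '12']
```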
You can sort by multiple columns by applying a sort to each additional column in turn. The result depends on the order in which you sort the columns. To restart sorting from a single column, check the "sort by this column alone" box in the Sort pop-up window.
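A multi-column sort can be pictured as sorting on a tuple of keys. This Python sketch assumes the column you sort first acts as the primary key and each later sort only breaks ties within it; check the Sort dropdown in your own project to confirm the order. The rows and the `multi_sort` helper are hypothetical.

```python
# Hypothetical rows standing in for an OpenRefine project.
rows = [
    {"trench": "2", "find": "pottery"},
    {"trench": "1", "find": "pottery"},
    {"trench": "1", "find": "bone"},
    {"trench": "2", "find": "bone"},
]

def multi_sort(rows, columns):
    """Sort by columns in the order chosen: the first is the primary key."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in columns))

def pairs(rows):
    return [(r["trench"], r["find"]) for r in rows]

print(pairs(multi_sort(rows, ["trench", "find"])))
# [('1', 'bone'), ('1', 'pottery'), ('2', 'bone'), ('2', 'pottery')]

print(pairs(multi_sort(rows, ["find", "trench"])))
# [('1', 'bone'), ('2', 'bone'), ('1', 'pottery'), ('2', 'pottery')]

# "Sort by this column alone" corresponds to a single-key sort.
print(pairs(multi_sort(rows, ["find"])))
```

Because Python's `sorted` is stable, the single-key sort keeps rows with equal "find" values in their original relative order.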
Sorts can be adjusted or removed using the Sort dropdown that appears along the top once a sort has been applied.
Key Points
OpenRefine provides a way to sort and filter data without affecting the raw data.