Hi! Welcome to my data journalism R cheat sheet for cleaning and wrangling data. You may have seen other R cheat sheets organized by package, and journalists have put out cheat sheets before, like MaryJo Webster’s R cheat sheet. Hat tip to her and her amazing collection of data journalism resources! This is what I use for quick reference to data cleaning functions.
I use this cheat sheet as a roadmap to functions I will likely encounter and use, so I am hoping it can save you some time googling around. Most of the functions listed here are ones I collected while cleaning data at the NICAR data library and for the Accountability Project at the Investigative Reporting Workshop.
Even though I use data cleaning functions on a regular basis, the syntax or function names sometimes get fuzzy. So I have organized the functions I reach for most often by goal in data processing, along with pitfalls to watch for, and I have included useful links to discussions of these functions by other people. Thanks, internet!
I always think the most important thing about functions, and the key to tapping into their strengths, is to understand what you’re dealing with, i.e. data structures. When I can’t write clean code and get it to work, I take out my pen and pad and jot down what I want to achieve and the end products I wish to have. This helps me get a grasp of the objects at hand, break them down into the smallest units possible, and see how those units can form the end products I want. For example, are you trying to modify a column (rewrite it) or add a new column based on an existing one? Columns are vectors made up of individual elements, zipped together into a data frame (that’s probably why length(df) returns the number of columns). If you’re modifying a column, you can assign a new vector to df$col. If you’re adding a column, you’ll probably use mutate() from the dplyr package on the dataframe itself.
There is, of course, more than one way to do it (let’s not skin any cats!). For example, picture a jumbled field of last name, first name. If you wish to separate the names into different columns, you can either use the separate() function to split the column on a certain character, or use a regular expression to capture whatever comes before the comma with str_match(x, "^(.+),")[,2] and whatever comes after with str_match(x, ",(.+)$")[,2]. Pick and choose as you wish!
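To make that concrete, here’s a quick sketch with made-up names (the column and data frame are hypothetical):

```r
library(tidyr)
library(stringr)

# A hypothetical jumbled "last, first" field
df <- data.frame(name = c("Doe, Jane", "Smith, John"))

# Option 1: separate() splits the column on the comma (and any space after it)
df_sep <- separate(df, name, into = c("last", "first"), sep = ",\\s*")

# Option 2: regex capture groups; column 2 of str_match()'s
# result matrix holds the first capture group
last  <- str_match(df$name, "^(.+),")[, 2]
first <- str_match(df$name, ",\\s*(.+)$")[, 2]
```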
This cheat sheet doesn’t cover the nuts and bolts of R, for which I highly recommend Andrew Ba Tran’s amazing tutorial Journalism with R.
Since other people have spent all this time writing functions and packaging them, we use their functions by loading the packages (whose names may take developers a lot of time to come up with) first in the R console or scripts. It’s important to declare the packages you use at the top of your R Markdown or R script file, especially if you need to knit the file (to generate a standalone, readable HTML from the Rmd); otherwise it will throw a bunch of errors. This also assumes that you have the packages installed already. To install a new package, run install.packages("packagename") first or use the Packages pane to manage your packages (by default, the lower right-hand side pane). If a package is not on CRAN yet and can’t be installed with install.packages("packagename"), install it from GitHub with the remotes package, e.g. remotes::install_github("r-lib/remotes"). The double colons let you run a function from a certain package without loading it: {package_name}::{function_name}. (remotes itself is on CRAN, so you can install it with install.packages("remotes").) The tidyverse package solves most of the problems related to data cleaning. It includes all these packages:
library(tidyverse)
tidyverse_packages()
#> [1] "broom" "cli" "crayon" "dplyr" "dbplyr"
#> [6] "forcats" "ggplot2" "haven" "hms" "httr"
#> [11] "jsonlite" "lubridate" "magrittr" "modelr" "purrr"
#> [16] "readr" "readxl" "reprex" "rlang" "rstudioapi"
#> [21] "rvest" "stringr" "tibble" "tidyr" "xml2"
#> [26] "tidyverse"
Functions are just objects. They exist in those packages! So if you want to know more about a function, call it by its name, without parentheses, in the console. For the documentation, type “?” followed by the function name, and the documentation will be laid out in the Help window. Try ?str_split. You can also see popups in the console when you type a function name; pressing F1 (Fn + F1) will likewise open the function’s help in the Help pane.
To see how a function was written and make sense of how it’s executed, simply type the name of the function, like str_split.
Further Readings: Introduction to the R Language Functions, Berkeley Workshop
Here’s some function help. The basic syntax of a function is
f <- function(<arguments>) {
## Do something interesting
}
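For instance, here’s a tiny cleaning function following that template (a hypothetical helper, not from any package):

```r
# Strip a "%" sign from a character vector and convert the result to a number
pct_to_num <- function(x) {
  as.numeric(sub("%", "", x, fixed = TRUE))
}

pct_to_num(c("45%", "7%"))  # 45 7
```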
Trying to write for loops? R for data science has a great chapter on it.
output <- vector("double", ncol(df)) # 1. output
for (i in seq_along(df)) { # 2. sequence
output[[i]] <- median(df[[i]]) # 3. body
}
The computer reads scripts in a certain order, and it can get confused when you have many arguments. To help organize things, and to make your code less confusing for machines and humans to read, you can bracket the code you wish to execute first with () or {}.
A common usage: you have a numeric variable x = 1, you wish to do a calculation on it, like x + 1, and then use : to generate all the numbers between x + 1 and 15. But x + 1:15 won’t give you the result you want; it returns everything between 2 and 16. That’s because : binds tighter than +, so R builds the vector 1:15 first and then adds x to every element of that vector. You can solve this by writing {x + 1}:15 (or (x + 1):15), so that R knows to treat everything wrapped in the brackets as a whole.
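You can check the precedence rule right in the console:

```r
x <- 1
x + 1:15    # `:` runs first: 1 + c(1, 2, ..., 15), i.e. 2 through 16
(x + 1):15  # brackets run first: the sequence 2 through 15
{x + 1}:15  # curly braces behave the same way here
```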
There are certain operators in R that streamline code execution and save you some typing. For example, . can represent the object being piped into a function. If you wish to concatenate a vector with other strings, you can use the syntax df$col %>% str_c("before", ., "after") to control where the vector lands in the string. Learn more about magrittr’s . placeholder here.
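A minimal sketch of the dot placeholder (the vector contents are made up):

```r
library(magrittr)  # provides %>% and the `.` placeholder
library(stringr)

col <- c("alpha", "beta")
# `.` marks where the piped-in vector goes among the other arguments
out <- col %>% str_c("before-", ., "-after")
out  # "before-alpha-after" "before-beta-after"
```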
Here are some very useful references for understanding data types and data structures.
Andrew’s data import/export tutorial shows you how to import and later export most types of data files. RStudio’s tidyverse cheat sheet also offers a comprehensive view of reading data, and how you can tap into the functionalities of the tibble (a type of data frame) for your tables.
You’re now ready to use functions to solve problems like these:
function | package | syntax | notes | references |
---|---|---|---|---|
Convert row index to a column | tibble | wy <- tibble::rowid_to_column(wy, "id") | | |
print truncated elements in a vertical layout | dplyr | wy %>% filter() %>% glimpse() | | |
function | package | syntax | notes | references |
---|---|---|---|---|
filter out NA rows | dplyr, stats | filter(!is.na(colname)) OR complete.cases(df$col) | filter(!is.na(colname)) has the same effect as filter(is.na(colname) == FALSE) and returns a dataframe. complete.cases() applies directly to one or more vectors and returns a logical vector after testing whether ALL specified columns are complete; applied to a single vector, the result equals !is.na(). To get a dataframe with rows removed wherever the column “col” evaluates to NA, see below. | https://statisticsglobe.com/na-omit-r-example/ |
filter out NA fields in vectors | stats | unique(na.omit(ct["column_name"])) | na.omit() records the removed cases in an "na.action" attribute | https://github.com/irworkshop/accountability_datacleaning/blob/campfin/R_campfin/ct/expends/docs/ct_expends_diary.md |
drop rows for columns containing NA values | tidyr | drop_na(col_name) | called without column arguments, drop_na() drops any row containing an NA in any column! The link shows how to drop NAs for a specific column too. df[complete.cases(df$col),] achieves almost the same as df %>% drop_na(col) | https://stackoverflow.com/questions/26665319/removing-na-in-dplyr-pipe https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame |
turn values into NA | dplyr | na_if(df$col, y) OR wv <- wv %>% mutate(city_clean = case_when(city_clean %in% c("WV", "WEB BASED", "A", "PO BOX", "ANYWHERE USA", "VARIES", "COUNTY") ~ NA_character_, TRUE ~ as.character(city_clean))) | The value to be replaced is not a regex. To replace multiple values with NA, try case_when() | |
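Here’s how those NA helpers behave on a tiny made-up table:

```r
library(dplyr)
library(tidyr)

df <- tibble(city = c("DENVER", "PO BOX", NA), amount = c(10, 20, 30))

kept    <- filter(df, !is.na(city))                  # drops only the NA row
dropped <- drop_na(df, city)                         # same, scoped to one column
blanked <- mutate(df, city = na_if(city, "PO BOX"))  # "PO BOX" becomes NA
```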
function | package | syntax | notes | references |
---|---|---|---|---|
change case | stringr, base | str_to_upper()/toupper() str_to_lower()/tolower() | ||
ignore case | stringr | fixed(‘toyota’,ignore_case=TRUE) | ||
replace matching strings in a dataframe | base, stringr | RETA2016_negative <- mutate_if(as_tibble(reta2016_clean), is.character, str_replace_all, pattern = "J", replacement = "-1") la_lobby <- la_lobby %>% mutate(lobbyist_name_clean = str_remove(LobbyistName1, "MRS.\\s|MR.\\s|MS.\\s|MISS\\s|DR.\\s")) | gsub() is the base R version. When replacing or removing multiple patterns, use the regex "|" alternation; otherwise the function matches the first element of the pattern vector against the first element of the string vector, and so on. Because of that vectorized matching, you can in effect “subtract” the content of one column from another: df <- df %>% mutate(statezip = str_remove(df$citystatezip, df$city)). str_replace_all() will replace all the matches while str_replace() will only replace the first match. | https://community.rstudio.com/t/understanding-the-use-of-str-replace-all-with-multiple-patterns/1849 https://stackoverflow.com/questions/29036960/remove-multiple-patterns-from-text-vector-r |
concatenate (concatenate vectors after converting to character) | base | mutate(ZIP = paste0("0", as.character(ZIP))) %>% mutate(location = paste0(ADDRESS, ",", CITY, ", CT", ZIP)) | paste0() differs from paste() in that paste() takes a default separator of " " (a space) while paste0() uses "" (no space). A way to remember this is that paste0() concatenates with zero space. | http://learn.r-journalism.com/en/mapping/geolocating/geolocating/ |
concatenate | stringr | str_c("Letter", letters, sep = ":") | str_c() concatenates strings element-wise. If you pass in a vector and want to collapse the result into a single string, use collapse = "". One way to remember it is that collapse = "" executes after a vector is returned and collapses that vector into one string. | |
extract the complete match of a string with regex | stringr | str_extract(text, "\\d{5}(?:-\\d{4})?") | remember to double the backslashes when writing the regex as an R string | |
extract part of a string | stringr | str_match(strings, phone) | str_match() returns a character matrix. First column is the complete match, followed by one column for each capture group. str_match_all returns a list of character matrices. Use the [,n] index for the nth capture group | |
View HTML rendering of regular expression match | stringr | str_view(c(“abc”, “def”, “fgh”), “d|e”, match = FALSE) | ||
add a zero to make it (#) digits | base | fipsst <- mutate(fipsst, STATE = sprintf("%03d", STBRDG)) | pads the code to three digits | |
make strings containing executable R code | glue | glue("{col1}, {col2}") | expressions inside the curly braces are evaluated and interpolated into the string | |
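A few of these string helpers in action (the honorifics and ZIP below are invented examples):

```r
library(stringr)

people <- c("MR. JOHN DOE", "MS. JANE ROE")
# remove honorifics with a "|" alternation, as in the table above
str_remove(people, "MR\\.\\s|MS\\.\\s")   # "JOHN DOE" "JANE ROE"

# extract a ZIP: five digits plus an optional +4
str_extract("Hartford CT 06106-1591", "\\d{5}(?:-\\d{4})?")  # "06106-1591"

# pad a numeric code with leading zeros to a fixed width
sprintf("%05d", 378)  # "00378"
```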
Use vector[length(vector)] to access the last element of a vector. R for Data Science has a great chapter on vector and list indexing, with a superb pepper-shaker analogy.
function | package | syntax | notes | references |
---|---|---|---|---|
Extract every nth element of a vector | base | a <- 1:120 b <- a[seq(1, length(a), 6)] | ||
get the position of a column named “B” in a data frame/vector | base | grep("B", colnames(df)) - containing “B”; grep("^B$", colnames(df)) - called exactly B; OR which(colnames(df) == "B") | grep() returns a vector of indices of the character strings that contain the pattern | https://thomasleeper.com/Rcourse/Tutorials/vectorindexing.html |
Get the index of a string in a vector | base | match("CONTRIBUTOR", pa_col_names) | match() returns a vector of the positions of (first) matches of its first argument in its second. | https://stackoverflow.com/questions/27556353/subset-columns-based-on-list-of-column-names-and-bring-the-column-before-it |
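These indexing helpers, on a throwaway data frame:

```r
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)

grep("^B$", colnames(df))    # 2: position of the column called exactly B
which(colnames(df) == "B")   # 2: same result
match("B", colnames(df))     # 2: position of the first match

v <- letters
v[length(v)]                 # "z": the last element
```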
function | package | syntax | notes | references |
---|---|---|---|---|
Take a sequence of vector, matrix or data-frame arguments and combine by columns or rows | base | stations <- cbind(stations, geo) OR bind_cols() | ||
string together two dataframes/vectors vertically by binding rows | dplyr, base | bind_rows() OR rbind(a = 1, b = 1:3) | the two dataframes need the same columns (bind_rows() matches by name); with rbind() on vectors, the longer vector’s length should be a multiple of the shorter’s | |
create a dataframe from vectors | base, dplyr | tibble(), data.frame(), tibble(col1 = c(1,2,3), name = c("A","B","C")) | tibble() never converts strings to factors (with older data.frame() defaults, set stringsAsFactors = FALSE). On the left are the column names, and on the right are the vectors assigned to those columns. Data frames are essentially columns (vectors) bound together. | |
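A quick sketch of building and binding, with toy data:

```r
library(dplyr)

a <- tibble(id = 1:2, name = c("A", "B"))
b <- tibble(id = 3,  name = "C")

stacked <- bind_rows(a, b)                           # 3 rows, matched by name
wider   <- bind_cols(a, tibble(amount = c(10, 20)))  # columns side by side
```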
function | package | syntax | notes | references |
---|---|---|---|---|
select a column whose name is… | dplyr | df$colname, or df[["colname"]] OR pull(df, col) | pull() can help turn a one-column data frame into a vector. To select multiple columns, use df[c("col1", "col2")] or select(df, col1, col2) | |
select a column whose index is n | base, dplyr | df[[n]] OR pull(df, n) | without double brackets, single-bracket indexing slices the tibble into another (one-column) tibble | |
select columns whose names match a pattern | dplyr | my_data %>% select(starts_with("Petal")) | starts_with() is a select helper from the dplyr package that works with select() or with vars() when used with mutate_at()/summarize_at(). See the batch processing section for usage of mutate_at(). | |
change one column name | base | MO_offense <- rename(MO_offense, Off.Age = Offender.Age.at.Time.of.Offense) | ||
define column names; change all column names to lowercase | base | colnames(bridges17) <- names_field_17 (I have the vector needed) OR colnames(bridges17) <- tolower(colnames(bridges17)) | | |
Clean column names that contain spaces and such into lowercase names separated with underscores | janitor | read_csv(df) %>% clean_names() | make_clean_names() operates on character vectors and can be used during data import, e.g. wi_principals <- list.files(principals, pattern = ".xls", full.names = TRUE) %>% map(read_xls, skip = 3, .name_repair = make_clean_names) | https://github.com/sfirke/janitor |
Specify column types when reading | readr | col_types = cols(x = col_double(), y = col_date(format = ""), z = col_character()) | | https://blog.rstudio.com/2015/04/09/readr-0-1-0/ |
change to date | readr, lubridate | parse_usa_date <- function(x, …) { parse_date(x, format = "%m/%d/%Y", …) } OR as_date(x, …) | parse_date() is from readr; as_date() is from lubridate | https://github.com/irworkshop/accountability_datacleaning/blob/campfin/R_campfin/ct/expends/docs/ct_expends_diary.md |
convert column types | base | as.numeric(df$col) as.character(df$col) | | |
convert excel numbers to date | janitor | excel_numeric_to_date(df$col, date_system = “modern”) |
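Putting col_types to work (the CSV content is made up; note how col_character() preserves the leading zero in the id):

```r
library(readr)

# write a tiny throwaway CSV so the example is self-contained
tf <- tempfile(fileext = ".csv")
writeLines("id,amount,date\n001,45.5,01/31/2019", tf)

df <- read_csv(tf, col_types = cols(
  id     = col_character(),               # keep the leading zero
  amount = col_double(),
  date   = col_date(format = "%m/%d/%Y")  # parse US-style dates on import
))
```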
function | package | syntax | notes | references |
---|---|---|---|---|
sort a vector or factor into a descending or ascending order | base | sort(x, decreasing = FALSE, …) | ||
Sort a data frame column in descending order | dplyr | arrange(desc(total_spent)) | arrange() takes a dataframe as its first argument | |
reposition columns by index or column names | base | data <- data[c(“A”, “B”, “C”)] | ||
Get the top n rows of a dataframe | dplyr | df %>% top_n(10) | top_n() picks the top n rows by a ranking variable (the last column by default); for the first n rows as stored, use head(df, n) | |
Get the first and last rows (6 by default) for quick inspection | utils | head()/tail() | | |
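Sorting, on a made-up spending table:

```r
library(dplyr)

df <- tibble(name = c("A", "B", "C"), total_spent = c(5, 50, 20))

ranked <- arrange(df, desc(total_spent))  # B (50), C (20), A (5)
sort(df$total_spent)                      # 5 20 50
head(df, 2)                               # first two rows as stored
```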
function | package | syntax | notes | references |
---|---|---|---|---|
turn wide tables into long tables | tidyr | gather() | Andrew’s tutorial says it all. | https://learn.r-journalism.com/en/wrangling/tidyr_joins/tidyr-joins/ |
turn long tables into wide tables | tidyr | spread() | | https://learn.r-journalism.com/en/wrangling/tidyr_joins/tidyr-joins/ |
separate one column into two | tidyr | separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", …) | | https://rstudio.com/wp-content/uploads/2019/01/Cheatsheets_2019.pdf#page=9 |
unite two columns into one | tidyr | unite() | | https://rstudio.com/wp-content/uploads/2019/01/Cheatsheets_2019.pdf#page=9 |
separate elements in a column into rows, one per element | tidyr | separate_rows() | | |
Make each element of a list-column into its own row | tidyr, stringr | ia_lobby_cl <- ia_lobby_cl %>% mutate(new_lobbyists = str_split(lobbyists, pattern = ",")) %>% unnest_longer(new_lobbyists) | | |
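Reshaping a toy table both ways (tidyr’s newer pivot_longer()/pivot_wider() are drop-in alternatives for gather()/spread()):

```r
library(tidyr)
library(tibble)

wide <- tibble(state = c("CT", "WY"), y2018 = c(1, 2), y2019 = c(3, 4))

long <- gather(wide, key = "year", value = "value", y2018:y2019)  # 4 rows
back <- spread(long, key = year, value = value)                   # wide again
```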
function | package | syntax | notes | references |
---|---|---|---|---|
Get all the unique/distinct values of a column | dplyr, base | unique(df$column) OR distinct(df, column) | | |
count the number of distinct values in a column | dplyr | n_distinct(df$col, na.rm = TRUE) | many functions have an na.rm argument. It’s set to TRUE here as a demo, but in many cases you’d want FALSE, depending on whether you wish to include the NAs. | |
A glimpse of your data | tibble, base | glimpse(df) OR str(df) | glimpse() usually offers a better and more complete printout. | |
find out unique values and frequencies of a vector. | janitor, base | table()/tabyl() | basically gives you a frequency table. See count() for application in a data frame | |
find out unique values and frequencies of a column in a dataframe. | dplyr | count(df, column, sort = T) | When sort = T, will return the list in descending order | |
min, max, mean, quantiles | base | summary() | ||
add a column counting the observations of another column | dplyr | mtcars %>% add_count(cyl) | add_count() is a short-hand for group_by() + add_tally() Also you won’t need mutate() | |
Get the means and sums of rows and columns | base | filter(rowSums(!is.na(wi_lobby)) >= 3) | rowSums()/colSums() and rowMeans()/colMeans() do the math; this particular syntax keeps rows with at least three non-NA values. | |
pivot table | dplyr | pivot_table <- df %>% group_by(column) %>% summarize(mean = mean(another_column), count = n()) | group_by(df, column) %>% summarize(count = n(), mean = mean()) is frequently combined with %>% arrange(desc()) on the new summary column you created. n() achieves a similar effect to df %>% count(column). | |
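The pivot-table pattern, on invented contribution data:

```r
library(dplyr)

df <- tibble(party = c("D", "R", "D"), amount = c(10, 20, 30))

pivot <- df %>%
  group_by(party) %>%
  summarize(count = n(), mean_amt = mean(amount)) %>%
  arrange(desc(count))   # most frequent group first
```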
function | package | syntax | notes | references |
---|---|---|---|---|
geom_point() for geolocating with some groupings | ggplot2 | geom_point(data = stations, aes(x = lon, y = lat, size = staff, color = DESCRIPTION), fill = "white", shape = 1) | shape = 1: circles; shape = 2: triangles; shape = 3: plus; shape = 4: X; shape = 5: diamonds | http://learn.r-journalism.com/en/mapping/geolocating/geolocating/ |
display integers only on axes | ggplot2 | scale_y_continuous(breaks = c(1, 3, 7, 10)) | | https://stackoverflow.com/questions/15622001/how-to-display-only-integer-values-on-an-axis-using-ggplot2 |
x axis labels too long! Auto-wrap the labels | stringr + ggplot2/scales | scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) OR scale_x_discrete(labels = wrap_format(10)) | | https://stackoverflow.com/questions/21878974/auto-wrapping-of-labels-via-labeller-label-wrap-in-ggplot2 |
function | package | syntax | notes | references |
---|---|---|---|---|
List all the files under a directory | fs, base | zip_files <- dir_ls(raw_dir, glob = "*.zip", regexp = "expends.+") OR contrib_files <- list.files(raw_dir, pattern = ".txt", recursive = TRUE, full.names = TRUE) | recursive is very important! It determines whether the search goes deeper into sub-directories. glob is also an important notion: a wildcard aka globbing pattern (e.g. *.csv) passed on to grep() to filter paths. dir_ls() returns named “fs_path” character objects while list.files() returns plain character vectors; their defaults also differ. | https://github.com/irworkshop/accountability_datacleaning/blob/campfin/R_campfin/pa/expends/pa_expends_diary.md |
Make a new directory | here, fs | raw_dir <- here("pa", "contribs", "data", "raw") then dir_create(raw_dir) | dir_create() is powerful when combined with the here package; here() builds the path but doesn’t actually create the directory | https://github.com/r-lib/here |
Construct the path to a file from components in a platform-independent way | base | file.path("testdir2", "testdir3") | That way you don’t have to care about "\\" vs "/" | |
Get the file info | base | file.info("your_file")$mtime | This syntax gives you the last modified time of a file | |
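file.path() in action (the directory names are arbitrary):

```r
# builds the path with the platform's separator, so you never
# hard-code "/" or "\\" yourself
p <- file.path("testdir2", "testdir3")
```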
function | package | syntax | notes | references |
---|---|---|---|---|
Test if an element is in a vector | base | x %in% valid_city | passing in two vectors gives you a logical vector. It is useful for specifying filter conditions. Similar to the IN statement in SQL. | |
Test if an element is not in a vector | campfin | x %out% valid_city | the negation of %in% | |
find out how many elements of a vector are in or out of another vector | campfin | count_in(x, y, na.rm = T) count_out(x, y, na.rm = F) prop_in(x, y, na.rm = T) prop_out(x, y, na.rm = T) | campfin is a package written by my colleague Kiernan Nicholls at IRW, yay! count_in() returns the number, and prop_in() returns the percentage. The package incorporates many data inspection functionalities wrapped in easy functions like these. It also does a lot of heavy lifting for normalizing address, city, state, zip and so on. | |
find out elements in x that are not in y | base, dplyr | setdiff(x, y) OR anti_join(x, y, by = c("col1", "col2")) | setdiff() removes duplicates! Order is important! | |
join two dataframes based on matching column or columns | dplyr | ia_lobby_cl <- ia_lobby_cl %>%left_join(zipcodes, by = c(“zip_norm” = “zip”, “city_norm” = “city”)) | left_join(), right_join(), inner_join(), full_join(). Order is important. if the columns in two dfs have the same name, the by statement can just be by = “state”. | http://www.datasciencemadesimple.com/join-in-r-merge-in-r/ https://learn.r-journalism.com/en/wrangling/tidyr_joins/tidyr-joins/ |
vlookup - look for corresponding values in a separate vector or data frame | qdapTools | lookup(df1$term, df2[c('term','key')]) OR df1$term %l% df2[c('term','key')] | The lookup() / %l% (percent-letter-L-percent) functions have a lot of restrictions, including that the dataframe to be matched against should (by default) have two columns. Watch out for multiple keys in df2 matching a single term in df1. The reassign argument lets you map the results onto whatever vector you wish. The example here is more about reassigning and mapping than simple value lookups; see the documentation for more info. | https://www.rdocumentation.org/packages/qdapTools/versions/1.3.3/topics/lookup |
Join two dataframes based on fuzzy matching | fuzzyjoin | stringdist_inner_join() | | https://cran.r-project.org/web/packages/fuzzyjoin/README.html https://www.r-bloggers.com/fuzzy-string-matching-a-survival-skill-to-tackle-unstructured-information/ https://github.com/dgrtwo/fuzzyjoin |
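Membership tests and an anti_join() on a toy misspelling:

```r
library(dplyr)

contribs <- tibble(city = c("DENVER", "DNEVER"), amount = c(5, 10))
valid    <- tibble(city = "DENVER")

contribs$city %in% valid$city                   # TRUE FALSE
bad <- anti_join(contribs, valid, by = "city")  # only the misspelled row
```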
function | package | syntax | notes | references |
---|---|---|---|---|
For a set of logical vectors, evaluate whether at least one condition is TRUE | base, purrr | locality_position <- lapply(list, unlist, recursive = T) %>% map(str_detect, "locality") %>% map_lgl(any) | Similar to the “OR” operator. The syntax returns locality_position, i.e. which elements of the list contain the string “locality”. | |
Change columns based on certain conditions (data type) | dplyr | mutate_if(is_character, str_to_upper) | The mutate_if() variants apply a predicate function (a function that returns TRUE or FALSE) to determine the relevant subset of columns. | https://stackoverflow.com/questions/42052078/correct-syntax-for-mutate-if |
Test if characters are part of a string | base | grepl(value,chars, fixed=TRUE) | https://stackoverflow.com/questions/10128617/test-if-characters-are-in-a-string | |
count how many cases satisfy a condition | dplyr, base | sum(pa$STATE != "PA", na.rm = TRUE) OR pa %>% filter(STATE != "PA") %>% count() | basically tallying a logical vector: the first method sums the values that evaluate to TRUE, while the filter() method counts the remaining rows of a dataframe. | |
find elements in vector X that are not in Y | base | x[! (x %in% y)] | Be especially cautious with NAs! The rows evaluated to NAs would be retained! | https://statisticsglobe.com/setdiff-r-function/ https://www.youtube.com/watch?v=8hSYEXIoFO8 |
Change variable strings based on conditions | dplyr | mutate(variable = case_when(condition ~ value, TRUE ~ fallback)) | case_when() maps conditions to set replacement values; when the replacement depends on other variables, an if_else() or ifelse() function is probably the best bet. | http://learn.r-journalism.com/en/mapping/static_maps/static-maps/ |
Modify character strings based on variable conditions | dplyr, base | wy <- wy %>% mutate(city_clean = if_else(condition = match_distance <=2, true = city_swap, false = city_raw)) OR ifelse() |
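case_when() on a made-up city column, echoing the recipes above:

```r
library(dplyr)

df <- tibble(city = c("WV", "CHARLESTON", "WEB BASED"))

cleaned <- mutate(df, city_clean = case_when(
  city %in% c("WV", "WEB BASED") ~ NA_character_,  # junk values become NA
  TRUE ~ city                                      # everything else passes through
))
```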
function | package | syntax | notes | references |
---|---|---|---|---|
Read multiple csvs into one master dataframe | vroom | vroom::vroom(files) | | |
Mutate multiple columns at once | dplyr | aklr <- aklr %>% mutate_at(.vars = vars(ends_with("address")), .funs = list(norm = normal_address), add_abbs = usps_street, na_rep = TRUE) | mutate_at() lets you deal with multiple columns at the same time, very useful when combined with vars(), which has similar semantics to select(). | |
apply a function to multiple elements | purrr | dir_ls(path = raw_dir, glob = "*.csv") %>% map(read_delim, args) | map() returns a list, but you can get typed vectors with map_dbl() or map_chr(). map_depth() is very helpful, especially if you are working with nested lists and wish to flatten them | https://r4ds.had.co.nz/iteration.html#mapping-over-multiple-arguments https://rstudio.com/wp-content/uploads/2019/01/Cheatsheets_2019.pdf#page=14 |
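The map() family in miniature:

```r
library(purrr)

# apply median to every element of a list and collect a double vector
map_dbl(list(1:3, 4:6), median)  # 2 5
```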
Here’s a comprehensive R Studio IDE cheat sheet.
what_it_does | on_the_screen | you_will_type |
---|---|---|
switch between panes | | ctrl + 1 - source; ctrl + 2 - console; ctrl + 3 - help/viewer/plots/packages/files; ctrl + 4 - history/environment |
access the shortcuts cheatsheet in RStudio | | shift + option/alt + K |
pipe | %>% | shift + cmd/CTRL + M |
assignment operator | <- | option/Alt + (-) |
multiline comment | # | CTRL/cmd + SHIFT + C |
run current chunk | shift + cmd/CTRL + Enter | |
run selected lines of code | cmd/ctrl + Enter. This shortcut moves the cursor to the next line. To execute without moving the cursor, press alt/opt + Enter | |
stop current command | esc | |
access a list of previous commands in the console | | CTRL/cmd + uparrow (this also works after you have typed a few words; the shortcut then searches the history) |
move the selected code up/down | alt/opt + uparrow/downarrow | |
rename variables in this scope | cmd/ctrl + alt/opt + shift + M | |
replace with search results | shift + cmd/ctrl + J | |
auto fill arguments | | tab. Putting the cursor after a function name generates a popup of the function documentation. Pressing F1 (for Mac users, Fn + F1) at that point has the same effect as typing ?function in the console, and the documentation shows up in the Help window |
search for file or function | ctrl + . | |
search for tab | >> in the top right corner of the source pane | shift + ctrl+ . |
switch between tabs | ctrl + tab goes forward. to go backward, + shift | |
Unfold/fold outlines (the Rmd structure) | shift + cmd/CTRL + O | |
fold comments | option/Alt + cmd/CTRL+ L. To uncollapse, + shift | |
Jump to chunk (start of line) | cmd/ctrl + shift + option/alt + J | |
collapse all headers | ## Header 2 <––> | option/Alt + cmd/CTRL+ O. To uncollapse, + shift |
Next chunk | cmd/ctrl+pagedown | |
Knit | shift + cmd/ctrl + K |
Knitting relies on the knitr package in the Rmd. Feel free to use the cheat sheet however you want. It is definitely imperfect and not remotely all-encompassing or error-free. Please don’t hesitate to point out any mistakes in the explanations. If there’s any R function you wish to add that could be helpful for data cleaning, please fill out this google form and I’ll add it to this file.
Many thanks to Kiernan Nicholls and Prof. Michael Kearney for teaching me their R skills, and Yan Wu for helping me with the customized CSS of this webpage.