If data manipulation is carried out in R, why not creating the hyperlinks in R as well? Excel files use hyperlinks to navigate to external content, such as, urls or file paths to some other files. Excel uses 'HYPERLINK()' function for this purpose. Below I present (a) how to create hyperlinks (b) how to update an excel file with hyperlinks in R.

Problem

How to create hyperlinks to external files in an excel workbook using R?

Solution

Here I present a simple scenario, where the hyperlinks are created next to the filename column of a worksheet using the 'writeFormula()' in 'openxlsx' package. For details and other scenarios of creating hyperlinks, visit  https://rdrr.io/cran/openxlsx/man/makeHyperlinkString.html.

Example

I have generated a set of PDF files containing data on each country using the 'gapminder' dataset. In the following code snippet, I first create a master table of country name and its PDF file name.

# Create Master table
country_lst <- unique(gapminder$country)
filename_lst <- paste0(country_lst, ".pdf")

df_master <- data.frame(country_lst, stringsAsFactors = F)
df_master <- cbind(df_master, filename_lst)
names(df_master) <- c("Country", "File_Name")
head(df_master)
##       Country       File_Name
## 1 Afghanistan Afghanistan.pdf
## 2     Albania     Albania.pdf
## 3     Algeria     Algeria.pdf
## 4      Angola      Angola.pdf
## 5   Argentina   Argentina.pdf
## 6   Australia   Australia.pdf

Now I create a workbook, write the master table and add hyperlinks using the 'writeFormular()'. This function takes the 'HYPERLINK([link location], [friendly name])' excel formula as a string in the x argument. Thus, I generate this string dynamically for each row.

# Create an excel workbook and write data
wb <- createWorkbook()
addWorksheet(wb, "Countries")
writeData(wb,sheet = "Countries", x = df_master)

# Add hyperlinks to filenames
for(i in 2:length(country_lst)) {
  formula <- paste0('HYPERLINK(B',i, ', "Link to File")')
  writeFormula(wb, sheet ="Countries", startRow = i, startCol = 3
 , x = formula)
}

# Save the workbook
saveWorkbook(wb, "Gapminder_Countries.xlsx", overwrite = T)

Problem

Forget about the above section, where I created hyperlinks. Now I have an excel file with hyperlinks to external files. I need to do some data manipulation and add a new column to this file. If I do the data manipulation in R and write the entire dataframe to a new file without configuring the hyperlinks as mentioned above, I will loose the hyperlinks. Hence, how can I write only the new columns to the existing file, such that the existing data are not touched?

Solution

I can do the data manipulation in R and write only the new columns to the existing file by specifying the range.

Example

Add the average change in life expectancy and  population over 50 years (1957 - 2007) to the masterfile ('Gapminder_Countries.xlsx') I created in Part I above.

library(dplyr)
dat <- gapminder %>% group_by(country) %>% summarise(avg_change_LE = 	   round((max(lifeExp) - min(lifeExp))/50,1), avg_change_Pop = (max(pop) - min(pop))/50)

head(dat)
## # A tibble: 6 x 3
##   country     avg_change_LE avg_change_Pop
##   <fct>               <dbl>          <dbl>
## 1 Afghanistan           0.3        469292.
## 2 Albania               0.4         46357.
## 3 Algeria               0.6        481074.
## 4 Angola                0.3        163768.
## 5 Argentina             0.3        448499.
## 6 Australia             0.2        234859.

Now I write only the ‘avg_change_LE’ and ‘avg_change_Pop’ columns to the existing 'Counties' worksheet of the workbook. First I create a new dataframe selecting only the new columns. Data is NOT joined/merged using a common field when writing to the worksheet. Therefore, data in our dataframe and the worksheet need to follow the same order without gaps. Also make sure to specify the correct start column and row where the new columns need to be dumped.

All the material of this example are at https://github.com/geethika01/data-manipulation-with-R .