Mapping Australian electorates I've visited

R packages are getting so specific that almost any idea can be achieved quickly. This time, I wanted to map the Australian electorates I have visited, for no particular reason.

A quick Google turned up the eechidna package, which allows quick and easy mapping of Australian electorates in ggplot; data(nat_map) provides the electorate polygons.
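
As a quick sanity check (assuming eechidna is installed), you can peek at the polygon data; nat_map is a long data frame of polygon vertices with, at minimum, long, lat, group and ELECT_DIV columns:

library(eechidna)
data(nat_map)
head(nat_map[, c("long", "lat", "group", "ELECT_DIV")]) # one row per polygon vertex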

I then exported the list of 150 electorate names so that I could manually add a variable marking the electorates I've visited (this is the time-consuming part), and re-imported it. This is also the point at which you could join on other data for more interesting visualisations.

elec_lookup <- data.frame(ELECT_DIV = unique(nat_map$ELECT_DIV))
write.csv(elec_lookup, "elec_lookup.csv")
##Add visited electorates variable in your spreadsheet program of choice##
elec_lookup <- read.csv("elec_lookup.csv") 
nat_map_2 <- merge(nat_map, elec_lookup, by="ELECT_DIV")
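
If you'd rather skip the spreadsheet round-trip, you could instead flag the visited electorates directly in R before the merge. A minimal sketch (the visited vector below is purely illustrative):

visited <- c("Melbourne", "Sydney", "Perth") # your own list of electorate names
elec_lookup$been <- as.integer(elec_lookup$ELECT_DIV %in% visited)
nat_map_2 <- merge(nat_map, elec_lookup, by = "ELECT_DIV")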

Time to create the map:

cbPalette <- c("#999999", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7") ##Colourblind pallette

plot_aus <- ggplot(data=nat_map_2) +
geom_polygon(aes(x=long, y=lat, group=group, fill = as.factor(been)), colour="white") +
guides(fill = FALSE) +
scale_fill_manual(values=cbPalette) +
xlab("") +
ylab("") +
theme_classic() +
theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
panel.border = element_rect(colour = "gray90", fill = NA, size = 2))

The obvious issue here is that the smaller city electorates are practically invisible. It's essentially impossible to tell that I've visited Melbourne!

I created some zoomed-in maps for the five largest cities. The trickiest part is limiting the axes without mangling the polygons: setting scale limits with xlim()/ylim() drops any vertices outside the range, so zoom with coord_cartesian() instead:

plot_per <- ggplot(data=nat_map_2) +
  geom_polygon(aes(x=long, y=lat, group=group, fill = as.factor(been)), colour="white") +
  guides(fill = FALSE) +
  scale_fill_manual(values=cbPalette) +
  coord_cartesian(xlim = c(115, 117), ylim = c(-31.27, -32.5)) + # limit the axes using coord_cartesian
  xlab("") +
  ylab("") +
  theme_classic() +
  theme(axis.line=element_blank(),axis.text.x=element_blank(),
        axis.text.y=element_blank(),axis.ticks=element_blank(),
        panel.border = element_rect())
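
A quick aside on why coord_cartesian(): scale limits filter the data, so any electorate with vertices outside the window gets torn apart. An illustrative comparison using the national map from above:

# Clipping with scale limits drops vertices outside the window (with a "rows removed"
# warning) and mangles the coastal electorates; coord_cartesian() above just crops the view
per_clipped <- plot_aus + xlim(115, 117) + ylim(-32.5, -31.27)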

Finally, I needed to arrange the maps in a grid. This is relatively simple, but takes some trial and error to get the ratios right. I also created some rectangles to link each sub-map to its location by box colour.

Below is the full code:

library(ggplot2)
library(eechidna)
library(gridExtra)
data(nat_map)

elec_lookup <- data.frame(ELECT_DIV = unique(nat_map$ELECT_DIV))
write.csv(elec_lookup, "elec_lookup.csv")
##Add the visited ('been') variable in your spreadsheet program of choice##
elec_lookup <- read.csv("elec_lookup.csv")

nat_map_2 <- merge(nat_map, elec_lookup, by="ELECT_DIV")

cbPalette <- c("#999999", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

plot_aus <- ggplot(data=nat_map_2) +
geom_polygon(aes(x=long, y=lat, group=group, fill = as.factor(been)), colour="white") +
annotate("rect", xmin = 115, xmax = 117, ymin = -32.5, ymax = -31.27, colour = "#009E73", fill = NA, size = 1.1) + ##Perth
annotate("rect", xmin = 138, xmax = 140, ymin = -35.5, ymax = -34.27, colour = "#F0E442", fill = NA, size = 1.1) + ##Adelaide
annotate("rect", xmin = 152, xmax = 154, ymin = -28, ymax = -26.77, colour = "#0072B2", fill = NA, size = 1.1) + ##Brisbane
annotate("rect", xmin = 150, xmax = 152, ymin = -35, ymax = -32.54, colour = "#D55E00", fill = NA, size = 1.1) + ##Sydney
annotate("rect", xmin = 144, xmax = 146, ymin = -38.5, ymax = -37.27, colour = "#CC79A7", fill = NA, size = 1.1) + ##Melbourne
guides(fill = FALSE) +
scale_fill_manual(values=cbPalette) +
xlab("") +
ylab("") +
theme_classic() +
theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
panel.border = element_rect(colour = "gray90", fill = NA, size = 2))

##Perth##
plot_per <- ggplot(data=nat_map_2) +
geom_polygon(aes(x=long, y=lat, group=group, fill = as.factor(been)), colour="white") +
guides(fill = FALSE) +
scale_fill_manual(values=cbPalette) +
coord_cartesian(xlim = c(115, 117), ylim = c(-31.27, -32.5)) +
xlab("") +
ylab("") +
theme_classic() +
theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
panel.border = element_rect(colour = "#009E73", fill = NA, size = 2))

##Adelaide##
plot_ade <- ggplot(data=nat_map_2) +
geom_polygon(aes(x=long, y=lat, group=group, fill = as.factor(been)), colour="white") +
guides(fill = FALSE) +
scale_fill_manual(values=cbPalette) +
coord_cartesian(xlim = c(138, 140), ylim = c(-34.27, -35.5)) +
xlab("") +
ylab("") +
theme_classic() +
theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
panel.border = element_rect(colour = "#F0E442", fill = NA, size = 2))

##Brisbane##
plot_bri <- ggplot(data=nat_map_2) +
geom_polygon(aes(x=long, y=lat, group=group, fill = as.factor(been)), colour="white") +
guides(fill = FALSE) +
scale_fill_manual(values=cbPalette) +
coord_cartesian(xlim = c(152, 154), ylim = c(-26.5, -28)) +
xlab("") +
ylab("") +
theme_classic() +
theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
panel.border = element_rect(colour = "#0072B2", fill = NA, size = 2))

##Sydney##
plot_syd <- ggplot(data=nat_map_2) +
geom_polygon(aes(x=long, y=lat, group=group, fill = as.factor(been)), colour="white") +
guides(fill = FALSE) +
scale_fill_manual(values=cbPalette) +
coord_cartesian(xlim = c(150, 152), ylim = c(-32.54, -35)) +
xlab("") +
ylab("") +
theme_classic() +
theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
panel.border = element_rect(colour = "#D55E00", fill = NA, size = 2))

##Melbourne##
plot_mel <- ggplot(data=nat_map_2) +
geom_polygon(aes(x=long, y=lat, group=group, fill = as.factor(been)), colour="white") +
guides(fill = FALSE) +
scale_fill_manual(values=cbPalette) +
coord_cartesian(xlim = c(144, 146),ylim = c(-37.27, -38.5)) +
xlab("") +
ylab("") +
theme_classic() +
theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
panel.border = element_rect(colour = "#CC79A7", fill = NA, size = 2))

grid.arrange(plot_aus, plot_bri, plot_syd, plot_mel, plot_per, plot_ade,
             widths = c(2, 2, 2, 2, 2, 2),
             layout_matrix = rbind(c(1, 1, 1, 1, 2, 2),
                                   c(1, 1, 1, 1, 3, 3),
                                   c(1, 1, 1, 1, 3, 3),
                                   c(5, 5, 6, 6, 4, 4)))

 

R: ggsave .tiff files are too big

A very quick post to solve an annoying problem that I run into a lot: when saving large .tiff files from ggplot, the default output is massive.

ggsave("Fig 4a.tiff", plot = p4, width=600, height=450, units="mm", dpi=300)

Be sure to set the compression argument; I use LZW compression.

ggsave("Fig 4a.tiff", plot = p4, width=600, height=450, units="mm", dpi=300, compression = "lzw")

In this case, it reduced the file size from >140 MB to 332 KB.
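
If you want to confirm the difference without leaving R, file.size() (base R) reports the size on disk in bytes:

file.size("Fig 4a.tiff") / 1024^2 # size in MB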

Extracting RAMAS GIS carrying capacities (.KCH) with R

RAMAS GIS has some really powerful capabilities for population viability analysis (PVA), but a lot of the model output is hidden amongst many files. One thing that annoys me is the default map in RAMAS: it shows the carrying capacity of each population (presumably at the beginning of the simulation?) without any spatial context.

I wasn't happy with this, so I decided to extract the data and create a map of my own. 

The carrying capacity information is held in two locations. Stable carrying capacities are held within the main .MP file. Metapopulations with changing carrying capacities store the carrying capacity of each year in a .KCH file, which the .MP file refers to. There are also linearly changing carrying capacities (gradient kept in the .MP file), but I've ignored them for now.
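
As far as I can tell, a .KCH file is just a plain-text column of carrying capacities, one value per simulated year, so pulling the final-year K for a single population is a two-liner (the file name below is hypothetical):

kch <- read.table("BFS_IP85_NA/pop_001.KCH") # one carrying capacity per year
k_final <- kch[nrow(kch), ] # K in the final year of the simulation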

This code will extract carrying capacities at the end of the simulation from the .MP and .KCH files (kept in a folder designated by scenario). It then plots populations onto a world map.

library(stringr)
library(maps)
library(reshape)  # colsplit() with the split= argument (reshape2 calls it 'pattern')
library(plyr)     # aaply()
library(ggplot2)

rm(list = ls())
setwd("C:/Evan/R/ramas/kchextract") #Working directory should hold folders with each PVA scenario
scenario <- "BFS_IP85_NA/" # Folder holding .MP and .KCH files for one scenario
files <- list.files(scenario)

# Find the .MP file in the scenario folder and read it in as raw lines
mp_file <- read.table(paste(scenario, subset(files, str_sub(files, -3, -1) == ".MP"), sep=""), header = FALSE, sep = "@")

# Rows 33 up to the "Migration" section hold the population table; split the comma-separated fields
mp_data <- colsplit(mp_file[33 : (which(mp_file == "Migration")-1), ], split = ",", names = paste0("X", 1:27))
mp_data$X10 <- as.character(mp_data$X10)
# Column 10 holds a .KCH file name where carrying capacity changes; otherwise fall back to the fixed K in column 7
mp_data$X10 <- ifelse(mp_data$X10 == "" | is.na(mp_data$X10), mp_data$X7, mp_data$X10)

# Final-year carrying capacity for population x: read its .KCH file if one is
# listed, otherwise use the fixed K from the .MP file
get_k <- function(x) {
  if(str_sub(mp_data$X10[x], -3, -1) == "KCH") {
    kch <- read.table(paste(scenario, mp_data$X10[x], sep = ""))
    kch[nrow(kch), ]
  } else {
    as.numeric(mp_data$X7[x])
  }
}

mp_data$k <- aaply(c(1:length(mp_data$X10)), 1, get_k)

## Convert the pixel locations to true longitudes/latitudes. Find the corner coordinates of your own map to adjust these constants.
mp_data$x.long <- (mp_data$X2 / 7200 * 60) + 90 
mp_data$y.lat <- 53.5619398133 - (mp_data$X3 / 5877 * 48.97500168)

world <- map_data('world')

p <- ggplot(world, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill = "grey98", alpha = 0.5, colour = "black") +
  geom_point(data = mp_data, aes(x = x.long, y = y.lat, size = k), alpha = 0.3) +
  scale_size_continuous(range = c(1,12)) +
  guides(size=F, alpha=F) +
  coord_cartesian(xlim = c(95, 145), ylim = c(4.5869381328, 40)) +
  xlab("Longitude") +
  ylab("Latitude") +
  theme_evp() +
  theme(legend.justification=c(0, 1), legend.position=c(0, 1)) +
  scale_alpha_continuous(range=c(0, 1))


ggsave(filename = paste(str_sub(scenario, 1, -2), ".jpeg", sep = ""), plot = p, width = 35.88, height = 23.8, units = "cm")

Note: I realised while writing this post that there is a package called mptools that does some of this. I don't think it replicates this exactly, but hopefully it is still useful to some.

Organising data for graphs in R

I often get asked to help others make graphs, which is a pleasure because it's super fun. Usually, the question arises after ggplot returns an error, and (excluding very simple syntax errors) the issue almost always comes down to data structure.

Note: For this post, I am using ggplot exclusively. I have used many other packages that have different (usually worse) data requirements. If you need to use another package, read the data section of the documentation very carefully!

Firstly, you will want all your data in one data frame. ggplot is designed to group data by whichever variable you wish within a single data frame, which means you should only need to identify the dataset once. If you find yourself writing "data = ..." more than once, you are probably doing something wrong, and it will lead to overly complex plotting code that is easy to screw up.

Example

A common problem is that some analysis program has spewed out a bunch of model predictions into data files. Here, I am using very, very fake data resembling von Bertalanffy individual growth models. Each growth model prediction has been output to a separate csv file.
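
For anyone who wants to follow along, here is a rough sketch that fabricates CSVs in the same shape and with the same file names as mine (the von Bertalanffy parameters are made up; my real files came out of an analysis program):

# Fabricate example csvs: length = Linf * (1 - exp(-k * days)), with fake confidence bounds
dir.create("csvs", showWarnings = FALSE)
groups <- expand.grid(sex = c("male", "female"), habitat = c("pond", "stream"), stringsAsFactors = FALSE)
days <- 0:1000
for (i in seq_len(nrow(groups))) {
  linf <- ifelse(groups$sex[i] == "female", 60, 50)        # made-up asymptotic length
  k <- ifelse(groups$habitat[i] == "stream", 0.004, 0.006) # made-up growth rate
  len <- linf * (1 - exp(-k * days))                       # von Bertalanffy curve
  out <- data.frame(days = days, length = len, lci = 0.9 * len, uci = 1.1 * len)
  write.csv(out, sprintf("csvs/laur_%s_%s.csv", groups$sex[i], groups$habitat[i]), row.names = FALSE)
}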


Bad design

# Read dat -----------------------------
laur_male_pond <- read.csv("csvs/laur_male_pond.csv", header = TRUE)
laur_female_pond <- read.csv("csvs/laur_female_pond.csv", header = TRUE)
laur_male_stream <- read.csv("csvs/laur_male_stream.csv", header = TRUE)
laur_female_stream <- read.csv("csvs/laur_female_stream.csv", header = TRUE)

# Create plot -----------------------------
ggplot(data = laur_male_pond, aes(y = length, x = days, ymin = lci, ymax = uci)) +
geom_line() +
geom_ribbon(alpha = 0.2, fill = "red") +
geom_line(data = laur_female_pond) +
geom_ribbon(data = laur_female_pond, alpha = 0.2, fill = "blue") +
geom_line(data = laur_male_stream) +
geom_ribbon(data = laur_male_stream, alpha = 0.2, fill = "green") +
geom_line(data = laur_female_stream) +
geom_ribbon(data = laur_female_stream, alpha = 0.2, fill = "orange") +
theme_bw()

Here we have chosen the simplest form of data entry, but it has cost us in requiring complex code for the graph. This code is not only long, but hard to adapt into future graphs and prone to errors. For example, if you had the same model predictions for another species, you would need to change the data name multiple times.

Good design

# Read dat -----------------------------
files <- list.files("csvs/") 
open_data <- function(x) { 
data <- read.csv(paste("csvs/", x, sep=""), header=TRUE)
data$file_name <- x
data <- cbind(data, df <- data.frame(do.call('rbind', strsplit(as.character(data$file_name), '_', fixed = TRUE))))
return(data)
}
combined_growth <- adply(files, 1, open_data)
colnames(combined_growth) <- c("days", "length", "lci", "uci", "file_name", "species", "sex", "habitat")
combined_growth$habitat <- substr(combined_growth$habitat, 1, nchar(as.character(combined_growth$habitat))-4)


# Create plot -----------------------------
ggplot(combined_growth, aes(x = days, y = length, ymin = lci, ymax = uci, group = file_name)) +
geom_line(aes(linetype = habitat)) +
geom_ribbon(aes(fill = sex), alpha = 0.2) +
theme_bw()

Here, the data preparation is much more complex. I have had to combine all the csv files into a single data frame and then split the file name into separate categories (species, sex and habitat type). This could easily be done in Excel if you are not confident manipulating data in R, but I would recommend doing it in R as a good exercise in data manipulation.

The work put into cleaning the data pays off in the final plot. Just four lines of code to achieve a very similar graph, and it is incredibly easy to save this code and adapt for future graphs. It also automatically creates a legend.

Final design

With some added code to make things pretty (theme_evp is my personal theme for my commonly used options)...

cbPalette <- c("#009E73", "#CC79A7", "#F0E442", "#E69F00", "#56B4E9", "#0072B2", "#D55E00") #Colour-blind palette 
p3 <- ggplot(combined_growth, aes(x = days, y = length, ymin = lci, ymax = uci, group = file_name)) +
  geom_line(aes(linetype = habitat), size = 0.7) +
  geom_ribbon(aes(fill = sex), alpha = 0.2) +
  ylab("Snout-vent length (mm)") +
  xlab("Days after hatching") +
  coord_cartesian(xlim=c(0, max(combined_growth$days)), ylim=c(0, max(combined_growth$uci))) +
  scale_fill_manual(name = "Sex", labels = c("Female", "Male"), values = cbPalette) +
  scale_linetype_discrete(name = "Habitat", labels = c("Pond", "Stream")) +
  theme_evp() +
  theme(legend.justification = c(0, 1), legend.position = c(0, 1), legend.box.just = "left")
ggsave("plot3.tiff", p3, width=300, height=180, units="mm", dpi=300)

 

My R Cheat Sheet

I was updating my R cheat sheet and thought it'd make a good first post for the blog. I print out this sheet and always keep it within arm's reach so I can quickly look up code that I frequently use and/or frequently forget. It's an incredibly useful practice.

When I update the cheat sheet (every six months or so), I'll throw in whatever notes I have scribbled and remove code that I think is no longer useful, either because I've memorised it or because I no longer use it. I change the ggplot segment the most, as it could very easily devolve into an entire cheat sheet on its own (and that's what this page is for).

rm(list = ls())
setwd("C:/Evan/R") 
getwd()
files <- list.files("csvs/")
cbpalette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7") #Colour-blind palette for graphs

var <- read.csv("data.csv", header=TRUE) 
var$newcol <- var$oldcol*5
var <- var[-1] #delete first column (use var[-1, ] to delete the first row)
new.df <- data.frame(var1,var2,var3)
colnames(new.df) <- c("title1", "title2", "title3")

install.packages("")
install_github('evpickett/evp') #personal package from github (requires devtools)
library()

str(x) #identify variable types
class(x) #identify class (e.g. data frame)
as.array(x) as.data.frame(x) as.factor(x) as.numeric(x) as.logical(x) as.character(x) 
as.Date(x, "%d/%m/%y", origin="31/12/1870") 
# %d (1-31) %a (Mon) %A (Monday) %m (01-12) %b (Jan) %B (January) %y (12) %Y (2012)
as.numeric(as.character(x)) #factor to decimal
methods(as) #for complete list of above

seq(from,to,by=)
rep(c(1,2,3),2) rep(c(1,2,3), each=2) #= 1,2,3,1,2,3 & 1,1,2,2,3,3

url <- sprintf("http://www.fa.ke/%s/%s-%s/%s.htm",City,Month,Year,StationNumber) #Paste together url - %% if % is in string

p1 <- ggplot(data, aes(x=x,y=y,colour=group)) + #group, colour, fill, alpha, linetype
geom_point() +
facet_wrap(~var, ncol=3, scales="free_y") +
guides(colour=FALSE) + #Remove legend (for colour)
scale_fill_manual(name="Experimental\nCondition", #Legend name (for fill)
breaks=c("ctrl", "trt1", "trt2"), #Legend order
labels=c("Control", "Treatment 1", "Treatment 2"), #Legend labels
values=cbpalette) + #Custom colours need scale_*_manual
theme_evp() + # Personal graph theme
theme(legend.justification=c(0,1), legend.position=c(0,1), legend.box.just="left") #c(0,1) = top left; c(0,0) = bottom left

ggsave(filname="a.jpg", plot=p1, width= , height= , units="mm", dpi=300) # x4 width/height from console size

##DATA PLAY##
ddply(temps,"factor",function) #Separately run function on temps by factor
ddply(temps, c("factor1", "factor2"),summarise, 
N=length(temp), 
mean=mean(temp), 
sd=sd(temp)) #Summarise data frame by factors

apply(x,margin,function) #margin: 1=rows; 2=columns
by(x[,1:4],factor,function) #run function by factor - RETURNS AS LIST - use ddply for data frame
sapply(x,function) #apply function to each column
replicate(10, rnorm(10)) #replicate function - results to each column

melt(x, id=) #reshape data from columns according to id
cast(x, id ~ var1 + var2, length, value.var = "va1") 
tab <- merge(tab, lu.tab, by="lu.value") ##LOOKUP TABLE
rbind(big, add) #merge rows

ifelse(test, yes, no) 
substr(x,1,4) #LEFT() 1=first digit, 4=last

test <- function (x) {
return(x) #return list for multiple outputs
}

write.table(df, "sd.csv", sep=",", col.names=NA)

##Libraries to run on startup
/etc/Rprofile.site # (open as admin)
.First <- function() {
library()
}

##Run console in external editor
library(rite)
sinkstart(echo=F)
sinkstop()

##Deploying an app
library(shinyapps)
deployApp()

Most of this code is very general stuff, so feel free to steal and adapt for your own cheat sheet.
