Recently, I had the fun experience of converting a CSV file containing the names of a bunch of cities in Australia. The CSV file had three columns:
- City name
- Latitude (in DMS)
- Longitude (in DMS)
There were more than 5,000 rows in the CSV file. I wanted to pull a list of all the cities and determine which state each one was in.
Instead of using the Google Maps API (and getting charged ~$35), I decided to hunt down a way to use open data provided by the Australian Bureau of Statistics (ABS) to add the state name to the list of cities.
The ABS has a downloadable ESRI Shapefile, which is essentially a vector map of all the state borders in Australia. This, combined with a little data cleaning using the pandas-based DataPrep library and a quick-and-dirty R script, would let me add the state name to the CSV.
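Before diving in, it can help to confirm the shapefile actually carries the attribute columns we need. Here is a quick sketch in Python (assuming geopandas is installed; the real spatial join happens in R further down, and the column names come from the R script there):

```python
import geopandas as gpd

# Load the ABS shapefile; each row is a boundary polygon
states = gpd.read_file("1270055004_sos_2016_aust_shape/SOS_2016_AUST.shp")

# STE_NAME16 holds the state name, SOS_NAME16 the
# section-of-state classification used later in the R script
print(states[["STE_NAME16", "SOS_NAME16"]].head())
```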
So, first we start with the raw CSV file that I was given:
| | Suburb Name | lat | long |
|---|---|---|---|
| 0 | Aberfoyle | 30° 21’30″S | 152° 2’30″E |
| 1 | Adaminaby | 35° 58’30″S | 148° 51’0″E |
| 2 | Adelaide | 34° 56’0″S | 138° 35’30″E |
| 3 | Adelong | 35° 18’35″S | 148° 1’0″E |
| 4 | Agnes Water | 24° 17’0″S | 151° 49’0″E |
| … | … | … | … |
| 3095 | Yumali | 35° 28’0″S | 139° 51’30″E |
| 3096 | Yuna | 28° 23’0″S | 114° 54’30″E |
| 3097 | Yuna East | 28° 25’30″S | 115° 10’0″E |
| 3098 | Yunta | 32° 35’0″S | 139° 28’0″E |
| 3099 | Zamia Creek | 24° 32’30″S | 149° 36’0″E |
```python
import pandas as pd
from dataprep.clean import clean_lat_long

# Load the raw CSV with DMS-formatted coordinates
df = pd.read_csv("city_locations.csv")

# Convert the lat/long columns from DMS to decimal degrees,
# producing the new columns lat_clean and long_clean
cleaned = clean_lat_long(df, lat_col="lat", long_col="long", split=True)

# Every Australian location is south of the equator (negative latitude)
# and east of the prime meridian (positive longitude), so these filters
# also drop any rows DataPrep could not parse (NaN fails both tests)
cleaned = cleaned[cleaned.lat_clean.lt(0)]
cleaned = cleaned[cleaned.long_clean.gt(0)]

cleaned.to_csv("cleaned.csv")
```
Running this script gave me an output file with the latitude and longitude converted from DMS (degrees, minutes and seconds) to decimal values, skipping any rows that could not be parsed, ready for the next processing step.
| | Suburb Name | lat | long | lat_clean | long_clean |
|---|---|---|---|---|---|
| 0 | Aberfoyle | 30° 21’30″S | 152° 2’30″E | -30.3583 | 152.0417 |
| 1 | Adaminaby | 35° 58’30″S | 148° 51’0″E | -35.9750 | 148.8500 |
| 2 | Adelaide | 34° 56’0″S | 138° 35’30″E | -34.9333 | 138.5917 |
| 3 | Adelong | 35° 18’35″S | 148° 1’0″E | -35.3097 | 148.0167 |
| 4 | Agnes Water | 24° 17’0″S | 151° 49’0″E | -24.2833 | 151.8167 |
| … | … | … | … | … | … |
| 3092 | Yumali | 35° 28’0″S | 139° 51’30″E | -35.4667 | 139.8583 |
| 3093 | Yuna | 28° 23’0″S | 114° 54’30″E | -28.3833 | 114.9083 |
| 3094 | Yuna East | 28° 25’30″S | 115° 10’0″E | -28.4250 | 115.1667 |
| 3095 | Yunta | 32° 35’0″S | 139° 28’0″E | -32.5833 | 139.4667 |
| 3096 | Zamia Creek | 24° 32’30″S | 149° 36’0″E | -24.5417 | 149.6000 |
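As a sanity check on the conversion: a decimal coordinate is degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. A minimal hand-rolled sketch (the `dms_to_decimal` helper is hypothetical, not part of DataPrep):

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert DMS to decimal degrees; S and W hemispheres become negative."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

# Aberfoyle's latitude, 30° 21'30″S, works out to roughly -30.3583,
# matching the lat_clean value DataPrep produced in the table above
print(dms_to_decimal(30, 21, 30, "S"))  # -30.358333...
```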
Using R, we import the Shapefile from the ABS, import the cleaned CSV file with decimal lat/long points, and use st_intersects from the sf package to determine which state a given city is in, based on its lat/long location.
```r
library(sf)
library(dplyr)

# Load the ABS shapefile of Australian state borders
map <- read_sf("1270055004_sos_2016_aust_shape/SOS_2016_AUST.shp")

# Load the cleaned coordinates and convert them to sf points,
# reusing the shapefile's coordinate reference system
latLong <- read.csv(file = "cleaned.csv")
crsFormat <- st_crs(map)
pnts_sf <- st_as_sf(latLong, coords = c("long_clean", "lat_clean"), crs = crsFormat)

# For each point, find the index of the polygon it falls inside, then
# look up that polygon's state name (STE_NAME16) and section-of-state
# classification (SOS_NAME16); points outside every polygon get ""
pnts <- pnts_sf %>% mutate(
  intersection = as.integer(st_intersects(geometry, map)),
  area = if_else(is.na(intersection), "", map$STE_NAME16[intersection]),
  size = if_else(is.na(intersection), "", map$SOS_NAME16[intersection])
)

write.csv(pnts, "with-intersections.csv")
```
This should leave you with a CSV file that contains, for each city, the name of the state its point falls in.
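As a quick way to eyeball the result, you could load the joined CSV back into pandas and count cities per state. A minimal sketch (the `area` column name comes from the R script above):

```python
import pandas as pd

# Load the spatially joined output and tally cities per state;
# rows whose point missed every polygon have an empty 'area'
result = pd.read_csv("with-intersections.csv")
print(result["area"].value_counts())
```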