Converting a list of GPS locations (lat, long) to Australian state names

Recently, I had the fun experience of converting a CSV file that contained the names of a bunch of cities inside Australia. The CSV file contained four columns:

  • City name
  • Latitude (in DMS)
  • Longitude (in DMS)

There were more than 5,000 rows in the CSV file. I wanted to pull a list of all the cities, and determine what state they were in.

Instead of using the Google Maps API (and getting charged ~$35), I decided to hunt down a way that I could use open data provided by the Australian Bureau of Statistics to add the state name to the list of cities.

The ABS has a downloadable ESRI Shapefile, which basically is a vector map of all the borders for states in Australia. This, with a little bit of data cleaning using Pandas DataPrep, and a quick dirty R script would allow me to add the state name to the CSV.

So, first we start with the raw CSV file that I was given:

Suburb Namelatlong
0Aberfoyle30° 21’30″S152° 2’30″E
1Adaminaby35° 58’30″S148° 51’0″E
2Adelaide34° 56’0″S138° 35’30″E
3Adelong35° 18’35″S148° 1’0″E
4Agnes Water24° 17’0″S151° 49’0″E
3095Yumali35° 28’0″S139° 51’30″E
3096Yuna28° 23’0″S114° 54’30″E
3097Yuna East28° 25’30″S115° 10’0″E
3098Yunta32° 35’0″S139° 28’0″E
3099Zamia Creek24° 32’30″S149° 36’0″E
city_locations.csv
import pandas
csv = pandas.read_csv("city_locations.csv")

import pandas as pd
import numpy as np
from dataprep.clean import clean_lat_long
df = pd.DataFrame(csv)

cleaned = clean_lat_long(df, lat_col="lat", long_col="long", split=True)
cleaned = cleaned[cleaned.lat_clean.lt(0)]
cleaned = cleaned[cleaned.long_clean.gt(0)]
cleaned.to_csv("clean.csv")

Running this script would give me an output file that contained the latitude and longitude converted from DMS (Degrees, Minutes and Seconds) to decimal values – skipping the rows that could not be processed correctly, ready for the next processing step.

SZUlatlonglat_cleanlong_clean
0Aberfoyle30° 21’30″S152° 2’30″E-30.3583152.0417
1Adaminaby35° 58’30″S148° 51’0″E-35.9750148.8500
2Adelaide34° 56’0″S138° 35’30″E-34.9333138.5917
3Adelong35° 18’35″S148° 1’0″E-35.3097148.0167
4Agnes Water24° 17’0″S151° 49’0″E-24.2833151.8167
3092Yumali35° 28’0″S139° 51’30″E-35.4667139.8583
3093Yuna28° 23’0″S114° 54’30″E-28.3833114.9083
3094Yuna East28° 25’30″S115° 10’0″E-28.4250115.1667
3095Yunta32° 35’0″S139° 28’0″E-32.5833139.4667
3096Zamia Creek24° 32’30″S149° 36’0″E-24.5417149.6000

Using Rlang, we import the Shapefile from the ABS, import the cleaned CSV file with decimal lat/long points, and use st_intersects from the sf package to determine what state a given city is in, based on it’s lat/long location.

library(sf)
library(dplyr)
library(sp)
library(progress)

map = read_sf("1270055004_sos_2016_aust_shape/SOS_2016_AUST.shp")
nc_geom <- st_geometry(map)

latLong <- read.csv(file = 'cleaned.csv')
crsFormat = st_crs(map)

pnts_sf <- st_as_sf(latLong, coords = c('long_clean', 'lat_clean'), crs = crsFormat)
pnts <- pnts_sf %>% mutate(
    intersection = as.integer(st_intersects(geometry, map)),
    area = if_else(is.na(intersection), '', map$STE_NAME16[intersection]),
    size = if_else(is.na(intersection), '', map$SOS_NAME16[intersection]),

    .keep = c("all")
)

write.csv(pnts, "with-intersections.csv")

This should leave you with a CSV file that contains the state name that the given points are in.

Leave a Comment

Your email address will not be published. Required fields are marked *