Creating ArcMap tools using python scripts: ASCII to Raster batch processing

Recently, I had to process 400 tiles of LiDAR data and find a method that would allow beginners to GIS to achieve the same in the future. For this I decided to create a custom tool in ArcMap which uses a python script to batch process the ASCII files in rasters and then merges these into a final DEM. The users can then start processing this layer for their individual needs.


# Import system modules
import arcpy, os, sys

# Set local variables

inASCII = str(sys.argv[1]) #these variables are linked to the data the user will input when running the tool.
outRaster = str(sys.argv[2])
mergeGeodatabase = str(sys.argv[3])
outputFilename = str(sys.argv[4])
rasterType = "FLOAT"
emptyString = []
britishNationalGridprj = "PROJCS['OSGB 1936 / British National Grid',GEOGCS['OSGB 1936',DATUM['D_OSGB_1936',SPHEROID['Airy_1830',6377563.396,299.3249646]],PRIMEM['Greenwich',0],UNIT['Degree',0.017453292519943295]],PROJECTION['Transverse_Mercator'],PARAMETER['latitude_of_origin',49],PARAMETER['central_meridian',-2],PARAMETER['scale_factor',0.9996012717],PARAMETER['false_easting',400000],PARAMETER['false_northing',-100000],UNIT['Meter',1]]"

#Set path names and convert from windows to python

inASCII_convert = inASCII.replace('\\','//')
outRaster_convert = outRaster.replace('\\','//')
mergeGeodatabase_convert = mergeGeodatabase.replace('\\','//')

#Convert from ascii to raster for each individual file

for files in os.listdir(inASCII_convert):
	if files.endswith(".asc"):
		arcpy.ASCIIToRaster_conversion(inASCII + "//" + files, outRaster + "//" + files.replace(".asc",".tif"), rasterType)
#create string for raster mosaic

for outputs in os.listdir(outRaster_convert):
	if outputs.endswith(".tif"):
		emptyString.append(outRaster_convert + "//" + outputs)
	outputString = ';'.join(emptyString)
#merge all tiffs together
arcpy.MosaicToNewRaster_management(outputString, mergeGeodatabase_convert, outputFilename, britishNationalGridprj ,"16_BIT_UNSIGNED","" , "1","" ,"" )

Creating a tool in ArcMap:

Using ArcCatalog, find the location where you want the toolbox and right-click to select ‘New Toolbox’. From here go to Add -> Script and follow the prompts. When you reach the properties page we need to select four pieces of information the user enters to relate to the 4 inputs that were specified in the script using the sys.argv command. Note these need to be in the same order. Choose a name for the input, select whether it is a input/output in the environment property and then select the type.


The first two relate to folders where all the files are stored, the third to the folder where the output is stored and the forth is for the output file name.

Now you can can run the tool just like any others in toolbox.


CSV kit

A very useful piece of software for working and analysing csv files is csvkit. A few useful functions are the following which have been taken from the above link.

Shows you stats about the file (e.g. col names, type, nulls, min, max, median, std deviation, unique values, most frequent)

$csvstat myfile.csv

Peak at excel file to display in terminal

$in2csv myfile.xlsx
$in2csv myfile.xlsx > myfile.csv  # can write xlsx file to csv by chaining operations

Look at data, formats nicely with borders

$csvlook myfile.csv,

See column names

$csvcut -n myfile.csv

Cut out specific columns

$csvcut -c 2,5,6 myfile.csv  # Cut out columns to just 2,5,6
$csvcut -c col1,col2,col3 myfile.csv  # can also specify by name (make sure not to leave spaces)

Chain commands to pass data along using the pipeline symbol |

$csvcut -c RespondentID,CollectorID,StartDate,EndDate Sheet_1.csv | csvlook | head

Find cells matching a regular expression (e.g. pattern “ddd-123-dddd”)

$csvgrep -c phone_number -r "\d{3}-123-\d{4}" mydata.csv > matchrx.csv

Merge csv files together so you can do aggregate analysis (assuming they have the same headers)

$csvstack firstfile.csv secondfile.csv > combinedfile.csv
# You can add an additional '-g' flag on csvstack to add a 'grouping column' (e.g. it'll say firstfile, secondfile)

Execute a SQL query directly on a CSV file

$csvsql --query "SELECT * FROM mydata WHERE AGE > 30" mydata.csv > query_results.csv

Sort data using csvsort

$csvcut -c StartDate,EndDate myfile.csv | csvsort -c StartDate | csvlook | head  # use -r to reverse order (i.e. desc) for csvsort

# Execute a SQL query by referencing two CSV files and doing a SQL Join
$csvsql --query "select m.usda_id, avg(i.sepal_length) as mean_sepal_length from iris as i join irismeta as m on (i.species = m.species) group by m.species" examples/iris.csv examples/irismeta.csv

Generate a create table statement for your csv data

$csvsql -i sqlite myfile.csv  # Can specify type of db with '-i' flag, other db's include mysql, mssql

Join to csv’s on common column header

$ csvjoin -c fips data.csv acs2012_5yr_population.csv > joined.csv

Automatically create a SQL table and import a CSV into the database (postgresql)

$createdb dbname
$csvsql --db postgresql://username:password@localhost:port/dbname --insert mydata.csv
$psql -q dbname -c "\d mydata"  # shows table, col names, data types
$psql -q dbname -c "SELECT * FROM mydata"

Extract a table from a SQL database into a CSV

sql2csv --db postgresql://username:password@localhost/dbname --query "select * from mydata" > extract.csv
sql2csv --db mssql://username:password@servername/dbname --query "select top 100 * from [dbname].[dbo].[tablename]" > extract.csv

Conduct analysis on csv in python

$ csvpy data.csv
Welcome! "data.csv" has been loaded in a CSVKitReader object named "reader".
>>> print len(list(reader))
>>> quit()

Time-series analysis of albedo on Svalbard (Part 1)

My current work has been focused on using MOD10A1 albedo data to quantify albedo change in the 21st century across the High Arctic. Data has been collected daily between May – September, to ensure a full melt season is viewed, and has been processed and filtered using various open source software (GDAL, NumPy, QGIS, RSGISLib). Over 10,000 images were analysed across three sites in the Arctic (Austfonna, Humboldt and Agassiz Ice Cap), detailing how albedo has changed daily, monthly and yearly, both spatially and with respect to elevation. Below are some preliminary results for Austfonna glacier in Svalbard with future posts analysing the other sites.

Figure 1 shows the data filtering hierarchy that was utilised for each image (SZA = solar zenith angle).


The daily albedo for each year was plotted to both gain an understanding of melt season characteristics, and see whether the amplitude is increasing throughout the time-series. If so, this would infer that albedo values are dropping lower through the melt season (e.g. 2008 – 2013). Melt season length can also be deduced from the width of the quadratic curves.


Finally, moving images were created to investigate the spatial patterns in albedo change. Two years were analysed to provide comparison with 2008 (top) seeing a relatively stable melt season and 2013 (bottom) witnessing a more erratic trend with very low albedo values.

Appendix: Below is the code used for filtering the data based on different thresholds for the SZA’s presented in figure 1. Open source software was used throughout its development.


#!/usr/bin/env python

# An python class to calculate 11 day running statistics for each pixel
# in the image and filtering according to criteria determined by median
# and standard Deviation. Output is a new tiff file.
# Author: Nathan Shaw

# import relevent modules
import os.path
import sys
import math
from osgeo import gdal
import numpy as np
import glob

class Filtering(object):

 #function to collect 11 consectutive files for running statistics
 def elevens(startFile, num):
      return (startFile[pos - num: pos + num + 1] for pos in xrange(5, len(startFile)-5,1))

 # a function to calculate the median of the 11 files
 def median(valueList):
     order = sorted(valueList)
     if len(order) % 2 == 0:
         return ((order[len(order)/2] + order[(len(order)/2) - 1]) / 2.0)
         return order[len(order)/2]

 # function which returns a tiff from a numpy array
 def createTiff(raster, File, folder, outputFile):
     #find raster projection for creation
     trans = File.GetGeoTransform()
     #find raster size for creation
     Ny, Nx = raster.shape
     format = "GTiff"
     driver = gdal.GetDriverByName( format )
     dst_ds = driver.Create("//home//student//Desktop//DISSERTATION_DATA//" + str(folder.replace('_CLIP','_FILTER')) + "//" + str(outputFile), Nx, Ny, 1, gdal.GDT_Byte)
     dst_ds = None
     print outputFile + ' complete'

 # a function to compare the statistics from the 11 files and the individual image in question
 discarding pixels as and when needed
 def pixelFiltering(inputFile, outputFile, folder, emptyAv, emptySD, emptyMed):
     data = gdal.Open("//home//student//Desktop//DISSERTATION_DATA//" + str(folder) + "//" + str(inputFile))
     myArray2 = data.ReadAsArray()
     x = myArray2.shape
     #finding image number so different filtering criteria can be imposed for early/
     late melt season.
     dataFind = inputFile[13:16]
     #locate days in melt season where SZA is greater than 70 degrees
     if (int(dataFind) <= 134 or int(dataFind) >= 237):
         for numRows2 in range(x[0]):
             for numCols2 in range(x[1]):
                 pixelValue = myArray2[numRows2, numCols2]
                 #sort through pixels only doing comparison on ones between 0 and 100
                 #select maximum albedo for time of year with 97 possible due to low SZA
                 if (pixelValue >= 97 and pixelValue <= 100): 
                      myArray2[numRows2, numCols2] = 0
                      # define pixel to be zero (or Nan as long as outside 0 - 100)
                 elif (pixelValue > 100 or pixelValue <= 0):
                 #filter the pixel value based on median and SD values (unique for below/after 70 degree SZA - and for each glacier)
                      if ((pixelValue > emptyAv[numRows2, numCols2] + 1.5*emptySD[numRows2,numCols2]) and (pixelValue > emptyMed[numRows2, numCols2] + 4)):
                      # data outside 2*SD is removed but only if outside median +- 4 aswell. 
                      This is just for SZA > 70
                          myArray2[numRows2, numCols2] = 0

                      elif ((pixelValue < emptyAv[numRows2, numCols2] - 1.5*emptySD[numRows2, numCols2]) and (pixelValue > emptyMed[numRows2, numCols2]
                      + 4)):
                          myArray2[numRows2, numCols2] = 0
         #output final numpy array to tiff   
         createTiff(myArray2, data, folder, outputFile)
     #for all images where SZA is less than 70 degrees
         for numRows2 in range(x[0]):
             for numCols2 in range(x[1]):
                  pixelValue = myArray2[numRows2, numCols2]
                  #sort through pixels only doing compariosn on ones between 0 and 100
                  if (pixelValue >= 95 and pixelValue <= 100): 
                       # chose 95 for days when SZA < 70 as albedo never going to be higher
                       myArray2[numRows2, numCols2] = 0
                       # define pixel to be zero (or Nan as long as outside 0 - 100)
                  elif (pixelValue > 100 or pixelValue <= 0):
                       if ((pixelValue > emptyAv[numRows2, numCols2] + 2*emptySD[numRows2,numCols2]) and (pixelValue > emptyMed[numRows2, numCols2] + 4)):
                       # data outside 2*SD is removed but only if outside median +- 4 aswell. This is just for SZA < 70
                       myArray2[numRows2, numCols2] = 0

                  elif ((pixelValue < emptyAv[numRows2, numCols2] - 2*emptySD[numRows2, numCols2]) and (pixelValue > emptyMed[numRows2, numCols2] + 4)):
                       myArray2[numRows2, numCols2] = 0
           #output final numpy array to tiff   
           createTiff(myArray2, data, folder, outputFile)   

 # main function which will collect images, stack them, calculate statistics (mean, SD and median) before compare with image in question
 def convertToNumpy(inFile): 
     images = [a for a in os.listdir(inFile) if a.endswith('CLIP')]
     for files in sorted(images):
         images2 = [a for a in os.listdir("//home//student//Desktop//DISSERTATION_DATA//" + str(files)) if a.endswith('.tif')]
         for group in elevens(sorted(images2),5):
             empty = []
             for files2 in group:
                 data = gdal.Open("//home//student//Desktop//DISSERTATION_DATA//" + str(files) + "//" + str(files2)) #could put os.change dir here
                 myArray1 = data.ReadAsArray()
             #numpy function to stack all images
             array = np.dstack(empty)
             #calculate statistics for each pixel in stacked array
             #get image dimensions
             x = array.shape
             emptyAv = np.empty((x[0],x[1]))
             emptySD = np.empty((x[0],x[1]))
             emptyMed = np.empty((x[0],x[1]))
             for numRows in range(x[0]):
                 for numCols in range(x[1]):
                      pixel_list = array[numRows, numCols]
                      #calculation of mean,SD and median
                      # empty list for albedo values
                      albedoList = []
                      for items in pixel_list:
                           if (items < 95 and items > 0):
                               # selcting pixels with no albedo higher than 95 in JJA, 97 in dates outside SZA of 70 
                      if len(albedoList) != 0:
                          #calculate average
                          average = sum(albedoList)/len(albedoList)
                          #calculate SD
                          standardDeviation = math.sqrt(sum([dev*dev for dev in [a - average for a in albedoList]]) / len(albedoList))
                          #append all values to empty numpy arrays to allow pixel 
                          by pixel comparison
                          emptyAv[numRows, numCols] = sum(albedoList)/len(albedoList)
                          emptySD[numRows, numCols] = standardDeviation 
                          emptyMed[numRows, numCols] = median(albedoList) 
                          emptyAv[numRows, numCols] = 0
                          emptySD[numRows, numCols] = 0
                          emptyMed[numRows, numCols] = 0

              pixelFiltering(group[5],str(group[5].replace('.tif','_filter.tif')),files, emptyAv, emptySD, emptyMed)
              print group[5] + '\n'
 # executable function
 def run(self, initialDirectory):
if __name__ == "__main__":
    obj = Filtering()"//home//student//Desktop//DISSERTATION_DATA")