- Multiple Spreadsheet data sources converted to points in Arcpy and mapped with ArcGIS pro
What is Arcpy?¶
Arcpy is a Python library for automating GIS tasks in Arc GIS Pro
- Collection of modules, functions, and classes from the ArcGIS toolbox
- It allows you to perform geoprocessing, analysis, data management and mapping automation using python
In [ ]:
import arcpy # allows for access to ArcGIS pro geoprocessing tools and workflow automation
import os # enables interaction with local system resources (paths to folders and files)
Set up a workspace¶
arcpy.env.workspace
¶
- Use-case: get or set the current workspace environment in which to perform arcpy operations
Breaking down the synax
- arcpy --> library
- env --> class that manages environment settings
- workspace --> attribute of the env class within the arcpy library
Note the general pattern: library.class.attribute
In [72]:
##===================== Define the path to script and project folder
# Gets folder where the script is located , use try/ except to handle different environments
try:
# works on normal python .py scripts
script_path = os.path.dirname(os.path.abspath(__file__))
("running a python script")
except NameError:
# works in a notebook environment
script_path= os.getcwd()
("running in a notebook")
print("Script is here", script_path)
# Get the folder inside the script folder
dir_path = os.path.join(script_path, "01 Data")
print("Directory path set to : ", dir_path)
# make sure the path exists on your operating system, if not create it
if not os.path.exists(dir_path):
os.makedirs(dir_path)
# set the folder path as the working environment, and store it for later use
try:
arcpy.env.workspace = dir_path
wksp = arcpy.env.workspace
except Exception as e:
print(f"Error setting up the path, {e}, Exception type: {type(e).__name__}") # python's built-in error messaging, {e} prints description and type(e).__name__ category
print("Working environment is here",wksp)
In [ ]:
## We often end up re-running the workflow to tweak the output
## Modify the env attribute to allow overwriting output files
arcpy.env.overwriteOutput = True
Arcpy Documentation¶
- Before getting into the weeds, always good to have ArcGIS documentation on hand
- Link to ArcGIS Docs
- Here you can search a tool by name and then copy the required syntax from the python tab
- paste the code into your script for reference
In [ ]:
### Not bad...But why not load and search the documentation directly in this workbook?
Load ArcGIS Documentation directly into the Notebook¶
In [11]:
from IPython.display import IFrame # a module for controlling notebook outputs, allows you to embed images, video, webpages with the Iframe function
# ArcGIS Pro documentation URL for a specific tool
tool_url = "https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/xy-table-to-point.htm"
# Display the documentation inside Jupyter Notebook
IFrame(tool_url, width="100%", height="600px") # iframe can be used to display local or online webpages, documents, reports, visualizations , videos
Out[11]:
Listing and searching toolbox and tool names¶
In [21]:
#get a list of all toolboxes
toolboxes=arcpy.ListToolboxes("*")
print("Availiable Toolboxes", toolboxes)
In [ ]:
del toolboxes # delete the variable
In [30]:
## access all tool names
# get list of all tools
tools =arcpy.ListTools("*") # the asterisk is a wildcard that can be replaced with a string of the name or part of what you are looking for e.g., "*Ma"
print("Availiable Tools", tools)
In [34]:
# get list of all tools within a specific toolbox
tools =arcpy.ListTools("*_management") # * is a wildcard that can be replaced with a string of the name or part of what you are looking for e.g., "*Ma"
print("Availiable Tools", tools)
In [32]:
del tools
In [36]:
# Search within a specific toolbox
print(arcpy.ListTools("*_analysis")) # Get tools only from the Analysis Toolbox
In [37]:
# Search by specific tool name
print(arcpy.ListTools("Buffer_*")) # Get tools that start with the word Buffer
Creating Geodatabases to manage data in ArcGIS Pro¶
arcpy.management.CreateFileGDB()
¶
- arcpy.management.CreateFileGDB(out_folder_path, out_name, {out_version})
- Geodatabases are the way project files are structured and managed in ArcGIS Pro
- Collection of geographic datasets held in a common folder
- Use-case: creating, managing and editing datasets
In [107]:
##======================== Create a Geodatabase to store project data
## TOOL: arcpy.management.CreateFileGDB(out_folder_path, out_name, {out_version})
# folder path
gdb_folder = r'C:\Projects\my_git_pages_website\Py-and-Sky-Labs\content\ArcPY\02 Result'
if not os.path.exists(gdb_folder):
os.makedirs(gdb_folder)
#params, path
gdb_name = 'MN_EnvironmentalData.gdb'
gdb_path = os.path.join(gdb_folder,gdb_name)
# Create gdb if it does not exist
if not arcpy.Exists(gdb_path):
arcpy.management.CreateFileGDB(gdb_folder,gdb_name)
else:
print("GDB already exists", gdb_path)
Define Coordinate System by creating a spatial reference object¶
arcpy.SpatialReference()
¶
- Use-case: changing or defining the CS for a new dataset
multiple ways you can specify a coordinate system
- use a projection file (.prj), name of coordinate system(e.g., GCS_WGS _1984), or WKID (e.g., 4326) also known as factory code
- WKID is less prone to error
- Link to look up coordinate systems : https://pro.arcgis.com/en/pro-app/arcpy/classes/pdf/geographic_coordinate_systems.pdf
Note the general pattern: library.class.attribute
- accessing attributes of class gets you more info
- sr = arcpy.SpatialReference(4326)
- sr.name
In [61]:
##======================== Create a Spatial Reference object to define Coordinate Systems
### TOOL: arcpy.SpatialReference(item, {vcs}, {text})
# Input Data USGC Minnesota Wells : https://nwis.waterdata.usgs.gov/nwis/gwlevels?search_criteria=state_cd&submitted_form=introduction
# Metadata Shows CS is NAD83
wkid = 4269 # GCS_North_American_1983
sr= arcpy.SpatialReference(wkid) # Can pass in a Geographic or Projected CS (e.g., could have passed in wkid 26915 for PCS NAD83 UTM Zone 15N)
print("\nName of CS: ",sr.name, " \nType of CS: ", sr.type, "\nDatum name: " , sr.datumName, "\nUnits are : ", sr.angularUnitName)
Using Geoprocessing Tools in Python¶
- Geoprocessing tools are called using this general structure: arcpy.module.ToolName
- module is pythons way of accessing a toolbox as seen in ArcGIS
arcpy.management.XYTableToPoint
- arcpy.management.XYTableToPoint(in_table, out_feature_class, x_field, y_field, {z_field}, {coordinate_system})
- Optional parameters have curly braces { }
- Skip optional ones with double quotes "" or specify only those optional parameters you are using
```python
## Optional Parameter specified
arcpy.management.XYTableToPoint(intable, outfeature, x_field, y_field, coordinate_system = arcpy.SpatialReference(4326))
- Note: 4326 is the well-known ID (WKID) for WGS 1984
In [108]:
##======================== Load in example wells csv file data and check its datatypes
# note: to identify x, y fields and CS, open the file or load into python (pandas or similar)
import pandas as pd
input_data = r'C:\Projects\my_git_pages_website\Py-and-Sky-Labs\content\ArcPY\01 Data\MN_Water_wells_rft.csv'
# Read CSV using tab as the delimiter
df = pd.read_csv(input_data, delimiter=",", encoding="utf-8") # ensure file is comma-delimited, we need comma-delimited for arcpy to work
print(df.dtypes)
print()
print(df.columns)
In [109]:
##======================== Reformat datatypes to match the expected format
#Convert 'int64' fields to float' (ArcPy does not support int64)
df["site_no"] = df["site_no"].astype("float64") # Converts int64 to float64
df["county_cd"] = df["county_cd"].astype("float64")
# save to csv with fixed formatting
df.to_csv(r"01 Data\MN_Water_wells_fixed.csv", index=False, sep=',', encoding="utf-8")
In [114]:
##======================== Change the working directory to the GDB
# set the folder path as the working environment, and store it for later use
try:
arcpy.env.workspace = gdb_path
wksp = arcpy.env.workspace ## optionally, update the path in wksp variable
except Exception as e:
print(f"Error setting up the path, {e}, Exception type: {type(e).__name__}") # python's built-in error messaging, {e} prints description and type(e).__name__ category
print("Working environment is here",wksp)
##======================== Convert spreadsheet data to point a feature class based on x,y coordinates from a table
### TOOL: arcpy.management.XYTableToPoint(in_table, out_feature_class, x_field, y_field, {z_field}, {coordinate_system})
# params
input_data = r'C:\Projects\my_git_pages_website\Py-and-Sky-Labs\content\ArcPY\01 Data\MN_Water_wells_fixed.csv'
output_data= "MN_Wells"
y_coords = 'dec_lat_va'
x_coords = 'dec_long_va' # ensure longitude is x
# try/except to handle errors
try: # argument (sr) passed in with the key word (coordinate_system)
arcpy.management.XYTableToPoint(input_data, output_data, x_coords, y_coords, coordinate_system=sr)
except Exception as e:
print("Error converting table" ,e)
In [120]:
## Check if the new feature exists in the gdb
fc_list = arcpy.ListFeatureClasses() # will list all features in the current workspace (i.e, our gdb), can specify feature_type e.g., Point.
for i, fc in enumerate(fc_list):
print(i, fc, fc )
In [ ]:
input_fc= "MN_Wells"
wkid= 4269 # GCS_North_American_1983
sr = arcpy.SpatialReference(wkid)
out_cs = sr
arcpy.management.DefineProjection(input_fc, out_cs)
In [122]:
# Access properties for the feature class
fc_desc = arcpy.Describe(fc)
fc_desc.spatialReference
Out[122]:
In [155]:
fc_desc.shapeType
Out[155]:
Batch Creation of Multiple Tables to Point features¶
- use-case: multiple new data inputs that need to converted to points
Steps:¶
- Check Coordinate Systems match
- Project to common CS, if needed
- Check x,y (lat/lon) columns, re-format
- Convert tables to points
Data Sources :¶
- Geography: MN
Input Name | Coordinate System | Source |
---|---|---|
Water Well locations | GCS: NAD 83 | CSV output |
What's in my neighborhood sites | PCS: NAD 83 UTM Zone 15N | MetaData |
MN pollution control activity sites | PCS: NAD 83 UTM Zone 15N | MetaData |
PFAS levels | GCS: NAD 83 UTM Zone 15N | Report |
Ground Water contamination sites | PCS: NAD 83 UTM Zone 15N | MetaData |
- All are CSV file type
- Two input datasets are different, we will demonstrate how to reproject the GCS to the same PCS as the others
In [161]:
###==================== Project spatial data from one coordinate system to another
## TOOL: arcpy.management.Project(in_dataset, out_dataset, out_coor_system, {transform_method}, {in_coor_system}, {preserve_shape}, {max_deviation}, {vertical})
# params
input_data = "MN_Wells"
out_proj= 'MN_Wells_repojected'
wkid_proj = 26915
out_cs = arcpy.SpatialReference(wkid_proj)
print("\nName of CS: ",out_cs.name, " \nType of CS: ", out_cs.type , "\nUnits are : ", out_cs.linearUnitName)
try:
arcpy.management.Project(input_data, out_proj,out_cs)
except Exception as e:
print("Error projecting the data", e)
In [ ]:
# check that new projected fc exists in the workspace
arcpy.ListFeatureClasses()
Out[ ]:
In [231]:
##===================== Select and create a list of the tables to be processed
input_folder_path = r'C:\Users\beste\Desktop\ArcPy\what_is_arcpy\01 Data'
new_file_list = set()
# use Walk to list out the names of the files in the folder
for dirpath, dirnames, filenames in arcpy.da.Walk(input_folder_path): # returns folder path, folders, and file names. Is a tuple of folder path, folder names, and files
print("Walking through this folder", dirpath)
for file in filenames:
if not file.startswith("MN_Water"):
print(file)
full_path = os.path.join(input_folder_path,file)
new_file_list.add(full_path)
print("Compiled input files", new_file_list)
In [232]:
# check columns are same for x,y fields, reformat them if not
for file in new_file_list:
df= pd.read_csv(file)
df.columns = df.columns.str.lower().str.strip() #Initially, Lat , Long casing was different, reformat all to be lowercase, trim extra spaces
print("File path: ", file,"\n" ,df.columns)
df.to_csv(file, index= False) # prevents extra index column being added
In [233]:
##======================== Convert Multiple spreadsheets to point a feature class based on x,y coordinates
# verify the workspace where points will be created
arcpy.env.workspace = gdb_path
arcpy.env.overwriteOutput=True
print("Gdb path set to: ", gdb_path)
# params
y_coords = 'latitude'
x_coords = 'longitude' # ensure longitude is x
##======================= Loop through files one-by-one and convert them to points
counter= 0
# try/except to handle errors
for file in new_file_list:
input_data = file
output_name = os.path.splitext(os.path.basename(file))[0] # get the file name , split extension from file name, and select the file name
print(output_name)
output_data = os.path.join(gdb_path,output_name)
try: # argument (sr) passed in with the key word (coordinate_system)
arcpy.management.XYTableToPoint(input_data, output_name, x_coords, y_coords, coordinate_system=sr)
print(f"Successfully Converted {output_name} to points")
counter+=1
except Exception as e:
print(f"Error converting table {output_name}" ,e)
print(f"Finished Converting {counter} tables to points")
In [234]:
# check the features that exist in the workspace
arcpy.ListFeatureClasses()
Out[234]: