- Multiple shapefile data sources were accessed programmatically online and converted to points with ArcPy, then mapped in ArcGIS Pro
- Shown below is a dashboard of these data for the latest sampling dates, with contamination data grouped by city/township
What is Arcpy?: Part 3¶
Reminder: Arcpy is a Python library for automating GIS tasks in ArcGIS Pro
- Collection of modules, functions, and classes from the ArcGIS toolbox
- It allows you to perform geoprocessing, analysis, data management, and mapping automation using Python
In [1]:
import arcpy # allows for access to ArcGIS pro geoprocessing tools and workflow automation
import os # enables interaction with local system resources (paths to folders and files)
import requests # access data from the web
import zipfile ## need this to process and extract downloaded zipfiles later on
import pandas as pd ## use to check spreadsheet formatting
initial_dir = os.getcwd() # capture the initial directory before setting the environment for arcpy
print("Initial working directory: ", initial_dir)
In [2]:
### BE SURE TO INSTALL THE MAGIC LIBRARY FIRST for the filetype validation example below:
# step 1: activate the arcgispro-py3 environment: conda activate "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3"
# step 2: install the library: pip install python-magic-bin on Windows; on Unix systems use conda install -c conda-forge python-magic
import magic ## to validate file type requested from web
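## Optional sanity check that magic imported correctly -- a minimal sketch: identify the
## leading bytes of a zip archive (the "PK\x03\x04" signature); libmagic should typically
## report "application/zip" for this buffer
print(magic.from_buffer(b"PK\x03\x04", mime=True))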
What we did last time¶
- Used the requests library to pull a shapefile dataset of groundwater contamination from MN Geospatial Commons
- Used arcpy.conversion.FeatureClassToGeodatabase() to load a list of shapefiles into a geodatabase
- Created a function that simplifies checking the existence and characteristics of our feature classes
What we will do this time:¶
- Revisit the custom function to check our gdb
- Create a new function using the requests library
- Automate download and loading of new data into our project
Create and re-use Custom Functions to Save time¶
- When you create a Python script, the functions within that script can be loaded into other scripts
- To import your script, do as you would with any Python library (a minimal sketch of such a module follows this list)
If your script is called my_script.py, then:
import my_script as ms
Now the functions within this script are available to use:
ms.my_function()
- If you would like to test these tools or follow along, download the script file here: Download
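A minimal sketch of what such a reusable module might contain (the function below is hypothetical, shown only to illustrate the pattern):

# my_script.py
import arcpy

def my_function():
    """List the feature classes in the current arcpy workspace."""
    print("Workspace: ", arcpy.env.workspace)
    for fc in arcpy.ListFeatureClasses() or []:  # guard against None when no workspace is set
        print(fc)

Any script or notebook in the same folder can then run import my_script as ms and call ms.my_function().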
In [3]:
## We have such a script to use
# So, let's load in our custom tool
import custom_arcpy_tools as cat
In [4]:
cat.listFC_dataset_Attributes()
## We purposefully caused an error
# The error is due to the workspace not being set in this new notebook or script
In [ ]:
### Don't remember the arguments for this function?
### Use function_name??
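## for example, in IPython/Jupyter (one ? shows the docstring, ?? also shows the source):
cat.listFC_dataset_Attributes??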
In [5]:
help(cat.listFC_dataset_Attributes) ## use help(function_name) to get the documentation for the function
In [6]:
## Can use magic commands in the notebook to get file paths
##
%lsmagic ## lists out all magic commands
Out[6]:
In [7]:
## will use %pwd
# to check our working directory and look for the geodatabase name
%pwd
Out[7]:
In [8]:
# list the files in the current folder
%ls
In [ ]:
## The gdb is not here
## Change the directory to the folder containing the gdb
In [9]:
%cd ./02 Result
In [10]:
# list the files again
%ls
In [ ]:
## should see a file ending with .gdb
In [11]:
## Use the gdb name in the folder, and set the path to the gdb
###====================== Set up file paths
gdb_path = r"c:\Projects\my_git_pages_website\Py-and-Sky-Labs\content\ArcPY\02 Result\MN_WaterContamination.gdb"
arcpy.env.workspace = gdb_path
wksp = arcpy.env.workspace
arcpy.env.overwriteOutput = True
In [12]:
arcpy.Exists(wksp) ## returns True if the workspace exists; similar to checking os.path.exists(gdb_path), but arcpy.Exists also understands geodatabase contents
Out[12]:
In [13]:
# Retry the function
cat.listFC_dataset_Attributes()
In [14]:
# this should confirm we have 10 feature classes
arcpy.ListFeatureClasses()
Out[14]:
In [32]:
## custom function to create a sub-folder within the folder where your script exists
### Project Folder/
##  |---Script.py
##  |---Subfolder
## check where the script is and make a folder
def get_path_mkfolder(make_folder=False, folder_name=None):
    """Get the path to your script or notebook and optionally create a folder at the same level
    rootFolder/
    |--ScriptFolder/
        |---Script.py
        |---New Folder
    Args:
        make_folder (bool, optional): Set to True to make a new folder. Defaults to False.
        folder_name (str, optional): Name of the folder to be created. Defaults to None.
    Returns:
        str: The path of the created folder, or the script directory
    """
    try:
        script_path = os.path.dirname(os.path.abspath(__file__))
        print("Running a Python script file here: ", script_path)
        if make_folder and folder_name:
            subfolder_path = os.path.join(script_path, folder_name)
            # check if the folder exists
            if not os.path.exists(subfolder_path):
                os.makedirs(subfolder_path)  # make the folder
                print("Created subfolder at: ", subfolder_path)
            else:
                print("Folder already exists: ", subfolder_path)
            return subfolder_path  # return subfolder path
        print("Script in this folder: ", script_path)
        return script_path  # return script path
    except NameError:
        # handle cases where __file__ is not defined, i.e., the user is in a notebook
        script_path = initial_dir  # fall back to the directory captured at the top of the notebook
        print("Running in a notebook here: ", script_path)
        if make_folder and folder_name:
            subfolder_path = os.path.join(script_path, folder_name)
            # check if the folder exists
            if not os.path.exists(subfolder_path):
                os.makedirs(subfolder_path)  # create the folder
                print("Created subfolder at: ", subfolder_path)
            else:
                print("Folder already exists: ", subfolder_path)
            return subfolder_path
        print("Script in this folder: ", script_path)
        return script_path
In [33]:
folder_path = get_path_mkfolder(make_folder=True, folder_name="01 Data") # "01 Data" keeps the folder name consistent with later parts if you are following along
print("Folder path to use: ", folder_path)
In [ ]:
## Now we can use this folder to store new inputs and manage the folder organization
## Here the project already had a defined folder layout, and we could have re-used the 01 Data folder to store new inputs
## rootFolder:
#   |---Script.ipynb
#   |---01 Data Folder
#   |       |--- .shp files
#   |---02 Results Folder
#   |       |--- .gdb
#   |---03 Map
#   |---04 TestFolder
In [27]:
# prefixing with ! runs a system command; cd.. & tree moves up one level and prints a tree-like view of the project folder
! cd.. & tree
In [34]:
##===================== Define the path to script and project folder
# set the folder path as the working environment, and store it for later use
try:
    arcpy.env.workspace = gdb_path
    wksp = arcpy.env.workspace
except Exception as e:
    print(f"Error setting up the path: {e}, Exception type: {type(e).__name__}")  # {e} prints the error description; type(e).__name__ gives its category
print("Working environment is here: ", wksp)
In [35]:
## Modify the env attribute to allow overwriting output files
arcpy.env.overwriteOutput = True
Data Sources:¶
- Geography: MN
Input Name | File types | Coordinate System | Source |
---|---|---|---|
MN PFAS levels | 1 Excel file | GCS: NAD 83 | Report |
MN Groundwater contamination atlas | 6 shapefiles | PCS: NAD 83 UTM Zone 15N | MetaData |
MN Cities and townships | 1 shapefile | PCS: NAD 83 UTM Zone 15N | MetaData |
- We have already handled the first two datasets and their sources in the previous walkthrough
- Now we will retrieve the cities and townships data
Get online data with the Requests library¶
requests.get(url, params=None, **kwargs)
¶
- Required argument: url (the url to send the request to)
- params (optional, defaults to None): lets you pass query parameters that filter, modify, or sort the data you're requesting
- response = requests.get("https://some.example.com/data", params={key: value})
- **kwargs: keyword arguments, a way to pass additional named arguments to the request function (collected into a dictionary inside it)
- can pass optional keyword arguments such as headers and timeout
- Use-case: make customized requests to web pages or APIs, or download files
- Example request with parameters (a runnable sketch follows below):
- response = requests.get("https://some.example.com/data", params={"State": "Minnesota"}, headers={"User-Agent": "my-app"})
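As a runnable sketch (the URL below is a placeholder, not a real endpoint), with a timeout and a status check before trusting the payload:

import requests

response = requests.get(
    "https://some.example.com/data",   # placeholder URL
    params={"State": "Minnesota"},     # appended to the url as ?State=Minnesota
    headers={"User-Agent": "my-app"},  # optional keyword argument
    timeout=30,                        # fail fast instead of hanging indefinitely
)
if response.ok:                        # True for status codes below 400
    print(response.headers.get("Content-Type"))
else:
    print("Request failed with status: ", response.status_code)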
In [ ]:
###=================== Automate shapefile dataset download: access, download, and store a new input shapefile dataset from a url
# Of course, we can always navigate to MN Geocommons or elsewhere, click, download, and place the zipfile into our data input folder
# But we would have to do this every time: creating folders, extracting files, navigating to the extracted folder
# The downloadShapefile function below accomplishes all of this automatically using only the target url and an optional target folder
# Note: it re-uses our previous function to get the script path and uses that to create the target folder for storing data
def downloadShapefile(url, target_folder=None):
    """Access and download a shapefile from a url pointing to the shapefile
    Args:
        url (str): Url of the shapefile. Uses the requests library to access the file header first;
            avoids streaming all content into memory when the file is above 200 megabytes,
            then extracts the downloaded zip file to a local path
        target_folder (str, optional): Path to the folder where files should be stored.
            Defaults to "01 Data", which is created in the same folder where the script exists
    Returns:
        tuple(str, str): Path to the folder of extracted files and path to the downloaded zip file,
            or (None, None) on failure
    """
    try:
        # Set up folders where data will be stored
        data_folder = target_folder if target_folder else get_path_mkfolder(True, "01 Data")
        print("Subfolder will be created within: ", data_folder)
        # Create folders for the zipfile download and to hold the extracted files
        file_path = os.path.join(data_folder, r"ShapeFile_Inputs\downloadedData.zip")
        extracted_zipFolder = os.path.join(data_folder, r"ShapeFile_Inputs\ExtractedZip")
        # Check the folders exist
        if not os.path.exists(os.path.dirname(file_path)):
            os.makedirs(os.path.dirname(file_path))  # make the folder to hold the download if it does not exist
        print("Set the path for the download to: ", file_path)
        if not os.path.exists(extracted_zipFolder):
            os.makedirs(extracted_zipFolder)  # make the folder to hold the extracted files if it does not exist
            print("Set the path for extracted zip files to: ", extracted_zipFolder)
        else:
            print("Extracted zip folder exists, please validate the output exists: ", extracted_zipFolder)
        # get the file size from the response headers, if provided
        response_head = requests.head(url)
        if 'Content-Length' in response_head.headers:
            file_size = response_head.headers['Content-Length']
            file_size_mb = round(int(file_size) / 1000000)  # convert bytes to megabytes
            print(f"Shapefile Data File size ~: {file_size_mb} megabytes")
        else:
            file_size_mb = None  # size unknown; stream the download to be safe
            print("Content-Length header not found in response. Proceeding with download")
        # check the file size
        if file_size_mb is not None and file_size_mb < 200:
            # Download the file in one request
            response = requests.get(url)
            with open(file_path, "wb") as f:
                f.write(response.content)
        else:  # Download the file in chunks
            response = requests.get(url, stream=True)
            with open(file_path, "wb") as f:
                for chunk in response.iter_content(chunk_size=65536):  # chunk size to load into memory from the response
                    if chunk:
                        f.write(chunk)
        print("Download Successful")
        ## Validate the filetype is zip
        if magic.from_file(file_path, mime=True) == "application/zip":  # checks the filetype
            print("Extracting data from zip file...")
            # read the zip file
            with zipfile.ZipFile(file_path, "r") as zip_f:
                zip_f.extractall(extracted_zipFolder)  # extract the zipfile to the target folder
            print("Extraction complete. Files extracted to", extracted_zipFolder)
        else:
            print("Error: Downloaded file is not a valid zip file")
            return None, None
        return extracted_zipFolder, file_path  # note it returns a tuple; index [0] for the extracted-files folder
    # handle url request errors
    except requests.exceptions.RequestException as e:
        print(f"Error occurred during the download: {e}")
        return None, None
    # handle general errors
    except Exception as e:
        print("Unexpected error", e, type(e).__name__)
        return None, None
In [37]:
url =r"https://resources.gisdata.mn.gov/pub/gdrs/data/pub/us_mn_state_dot/bdry_mn_city_township_unorg/shp_bdry_mn_city_township_unorg.zip"
# returns a tuple of paths that are useful for easily accessing the extracted shapefiles and zipfile download itself
extracted_zipFolder = downloadShapefile(url)[0] # take the first item in the tuple to store the location of the shapefiles
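## Alternatively, unpack both returned paths at once:
##     extracted_zipFolder, zip_path = downloadShapefile(url)
## zip_path would keep a handle on the downloaded archive itself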
In [38]:
print(extracted_zipFolder)
In [ ]:
# We now have the dataset stored locally; if this is needed regularly, we could schedule the download to run automatically
# example: download the shapefile dataset from a url weekly or daily (see the sketch below)
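## A minimal sketch, assuming the third-party `schedule` package (pip install schedule);
## note: this loop blocks the notebook, so it is best placed in a standalone script
import schedule, time
schedule.every().monday.at("06:00").do(downloadShapefile, url)  # queue a weekly download job
while True:
    schedule.run_pending()  # run any job that is due
    time.sleep(60)  # check once a minute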
In [40]:
# quick reminder of where our current workspace is set
print("Data folder is here: ", wksp)
# let's call it gdb for clarity
gdb = wksp
In [ ]:
## Search the ArcGIS Pro documentation without leaving the notebook, e.g. for a tool like arcpy.conversion.FeatureClassToGeodatabase()
from IPython.display import IFrame # a module for controlling notebook outputs; the IFrame function lets you embed images, video, and webpages
# ArcGIS Pro documentation URL for a specific tool (the XY Table To Point page is embedded here)
tool_url = "https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/xy-table-to-point.htm"
# Display the documentation inside Jupyter Notebook
IFrame(tool_url, width="100%", height="600px") # an iframe can display local or online webpages, documents, reports, visualizations, videos
In [41]:
###========================== Load all the shapefiles into a GDB
def loadShapefilesToGDB(gdb_path, shapefile_inputs):
    """Loads all shapefiles in a folder into a geodatabase
    Args:
        gdb_path (str): Path to the geodatabase
        shapefile_inputs (str): Path to the folder holding the shapefiles
    Returns:
        list: Returns a list of shapefile paths that were used for loading into the GDB, or an empty list if no shapefiles were processed
    """
    try:
        # validate the provided path to the shapefile folder (e.g., the extracted zip folder)
        if not shapefile_inputs or not os.path.exists(shapefile_inputs):
            print("Invalid path to shapefile folder provided")
            return []  # return empty list
        # Loop through the folder with extracted files and compile a list of all the shapefile paths
        shapefile_list = []  # start an empty list to hold .shp files
        for f in os.listdir(shapefile_inputs):
            if f.endswith(".shp"):
                full_shp_path = os.path.join(shapefile_inputs, f)  # get the full path for each file
                shapefile_list.append(full_shp_path)  # add to the list
        ###================================ Batch convert shapefiles to feature classes in the gdb
        # check that the list of .shp files is not empty
        if not shapefile_list:
            print("No shapefiles found for extraction")
            return []
        else:
            ## Load the shapefiles into the gdb
            ### TOOL: arcpy.conversion.FeatureClassToGeodatabase(Input_Features, Output_Geodatabase)
            arcpy.conversion.FeatureClassToGeodatabase(shapefile_list, gdb_path)
            print(f"Successfully loaded shapefiles into {gdb_path}")
            return shapefile_list
    except arcpy.ExecuteError:
        # catch geoprocessing errors first, since ExecuteError is also an Exception
        print("Arcpy Error: ", arcpy.GetMessages(2))
    except Exception as e:
        print("Error loading shapefiles into the GDB ", e, type(e).__name__)
In [42]:
# we do not need the input shapefile paths here, but the list is captured for demonstration
loaded_shapefiles = loadShapefilesToGDB(wksp, extracted_zipFolder)
print("These files were loaded into the GDB", loaded_shapefiles)
In [43]:
## Check the gdb
arcpy.ListFeatureClasses()
Out[43]:
In [44]:
## Recall we can use the Describe object to access properties of a feature class, like its spatial reference
## Get all the feature classes from the gdb
list_fc = arcpy.ListFeatureClasses()
for fc in list_fc:
    desc = arcpy.Describe(fc)
    sr = desc.spatialReference
    print(f"Feature class: {fc}, Spatial Reference name: {sr.name}, Spatial Ref Type: {sr.type}, Geometry: {desc.shapeType}, WKID: {sr.factoryCode}")
    print("-" * 150, "\n")  # add a line and space between prints
In [45]:
# We already have a custom function that gathers key feature class attributes and prints the workspace, which avoids repetitive coding
cat.listFC_dataset_Attributes()