Introduction to List Comprehensions¶
- Definition
- List comprehensions are a concise and efficient way to create lists in Python
- They provide a syntactically elegant method to perform operations and apply conditions to iterables, allowing the creation or transformation of lists in a single line of code
Why Do We Use List Comprehensions?:
- Conciseness: Reduces the amount of code needed compared to traditional loops, making the code cleaner and easier to read
- Performance: Generally faster than equivalent for loops due to optimized implementation and reduced overhead in Python
- Expressiveness: Allows the code to be more descriptive and focused on the operation itself, rather than the mechanics of looping and appending to lists
- Versatility: Capable of incorporating conditional logic within the list creation, which lets you filter elements or apply complex transformations easily
- Key point: Use List comprehensions to transform or extract data
Syntax:
- Basic Structure: A list comprehension consists of brackets containing an expression followed by a for clause
- Optionally, it can include one or more for or if clauses.
Generalized Examples:¶
[expression for item in iterable]
[expression for item in iterable if condition]
[expression for item in iterable if condition1 if condition2]
[expression for item in iterable1 for item2 in iterable2]
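Of these forms, the two-for (nested) version is the least intuitive and is not demonstrated later in this notebook; a minimal sketch of how the clauses nest (the values are illustrative):
```
# nested for clauses: the left-most for is the outer loop
pairs = [(x, y) for x in [1, 2, 3] for y in ['a', 'b']]
print(pairs)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
```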
In [1]:
import pandas as pd
import os
Basic List Comprehension¶
In [ ]:
# basic list comprehension for squaring numbers and creating a list
# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (1) (2) (3)
squares = [x**2 for x in range(10)]
print(squares)
In [4]:
# What happened?
# The list comprehension iterated over the range of numbers from 0 to 9, squaring each number and storing it in a list
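For comparison, here is the same result written as a traditional loop; a minimal sketch of the loop-and-append pattern the comprehension replaces:
```
# equivalent traditional loop: create an empty list, then append each squared value
squares = []
for x in range(10):
    squares.append(x**2)
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```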
Conditional List Comprehension¶
In [ ]:
# Generate a list of the squares of the even numbers between 1 and 20
# list comprehension with a condition
# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (4) condition
# (1) (2) (3) (4)
evens = [x**2 for x in range(1, 21) if x % 2 == 0]
print(evens)
# print(sqrt{evens})
In [ ]:
even = [x if x % 2 == 0 else 'not even' for x in range(1, 21)]
even
In [ ]:
# List comprehension that filters results based on membership in predefined list
magic_nums = [1,2,7,8]
mylist_magnum = [x**2 for x in range(10) if x in magic_nums]
print(magic_nums)
print(mylist_magnum)
In [ ]:
# Create a list of tuples with numbers and their squares
numbers_and_squares = [(x,x**2) for x in range(10)]
print(numbers_and_squares)
numbers_and_squares.append((10,100))
print(numbers_and_squares)
# use the pop() method to remove and store an item
# remove item at index 0 and store it in a variable
one_item = numbers_and_squares.pop(0)
print(one_item)
print(numbers_and_squares)
In [ ]:
# Create a list of tuples with numbers and their squares
numbers_and_squares = [(x,x**2) for x in range(10)]
print(numbers_and_squares)
numbers_and_squares.append((10,100))
print(numbers_and_squares)
# use delete statement to remove items without storing them
# can accept a slice to delete a range of items
# remove item at index 0 without storing it
del numbers_and_squares[0]
print(numbers_and_squares)
In [ ]:
# Create a list of tuples with numbers and their squares
numbers_and_squares = [(x,x**2) for x in range(10)]
print(numbers_and_squares)
numbers_and_squares.append((10,100))
print(numbers_and_squares)
# Remove items at indices 0 to 4 (inclusive of 0, exclusive of 5)
del numbers_and_squares[0:5]
print(numbers_and_squares)
Extracting data using list comprehensions¶
In [ ]:
# Extract values that meet a certain criteria
# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (4) condition
test_scores= [50,60,65,98,91,85,100]
# (1) (2) (3) (4)
passing_grades = [x for x in test_scores if x >60]
print(passing_grades)
In [ ]:
#### Extract a single item
# generalized form for extracting single elements from a list based on criteria
""" list_data =
[element['key1'] for element in list if element['key2']>x]
"""
# List comprehension to extract names from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
{'name': 'John', 'age': 28},
{'name': 'Anna', 'age': 20},
{'name': 'James', 'age': 18},
{'name': 'Linda', 'age': 30}
]
# access a specific key's value for each dictionary in the list of dictionaries
adults_info = [person['name'] for person in people_data if person['age']>21]
print(adults_info)
In [ ]:
del adults_info[0]
print(adults_info)
In [ ]:
# List comprehension to extract tuple of names and age from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
{'name': 'John', 'age': 28},
{'name': 'Anna', 'age': 20},
{'name': 'James', 'age': 18},
{'name': 'Linda', 'age': 30}
]
# access specific pair key values as a tuple() for each dictionary in list of dictionaries
adults_info = [(person['name'], person['age']) for person in people_data if person['age']>21]
print(adults_info)
In [ ]:
del adults_info[0]
print(adults_info)
In [ ]:
# List comprehension to extract tuple of names and age from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
{'name': 'John', 'age': 28},
{'name': 'Anna', 'age': 20},
{'name': 'James', 'age': 18},
{'name': 'Linda', 'age': 30}
]
adults_info = [(person['name'], person['age']) for person in people_data if person['age']>21]
print(adults_info)
In [ ]:
# use pop() and append() list methods to reorganize the list
first_adult=adults_info.pop(0)
print(first_adult)
print(adults_info)
adults_info.append(first_adult)
print(adults_info)
Understanding DataFrame Iteration with iterrows()¶
1) Introduction to DataFrame Iteration
- Why Iteration? Iteration over DataFrames is commonly needed when each row of data must be processed individually
- While vectorized operations are preferred for performance, iteration is useful for complex operations that aren't easily vectorized or when debugging row by row
2) Using iterrows():
- Definition: iterrows() is a generator that iterates over the rows of a DataFrame
- It allows you to loop through each row of the DataFrame, with the row returned as a Series object
- Yields a tuple for each row in the DataFrame as (index, Series) pairs
Syntax:
- index: the index label of the row in the DataFrame
- row: a Series containing the row data
- iterrows(): a generator that iterates over the rows
- row['column_name']: accesses data in a specific column for that row
Example:¶
```
for index, row in df.iterrows():
    row['column_name']
```
Data cleaning with iterrows()¶
In [ ]:
# iterating over rows with iterrows():
# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': ['28',22,'35a']}
sdf= pd.DataFrame(data)
# iterate over the dataframe
# for each row number , extract the row data in the dataframe, repeat for all rows
for index, row in sdf.iterrows():
    # strip leading/trailing white space from the name
    sdf.at[index, 'Name'] = row['Name'].strip()
    # Check if the row 'Age' is a string
    if isinstance(row['Age'], str):
        print(f"String data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
        # Check whether numeric data was entered as a string: isdigit() returns False if it contains letters or special characters
        if row['Age'].isdigit():
            sdf.at[index, 'Age'] = int(row['Age'])  # True if every character is a digit 0-9, so convert to integer
            print(f"Cleaned data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
        else:
            sdf.at[index, 'Age'] = pd.NA
            print(f"Uncleaned data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
In [ ]:
import pandas as pd
# Create a sample dataframe
data = {
'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
'UniqueID': [101, 102, 103, 101, 102],
'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)
# initiate an empty set that will only hold unique IDs
unique_userId= set()
# iterate over the dataframe
for index, row in df.iterrows():
    # Check if the current row's UniqueID is already in the set
    if row['UniqueID'] in unique_userId:
        # At this row, create a new column and mark the row as a duplicate
        df.at[index, 'Duplicates'] = True
    else:
        # Add the current row's unique ID to the set
        unique_userId.add(row['UniqueID'])
        # Mark the row as False for Duplicates
        df.at[index, 'Duplicates'] = False
print(df)
In [ ]:
# remove duplicate
df= df[df['Duplicates']==False]
print(df)
In [ ]:
df=df.drop(columns='Duplicates')
print(df)
In [ ]:
#duplicated
# Create a sample dataframe
data = {
'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
'UniqueID': [101, 102, 103, 101, 102],
'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)
# duplicated() method used to identify duplicates
# subset parameter specifies the column to check for duplicates
# keep parameter keeps the first occurrence and marks subsequent duplicates
df['Duplicates']= df.duplicated(subset='UniqueID', keep='first')
print(df)
In [ ]:
df = df[~df['Duplicates']]
print(df)
In [ ]:
df=df.drop(columns='Duplicates')
print(df)
In [ ]:
#duplicated
# Create a sample dataframe
data = {
'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
'UniqueID': [101, 102, 103, 101, 102],
'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)
# Count duplicates
# Group by the column you want to check for duplicates
# Use transform('size') to get the count of each group
# Assign result to a new column
df['Counts']= df.groupby('UniqueID')['UniqueID'].transform('size')
print(df)
In [ ]:
# Create a sample dataframe
data = {
'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
'UniqueID': [101, 102, 103, 101, 102],
'State': ['NY', 'CA', 'TX', 'NY', 'CA'],
'Sales': [200, 150, 300, 250, 100]
}
df = pd.DataFrame(data)
# Normalize sales within each UniqueID group: each sale divided by the group's total
# in a groupby object, lambda x represents the group's Series of values
df['NormalizedSales'] = df.groupby('UniqueID')['Sales'].transform(lambda x: x / x.sum())
print(df)
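The same normalization can be written without a lambda by dividing the Sales column by the per-group totals; a minimal sketch:
```
# divide each sale by the total sales for its UniqueID group
df['NormalizedSales'] = df['Sales'] / df.groupby('UniqueID')['Sales'].transform('sum')
print(df)
```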
Data transformation with iterrows()¶
In [ ]:
# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': [28,22,35]}
sdf= pd.DataFrame(data)
for index, row in sdf.iterrows():
    sdf.at[index, 'New Age'] = row['Age'] + 10  # add 10 years to each person's age
print(sdf)
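The same transformation is normally done as a single vectorized column operation rather than row by row; a minimal sketch:
```
# vectorized equivalent: operate on the whole column at once
sdf['New Age'] = sdf['Age'] + 10
print(sdf)
```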
Multi-column Conditional Flagging or Computation with iterrows()¶
In [ ]:
# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': [28,22,35]}
sdf= pd.DataFrame(data)
for index, row in sdf.iterrows():
    if row['Age'] < 30 and "J" in row['Name']:
        sdf.at[index, 'Category'] = 'Young J'
    else:
        sdf.at[index, 'Category'] = 'Other'
print(sdf)
In [ ]:
for index, row in sdf.iterrows():
    if row['Age'] > 30 and 'a' in row['Name']:
        sdf.at[index, 'Flag'] = True
    else:
        sdf.at[index, 'Flag'] = False
print(sdf)
Mark specific rows with iterrows()¶
In [ ]:
for index, row in sdf.iterrows():
    if row['Name'].startswith('J') and row['Age'] > 25:
        sdf.at[index, 'Status'] = 'Senior J'
print(sdf)
Combine List Comprehension and iterrows()¶
- extract a specific list from a dataframe
In [2]:
# Load a sample dataset to demonstrate application of list comprehensions and iterrows()
# this is a dataset of tornadoes occurring in the state of Minnesota in recent years
df = pd.read_csv(r".\Data\storm_data_search_results.csv")
# Set the option to display all columns
pd.set_option('display.max_columns', None)
df.head()
Out[2]:
In [3]:
df.info()
In [9]:
df.groupby("TOR_F_SCALE")["EVENT_ID"].size()
Out[9]:
In [10]:
# List comprehension to extract coordinates of EF0 tornado events
ef0_coordinates = [(row['BEGIN_LAT'], row['BEGIN_LON']) for index, row in df.iterrows() if row['TOR_F_SCALE'] == 'EF0']
print(ef0_coordinates)
print(len(ef0_coordinates))
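The same coordinates can also be pulled with boolean indexing instead of iterrows(); a minimal sketch using the column names shown above:
```
# filter rows for EF0 events, keep only the coordinate columns, and convert to a list of plain tuples
ef0_subset = df.loc[df['TOR_F_SCALE'] == 'EF0', ['BEGIN_LAT', 'BEGIN_LON']]
ef0_coordinates_alt = list(ef0_subset.itertuples(index=False, name=None))
print(len(ef0_coordinates_alt))
```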
In [5]:
coord_df= pd.DataFrame(ef0_coordinates)
coord_df
Out[5]:
In [6]:
# List comprehension to extract coordinates of EF0 tornado events
ef0_coordinates_c = [(row["CZ_NAME_STR"], row['BEGIN_LAT'], row['BEGIN_LON']) for index, row in df.iterrows() if row['TOR_F_SCALE'] == 'EF0']
print(ef0_coordinates_c)
In [8]:
df_cc=pd.DataFrame(ef0_coordinates_c, columns=["County", "Lat", "Lon"])
df_cc
Out[8]:
Converting data into DataFrame-compatible format¶
A DataFrame can be readily created from the following data structures:
List of tuples:
- Structure: data = [(1, 'Alice'), ...] OR data = [(1, ('Alice', 25)), ...]
- each tuple is a row; each element within the tuple (a, b) becomes a column
- convert through assignment of the tuple elements to columns
- df = pd.DataFrame(data, columns=['col1', 'col2'])
List of dictionaries:
- Structure: data = [{'ID': 1, 'Name': 'Alice'}, ...]
- each dictionary {key1: value1, key2: value2} represents a row; its keys become the columns
- convert directly
- df = pd.DataFrame(data)
Dictionary of lists:
- Structure: data = {'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']}
- each key: [list] pair is a column
- convert directly
- df = pd.DataFrame(data)
List of lists:
- Structure: data = [[1, 'Alice'], [2, 'Bob'], [3, 'Charlie']]
- each inner list is a row; its elements become the columns
- convert through assignment of the list elements to columns
- df = pd.DataFrame(data, columns=['col1', 'col2'])
Dictionary of dictionaries:
- Structure: data = {'row1': {'ID': 1, 'Name': 'Alice'}, ...}
- each key: dictionary entry represents a row
- convert with from_dict and an index orientation
- df = pd.DataFrame.from_dict(data, orient='index')
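The dictionary-of-dictionaries case is the only one not demonstrated in the cells below; a minimal sketch (the data and names are illustrative):
```
import pandas as pd

# each outer key becomes a row label when orient='index'
data = {'row1': {'ID': 1, 'Name': 'Alice'}, 'row2': {'ID': 2, 'Name': 'Bob'}}
df_from_nested = pd.DataFrame.from_dict(data, orient='index')
print(df_from_nested)
```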
Convert a list of Tuples to DataFrame
In [ ]:
# Recognize that a dataframe can be readily made from a list of tuples
# each tuple is a new row, each element within the tuple is a new column
ef0_df= pd.DataFrame(ef0_coordinates, columns=['latitude', 'longitude'])
ef0_df
In [ ]:
# add a column to flag the rows that are EF0 data
ef0_df["EF0_data"]=True
ef0_df
In [ ]:
tor_data_cdf= df.copy()
column_names= {
'BEGIN_LAT': 'latitude',
'BEGIN_LON': 'longitude'
}
tor_data_cdf=tor_data_cdf.rename(columns=column_names)
merge_df = tor_data_cdf.merge(ef0_df[["latitude", "longitude", "EF0_data"]], on=["latitude", "longitude"], how='left')
merge_df
In [ ]:
len(tor_data_cdf)
In [ ]:
tor_data_cdf.groupby("TOR_F_SCALE")["TOR_F_SCALE"].size()
In [ ]:
tor_data_cdf["TOR_F_SCALE"].notna().sum()
In [ ]:
tor_data_cdf["TOR_F_SCALE"].count()
In [ ]:
merge_df=merge_df[merge_df["EF0_data"]==True]
len(merge_df)
Reading Shapefiles in Cartopy¶
- What are shapefiles?
- Common file type in GIS used to represent distinct objects
- file name ends in .shp
- Contains objects represented in vector data format
- spatial data with distinct boundaries
- examples: buildings, rivers, roads
Access records in a Shapefile with Cartopy's Reader()¶
In [ ]:
import cartopy.io.shapereader as shpreader
# read in the shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
# Use shpreader.Reader to load the shapefile
reader = shpreader.Reader(shapefile_path)
print(type(reader))
# This creates a Cartopy Reader object that can be used to access the shapefile's records
In [ ]:
# extract the shapefile record data from the reader object
records = reader.records()
print(next(records))
# access the next record (the call to next() above already consumed the first one)
rec1 = next(records)
# print its attributes
rec1.attributes
In [ ]:
# access the county Name field
print(rec1.attributes['NAME'])
# get attributes of the shapefile
# for attribute_n,attribute_v in rec1.attributes.items():
# print({attribute_n}, {attribute_v})
# access the geometry for the first record
print(rec1.geometry)
# access the coordinates only
# print(list(rec1.geometry.exterior.coords))
# access the shape for the record
rec1.geometry.geom_type
In [ ]:
# access a specific attribute of the features in the shapefile
# start a list of counties
MN_counties_list = []
# loop through the list of counties in the shapefile records
for county in reader.records():
    if county.attributes['REGION'] == 'MN':
        print(county.attributes['NAME'])
        MN_counties_list.append(county)
print(f'appended {len(MN_counties_list)} counties')
In [ ]:
print(MN_counties_list)
In [ ]:
len(MN_counties_list)
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(14,10), subplot_kw={'projection':ccrs.PlateCarree()})
# plot all geometries for all counties
counties = list(reader.geometries())
print(len(counties))
ax.add_geometries(counties, ccrs.PlateCarree(), edgecolor='black', facecolor='grey', zorder=0) # changed to show difference (use none to make clear)
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
ax.add_feature(cfeature.LAKES, zorder=3)
ax.add_feature(cfeature.RIVERS, zorder=2)
# Limit the map extent to Minnesota
# w, e, s, n bounds
ax.set_extent([-97.5, -89.5, 43.5, 49.5], crs=ccrs.PlateCarree()) # Adjust these values based on the actual coordinates of Minnesota
# ax.set_legend()
# mn_geom = []
# for county in reader.records():
# if county.attributes['REGION']=='MN':
# mn_geom.append(county.geometry)
# mn_geom = [county.geometry for county in reader.records() if county.attributes['REGION']== 'MN']
# ax.add_geometries(mn_geom, ccrs.PlateCarree(), edgecolor='black', facecolor='lightpink', zorder=1)
# Add grid lines with labels
gl = ax.gridlines(draw_labels=True, linestyle='--', color='gray')
gl.top_labels = False
gl.right_labels = False
# Plot Minnesota counties with names
for county in reader.records():
    if county.attributes["REGION"] == 'MN':
        geometry = county.geometry
        name = county.attributes['NAME']
        ax.add_geometries([geometry], ccrs.PlateCarree(), edgecolor='black', facecolor='lightpink', zorder=1)
        x, y = geometry.centroid.x, geometry.centroid.y
        ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.PlateCarree())
plt.show()
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
# define the map projection
proj= ccrs.PlateCarree()
# generate the figure and define its axes
fig, ax= plt.subplots(figsize=(14,10),subplot_kw={'projection':proj})
# extract all the geometries from the shapefile reader object
all_counties = list(reader.geometries())
# plot the county shapes on the map
ax.add_geometries(all_counties, crs= proj, edgecolor= 'none', facecolor='grey', zorder=0)
# add map base features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.RIVERS, zorder=6)
ax.add_feature(cfeature.LAKES, zorder=7)
ax.add_feature(cfeature.BORDERS, linestyle=':', zorder=3, edgecolor='black')
ax.add_feature(cfeature.STATES, linestyle='-', zorder=4, edgecolor='black')
# set the extent for the map
ax.set_extent([-97.5,-89.5, 42.5, 49.5], crs=proj)
for county in reader.records():
    if county.attributes["REGION"] == 'MN':
        geometry = county.geometry
        name = county.attributes["NAME"]
        x, y = county.geometry.centroid.x, county.geometry.centroid.y
        ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.Geodetic(), zorder=8)
        ax.add_geometries([geometry], crs=proj, edgecolor='black', facecolor='lightpink', zorder=4)
plt.show()
In [ ]:
import cartopy.feature as cfeature
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(14,10), subplot_kw={'projection':proj})
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.BORDERS)
gl = ax.gridlines(draw_labels=True, linestyle=':', color='black',zorder=3)
plt.show()
Access records in Shapefile with reader.records()¶
In [ ]:
# Load the records of the shapefile
# The records() returns a generator object that can be used to iterate over the records in the shapefile
records = reader.records()
# Print the records object
type(records) # Output: <class 'generator'>
Access record attributes in Shapefile with record.attributes.items():¶
In [ ]:
# check the first record to understand the structure
# 'records' is an iterator of shapefile records
# Use next() to get the first record from the iterator
first_record = next(records)
# first_record is a Record object that contains both geometry and attribute data
print("Attributes and Values for the first Record:")
print(f"\n{first_record}\n")
# Print the number of attributes in the first record
# is a dictionary containing the attribute names and values
print(len(first_record.attributes))
# Iterate over the dictionary items, which are the attributes, and print each attribute name and value.
for attribute_name, attribute_value in first_record.attributes.items():
    print(f"{attribute_name}: {attribute_value}")
Access record Geometry in a Shapefile with record.geometry:¶
In [ ]:
print("\nFirst Record Geometry:\n")
print(first_record.geometry)
# Print the geometry type of the first record
print("\nGeometry type:\n")
print(first_record.geometry.geom_type) # gives the type of the geometry (e.g., 'Polygon').
# Access the geometry data for the first record
geometry = first_record.geometry
# Access the coordinate points of the exterior ring
if geometry.geom_type == 'Polygon':
    # Access the coordinates of the exterior ring of the polygon
    exterior_coords = list(geometry.exterior.coords)
    print("\nExterior Coordinates:")
    for coord in exterior_coords:
        print(coord)
    print(f"\nNumber of coordinates in the exterior ring: {len(exterior_coords)}")  # number of vertices in the exterior ring of the polygon
# Handle MultiPolygon records
elif geometry.geom_type == 'MultiPolygon':
    count_coords = 0
    # Use geometry.geoms to iterate over each polygon in the MultiPolygon
    for polygon in geometry.geoms:
        count_coords += len(polygon.exterior.coords)
        print("\nExterior Coordinates of a Polygon in MultiPolygon:")
        for coord in polygon.exterior.coords:
            print(coord)
    print(f"\nTotal number of coordinates in the exterior rings of the MultiPolygon: {count_coords}")
else:
    print("Not a polygon or multipolygon")
# will print pairs of coordinates that define the shape of the polygon
# each coordinate pair (lon, lat) is a vertex of the polygon
# each vertex is a point (lon, lat) on the map
# these points connect to form the boundary that defines the shape of the polygon
Bonus 1: Convert to GeoJson and visualize the record geometry on a web map¶
- convert to geojson
- print the json string
- Visualize geometry by pasting json into this browser: https://geojson.io/#map=8.12/48.817/-121.884
In [ ]:
from shapely.geometry import mapping
import json
# Get the first record
first_record = next(records)
# Convert the geometry to GeoJSON format
geometry_geojson = mapping(first_record.geometry)
# Create a GeoJSON feature
# Wrap the geometry in a GeoJSON feature and include the attributes
geojson_feature = {
"type": "Feature",
"geometry": geometry_geojson,
"properties": first_record.attributes
}
# Convert the feature to a JSON string for easy copying into a browser
geojson_str = json.dumps(geojson_feature, indent=2)
# Print the GeoJSON string
print(geojson_str)
In [ ]:
# read in the shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
reader = shpreader.Reader(shapefile_path)
# Filter for counties in Minnesota
#List Comprehension: Creates a list in memory, which can be inefficient for large datasets
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']
# Find the specific county (Scott) in Minnesota
# Create Generator expression to find the specific county in Minnesota
# This creates a generator that yields counties matching the condition
#Generator Expression: Creates a generator that yields items one at a time, without storing the entire sequence in memory
scott_county_generator = (county for county in minnesota_counties if county.attributes['NAME'] == 'Scott')
#we use next() in conjunction with a generator expression
# next() allows us to efficiently retrieve the first (and in this case, the only) matching item without creating an intermediate list
scott_county = next(scott_county_generator, None)
# Check if the record is found
if scott_county:
    print("Scott County Geometry:\n")
    print(scott_county.geometry)  # Print the geometry
else:
    print("Scott County not found in the dataset.")
In [ ]:
# Alternative approach with two list comprehensions, less memory-efficient for large datasets
# Load the shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
reader = shpreader.Reader(shapefile_path)
# The records() method returns a generator that yields records one at a time
# Each record corresponds to a row in the shapefile's attribute table
something = reader.records()
print(type(something)) # <class 'generator'>
# Since reader.records() returns a generator (which is an iterable), you can use a list comprehension to filter or transform the records
# Filter for counties in Minnesota using list comprehension
# List comprehensions can iterate over any iterable, including generators, lists, tuples, dictionaries, sets, and objects that implement the iterator protocol
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']
# Find the specific county (Scott) in Minnesota using list comprehension and next()
# This creates a list of counties named 'Scott'
scott_county_list = [county for county in minnesota_counties if county.attributes['NAME'] == 'Scott']
# Convert the list to an iterator and use next() to get the first item
# If the list is empty, next() returns None
scott_county = next(iter(scott_county_list), None)
# Check if the record is found
if scott_county:
    print("Scott County Geometry:\n")
    print(scott_county.geometry)  # Print the geometry
else:
    print("Scott County not found in the dataset.")
In [ ]:
# Convert the geometry to GeoJSON format
geometry_geojson = mapping(scott_county.geometry)
# Create a GeoJSON feature
# Wrap the geometry in a GeoJSON feature and include the attributes
geojson_feature = {
"type": "Feature",
"geometry": geometry_geojson,
"properties": scott_county.attributes
}
# Convert the feature to a JSON string for easy copying into a browser
geojson_str = json.dumps(geojson_feature, indent=2)
# Print the GeoJSON string
print(geojson_str)
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from matplotlib import pyplot as plt
import cartopy.io.shapereader as shpreader
# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
# choose the correct coordinate reference system
proj = ccrs.PlateCarree()
# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': proj})
# Add features to the map from Natural Earth
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
ax.add_feature(cfeature.STATES, linestyle=':')
# Plot the data points
ax.scatter(merge_df ["longitude"],merge_df["latitude"], color='red', s=50, edgecolor='k')
# Load the shapefile and filter for counties in Minnesota
reader = shpreader.Reader(shapefile_path)
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']
# Plot only the filtered counties
for county in minnesota_counties:
    geometry = county.geometry
    name = county.attributes['NAME']
    ax.add_geometries([geometry], ccrs.PlateCarree(), edgecolor='black', facecolor='none')
    x, y = geometry.centroid.x, geometry.centroid.y
    ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.Geodetic())
# Limit the map extent to Minnesota
# set_extent defines a bounding box for the map using 4 values:
# [west longitude, east longitude, south latitude, north latitude]
# i.e. the range of longitudes followed by the range of latitudes (approximate state borders)
ax.set_extent([-97.5, -89.5, 43.5, 49.5], crs=proj)  # Adjust these values based on the actual coordinates of Minnesota
# Set the title
ax.set_title('EF0 Tornadoes in Minnesota from May 2016 to September 2023')
# Show the plot
plt.show()
In [ ]:
# verify the timeframe of the data
merge_df['BEGIN_DATE'] = pd.to_datetime(merge_df['BEGIN_DATE'])
merge_df["BEGIN_DATE"].sort_values(ascending=False).head(1)
In [ ]:
merge_df["BEGIN_DATE"].sort_values(ascending=True).head(1)
In [109]:
# nice plot. We see here that EF0 tornadoes impact pretty much all counties
# can we get a sense of which county had the most EF0s in this timeframe?
In [ ]:
merge_cdf= merge_df.copy()
merge_cdf["County_Counts"] = merge_cdf.groupby("CZ_NAME_STR")["EVENT_ID"].transform('count')
merge_cdf.sort_values(by="County_Counts",ascending=False)
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader
# path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
# initialize the figure and axes for the plot
fig, ax = plt.subplots(figsize=(16,10), subplot_kw= {'projection':proj})
# add features to the map
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.LAKES)
ax.add_feature(cfeature.BORDERS, linestyle=":")
ax.add_feature(cfeature.STATES, linestyle="-")
# Plot the data points, using the county count column to determine the size of the markers
ax.scatter(merge_cdf["longitude"], merge_cdf["latitude"], s=merge_cdf["County_Counts"]*10, color='red', alpha=0.5 , edgecolor= 'k', zorder=1)
# load the shapefile and filter to MN counties
reader= shpreader.Reader(shapefile_path)
minnesota_counties=[county for county in reader.records() if county.attributes['REGION']=='MN']
# plot only the filtered counties
for county in minnesota_counties:
    geometry = county.geometry
    name = county.attributes['NAME']
    ax.add_geometries([geometry], proj, edgecolor='black', facecolor='none')
    x, y = geometry.centroid.x, geometry.centroid.y
    ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.Geodetic())
# set the borders to MN
ax.set_extent([-97.5, -89.5, 43.5 , 49.5], crs=proj)
# Set the title
ax.set_title("EF0 Tornadoes in Minnesota May 2016 to September 2023")
plt.show()
In [ ]:
merge_cdf["CZ_NAME_STR"].value_counts().head(10)
In [ ]:
# make a unique county name list
unique_county_names= merge_cdf["CZ_NAME_STR"].unique()
plt_data = merge_cdf.groupby("CZ_NAME_STR").first().reset_index()
plt_data
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader
# path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
# initialize the figure and axes for the plot
fig, ax = plt.subplots(figsize=(16,10), subplot_kw= {'projection':proj})
# add features to the map
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.LAKES)
ax.add_feature(cfeature.BORDERS, linestyle=":")
ax.add_feature(cfeature.STATES, linestyle="-")
# Plot the data points, using the county count column to determine the size of the markers
ax.scatter(plt_data["longitude"], plt_data["latitude"], s=plt_data["County_Counts"]*10, color='red', alpha=0.5 , edgecolor= 'k', zorder=1)
# load the shapefile and filter to MN counties
reader= shpreader.Reader(shapefile_path)
minnesota_counties=[county for county in reader.records() if county.attributes['REGION']=='MN']
# plot only the filtered counties
for county in minnesota_counties:
    geometry = county.geometry
    name = county.attributes['NAME']
    ax.add_geometries([geometry], proj, edgecolor='black', facecolor='none')
    x, y = geometry.centroid.x, geometry.centroid.y
    ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.Geodetic())
# set the borders to MN
ax.set_extent([-97.5, -89.5, 43.5 , 49.5], crs=proj)
# Set the title
ax.set_title("EF0 Tornadoes in Minnesota May 2016 to September 2023")
plt.show()
Introduction to Dictionary Comprehensions¶
- Definition
- Dictionary comprehensions are a concise and efficient way to create dictionaries in Python
- Similar to list comprehensions, provide an elegant way to perform operations and apply conditions to iterables,
- Specifically allow for the creation or transformation of dictionary key-value pairs in a single line of code
Why Do We Use Dictionary Comprehensions?:
- Conciseness: Reduces the complexity and amount of code compared to traditional loops for creating dictionaries, making the code more readable
- Performance: Generally faster than using a loop to add items to a dictionary due to optimized implementation and reduced overhead
- Expressiveness: Enhances code clarity by focusing on the dictionary creation logic rather than the mechanics of looping and inserting key-value pairs
- Versatility: Capable of incorporating conditional logic and multiple sources, allowing for sophisticated transformations and filtering in dictionary creation
- Key point: Use dictionary comprehensions to efficiently transform or map data into key-value pairs
Syntax:
- Basic Structure: A dictionary comprehension consists of curly braces
{}
containing a key-value pair expression followed by a for clause - Optionally, it can include one or more for or if clauses
Generalized Examples:¶
{key_expr: value_expr for item in iterable}
{key_expr: value_expr for item in iterable if condition}
{key_expr: value_expr for item in iterable if condition1 if condition2}
{key_expr: value_expr for item in iterable1 for item2 in iterable2}
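As with list comprehensions, the multi-condition and nested-for forms are easiest to see with a small example; a minimal sketch (the values are illustrative):
```
# two if clauses act like a logical AND; here: keep numbers divisible by both 2 and 3
div_six = {x: x**2 for x in range(1, 20) if x % 2 == 0 if x % 3 == 0}
print(div_six)  # {6: 36, 12: 144, 18: 324}

# two for clauses form nested loops; keys must stay unique, so combine both loop variables in the key
grid = {(x, y): x * y for x in range(1, 3) for y in range(1, 3)}
print(grid)  # {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 4}
```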
In [ ]:
# quick recap how to manipulate dictionary
from datetime import datetime
from pprint import pprint # pretty print, readable format for dictionaries
# create an empty dictionary for products and their metadata
products_dict = {}
# add to the dictionary
# This is a nested dictionary, where the key is product ID and the value is another dictionary
products_dict['0001-2024']= {
'name':'apple',
'amount':2,
'date': datetime.now().date().strftime('%Y%m%d')}
# display the dictionary
pprint(products_dict)
products_dict['0002-2024'] = {'name': 'banana'}
# display each item dictionary entry along with labels
for productID, metadata in products_dict.items():
    pprint(f"Product ID: {productID}, Metadata: {metadata}")
# print the number of items
print(f"Number of items: {len(products_dict)}")
# update an existing entry
# allows you to add new key value pairs or update existing ones
products_dict['0002-2024'].update({'amount':10, 'date': datetime.now().strftime('%Y%m%d')})
# Display the items
pprint(products_dict.items())
pprint(list(products_dict.items()))
products_dict.update({'0003-2024':{'name': 'mangos','amount':100, 'date': datetime.now().strftime('%Y%m%d')}})
# Display the items
print(f"Number of items: {len(products_dict)}")
# display the dictionary
pprint(products_dict)
print()
counter=0
# List out the final nested dictionary of products
print("Final product list:")
for productID, metadata in products_dict.items():
    pprint(f"Product ID {counter+1}: {productID}, Metadata: {metadata}")
    counter += 1
Basic Dictionary Comprehension¶
In [ ]:
# Make a dictionary where the keys are numbers and values are their squares
# Basic components of dictionary comprehension
# (1) key-value expression
# (2) item
# (3) iterable
#(1) (2) (3)
squares = {x:x**2 for x in range(1,10)}
print(squares)
Conditional Dictionary Comprehension¶
In [ ]:
# Make a dictionary of even numbers and their squares
even_squares = {x:x**2 for x in range(1,10) if x%2==0}
print(even_squares)
Using Functions in Dictionary Comprehension¶
In [ ]:
# Create a dictionary that maps each word in a list to its length
# suppose you start with a list of words
word_list = ["apple", "banana", "cherry"]
word_length_dict= {word:len(word) for word in word_list}
print(word_length_dict)
Using a dataframe in a Dictionary Comprehension¶
In [11]:
import pandas as pd
# Create a dictionary that maps each word in a column of a dataframe to its length
# You start with a dataframe
data = {'Words':(["apple", "banana", "cherry"])*2} # Duplicate the list to increase the number of items
print(data)
df = pd.DataFrame(data)
print(df)
# use a dictionary comprehension directly on the dataframe column to map each word in the column to its length
word_length_dict = {word: len(word) for word in df['Words']}
print(word_length_dict)
# add a count to the same dataframe as a new column
df["Word Counts"]=df.groupby("Words")["Words"].transform("count")
print(f"\n{df}\n")
# Map the lengths from the dictionary to the key names in the column Words
df["Word Lengths"] = df["Words"].map(word_length_dict)
print(f"\n{df}\n")
In [12]:
# Read HTML tables using the lxml parser
counties_list = pd.read_html(
"https://en.wikipedia.org/wiki/List_of_counties_in_Minnesota"
)
In [13]:
counties_list=counties_list[0]
In [14]:
counties_list
# we know MN has 87 counties. Now we have the full dataset
Out[14]:
In [45]:
import pandas as pd
import re
# Sample data
data = pd.DataFrame({
'Area': ['757.96 sq mi (1,963 km2)', '123.45 sq mi (320 km2)', '678.90 sq mi (1,760 km2)']
})
# Function to extract the numeric part
def extract_numeric(area_str):
    match = re.match(r'^\d+(\.\d+)?', area_str)
    return float(match.group()) if match else None
# Apply the function to the 'Area' column
data['Area_Sq_Mi'] = data['Area'].apply(extract_numeric)
print(data)
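The next cell ranks counties_list["Area_Sq_Mi"], so the same extraction presumably has to be applied to the scraped Wikipedia table first; a minimal sketch, assuming the scraped area column is named 'Area' (check counties_list.columns for the actual header):
```
# apply the same numeric extraction to the scraped counties table before ranking
# NOTE: 'Area' is an assumed column name; the Wikipedia table header may differ
counties_list['Area_Sq_Mi'] = counties_list['Area'].astype(str).apply(extract_numeric)
```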
In [46]:
largest_counties=counties_list["Area_Sq_Mi"].nlargest()
smallest_counties=counties_list["Area_Sq_Mi"].nsmallest()
print(largest_counties)
print(smallest_counties)
Create a list of county coordinates with Geopy¶
Geocoding with Nominatim via Geopy¶
- Geocoding is the process of converting addresses (like "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates (like latitude 37.423021 and longitude -122.083739)
- can be used to place markers on a map, or to position the map
Capabilities of Nominatim (Geopy):
- Address Geocoding: Converts street addresses or other descriptive locations into geographic coordinates.
- Reverse Geocoding: Converts geographic coordinates into a human-readable address.
- Extensive Coverage: Utilizes OpenStreetMap data, providing global coverage often with fine-grained control over geocoding queries.
- Customization Options: Allows customization of requests, including specifying the language of the result, the bounding box for constraining searches, and more.
Syntax for Geocoding and Reverse Geocoding
1) Geocoding (Address to Coordinates)
- Initialization: Create a Nominatim object with a user-defined user_agent
- Query: Use the .geocode() method with the address as a string.
2) Reverse Geocoding (Coordinates to Address)
- Initialization: Create a Nominatim object with a user-defined user_agent
- Query: Use the .reverse() method with a string in the format "latitude, longitude".
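A bare-bones version of both calls before they are wrapped in functions below; a minimal sketch (the user_agent string is arbitrary but required, and the query address is illustrative):
```
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="geocode_example")
# geocoding: address string -> Location object with .latitude and .longitude
location = geolocator.geocode("Scott County, Minnesota")
print(location.latitude, location.longitude)
# reverse geocoding: "latitude, longitude" string -> human-readable address
print(geolocator.reverse(f"{location.latitude}, {location.longitude}").address)
```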
In [66]:
from geopy.geocoders import Nominatim
import requests
geolocator = Nominatim(user_agent="geocode_Address")
def getAddress_coords(address):
    location = geolocator.geocode(address)
    if location:
        latitude, longitude = location.latitude, location.longitude
        print(location)
        # Get elevation in meters
        elevation_url = f"https://api.open-elevation.com/api/v1/lookup?locations={latitude},{longitude}"
        response = requests.get(elevation_url)
        elevation_data = response.json()
        print(elevation_data)
        elevation_meters = elevation_data['results'][0]['elevation'] if 'results' in elevation_data else None
        print(elevation_meters)
        # convert elevation to feet
        elevation_feet = elevation_meters * 3.28084 if elevation_meters is not None else None
        print(elevation_feet)
        return (latitude, longitude, elevation_feet)
    else:
        print("Address Not Found, coordinates will be blank")
        return (None, None, None)
In [ ]:
geocode_result = getAddress_coords("5057 Edgewater Court, Savage, MN")
print(geocode_result)
In [ ]:
from geopy.geocoders import Nominatim
#initialize geocoder, Nominatim object
geolocator= Nominatim(user_agent= "geocode_Address")
def getAddress_coords(address):
    location = geolocator.geocode(address)
    if location:
        print(location)
        return (location.latitude, location.longitude)
    else:
        print("Address Not Found, coordinates will be blank")
        return (None, None)
geocode_result = getAddress_coords("5057 Edgewater Court, Savage, MN")
print(geocode_result)
In [ ]:
geocode_result = getAddress_coords("Murphy-Hanrehan Park Reserve, Savage, MN")
print(geocode_result)
In [ ]:
geolocator= Nominatim(user_agent= "geocode_Address")
def getAddress(coords):
    location = geolocator.reverse(coords)
    if location:
        return location.address
    else:
        print("Address Not Found, coordinates will be blank")
geocode_Address_result = getAddress(geocode_result)
print(geocode_Address_result)
In [18]:
# Combine retrieval of external data from geocoding service with a dictionary comprehension of a dataframe column
# Create a dictionary that maps counties to their coordinates
from geopy.geocoders import Nominatim
# initialize geocoder
geolocator= Nominatim(user_agent= "geoapiExercise")
def get_lat_lon(county):
    # Append ", Minnesota" to ensure the geocoding query is localized
    location = geolocator.geocode(county + ", Minnesota")
    if location:
        return (location.latitude, location.longitude)
    else:
        return (None, None)
# Dictionary comprehension that maps a dataframe column of county names to their lat, lon coordinates
# the comprehension calls the function to get the coordinate value for each key (county name) taken from the dataframe column
coordinates_list = {county: get_lat_lon(county) for county in counties_list["County"]}
In [ ]:
# extract county names
# county_names= counties_list["County"]
# #print the names of the county
# print(county_names)
# print(county_names.dtype)
# print(f"The dataset is an {type(county_names)}")
# print(f"The data value in the dataset is an {county_names.dtype}")
# print(county_names.index.to_list())
In [29]:
coordinates_list
Out[29]:
Convert dictionary into list of tuples¶
- will be an iterable in this format: (a, b) OR (a, (b, c))
- can loop through these as key, value pairs
In [20]:
type(coordinates_list)
Out[20]:
In [21]:
coordinates_list.items() # converts dictionary to an iterable of tuples (key,value)
Out[21]:
In [30]:
# items() method of dictionaries returns an iterable of tuples
# each tuple consist of key-value pairs from the dictionary
type(coordinates_list.items()) # this dict_items object is an iterable
Out[30]:
In [43]:
# because this is an iterable , we can use it in a loop to access its elements
#...OR convert it to other iterables like lists that are often required for further data processing
for county, data in coordinates_list.items():
    print(county, data)  # prints each county and its associated data
In [38]:
# using items method on a dictionary returns a list of tuple pairs
for county, data in coordinates_list.items():
    if county == 'Scott County':
        print(county, data)  # prints the county and its associated data
# we do not see the outer parentheses because we unpack the key and value into county and data separately
In [ ]:
import pandas as pd
# recall that dataframes can be made from list of tuples
list_t= [('apples', (100, 2)), ('pears', (20,3))]
t_df= pd.DataFrame(list_t, columns=['Fruit', 'Data'])
t_df
# Critical to recognize that a dictionary can be converted to a list of tuples,
# because pandas DataFrames can be constructed efficiently from lists of tuples:
# each tuple is a row and each element of the tuple is a column
In [ ]:
# list of tuples in the form [(a, (x, y)), ...]
list_t= [('apples', (100, 2)), ('pears', (20,3))]
# convert to dataframe assigning each tuple element to a column
t_df= pd.DataFrame(list_t, columns=['Fruit', 'Data'])
print(t_df) # data column at first a tuple
print('\n')
# extract the tuple
# convert it into a string, including the parentheses
t_df['Data'] = t_df['Data'].astype(str)
# remove the parentheses
t_df['Data'] = t_df['Data'].str.replace('[()]', "", regex=True)
# split and expand by the comma into two new columns
t_df[['Lat','Lon']]= t_df['Data'].str.split(',',expand=True)
print(t_df)
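An alternative that skips the string manipulation: because each 'Data' entry is already a tuple, it can be expanded straight into columns; a minimal sketch:
```
# expand the tuple column directly instead of going through strings
list_t = [('apples', (100, 2)), ('pears', (20, 3))]
t_df = pd.DataFrame(list_t, columns=['Fruit', 'Data'])
t_df[['Lat', 'Lon']] = pd.DataFrame(t_df['Data'].tolist(), index=t_df.index)
print(t_df)
```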
In [ ]:
# Knowing that the iterable of tuples from .items() can be converted into a list of tuples
# allows for straightforward creation of a DataFrame.
list(coordinates_list.items())
# because pandas DataFrames can be created from lists of tuples
# each tuple is a row and each element of the tuple a column
In [ ]:
len(list(coordinates_list.items()))
In [ ]:
import pandas as pd
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
# Convert the coordinates list to a DataFrame
data = pd.DataFrame(list(coordinates_list.items()), columns=["County", "Coordinates"])
#ensure all are tuples with two elements
data['Coordinates'] = data ['Coordinates'].apply(lambda x: x if isinstance(x,tuple) and len(x)==2 else (None, None))
print(len(data))
# Check if any coordinates are (None, None)
none_coordinates = data[data['Coordinates'] == (None, None)]
print(none_coordinates)
# Extract latitude and longitude into separate columns
data[['Latitude', 'Longitude']] = pd.DataFrame(data['Coordinates'].tolist(), index=data.index)
# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
# Plot the data points
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)
# Add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)
# Set the title
ax.set_title('County Coordinates in Minnesota')
# Show the plot
plt.show()
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
# Convert the coordinates list to a DataFrame
data = pd.DataFrame(list(coordinates_list.items()), columns=["County", "Coordinates"])
#ensure all are tuples with two elements
data['Coordinates'] = data ['Coordinates'].apply(lambda x: x if isinstance(x,tuple) and len(x)==2 else (None, None))
print(len(data))
# Check if any coordinates are (None, None)
none_coordinates = data[data['Coordinates'] == (None, None)]
print(none_coordinates)
print(data['Coordinates'])
# Extract latitude and longitude into separate columns
data['Latitude'], data['Longitude'] = zip(*data['Coordinates'])
print(data['Coordinates'])
print(data['Latitude'])
print(data['Longitude'])
# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
# Plot the data points
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)
# Add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)
# Set the title
ax.set_title('County Coordinates in Minnesota')
# Show the plot
plt.show()
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader
# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
# Add built-in Cartopy features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
# Load and plot the county boundaries
reader = shpreader.Reader(shapefile_path)
counties = list(reader.geometries())
ax.add_geometries(counties, ccrs.PlateCarree(), edgecolor='black', facecolor='none')
# Assuming 'data' is your DataFrame with the 'Longitude' and 'Latitude'
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)
# Optionally add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)
# Set the title
ax.set_title('County Coordinates in Minnesota')
# Show the plot
plt.show()
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader
# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(10, 15), subplot_kw={'projection': ccrs.PlateCarree()})
# Add built-in Cartopy features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
# Load the shapefile and filter for counties in Minnesota
reader = shpreader.Reader(shapefile_path)
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']
# Plot only the filtered counties
for county in minnesota_counties:
    geometry = county.geometry
    name = county.attributes['NAME']
    ax.add_geometries([geometry], ccrs.PlateCarree(), edgecolor='black', facecolor='none')
    x, y = geometry.centroid.x, geometry.centroid.y
    ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.Geodetic())
# Limit the map extent to Minnesota
ax.set_extent([-97.5, -89.5, 43.5, 49.5], crs=ccrs.PlateCarree()) # Adjust these values based on the actual coordinates of Minnesota
# Plot the data points derived from the geocoded lat lon coordinates
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', alpha=0.5, zorder=1)
# Set the title
ax.set_title('County Coordinates in Minnesota')
# Show the plot
plt.show()
In [ ]:
mn_counties
.to_dict()
¶
In [ ]:
# # Convert Filtered_top_bot_data to a dictionary mapping countries to life expectancy
# life_expectancy = Filtered_top_bot_data.set_index('country')['lifeExp'].to_dict()
# print(life_expectancy)
Plot using .items()¶
In [ ]:
# Plot each country's coordinates
# Assuming `top_countries` and `bottom_countries` are lists of country names
# for country, (lat, lon) in coordinates.items():
# if lat and lon: # Check if lat and lon are not None
# color = 'green' if country in top_countries else 'red'
# plt.plot(lon, lat, marker='o', color=color, markersize=5, transform=ccrs.Geodetic())
# plt.text(lon, lat, country, transform=ccrs.Geodetic())
# plt.title('Top and Bottom African Countries by Life Expectancy')
# plt.show()
Combine Dictionary Comprehension and iterrows() to create a dictionary based on multiple columns of a dataframe¶
In [ ]:
# Extending the DataFrame with another column
data = {'Words': ["apple", "banana", "cherry"], 'Type': ["fruit", "fruit", "fruit"]}
df = pd.DataFrame(data)
# Dictionary mapping word to a tuple of (word length, type)
word_info_dict = {row['Words']: (len(row['Words']), row['Type']) for index, row in df.iterrows()}
print(word_info_dict)
Generator expressions¶
```
my_generator = (x*x for x in range(10))
for value in my_generator:
    print(value)
```