List and Dictionary Comprehension, Cartopy

Introduction to List Comprehensions

  • Definition
    • List comprehensions are a concise and efficient way to create lists in Python
    • They provide a syntactically elegant method to perform operations and apply conditions to iterables, allowing the creation or transformation of lists in a single line of code

Why Do We Use List Comprehensions?

  • Conciseness: Reduces the amount of code needed compared to traditional loops, making the code cleaner and easier to read
  • Performance: Generally faster than equivalent for loops due to optimized implementation and reduced overhead in Python
  • Expressiveness: Allows the code to be more descriptive and focused on the operation itself, rather than the mechanics of looping and appending to lists
  • Versatility: Capable of incorporating conditional logic within the list creation, which lets you filter elements or apply complex transformations easily
  • Key point: use list comprehensions to transform or extract data
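The conciseness point is easiest to see by placing a traditional loop next to its comprehension equivalent (the sample data here is invented for illustration):

```python
# Build a list of word lengths two ways

words = ["alpha", "beta", "gamma"]

# Traditional loop: three lines of looping-and-appending mechanics
lengths_loop = []
for w in words:
    lengths_loop.append(len(w))

# List comprehension: one line focused on the operation itself
lengths_comp = [len(w) for w in words]

print(lengths_loop)  # [5, 4, 5]
print(lengths_comp)  # [5, 4, 5]
```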

Syntax:

  • Basic Structure: A list comprehension consists of brackets containing an expression followed by a for clause
  • Optionally, it can include one or more for or if clauses.

Generalized Examples:

  • [expression for item in iterable]
  • [expression for item in iterable if condition]
  • [expression for item in iterable if condition1 if condition2]
  • [expression for item in iterable1 for item2 in iterable2]
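The last two generalized forms are the least familiar, so here is a brief sketch of how multiple if clauses and nested for clauses behave (sample data invented here):

```python
# Multiple if clauses act like a logical AND: both conditions must hold
divisible_by_6 = [x for x in range(1, 31) if x % 2 == 0 if x % 3 == 0]
print(divisible_by_6)  # [6, 12, 18, 24, 30]

# Nested for clauses iterate like nested loops, left to right
pairs = [(x, y) for x in [1, 2] for y in ['a', 'b']]
print(pairs)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```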
In [1]:
import pandas as pd
import os

Basic List Comprehension

In [ ]:
# basic list comprehension for squaring numbers and creating a list

# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
#         (1)      (2)   (3)
squares = [x**2 for x in range(10)]

print(squares)
In [4]:
# What happened?
# The list comprehension iterated over the range of numbers from 0 to 9, squaring each number and storing it in a list

Conditional List Comprehension

In [ ]:
# Generate a list of squares of even numbers between 1 and 20
#  list comprehension with a condition

# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (4) condition

#         (1)    (2)     (3)          (4)
evens = [x**2 for x in range(1, 21) if x % 2 == 0]

print(evens)

# from math import sqrt
# print([sqrt(e) for e in evens])  # taking square roots would recover the original even numbers
In [ ]:
even = [x if x % 2 == 0 else 'not even' for x in range(1, 21)]

even
In [ ]:
# List comprehension that filters results based on membership in predefined list
magic_nums = [1,2,7,8]

mylist_magnum = [x**2 for x in range(10) if x in magic_nums]

print(magic_nums)

print(mylist_magnum)
In [ ]:
# Create a list of tuples with numbers and their squares

numbers_and_squares = [(x,x**2) for x in range(10)]

print(numbers_and_squares)

numbers_and_squares.append((10,100))

print(numbers_and_squares)

# use the pop() method to remove an item and store it

# remove the item at index 0 and store it in a variable
one_item = numbers_and_squares.pop(0)
print(one_item)
print(numbers_and_squares)
In [ ]:
# Create a list of tuples with numbers and their squares

numbers_and_squares = [(x,x**2) for x in range(10)]

print(numbers_and_squares)

numbers_and_squares.append((10,100))

print(numbers_and_squares)

# use delete statement to remove items without storing them
# can accept a slice to delete a range of items

# remove item at index 0 without storing it
del numbers_and_squares[0]

print(numbers_and_squares)
In [ ]:
# Create a list of tuples with numbers and their squares

numbers_and_squares = [(x,x**2) for x in range(10)]

print(numbers_and_squares)

numbers_and_squares.append((10,100))

print(numbers_and_squares)

# Remove items at indices 0 to 4 (inclusive of 0, exclusive of 5)
del numbers_and_squares[0:5]

print(numbers_and_squares)  

Extracting data using list comprehensions

In [ ]:
# Extract values that meet a certain criterion

# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (4) condition

test_scores = [50, 60, 65, 98, 91, 85, 100]

#                 (1)   (2)      (3)       (4)
passing_grades = [x for x in test_scores if x > 60]

print(passing_grades)
In [ ]:
#### Extract a single item
# generalized form for extracting single elements from a list of dictionaries based on a criterion:
# list_data = [element['key1'] for element in list_of_dicts if element['key2'] > x]

# List comprehension to extract names from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
    {'name': 'John', 'age': 28},
    {'name': 'Anna', 'age': 20},
    {'name': 'James', 'age': 18},
    {'name': 'Linda', 'age': 30}
]
# access a specific key value for each dictionary in the list of dictionaries
adults_info = [person['name'] for person in people_data if person['age']>21]

print(adults_info)
In [ ]:
del adults_info[0]
print(adults_info)
In [ ]:
# List comprehension to extract tuple of names and age from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
    {'name': 'John', 'age': 28},
    {'name': 'Anna', 'age': 20},
    {'name': 'James', 'age': 18},
    {'name': 'Linda', 'age': 30}
]
# access specific pair key values as a tuple() for each dictionary in list of dictionaries
adults_info = [(person['name'], person['age']) for person in people_data if person['age']>21]

print(adults_info)
In [ ]:
del adults_info[0]
print(adults_info)
In [ ]:
# List comprehension to extract tuple of names and age from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
    {'name': 'John', 'age': 28},
    {'name': 'Anna', 'age': 20},
    {'name': 'James', 'age': 18},
    {'name': 'Linda', 'age': 30}
]

adults_info = [(person['name'], person['age']) for person in people_data if person['age']>21]

print(adults_info)
In [ ]:
# use pop() and append() list methods to reorganize the list

first_adult=adults_info.pop(0)

print(first_adult)

print(adults_info)

adults_info.append(first_adult)

print(adults_info)

Understanding DataFrame Iteration with iterrows()

1) Introduction to DataFrame Iteration

  • Why Iteration? Iteration over DataFrames is commonly needed when each row of data must be processed individually
  • While vectorized operations are preferred for performance, iteration is useful for complex operations that aren't easily vectorized or when debugging row by row

2) Using iterrows():

  • Definition: iterrows() is a generator that iterates over the rows of a DataFrame
  • It allows you to loop through each row of the DataFrame, with the row returned as a Series object
  • Yields a tuple for each row in the DataFrame, as (index, Series) pairs

Syntax:

  • index: Represents the index of the row in the DataFrame
  • row: A Series containing the row data
  • iterrows(): a generator that iterates over the rows
  • row['column_name']: Accesses data in a specific column for that row

Example:

```
for index, row in df.iterrows():
    print(row['column_name'])
```
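As a minimal runnable sketch of this pattern (the dataframe here is invented for illustration):

```python
import pandas as pd

# A tiny sample dataframe
df = pd.DataFrame({'city': ['Duluth', 'Rochester'], 'temp_f': [41, 55]})

# iterrows() yields (index, row) pairs; row is a Series keyed by column name
rows = []
for index, row in df.iterrows():
    rows.append((index, row['city'], row['temp_f']))

print(rows)
```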

Data cleaning with iterrows()

In [ ]:
# iterating over rows with iterrows()


# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': ['28',22,'35a']}
sdf= pd.DataFrame(data)

# iterate over the dataframe
# for each row index, extract that row's data from the dataframe; repeat for all rows
for index,row in sdf.iterrows():
    # strip leading/trailing white space from the name
    sdf.at[index, 'Name'] = row['Name'].strip()
    # Check if the row 'Age' is a string
    if isinstance(row['Age'], str):
        print(f"String data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
        # Boolean check if expected numeric data was entered as strings, if contains letters or special characters return False
        if row['Age'].isdigit():
            sdf.at[index,'Age'] = int(row['Age']) # True,if is a digit from 0-9, then convert to integer
            print(f"Cleaned data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
        else:
            sdf.at[index, 'Age'] = pd.NA
            print(f"Uncleaned data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
In [ ]:
import pandas as pd

# Create a sample dataframe
data = {
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
    'UniqueID': [101, 102, 103, 101, 102],
    'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)


# initiate an empty set that will only hold unique IDs
unique_userId= set()

# iterate over the dataframe
for index, row in df.iterrows():

    # Check if the current row's UniqueID is already in the set
    if row['UniqueID'] in unique_userId:
        
        # At this row, Create a new column, mark as a duplicate
        df.at[index, 'Duplicates'] = True

    else: 
        # Add the current row unique ID to the set
        unique_userId.add(row['UniqueID'])
        # Mark the row as False for Duplicates
        df.at[index, 'Duplicates']= False

print(df)
In [ ]:
# remove duplicates

df = df[df['Duplicates'] == False]

print(df)
In [ ]:
df=df.drop(columns='Duplicates')

print(df)
In [ ]:
#duplicated

# Create a sample dataframe
data = {
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
    'UniqueID': [101, 102, 103, 101, 102],
    'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)

# duplicated() method used to identify duplicates
# subset parameter specifies the column to check for dups
# keep='first' keeps the first occurrence and marks subsequent duplicates

df['Duplicates'] = df.duplicated(subset='UniqueID', keep='first')
print(df)
In [ ]:
df = df[~df['Duplicates']]
print(df)
In [ ]:
df=df.drop(columns='Duplicates')

print(df)
In [ ]:
#duplicated

# Create a sample dataframe
data = {
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
    'UniqueID': [101, 102, 103, 101, 102],
    'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)

# Count duplicates
# Group by the column you want to check for duplicates
# Use transform('size') to get the count of each group
# Assign result to a new column

df['Counts']= df.groupby('UniqueID')['UniqueID'].transform('size')

print(df)
In [ ]:
# Create a sample dataframe
data = {
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
    'UniqueID': [101, 102, 103, 101, 102],
    'State': ['NY', 'CA', 'TX', 'NY', 'CA'],
    'Sales': [200, 150, 300, 250, 100]
}
df = pd.DataFrame(data)

# Normalize sales within each UniqueID group
# in a groupby transform, lambda x receives each group as a Series
df['NormalizedSales'] = df.groupby('UniqueID')['Sales'].transform(lambda x: x / x.sum())

print(df)

Data transformation with iterrows()

In [ ]:
# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': [28,22,35]}
sdf= pd.DataFrame(data)
for index, row in sdf.iterrows():
    sdf.at[index, 'New Age'] = row['Age'] + 10 # add 10 years to each person's age

print(sdf)

Multi-column Conditional Flagging or Computation with iterrows()

In [ ]:
# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': [28,22,35]}
sdf= pd.DataFrame(data)

for index, row in sdf.iterrows():
    if row['Age'] < 30 and "J" in row['Name']:
        sdf.at[index, 'Category'] = 'Young J'
    else:
        sdf.at[index, 'Category'] = 'Other'

print(sdf)
In [ ]:
for index,row in sdf.iterrows():
    if row['Age']>30 and 'a' in row['Name']:
        sdf.at[index, 'Flag'] = True

    else:
        sdf.at[index, 'Flag']= False
print(sdf)

Mark specific rows with iterrows()

In [ ]:
for index, row in sdf.iterrows():
    if row['Name'].startswith('J') and row['Age']>25:
        sdf.at[index, 'Status'] = 'Senior J'

print(sdf)

Combine List Comprehension and iterrows()

  • extract a specific list from a dataframe
In [2]:
# Load a sample dataset to demonstrate applying list comprehensions and iterrows()
# this is a dataset of tornadoes occurring in the state of Minnesota in recent years

df = pd.read_csv(r".\Data\storm_data_search_results.csv")

# Set the option to display all columns
pd.set_option('display.max_columns', None)

df.head()
Out[2]:
EVENT_ID CZ_NAME_STR BEGIN_LOCATION BEGIN_DATE BEGIN_TIME EVENT_TYPE MAGNITUDE TOR_F_SCALE DEATHS_DIRECT INJURIES_DIRECT DAMAGE_PROPERTY_NUM DAMAGE_CROPS_NUM STATE_ABBR CZ_TIMEZONE MAGNITUDE_TYPE EPISODE_ID CZ_TYPE CZ_FIPS WFO INJURIES_INDIRECT DEATHS_INDIRECT SOURCE FLOOD_CAUSE TOR_LENGTH TOR_WIDTH BEGIN_RANGE BEGIN_AZIMUTH END_RANGE END_AZIMUTH END_LOCATION END_DATE END_TIME BEGIN_LAT BEGIN_LON END_LAT END_LON EVENT_NARRATIVE EPISODE_NARRATIVE ABSOLUTE_ROWNUMBER
0 626306 POPE CO. VILLARD 05/25/2016 1410 Tornado EF0 0 0 10000 0 MN CST-6 104565 C 121 MPX 0 0 Law Enforcement 0.16 25 2 SSW 1 SSW VILLARD 05/25/2016 1412 45.6989 -95.2829 45.7000 -95.2800 A few boats were flipped, a shed was damaged a... An Isolated but severe thunderstorm developed ... 1
1 626307 STEARNS CO. ST ANTHONY 05/25/2016 1709 Tornado EF0 0 0 15000 0 MN CST-6 104565 C 145 MPX 0 0 Trained Spotter 3.30 25 1 NE 3 SSE ST FRANCIS 05/25/2016 1715 45.6894 -94.6042 45.7298 -94.5674 A trained spotter video taped a tornado near H... An Isolated but severe thunderstorm developed ... 2
2 629201 RED LAKE CO. OKLEE 05/27/2016 1314 Tornado EF0 0 0 0 0 MN CST-6 104632 C 125 FGF 0 0 Law Enforcement 0.05 50 2 WNW 2 WNW OKLEE 05/27/2016 1315 47.8400 -95.9200 47.8400 -95.9200 Two funnel clouds were noted between Brooks an... Morning sunshine and moisture from recent rain... 3
3 629205 CLAY CO. MOORHEAD ARPT 05/27/2016 1357 Tornado EF0 0 0 0 0 MN CST-6 104632 C 27 FGF 0 0 Storm Chaser 0.05 75 2 SSW 2 SSW MOORHEAD ARPT 05/27/2016 1358 46.8200 -96.7000 46.8200 -96.7000 Evidence from photographs and video indicate t... Morning sunshine and moisture from recent rain... 4
4 629206 CLAY CO. GLYNDON 05/27/2016 1401 Tornado EF0 0 0 0 0 MN CST-6 104632 C 27 FGF 0 0 Broadcast Media 0.05 50 2 NNW 2 NNW GLYNDON 05/27/2016 1402 46.9000 -96.6000 46.9000 -96.6000 A brief touchdown was noted in a photo and rep... Morning sunshine and moisture from recent rain... 5
In [3]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 467 entries, 0 to 466
Data columns (total 39 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   EVENT_ID             467 non-null    int64  
 1   CZ_NAME_STR          467 non-null    object 
 2   BEGIN_LOCATION       467 non-null    object 
 3   BEGIN_DATE           467 non-null    object 
 4   BEGIN_TIME           467 non-null    int64  
 5   EVENT_TYPE           467 non-null    object 
 6   MAGNITUDE            467 non-null    object 
 7   TOR_F_SCALE          467 non-null    object 
 8   DEATHS_DIRECT        467 non-null    int64  
 9   INJURIES_DIRECT      467 non-null    int64  
 10  DAMAGE_PROPERTY_NUM  467 non-null    int64  
 11  DAMAGE_CROPS_NUM     467 non-null    int64  
 12  STATE_ABBR           467 non-null    object 
 13  CZ_TIMEZONE          467 non-null    object 
 14  MAGNITUDE_TYPE       467 non-null    object 
 15  EPISODE_ID           467 non-null    int64  
 16  CZ_TYPE              467 non-null    object 
 17  CZ_FIPS              467 non-null    int64  
 18  WFO                  467 non-null    object 
 19  INJURIES_INDIRECT    467 non-null    int64  
 20  DEATHS_INDIRECT      467 non-null    int64  
 21  SOURCE               467 non-null    object 
 22  FLOOD_CAUSE          467 non-null    object 
 23  TOR_LENGTH           467 non-null    float64
 24  TOR_WIDTH            467 non-null    int64  
 25  BEGIN_RANGE          467 non-null    int64  
 26  BEGIN_AZIMUTH        467 non-null    object 
 27  END_RANGE            467 non-null    int64  
 28  END_AZIMUTH          467 non-null    object 
 29  END_LOCATION         467 non-null    object 
 30  END_DATE             467 non-null    object 
 31  END_TIME             467 non-null    int64  
 32  BEGIN_LAT            467 non-null    float64
 33  BEGIN_LON            467 non-null    float64
 34  END_LAT              467 non-null    float64
 35  END_LON              467 non-null    float64
 36  EVENT_NARRATIVE      467 non-null    object 
 37  EPISODE_NARRATIVE    467 non-null    object 
 38  ABSOLUTE_ROWNUMBER   467 non-null    int64  
dtypes: float64(5), int64(15), object(19)
memory usage: 142.4+ KB
In [9]:
df.groupby("TOR_F_SCALE")["EVENT_ID"].size()
Out[9]:
TOR_F_SCALE
EF0    246
EF1    166
EF2     24
EF4      2
EFU     29
Name: EVENT_ID, dtype: int64
In [10]:
# List comprehension to extract coordinates of EF0 tornado events
ef0_coordinates = [(row['BEGIN_LAT'], row['BEGIN_LON']) for index, row in df.iterrows() if row['TOR_F_SCALE'] == 'EF0']
print(ef0_coordinates)

print(len(ef0_coordinates))
[(45.6989, -95.2829), (45.6894, -94.6042), (47.84, -95.92), (46.82, -96.7), (46.9, -96.6), (45.5488, -94.808), (44.1, -96.3105), (43.9812, -96.3452), (43.9589, -94.2033), (44.1973, -93.5417), (44.2106, -93.5253), (44.3134, -93.4661), (44.1282, -93.8622), (45.51, -96.64), (45.56, -96.6), (45.61, -96.53), (48.91, -95.72), (46.5, -94.79), (46.496, -94.7789), (45.2445, -95.9863), (45.2506, -95.8705), (44.3938, -92.9217), (45.3265, -94.4055), (44.9472, -94.026), (43.762, -93.2111), (47.62, -96.58), (47.54, -96.51), (47.8878, -94.7866), (43.8482, -93.2348), (44.1631, -92.1789), (44.1286, -92.2541), (45.4433, -95.8244), (45.79, -95.8), (46.34, -96.54), (46.3, -96.29), (44.0561, -92.3083), (43.9989, -92.1358), (43.998, -92.3098), (44.3147, -94.3519), (45.3784, -93.2268), (45.3072, -93.0197), (45.3229, -92.8595), (45.1811, -92.8581), (46.1529, -94.9219), (45.9763, -95.5803), (45.9815, -95.3762), (44.3831, -95.8188), (44.3921, -95.78), (44.1615, -95.0339), (46.81, -96.58), (47.4506, -94.2105), (47.4341, -94.2526), (47.3999, -94.2862), (44.2279, -94.1762), (44.244, -94.1705), (44.3939, -94.1851), (44.5377, -94.2398), (44.5654, -94.2676), (44.5359, -93.5852), (44.5436, -93.59), (44.694, -94.5097), (44.7246, -93.4707), (44.8517, -94.309), (44.8802, -94.0465), (44.936, -95.7351), (45.4528, -95.009), (43.8444, -93.7624), (43.5129, -92.2036), (44.41, -96.15), (48.02, -94.98), (43.54, -94.67), (45.1407, -94.7912), (45.1396, -94.7571), (44.3406, -93.0498), (44.3424, -93.0405), (44.4935, -92.7473), (43.7077, -94.2478), (44.0432, -94.1665), (44.1155, -94.0785), (44.1186, -93.6027), (44.1816, -93.6305), (44.0797, -93.4774), (44.4013, -93.293), (44.2672, -92.9006), (44.5419, -92.9688), (44.5829, -92.975), (44.537, -92.919), (44.3353, -92.6512), (44.3665, -92.5506), (44.3684, -92.54), (44.6382, -92.6787), (47.18, -96.07), (47.185, -96.065), (43.5837, -93.2778), (43.5482, -92.3902), (43.6029, -92.3531), (43.6835, -92.3285), (43.5005, -92.2577), (44.0028, -96.3268), (44.046, -93.2279), 
(44.2028, -94.8309), (44.1049, -94.6351), (43.76, -93.17), (44.5843, -93.9426), (44.2999, -93.6969), (46.7, -96.74), (46.84, -96.45), (44.5962, -93.6957), (48.48, -95.22), (48.0519, -92.6779), (43.9355, -91.5095), (44.8053, -94.3299), (44.8555, -94.1804), (43.8698, -95.1275), (43.61, -94.41), (46.28, -95.44), (43.9761, -93.3639), (44.5721, -93.1216), (43.9071, -95.7834), (43.7162, -95.7486), (44.1095, -95.7834), (44.1313, -95.7583), (43.86, -95.2617), (45.9851, -93.7478), (46.4339, -93.7791), (44.9632, -93.7922), (45.7518, -93.3108), (44.443, -92.2769), (44.7455, -93.1139), (43.6501, -93.0624), (44.8795, -95.1011), (48.23, -96.72), (44.0015, -91.8063), (44.0318, -91.7238), (48.42, -95.83), (46.24, -95.52), (46.2639, -93.9951), (44.2047, -94.1376), (45.6716, -93.1011), (44.8809, -92.9104), (46.77, -96.25), (44.6018, -94.2095), (46.93, -95.12), (47.42, -96.27), (45.97, -96.19), (45.97, -96.24), (45.95, -96.12), (45.7443, -95.6223), (43.6655, -95.666), (45.8545, -95.2105), (45.8278, -95.0473), (46.2176, -94.6484), (45.2072, -94.986), (46.0678, -94.294), (44.767, -94.3406), (45.1899, -94.8422), (45.7759, -94.4501), (45.2391, -94.7576), (45.2433, -94.7635), (45.9745, -93.8804), (45.0452, -94.5504), (44.8404, -93.7498), (46.54, -93.01), (45.1378, -93.657), (45.0351, -93.368), (44.7089, -96.2051), (44.1849, -93.314), (44.1989, -93.3035), (44.1965, -93.3154), (44.2135, -93.3378), (44.5394, -93.8252), (44.5432, -93.825), (44.4055, -93.3002), (44.6251, -93.2647), (46.29, -95.66), (47.78, -94.86), (47.89, -96.73), (45.4506, -95.003), (45.5726, -94.5994), (44.2441, -94.8627), (44.226, -94.8068), (44.3038, -94.641), (43.8756, -93.5422), (43.8758, -93.5136), (44.4824, -93.952), (44.7507, -93.383), (44.7906, -93.2434), (44.7388, -93.2146), (44.0331, -92.2261), (45.68, -96.46), (45.76, -96.29), (45.3, -96.34), (45.51, -96.5), (45.5, -96.56), (45.8019, -96.293), (43.4997, -93.6704), (43.5323, -93.3052), (43.4997, -93.0559), (43.5142, -93.0493), (43.6113, -93.2505), (43.894, 
-93.0592), (43.7763, -92.37), (43.6699, -92.0805), (43.9605, -91.7908), (43.8998, -91.9822), (43.9088, -91.897), (47.68, -96.47), (44.6497, -92.8613), (43.9483, -95.4623), (44.2215, -94.4585), (44.6386, -93.9182), (44.6748, -93.8888), (45.1844, -93.272), (45.3534, -95.2893), (45.6663, -95.5279), (45.843, -96.2661), (45.7447, -94.9528), (46.3123, -95.5963), (46.1159, -94.6098), (46.9438, -92.0955), (43.5719, -95.9786), (45.1454, -95.9201), (44.8885, -94.0098), (44.9783, -93.9716), (47.229, -93.6597), (48.1, -95.63), (46.86, -96.5), (48.13, -95.66), (48.8626, -96.7817), (43.8605, -91.8798), (43.7967, -91.6015), (43.7162, -93.6339), (44.7152, -93.2642), (44.7636, -93.2156), (44.8248, -93.1569), (44.9018, -93.07), (44.9467, -93.035), (44.9004, -95.261), (44.8988, -95.2485), (44.8185, -95.4746), (43.8942, -95.1545), (47.41, -96.6), (47.4991, -96.7571), (45.3009, -94.9739), (43.818, -95.505), (45.1051, -93.8302)]
246
In [5]:
coord_df= pd.DataFrame(ef0_coordinates)

coord_df
Out[5]:
0 1
0 45.6989 -95.2829
1 45.6894 -94.6042
2 47.8400 -95.9200
3 46.8200 -96.7000
4 46.9000 -96.6000
... ... ...
241 47.4100 -96.6000
242 47.4991 -96.7571
243 45.3009 -94.9739
244 43.8180 -95.5050
245 45.1051 -93.8302

246 rows × 2 columns

In [6]:
# List comprehension to extract coordinates of EF0 tornado events
ef0_coordinates_c = [(row["CZ_NAME_STR"], row['BEGIN_LAT'], row['BEGIN_LON']) for index, row in df.iterrows() if row['TOR_F_SCALE'] == 'EF0']
print(ef0_coordinates_c)
[('POPE CO.', 45.6989, -95.2829), ('STEARNS CO.', 45.6894, -94.6042), ('RED LAKE CO.', 47.84, -95.92), ('CLAY CO.', 46.82, -96.7), ('CLAY CO.', 46.9, -96.6), ('STEARNS CO.', 45.5488, -94.808), ('PIPESTONE CO.', 44.1, -96.3105), ('PIPESTONE CO.', 43.9812, -96.3452), ('BLUE EARTH CO.', 43.9589, -94.2033), ('LE SUEUR CO.', 44.1973, -93.5417), ('RICE CO.', 44.2106, -93.5253), ('RICE CO.', 44.3134, -93.4661), ('BLUE EARTH CO.', 44.1282, -93.8622), ('BIG STONE CO.', 45.51, -96.64), ('BIG STONE CO.', 45.56, -96.6), ('TRAVERSE CO.', 45.61, -96.53), ('ROSEAU CO.', 48.91, -95.72), ('WADENA CO.', 46.5, -94.79), ('CASS CO.', 46.496, -94.7789), ('SWIFT CO.', 45.2445, -95.9863), ('SWIFT CO.', 45.2506, -95.8705), ('GOODHUE CO.', 44.3938, -92.9217), ('STEARNS CO.', 45.3265, -94.4055), ('MCLEOD CO.', 44.9472, -94.026), ('FREEBORN CO.', 43.762, -93.2111), ('POLK CO.', 47.62, -96.58), ('POLK CO.', 47.54, -96.51), ('BELTRAMI CO.', 47.8878, -94.7866), ('STEELE CO.', 43.8482, -93.2348), ('WABASHA CO.', 44.1631, -92.1789), ('WABASHA CO.', 44.1286, -92.2541), ('STEVENS CO.', 45.4433, -95.8244), ('GRANT CO.', 45.79, -95.8), ('WILKIN CO.', 46.34, -96.54), ('WILKIN CO.', 46.3, -96.29), ('OLMSTED CO.', 44.0561, -92.3083), ('OLMSTED CO.', 43.9989, -92.1358), ('OLMSTED CO.', 43.998, -92.3098), ('NICOLLET CO.', 44.3147, -94.3519), ('ANOKA CO.', 45.3784, -93.2268), ('CHISAGO CO.', 45.3072, -93.0197), ('CHISAGO CO.', 45.3229, -92.8595), ('WASHINGTON CO.', 45.1811, -92.8581), ('TODD CO.', 46.1529, -94.9219), ('DOUGLAS CO.', 45.9763, -95.5803), ('DOUGLAS CO.', 45.9815, -95.3762), ('LYON CO.', 44.3831, -95.8188), ('LYON CO.', 44.3921, -95.78), ('BROWN CO.', 44.1615, -95.0339), ('CLAY CO.', 46.81, -96.58), ('CASS CO.', 47.4506, -94.2105), ('CASS CO.', 47.4341, -94.2526), ('CASS CO.', 47.3999, -94.2862), ('NICOLLET CO.', 44.2279, -94.1762), ('NICOLLET CO.', 44.244, -94.1705), ('NICOLLET CO.', 44.3939, -94.1851), ('SIBLEY CO.', 44.5377, -94.2398), ('SIBLEY CO.', 44.5654, -94.2676), ('LE SUEUR CO.', 
44.5359, -93.5852), ('SCOTT CO.', 44.5436, -93.59), ('SIBLEY CO.', 44.694, -94.5097), ('SCOTT CO.', 44.7246, -93.4707), ('MCLEOD CO.', 44.8517, -94.309), ('MCLEOD CO.', 44.8802, -94.0465), ('CHIPPEWA CO.', 44.936, -95.7351), ('STEARNS CO.', 45.4528, -95.009), ('FARIBAULT CO.', 43.8444, -93.7624), ('FILLMORE CO.', 43.5129, -92.2036), ('LINCOLN CO.', 44.41, -96.15), ('BELTRAMI CO.', 48.02, -94.98), ('MARTIN CO.', 43.54, -94.67), ('KANDIYOHI CO.', 45.1407, -94.7912), ('MEEKER CO.', 45.1396, -94.7571), ('RICE CO.', 44.3406, -93.0498), ('GOODHUE CO.', 44.3424, -93.0405), ('GOODHUE CO.', 44.4935, -92.7473), ('FARIBAULT CO.', 43.7077, -94.2478), ('BLUE EARTH CO.', 44.0432, -94.1665), ('BLUE EARTH CO.', 44.1155, -94.0785), ('WASECA CO.', 44.1186, -93.6027), ('WASECA CO.', 44.1816, -93.6305), ('WASECA CO.', 44.0797, -93.4774), ('RICE CO.', 44.4013, -93.293), ('GOODHUE CO.', 44.2672, -92.9006), ('DAKOTA CO.', 44.5419, -92.9688), ('DAKOTA CO.', 44.5829, -92.975), ('GOODHUE CO.', 44.537, -92.919), ('GOODHUE CO.', 44.3353, -92.6512), ('WABASHA CO.', 44.3665, -92.5506), ('GOODHUE CO.', 44.3684, -92.54), ('GOODHUE CO.', 44.6382, -92.6787), ('NORMAN CO.', 47.18, -96.07), ('MAHNOMEN CO.', 47.185, -96.065), ('FREEBORN CO.', 43.5837, -93.2778), ('FILLMORE CO.', 43.5482, -92.3902), ('FILLMORE CO.', 43.6029, -92.3531), ('FILLMORE CO.', 43.6835, -92.3285), ('FILLMORE CO.', 43.5005, -92.2577), ('PIPESTONE CO.', 44.0028, -96.3268), ('STEELE CO.', 44.046, -93.2279), ('BROWN CO.', 44.2028, -94.8309), ('WATONWAN CO.', 44.1049, -94.6351), ('FREEBORN CO.', 43.76, -93.17), ('SIBLEY CO.', 44.5843, -93.9426), ('LE SUEUR CO.', 44.2999, -93.6969), ('CLAY CO.', 46.7, -96.74), ('CLAY CO.', 46.84, -96.45), ('SCOTT CO.', 44.5962, -93.6957), ('BELTRAMI CO.', 48.48, -95.22), ('ST. 
LOUIS CO.', 48.0519, -92.6779), ('WINONA CO.', 43.9355, -91.5095), ('MCLEOD CO.', 44.8053, -94.3299), ('MCLEOD CO.', 44.8555, -94.1804), ('COTTONWOOD CO.', 43.8698, -95.1275), ('MARTIN CO.', 43.61, -94.41), ('OTTER TAIL CO.', 46.28, -95.44), ('STEELE CO.', 43.9761, -93.3639), ('DAKOTA CO.', 44.5721, -93.1216), ('MURRAY CO.', 43.9071, -95.7834), ('NOBLES CO.', 43.7162, -95.7486), ('MURRAY CO.', 44.1095, -95.7834), ('MURRAY CO.', 44.1313, -95.7583), ('COTTONWOOD CO.', 43.86, -95.2617), ('MILLE LACS CO.', 45.9851, -93.7478), ('AITKIN CO.', 46.4339, -93.7791), ('CARVER CO.', 44.9632, -93.7922), ('KANABEC CO.', 45.7518, -93.3108), ('WABASHA CO.', 44.443, -92.2769), ('DAKOTA CO.', 44.7455, -93.1139), ('FREEBORN CO.', 43.6501, -93.0624), ('RENVILLE CO.', 44.8795, -95.1011), ('MARSHALL CO.', 48.23, -96.72), ('WINONA CO.', 44.0015, -91.8063), ('WINONA CO.', 44.0318, -91.7238), ('MARSHALL CO.', 48.42, -95.83), ('OTTER TAIL CO.', 46.24, -95.52), ('CROW WING CO.', 46.2639, -93.9951), ('NICOLLET CO.', 44.2047, -94.1376), ('CHISAGO CO.', 45.6716, -93.1011), ('WASHINGTON CO.', 44.8809, -92.9104), ('CLAY CO.', 46.77, -96.25), ('SIBLEY CO.', 44.6018, -94.2095), ('HUBBARD CO.', 46.93, -95.12), ('NORMAN CO.', 47.42, -96.27), ('GRANT CO.', 45.97, -96.19), ('GRANT CO.', 45.97, -96.24), ('GRANT CO.', 45.95, -96.12), ('POPE CO.', 45.7443, -95.6223), ('NOBLES CO.', 43.6655, -95.666), ('DOUGLAS CO.', 45.8545, -95.2105), ('TODD CO.', 45.8278, -95.0473), ('MORRISON CO.', 46.2176, -94.6484), ('KANDIYOHI CO.', 45.2072, -94.986), ('MORRISON CO.', 46.0678, -94.294), ('MCLEOD CO.', 44.767, -94.3406), ('KANDIYOHI CO.', 45.1899, -94.8422), ('MORRISON CO.', 45.7759, -94.4501), ('MEEKER CO.', 45.2391, -94.7576), ('MEEKER CO.', 45.2433, -94.7635), ('MORRISON CO.', 45.9745, -93.8804), ('MEEKER CO.', 45.0452, -94.5504), ('CARVER CO.', 44.8404, -93.7498), ('CARLTON CO.', 46.54, -93.01), ('HENNEPIN CO.', 45.1378, -93.657), ('HENNEPIN CO.', 45.0351, -93.368), ('YELLOW MEDICINE CO.', 44.7089, -96.2051), 
('STEELE CO.', 44.1849, -93.314), ('RICE CO.', 44.1989, -93.3035), ('RICE CO.', 44.1965, -93.3154), ('RICE CO.', 44.2135, -93.3378), ('LE SUEUR CO.', 44.5394, -93.8252), ('SCOTT CO.', 44.5432, -93.825), ('RICE CO.', 44.4055, -93.3002), ('DAKOTA CO.', 44.6251, -93.2647), ('OTTER TAIL CO.', 46.29, -95.66), ('BELTRAMI CO.', 47.78, -94.86), ('POLK CO.', 47.89, -96.73), ('STEARNS CO.', 45.4506, -95.003), ('STEARNS CO.', 45.5726, -94.5994), ('BROWN CO.', 44.2441, -94.8627), ('BROWN CO.', 44.226, -94.8068), ('BROWN CO.', 44.3038, -94.641), ('WASECA CO.', 43.8756, -93.5422), ('WASECA CO.', 43.8758, -93.5136), ('SIBLEY CO.', 44.4824, -93.952), ('SCOTT CO.', 44.7507, -93.383), ('DAKOTA CO.', 44.7906, -93.2434), ('DAKOTA CO.', 44.7388, -93.2146), ('OLMSTED CO.', 44.0331, -92.2261), ('TRAVERSE CO.', 45.68, -96.46), ('TRAVERSE CO.', 45.76, -96.29), ('BIG STONE CO.', 45.3, -96.34), ('BIG STONE CO.', 45.51, -96.5), ('BIG STONE CO.', 45.5, -96.56), ('TRAVERSE CO.', 45.8019, -96.293), ('FARIBAULT CO.', 43.4997, -93.6704), ('FREEBORN CO.', 43.5323, -93.3052), ('FREEBORN CO.', 43.4997, -93.0559), ('MOWER CO.', 43.5142, -93.0493), ('FREEBORN CO.', 43.6113, -93.2505), ('STEELE CO.', 43.894, -93.0592), ('FILLMORE CO.', 43.7763, -92.37), ('FILLMORE CO.', 43.6699, -92.0805), ('WINONA CO.', 43.9605, -91.7908), ('WINONA CO.', 43.8998, -91.9822), ('WINONA CO.', 43.9088, -91.897), ('POLK CO.', 47.68, -96.47), ('DAKOTA CO.', 44.6497, -92.8613), ('COTTONWOOD CO.', 43.9483, -95.4623), ('BROWN CO.', 44.2215, -94.4585), ('SIBLEY CO.', 44.6386, -93.9182), ('CARVER CO.', 44.6748, -93.8888), ('ANOKA CO.', 45.1844, -93.272), ('SWIFT CO.', 45.3534, -95.2893), ('POPE CO.', 45.6663, -95.5279), ('TRAVERSE CO.', 45.843, -96.2661), ('STEARNS CO.', 45.7447, -94.9528), ('OTTER TAIL CO.', 46.3123, -95.5963), ('MORRISON CO.', 46.1159, -94.6098), ('ST. 
LOUIS CO.', 46.9438, -92.0955), ('NOBLES CO.', 43.5719, -95.9786), ('CHIPPEWA CO.', 45.1454, -95.9201), ('CARVER CO.', 44.8885, -94.0098), ('WRIGHT CO.', 44.9783, -93.9716), ('ITASCA CO.', 47.229, -93.6597), ('PENNINGTON CO.', 48.1, -95.63), ('CLAY CO.', 46.86, -96.5), ('PENNINGTON CO.', 48.13, -95.66), ('KITTSON CO.', 48.8626, -96.7817), ('WINONA CO.', 43.8605, -91.8798), ('HOUSTON CO.', 43.7967, -91.6015), ('FREEBORN CO.', 43.7162, -93.6339), ('DAKOTA CO.', 44.7152, -93.2642), ('DAKOTA CO.', 44.7636, -93.2156), ('DAKOTA CO.', 44.8248, -93.1569), ('DAKOTA CO.', 44.9018, -93.07), ('RAMSEY CO.', 44.9467, -93.035), ('CHIPPEWA CO.', 44.9004, -95.261), ('KANDIYOHI CO.', 44.8988, -95.2485), ('RENVILLE CO.', 44.8185, -95.4746), ('COTTONWOOD CO.', 43.8942, -95.1545), ('NORMAN CO.', 47.41, -96.6), ('POLK CO.', 47.4991, -96.7571), ('KANDIYOHI CO.', 45.3009, -94.9739), ('NOBLES CO.', 43.818, -95.505), ('WRIGHT CO.', 45.1051, -93.8302)]
In [8]:
df_cc=pd.DataFrame(ef0_coordinates_c, columns=["County", "Lat", "Lon"])

df_cc
Out[8]:
County Lat Lon
0 POPE CO. 45.6989 -95.2829
1 STEARNS CO. 45.6894 -94.6042
2 RED LAKE CO. 47.8400 -95.9200
3 CLAY CO. 46.8200 -96.7000
4 CLAY CO. 46.9000 -96.6000
... ... ... ...
241 NORMAN CO. 47.4100 -96.6000
242 POLK CO. 47.4991 -96.7571
243 KANDIYOHI CO. 45.3009 -94.9739
244 NOBLES CO. 43.8180 -95.5050
245 WRIGHT CO. 45.1051 -93.8302

246 rows × 3 columns

Converting data into DataFrame compatible format

  • A DataFrame can be readily created from the following data structures:

    • list of tuples:

      • Structure: data = [ (1, 'Alice'),...] OR data= [ (1, ('Alice', 25)),...]
      • each tuple is a row; its elements (a, b) map to columns
      • convert through assignment of tuple elements to columns
      • df = pd.DataFrame(data, columns= ['col1','col2'])
    • list of dictionaries:

      • Structure: data = [ {'ID': 1, 'Name': 'Alice'},...]
      • each entry in the dictionary {key1:value1, key2:value2} represents a row
      • convert directly
      • df= pd.DataFrame(data)
    • dictionary of lists:

      • Structure: data = { 'ID': [1,2,3], 'Name': ['Alice', 'Bob', 'Charlie']}
      • each key : [list] pair is a column
      • convert directly
      • df= pd.DataFrame(data)
    • List of lists:

      • Structure: data= [ [ 1, 'Alice'] , [2, 'Bob'], [3, 'Charlie']]
      • each inner list is a row; its elements map to columns
      • convert through assignment of list element to columns
      • df= pd.DataFrame(data, columns= ['col1','col2'])
    • Dictionary of dictionaries:

      • Structure: data = { 'row1': {'ID':1,'Name':'Alice'},...}
      • each key: dictionary pair represents a row
      • convert through from_dict with index orientation
      • df= pd.DataFrame.from_dict(data, orient='index')
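Each structure above converts with a single call; here is a quick side-by-side sketch on made-up sample data. Note that the dictionary-of-dictionaries case needs `pd.DataFrame.from_dict` with `orient='index'` so each outer key becomes a row label:

```python
import pandas as pd

# Toy data in each of the five structures above (names are illustrative)
tuples = [(1, 'Alice'), (2, 'Bob')]
dicts  = [{'ID': 1, 'Name': 'Alice'}, {'ID': 2, 'Name': 'Bob'}]
cols   = {'ID': [1, 2], 'Name': ['Alice', 'Bob']}
rows   = [[1, 'Alice'], [2, 'Bob']]
nested = {'row1': {'ID': 1, 'Name': 'Alice'},
          'row2': {'ID': 2, 'Name': 'Bob'}}

df1 = pd.DataFrame(tuples, columns=['ID', 'Name'])    # list of tuples
df2 = pd.DataFrame(dicts)                             # list of dictionaries
df3 = pd.DataFrame(cols)                              # dictionary of lists
df4 = pd.DataFrame(rows, columns=['ID', 'Name'])      # list of lists
df5 = pd.DataFrame.from_dict(nested, orient='index')  # dictionary of dictionaries

# all five yield the same two columns of data
for df in (df1, df2, df3, df4, df5):
    print(list(df.columns), df.shape)
```

The only structural difference is the index: `df5` keeps 'row1'/'row2' as row labels, while the others get the default 0..n-1 range index.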

Convert a list of Tuples to DataFrame

In [ ]:
# Recognize that a DataFrame can be readily made from a list of tuples
# each tuple is a new row, each element within the tuple is a new column
ef0_df= pd.DataFrame(ef0_coordinates, columns=['latitude', 'longitude'])

ef0_df
In [ ]:
# add a column to flag the rows that are EF0 data
ef0_df["EF0_data"]=True

ef0_df
In [ ]:
tor_data_cdf= df.copy()

column_names= {
    'BEGIN_LAT': 'latitude',
    'BEGIN_LON': 'longitude'
}
tor_data_cdf=tor_data_cdf.rename(columns=column_names)
merge_df = tor_data_cdf.merge(ef0_df[["latitude", "longitude", "EF0_data"]], on=["latitude", "longitude"], how='left') 

merge_df
In [ ]:
len(tor_data_cdf)
In [ ]:
tor_data_cdf.groupby("TOR_F_SCALE")["TOR_F_SCALE"].size()
In [ ]:
tor_data_cdf["TOR_F_SCALE"].notna().sum()
In [ ]:
tor_data_cdf["TOR_F_SCALE"].count()
In [ ]:
merge_df=merge_df[merge_df["EF0_data"]==True]
len(merge_df)

Reading Shapefiles in Cartopy

  • What are shapefiles?
    • Common file type in GIS used to represent distinct objects
    • file name ends in .shp
    • Contains objects represented in vector data format
      • spatial data with distinct boundaries
      • examples: buildings, rivers, roads

Access records in Shapefile with CartoPy .Reader()

In [ ]:
import cartopy.io.shapereader as shpreader

# read in the shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'

# Use shpreader.Reader to load the shapefile

reader = shpreader.Reader(shapefile_path)

print(type(reader))

# This creates a Cartopy Reader object that can be used to access the shapefile's records
In [ ]:
# extract the shapefile record data from the reader object
records = reader.records()

print(next(records)) 

# access the next record (the print above already consumed the first)
rec1=next(records)

# print its attributes
rec1.attributes
In [ ]:
# access the county Name field
print(rec1.attributes['NAME'])

# get attributes of the shapefile
# for attribute_n,attribute_v in rec1.attributes.items():
#     print({attribute_n}, {attribute_v})

# access the geometry for the first record
print(rec1.geometry)

# access the coordinates only
# print(list(rec1.geometry.exterior.coords))

# access the shape for the record
rec1.geometry.geom_type
In [ ]:
# access a specific attribute of the features in the shapefile

# start a list of counties

MN_counties_list = []

# loop through the list of counties in the shapefile records
for county in reader.records():
    if county.attributes['REGION']=='MN':
        print((county.attributes['NAME']))
        MN_counties_list.append(county)
        print(f'appended {len(MN_counties_list)} counties')
In [ ]:
print(MN_counties_list)
In [ ]:
len(MN_counties_list)
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14,10), subplot_kw={'projection':ccrs.PlateCarree()})

# plot all geometries for all counties
counties = list(reader.geometries())
print(len(counties))

ax.add_geometries(counties, ccrs.PlateCarree(), edgecolor='black', facecolor='grey', zorder=0) # grey fill shows the county shapes; use facecolor='none' for outlines only

ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
ax.add_feature(cfeature.LAKES, zorder=3)
ax.add_feature(cfeature.RIVERS, zorder=2)
# Limit the map extent to Minnesota
#                w, e, s, n bounds
ax.set_extent([-97.5, -89.5, 43.5, 49.5], crs=ccrs.PlateCarree())  # Adjust these values based on the actual coordinates of Minnesota

# ax.set_legend()

# mn_geom = []
# for county in reader.records():
#     if county.attributes['REGION']=='MN':
#         mn_geom.append(county.geometry)
# mn_geom = [county.geometry for county in reader.records() if county.attributes['REGION']== 'MN']

# ax.add_geometries(mn_geom, ccrs.PlateCarree(), edgecolor='black', facecolor='lightpink', zorder=1)

# Add grid lines with labels
gl = ax.gridlines(draw_labels=True, linestyle='--', color='gray')
gl.top_labels = False
gl.right_labels = False

# Plot Minnesota counties with names
for county in reader.records():
        if county.attributes["REGION"]=='MN':
            geometry = county.geometry
            name = county.attributes['NAME']
            ax.add_geometries([geometry], ccrs.PlateCarree(), edgecolor='black', facecolor='lightpink', zorder=1)
            x, y = geometry.centroid.x, geometry.centroid.y
            ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.PlateCarree())


plt.show()
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt

# define the map projection
proj= ccrs.PlateCarree()

# generate the figure and define its axes
fig, ax= plt.subplots(figsize=(14,10),subplot_kw={'projection':proj})


# extract all the geometries from the shapefile reader object
all_counties = list(reader.geometries())

# plot the county shapes on the map
ax.add_geometries(all_counties, crs= proj, edgecolor= 'none', facecolor='grey', zorder=0)


# add map base features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.RIVERS, zorder=6)
ax.add_feature(cfeature.LAKES, zorder=7)
ax.add_feature(cfeature.BORDERS, linestyle=':', zorder=3, edgecolor='black')
ax.add_feature(cfeature.STATES, linestyle='-', zorder=4, edgecolor='black')


# set the extent for the map
ax.set_extent([-97.5,-89.5, 42.5, 49.5], crs=proj)


for county in reader.records():
    if county.attributes["REGION"]=='MN':
        geometry= county.geometry
        name= county.attributes["NAME"]
        x,y = county.geometry.centroid.x, county.geometry.centroid.y
        ax.text(x,y, name, fontsize=9,ha= 'center' , transform=ccrs.Geodetic(), zorder=8)
        ax.add_geometries(geometry,crs=proj , edgecolor='black', facecolor='lightpink', zorder=4)





plt.show()
In [ ]:
import cartopy.feature as cfeature
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14,10), subplot_kw={'projection':proj})

ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.BORDERS)
gl = ax.gridlines(draw_labels=True, linestyle=':', color='black',zorder=3)
plt.show()

Access records in Shapefile with reader.records()

In [ ]:
# Load the records of the shapefile
# The records() method returns a generator object that can be used to iterate over the records in the shapefile
records = reader.records()

# Print the records object
type(records) # Output: <class 'generator'>

Access record attributes in Shapefile with record.attributes.items():

In [ ]:
# check the first record to understand the structure

# 'records' is an iterator of shapefile records
# Use next() to get the first record from the iterator
first_record = next(records)

# first_record is a Record object that contains both geometry and attribute data
print("Attributes and Values for the first Record:")
print(f"\n{first_record}\n")

# Print the number of attributes in the first record
# .attributes is a dictionary containing the attribute names and values
print(len(first_record.attributes))

# Iterate over the dictionary items, which are the attributes, and print each attribute name and value.
for attribute_name, attribute_value in first_record.attributes.items():
    print(f"{attribute_name}:{attribute_value}")

Access record Geometry in Shapefile with record.geometry():

In [ ]:
print("\nFirst Record Geometry:\n")
print(first_record.geometry)

# Print the geometry type of the first record
print("\nGeometry type:\n")
print(first_record.geometry.geom_type)  # gives the type of the geometry (e.g., 'Polygon').

# Access the geometry data for the first record
geometry = first_record.geometry

# Access the coordinate points of the exterior ring
if geometry.geom_type == 'Polygon':
    # Access the coordinates of the exterior ring of the polygon
    exterior_coords = list(geometry.exterior.coords)
    print("\nExterior Coordinates:")
    for coord in exterior_coords:
        print(coord)
    print(f"\nNumber of coordinates in the exterior ring: {len(exterior_coords)}")  # gives the number of coordinates of the exterior ring of the polygon

# Handle MultiPolygon records
elif geometry.geom_type == 'MultiPolygon':
    count_coords = 0
    #Use geometry.geoms to iterate over each polygon in the MultiPolygon
    for polygon in geometry.geoms:
        count_coords += len(polygon.exterior.coords)
        print("\nExterior Coordinates of a Polygon in MultiPolygon:")
        for coord in polygon.exterior.coords:
            print(coord)
    print(f"\nTotal number of coordinates in the exterior rings of the MultiPolygon: {count_coords}")
else:
    print("Not a polygon or multipolygon")

# this will print pairs of coordinates that define the shape of the polygon
# each coordinate pair (lon, lat) is a vertex of the polygon
# each vertex is a point (lon,lat) on the map
# these points connect to form the boundary that defines the shape of the polygon
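The vertex and exterior-ring ideas are easy to see on a toy shapely polygon (shapely backs Cartopy's geometry objects; the unit square here is made up for illustration):

```python
from shapely.geometry import Polygon

# A unit square defined by four vertices (lon, lat style pairs)
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

print(square.geom_type)               # 'Polygon'
coords = list(square.exterior.coords)
print(coords)
print(len(coords))  # 5: shapely repeats the first vertex to close the ring
```

This mirrors what the cell above does for a county record: the exterior ring is the closed sequence of vertices that traces the boundary.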

Bonus 1: Convert to GeoJSON and visualize the record geometry on a web map


In [ ]:
from shapely.geometry import mapping
import json


# Get the first record
first_record = next(records)

# Convert the geometry to GeoJSON format
geometry_geojson = mapping(first_record.geometry)

# Create a GeoJSON feature
# Wrap the geometry in a GeoJSON feature and include the attributes
geojson_feature = {
    "type": "Feature",
    "geometry": geometry_geojson,
    "properties": first_record.attributes
}

# Convert the feature to a JSON string  for easy copying into a browser
geojson_str = json.dumps(geojson_feature, indent=2)

# Print the GeoJSON string
print(geojson_str)
In [ ]:
# read in the shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'

reader = shpreader.Reader(shapefile_path)

# Filter for counties in Minnesota
#List Comprehension: Creates a list in memory, which can be inefficient for large datasets
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']

# Find the specific county (Scott) in Minnesota
# Create Generator expression to find the specific county in Minnesota
# This creates a generator that yields counties matching the condition

#Generator Expression: Creates a generator that yields items one at a time, without storing the entire sequence in memory
scott_county_generator = (county for county in minnesota_counties if county.attributes['NAME'] == 'Scott')


#we use next() in conjunction with a generator expression
#  next() allows us to efficiently retrieve the first (and in this case, the only) matching item without creating an intermediate list
scott_county = next(scott_county_generator, None)

# Check if the record is found
if scott_county:
    print("Scott County Geometry:\n")
    print(scott_county.geometry)  # Print the geometry 
else:
    print("Scott County not found in the dataset.")
In [ ]:
# Alternative approach with two list comprehensions, less memory-efficient for large datasets

# Load the shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
reader = shpreader.Reader(shapefile_path)

# The records() method returns a generator that yields records one at a time
# Each record corresponds to a row in the shapefile's attribute table
something = reader.records()
print(type(something))  # <class 'generator'>

# Since reader.records() returns a generator (which is an iterable), you can use a list comprehension to filter or transform the records
# Filter for counties in Minnesota using list comprehension
# List comprehensions can iterate over any iterable, including generators, lists, tuples, dictionaries, sets, and objects that implement the iterator protocol
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']

# Find the specific county (Scott) in Minnesota using list comprehension and next()
# This creates a list of counties named 'Scott'
scott_county_list = [county for county in minnesota_counties if county.attributes['NAME'] == 'Scott']

# Convert the list to an iterator and use next() to get the first item
# If the list is empty, next() returns None
scott_county = next(iter(scott_county_list), None)

# Check if the record is found
if scott_county:
    print("Scott County Geometry:\n")
    print(scott_county.geometry)  # Print the geometry
else:
    print("Scott County not found in the dataset.")
In [ ]:
# Convert the geometry to GeoJSON format
geometry_geojson = mapping(scott_county.geometry)

# Create a GeoJSON feature
# Wrap the geometry in a GeoJSON feature and include the attributes
geojson_feature = {
    "type": "Feature",
    "geometry": geometry_geojson,
    "properties": scott_county.attributes
}

# Convert the feature to a JSON string  for easy copying into a browser
geojson_str = json.dumps(geojson_feature, indent=2)

# Print the GeoJSON string
print(geojson_str)
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from matplotlib import pyplot as plt 
import cartopy.io.shapereader as shpreader

# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'

# choose the correct coordinate reference system
proj = ccrs.PlateCarree()

# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': proj})

# Add features to the map from Natural Earth
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
ax.add_feature(cfeature.STATES, linestyle=':')

# Plot the data points
ax.scatter(merge_df ["longitude"],merge_df["latitude"], color='red', s=50, edgecolor='k')


# Load the shapefile and filter for counties in Minnesota
reader = shpreader.Reader(shapefile_path)
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']

# Plot only the filtered counties
for county in minnesota_counties:
    geometry = county.geometry
    name = county.attributes['NAME']
    ax.add_geometries([geometry], ccrs.PlateCarree(), edgecolor='black', facecolor='none')
    x, y = geometry.centroid.x, geometry.centroid.y
    ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.Geodetic())
# Limit the map extent to Minnesota

# set extents defines a bounding box for the map using 4 values
# These 4 values are the coordinates of the four corners of the bounding box
# each value approximates one edge of the Minnesota state border
# extents are set up as follows: 
    # west , east, south, north
    # west longitude , east longitude, south latitude, north latitude 

            #range of  longitudes, # range of latitudes
ax.set_extent([-97.5, -89.5, 43.5, 49.5], crs= proj)  # Adjust these values based on the actual coordinates of Minnesota

# Set the title
ax.set_title('EF0 Tornadoes in Minnesota from May 2016 to September 2023')

# Show the plot
plt.show()
In [ ]:
# verify the timeframe of the data
merge_df['BEGIN_DATE'] = pd.to_datetime(merge_df['BEGIN_DATE'])
merge_df["BEGIN_DATE"].sort_values(ascending=False).head(1)
In [ ]:
merge_df["BEGIN_DATE"].sort_values(ascending=True).head(1)
In [109]:
# nice plot. We see here that EF0 tornadoes impacted nearly all counties
# can we get a sense of which had the most EF0s in this timeframe?
In [ ]:
merge_cdf= merge_df.copy()

merge_cdf["County_Counts"] = merge_cdf.groupby("CZ_NAME_STR")["EVENT_ID"].transform('count')

merge_cdf.sort_values(by="County_Counts",ascending=False)
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader

# path to the Natural Earth shapefile

shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'



# initialize the figure and axes for the plot
fig, ax = plt.subplots(figsize=(16,10), subplot_kw= {'projection':proj})


# add features to the map
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.LAKES)
ax.add_feature(cfeature.BORDERS, linestyle=":")
ax.add_feature(cfeature.STATES, linestyle="-")




# Plot the data points using the county-count column to determine the size of the markers
ax.scatter(merge_cdf["longitude"], merge_cdf["latitude"], s=merge_cdf["County_Counts"]*10, color='red', alpha=0.5 , edgecolor= 'k', zorder=1)


# load the shapefile and filter to MN counties
reader= shpreader.Reader(shapefile_path)
minnesota_counties=[county for county in reader.records() if county.attributes['REGION']=='MN']

# plot only the filtered counties

for county in minnesota_counties:
    geometry= county.geometry
    name= county.attributes['NAME']
    ax.add_geometries([geometry], proj, edgecolor='black', facecolor='none')
    x,y= geometry.centroid.x, geometry.centroid.y
    ax.text(x, y , name , fontsize=9, ha='center', transform=ccrs.Geodetic())





# set the borders to MN
ax.set_extent([-97.5, -89.5, 43.5 , 49.5], crs=proj)

# Set the title
ax.set_title("EF0 Tornadoes in Minnesota May 2016 to September 2023")

plt.show()
In [ ]:
merge_cdf["CZ_NAME_STR"].value_counts().head(10)
In [ ]:
# make a unique county name list
unique_county_names= merge_cdf["CZ_NAME_STR"].unique()
plt_data = merge_cdf.groupby("CZ_NAME_STR").first().reset_index()
plt_data
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader

# path to the Natural Earth shapefile

shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'



# initialize the figure and axes for the plot
fig, ax = plt.subplots(figsize=(16,10), subplot_kw= {'projection':proj})


# add features to the map
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.LAKES)
ax.add_feature(cfeature.BORDERS, linestyle=":")
ax.add_feature(cfeature.STATES, linestyle="-")




# Plot the data points using the county-count column to determine the size of the markers
ax.scatter(plt_data["longitude"], plt_data["latitude"], s=plt_data["County_Counts"]*10, color='red', alpha=0.5 , edgecolor= 'k', zorder=1)


# load the shapefile and filter to MN counties
reader= shpreader.Reader(shapefile_path)
minnesota_counties=[county for county in reader.records() if county.attributes['REGION']=='MN']

# plot only the filtered counties

for county in minnesota_counties:
    geometry= county.geometry
    name= county.attributes['NAME']
    ax.add_geometries([geometry], proj, edgecolor='black', facecolor='none')
    x,y= geometry.centroid.x, geometry.centroid.y
    ax.text(x, y , name , fontsize=9, ha='center', transform=ccrs.Geodetic())





# set the borders to MN
ax.set_extent([-97.5, -89.5, 43.5 , 49.5], crs=proj)

# Set the title
ax.set_title("EF0 Tornadoes in Minnesota May 2016 to September 2023")

plt.show()

Introduction to Dictionary Comprehensions

  • Definition
    • Dictionary comprehensions are a concise and efficient way to create dictionaries in Python
    • Similar to list comprehensions, provide an elegant way to perform operations and apply conditions to iterables,
    • Specifically allow for the creation or transformation of dictionary key-value pairs in a single line of code

Why Do We Use Dictionary Comprehensions?:

  • Conciseness: Reduces the complexity and amount of code compared to traditional loops for creating dictionaries, making the code more readable
  • Performance: Generally faster than using a loop to add items to a dictionary due to optimized implementation and reduced overhead
  • Expressiveness: Enhances code clarity by focusing on the dictionary creation logic rather than the mechanics of looping and inserting key-value pairs
  • Versatility: Capable of incorporating conditional logic and multiple sources, allowing for sophisticated transformations and filtering in dictionary creation
  • Key point: Use dictionary comprehensions to efficiently transform or map data into key-value pairs

Syntax:

  • Basic Structure: A dictionary comprehension consists of curly braces {} containing a key-value pair expression followed by a for clause
  • Optionally, it can include one or more for or if clauses

Generalized Examples:

  • {key_expr: value_expr for item in iterable}
  • {key_expr: value_expr for item in iterable if condition}
  • {key_expr: value_expr for item in iterable if condition1 if condition2}
  • {key_expr: value_expr for item in iterable1 for item2 in iterable2}
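Each generalized form above, sketched on toy data (the word list and number ranges here are illustrative):

```python
words = ["apple", "banana", "cherry"]

# {key_expr: value_expr for item in iterable}
lengths = {w: len(w) for w in words}

# ... with a condition to filter items
long_words = {w: len(w) for w in words if len(w) > 5}

# ... with two stacked conditions (both must pass)
long_b_words = {w: len(w) for w in words if len(w) > 5 if w.startswith("b")}

# ... with two for clauses (behaves like a nested loop)
sums = {(a, b): a + b for a in (1, 2) for b in (10, 20)}

print(lengths)       # {'apple': 5, 'banana': 6, 'cherry': 6}
print(long_words)    # {'banana': 6, 'cherry': 6}
print(long_b_words)  # {'banana': 6}
print(sums)          # {(1, 10): 11, (1, 20): 21, (2, 10): 12, (2, 20): 22}
```

Because dictionary keys must be unique, a later item with the same key silently overwrites an earlier one, which matters when the key expression can collide.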
In [ ]:
# quick recap of how to manipulate a dictionary
from datetime import datetime

from pprint import pprint # pretty print, readable format for dictionaries

# create an empty dictionary for products and their metadata
products_dict = {}

# add to the dictionary
# This is a nested dictionary, where the key is product ID and the value is another dictionary
products_dict['0001-2024']= {
    'name':'apple', 
    'amount':2, 
    'date': datetime.now().date().strftime('%Y%m%d')}

# display the dictionary
pprint(products_dict)

products_dict['0002-2024'] = {'name': 'banana'}

# display each item dictionary entry along with labels
for productID, metadata in products_dict.items():
    pprint(f"Product ID: {productID}, Metadata: {metadata}")

# print the number of items
print(f"Number of items: {len(products_dict)}")

# update an existing entry 
# allows you to add new key value pairs or update existing ones
products_dict['0002-2024'].update({'amount':10, 'date': datetime.now().strftime('%Y%m%d')})

# Display the items
pprint(products_dict.items())
pprint(list(products_dict.items()))


products_dict.update({'0003-2024':{'name': 'mangos','amount':100, 'date': datetime.now().strftime('%Y%m%d')}})

# Display the items
print(f"Number of items: {len(products_dict)}")

# display the dictionary
pprint(products_dict)

print()

# List out the final nested dictionary of products
print("Final product list:")
for counter, (productID, metadata) in enumerate(products_dict.items(), start=1):
    pprint(f"Product ID {counter} : {productID}, Metadata: {metadata}")

Basic Dictionary Comprehension

In [ ]:
# Make a dictionary where the keys are numbers and values are their squares

# Basic components of dictionary comprehension
# (1) key-value expression
# (2) item
# (3) iterable

            #(1)     (2)     (3)
squares = {x:x**2 for x in range(1,10)}
print(squares)

Conditional Dictionary Comprehension

In [ ]:
# Make a dictionary of even numbers and their squares
even_squares = {x:x**2 for x in range(1,10) if x%2==0}

print(even_squares)

Using Functions in Dictionary Comprehension

In [ ]:
# Create a dictionary that maps each word in a list to its length

# suppose you start with a list of words
word_list = ["apple", "banana", "cherry"]


word_length_dict=  {word:len(word) for word in word_list}

print(word_length_dict)

Using a dataframe in a Dictionary Comprehension

In [11]:
import pandas as pd
# Create a dictionary that maps each word in a column of a dataframe to its length

# You start with a dataframe
data = {'Words':(["apple", "banana", "cherry"])*2} # Duplicate the list to increase the number of items
print(data)


df = pd.DataFrame(data)
print(df)

# use a dictionary comprehension directly on the dataframe column to map each word in the column to its length
word_length_dict=  {word:len(word) for word in data['Words']}

print(word_length_dict)

# add a count to the same dataframe as a new column
df["Word Counts"]=df.groupby("Words")["Words"].transform("count")

print(f"\n{df}\n")

# Map the lengths from the dictionary to the key names in the column Words
df["Word Lengths"] = df["Words"].map(word_length_dict)
print(f"\n{df}\n")
{'Words': ['apple', 'banana', 'cherry', 'apple', 'banana', 'cherry']}
    Words
0   apple
1  banana
2  cherry
3   apple
4  banana
5  cherry
{'apple': 5, 'banana': 6, 'cherry': 6}

    Words  Word Counts
0   apple            2
1  banana            2
2  cherry            2
3   apple            2
4  banana            2
5  cherry            2


    Words  Word Counts  Word Lengths
0   apple            2             5
1  banana            2             6
2  cherry            2             6
3   apple            2             5
4  banana            2             6
5  cherry            2             6

In [12]:
# Read HTML tables using the lxml parser
counties_list = pd.read_html(
    "https://en.wikipedia.org/wiki/List_of_counties_in_Minnesota"
)
In [13]:
counties_list=counties_list[0]
In [14]:
counties_list 
# we know MN has 87 counties. Now we have the full dataset
Out[14]:
County FIPS code[3] County seat[4] Est.[1][4] Origin[5][6][7] Etymology Population[8] Area[4][8] Map
0 Aitkin County 1 Aitkin 1857 Pine County, Ramsey County William Alexander Aitken (1785–1851), early fu... 16102 1,819.30 sq mi (4,712 km2) NaN
1 Anoka County 3 Anoka 1857 Ramsey County Dakota word meaning "both sides" 372441 423.61 sq mi (1,097 km2) NaN
2 Becker County 5 Detroit Lakes 1858 Cass County, Pembina County George Loomis Becker, former state senator and... 35283 1,310.42 sq mi (3,394 km2) NaN
3 Beltrami County 7 Bemidji 1866 Unorganized Territory, Itasca County, Pembina ... Giacomo Beltrami, Italian explorer who explore... 46718 2,505.27 sq mi (6,489 km2) NaN
4 Benton County 9 Foley 1849 One of nine original counties; formed from res... Thomas Hart Benton (1782–1858), former United ... 41600 408.28 sq mi (1,057 km2) NaN
... ... ... ... ... ... ... ... ... ...
82 Watonwan County 165 St. James 1860 Brown County Watonwan River, a river that flows through Min... 11077 434.51 sq mi (1,125 km2) NaN
83 Wilkin County 167 Breckenridge 1858 Cass County, Pembina County Alexander Wilkin (1820–1864), Minnesota politi... 6306 751.43 sq mi (1,946 km2) NaN
84 Winona County 169 Winona 1854 Fillmore County, Wabasha County Named after Wee-No-Nah, Sister, or Cousin of C... 49721 626.30 sq mi (1,622 km2) NaN
85 Wright County 171 Buffalo 1855 Cass County, Sibley County Silas Wright (1795–1847), former United States... 151150 660.75 sq mi (1,711 km2) NaN
86 Yellow Medicine County 173 Granite Falls 1871 Redwood County Yellow Medicine River, a river that flows thro... 9467 757.96 sq mi (1,963 km2) NaN

87 rows × 9 columns

In [45]:
import pandas as pd
import re

# Sample data
data = pd.DataFrame({
    'Area': ['757.96 sq mi (1,963 km2)', '123.45 sq mi (320 km2)', '678.90 sq mi (1,760 km2)']
})

# Function to extract the numeric part
def extract_numeric(area_str):
    match = re.match(r'^\d+(\.\d+)?', area_str)
    return float(match.group()) if match else None

# Apply the function to the 'Area' column
data['Area_Sq_Mi'] = data['Area'].apply(extract_numeric)

print(data)
                       Area  Area_Sq_Mi
0  757.96 sq mi (1,963 km2)      757.96
1    123.45 sq mi (320 km2)      123.45
2  678.90 sq mi (1,760 km2)      678.90
In [46]:
# The sample cell above added 'Area_Sq_Mi' to the throwaway `data` frame, not to counties_list,
# so apply extract_numeric to the real table first
# (read_html keeps the footnote markers, so the area column is named 'Area[4][8]')
counties_list["Area_Sq_Mi"] = counties_list["Area[4][8]"].apply(extract_numeric)

largest_counties=counties_list["Area_Sq_Mi"].nlargest()
smallest_counties=counties_list["Area_Sq_Mi"].nsmallest()

print(largest_counties)
print(smallest_counties)

Create a list of county coordinates with Geopy

Geocoding with Nominatim via Geopy

  • Geocoding is the process of converting addresses (like "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates (like latitude 37.423021 and longitude -122.083739)
  • can be used to place markers on a map, or to position the map itself

Capabilities of Nominatim (Geopy):

  • Address Geocoding: Converts street addresses or other descriptive locations into geographic coordinates.
  • Reverse Geocoding: Converts geographic coordinates into a human-readable address.
  • Extensive Coverage: Utilizes OpenStreetMap data, providing global coverage often with fine-grained control over geocoding queries.
  • Customization Options: Allows customization of requests, including specifying the language of the result, the bounding box for constraining searches, and more.

Syntax for Geocoding and Reverse Geocoding 1) Geocoding (Address to Coordinates)

- Initialization: Create a Nominatim object with a user-defined user_agent
- Query: Use the .geocode() method with the address as a string.

2) Reverse Geocoding (Coordinates to Address)

- Initialization: Create a Nominatim object with a user-defined user_agent
- Query: Use the .reverse() method with a string in the format "latitude, longitude".
In [66]:
from geopy.geocoders import Nominatim
import requests

geolocator = Nominatim(user_agent="geocode_Address")

def getAddress_coords(address):
    location = geolocator.geocode(address)
    if location:
        latitude, longitude = location.latitude, location.longitude
        print(location)
        
        # Get elevation in meters
        elevation_url = f"https://api.open-elevation.com/api/v1/lookup?locations={latitude},{longitude}"
        response = requests.get(elevation_url)
        elevation_data = response.json()
        print(elevation_data)
        elevation_meters = elevation_data['results'][0]['elevation'] if 'results' in elevation_data else None
        print(elevation_meters)
        #convert elevation to feet
        elevation_feet = elevation_meters * 3.28084 if elevation_meters is not None else None
        print(elevation_feet)
        return (latitude, longitude, elevation_feet)
    else:
        print("Address Not Found, coordinates will be blank")
        return (None,None, None)
In [ ]:
geocode_result = getAddress_coords("5057 Edgewater Court, Savage, MN")
print(geocode_result)
In [ ]:
from geopy.geocoders import Nominatim

# initialize geocoder, a Nominatim object

geolocator = Nominatim(user_agent="geocode_Address")

def getAddress_coords(address):
    location = geolocator.geocode(address)
    if location:
        print(location)
        return (location.latitude, location.longitude)
    else:
        print("Address Not Found, coordinates will be blank")
        return (None, None)

geocode_result = getAddress_coords("5057 Edgewater Court, Savage, MN")

print(geocode_result)
In [ ]:
geocode_result = getAddress_coords("Murphy-Hanrehan Park Reserve, Savage, MN")

print(geocode_result)
In [ ]:
geolocator = Nominatim(user_agent="geocode_Address")

def getAddress(coords):
    location = geolocator.reverse(coords)
    if location:
        return (location.address)
    else:
        print("Address Not Found, address will be blank")

geocode_Address_result = getAddress(geocode_result)

print(geocode_Address_result)
In [18]:
# Combine retrieval of external data from geocoding service with a dictionary comprehension of a dataframe column
# Create a dictionary that maps counties to their coordinates

from geopy.geocoders import Nominatim

# initialize geocoder

geolocator = Nominatim(user_agent="geoapiExercise")

def get_lat_lon(county):
    # Append ", Minnesota" to ensure the geocoding query is localized
    location = geolocator.geocode(county + ", Minnesota")
    if location:
        return (location.latitude, location.longitude)
    else:
        return (None, None)


# Dictionary comprehension that maps a dataframe column of county names to their (lat, lon) coordinates
# the expression of the dictionary comprehension returns the coordinates for each key defined by the dataframe column
coordinates_list = {county: get_lat_lon(county) for county in counties_list["County"]}
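The same pattern can be run offline with a toy dataframe and a stand-in lookup function (the names `toy_df` and `fake_lat_lon` are illustrative, not from the dataset above):

```python
import pandas as pd

# Toy stand-in for the counties dataframe used above
toy_df = pd.DataFrame({"County": ["Alpha County", "Beta County"]})

# Stand-in lookup that returns fixed coordinates instead of calling a geocoder
def fake_lat_lon(county):
    fixed = {"Alpha County": (45.0, -93.0), "Beta County": (44.5, -94.2)}
    return fixed.get(county, (None, None))

# Same dictionary-comprehension pattern: one key per column value
coords = {county: fake_lat_lon(county) for county in toy_df["County"]}
print(coords)  # {'Alpha County': (45.0, -93.0), 'Beta County': (44.5, -94.2)}
```

Swapping `fake_lat_lon` back to `get_lat_lon` recovers the geocoding version; the comprehension itself is unchanged.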
In [ ]:
# extract county names
# county_names= counties_list["County"]


# #print the names of the county
# print(county_names)
# print(county_names.dtype)

# print(f"The dataset is an {type(county_names)}")
# print(f"The data value in the dataset is an {county_names.dtype}")
# print(county_names.index.to_list())
In [29]:
coordinates_list
Out[29]:
{'Aitkin County': (46.5714822, -93.3847595),
 'Anoka County': (45.2710195, -93.2827625),
 'Becker County': (46.9298236, -95.6761851),
 'Beltrami County': (47.9978537, -94.8799011),
 'Benton County': (45.7162129, -94.0481042),
 'Big Stone County': (45.385266, -96.3557364),
 'Blue Earth County': (44.0109722, -94.0560643),
 'Brown County': (44.2350232, -94.6955051),
 'Carlton County': (46.5799933, -92.7206334),
 'Carver County': (44.807118, -93.7871792),
 'Cass County': (47.0234117, -94.3454604),
 'Chippewa County': (45.027661, -95.5314914),
 'Chisago County': (45.4758877, -92.8849411),
 'Clay County': (46.8994904, -96.5088202),
 'Clearwater County': (47.5643825, -95.3747844),
 'Cook County': (47.9149076, -90.47301),
 'Cottonwood County': (44.019068, -95.1658845),
 'Crow Wing County': (46.4665237, -94.1017044),
 'Dakota County': (44.666655, -93.044911),
 'Dodge County': (44.0175404, -92.8678406),
 'Douglas County': (45.9340479, -95.4627651),
 'Faribault County': (43.6647961, -93.9510501),
 'Fillmore County': (43.6466588, -92.0636359),
 'Freeborn County': (43.6763617, -93.3501681),
 'Goodhue County': (44.396973, -92.7175627),
 'Grant County': (45.9358795, -96.0272071),
 'Hennepin County': (45.0257232, -93.4865052),
 'Houston County': (43.6624222, -91.4685617),
 'Hubbard County': (47.1138266, -94.9427679),
 'Isanti County': (45.56932235, -93.32652095523574),
 'Itasca County': (47.4968343, -93.6225663),
 'Jackson County': (43.670011, -95.1500626),
 'Kanabec County': (45.8986948, -93.2850016),
 'Kandiyohi County': (45.142373, -95.0025846),
 'Kittson County': (48.7709208, -96.8074141),
 'Koochiching County': (48.221596, -93.7684251),
 'Lac qui Parle County': (44.986426, -96.2024907),
 'Lake County': (47.6348022, -91.4394994),
 'Lake of the Woods County': (48.7032282, -94.8480091),
 'Le Sueur County': (44.3771652, -93.711443),
 'Lincoln County': (44.4020631, -96.2627763),
 'Lyon County': (44.3880733, -95.8287296),
 'McLeod County': (44.8169135, -94.2495251),
 'Mahnomen County': (47.3313602, -95.8142911),
 'Marshall County': (48.3605336, -96.381968),
 'Martin County': (43.6564337, -94.5498419),
 'Meeker County': (45.1183643, -94.5175345),
 'Mille Lacs County': (45.9311972, -93.640356),
 'Morrison County': (45.9926837, -94.2554658),
 'Mower County': (43.6832277, -92.753704),
 'Murray County': (44.017855, -95.7615205),
 'Nicollet County': (44.3380412, -94.2362169),
 'Nobles County': (43.6634212, -95.7527672),
 'Norman County': (47.3194344, -96.4625779),
 'Olmsted County': (43.9997437, -92.3767816),
 'Otter Tail County': (46.4184196, -95.713142),
 'Pennington County': (48.0513335, -96.0829271),
 'Pine County': (46.0820957, -92.7542126),
 'Pipestone County': (44.0270012, -96.2566582),
 'Polk County': (47.6554613, -96.4193484),
 'Pope County': (45.5850258, -95.4469471),
 'Ramsey County': (45.0165728, -93.0949501),
 'Red Lake County': (47.8605178, -96.0988343),
 'Redwood County': (44.3788613, -95.2532373),
 'Renville County': (44.7242874, -94.9084771),
 'Rice County': (44.3413376, -93.2865484),
 'Rock County': (43.6733632, -96.2574328),
 'Roseau County': (48.7710371, -95.7697882),
 'Saint Louis County': (47.6201005, -92.4363343),
 'Scott County': (44.6506998, -93.5025726),
 'Sherburne County': (45.4427088, -93.7459202),
 'Sibley County': (44.5603522, -94.2085682),
 'Stearns County': (45.535326, -94.6139422),
 'Steele County': (44.0137336, -93.2203671),
 'Stevens County': (45.5837016, -95.9946194),
 'Swift County': (45.2797223, -95.6898654),
 'Todd County': (46.0588428, -94.887283),
 'Traverse County': (45.7836323, -96.4215265),
 'Wabasha County': (44.2767596, -92.2018164),
 'Wadena County': (46.5850936, -94.9606684),
 'Waseca County': (44.0172242, -93.5885717),
 'Washington County': (45.0078657, -92.874565),
 'Watonwan County': (43.9736055, -94.6370354),
 'Wilkin County': (46.3258354, -96.4586194),
 'Winona County': (43.9582272, -91.7807784),
 'Wright County': (45.1489061, -93.9639196),
 'Yellow Medicine County': (44.7198536, -95.8533555)}

Convert dictionary into list of tuples

  • will be an iterable of tuples in the format (a, b) OR (a, (b, c))
  • can loop through these keys and values
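A minimal illustration of both shapes, using a toy dictionary whose values are (lat, lon) tuples:

```python
# Toy dictionary whose values are (lat, lon) tuples
d = {"A County": (45.0, -93.0), "B County": (44.5, -94.2)}

# items() yields (key, value) pairs; list() makes a concrete list of tuples
pairs = list(d.items())
print(pairs)  # [('A County', (45.0, -93.0)), ('B County', (44.5, -94.2))]

# Nested unpacking: each item is (key, (lat, lon))
for county, (lat, lon) in d.items():
    print(county, lat, lon)
```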
In [20]:
type(coordinates_list)
Out[20]:
dict
In [21]:
coordinates_list.items() # returns a view of the dictionary's (key, value) tuples
Out[21]:
dict_items([('Aitkin County', (46.5714822, -93.3847595)), ('Anoka County', (45.2710195, -93.2827625)), ('Becker County', (46.9298236, -95.6761851)), ('Beltrami County', (47.9978537, -94.8799011)), ('Benton County', (45.7162129, -94.0481042)), ('Big Stone County', (45.385266, -96.3557364)), ('Blue Earth County', (44.0109722, -94.0560643)), ('Brown County', (44.2350232, -94.6955051)), ('Carlton County', (46.5799933, -92.7206334)), ('Carver County', (44.807118, -93.7871792)), ('Cass County', (47.0234117, -94.3454604)), ('Chippewa County', (45.027661, -95.5314914)), ('Chisago County', (45.4758877, -92.8849411)), ('Clay County', (46.8994904, -96.5088202)), ('Clearwater County', (47.5643825, -95.3747844)), ('Cook County', (47.9149076, -90.47301)), ('Cottonwood County', (44.019068, -95.1658845)), ('Crow Wing County', (46.4665237, -94.1017044)), ('Dakota County', (44.666655, -93.044911)), ('Dodge County', (44.0175404, -92.8678406)), ('Douglas County', (45.9340479, -95.4627651)), ('Faribault County', (43.6647961, -93.9510501)), ('Fillmore County', (43.6466588, -92.0636359)), ('Freeborn County', (43.6763617, -93.3501681)), ('Goodhue County', (44.396973, -92.7175627)), ('Grant County', (45.9358795, -96.0272071)), ('Hennepin County', (45.0257232, -93.4865052)), ('Houston County', (43.6624222, -91.4685617)), ('Hubbard County', (47.1138266, -94.9427679)), ('Isanti County', (45.56932235, -93.32652095523574)), ('Itasca County', (47.4968343, -93.6225663)), ('Jackson County', (43.670011, -95.1500626)), ('Kanabec County', (45.8986948, -93.2850016)), ('Kandiyohi County', (45.142373, -95.0025846)), ('Kittson County', (48.7709208, -96.8074141)), ('Koochiching County', (48.221596, -93.7684251)), ('Lac qui Parle County', (44.986426, -96.2024907)), ('Lake County', (47.6348022, -91.4394994)), ('Lake of the Woods County', (48.7032282, -94.8480091)), ('Le Sueur County', (44.3771652, -93.711443)), ('Lincoln County', (44.4020631, -96.2627763)), ('Lyon County', (44.3880733, -95.8287296)), 
('McLeod County', (44.8169135, -94.2495251)), ('Mahnomen County', (47.3313602, -95.8142911)), ('Marshall County', (48.3605336, -96.381968)), ('Martin County', (43.6564337, -94.5498419)), ('Meeker County', (45.1183643, -94.5175345)), ('Mille Lacs County', (45.9311972, -93.640356)), ('Morrison County', (45.9926837, -94.2554658)), ('Mower County', (43.6832277, -92.753704)), ('Murray County', (44.017855, -95.7615205)), ('Nicollet County', (44.3380412, -94.2362169)), ('Nobles County', (43.6634212, -95.7527672)), ('Norman County', (47.3194344, -96.4625779)), ('Olmsted County', (43.9997437, -92.3767816)), ('Otter Tail County', (46.4184196, -95.713142)), ('Pennington County', (48.0513335, -96.0829271)), ('Pine County', (46.0820957, -92.7542126)), ('Pipestone County', (44.0270012, -96.2566582)), ('Polk County', (47.6554613, -96.4193484)), ('Pope County', (45.5850258, -95.4469471)), ('Ramsey County', (45.0165728, -93.0949501)), ('Red Lake County', (47.8605178, -96.0988343)), ('Redwood County', (44.3788613, -95.2532373)), ('Renville County', (44.7242874, -94.9084771)), ('Rice County', (44.3413376, -93.2865484)), ('Rock County', (43.6733632, -96.2574328)), ('Roseau County', (48.7710371, -95.7697882)), ('Saint Louis County', (47.6201005, -92.4363343)), ('Scott County', (44.6506998, -93.5025726)), ('Sherburne County', (45.4427088, -93.7459202)), ('Sibley County', (44.5603522, -94.2085682)), ('Stearns County', (45.535326, -94.6139422)), ('Steele County', (44.0137336, -93.2203671)), ('Stevens County', (45.5837016, -95.9946194)), ('Swift County', (45.2797223, -95.6898654)), ('Todd County', (46.0588428, -94.887283)), ('Traverse County', (45.7836323, -96.4215265)), ('Wabasha County', (44.2767596, -92.2018164)), ('Wadena County', (46.5850936, -94.9606684)), ('Waseca County', (44.0172242, -93.5885717)), ('Washington County', (45.0078657, -92.874565)), ('Watonwan County', (43.9736055, -94.6370354)), ('Wilkin County', (46.3258354, -96.4586194)), ('Winona County', (43.9582272, 
-91.7807784)), ('Wright County', (45.1489061, -93.9639196)), ('Yellow Medicine County', (44.7198536, -95.8533555))])
In [30]:
# the items() method of dictionaries returns an iterable of tuples
# each tuple consists of a key-value pair from the dictionary

type(coordinates_list.items()) # this dict_items object is an iterable
Out[30]:
dict_items
In [43]:
# because this is an iterable, we can use it in a loop to access its elements
# ...OR convert it to other iterables like lists that are often required for further data processing
for county, data in coordinates_list.items():
    print(county, data)  # prints each county and its associated data
Aitkin County (46.5714822, -93.3847595)
Anoka County (45.2710195, -93.2827625)
Becker County (46.9298236, -95.6761851)
Beltrami County (47.9978537, -94.8799011)
Benton County (45.7162129, -94.0481042)
Big Stone County (45.385266, -96.3557364)
Blue Earth County (44.0109722, -94.0560643)
Brown County (44.2350232, -94.6955051)
Carlton County (46.5799933, -92.7206334)
Carver County (44.807118, -93.7871792)
Cass County (47.0234117, -94.3454604)
Chippewa County (45.027661, -95.5314914)
Chisago County (45.4758877, -92.8849411)
Clay County (46.8994904, -96.5088202)
Clearwater County (47.5643825, -95.3747844)
Cook County (47.9149076, -90.47301)
Cottonwood County (44.019068, -95.1658845)
Crow Wing County (46.4665237, -94.1017044)
Dakota County (44.666655, -93.044911)
Dodge County (44.0175404, -92.8678406)
Douglas County (45.9340479, -95.4627651)
Faribault County (43.6647961, -93.9510501)
Fillmore County (43.6466588, -92.0636359)
Freeborn County (43.6763617, -93.3501681)
Goodhue County (44.396973, -92.7175627)
Grant County (45.9358795, -96.0272071)
Hennepin County (45.0257232, -93.4865052)
Houston County (43.6624222, -91.4685617)
Hubbard County (47.1138266, -94.9427679)
Isanti County (45.56932235, -93.32652095523574)
Itasca County (47.4968343, -93.6225663)
Jackson County (43.670011, -95.1500626)
Kanabec County (45.8986948, -93.2850016)
Kandiyohi County (45.142373, -95.0025846)
Kittson County (48.7709208, -96.8074141)
Koochiching County (48.221596, -93.7684251)
Lac qui Parle County (44.986426, -96.2024907)
Lake County (47.6348022, -91.4394994)
Lake of the Woods County (48.7032282, -94.8480091)
Le Sueur County (44.3771652, -93.711443)
Lincoln County (44.4020631, -96.2627763)
Lyon County (44.3880733, -95.8287296)
McLeod County (44.8169135, -94.2495251)
Mahnomen County (47.3313602, -95.8142911)
Marshall County (48.3605336, -96.381968)
Martin County (43.6564337, -94.5498419)
Meeker County (45.1183643, -94.5175345)
Mille Lacs County (45.9311972, -93.640356)
Morrison County (45.9926837, -94.2554658)
Mower County (43.6832277, -92.753704)
Murray County (44.017855, -95.7615205)
Nicollet County (44.3380412, -94.2362169)
Nobles County (43.6634212, -95.7527672)
Norman County (47.3194344, -96.4625779)
Olmsted County (43.9997437, -92.3767816)
Otter Tail County (46.4184196, -95.713142)
Pennington County (48.0513335, -96.0829271)
Pine County (46.0820957, -92.7542126)
Pipestone County (44.0270012, -96.2566582)
Polk County (47.6554613, -96.4193484)
Pope County (45.5850258, -95.4469471)
Ramsey County (45.0165728, -93.0949501)
Red Lake County (47.8605178, -96.0988343)
Redwood County (44.3788613, -95.2532373)
Renville County (44.7242874, -94.9084771)
Rice County (44.3413376, -93.2865484)
Rock County (43.6733632, -96.2574328)
Roseau County (48.7710371, -95.7697882)
Saint Louis County (47.6201005, -92.4363343)
Scott County (44.6506998, -93.5025726)
Sherburne County (45.4427088, -93.7459202)
Sibley County (44.5603522, -94.2085682)
Stearns County (45.535326, -94.6139422)
Steele County (44.0137336, -93.2203671)
Stevens County (45.5837016, -95.9946194)
Swift County (45.2797223, -95.6898654)
Todd County (46.0588428, -94.887283)
Traverse County (45.7836323, -96.4215265)
Wabasha County (44.2767596, -92.2018164)
Wadena County (46.5850936, -94.9606684)
Waseca County (44.0172242, -93.5885717)
Washington County (45.0078657, -92.874565)
Watonwan County (43.9736055, -94.6370354)
Wilkin County (46.3258354, -96.4586194)
Winona County (43.9582272, -91.7807784)
Wright County (45.1489061, -93.9639196)
Yellow Medicine County (44.7198536, -95.8533555)
In [38]:
# using the items() method on a dictionary returns an iterable of tuple pairs
for county, data in coordinates_list.items():
    if county == 'Scott County':
        print(county, data)  # prints the county and its associated data
# we do not see the outer parentheses because we unpack county and data separately
Scott County (44.6506998, -93.5025726)
In [ ]:
import pandas as pd

# recall that dataframes can be made from a list of tuples

list_t = [('apples', (100, 2)), ('pears', (20, 3))]

t_df = pd.DataFrame(list_t, columns=['Fruit', 'Data'])

t_df


# Critical to recognize that a dictionary can be converted to a list of tuples,
# because pandas DataFrames can be constructed efficiently from lists of tuples:
# each tuple is a row and each element of the tuple a column
In [ ]:
# list of tuples [(a, (x, y))]
list_t = [('apples', (100, 2)), ('pears', (20, 3))]

# convert to dataframe, assigning each tuple element to a column
t_df = pd.DataFrame(list_t, columns=['Fruit', 'Data'])

print(t_df)  # the Data column is at first a tuple

print('\n')

# extract the tuple

# convert it into strings, including the parentheses
t_df['Data'] = t_df['Data'].astype(str)

# remove the parentheses
t_df['Data'] = t_df['Data'].str.replace('[()]', "", regex=True)

# split on the comma and expand into two new columns
t_df[['Lat', 'Lon']] = t_df['Data'].str.split(',', expand=True)

print(t_df)
In [ ]:
# Knowing that the iterable of tuples from .items() can be converted into a list of tuples
# allows for straightforward creation of a DataFrame.
list(coordinates_list.items())

# because pandas DataFrames can be created  from lists of tuples
# each tuple is a row and each element of the tuple a column
In [ ]:
len(list(coordinates_list.items()))
In [ ]:
import pandas as pd
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt


# Convert the coordinates list to a DataFrame
data = pd.DataFrame(list(coordinates_list.items()), columns=["County", "Coordinates"])


# ensure all are tuples with two elements
data['Coordinates'] = data['Coordinates'].apply(lambda x: x if isinstance(x, tuple) and len(x) == 2 else (None, None))

print(len(data))
# Check if any coordinates are (None, None)
# (compare element-wise with apply; comparing the Series directly to a tuple raises a length-mismatch error)
none_coordinates = data[data['Coordinates'].apply(lambda x: x == (None, None))]
print(none_coordinates)

# Extract latitude and longitude into separate columns
data[['Latitude', 'Longitude']] = pd.DataFrame(data['Coordinates'].tolist(), index=data.index)

# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Plot the data points
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)

# Add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)

# Set the title
ax.set_title('County Coordinates in Minnesota')

# Show the plot
plt.show()
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt


# Convert the coordinates list to a DataFrame
data = pd.DataFrame(list(coordinates_list.items()), columns=["County", "Coordinates"])


# ensure all are tuples with two elements
data['Coordinates'] = data['Coordinates'].apply(lambda x: x if isinstance(x, tuple) and len(x) == 2 else (None, None))

print(len(data))
# Check if any coordinates are (None, None)
none_coordinates = data[data['Coordinates'].apply(lambda x: x == (None, None))]
print(none_coordinates)
print(data['Coordinates'])
# Extract latitude and longitude into separate columns
data['Latitude'], data['Longitude'] = zip(*data['Coordinates'])
print(data['Coordinates'])

print(data['Latitude'])
print(data['Longitude'])


# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Plot the data points
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)

# Add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)

# Set the title
ax.set_title('County Coordinates in Minnesota')

# Show the plot
plt.show()
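The `zip(*...)` idiom used above to split the Coordinates column into Latitude and Longitude can be seen in isolation (toy data, not the county coordinates):

```python
# A list of (lat, lon) tuples, like the Coordinates column
coords = [(45.0, -93.0), (44.5, -94.2), (46.1, -95.5)]

# zip(* ...) transposes the list of pairs into two parallel tuples
lats, lons = zip(*coords)
print(lats)  # (45.0, 44.5, 46.1)
print(lons)  # (-93.0, -94.2, -95.5)
```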
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader

# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'

# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})

# Add built-in Cartopy features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Load and plot the county boundaries
reader = shpreader.Reader(shapefile_path)
counties = list(reader.geometries())
ax.add_geometries(counties, ccrs.PlateCarree(), edgecolor='black', facecolor='none')

# Assuming 'data' is your DataFrame with the 'Longitude' and 'Latitude'
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)

# Optionally add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)

# Set the title
ax.set_title('County Coordinates in Minnesota')

# Show the plot
plt.show()
In [ ]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader

# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'

# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(10, 15), subplot_kw={'projection': ccrs.PlateCarree()})

# Add built-in Cartopy features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Load the shapefile and filter for counties in Minnesota
reader = shpreader.Reader(shapefile_path)
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']

# Plot only the filtered counties
for county in minnesota_counties:
    geometry = county.geometry
    name = county.attributes['NAME']
    ax.add_geometries([geometry], ccrs.PlateCarree(), edgecolor='black', facecolor='none')
    x, y = geometry.centroid.x, geometry.centroid.y
    ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.Geodetic())

# Limit the map extent to Minnesota
ax.set_extent([-97.5, -89.5, 43.5, 49.5], crs=ccrs.PlateCarree())  # Adjust these values based on the actual coordinates of Minnesota

# Plot the data points derived from the geocoded lat lon coordinates
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', alpha=0.5, zorder=1)

# Set the title
ax.set_title('County Coordinates in Minnesota')

# Show the plot
plt.show()
In [ ]:
mn_counties.to_dict()
In [ ]:
# # Convert Filtered_top_bot_data to a dictionary mapping countries to life expectancy
# life_expectancy = Filtered_top_bot_data.set_index('country')['lifeExp'].to_dict()
# print(life_expectancy)
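The commented pattern above (index by one column, then call `.to_dict()` on another) can be sketched with toy data (the frame below stands in for `Filtered_top_bot_data`):

```python
import pandas as pd

# Toy frame standing in for the life-expectancy data
df = pd.DataFrame({"country": ["Kenya", "Chad"], "lifeExp": [66.8, 54.2]})

# Index by country, then map the lifeExp column to a plain dict
life_expectancy = df.set_index("country")["lifeExp"].to_dict()
print(life_expectancy)  # {'Kenya': 66.8, 'Chad': 54.2}
```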

Plot using .items()

In [ ]:
# Plot each country's coordinates
# Assuming `top_countries` and `bottom_countries` are lists of country names
# for country, (lat, lon) in coordinates.items():
#     if lat and lon:  # Check if lat and lon are not None
#         color = 'green' if country in top_countries else 'red'
#         plt.plot(lon, lat, marker='o', color=color, markersize=5, transform=ccrs.Geodetic())
#         plt.text(lon, lat, country, transform=ccrs.Geodetic())

# plt.title('Top and Bottom African Countries by Life Expectancy')
# plt.show()

Combine Dictionary Comprehension and iterrows() to create a dictionary based on multiple columns of a dataframe

In [ ]:
# Extending the DataFrame with another column
data = {'Words': ["apple", "banana", "cherry"], 'Type': ["fruit", "fruit", "fruit"]}
df = pd.DataFrame(data)

# Dictionary mapping word to a tuple of (word length, type)
word_info_dict = {row['Words']: (len(row['Words']), row['Type']) for index, row in df.iterrows()}

print(word_info_dict)

Generator expressions

In [ ]:
my_generator = (x*x for x in range(10))

for value in my_generator:
    print(value)
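Unlike a list comprehension, a generator expression produces values lazily, one at a time, rather than building the whole list in memory:

```python
# Parentheses instead of brackets: no list is built up front
gen = (x * x for x in range(5))

# Values are produced on demand
print(next(gen))  # 0
print(next(gen))  # 1

# The remaining values can still be consumed, e.g. summed
print(sum(gen))   # 4 + 9 + 16 = 29
```

Once exhausted, a generator cannot be iterated again; create a new one if the values are needed twice.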
