List and Dictionary Comprehension, Cartopy

Introduction to List Comprehensions

  • Definition
    • List comprehensions are a concise and efficient way to create lists in Python
    • They provide a syntactically elegant method to perform operations and apply conditions to iterables, allowing the creation or transformation of lists in a single line of code

Why Do We Use List Comprehensions?

  • Conciseness: Reduces the amount of code needed compared to traditional loops, making the code cleaner and easier to read
  • Performance: Often faster than an equivalent for loop that appends to a list, because the comprehension avoids a separate append method call on every iteration
  • Expressiveness: Allows the code to be more descriptive and focused on the operation itself, rather than the mechanics of looping and appending to lists
  • Versatility: Capable of incorporating conditional logic within the list creation, which lets you filter elements or apply complex transformations easily
  • Key point: Use list comprehensions to transform or extract data
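
As a minimal sketch of the conciseness point above, the same list can be built with an explicit loop and with a comprehension. The variable names here are illustrative only.

```python
# Build the same list two ways: an explicit loop and a list comprehension.
numbers = range(10)

# Traditional loop: create an empty list, then append inside the loop
squares_loop = []
for x in numbers:
    squares_loop.append(x ** 2)

# List comprehension: expression, loop, and list creation on one line
squares_comp = [x ** 2 for x in numbers]

print(squares_loop == squares_comp)  # True
```

Both versions produce identical results; the comprehension simply removes the bookkeeping of creating and appending to the list.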

Syntax:

  • Basic Structure: A list comprehension consists of brackets containing an expression followed by a for clause
  • Optionally, it can include one or more additional for or if clauses.

Generalized Examples:

  • [expression for item in iterable]
  • [expression for item in iterable if condition]
  • [expression for item in iterable if condition1 if condition2]
  • [expression for item in iterable1 for item2 in iterable2]
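
The four generalized forms above can be tried on small inputs. This is a sketch with made-up sample data (the word list and ranges are assumptions, not part of the lesson data).

```python
words = ["storm", "hail", "tornado", "wind"]

# [expression for item in iterable]
lengths = [len(w) for w in words]

# [expression for item in iterable if condition]
long_words = [w for w in words if len(w) > 4]

# Two if clauses: both conditions must hold (equivalent to joining them with "and")
multiples_of_six = [n for n in range(20) if n % 2 == 0 if n % 3 == 0]

# Two for clauses: nested loops, with the leftmost for as the outer loop
pairs = [(letter, n) for letter in "ab" for n in range(2)]

print(lengths)           # [5, 4, 7, 4]
print(long_words)        # ['storm', 'tornado']
print(multiples_of_six)  # [0, 6, 12, 18]
print(pairs)             # [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
```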
In [60]:
import pandas as pd
import os

Basic List Comprehension

In [61]:
# basic list comprehension for squaring numbers and creating a list

# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
#         (1)      (2)   (3)
squares = [x**2 for x in range(10)]

print(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
In [62]:
# What happened?
# The list comprehension iterated over the range of numbers from 0 to 9, squaring each number and storing it in a list

Conditional List Comprehension

In [63]:
# Generate a list of the squares of the even numbers from 1 to 20
# (range(1, 21) stops before 21), using a list comprehension with a condition

# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (4) condition

#         (1)    (2)    (3)       (4)
evens = [x**2 for x in range(1,21) if x %2==0]

print(evens)

# print(sqrt{evens})
[4, 16, 36, 64, 100, 144, 196, 256, 324, 400]
In [64]:
# List comprehension that filters results based on membership in predefined list
magic_nums = [1,2,7,8]

mylist_magnum = [x**2 for x in range(10) if x in magic_nums]

print(magic_nums)

print(mylist_magnum)
[1, 2, 7, 8]
[1, 4, 49, 64]
In [65]:
# Create a list of tuples with numbers and their squares

numbers_and_squares = [(x,x**2) for x in range(10)]

print(numbers_and_squares)

numbers_and_squares.append((10,100))

print(numbers_and_squares)

# use the pop() method to remove an item and return it so it can be stored

# remove item at index 0 and store it in a variable
one_item =numbers_and_squares.pop(0)
print(one_item)
print(numbers_and_squares)
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81)]
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100)]
(0, 0)
[(1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100)]
In [66]:
# Create a list of tuples with numbers and their squares

numbers_and_squares = [(x,x**2) for x in range(10)]

print(numbers_and_squares)

numbers_and_squares.append((10,100))

print(numbers_and_squares)

# use the del statement to remove items without storing them
# del can also accept a slice to delete a range of items

# remove item at index 0 without storing it
del numbers_and_squares[0]

print(numbers_and_squares)
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81)]
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100)]
[(1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100)]
In [67]:
# Create a list of tuples with numbers and their squares

numbers_and_squares = [(x,x**2) for x in range(10)]

print(numbers_and_squares)

numbers_and_squares.append((10,100))

print(numbers_and_squares)

# Remove items at indices 0 to 4 (inclusive of 0, exclusive of 5)
del numbers_and_squares[0:5]

print(numbers_and_squares)  # Output: [(5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100)]
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81)]
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100)]
[(5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100)]

Extracting data using list comprehensions

In [68]:
# Extract values that meet a certain criteria

# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (4) condition

test_scores= [50,60,65,98,91,85,100]

#                 (1)   (2)   (3)            (4)
passing_grades = [x for x in test_scores if x >60]

print(passing_grades)
[65, 98, 91, 85, 100]
In [69]:
# Extract a single field from each element
# generalized form for extracting one field from each element of a list based on a condition:
#   list_data = [element['key1'] for element in items if element['key2'] > x]

# List comprehension to extract names from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
    {'name': 'John', 'age': 28},
    {'name': 'Anna', 'age': 20},
    {'name': 'James', 'age': 18},
    {'name': 'Linda', 'age': 30}
]

adults_info = [person['name'] for person in people_data if person['age']>21]

print(adults_info)
['John', 'Linda']
In [70]:
del adults_info[0]
print(adults_info)
['Linda']
In [71]:
# List comprehension to extract tuple of names and age from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
    {'name': 'John', 'age': 28},
    {'name': 'Anna', 'age': 20},
    {'name': 'James', 'age': 18},
    {'name': 'Linda', 'age': 30}
]

adults_info = [(person['name'], person['age']) for person in people_data if person['age']>21]

print(adults_info)
[('John', 28), ('Linda', 30)]
In [72]:
del adults_info[0]
print(adults_info)
[('Linda', 30)]
In [73]:
# List comprehension to extract tuple of names and age from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
    {'name': 'John', 'age': 28},
    {'name': 'Anna', 'age': 20},
    {'name': 'James', 'age': 18},
    {'name': 'Linda', 'age': 30}
]

adults_info = [(person['name'], person['age']) for person in people_data if person['age']>21]

print(adults_info)
[('John', 28), ('Linda', 30)]
In [74]:
first_adult=adults_info.pop(0)

print(first_adult)

print(adults_info)

adults_info.append(first_adult)

print(adults_info)
('John', 28)
[('Linda', 30)]
[('Linda', 30), ('John', 28)]
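
The extraction examples above assume every dictionary has an 'age' key. As a hedged sketch, if some records may lack that key, dict.get() with a default avoids a KeyError inside the condition. The record without an age is a hypothetical addition for illustration.

```python
people_data = [
    {'name': 'John', 'age': 28},
    {'name': 'Anna'},                # hypothetical record with no 'age' key
    {'name': 'Linda', 'age': 30},
]

# person.get('age', 0) returns 0 when the key is absent,
# so the comparison is always valid and the record is simply filtered out
adults = [person['name'] for person in people_data if person.get('age', 0) > 21]

print(adults)  # ['John', 'Linda']
```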

Understanding DataFrame Iteration with iterrows()

1) Introduction to DataFrame Iteration

  • Why Iteration? Iteration over DataFrames is commonly needed when each row of data must be processed individually
  • While vectorized operations are preferred for performance, iteration is useful for complex operations that aren't easily vectorized or when debugging row by row.

2) Using iterrows():

  • Definition: iterrows() is a generator that iterates over the rows of a DataFrame
  • It allows you to loop through each row of the DataFrame, with the row returned as a Series object
  • It yields one (index, Series) tuple for each row in the DataFrame

Syntax:

  • index: Represents the index of the row in the DataFrame
  • row: A Series containing the row data
  • iterrows(): a generator that iterates over the rows
  • row['column_name']: Accesses data in a specific column for that row
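
A small runnable sketch of the pieces listed above, using a two-row DataFrame invented for illustration. It also shows that iterrows() is lazy: next() pulls a single (index, Series) pair.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 22]})

for index, row in df.iterrows():
    # index is the row label; row is a pandas Series keyed by column name
    print(index, type(row).__name__, row['Name'], row['Age'])

# iterrows() returns a generator, so a single pair can be pulled with next()
first_index, first_row = next(df.iterrows())
```

Because each row is handed over as a separate Series, assigning to row inside the loop is not a reliable way to change the DataFrame; the examples in this section write back with df.at[index, column] instead.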

Example:

```
for index, row in df.iterrows():
    row['column_name']
```

Data cleaning with iterrows()

In [75]:
# iterating over rows with iterrows():


# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': ['28',22,'35a']}
sdf= pd.DataFrame(data)

# iterate over the dataframe
# for each row, unpack the index label and the row data (a Series); repeat for all rows
for index,row in sdf.iterrows():
    # strip leading/trailing white space from the name
    sdf.at[index, 'Name'] = row['Name'].strip()
    # Check if the row 'Age' is a string
    if isinstance(row['Age'], str):
        print(f"String data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
        # str.isdigit() is True only when every character is a digit, so strings with letters, signs, or decimal points fail the check
        if row['Age'].isdigit():
            sdf.at[index,'Age'] = int(row['Age']) # all digits, so convert the string to an integer
            print(f"Cleaned data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
        else:
            sdf.at[index, 'Age'] = pd.NA
            print(f"Uncleaned data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
String data index is 0, Name is John, Age is 28
Cleaned data index is 0, Name is John, Age is 28
String data index is 2, Name is James, Age is 35a
Uncleaned data index is 2, Name is James, Age is <NA>
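
For comparison, a vectorized sketch of a similar clean-up. It swaps in pd.to_numeric with errors='coerce' for the isdigit() check, which is not the same rule: to_numeric also accepts strings such as '-3' or '2.5', and the resulting column is float with NaN rather than object with pd.NA. The padded names are an assumption added so that strip() has something to do.

```python
import pandas as pd

sdf = pd.DataFrame({'Name': [' John', 'Anna ', 'James'], 'Age': ['28', 22, '35a']})

# Strip leading/trailing whitespace from every name in one call
sdf['Name'] = sdf['Name'].str.strip()

# Convert to numbers; anything that cannot be parsed becomes NaN
sdf['Age'] = pd.to_numeric(sdf['Age'], errors='coerce')

print(sdf)
```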
In [76]:
import pandas as pd

# Create a sample dataframe
data = {
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
    'UniqueID': [101, 102, 103, 101, 102],
    'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)


# initialize an empty set that will hold each unique ID once
unique_userId= set()

# iterate over the dataframe
for index, row in df.iterrows():

    # Check if the current row's UniqueID is already in the set
    if row['UniqueID'] in unique_userId:
        
        # The first write creates the 'Duplicates' column; mark this row as a duplicate
        df.at[index, 'Duplicates'] = True

    else: 
        # Add the current row unique ID to the set
        unique_userId.add(row['UniqueID'])
        # Mark the row as False for Duplicates
        df.at[index, 'Duplicates']= False

print(df)
    UserName  UniqueID State Duplicates
0    JohnDoe       101    NY      False
1  AnnaSmith       102    CA      False
2  JamesBond       103    TX      False
3    JohnDoe       101    NY       True
4  AnnaSmith       102    CA       True
In [77]:
# remove duplicate

df= df[df['Duplicates']==False]

print(df)
    UserName  UniqueID State Duplicates
0    JohnDoe       101    NY      False
1  AnnaSmith       102    CA      False
2  JamesBond       103    TX      False
In [78]:
df=df.drop(columns='Duplicates')

print(df)
    UserName  UniqueID State
0    JohnDoe       101    NY
1  AnnaSmith       102    CA
2  JamesBond       103    TX
In [79]:
# Flag duplicates with DataFrame.duplicated() instead of a loop

# Create a sample dataframe
data = {
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
    'UniqueID': [101, 102, 103, 101, 102],
    'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)

# the duplicated() method identifies duplicate rows
# the subset parameter specifies the column(s) to check for duplicates
# keep='first' leaves the first occurrence unmarked and marks subsequent duplicates as True

df['Duplicates']= df.duplicated(subset='UniqueID', keep='first')
print(df)
    UserName  UniqueID State  Duplicates
0    JohnDoe       101    NY       False
1  AnnaSmith       102    CA       False
2  JamesBond       103    TX       False
3    JohnDoe       101    NY        True
4  AnnaSmith       102    CA        True
In [80]:
# ~ inverts the boolean mask, so only rows that are not duplicates are kept
df = df[~df['Duplicates']]
print(df)
    UserName  UniqueID State  Duplicates
0    JohnDoe       101    NY       False
1  AnnaSmith       102    CA       False
2  JamesBond       103    TX       False
In [81]:
df=df.drop(columns='Duplicates')

print(df)
    UserName  UniqueID State
0    JohnDoe       101    NY
1  AnnaSmith       102    CA
2  JamesBond       103    TX
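
The flag, filter, and drop steps above can also be collapsed into one call. A sketch using the same sample data and DataFrame.drop_duplicates, which keeps the first occurrence of each UniqueID by default:

```python
import pandas as pd

df = pd.DataFrame({
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
    'UniqueID': [101, 102, 103, 101, 102],
    'State': ['NY', 'CA', 'TX', 'NY', 'CA']
})

# Keep the first row for each UniqueID and drop later repeats
deduped = df.drop_duplicates(subset='UniqueID', keep='first')

print(deduped)
```

The explicit 'Duplicates' column is still useful when the flagged rows need to be inspected before they are removed.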
In [82]:
# Count how many times each UniqueID appears

# Create a sample dataframe
data = {
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
    'UniqueID': [101, 102, 103, 101, 102],
    'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)

# Count duplicates
# Group by the column you want to check for duplicates
# Use transform('size') to get the count of each group
# Assign result to a new column

df['Counts']= df.groupby('UniqueID')['UniqueID'].transform('size')

print(df)
    UserName  UniqueID State  Counts
0    JohnDoe       101    NY       2
1  AnnaSmith       102    CA       2
2  JamesBond       103    TX       1
3    JohnDoe       101    NY       2
4  AnnaSmith       102    CA       2
In [83]:
# Create a sample dataframe
data = {
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
    'UniqueID': [101, 102, 103, 101, 102],
    'State': ['NY', 'CA', 'TX', 'NY', 'CA'],
    'Sales': [200, 150, 300, 250, 100]
}
df = pd.DataFrame(data)

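# Normalize sales within each UniqueID group (z-score: subtract the group mean, divide by the group standard deviation)
# A group with a single row (UniqueID 103) yields NaN because its sample standard deviation is undefined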
df['NormalizedSales'] = df.groupby('UniqueID')['Sales'].transform(lambda x: (x - x.mean()) / x.std())

print(df)
    UserName  UniqueID State  Sales  NormalizedSales
0    JohnDoe       101    NY    200        -0.707107
1  AnnaSmith       102    CA    150         0.707107
2  JamesBond       103    TX    300              NaN
3    JohnDoe       101    NY    250         0.707107
4  AnnaSmith       102    CA    100        -0.707107
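
To see where the values ±0.707107 come from, the arithmetic for UniqueID 101 (sales of 200 and 250) can be reproduced by hand. This sketch assumes the sample standard deviation (dividing by n - 1), which is what pandas' std() uses by default.

```python
import math

sales = [200, 250]                       # the two Sales values for UniqueID 101

mean = sum(sales) / len(sales)           # 225.0
# Sample variance divides by n - 1
sample_var = sum((s - mean) ** 2 for s in sales) / (len(sales) - 1)  # 1250.0
sample_std = math.sqrt(sample_var)

z_scores = [(s - mean) / sample_std for s in sales]
print([round(z, 6) for z in z_scores])   # [-0.707107, 0.707107]
```

With a single row, n - 1 is zero, which is why the JamesBond row shows NaN in the output above.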

Multi-column conditional flagging of rows with iterrows()

In [84]:
# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': [28,22,35]}
sdf= pd.DataFrame(data)

for index, row in sdf.iterrows():
    if row['Age'] < 30 and "J" in row['Name']:
        sdf.at[index, 'Category'] = 'Young J'
    else:
        sdf.at[index, 'Category'] = 'Other'

print(sdf)
    Name  Age Category
0   John   28  Young J
1   Anna   22    Other
2  James   35    Other

Data transformation with iterrows()

In [85]:
for index, row in sdf.iterrows():
    sdf.at[index, 'New Age'] = row['Age']+10 # add 10 years to each person's age

print(sdf)
    Name  Age Category  New Age
0   John   28  Young J     38.0
1   Anna   22    Other     32.0
2  James   35    Other     45.0

Multi-column Conditional Flagging or Computation with iterrows()

In [86]:
for index, row in sdf.iterrows():
    if row['Age'] < 30 and "J" in row['Name']:
        sdf.at[index, 'Category'] = 'Young J'
    else:
        sdf.at[index, 'Category'] = 'Other'

print(sdf)
    Name  Age Category  New Age
0   John   28  Young J     38.0
1   Anna   22    Other     32.0
2  James   35    Other     45.0
In [87]:
for index,row in sdf.iterrows():
    if row['Age']>30 and 'a' in row['Name']:
        sdf.at[index, 'Flag'] = True

    else:
        sdf.at[index, 'Flag']= False
print(sdf)
    Name  Age Category  New Age   Flag
0   John   28  Young J     38.0  False
1   Anna   22    Other     32.0  False
2  James   35    Other     45.0   True

Mark specific rows with iterrows()

In [88]:
for index, row in sdf.iterrows():
    if row['Name'].startswith('J') and row['Age']>25:
        sdf.at[index, 'Status'] = 'Senior J'

print(sdf)
    Name  Age Category  New Age   Flag    Status
0   John   28  Young J     38.0  False  Senior J
1   Anna   22    Other     32.0  False       NaN
2  James   35    Other     45.0   True  Senior J
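
The iterrows() loops above are easy to read, but as noted earlier, vectorized operations are preferred when they fit. A sketch of the 'Category' and 'New Age' steps written with boolean masks and numpy.where (the use of NumPy here is a substitution for illustration, not part of the original cells):

```python
import numpy as np
import pandas as pd

sdf = pd.DataFrame({'Name': ['John', 'Anna', 'James'], 'Age': [28, 22, 35]})

# Combine two column-wise conditions with & (each side is a boolean Series)
is_young_j = (sdf['Age'] < 30) & sdf['Name'].str.contains('J')

# np.where picks 'Young J' where the mask is True and 'Other' elsewhere
sdf['Category'] = np.where(is_young_j, 'Young J', 'Other')

# Arithmetic on a whole column at once; the result stays an integer column
sdf['New Age'] = sdf['Age'] + 10

print(sdf)
```

One visible difference: because the column is computed in a single step, 'New Age' keeps its integer type instead of becoming float as it does when filled row by row.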

Combine List Comprehension and iterrows() to extract a specific list from a dataframe

In [89]:
# Load a sample dataset to demonstrate the application of list comprehensions and iterrows()
# this is a dataset of tornadoes occurring in the state of Minnesota in recent years

df = pd.read_csv(r".\Data\storm_data_search_results.csv")
# Set the option to display all columns
pd.set_option('display.max_columns', None)

df.head()
Out[89]:
EVENT_ID CZ_NAME_STR BEGIN_LOCATION BEGIN_DATE BEGIN_TIME EVENT_TYPE MAGNITUDE TOR_F_SCALE DEATHS_DIRECT INJURIES_DIRECT DAMAGE_PROPERTY_NUM DAMAGE_CROPS_NUM STATE_ABBR CZ_TIMEZONE MAGNITUDE_TYPE EPISODE_ID CZ_TYPE CZ_FIPS WFO INJURIES_INDIRECT DEATHS_INDIRECT SOURCE FLOOD_CAUSE TOR_LENGTH TOR_WIDTH BEGIN_RANGE BEGIN_AZIMUTH END_RANGE END_AZIMUTH END_LOCATION END_DATE END_TIME BEGIN_LAT BEGIN_LON END_LAT END_LON EVENT_NARRATIVE EPISODE_NARRATIVE ABSOLUTE_ROWNUMBER
0 626306 POPE CO. VILLARD 05/25/2016 1410 Tornado EF0 0 0 10000 0 MN CST-6 104565 C 121 MPX 0 0 Law Enforcement 0.16 25 2 SSW 1 SSW VILLARD 05/25/2016 1412 45.6989 -95.2829 45.7000 -95.2800 A few boats were flipped, a shed was damaged a... An Isolated but severe thunderstorm developed ... 1
1 626307 STEARNS CO. ST ANTHONY 05/25/2016 1709 Tornado EF0 0 0 15000 0 MN CST-6 104565 C 145 MPX 0 0 Trained Spotter 3.30 25 1 NE 3 SSE ST FRANCIS 05/25/2016 1715 45.6894 -94.6042 45.7298 -94.5674 A trained spotter video taped a tornado near H... An Isolated but severe thunderstorm developed ... 2
2 629201 RED LAKE CO. OKLEE 05/27/2016 1314 Tornado EF0 0 0 0 0 MN CST-6 104632 C 125 FGF 0 0 Law Enforcement 0.05 50 2 WNW 2 WNW OKLEE 05/27/2016 1315 47.8400 -95.9200 47.8400 -95.9200 Two funnel clouds were noted between Brooks an... Morning sunshine and moisture from recent rain... 3
3 629205 CLAY CO. MOORHEAD ARPT 05/27/2016 1357 Tornado EF0 0 0 0 0 MN CST-6 104632 C 27 FGF 0 0 Storm Chaser 0.05 75 2 SSW 2 SSW MOORHEAD ARPT 05/27/2016 1358 46.8200 -96.7000 46.8200 -96.7000 Evidence from photographs and video indicate t... Morning sunshine and moisture from recent rain... 4
4 629206 CLAY CO. GLYNDON 05/27/2016 1401 Tornado EF0 0 0 0 0 MN CST-6 104632 C 27 FGF 0 0 Broadcast Media 0.05 50 2 NNW 2 NNW GLYNDON 05/27/2016 1402 46.9000 -96.6000 46.9000 -96.6000 A brief touchdown was noted in a photo and rep... Morning sunshine and moisture from recent rain... 5
In [90]:
# List comprehension to extract coordinates of EF0 tornado events
ef0_coordinates = [(row['BEGIN_LAT'], row['BEGIN_LON']) for index, row in df.iterrows() if row['TOR_F_SCALE'] == 'EF0']
print(ef0_coordinates)
[(45.6989, -95.2829), (45.6894, -94.6042), (47.84, -95.92), (46.82, -96.7), (46.9, -96.6), (45.5488, -94.808), (44.1, -96.3105), (43.9812, -96.3452), (43.9589, -94.2033), (44.1973, -93.5417), (44.2106, -93.5253), (44.3134, -93.4661), (44.1282, -93.8622), (45.51, -96.64), (45.56, -96.6), (45.61, -96.53), (48.91, -95.72), (46.5, -94.79), (46.496, -94.7789), (45.2445, -95.9863), (45.2506, -95.8705), (44.3938, -92.9217), (45.3265, -94.4055), (44.9472, -94.026), (43.762, -93.2111), (47.62, -96.58), (47.54, -96.51), (47.8878, -94.7866), (43.8482, -93.2348), (44.1631, -92.1789), (44.1286, -92.2541), (45.4433, -95.8244), (45.79, -95.8), (46.34, -96.54), (46.3, -96.29), (44.0561, -92.3083), (43.9989, -92.1358), (43.998, -92.3098), (44.3147, -94.3519), (45.3784, -93.2268), (45.3072, -93.0197), (45.3229, -92.8595), (45.1811, -92.8581), (46.1529, -94.9219), (45.9763, -95.5803), (45.9815, -95.3762), (44.3831, -95.8188), (44.3921, -95.78), (44.1615, -95.0339), (46.81, -96.58), (47.4506, -94.2105), (47.4341, -94.2526), (47.3999, -94.2862), (44.2279, -94.1762), (44.244, -94.1705), (44.3939, -94.1851), (44.5377, -94.2398), (44.5654, -94.2676), (44.5359, -93.5852), (44.5436, -93.59), (44.694, -94.5097), (44.7246, -93.4707), (44.8517, -94.309), (44.8802, -94.0465), (44.936, -95.7351), (45.4528, -95.009), (43.8444, -93.7624), (43.5129, -92.2036), (44.41, -96.15), (48.02, -94.98), (43.54, -94.67), (45.1407, -94.7912), (45.1396, -94.7571), (44.3406, -93.0498), (44.3424, -93.0405), (44.4935, -92.7473), (43.7077, -94.2478), (44.0432, -94.1665), (44.1155, -94.0785), (44.1186, -93.6027), (44.1816, -93.6305), (44.0797, -93.4774), (44.4013, -93.293), (44.2672, -92.9006), (44.5419, -92.9688), (44.5829, -92.975), (44.537, -92.919), (44.3353, -92.6512), (44.3665, -92.5506), (44.3684, -92.54), (44.6382, -92.6787), (47.18, -96.07), (47.185, -96.065), (43.5837, -93.2778), (43.5482, -92.3902), (43.6029, -92.3531), (43.6835, -92.3285), (43.5005, -92.2577), (44.0028, -96.3268), (44.046, -93.2279), 
(44.2028, -94.8309), (44.1049, -94.6351), (43.76, -93.17), (44.5843, -93.9426), (44.2999, -93.6969), (46.7, -96.74), (46.84, -96.45), (44.5962, -93.6957), (48.48, -95.22), (48.0519, -92.6779), (43.9355, -91.5095), (44.8053, -94.3299), (44.8555, -94.1804), (43.8698, -95.1275), (43.61, -94.41), (46.28, -95.44), (43.9761, -93.3639), (44.5721, -93.1216), (43.9071, -95.7834), (43.7162, -95.7486), (44.1095, -95.7834), (44.1313, -95.7583), (43.86, -95.2617), (45.9851, -93.7478), (46.4339, -93.7791), (44.9632, -93.7922), (45.7518, -93.3108), (44.443, -92.2769), (44.7455, -93.1139), (43.6501, -93.0624), (44.8795, -95.1011), (48.23, -96.72), (44.0015, -91.8063), (44.0318, -91.7238), (48.42, -95.83), (46.24, -95.52), (46.2639, -93.9951), (44.2047, -94.1376), (45.6716, -93.1011), (44.8809, -92.9104), (46.77, -96.25), (44.6018, -94.2095), (46.93, -95.12), (47.42, -96.27), (45.97, -96.19), (45.97, -96.24), (45.95, -96.12), (45.7443, -95.6223), (43.6655, -95.666), (45.8545, -95.2105), (45.8278, -95.0473), (46.2176, -94.6484), (45.2072, -94.986), (46.0678, -94.294), (44.767, -94.3406), (45.1899, -94.8422), (45.7759, -94.4501), (45.2391, -94.7576), (45.2433, -94.7635), (45.9745, -93.8804), (45.0452, -94.5504), (44.8404, -93.7498), (46.54, -93.01), (45.1378, -93.657), (45.0351, -93.368), (44.7089, -96.2051), (44.1849, -93.314), (44.1989, -93.3035), (44.1965, -93.3154), (44.2135, -93.3378), (44.5394, -93.8252), (44.5432, -93.825), (44.4055, -93.3002), (44.6251, -93.2647), (46.29, -95.66), (47.78, -94.86), (47.89, -96.73), (45.4506, -95.003), (45.5726, -94.5994), (44.2441, -94.8627), (44.226, -94.8068), (44.3038, -94.641), (43.8756, -93.5422), (43.8758, -93.5136), (44.4824, -93.952), (44.7507, -93.383), (44.7906, -93.2434), (44.7388, -93.2146), (44.0331, -92.2261), (45.68, -96.46), (45.76, -96.29), (45.3, -96.34), (45.51, -96.5), (45.5, -96.56), (45.8019, -96.293), (43.4997, -93.6704), (43.5323, -93.3052), (43.4997, -93.0559), (43.5142, -93.0493), (43.6113, -93.2505), (43.894, 
-93.0592), (43.7763, -92.37), (43.6699, -92.0805), (43.9605, -91.7908), (43.8998, -91.9822), (43.9088, -91.897), (47.68, -96.47), (44.6497, -92.8613), (43.9483, -95.4623), (44.2215, -94.4585), (44.6386, -93.9182), (44.6748, -93.8888), (45.1844, -93.272), (45.3534, -95.2893), (45.6663, -95.5279), (45.843, -96.2661), (45.7447, -94.9528), (46.3123, -95.5963), (46.1159, -94.6098), (46.9438, -92.0955), (43.5719, -95.9786), (45.1454, -95.9201), (44.8885, -94.0098), (44.9783, -93.9716), (47.229, -93.6597), (48.1, -95.63), (46.86, -96.5), (48.13, -95.66), (48.8626, -96.7817), (43.8605, -91.8798), (43.7967, -91.6015), (43.7162, -93.6339), (44.7152, -93.2642), (44.7636, -93.2156), (44.8248, -93.1569), (44.9018, -93.07), (44.9467, -93.035), (44.9004, -95.261), (44.8988, -95.2485), (44.8185, -95.4746), (43.8942, -95.1545), (47.41, -96.6), (47.4991, -96.7571), (45.3009, -94.9739), (43.818, -95.505), (45.1051, -93.8302)]
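
The same list of coordinate tuples can be produced without iterating row by row. This sketch uses a hypothetical three-row stand-in for the tornado data (same column names, invented second row) so it runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for the tornado dataset, using the same column names
df = pd.DataFrame({
    'TOR_F_SCALE': ['EF0', 'EF1', 'EF0'],
    'BEGIN_LAT': [45.6989, 44.0, 47.84],
    'BEGIN_LON': [-95.2829, -93.0, -95.92],
})

# Filter with a boolean mask, then pair the two columns with zip
ef0 = df[df['TOR_F_SCALE'] == 'EF0']
ef0_coordinates = list(zip(ef0['BEGIN_LAT'].tolist(), ef0['BEGIN_LON'].tolist()))

print(ef0_coordinates)  # [(45.6989, -95.2829), (47.84, -95.92)]
```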
In [91]:
mn_counties= df["CZ_NAME_STR"].unique()
In [92]:
mn_counties
Out[92]:
array(['POPE CO.', 'STEARNS CO.', 'RED LAKE CO.', 'CLAY CO.',
       'PIPESTONE CO.', 'BLUE EARTH CO.', 'LE SUEUR CO.', 'RICE CO.',
       'BIG STONE CO.', 'TRAVERSE CO.', 'LAKE OF THE WOODS CO.',
       'ROSEAU CO.', 'WADENA CO.', 'CASS CO.', 'AITKIN CO.', 'ITASCA CO.',
       'ST. LOUIS CO.', 'CROW WING CO.', 'SWIFT CO.', 'GOODHUE CO.',
       'WABASHA CO.', 'MEEKER CO.', 'POLK CO.', 'BELTRAMI CO.',
       'MCLEOD CO.', 'FREEBORN CO.', 'NORMAN CO.', 'MORRISON CO.',
       'FARIBAULT CO.', 'SHERBURNE CO.', 'STEELE CO.', 'HUBBARD CO.',
       'KANDIYOHI CO.', 'STEVENS CO.', 'GRANT CO.', 'WILKIN CO.',
       'OLMSTED CO.', 'LAKE CO.', 'NICOLLET CO.', 'ANOKA CO.',
       'CHISAGO CO.', 'WASHINGTON CO.', 'TODD CO.', 'DOUGLAS CO.',
       'LYON CO.', 'BROWN CO.', 'FILLMORE CO.', 'SIBLEY CO.', 'SCOTT CO.',
       'NOBLES CO.', 'CHIPPEWA CO.', 'LINCOLN CO.', 'MARTIN CO.',
       'CLEARWATER CO.', 'WINONA CO.', 'WASECA CO.', 'DAKOTA CO.',
       'MAHNOMEN CO.', 'COOK CO.', 'REDWOOD CO.', 'WATONWAN CO.',
       'OTTER TAIL CO.', 'COTTONWOOD CO.', 'MURRAY CO.', 'MILLE LACS CO.',
       'WRIGHT CO.', 'CARVER CO.', 'HENNEPIN CO.', 'KANABEC CO.',
       'MARSHALL CO.', 'RENVILLE CO.', 'BECKER CO.', 'ISANTI CO.',
       'PENNINGTON CO.', 'CARLTON CO.', 'YELLOW MEDICINE CO.',
       'MOWER CO.', 'DODGE CO.', 'HOUSTON CO.', 'LAC QUI PARLE CO.',
       'ROCK CO.', 'PINE CO.', 'KITTSON CO.', 'RAMSEY CO.'], dtype=object)
In [93]:
len(mn_counties)
Out[93]:
84

Introduction to Dictionary Comprehensions

  • Definition
    • Dictionary comprehensions are a concise and efficient way to create dictionaries in Python
    • Like list comprehensions, they provide an elegant way to perform operations and apply conditions to iterables
    • Specifically, they allow for the creation or transformation of dictionary key-value pairs in a single line of code

Why Do We Use Dictionary Comprehensions?

  • Conciseness: Reduces the complexity and amount of code compared to traditional loops for creating dictionaries, making the code more readable
  • Performance: Often somewhat faster than adding items to a dictionary one at a time in a loop, although the gain is usually modest
  • Expressiveness: Enhances code clarity by focusing on the dictionary creation logic rather than the mechanics of looping and inserting key-value pairs
  • Versatility: Capable of incorporating conditional logic and multiple sources, allowing for sophisticated transformations and filtering in dictionary creation
  • Key point: Use dictionary comprehensions to efficiently transform or map data into key-value pairs

Syntax:

  • Basic Structure: A dictionary comprehension consists of curly braces {} containing a key-value pair expression followed by a for clause
  • Optionally, it can include one or more for or if clauses

Generalized Examples:

  • {key_expr: value_expr for item in iterable}
  • {key_expr: value_expr for item in iterable if condition}
  • {key_expr: value_expr for item in iterable if condition1 if condition2}
  • {key_expr: value_expr for item in iterable1 for item2 in iterable2}
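
A quick sketch of the four generalized forms above on small inputs. The city names and temperatures are invented sample data.

```python
temps_f = [("Duluth", 41), ("Rochester", 50), ("Moorhead", 32)]

# {key_expr: value_expr for item in iterable}
by_city = {city: temp for city, temp in temps_f}

# {key_expr: value_expr for item in iterable if condition}
above_freezing = {city: temp for city, temp in temps_f if temp > 32}

# Two if clauses: both conditions must hold
multiples_of_six = {n: n ** 2 for n in range(20) if n % 2 == 0 if n % 3 == 0}

# Two for clauses: nested loops, here building keys from (row, column) pairs
grid = {(r, c): r * c for r in range(2) for c in range(3)}

print(by_city)           # {'Duluth': 41, 'Rochester': 50, 'Moorhead': 32}
print(above_freezing)    # {'Duluth': 41, 'Rochester': 50}
print(multiples_of_six)  # {0: 0, 6: 36, 12: 144, 18: 324}
print(grid)
```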
In [94]:
# quick recap: how to create and update a dictionary
from datetime import datetime

from pprint import pprint # pretty print, readable format for dictionaries

# create an empty dictionary for products and their metadata
products_dict = {}

# add to the dictionary
# This is a nested dictionary, where the key is product ID and the value is another dictionary
products_dict['0001-2024']= {
    'name':'apple', 
    'amount':2, 
    'date': datetime.now().date().strftime('%Y%m%d')}

# display the dictionary
pprint(products_dict)

products_dict['0002-2024'] = {'name': 'banana'}

# display each dictionary entry along with labels
for productID, metadata in products_dict.items():
    pprint(f"Product ID: {productID}, Metadata: {metadata}")

# print the number of items
print(f"Number of items: {len(products_dict)}")

# update an existing entry 
# allows you to add new key value pairs or update existing ones
products_dict['0002-2024'].update({'amount':10, 'date': datetime.now().strftime('%Y%m%d')})

# Display the items
pprint(products_dict.items())
pprint(list(products_dict.items()))


products_dict.update({'0003-2024':{'name': 'mangos','amount':100, 'date': datetime.now().strftime('%Y%m%d')}})

# Display the number of items
print(f"Number of items: {len(products_dict)}")

# display the dictionary
pprint(products_dict)

print()

counter=0
# List out the final nested dictionary of products
print("Final product list:")
for productID, metadata in products_dict.items():
    
    pprint(f"PRoduct ID {counter+1} : {productID}, Metadata: {metadata}")
    counter+=1
{'0001-2024': {'amount': 2, 'date': '20240818', 'name': 'apple'}}
("Product ID: 0001-2024, Metadata: {'name': 'apple', 'amount': 2, 'date': "
 "'20240818'}")
"Product ID: 0002-2024, Metadata: {'name': 'banana'}"
Number of items: 2
dict_items([('0001-2024', {'name': 'apple', 'amount': 2, 'date': '20240818'}), ('0002-2024', {'name': 'banana', 'amount': 10, 'date': '20240818'})])
[('0001-2024', {'amount': 2, 'date': '20240818', 'name': 'apple'}),
 ('0002-2024', {'amount': 10, 'date': '20240818', 'name': 'banana'})]
Number of items: 3
{'0001-2024': {'amount': 2, 'date': '20240818', 'name': 'apple'},
 '0002-2024': {'amount': 10, 'date': '20240818', 'name': 'banana'},
 '0003-2024': {'amount': 100, 'date': '20240818', 'name': 'mangos'}}

Final product list:
("PRoduct ID 1 : 0001-2024, Metadata: {'name': 'apple', 'amount': 2, 'date': "
 "'20240818'}")
("PRoduct ID 2 : 0002-2024, Metadata: {'name': 'banana', 'amount': 10, 'date': "
 "'20240818'}")
("PRoduct ID 3 : 0003-2024, Metadata: {'name': 'mangos', 'amount': 100, "
 "'date': '20240818'}")

Basic Dictionary Comprehension

In [95]:
# Make a dictionary where the keys are numbers and values are their squares

# Basic components of dictionary comprehension
# (1) key-value expression
# (2) item
# (3) iterable

            #(1)     (2)     (3)
squares = {x:x**2 for x in range(1,10)}
print(squares)
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

Conditional Dictionary Comprehension

In [96]:
# Make a dictionary of even numbers and their squares
even_squares = {x:x**2 for x in range(1,10) if x%2==0}

print(even_squares)
{2: 4, 4: 16, 6: 36, 8: 64}
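
The same filtering pattern works on a nested dictionary like the products_dict from the recap cell. This sketch rebuilds a trimmed copy of that dictionary (without the date field) so it is self-contained; the threshold of 10 is an arbitrary choice for illustration.

```python
products_dict = {
    '0001-2024': {'name': 'apple', 'amount': 2},
    '0002-2024': {'name': 'banana', 'amount': 10},
    '0003-2024': {'name': 'mangos', 'amount': 100},
}

# Flatten the nested structure: product name -> amount
amount_by_name = {meta['name']: meta['amount'] for meta in products_dict.values()}

# Keep only the entries whose amount is at least 10
well_stocked = {pid: meta for pid, meta in products_dict.items() if meta['amount'] >= 10}

print(amount_by_name)        # {'apple': 2, 'banana': 10, 'mangos': 100}
print(list(well_stocked))    # ['0002-2024', '0003-2024']
```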

Using Functions in Dictionary Comprehension

In [97]:
# Create a dictionary that maps each word in a list to its length

# suppose you start with a list of words
word_list = ["apple", "banana", "cherry"]


word_length_dict=  {word:len(word) for word in word_list}

print(word_length_dict)
{'apple': 5, 'banana': 6, 'cherry': 6}

Using a dataframe in a Dictionary Comprehension

In [28]:
import pandas as pd
# Create a dictionary that maps each word in a column of a dataframe to its length

# You start with a dataframe
data = {'Words':(["apple", "banana", "cherry"])*2} # Duplicate the list to increase the number of items
print(data)


df = pd.DataFrame(data)
print(df)

# use a dictionary comprehension directly on the dataframe column to map each word in the column to its length
word_length_dict=  {word:len(word) for word in df['Words']}

print(word_length_dict)

# add a count to the same dataframe as a new column
df["Word Counts"]=df.groupby("Words")["Words"].transform("count")

print(f"\n{df}\n")

# Map the lengths from the dictionary onto the matching values in the Words column
df["Word Lenghths"] = df["Words"].map(word_length_dict)
print(f"\n{df}\n")
{'Words': ['apple', 'banana', 'cherry', 'apple', 'banana', 'cherry']}
    Words
0   apple
1  banana
2  cherry
3   apple
4  banana
5  cherry
{'apple': 5, 'banana': 6, 'cherry': 6}

    Words  Word Counts
0   apple            2
1  banana            2
2  cherry            2
3   apple            2
4  banana            2
5  cherry            2


    Words  Word Counts  Word Lenghths
0   apple            2              5
1  banana            2              6
2  cherry            2              6
3   apple            2              5
4  banana            2              6
5  cherry            2              6
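
A related pattern is building a lookup dictionary from two DataFrame columns. This sketch pairs the columns with zip inside a dictionary comprehension; the small DataFrame reuses the user names from the duplicates example.

```python
import pandas as pd

df = pd.DataFrame({
    'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond'],
    'State': ['NY', 'CA', 'TX']
})

# zip walks both columns in parallel, giving one (user, state) pair per row
state_by_user = {user: state for user, state in zip(df['UserName'], df['State'])}

print(state_by_user)  # {'JohnDoe': 'NY', 'AnnaSmith': 'CA', 'JamesBond': 'TX'}
```

If a key appears more than once, the last value wins, so this works best on a column that has already been de-duplicated.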

In [99]:
# Read HTML tables using the lxml parser
# read_html returns a list of DataFrames, one for each table found on the page
counties_list = pd.read_html(
    "https://en.wikipedia.org/wiki/List_of_counties_in_Minnesota"
)
In [100]:
counties_list=counties_list[0]
In [101]:
counties_list
Out[101]:
     County                  FIPS code[3]  County seat[4]  Est.[1][4]  Origin[5][6][7]                                    Etymology                                          Population[8]  Area[4][8]                  Map
0    Aitkin County           1             Aitkin          1857        Pine County, Ramsey County                         William Alexander Aitken (1785–1851), early fu...  16102          1,819.30 sq mi (4,712 km2)  NaN
1    Anoka County            3             Anoka           1857        Ramsey County                                      Dakota word meaning "both sides"                   372441         423.61 sq mi (1,097 km2)    NaN
2    Becker County           5             Detroit Lakes   1858        Cass County, Pembina County                        George Loomis Becker, former state senator and...  35283          1,310.42 sq mi (3,394 km2)  NaN
3    Beltrami County         7             Bemidji         1866        Unorganized Territory, Itasca County, Pembina ...  Giacomo Beltrami, Italian explorer who explore...  46718          2,505.27 sq mi (6,489 km2)  NaN
4    Benton County           9             Foley           1849        One of nine original counties; formed from res...  Thomas Hart Benton (1782–1858), former United ...  41600          408.28 sq mi (1,057 km2)    NaN
...  ...                     ...           ...             ...         ...                                                ...                                                ...            ...                         ...
82   Watonwan County         165           St. James       1860        Brown County                                       Watonwan River, a river that flows through Min...  11077          434.51 sq mi (1,125 km2)    NaN
83   Wilkin County           167           Breckenridge    1858        Cass County, Pembina County                        Alexander Wilkin (1820–1864), Minnesota politi...  6306           751.43 sq mi (1,946 km2)    NaN
84   Winona County           169           Winona          1854        Fillmore County, Wabasha County                    Named after Wee-No-Nah, Sister, or Cousin of C...  49721          626.30 sq mi (1,622 km2)    NaN
85   Wright County           171           Buffalo         1855        Cass County, Sibley County                         Silas Wright (1795–1847), former United States...  151150         660.75 sq mi (1,711 km2)    NaN
86   Yellow Medicine County  173           Granite Falls   1871        Redwood County                                     Yellow Medicine River, a river that flows thro...  9467           757.96 sq mi (1,963 km2)    NaN

87 rows × 9 columns

Introduction to Geocoding with Nominatim via Geopy

  • Geocoding is the process of converting addresses (like "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates (like latitude 37.423021 and longitude -122.083739)
  • These coordinates can then be used to place markers on a map or to position the map

Capabilities of Nominatim (Geopy):

  • Address Geocoding: Converts street addresses or other descriptive locations into geographic coordinates.
  • Reverse Geocoding: Converts geographic coordinates into a human-readable address.
  • Extensive Coverage: Utilizes OpenStreetMap data, providing global coverage often with fine-grained control over geocoding queries.
  • Customization Options: Allows customization of requests, including specifying the language of the result, the bounding box for constraining searches, and more.

Syntax for Geocoding and Reverse Geocoding

1) Geocoding (Address to Coordinates)

- Initialization: Create a Nominatim object with a user-defined user_agent
- Query: Use the .geocode() method with the address as a string.

2) Reverse Geocoding (Coordinates to Address)

- Initialization: Create a Nominatim object with a user-defined user_agent
- Query: Use the .reverse() method with a (latitude, longitude) tuple or a string in the format "latitude, longitude".
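Both queries follow the same pattern: call the geocoder, then handle the case where nothing is found, since .geocode() and .reverse() return None when there is no match. The sketch below shows only that pattern and runs offline. FakeLocation, fake_geocode, and lookup_coords are hypothetical names introduced for illustration, the coordinates are made-up values, and fake_geocode merely stands in for geolocator.geocode.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class FakeLocation(NamedTuple):
    """Hypothetical stand-in exposing the two attributes used in this notebook."""
    latitude: float
    longitude: float


def fake_geocode(address: str) -> Optional[FakeLocation]:
    """Hypothetical offline lookup; a real run would pass geolocator.geocode instead."""
    known = {"Savage, MN": FakeLocation(44.7, -93.3)}  # illustrative values only
    return known.get(address)


def lookup_coords(geocode: Callable, address: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (latitude, longitude), or (None, None) when the lookup finds nothing."""
    location = geocode(address)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


print(lookup_coords(fake_geocode, "Savage, MN"))   # (44.7, -93.3)
print(lookup_coords(fake_geocode, "Nowhere, ZZ"))  # (None, None)
```

Passing the lookup function in as an argument keeps the not-found handling testable without contacting a geocoding service.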
In [45]:
from geopy.geocoders import Nominatim
import requests

geolocator = Nominatim(user_agent="geocode_Address")

def getAddress_coords(address):
    location = geolocator.geocode(address)
    if location:
        latitude, longitude = location.latitude, location.longitude
        print(location)
        
        # Get elevation in meters
        elevation_url = f"https://api.open-elevation.com/api/v1/lookup?locations={latitude},{longitude}"
        # A timeout keeps the call from hanging indefinitely if the service is slow
        response = requests.get(elevation_url, timeout=10)
        elevation_data = response.json()
        print(elevation_data)
        elevation_meters = elevation_data['results'][0]['elevation'] if 'results' in elevation_data else None
        print(elevation_meters)
        #convert elevation to feet
        elevation_feet = elevation_meters * 3.28084 if elevation_meters is not None else None
        print(elevation_feet)
        return (latitude, longitude, elevation_feet)
    else:
        print("Address Not Found, coordinates will be blank")
        return (None,None, None)
In [46]:
geocode_result = getAddress_coords("5057 Edgewater Court, Savage, MN")
print(geocode_result)
5057, Edgewater Court, Savage, Scott County, Minnesota, 55378, United States
{'results': [{'latitude': 44.730098, 'longitude': -93.343248, 'elevation': 271.0}]}
271.0
889.10764
(44.73009822071884, -93.3432476572783, 889.10764)
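The elevation lookup above indexes straight into the JSON response, which would fail if the results list were empty. A minimal sketch of a more defensive parse follows, assuming the response has the shape printed above. The helper name elevation_feet_from_payload is hypothetical, and the sample payload is copied from the output above.

```python
METERS_TO_FEET = 3.28084  # same conversion factor used in the cell above


def elevation_feet_from_payload(payload):
    """Return the first result's elevation in feet, or None if it is missing."""
    results = payload.get("results") or []
    if not results:
        return None
    meters = results[0].get("elevation")
    return None if meters is None else meters * METERS_TO_FEET


sample = {"results": [{"latitude": 44.730098, "longitude": -93.343248, "elevation": 271.0}]}
feet = elevation_feet_from_payload(sample)
print(feet)

print(elevation_feet_from_payload({"results": []}))  # None
print(elevation_feet_from_payload({}))               # None
```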
In [41]:
from geopy.geocoders import Nominatim

#initialize geocoder, Nominatim object

geolocator= Nominatim(user_agent= "geocode_Address")

def getAddress_coords(address):
    location= geolocator.geocode(address)
    if location:
        print(location)
        return (location.latitude, location.longitude)
    else:
        print("Address Not Found, coordinates will be blank")
        return(None,None)

geocode_result = getAddress_coords("5057 Edgewater Court, Savage, MN")

print(geocode_result)
5057, Edgewater Court, Savage, Scott County, Minnesota, 55378, United States
(44.73009822071884, -93.3432476572783)
In [42]:
geocode_result = getAddress_coords("Murphy-Hanrehan Park Reserve, Savage, MN")

print(geocode_result)
Murphy-Hanrehan Park Reserve, 15501, Savage, Scott County, Minnesota, 55378, United States
(44.71070400000001, -93.3343809887289)
In [104]:
geolocator= Nominatim(user_agent= "geocode_Address")

def getAddress(coords):
    location= geolocator.reverse(coords)
    if location:
        return (location.address)
    else:
        print("Address not found for these coordinates")
        return None

geocode_Address_result = getAddress(geocode_result)

print(geocode_Address_result)
Sunset Lake Road, Credit River, Scott County, Minnesota, 55306, United States
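Reverse geocoding only makes sense for a valid coordinate pair, and the forward lookup above can return (None, None). One possible guard, sketched here under that assumption, checks the pair and builds the "latitude, longitude" string before any query is made. The function name format_reverse_query is hypothetical.

```python
def format_reverse_query(coords):
    """Return 'latitude, longitude' for a valid pair, otherwise None."""
    if coords is None or len(coords) != 2:
        return None
    lat, lon = coords
    if lat is None or lon is None:
        return None
    # Latitude must lie in [-90, 90] and longitude in [-180, 180]
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return f"{lat}, {lon}"


print(format_reverse_query((44.710704, -93.334381)))  # 44.710704, -93.334381
print(format_reverse_query((None, None)))             # None
print(format_reverse_query((120.0, 10.0)))            # None (latitude out of range)
```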
In [105]:
# Combine retrieval of external data from geocoding service with a dictionary comprehension of a dataframe column
# Create a dictionary that maps counties to their coordinates

from geopy.geocoders import Nominatim

# initialize geocoder

geolocator= Nominatim(user_agent= "geoapiExercise")

def get_lat_lon(county):
    # Append ", Minnesota" to ensure the geocoding query is localized
    location= geolocator.geocode(county+ ", Minnesota")
    if location:
        return (location.latitude, location.longitude)
    else:
        return (None, None)
        
# Select the column of county names (a pandas Series)
county_names = counties_list["County"]

# Print the county names and the column's dtype
print(county_names)
print(county_names.dtype)

# Dictionary comprehension that maps each county name in the DataFrame column to its (lat, lon) coordinates
# get_lat_lon() is called once per county: the county name becomes the key and the returned tuple the value
coordinates_list = {county: get_lat_lon(county) for county in  counties_list["County"]}
0              Aitkin County
1               Anoka County
2              Becker County
3            Beltrami County
4              Benton County
               ...          
82           Watonwan County
83             Wilkin County
84             Winona County
85             Wright County
86    Yellow Medicine County
Name: County, Length: 87, dtype: object
object
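The dictionary comprehension above sends one request per county, 87 in total, and public geocoding services typically ask clients to limit their request rate. A rough sketch of a throttled version follows. The names build_coordinate_map and fake_lookup are hypothetical, the one-second default delay is an assumption rather than a documented requirement, and fake_lookup stands in for get_lat_lon so the example runs offline.

```python
import time


def build_coordinate_map(names, lookup, suffix=", Minnesota", delay_seconds=1.0):
    """Map each name to lookup(name + suffix), pausing after every call."""
    def throttled(name):
        result = lookup(name + suffix)
        time.sleep(delay_seconds)  # crude pause between requests
        return result

    return {name: throttled(name) for name in names}


def fake_lookup(query):
    """Hypothetical offline lookup with one illustrative entry."""
    table = {"Scott County, Minnesota": (44.65, -93.50)}
    return table.get(query, (None, None))


coords = build_coordinate_map(
    ["Scott County", "Atlantis County"], fake_lookup, delay_seconds=0
)
print(coords)
# {'Scott County': (44.65, -93.5), 'Atlantis County': (None, None)}
```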
In [106]:
print(f"The dataset is an {type(county_names)}")
print(f"The data value in the dataset is an {county_names.dtype}")
print(county_names.index.to_list())
The dataset is an <class 'pandas.core.series.Series'>
The data value in the dataset is an object
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]
In [107]:
coordinates_list
Out[107]:
{'Aitkin County': (46.5714822, -93.3847595),
 'Anoka County': (45.2710195, -93.2827625),
 'Becker County': (46.9298236, -95.6761851),
 'Beltrami County': (47.9978537, -94.8799011),
 'Benton County': (45.7162129, -94.0481042),
 'Big Stone County': (45.385266, -96.3557364),
 'Blue Earth County': (44.0109722, -94.0560643),
 'Brown County': (44.2350232, -94.6955051),
 'Carlton County': (46.5799933, -92.7206334),
 'Carver County': (44.807118, -93.7871792),
 'Cass County': (47.0234117, -94.3454604),
 'Chippewa County': (45.027661, -95.5314914),
 'Chisago County': (45.4758877, -92.8849411),
 'Clay County': (46.8994904, -96.5088202),
 'Clearwater County': (47.5643825, -95.3747844),
 'Cook County': (47.9149076, -90.47301),
 'Cottonwood County': (44.019068, -95.1658845),
 'Crow Wing County': (46.4665237, -94.1017044),
 'Dakota County': (44.666655, -93.044911),
 'Dodge County': (44.0175404, -92.8678406),
 'Douglas County': (45.9340479, -95.4627651),
 'Faribault County': (43.6647961, -93.9510501),
 'Fillmore County': (43.6466588, -92.0636359),
 'Freeborn County': (43.6763617, -93.3501681),
 'Goodhue County': (44.396973, -92.7175627),
 'Grant County': (45.9358795, -96.0272071),
 'Hennepin County': (45.0257232, -93.4865052),
 'Houston County': (43.6624222, -91.4685617),
 'Hubbard County': (47.1138266, -94.9427679),
 'Isanti County': (45.56932235, -93.32652095523574),
 'Itasca County': (47.4968343, -93.6225663),
 'Jackson County': (43.670011, -95.1500626),
 'Kanabec County': (45.8986948, -93.2850016),
 'Kandiyohi County': (45.142373, -95.0025846),
 'Kittson County': (48.7709208, -96.8074141),
 'Koochiching County': (48.221596, -93.7684251),
 'Lac qui Parle County': (44.986426, -96.2024907),
 'Lake County': (47.6348022, -91.4394994),
 'Lake of the Woods County': (48.7032282, -94.8480091),
 'Le Sueur County': (44.3771652, -93.711443),
 'Lincoln County': (44.4020631, -96.2627763),
 'Lyon County': (44.3880733, -95.8287296),
 'McLeod County': (44.8169135, -94.2495251),
 'Mahnomen County': (47.3313602, -95.8142911),
 'Marshall County': (48.3605336, -96.381968),
 'Martin County': (43.6564337, -94.5498419),
 'Meeker County': (45.1183643, -94.5175345),
 'Mille Lacs County': (45.9311972, -93.640356),
 'Morrison County': (45.9926837, -94.2554658),
 'Mower County': (43.6832277, -92.753704),
 'Murray County': (44.017855, -95.7615205),
 'Nicollet County': (44.3380412, -94.2362169),
 'Nobles County': (43.6634212, -95.7527672),
 'Norman County': (47.3194344, -96.4625779),
 'Olmsted County': (43.9997437, -92.3767816),
 'Otter Tail County': (46.4184196, -95.713142),
 'Pennington County': (48.0513335, -96.0829271),
 'Pine County': (46.0820957, -92.7542126),
 'Pipestone County': (44.0270012, -96.2566582),
 'Polk County': (47.6554613, -96.4193484),
 'Pope County': (45.5850258, -95.4469471),
 'Ramsey County': (45.0165728, -93.0949501),
 'Red Lake County': (47.8605178, -96.0988343),
 'Redwood County': (44.3788613, -95.2532373),
 'Renville County': (44.7242874, -94.9084771),
 'Rice County': (44.3413376, -93.2865484),
 'Rock County': (43.6733632, -96.2574328),
 'Roseau County': (48.7710371, -95.7697882),
 'Saint Louis County': (47.6201005, -92.4363343),
 'Scott County': (44.6506998, -93.5025726),
 'Sherburne County': (45.4427088, -93.7459202),
 'Sibley County': (44.5603522, -94.2085682),
 'Stearns County': (45.535326, -94.6139422),
 'Steele County': (44.0137336, -93.2203671),
 'Stevens County': (45.5837016, -95.9946194),
 'Swift County': (45.2797223, -95.6898654),
 'Todd County': (46.0588428, -94.887283),
 'Traverse County': (45.7836323, -96.4215265),
 'Wabasha County': (44.2767596, -92.2018164),
 'Wadena County': (46.5850936, -94.9606684),
 'Waseca County': (44.0172242, -93.5885717),
 'Washington County': (45.0078657, -92.874565),
 'Watonwan County': (43.9736055, -94.6370354),
 'Wilkin County': (46.3258354, -96.4586194),
 'Winona County': (43.9582272, -91.7807784),
 'Wright County': (45.1489061, -93.9639196),
 'Yellow Medicine County': (44.7198536, -95.8533555)}
In [108]:
type(coordinates_list)
Out[108]:
dict
In [109]:
coordinates_list.items() # returns a view of the dictionary's (key, value) pairs, which can be iterated as tuples
Out[109]:
dict_items([('Aitkin County', (46.5714822, -93.3847595)), ('Anoka County', (45.2710195, -93.2827625)), ('Becker County', (46.9298236, -95.6761851)), ('Beltrami County', (47.9978537, -94.8799011)), ('Benton County', (45.7162129, -94.0481042)), ('Big Stone County', (45.385266, -96.3557364)), ('Blue Earth County', (44.0109722, -94.0560643)), ('Brown County', (44.2350232, -94.6955051)), ('Carlton County', (46.5799933, -92.7206334)), ('Carver County', (44.807118, -93.7871792)), ('Cass County', (47.0234117, -94.3454604)), ('Chippewa County', (45.027661, -95.5314914)), ('Chisago County', (45.4758877, -92.8849411)), ('Clay County', (46.8994904, -96.5088202)), ('Clearwater County', (47.5643825, -95.3747844)), ('Cook County', (47.9149076, -90.47301)), ('Cottonwood County', (44.019068, -95.1658845)), ('Crow Wing County', (46.4665237, -94.1017044)), ('Dakota County', (44.666655, -93.044911)), ('Dodge County', (44.0175404, -92.8678406)), ('Douglas County', (45.9340479, -95.4627651)), ('Faribault County', (43.6647961, -93.9510501)), ('Fillmore County', (43.6466588, -92.0636359)), ('Freeborn County', (43.6763617, -93.3501681)), ('Goodhue County', (44.396973, -92.7175627)), ('Grant County', (45.9358795, -96.0272071)), ('Hennepin County', (45.0257232, -93.4865052)), ('Houston County', (43.6624222, -91.4685617)), ('Hubbard County', (47.1138266, -94.9427679)), ('Isanti County', (45.56932235, -93.32652095523574)), ('Itasca County', (47.4968343, -93.6225663)), ('Jackson County', (43.670011, -95.1500626)), ('Kanabec County', (45.8986948, -93.2850016)), ('Kandiyohi County', (45.142373, -95.0025846)), ('Kittson County', (48.7709208, -96.8074141)), ('Koochiching County', (48.221596, -93.7684251)), ('Lac qui Parle County', (44.986426, -96.2024907)), ('Lake County', (47.6348022, -91.4394994)), ('Lake of the Woods County', (48.7032282, -94.8480091)), ('Le Sueur County', (44.3771652, -93.711443)), ('Lincoln County', (44.4020631, -96.2627763)), ('Lyon County', (44.3880733, -95.8287296)), 
('McLeod County', (44.8169135, -94.2495251)), ('Mahnomen County', (47.3313602, -95.8142911)), ('Marshall County', (48.3605336, -96.381968)), ('Martin County', (43.6564337, -94.5498419)), ('Meeker County', (45.1183643, -94.5175345)), ('Mille Lacs County', (45.9311972, -93.640356)), ('Morrison County', (45.9926837, -94.2554658)), ('Mower County', (43.6832277, -92.753704)), ('Murray County', (44.017855, -95.7615205)), ('Nicollet County', (44.3380412, -94.2362169)), ('Nobles County', (43.6634212, -95.7527672)), ('Norman County', (47.3194344, -96.4625779)), ('Olmsted County', (43.9997437, -92.3767816)), ('Otter Tail County', (46.4184196, -95.713142)), ('Pennington County', (48.0513335, -96.0829271)), ('Pine County', (46.0820957, -92.7542126)), ('Pipestone County', (44.0270012, -96.2566582)), ('Polk County', (47.6554613, -96.4193484)), ('Pope County', (45.5850258, -95.4469471)), ('Ramsey County', (45.0165728, -93.0949501)), ('Red Lake County', (47.8605178, -96.0988343)), ('Redwood County', (44.3788613, -95.2532373)), ('Renville County', (44.7242874, -94.9084771)), ('Rice County', (44.3413376, -93.2865484)), ('Rock County', (43.6733632, -96.2574328)), ('Roseau County', (48.7710371, -95.7697882)), ('Saint Louis County', (47.6201005, -92.4363343)), ('Scott County', (44.6506998, -93.5025726)), ('Sherburne County', (45.4427088, -93.7459202)), ('Sibley County', (44.5603522, -94.2085682)), ('Stearns County', (45.535326, -94.6139422)), ('Steele County', (44.0137336, -93.2203671)), ('Stevens County', (45.5837016, -95.9946194)), ('Swift County', (45.2797223, -95.6898654)), ('Todd County', (46.0588428, -94.887283)), ('Traverse County', (45.7836323, -96.4215265)), ('Wabasha County', (44.2767596, -92.2018164)), ('Wadena County', (46.5850936, -94.9606684)), ('Waseca County', (44.0172242, -93.5885717)), ('Washington County', (45.0078657, -92.874565)), ('Watonwan County', (43.9736055, -94.6370354)), ('Wilkin County', (46.3258354, -96.4586194)), ('Winona County', (43.9582272, 
-91.7807784)), ('Wright County', (45.1489061, -93.9639196)), ('Yellow Medicine County', (44.7198536, -95.8533555))])
In [124]:
# The items() method of a dictionary returns an iterable view of tuples
# Each tuple consists of one key-value pair from the dictionary
type(coordinates_list.items()) # this dict_items object is an iterable

# Because it is an iterable, we can loop over it to access its elements
# ...or convert it to another collection, such as a list, which is often required for further data processing
Out[124]:
dict_items
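A dict_items object is a view rather than a copy: it reflects later changes to the dictionary, while a list built from it is a fixed snapshot. The small example below illustrates the difference with made-up, rounded coordinates.

```python
coords = {"Scott County": (44.65, -93.50), "Rice County": (44.34, -93.29)}

items_view = coords.items()
print(type(items_view).__name__)  # dict_items

pairs = list(items_view)          # snapshot: a list of (key, value) tuples
coords["Lake County"] = (47.63, -91.44)

print(len(items_view))  # 3, the view tracks the dictionary
print(len(pairs))       # 2, the list is a fixed copy
```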
In [130]:
for county, data in coordinates_list.items():
        print(county, data)  # prints each county and its associated data
Aitkin County (46.5714822, -93.3847595)
Anoka County (45.2710195, -93.2827625)
Becker County (46.9298236, -95.6761851)
Beltrami County (47.9978537, -94.8799011)
Benton County (45.7162129, -94.0481042)
Big Stone County (45.385266, -96.3557364)
Blue Earth County (44.0109722, -94.0560643)
Brown County (44.2350232, -94.6955051)
Carlton County (46.5799933, -92.7206334)
Carver County (44.807118, -93.7871792)
Cass County (47.0234117, -94.3454604)
Chippewa County (45.027661, -95.5314914)
Chisago County (45.4758877, -92.8849411)
Clay County (46.8994904, -96.5088202)
Clearwater County (47.5643825, -95.3747844)
Cook County (47.9149076, -90.47301)
Cottonwood County (44.019068, -95.1658845)
Crow Wing County (46.4665237, -94.1017044)
Dakota County (44.666655, -93.044911)
Dodge County (44.0175404, -92.8678406)
Douglas County (45.9340479, -95.4627651)
Faribault County (43.6647961, -93.9510501)
Fillmore County (43.6466588, -92.0636359)
Freeborn County (43.6763617, -93.3501681)
Goodhue County (44.396973, -92.7175627)
Grant County (45.9358795, -96.0272071)
Hennepin County (45.0257232, -93.4865052)
Houston County (43.6624222, -91.4685617)
Hubbard County (47.1138266, -94.9427679)
Isanti County (45.56932235, -93.32652095523574)
Itasca County (47.4968343, -93.6225663)
Jackson County (43.670011, -95.1500626)
Kanabec County (45.8986948, -93.2850016)
Kandiyohi County (45.142373, -95.0025846)
Kittson County (48.7709208, -96.8074141)
Koochiching County (48.221596, -93.7684251)
Lac qui Parle County (44.986426, -96.2024907)
Lake County (47.6348022, -91.4394994)
Lake of the Woods County (48.7032282, -94.8480091)
Le Sueur County (44.3771652, -93.711443)
Lincoln County (44.4020631, -96.2627763)
Lyon County (44.3880733, -95.8287296)
McLeod County (44.8169135, -94.2495251)
Mahnomen County (47.3313602, -95.8142911)
Marshall County (48.3605336, -96.381968)
Martin County (43.6564337, -94.5498419)
Meeker County (45.1183643, -94.5175345)
Mille Lacs County (45.9311972, -93.640356)
Morrison County (45.9926837, -94.2554658)
Mower County (43.6832277, -92.753704)
Murray County (44.017855, -95.7615205)
Nicollet County (44.3380412, -94.2362169)
Nobles County (43.6634212, -95.7527672)
Norman County (47.3194344, -96.4625779)
Olmsted County (43.9997437, -92.3767816)
Otter Tail County (46.4184196, -95.713142)
Pennington County (48.0513335, -96.0829271)
Pine County (46.0820957, -92.7542126)
Pipestone County (44.0270012, -96.2566582)
Polk County (47.6554613, -96.4193484)
Pope County (45.5850258, -95.4469471)
Ramsey County (45.0165728, -93.0949501)
Red Lake County (47.8605178, -96.0988343)
Redwood County (44.3788613, -95.2532373)
Renville County (44.7242874, -94.9084771)
Rice County (44.3413376, -93.2865484)
Rock County (43.6733632, -96.2574328)
Roseau County (48.7710371, -95.7697882)
Saint Louis County (47.6201005, -92.4363343)
Scott County (44.6506998, -93.5025726)
Sherburne County (45.4427088, -93.7459202)
Sibley County (44.5603522, -94.2085682)
Stearns County (45.535326, -94.6139422)
Steele County (44.0137336, -93.2203671)
Stevens County (45.5837016, -95.9946194)
Swift County (45.2797223, -95.6898654)
Todd County (46.0588428, -94.887283)
Traverse County (45.7836323, -96.4215265)
Wabasha County (44.2767596, -92.2018164)
Wadena County (46.5850936, -94.9606684)
Waseca County (44.0172242, -93.5885717)
Washington County (45.0078657, -92.874565)
Watonwan County (43.9736055, -94.6370354)
Wilkin County (46.3258354, -96.4586194)
Winona County (43.9582272, -91.7807784)
Wright County (45.1489061, -93.9639196)
Yellow Medicine County (44.7198536, -95.8533555)
In [129]:
# The dictionary items behave like a sequence of (key, value) tuples, so each pair can be unpacked in the loop
for county, data in coordinates_list.items():
    if county == 'Scott County':
        print(county, data)  #  prints county and its associated data
Scott County (44.6506998, -93.5025726)
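Looping over every item to find one key works, but a dictionary also supports direct lookup, and a dictionary comprehension with a condition can filter on the values. The sketch below uses a three-county sample with coordinates rounded from the output above. The 47-degree threshold is an arbitrary choice for illustration.

```python
coords = {
    "Scott County": (44.65, -93.50),
    "Cook County": (47.91, -90.47),
    "Kittson County": (48.77, -96.81),
}

# Direct lookup by key; .get returns None instead of raising if the key is absent
scott = coords.get("Scott County")
missing = coords.get("Atlantis County")
print(scott)    # (44.65, -93.5)
print(missing)  # None

# Dictionary comprehension with a condition: keep counties north of 47 degrees latitude
northern = {county: (lat, lon) for county, (lat, lon) in coords.items() if lat > 47}
print(sorted(northern))  # ['Cook County', 'Kittson County']
```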
In [120]:
# Recall that DataFrames can be made from a list of tuples

list_dict= [('apples', (100, 2)), ('pears', (20,3))]

dict_df= pd.DataFrame(list_dict, columns=['Fruit', 'Data'])

dict_df


# It is important to recognize that a dictionary can be converted to a list of tuples,
# because pandas DataFrames can be constructed efficiently from lists of tuples:
# each tuple becomes a row and each element of the tuple a column
Out[120]:
Fruit Data
0 apples (100, 2)
1 pears (20, 3)
In [110]:
# Knowing that the iterable of tuples from .items() can be converted into a list of tuples
# allows for straightforward creation of a DataFrame.
list(coordinates_list.items())

# because pandas DataFrames can be created  from lists of tuples
# each tuple is a row and each element of the tuple a column
Out[110]:
[('Aitkin County', (46.5714822, -93.3847595)),
 ('Anoka County', (45.2710195, -93.2827625)),
 ('Becker County', (46.9298236, -95.6761851)),
 ('Beltrami County', (47.9978537, -94.8799011)),
 ('Benton County', (45.7162129, -94.0481042)),
 ('Big Stone County', (45.385266, -96.3557364)),
 ('Blue Earth County', (44.0109722, -94.0560643)),
 ('Brown County', (44.2350232, -94.6955051)),
 ('Carlton County', (46.5799933, -92.7206334)),
 ('Carver County', (44.807118, -93.7871792)),
 ('Cass County', (47.0234117, -94.3454604)),
 ('Chippewa County', (45.027661, -95.5314914)),
 ('Chisago County', (45.4758877, -92.8849411)),
 ('Clay County', (46.8994904, -96.5088202)),
 ('Clearwater County', (47.5643825, -95.3747844)),
 ('Cook County', (47.9149076, -90.47301)),
 ('Cottonwood County', (44.019068, -95.1658845)),
 ('Crow Wing County', (46.4665237, -94.1017044)),
 ('Dakota County', (44.666655, -93.044911)),
 ('Dodge County', (44.0175404, -92.8678406)),
 ('Douglas County', (45.9340479, -95.4627651)),
 ('Faribault County', (43.6647961, -93.9510501)),
 ('Fillmore County', (43.6466588, -92.0636359)),
 ('Freeborn County', (43.6763617, -93.3501681)),
 ('Goodhue County', (44.396973, -92.7175627)),
 ('Grant County', (45.9358795, -96.0272071)),
 ('Hennepin County', (45.0257232, -93.4865052)),
 ('Houston County', (43.6624222, -91.4685617)),
 ('Hubbard County', (47.1138266, -94.9427679)),
 ('Isanti County', (45.56932235, -93.32652095523574)),
 ('Itasca County', (47.4968343, -93.6225663)),
 ('Jackson County', (43.670011, -95.1500626)),
 ('Kanabec County', (45.8986948, -93.2850016)),
 ('Kandiyohi County', (45.142373, -95.0025846)),
 ('Kittson County', (48.7709208, -96.8074141)),
 ('Koochiching County', (48.221596, -93.7684251)),
 ('Lac qui Parle County', (44.986426, -96.2024907)),
 ('Lake County', (47.6348022, -91.4394994)),
 ('Lake of the Woods County', (48.7032282, -94.8480091)),
 ('Le Sueur County', (44.3771652, -93.711443)),
 ('Lincoln County', (44.4020631, -96.2627763)),
 ('Lyon County', (44.3880733, -95.8287296)),
 ('McLeod County', (44.8169135, -94.2495251)),
 ('Mahnomen County', (47.3313602, -95.8142911)),
 ('Marshall County', (48.3605336, -96.381968)),
 ('Martin County', (43.6564337, -94.5498419)),
 ('Meeker County', (45.1183643, -94.5175345)),
 ('Mille Lacs County', (45.9311972, -93.640356)),
 ('Morrison County', (45.9926837, -94.2554658)),
 ('Mower County', (43.6832277, -92.753704)),
 ('Murray County', (44.017855, -95.7615205)),
 ('Nicollet County', (44.3380412, -94.2362169)),
 ('Nobles County', (43.6634212, -95.7527672)),
 ('Norman County', (47.3194344, -96.4625779)),
 ('Olmsted County', (43.9997437, -92.3767816)),
 ('Otter Tail County', (46.4184196, -95.713142)),
 ('Pennington County', (48.0513335, -96.0829271)),
 ('Pine County', (46.0820957, -92.7542126)),
 ('Pipestone County', (44.0270012, -96.2566582)),
 ('Polk County', (47.6554613, -96.4193484)),
 ('Pope County', (45.5850258, -95.4469471)),
 ('Ramsey County', (45.0165728, -93.0949501)),
 ('Red Lake County', (47.8605178, -96.0988343)),
 ('Redwood County', (44.3788613, -95.2532373)),
 ('Renville County', (44.7242874, -94.9084771)),
 ('Rice County', (44.3413376, -93.2865484)),
 ('Rock County', (43.6733632, -96.2574328)),
 ('Roseau County', (48.7710371, -95.7697882)),
 ('Saint Louis County', (47.6201005, -92.4363343)),
 ('Scott County', (44.6506998, -93.5025726)),
 ('Sherburne County', (45.4427088, -93.7459202)),
 ('Sibley County', (44.5603522, -94.2085682)),
 ('Stearns County', (45.535326, -94.6139422)),
 ('Steele County', (44.0137336, -93.2203671)),
 ('Stevens County', (45.5837016, -95.9946194)),
 ('Swift County', (45.2797223, -95.6898654)),
 ('Todd County', (46.0588428, -94.887283)),
 ('Traverse County', (45.7836323, -96.4215265)),
 ('Wabasha County', (44.2767596, -92.2018164)),
 ('Wadena County', (46.5850936, -94.9606684)),
 ('Waseca County', (44.0172242, -93.5885717)),
 ('Washington County', (45.0078657, -92.874565)),
 ('Watonwan County', (43.9736055, -94.6370354)),
 ('Wilkin County', (46.3258354, -96.4586194)),
 ('Winona County', (43.9582272, -91.7807784)),
 ('Wright County', (45.1489061, -93.9639196)),
 ('Yellow Medicine County', (44.7198536, -95.8533555))]
In [111]:
len(list(coordinates_list.items()))
Out[111]:
87
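Each item in the list above nests the coordinates inside a second tuple. If separate latitude and longitude columns are wanted, a list comprehension can flatten the items into three-element rows and drop failed lookups at the same time. This is one possible approach, shown with made-up sample values; "Ghost County" is a hypothetical failed lookup.

```python
coords = {
    "Aitkin County": (46.57, -93.38),
    "Anoka County": (45.27, -93.28),
    "Ghost County": (None, None),  # hypothetical failed lookup
}

# Unpack the nested tuple in the for clause and filter out missing coordinates
rows = [
    (county, lat, lon)
    for county, (lat, lon) in coords.items()
    if lat is not None and lon is not None
]
print(rows)
# [('Aitkin County', 46.57, -93.38), ('Anoka County', 45.27, -93.28)]
```

A list shaped like rows can be passed to pd.DataFrame with columns=["County", "Latitude", "Longitude"].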
In [112]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt


# Convert the coordinates list to a DataFrame
data = pd.DataFrame(list(coordinates_list.items()), columns=["County", "Coordinates"])


# Ensure every entry is a tuple with two elements
data['Coordinates'] = data['Coordinates'].apply(lambda x: x if isinstance(x, tuple) and len(x) == 2 else (None, None))

print(len(data))
# Check if any coordinates are (None, None)
none_coordinates = data[data['Coordinates'] == (None, None)]
print(none_coordinates)

# Extract latitude and longitude into separate columns
data[['Latitude', 'Longitude']] = pd.DataFrame(data['Coordinates'].tolist(), index=data.index)

# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Plot the data points
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)

# Add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)

# Set the title
ax.set_title('County Coordinates in Minnesota')

# Show the plot
plt.show()
87
Empty DataFrame
Columns: [County, Coordinates]
Index: []
In [113]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt


# Convert the coordinates list to a DataFrame
data = pd.DataFrame(list(coordinates_list.items()), columns=["County", "Coordinates"])


# Ensure every entry is a tuple with two elements
data['Coordinates'] = data['Coordinates'].apply(lambda x: x if isinstance(x, tuple) and len(x) == 2 else (None, None))

print(len(data))
# Check if any coordinates are (None, None)
none_coordinates = data[data['Coordinates'] == (None, None)]
print(none_coordinates)
print(data['Coordinates'])
# Extract latitude and longitude into separate columns
data['Latitude'], data['Longitude'] = zip(*data['Coordinates'])
print(data['Coordinates'])

print(data['Latitude'])
print(data['Longitude'])


# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Plot the data points
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)

# Add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)

# Set the title
ax.set_title('County Coordinates in Minnesota')

# Show the plot
plt.show()
87
Empty DataFrame
Columns: [County, Coordinates]
Index: []
0     (46.5714822, -93.3847595)
1     (45.2710195, -93.2827625)
2     (46.9298236, -95.6761851)
3     (47.9978537, -94.8799011)
4     (45.7162129, -94.0481042)
                ...            
82    (43.9736055, -94.6370354)
83    (46.3258354, -96.4586194)
84    (43.9582272, -91.7807784)
85    (45.1489061, -93.9639196)
86    (44.7198536, -95.8533555)
Name: Coordinates, Length: 87, dtype: object
0     (46.5714822, -93.3847595)
1     (45.2710195, -93.2827625)
2     (46.9298236, -95.6761851)
3     (47.9978537, -94.8799011)
4     (45.7162129, -94.0481042)
                ...            
82    (43.9736055, -94.6370354)
83    (46.3258354, -96.4586194)
84    (43.9582272, -91.7807784)
85    (45.1489061, -93.9639196)
86    (44.7198536, -95.8533555)
Name: Coordinates, Length: 87, dtype: object
0     46.571482
1     45.271020
2     46.929824
3     47.997854
4     45.716213
        ...    
82    43.973605
83    46.325835
84    43.958227
85    45.148906
86    44.719854
Name: Latitude, Length: 87, dtype: float64
0    -93.384760
1    -93.282763
2    -95.676185
3    -94.879901
4    -94.048104
        ...    
82   -94.637035
83   -96.458619
84   -91.780778
85   -93.963920
86   -95.853356
Name: Longitude, Length: 87, dtype: float64
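The cell above splits the coordinate tuples with zip(*data['Coordinates']). The star unpacks the pairs into separate arguments, and zip then regroups them by position, which transposes a list of (lat, lon) pairs into one tuple of latitudes and one of longitudes. A small standalone example with rounded sample values:

```python
pairs = [(46.57, -93.38), (45.27, -93.28), (46.93, -95.68)]

# zip(*pairs) is equivalent to zip(pairs[0], pairs[1], pairs[2])
latitudes, longitudes = zip(*pairs)
print(latitudes)   # (46.57, 45.27, 46.93)
print(longitudes)  # (-93.38, -93.28, -95.68)

# Zipping the two tuples back together recovers the original pairs
assert list(zip(latitudes, longitudes)) == pairs
```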
In [114]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader

# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'

# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})

# Add built-in Cartopy features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Load and plot the county boundaries
reader = shpreader.Reader(shapefile_path)
counties = list(reader.geometries())
ax.add_geometries(counties, ccrs.PlateCarree(), edgecolor='black', facecolor='none')

# Assuming 'data' is your DataFrame with the 'Longitude' and 'Latitude'
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)

# Optionally add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)

# Set the title
ax.set_title('County Coordinates in Minnesota')

# Show the plot
plt.show()
In [115]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader

# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'

# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(10, 15), subplot_kw={'projection': ccrs.PlateCarree()})

# Add built-in Cartopy features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Load the shapefile and filter for counties in Minnesota
reader = shpreader.Reader(shapefile_path)
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']

# Plot only the filtered counties
for county in minnesota_counties:
    geometry = county.geometry
    name = county.attributes['NAME']
    ax.add_geometries([geometry], ccrs.PlateCarree(), edgecolor='black', facecolor='none')
    x, y = geometry.centroid.x, geometry.centroid.y
    ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.Geodetic())

# Limit the map extent to Minnesota
ax.set_extent([-97.5, -89.5, 43.5, 49.5], crs=ccrs.PlateCarree())  # Adjust these values based on the actual coordinates of Minnesota

# Plot the data points derived from the geocoded lat lon coordinates
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', alpha=0.5, zorder=1)

# Set the title
ax.set_title('County Coordinates in Minnesota')

# Show the plot
plt.show()
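The map extent above is hard-coded. As an alternative, the extent could be derived from the geocoded points themselves. The sketch below is one way to do that, with a hypothetical helper named extent_from_points and an arbitrary padding of 0.5 degrees. It returns values in the [min_lon, max_lon, min_lat, max_lat] order that set_extent is given above.

```python
def extent_from_points(points, pad=0.5):
    """Return [min_lon, max_lon, min_lat, max_lat] around (lat, lon) points."""
    valid = [(lat, lon) for lat, lon in points if lat is not None and lon is not None]
    if not valid:
        raise ValueError("no valid coordinates")
    lats = [lat for lat, _ in valid]
    lons = [lon for _, lon in valid]
    return [min(lons) - pad, max(lons) + pad, min(lats) - pad, max(lats) + pad]


# Illustrative points; the (None, None) entry mimics a failed lookup
points = [(43.5, -96.0), (48.5, -90.0), (None, None)]
extent = extent_from_points(points)
print(extent)  # [-96.5, -89.5, 43.0, 49.0]
```

With the dictionary built earlier, the points argument could be coordinates_list.values().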
In [ ]:
mn_counties

.to_dict()

In [116]:
# # Convert Filtered_top_bot_data to a dictionary mapping countries to life expectancy
# life_expectancy = Filtered_top_bot_data.set_index('country')['lifeExp'].to_dict()
# print(life_expectancy)

Plot using .items()

In [117]:
# Plot each country's coordinates
# Assuming `top_countries` and `bottom_countries` are lists of country names
# for country, (lat, lon) in coordinates.items():
#     if lat and lon:  # Check if lat and lon are not None
#         color = 'green' if country in top_countries else 'red'
#         plt.plot(lon, lat, marker='o', color=color, markersize=5, transform=ccrs.Geodetic())
#         plt.text(lon, lat, country, transform=ccrs.Geodetic())

# plt.title('Top and Bottom African Countries by Life Expectancy')
# plt.show()

Combine Dictionary Comprehension and iterrows() to create a dictionary based on multiple columns of a dataframe

In [118]:
# Extending the DataFrame with another column
data = {'Words': ["apple", "banana", "cherry"], 'Type': ["fruit", "fruit", "fruit"]}
df = pd.DataFrame(data)

# Dictionary mapping word to a tuple of (word length, type)
word_info_dict = {row['Words']: (len(row['Words']), row['Type']) for index, row in df.iterrows()}

print(word_info_dict)
{'apple': (5, 'fruit'), 'banana': (6, 'fruit'), 'cherry': (6, 'fruit')}
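iterrows() builds a Series object for every row, which can be slow on larger tables. When only a few columns are needed, zipping those columns together gives the same dictionary. The sketch below uses plain lists in place of the DataFrame columns so it runs on its own.

```python
words = ["apple", "banana", "cherry"]
types = ["fruit", "fruit", "fruit"]

# Pair the two columns element by element, then build the dictionary
word_info = {word: (len(word), kind) for word, kind in zip(words, types)}
print(word_info)
# {'apple': (5, 'fruit'), 'banana': (6, 'fruit'), 'cherry': (6, 'fruit')}
```

With the DataFrame above, zip(df['Words'], df['Type']) would supply the same pairs.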

Generator Expressions

my_generator = (x*x for x in range(10))
for value in my_generator:
    print(value)
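A generator expression looks like a list comprehension with parentheses instead of brackets, but it produces values lazily, one at a time, and can be consumed only once. The short example below shows both properties.

```python
squares = (x * x for x in range(10))  # nothing is computed yet

first = next(squares)           # 0
second = next(squares)          # 1
remaining_total = sum(squares)  # 4 + 9 + ... + 81 = 284
leftover = list(squares)        # [] because the generator is exhausted

print(first, second, remaining_total, leftover)  # 0 1 284 []
```

Because values are produced on demand, a generator expression avoids building the whole list in memory, which matters for large ranges.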
