Introduction to List Comprehensions¶
- Definition
- List comprehensions are a concise and efficient way to create lists in Python
- They provide a syntactically elegant method to perform operations and apply conditions to iterables, allowing the creation or transformation of lists in a single line of code
Why Do We Use List Comprehensions?:
- Conciseness: Reduces the amount of code needed compared to traditional loops, making the code cleaner and easier to read
- Performance: Often faster than an equivalent for loop that builds the list with append(), because the looping and appending run as optimized bytecode with less per-item overhead
- Expressiveness: Allows the code to be more descriptive and focused on the operation itself, rather than the mechanics of looping and appending to lists
- Versatility: Capable of incorporating conditional logic within the list creation, which lets you filter elements or apply complex transformations easily
- Key point: Use list comprehensions to transform or extract data
Syntax:
- Basic Structure: A list comprehension consists of brackets containing an expression followed by a for clause
- Optionally, it can include one or more for or if clauses.
Generalized Examples:¶
[expression for item in iterable]
[expression for item in iterable if condition]
[expression for item in iterable if condition1 if condition2]
[expression for item in iterable1 for item2 in iterable2]
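The last two generalized forms are easy to misread, so here is a minimal sketch of how they evaluate. The variable names are illustrative only. Stacked if clauses behave like a logical and, and with two for clauses the leftmost one is the outer loop.

```python
# Two stacked if clauses: both conditions must hold (same as using `and`)
div_by_6 = [n for n in range(1, 31) if n % 2 == 0 if n % 3 == 0]
print(div_by_6)

# Two for clauses: the leftmost for is the outer loop
pairs = [(letter, number) for letter in "ab" for number in range(3)]
print(pairs)

# The equivalent nested loop, written out for comparison
pairs_loop = []
for letter in "ab":
    for number in range(3):
        pairs_loop.append((letter, number))
print(pairs == pairs_loop)
```

Reading the for clauses left to right in the same order as nested loops is usually the simplest way to predict the order of the result.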
In [60]:
import pandas as pd
import os
Basic List Comprehension¶
In [61]:
# basic list comprehension for squaring numbers and creating a list
# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (1) (2) (3)
squares = [x**2 for x in range(10)]
print(squares)
In [62]:
# What happened?
# The list comprehension iterated over the numbers 0 to 9, squared each number, and stored the results in a list
Conditional List Comprehension¶
In [63]:
# Generate a list of the squares of the even numbers from 1 to 20
# list comprehension with a condition
# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (4) condition
# (1) (2) (3) (4)
evens = [x**2 for x in range(1,21) if x %2==0]
print(evens)
# print(sqrt{evens})
In [64]:
# List comprehension that filters results based on membership in predefined list
magic_nums = [1,2,7,8]
mylist_magnum = [x**2 for x in range(10) if x in magic_nums]
print(magic_nums)
print(mylist_magnum)
In [65]:
# Create a list of tuples with numbers and their squares
numbers_and_squares = [(x,x**2) for x in range(10)]
print(numbers_and_squares)
numbers_and_squares.append((10,100))
print(numbers_and_squares)
# use the pop() method to remove an item and return it so it can be stored
# remove item at index 0 and store it in a variable
one_item =numbers_and_squares.pop(0)
print(one_item)
print(numbers_and_squares)
In [66]:
# Create a list of tuples with numbers and their squares
numbers_and_squares = [(x,x**2) for x in range(10)]
print(numbers_and_squares)
numbers_and_squares.append((10,100))
print(numbers_and_squares)
# use the del statement to remove items without storing them
# del also accepts a slice to delete a range of items
# remove item at index 0 without storing it
del numbers_and_squares[0]
print(numbers_and_squares)
In [67]:
# Create a list of tuples with numbers and their squares
numbers_and_squares = [(x,x**2) for x in range(10)]
print(numbers_and_squares)
numbers_and_squares.append((10,100))
print(numbers_and_squares)
# Remove items at indices 0 to 4 (inclusive of 0, exclusive of 5)
del numbers_and_squares[0:5]
print(numbers_and_squares) # Output: [(5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (10, 100)]
Extracting data using list comprehensions¶
In [68]:
# Extract values that meet a certain criterion
# Basic components of list comprehension
# (1) expression
# (2) item
# (3) iterable
# (4) condition
test_scores= [50,60,65,98,91,85,100]
# (1) (2) (3) (4)
passing_grades = [x for x in test_scores if x >60]
print(passing_grades)
In [69]:
#### Extract a single item
# generalized form for extracting one field from each element of a list that meets a criterion
""" list_data = [element['key1'] for element in data if element['key2'] > x] """
# List comprehension to extract names from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
{'name': 'John', 'age': 28},
{'name': 'Anna', 'age': 20},
{'name': 'James', 'age': 18},
{'name': 'Linda', 'age': 30}
]
adults_info = [person['name'] for person in people_data if person['age']>21]
print(adults_info)
In [70]:
del adults_info[0]
print(adults_info)
In [71]:
# List comprehension to extract tuple of names and age from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
{'name': 'John', 'age': 28},
{'name': 'Anna', 'age': 20},
{'name': 'James', 'age': 18},
{'name': 'Linda', 'age': 30}
]
adults_info = [(person['name'], person['age']) for person in people_data if person['age']>21]
print(adults_info)
In [72]:
del adults_info[0]
print(adults_info)
In [73]:
# List comprehension to extract tuple of names and age from a list of dictionaries
# each dictionary is a person with a name and age
people_data = [
{'name': 'John', 'age': 28},
{'name': 'Anna', 'age': 20},
{'name': 'James', 'age': 18},
{'name': 'Linda', 'age': 30}
]
adults_info = [(person['name'], person['age']) for person in people_data if person['age']>21]
print(adults_info)
In [74]:
first_adult=adults_info.pop(0)
print(first_adult)
print(adults_info)
adults_info.append(first_adult)
print(adults_info)
Understanding DataFrame Iteration with iterrows()¶
1) Introduction to DataFrame Iteration
- Why Iteration? Iteration over DataFrames is commonly needed when each row of data must be processed individually
- While vectorized operations are preferred for performance, iteration is useful for complex operations that aren't easily vectorized or when debugging row by row.
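2) Using iterrows():
- Definition: iterrows() is a generator that iterates over the rows of a DataFrame
- It yields one (index, Series) tuple per row, so each row arrives in the loop as a Series object
- Because each row is packed into a single Series, dtypes are not preserved across the row and the Series is generally a copy, so write changes back with df.at[index, 'column_name'] rather than assigning to row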
Syntax:
- index: Represents the index of the row in the DataFrame
- row: A Series containing the row data
- iterrows(): a generator that iterates over the rows
- row['column_name']: Accesses data in a specific column for that row
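The pieces listed above can be seen in a small sketch. The DataFrame here is a made-up two-row example, not data from this notebook. It also illustrates that, with mixed column types as in this frame, the row Series is a copy, so assigning to it leaves the DataFrame untouched.

```python
import pandas as pd

# Made-up sample data for illustration only
df = pd.DataFrame({"Name": ["John", "Anna"], "Age": [28, 22]})

for index, row in df.iterrows():
    # index is the row label; row is a Series labeled by the column names
    print(index, type(row).__name__, row["Name"], row["Age"])
    # This changes only the temporary Series, not the DataFrame
    row["Age"] = 0

# The original values are still in place
print(df["Age"].tolist())
```

This is why the cells that follow write results back with .at[index, column] instead of assigning to row.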
Example:¶
for index, row in df.iterrows():
    row['column_name']
Data cleaning with iterrows()¶
In [75]:
# iterating over rows with iterrows():
# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': ['28',22,'35a']}
sdf= pd.DataFrame(data)
# iterate over the dataframe
# for each row, extract the index and the row data, repeat for all rows
for index,row in sdf.iterrows():
    # strip leading/trailing white space from the name
    sdf.at[index, 'Name'] = row['Name'].strip()
    # Check if the row 'Age' is a string
    if isinstance(row['Age'], str):
        print(f"String data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
        # isdigit() is True only if every character is a digit; letters or special characters make it False
        if row['Age'].isdigit():
            sdf.at[index,'Age'] = int(row['Age']) # digits only, so convert to integer
            print(f"Cleaned data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
        else:
            sdf.at[index, 'Age'] = pd.NA
            print(f"Uncleaned data index is {index}, Name is {row['Name']}, Age is {row['Age']}")
In [76]:
import pandas as pd
# Create a sample dataframe
data = {
'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
'UniqueID': [101, 102, 103, 101, 102],
'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)
# initialize an empty set that will hold the unique IDs seen so far
unique_userId= set()
# iterate over the dataframe
for index, row in df.iterrows():
    # Check if the current row's UniqueID is already in the set
    if row['UniqueID'] in unique_userId:
        # At this row, mark the new 'Duplicates' column as True
        df.at[index, 'Duplicates'] = True
    else:
        # Add the current row's UniqueID to the set
        unique_userId.add(row['UniqueID'])
        # Mark the row as False for Duplicates
        df.at[index, 'Duplicates']= False
print(df)
In [77]:
# remove duplicate
df= df[df['Duplicates']==False]
print(df)
In [78]:
df=df.drop(columns='Duplicates')
print(df)
In [79]:
#duplicated
# Create a sample dataframe
data = {
'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
'UniqueID': [101, 102, 103, 101, 102],
'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)
# the duplicated() method is used to identify duplicates
# subset parameter: specifies the column to check for duplicates
# keep='first': leaves the first occurrence unmarked and marks subsequent duplicates as True
df['Duplicates']= df.duplicated(subset='UniqueID', keep='first')
print(df)
In [80]:
df= df[~df['Duplicates']] # keep only the rows that are not flagged as duplicates
print(df)
In [81]:
df=df.drop(columns='Duplicates')
print(df)
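The flag, filter, and drop steps above can also be collapsed into one call. A minimal sketch using pandas' drop_duplicates() on the same sample data follows. The names deduped and flagged are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "UserName": ["JohnDoe", "AnnaSmith", "JamesBond", "JohnDoe", "AnnaSmith"],
    "UniqueID": [101, 102, 103, 101, 102],
    "State": ["NY", "CA", "TX", "NY", "CA"],
})

# Keep the first row for each UniqueID and drop the later repeats
deduped = df.drop_duplicates(subset="UniqueID", keep="first")
print(deduped)

# The same rows, obtained by filtering on the duplicated() mask
flagged = df[~df.duplicated(subset="UniqueID", keep="first")]
print(deduped.equals(flagged))
```

The explicit flag column is still useful when the duplicates need to be inspected before they are removed.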
In [82]:
#duplicated
# Create a sample dataframe
data = {
'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
'UniqueID': [101, 102, 103, 101, 102],
'State': ['NY', 'CA', 'TX', 'NY', 'CA']
}
df = pd.DataFrame(data)
# Count duplicates
# Group by the column you want to check for duplicates
# Use transform('size') to get the count of each group
# Assign result to a new column
df['Counts']= df.groupby('UniqueID')['UniqueID'].transform('size')
print(df)
In [83]:
# Create a sample dataframe
data = {
'UserName': ['JohnDoe', 'AnnaSmith', 'JamesBond', 'JohnDoe', 'AnnaSmith'],
'UniqueID': [101, 102, 103, 101, 102],
'State': ['NY', 'CA', 'TX', 'NY', 'CA'],
'Sales': [200, 150, 300, 250, 100]
}
df = pd.DataFrame(data)
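# Normalize sales within each UniqueID group (z-score)
# a group with a single row, such as UniqueID 103, gets NaN because its sample standard deviation is undefined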
df['NormalizedSales'] = df.groupby('UniqueID')['Sales'].transform(lambda x: (x - x.mean()) / x.std())
print(df)
Multi-column conditional flagging of rows with iterrows()¶
In [84]:
# Create a sample dataframe
data = {'Name': ['John', 'Anna', 'James'], 'Age': [28,22,35]}
sdf= pd.DataFrame(data)
for index, row in sdf.iterrows():
    if row['Age'] < 30 and "J" in row['Name']:
        sdf.at[index, 'Category'] = 'Young J'
    else:
        sdf.at[index, 'Category'] = 'Other'
print(sdf)
Data transformation with iterrows()¶
In [85]:
for index, row in sdf.iterrows():
    sdf.at[index, 'New Age'] = row['Age']+10 # add 10 years to each person's age
print(sdf)
Multi-column Conditional Flagging or Computation with iterrows()¶
In [86]:
for index, row in sdf.iterrows():
    if row['Age'] < 30 and "J" in row['Name']:
        sdf.at[index, 'Category'] = 'Young J'
    else:
        sdf.at[index, 'Category'] = 'Other'
print(sdf)
In [87]:
for index,row in sdf.iterrows():
    if row['Age']>30 and 'a' in row['Name']:
        sdf.at[index, 'Flag'] = True
    else:
        sdf.at[index, 'Flag']= False
print(sdf)
Mark specific rows with iterrows()¶
In [88]:
for index, row in sdf.iterrows():
    if row['Name'].startswith('J') and row['Age']>25:
        sdf.at[index, 'Status'] = 'Senior J'
print(sdf)
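Since vectorized operations are preferred where they apply, here is a hedged sketch of how the 'Category' and 'New Age' loops above could be written without iterrows(). It uses numpy.where and the pandas .str accessor. The name young_j is illustrative only.

```python
import numpy as np
import pandas as pd

sdf = pd.DataFrame({"Name": ["John", "Anna", "James"], "Age": [28, 22, 35]})

# Boolean mask: under 30 and the name contains "J"
young_j = (sdf["Age"] < 30) & sdf["Name"].str.contains("J")

# Choose a label per row from the mask, all rows at once
sdf["Category"] = np.where(young_j, "Young J", "Other")

# Column arithmetic replaces the row-by-row addition
sdf["New Age"] = sdf["Age"] + 10
print(sdf)
```

The loop versions remain easier to step through when debugging, which is the use case this section focuses on.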
Combine List Comprehension and iterrows() to extract a specific list from a dataframe¶
In [89]:
# Load a sample dataset to demonstrate application of list comprehension and iterrows()
# this is a dataset of tornadoes occurring in the state of Minnesota in recent years
df = pd.read_csv(r".\Data\storm_data_search_results.csv")
# Set the option to display all columns
pd.set_option('display.max_columns', None)
df.head()
Out[89]:
In [90]:
# List comprehension to extract coordinates of EF0 tornado events
ef0_coordinates = [(row['BEGIN_LAT'], row['BEGIN_LON']) for index, row in df.iterrows() if row['TOR_F_SCALE'] == 'EF0']
print(ef0_coordinates)
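For larger files, the same extraction can be sketched with a boolean mask and zip() instead of iterrows(). The three rows below are made-up placeholder values that only reuse the column names from the storm dataset. They are not taken from the CSV.

```python
import pandas as pd

# Hypothetical sample rows; values are placeholders, not real storm records
sample = pd.DataFrame({
    "BEGIN_LAT": [44.1, 45.2, 46.3],
    "BEGIN_LON": [-93.5, -94.6, -95.7],
    "TOR_F_SCALE": ["EF0", "EF1", "EF0"],
})

# Filter the rows first, then pair the two columns
ef0 = sample[sample["TOR_F_SCALE"] == "EF0"]
ef0_coordinates = list(zip(ef0["BEGIN_LAT"], ef0["BEGIN_LON"]))
print(ef0_coordinates)
```

Both approaches produce a list of (latitude, longitude) tuples. The mask version avoids building a Series for every row.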
In [91]:
mn_counties= df["CZ_NAME_STR"].unique()
In [92]:
mn_counties
Out[92]:
In [93]:
len(mn_counties)
Out[93]:
Introduction to Dictionary Comprehensions¶
- Definition
- Dictionary comprehensions are a concise and efficient way to create dictionaries in Python
- Like list comprehensions, they provide an elegant way to perform operations and apply conditions to iterables
- Specifically, they allow the creation or transformation of dictionary key-value pairs in a single line of code
Why Do We Use Dictionary Comprehensions?:
- Conciseness: Reduces the complexity and amount of code compared to traditional loops for creating dictionaries, making the code more readable
- Performance: Generally faster than using a loop to add items to a dictionary due to optimized implementation and reduced overhead
- Expressiveness: Enhances code clarity by focusing on the dictionary creation logic rather than the mechanics of looping and inserting key-value pairs
- Versatility: Capable of incorporating conditional logic and multiple sources, allowing for sophisticated transformations and filtering in dictionary creation
- Key point: Use dictionary comprehensions to efficiently transform or map data into key-value pairs
Syntax:
- Basic Structure: A dictionary comprehension consists of curly braces {} containing a key-value pair expression followed by a for clause
- Optionally, it can include one or more for or if clauses
Generalized Examples:¶
{key_expr: value_expr for item in iterable}
{key_expr: value_expr for item in iterable if condition}
{key_expr: value_expr for item in iterable if condition1 if condition2}
{key_expr: value_expr for item in iterable1 for item2 in iterable2}
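As a quick illustration of these forms, the sketch below transforms values, filters entries, and uses two for clauses. The prices dictionary and all names in it are made up for the example.

```python
# Made-up sample data for illustration only
prices = {"apple": 1.20, "banana": 0.50, "cherry": 3.00}

# Transform the values while keeping the keys
prices_cents = {name: round(price * 100) for name, price in prices.items()}
print(prices_cents)

# Keep only the entries that satisfy a condition
cheap = {name: price for name, price in prices.items() if price < 2}
print(cheap)

# Two for clauses: one entry for every (row, column) combination
grid = {(r, c): r * c for r in range(2) for c in range(3)}
print(grid)
```

Keys must be hashable, which is why the tuple (r, c) works as a key in the last example.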
In [94]:
# quick recap how to manipulate dictionary
from datetime import datetime
from pprint import pprint # pretty print, readable format for dictionaries
# create an empty dictionary for products and their metadata
products_dict = {}
# add to the dictionary
# This is a nested dictionary, where the key is product ID and the value is another dictionary
products_dict['0001-2024']= {
'name':'apple',
'amount':2,
'date': datetime.now().date().strftime('%Y%m%d')}
# display the dictionary
pprint(products_dict)
products_dict['0002-2024'] = {'name': 'banana'}
# display each dictionary entry along with labels
for productID, metadata in products_dict.items():
    pprint(f"Product ID: {productID}, Metadata: {metadata}")
# print the number of items
print(f"Number of items: {len(products_dict)}")
# update an existing entry
# allows you to add new key value pairs or update existing ones
products_dict['0002-2024'].update({'amount':10, 'date': datetime.now().strftime('%Y%m%d')})
# Display the items
pprint(products_dict.items())
pprint(list(products_dict.items()))
products_dict.update({'0003-2024':{'name': 'mangos','amount':100, 'date': datetime.now().strftime('%Y%m%d')}})
# Display the number of items
print(f"Number of items: {len(products_dict)}")
# display the dictionary
pprint(products_dict)
print()
# List out the final nested dictionary of products
print("Final product list:")
# enumerate(..., start=1) numbers the products without a manual counter
for counter, (productID, metadata) in enumerate(products_dict.items(), start=1):
    pprint(f"Product ID {counter} : {productID}, Metadata: {metadata}")
Basic Dictionary Comprehension¶
In [95]:
# Make a dictionary where the keys are numbers and values are their squares
# Basic components of dictionary comprehension
# (1) key-value expression
# (2) item
# (3) iterable
#(1) (2) (3)
squares = {x:x**2 for x in range(1,10)}
print(squares)
Conditional Dictionary Comprehension¶
In [96]:
# Make a dictionary of even numbers and their squares
even_squares = {x:x**2 for x in range(1,10) if x%2==0}
print(even_squares)
Using Functions in Dictionary Comprehension¶
In [97]:
# Create a dictionary that maps each word in a list to its length
# suppose you start with a list of words
word_list = ["apple", "banana", "cherry"]
word_length_dict= {word:len(word) for word in word_list}
print(word_length_dict)
Using a dataframe in a Dictionary Comprehension¶
In [28]:
import pandas as pd
# Create a dictionary that maps each word in a column of a dataframe to its length
# You start with a dataframe
data = {'Words':(["apple", "banana", "cherry"])*2} # Duplicate the list to increase the number of items
print(data)
df = pd.DataFrame(data)
print(df)
# use a dictionary comprehension directly on the dataframe column to map each word in the column to its length
word_length_dict= {word:len(word) for word in df['Words']}
print(word_length_dict)
# add a count to the same dataframe as a new column
df["Word Counts"]=df.groupby("Words")["Words"].transform("count")
print(f"\n{df}\n")
# Map the lengths from the dictionary onto the matching words in the column Words
df["Word Lengths"] = df["Words"].map(word_length_dict)
print(f"\n{df}\n")
In [99]:
# Read HTML tables using the lxml parser
counties_list = pd.read_html(
"https://en.wikipedia.org/wiki/List_of_counties_in_Minnesota"
)
In [100]:
counties_list=counties_list[0]
In [101]:
counties_list
Out[101]:
Introduction to Geocoding with Nominatim via Geopy¶
- Geocoding is the process of converting addresses (like "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates (like latitude 37.423021 and longitude -122.083739)
- The resulting coordinates can be used to place markers on a map or to position the map
Capabilities of Nominatim (Geopy):
- Address Geocoding: Converts street addresses or other descriptive locations into geographic coordinates.
- Reverse Geocoding: Converts geographic coordinates into a human-readable address.
- Extensive Coverage: Utilizes OpenStreetMap data, providing global coverage often with fine-grained control over geocoding queries.
- Customization Options: Allows customization of requests, including specifying the language of the result, the bounding box for constraining searches, and more.
Syntax for Geocoding and Reverse Geocoding
1) Geocoding (Address to Coordinates)
- Initialization: Create a Nominatim object with a user-defined user_agent
- Query: Use the .geocode() method with the address as a string.
2) Reverse Geocoding (Coordinates to Address)
- Initialization: Create a Nominatim object with a user-defined user_agent
- Query: Use the .reverse() method with the coordinates as a (latitude, longitude) tuple or as a string in the format "latitude, longitude".
In [45]:
from geopy.geocoders import Nominatim
import requests
geolocator = Nominatim(user_agent="geocode_Address")
def getAddress_coords(address):
    location = geolocator.geocode(address)
    if location:
        latitude, longitude = location.latitude, location.longitude
        print(location)
        # Get elevation in meters
        elevation_url = f"https://api.open-elevation.com/api/v1/lookup?locations={latitude},{longitude}"
        response = requests.get(elevation_url)
        elevation_data = response.json()
        print(elevation_data)
        elevation_meters = elevation_data['results'][0]['elevation'] if 'results' in elevation_data else None
        print(elevation_meters)
        # convert elevation to feet
        elevation_feet = elevation_meters * 3.28084 if elevation_meters is not None else None
        print(elevation_feet)
        return (latitude, longitude, elevation_feet)
    else:
        print("Address Not Found, coordinates will be blank")
        return (None,None, None)
In [46]:
geocode_result = getAddress_coords("5057 Edgewater Court, Savage, MN")
print(geocode_result)
In [41]:
from geopy.geocoders import Nominatim
# initialize geocoder, Nominatim object
geolocator= Nominatim(user_agent= "geocode_Address")
def getAddress_coords(address):
    location= geolocator.geocode(address)
    if location:
        print(location)
        return (location.latitude, location.longitude)
    else:
        print("Address Not Found, coordinates will be blank")
        return (None, None)
geocode_result = getAddress_coords("5057 Edgewater Court, Savage, MN")
print(geocode_result)
In [42]:
geocode_result = getAddress_coords("Murphy-Hanrehan Park Reserve, Savage, MN")
print(geocode_result)
In [104]:
geolocator= Nominatim(user_agent= "geocode_Address")
def getAddress(coords):
    location= geolocator.reverse(coords)
    if location:
        return (location.address)
    else:
        print("Address Not Found, address will be blank")
geocode_Address_result = getAddress(geocode_result)
print(geocode_Address_result)
In [105]:
# Combine retrieval of external data from geocoding service with a dictionary comprehension of a dataframe column
# Create a dictionary that maps counties to their coordinates
from geopy.geocoders import Nominatim
# initialize geocoder
geolocator= Nominatim(user_agent= "geoapiExercise")
def get_lat_lon(county):
    # Append ", Minnesota" to ensure the geocoding query is localized
    location= geolocator.geocode(county+ ", Minnesota")
    if location:
        return (location.latitude, location.longitude)
    else:
        return (None, None)
# county_names must be defined before it is printed below
county_names= counties_list["County"]
#print the names of the county
print(county_names)
print(county_names.dtype)
# Dictionary comprehension that maps a dataframe column of county names to their lat, lon coordinates
# the function of the dictionary comprehension returns the coordinates for each key in the dictionary defined by the dataframe column
coordinates_list = {county: get_lat_lon(county) for county in counties_list["County"]}
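A dictionary comprehension like the one above sends one request per county in quick succession, and the public Nominatim service asks for no more than about one request per second. geopy ships a RateLimiter helper in geopy.extra.rate_limiter for this. The sketch below shows the same idea in plain Python so that it runs offline. The throttle helper, the fake_get_lat_lon stand-in, and its placeholder coordinates are all hypothetical and not part of geopy or this notebook.

```python
import time

def throttle(func, min_delay_seconds):
    """Wrap func so that consecutive calls are at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper

# Hypothetical offline stand-in for get_lat_lon; the coordinates are placeholders
def fake_get_lat_lon(county):
    lookup = {"Scott County": (44.0, -93.0)}
    return lookup.get(county, (None, None))

# A short delay keeps the demo fast; a real Nominatim run would use about 1 second
slow_lookup = throttle(fake_get_lat_lon, min_delay_seconds=0.1)

county_names = ["Scott County", "Unknown County", "Scott County"]
# dict.fromkeys removes repeated names while keeping order, so each is looked up once
coordinates = {county: slow_lookup(county) for county in dict.fromkeys(county_names)}
print(coordinates)
```

With a real geocoder, wrapping get_lat_lon the same way would pace the requests without changing the comprehension itself.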
In [106]:
print(f"The dataset is an {type(county_names)}")
print(f"The data value in the dataset is an {county_names.dtype}")
print(county_names.index.to_list())
In [107]:
coordinates_list
Out[107]:
In [108]:
type(coordinates_list)
Out[108]:
In [109]:
coordinates_list.items() # converts dictionary to an iterable of tuples (key,value)
Out[109]:
In [124]:
# items() method of dictionaries returns an iterable of tuples
# each tuple consists of a key-value pair from the dictionary
type(coordinates_list.items()) # this dict_items object is an iterable
# because this is an iterable , we can use it in a loop to access its elements
#...OR convert it to other iterables like lists that are often required for further data processing
Out[124]:
In [130]:
for county, data in coordinates_list.items():
    print(county, data) # prints each county and its associated data
In [129]:
# each item from .items() is a (key, value) tuple, so it can be unpacked into two loop variables
for county, data in coordinates_list.items():
    if county == 'Scott County':
        print(county, data) # prints county and its associated data
In [120]:
# recall that dataframes can be made from list of tuples
list_dict= [('apples', (100, 2)), ('pears', (20,3))]
dict_df= pd.DataFrame(list_dict, columns=['Fruit', 'Data'])
dict_df
# Critical to recognize dictionary can be converted to list of tuples
# because pandas DataFrames can be constructed efficiently from lists of tuples,
# each tuple is a row and each element of the tuple a column
Out[120]:
In [110]:
# Knowing that the iterable of tuples from .items() can be converted into a list of tuples
# allows for straightforward creation of a DataFrame.
list(coordinates_list.items())
# because pandas DataFrames can be created from lists of tuples
# each tuple is a row and each element of the tuple a column
Out[110]:
In [111]:
len(list(coordinates_list.items()))
Out[111]:
In [112]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
# Convert the coordinates list to a DataFrame
data = pd.DataFrame(list(coordinates_list.items()), columns=["County", "Coordinates"])
#ensure all are tuples with two elements
data['Coordinates'] = data ['Coordinates'].apply(lambda x: x if isinstance(x,tuple) and len(x)==2 else (None, None))
print(len(data))
# Check if any coordinates are (None, None)
none_coordinates = data[data['Coordinates'] == (None, None)]
print(none_coordinates)
# Extract latitude and longitude into separate columns
data[['Latitude', 'Longitude']] = pd.DataFrame(data['Coordinates'].tolist(), index=data.index)
# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
# Plot the data points
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)
# Add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)
# Set the title
ax.set_title('County Coordinates in Minnesota')
# Show the plot
plt.show()
In [113]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
# Convert the coordinates list to a DataFrame
data = pd.DataFrame(list(coordinates_list.items()), columns=["County", "Coordinates"])
#ensure all are tuples with two elements
data['Coordinates'] = data ['Coordinates'].apply(lambda x: x if isinstance(x,tuple) and len(x)==2 else (None, None))
print(len(data))
# Check if any coordinates are (None, None)
none_coordinates = data[data['Coordinates'] == (None, None)]
print(none_coordinates)
print(data['Coordinates'])
# Extract latitude and longitude into separate columns
data['Latitude'], data['Longitude'] = zip(*data['Coordinates'])
print(data['Coordinates'])
print(data['Latitude'])
print(data['Longitude'])
# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
# Plot the data points
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)
# Add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)
# Set the title
ax.set_title('County Coordinates in Minnesota')
# Show the plot
plt.show()
In [114]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader
# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(14, 10), subplot_kw={'projection': ccrs.PlateCarree()})
# Add built-in Cartopy features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
# Load and plot the county boundaries
reader = shpreader.Reader(shapefile_path)
counties = list(reader.geometries())
ax.add_geometries(counties, ccrs.PlateCarree(), edgecolor='black', facecolor='none')
# Assuming 'data' is your DataFrame with the 'Longitude' and 'Latitude'
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', zorder=5)
# Optionally add labels for each point
for i, row in data.iterrows():
    ax.text(row['Longitude'] + 0.02, row['Latitude'] + 0.02, row['County'], fontsize=12)
# Set the title
ax.set_title('County Coordinates in Minnesota')
# Show the plot
plt.show()
In [115]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import cartopy.io.shapereader as shpreader
# Path to the Natural Earth shapefile
shapefile_path = r'G:\My Drive\Python_projects\my_git_pages_website\Py-and-Sky-Labs\content\Python Examples\Data\US_County_borders\ne_10m_admin_2_counties.shp'
# Initialize the figure and axes for the plots
fig, ax = plt.subplots(figsize=(10, 15), subplot_kw={'projection': ccrs.PlateCarree()})
# Add built-in Cartopy features
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':')
# Load the shapefile and filter for counties in Minnesota
reader = shpreader.Reader(shapefile_path)
minnesota_counties = [county for county in reader.records() if county.attributes['REGION'] == 'MN']
# Plot only the filtered counties
for county in minnesota_counties:
    geometry = county.geometry
    name = county.attributes['NAME']
    ax.add_geometries([geometry], ccrs.PlateCarree(), edgecolor='black', facecolor='none')
    x, y = geometry.centroid.x, geometry.centroid.y
    ax.text(x, y, name, fontsize=9, ha='center', transform=ccrs.Geodetic())
# Limit the map extent to Minnesota
ax.set_extent([-97.5, -89.5, 43.5, 49.5], crs=ccrs.PlateCarree()) # Adjust these values based on the actual coordinates of Minnesota
# Plot the data points derived from the geocoded lat lon coordinates
ax.scatter(data['Longitude'], data['Latitude'], color='red', s=50, edgecolor='k', alpha=0.5, zorder=1)
# Set the title
ax.set_title('County Coordinates in Minnesota')
# Show the plot
plt.show()
In [ ]:
mn_counties
.to_dict()¶
In [116]:
# # Convert Filtered_top_bot_data to a dictionary mapping countries to life expectancy
# life_expectancy = Filtered_top_bot_data.set_index('country')['lifeExp'].to_dict()
# print(life_expectancy)
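The commented-out cell above relies on a DataFrame from another notebook, so here is a self-contained sketch of the same set_index(...)[column].to_dict() pattern. The scores DataFrame, its column names, and its values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical sample data; names and values are placeholders
scores = pd.DataFrame({"student": ["A", "B", "C"], "score": [90, 75, 82]})

# Use one column as the index (the keys) and convert another column to a dict (the values)
score_by_student = scores.set_index("student")["score"].to_dict()
print(score_by_student)

# The same mapping written as a dictionary comprehension
same = {s: v for s, v in zip(scores["student"], scores["score"])}
print(same == score_by_student)
```

If the key column contains repeated values, later rows overwrite earlier ones in both versions.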
Plot using .items()¶
In [117]:
# Plot each country's coordinates
# Assuming `top_countries` and `bottom_countries` are lists of country names
# for country, (lat, lon) in coordinates.items():
# if lat and lon: # Check if lat and lon are not None
# color = 'green' if country in top_countries else 'red'
# plt.plot(lon, lat, marker='o', color=color, markersize=5, transform=ccrs.Geodetic())
# plt.text(lon, lat, country, transform=ccrs.Geodetic())
# plt.title('Top and Bottom African Countries by Life Expectancy')
# plt.show()
Combine Dictionary Comprehension and iterrows() to create a dictionary based on multiple columns of a dataframe¶
In [118]:
# Extending the DataFrame with another column
data = {'Words': ["apple", "banana", "cherry"], 'Type': ["fruit", "fruit", "fruit"]}
df = pd.DataFrame(data)
# Dictionary mapping word to a tuple of (word length, type)
word_info_dict = {row['Words']: (len(row['Words']), row['Type']) for index, row in df.iterrows()}
print(word_info_dict)
Generator expressions¶
my_generator = (x*x for x in range(10))
for value in my_generator:
    print(value)
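A generator expression looks like a list comprehension with parentheses, but it produces values one at a time instead of building the whole list. The sketch below is a minimal illustration with made-up variable names. It compares the size of the two objects and shows that a generator can be consumed only once.

```python
import sys

squares_list = [x * x for x in range(1000)]  # all 1000 values stored in memory
squares_gen = (x * x for x in range(1000))   # values produced on demand

# The generator object is much smaller than the fully built list
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))

# Functions such as sum() can consume a generator directly
total = sum(squares_gen)
print(total)

# Once consumed, the generator is exhausted and yields nothing more
leftover = list(squares_gen)
print(leftover)
```

A generator expression suits a single pass over the values. A list comprehension is the better fit when the results are needed more than once or must be indexed.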