Functional programming¶
- a style of programming where a function's output is determined solely by its input
- The function does not change anything outside itself or depend on external data to produce its output
Why use Functional programming?:
- makes your code easier to understand, test, debug and build upon
- It's widely used in data analysis and other fields where reproducible computation matters
4 Key concepts in functional programming:
Pure Functions: These are functions where the output is determined only by their input
- Given the same input, they will always produce the same output
- Have no side effects, meaning they don't change anything outside the function elsewhere in the program
Immutability: In functional programming, data is not changed
- Instead, new data is created from the existing data, which is easier to follow and debug
First-Class Functions: In functional programming, functions are treated like any other variable
- They can be assigned to variables, stored in data structures, passed as arguments to other functions, and returned as values from other functions
Higher-Order Functions: These are functions that take one or more functions as arguments, return a function as a result, or both
- This is a key part of functional programming and you'll see this concept used often
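As a quick illustration of the last two ideas, Python's built-in sorted function is itself a higher-order function: it accepts another function through its key parameter. A minimal sketch (the city list is made up for illustration):
# the built-in len function is passed around like any other value
cities = ['Philadelphia', 'Chicago', 'Houston']
print(sorted(cities, key=len)) # prints ['Chicago', 'Houston', 'Philadelphia']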
Goal: To use functional programming concepts in our Python code¶
The Essence of Functional Programming in Data Analysis¶
In data analysis tasks, particularly with tools like pandas, the essence of functional programming—writing clear, concise, and effective code—is achieved by:
- Minimizing Use of Mutable Data Structures: By avoiding in-place modifications (changes made directly to the data structure)
- Encapsulating Operations in Functions: Encapsulation in a function improves readability, reusability, and allows for better testing and debugging
- Utilizing Functional Constructs: Leveraging built-in methods that abstract complex operations into simpler, concise statements
import pandas as pd # should already be pre-installed with conda
from itables import init_notebook_mode, show # the itables library lets us display our data as interactive tables
init_notebook_mode(all_interactive=True) # after this, any pandas DataFrame or Series is displayed as an interactive table, which lets you explore, filter, or sort your data
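With pandas imported, here is a minimal sketch (with made-up temperature values) combining all three ideas above: an operation that returns a new DataFrame rather than modifying in place, wrapped in a small reusable function, using the built-in .assign() construct:
def add_fahrenheit(df):
    '''Return a NEW DataFrame with a Fahrenheit column; the input is left untouched.'''
    return df.assign(F_Temp=df['C_Temp'] * 9/5 + 32) # .assign() returns a copy rather than modifying in place
raw = pd.DataFrame({'C_Temp': [20, 25, 30]})
clean = add_fahrenheit(raw)
print(raw) # the original is unchanged
print(clean) # the new DataFrame carries the result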
1. Pure functions:¶
Definition: A function is pure if its output is only determined by its input
- no changes are made beyond the input
# Pure function:
def number_power_two(x):
    return x ** 2
print(number_power_two(2)) # prints 4
# key point: it will always produce the same output for the same input
# Why? Because x is a local variable, defined only by the input value when the function is called
Non-Pure functions:¶
Definition: a non-pure function depends on something outside its inputs, such as a global variable, which means that its output can change even with the same input
- Global Variables: These are variables that are defined in the main body of the script, outside any function, and can be accessed from any function in the code
- They have a global scope, meaning they can be accessed anywhere in the program
- However, to modify a global variable within a function, you must use the global keyword before the variable name
Local Variables: These are variables that are defined only within a function
- keypoint: local variables can only be accessed and modified within that function, and they cease to exist once the function has finished running
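A minimal sketch of local scope (the variable name is made up for illustration):
def report_count():
    count = 10 # count is a local variable: it is created when the function runs
    return count
print(report_count()) # prints 10
# print(count) # uncommenting this raises a NameError: count only existed inside the function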
# non-Pure function:
y=3 # here is a global variable called y, which is a variable that operates across the entire scope of your code
def number_power_two(x):
    # The function is reading the value of the global variable y,
    # but it's only accessing and not modifying y, so we don't use the global keyword
    return x**y
print(number_power_two(2)) # prints 8
# This is NOT a pure function
# If we change y elsewhere in our code, the same input (e.g., x=2)
# can produce a different output
Function with Side Effects:¶
Definition: a function that depends on a global variable and modifies it
- The global keyword is used before a variable name to indicate that the variable is a global variable
- The global keyword is necessary if you want to modify a global variable from within a function
- Without the global keyword, Python would treat a variable assigned within a function as a new local variable, leaving the global variable unchanged
In functional programming, it's recommended to avoid using global variables and modifying them within functions, as this can lead to unpredictable side effects occurring elsewhere in your program
- it's better to use and return local variables, and to pass any necessary data into a function as arguments
# example of what NOT to do in functional programming...
y=3
def number_power_two_change_y(x):
    global y # we need the keyword global to access AND modify the global variable y inside the function
    y = y * 2 # global variable y has been changed!
    return x**y
print(number_power_two_change_y(2)) # prints 64: y is doubled to 6, so 2**6 = 64
print(y) # prints 6, y has been changed!!
# In a larger program, it would become quite difficult to track the value of y if we keep changing its value ...
# Do this instead...
# This function uses only local variables and arguments
def add_numbers(x, y):
    # x and y are local variables. They only exist within this function
    result = x + y # result is also a local variable
    return result
# Rather than using global variables, call the function with those data as arguments. e.g., 3 and 4
print(add_numbers(3, 4)) # prints: 7
#This function doesn't use or modify any global variables, it operates solely on its arguments and local variables
# Makes the function's behavior easy to understand and predict
# given the same arguments, it will always produce the same result
2. Immutability in functional programming¶
Immutability is a key concept in functional programming
Immutability: in functional programming, data is not changed
Instead, new data is created from the existing data, which leads to fewer bugs and easier troubleshooting when you do have them
Certain data types are inherently immutable: tuples, integers, strings
- Other data types are mutable: lists, dictionaries, and most relevant here, DataFrames!
Key point: Understanding immutability ensures that data does not get changed unexpectedly
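A minimal sketch contrasting the two (with made-up values): attempting to change a tuple raises an error, while the equivalent list operation quietly succeeds:
point_tuple = (1, 2, 3)
point_list = [1, 2, 3]
try:
    point_tuple[0] = 99 # tuples are immutable: this raises a TypeError
except TypeError as e:
    print('Tuple error:', e)
point_list[0] = 99 # lists are mutable: this succeeds
print('List after change:', point_list)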
# Strings in Python are immutable, meaning any operation on them will create a new string
# define our function to modify an input string
def capitalize_string(input_string):
    # this function returns a new string
    return input_string.capitalize() # .capitalize() is a string method that returns a capitalized copy of a string
original_string = 'chicago'
# Call the function and print the result
print(f"New string is : {capitalize_string(original_string)}") # capitalizes and returns a new string
# print the original, will it be changed?...
print("Original string is:", original_string) # still the same
# We see that the function did not alter the original string
# The function actually returned a new string with the first letter capitalized
# While strings are immutable, lists are mutable
# So, you can directly change the list contents like adding, removing, or modifying its elements
def upper_string(input_string):
    return input_string.upper()
# Original list of city names
original_strings = ['Chicago', 'Philadelphia', 'Houston']
# Modifying the list by replacing elements
for i in range(len(original_strings)):
    # Although strings are immutable, the list that contains them is mutable
    original_strings[i] = upper_string(original_strings[i]) # we are directly modifying the list
print("Modified strings in mutable list: ", original_strings)
# We see that the function altered the original list of strings
# This is something we need to be aware of if the program refers to the original list elsewhere
# Why?: Because if we change the original data, then how do we...
# refer back to the original data?
# compare the new result with the original?
# apply a different function to the original data?
# imagine you had a function that processes an input list of cities and appends a 'new' city to the list
# what would happen if this function was called in different parts of the program
# How should it go about adding data to the original list?
def add_new_cities(data_list, city):
    '''Simulates process of adding by appending strings to input list'''
    new_city = city
    data_list.append(new_city) # directly modifies the list passed to it
# Keep in mind: is your intent to modify the original data?
# Original list of city names
original_list = ['Chicago', 'Philadelphia', 'Houston']
# call the function for the first time
# Function call 1
add_new_cities(original_list, "New Orleans")
print("After first processing:", original_list)
# Function call 2
add_new_cities(original_list,"New Orleans")
print("After second processing:", original_list)
# What happened?
# We modified the original data
# the first time the function was called on the original data, it correctly added the New city to the list
# Later, the function was called again and added to the same list, creating likely unintended additions to the original data
# Can lead to bugs if we do not understand that the original data is being modified when we append to a list in this way
# Instead, copy the original data
# then return the new local variable that holds the modified list
def add_new_cities(data_list, city):
    '''Simulates process of adding by appending strings to input list'''
    new_city = city
    data_list = data_list.copy()
    data_list.append(new_city) # modifies the local copy, not the original list
    return data_list
# Consider: is the intent to modify the original data? If the caller already made a .copy(), another copy is NOT necessary
# Original list of city names
original_list = ['Chicago', 'Philadelphia', 'Houston']
# call the function for the first time
# Function call 1
processed_data1 = add_new_cities(original_list, "New Orleans")
print("Original list, after first processing:", original_list)
print("New list, after first processing:", processed_data1)
# Function call 2
processed_data2 = add_new_cities(original_list,"New Orleans")
print("Original list, after second processing:", original_list)
print("New list, after second processing:", processed_data2)
# Here, the function deliberately avoids altering the original list
# It works on a copy of the list and returns this modified copy
# The purpose is to leave the original data unchanged
# A more practical case, using a dataframe
# Clean up zip code data while preserving the original data
# Create a DataFrame with various zip formats
df = pd.DataFrame({
'original_zip': ['55255-10202', '12343-29292', '84848-12020', '3033'] # Although numeric, hyphenated zips in the raw data will be interpreted by pandas as strings as represented here in quotes
})
# function to clean up zipcodes and keep only the first 5 digits
def clean_zipcodes(input_zip):
    # Split the input string at the hyphen and select the first part
    first_part = input_zip.split("-")[0] # the .split() string method; [0] selects the first part of the string
    # Pad zeros to ensure the zip is always 5 characters
    return first_part.zfill(5)
# Apply the function to the original_zip column and create a new column for the cleaned zip codes
df['new_zip'] = df['original_zip'].apply(clean_zipcodes) # will splitting and padding zips modify the original data?...
# Display the original and new data
print("\nModified DataFrame with Cleaned Zip Codes:\n", df[['original_zip', 'new_zip']])
print("\n")
# Display the original data
print("Original DataFrame:\n", df[['original_zip']])
# we performed several manipulations on the original zips without altering the original data
# value of keeping the original data in the dataframe is that
# it is easier to backtrack what was done later on if we need to verify and compare
# Say we find that we often need to change the data in our dataframe, maybe dates and mags have typos...
# We need a function to simplify this recurring process
# but our dataframe has a tuple multi index
# So, our function should take a dataframe, and a tuple, then return a NEW dataframe with the updated index
# Create a DataFrame from a dictionary of tuples
data = {
'mag': (1,1,3,3),
'date': ('April 10 2024','April 11 2024','April 12 2024','April 13 2024'),
'inj' : (1,2,100,200)
}
tdata = pd.DataFrame(data)
def set_new_index(df, row, new_index_values):
    print(df) # input data we want to change
    # Create a new DataFrame with the desired index
    df_copy = df.reset_index() # reset the index and create a NEW dataframe
    print(df_copy) # check the results of the NEW dataframe
    df_copy.loc[row, ['mag', 'date']] = new_index_values # change the mag and date for the given row of the NEW dataframe
    print(df_copy)
    df_copy.set_index(['mag', 'date'], inplace=True) # set the index to the tuple of mag, date for the NEW dataframe
    print(df_copy)
    return df_copy # return the NEW dataframe with the modified index
# This is a pure function: it doesn't modify the original DataFrame (it has no side effects)
# and will always produce the same output DataFrame for the same input DataFrame, row, and index values (no external variables in the function modify our output)
new_tdata = set_new_index(tdata, 0, (2, 'April 12 2024')) # creates a new DataFrame with the multi index (tuple) updated
print(new_tdata)
# set_new_index is a pure function
# This function doesn't modify the original DataFrame
# and will always produce the same output for the same input
3. First-Class Functions¶
- Central concept to functional programming
- Functions are treated like any other value or variable
- This concept allows for flexible design patterns and can simplify many complex programming tasks
Key Characteristics:
- Variable Assignment: Functions can be assigned to variables.
- Function as Argument: Functions can be passed as arguments to other functions.
- Function as Return Value: Functions can return other functions.
- Also, they can be stored in data structures
Variable Assignment¶
# Assigning a function to a variable
# Let's create a function that greets the name entered as the argument
# and modify it with some Seinfeld references...
def Hello(name):
    if name == "Newman":
        return " Hello,...Newman!"
    if name == "Jerry":
        return " Hello, ... Jerry!"
    if name == "Mulva":
        return " You don't know my name do you?"
    else:
        return f"Hello, {name}" # this line alone, without the 'else', is sufficient for the example to function, but that would be no fun!!
print(Hello("Mulva"))
# now for the key point here: let's save that function in a variable
greet = Hello
# let's check on this data....
print(greet) # yep, we have a stored function
print(greet("Newman"))
Store the function in a data structure¶
# Stored in Data Structures
def square(x):
    return x * x
def cube(x):
    return x * x * x
functions = [square, cube]
print(type(functions)) # a list
print(functions) # holding two function objects
results =[] # create an empty list to hold the results
for func in functions:
    results.append(func(3)) # append the result to the list
print(results) # Outputs 9 (3^2) and 27 (3^3) respectively
# Assign functions to variables and store them in Data structures
import numpy as np
data = {
'mag': [1,1,3,3], # column 1
'date': ['April 10 2024','April 11 2024','April 12 2024','April 13 2024'], # column 2
'inj' : [1,2,100,200] # column 3
}
tdata = pd.DataFrame(data)
# Define some transformation functions
def log(x):
    return np.log(x)
def square(x):
    return x**2
# store these for later use on our data
# function dictionary, holding two functions
transformation = {
'log' :log,
'square': square
}
# suppose a user decides which transformation to apply
user_transforms = 'log' # could make this an input(), but for simplicity we create it as a regular variable
# apply the function from the dictionary
tdata['log_inj'] = tdata['inj'].apply(transformation[user_transforms])
print(tdata) # .apply() has applied our log function from the dictionary called transformation
Pass a function as an argument to another function¶
# pass a function as an argument to other functions
# say you have a function you need to apply often to different inputs
def cube(x):
    return x * x * x
# you can create another function that takes your cube function along with a specific argument to evaluate
def apply_function(func, value):
    return func(value) # this will pass a value to the cube function to be evaluated and returned
result = apply_function(cube, 5) # you can pass the cube function into another function along with a specific argument to evaluate
print(result) # Output will be 125
# Pass a function as an argument to another function with sample tornado data
data={
'mag': [1,1,3,3], # column 1
'date': ['April 10 2024','April 11 2024','April 12 2024','April 13 2024'], # column 2
'inj' : [1,2,100,200] # column 3
}
tdata = pd.DataFrame(data)
# create a function that takes a Series (or DataFrame) and a function as arguments
def apply_custom_function(df, func):
    return df.apply(func) # uses the built-in pandas apply method to execute the function on the data
# Function to increment by 10
def add_ten(x):
    return x + 10
# the add_ten function is passed as an argument to apply_custom_function
tdata["added_inj"] = apply_custom_function(tdata['inj'], add_ten)
print(tdata)
Return a function as value from another function¶
# Return a function as a value from another function
def make_multiplier(x):
    def multiplier(y): # when we call make_multiplier, the multiplier function is created
        return x * y # created with a value x that is determined when make_multiplier(x) is called
    return multiplier # we return the function, and it now holds a fixed value for x
double = make_multiplier(2) # the multiplier function is stored in the variable double, with x=2 as a fixed parameter
print(double(5)) # calling double executes the multiplier function, here with y set to 5
# Output will be 10
# so we created a function that generates another function called multiplier with a pre-set value of 2
# Return a Function as a value from another function
# let's use this to normalize injury counts from a sample of tornado data
data={
'mag': [1,1,3,3], # column 1
'date': ['April 10 2024','April 11 2024','April 12 2024','April 13 2024'], # column 2
'inj' : [1,2,100,200] # column 3
}
tdata = pd.DataFrame(data)
def get_normalizer(min_val, max_val):
    def normalize(x):
        return (x - min_val) / (max_val - min_val) # fyi, the min-max scaling formula scales the min to 0 and the max to 1
    return normalize # notice we returned a function, and it holds fixed values for the min and max injuries
# compute min and max
min_A = tdata["inj"].min()
max_A= tdata["inj"].max()
# Return our normalize function and store it in a new variable called normalizer
normalizer = get_normalizer(min_A, max_A)
tdata["inj_normalized"] = tdata['inj'].apply(normalizer)
print(tdata)
# one more example where we pull all 3 characteristics of first-class functions together
def greet(name):
    return f"Hello {name}!"
def greet_loudly(name):
    return f"HELLO {name}!!!"
def create_greeting(name, func):
    """Apply any greeting function to a name."""
    return func(name)
# Variable assignment
say_hello = greet
print(greet("Bob"))
# Passing a function as an argument
print(create_greeting("Alice", greet))
print(create_greeting("Bob", greet_loudly))
# Returning a function from a function, with a condition
def get_greeter(mood):
    if mood == 'loud':
        return greet_loudly
    else:
        return greet
# Use returned function
current_mood_greeter = get_greeter('loud')
print(current_mood_greeter("Chris"))
4. Higher-Order Functions¶
- Definition: functions that can take other functions as arguments, return functions as results, or both
- Useful for creating flexible code that can be customized with behavior defined outside the function itself
Methods:
- map: primarily used with a pandas Series to apply a function element-wise. A good example of a higher-order function because it takes a function as an argument
- Recall that a Series is a 1-dimensional array, like a column of data paired with an index of row labels for each of the data points
- In contrast, a DataFrame is a 2-dimensional array, like a table or spreadsheet. Here data is accessed by an index of labels for each row and its column names
- apply: in pandas, also used for applying functions along an axis of a DataFrame or on a Series
- The result is either aggregated or transformed data
- This is another instance of higher-order functions, where the function passed can produce outputs based on entire columns or rows
- filtering: rather than a dedicated method, we use built-in Python syntax (boolean indexing) along with apply to create a filter based on a condition
#Map function
# first let's generate a Series, or column, of data. Say it is temperature data in Celsius
cdata = pd.Series([22,21,32,30,40])
# Let's create a function to convert Celsius to Fahrenheit
def cto_Fahrenheit(x):
    return (9/5)*x + 32
# now we use the map function
# the map function takes our custom function as its argument
# and applies it to each element in cdata
f_temps = cdata.map(cto_Fahrenheit) # key point: we are passing a function as an argument
print("Temps Series data\n", f_temps)
# create a new dataset
# using a dictionary {} of lists []
# generates column names and their values as key:value pairs
new_cdata= {
'C_Temp':[30,33,31,37],
'Date':['2024-06-04', '2024-06-05','2024-06-06', '2024-06-07']
}
# convert to a DataFrame
temp_df=pd.DataFrame(new_cdata)
print(temp_df)
# Use map to apply the custom function to the C_Temp column
# Convert values in C_Temp and define a new column to store the converted temp values
temp_df['F_Temp'] = temp_df['C_Temp'].map(cto_Fahrenheit) # key point: we are passing a function as an argument
print(temp_df)
# we used map with our custom function to convert an entire column of data from Celsius to Fahrenheit
# Apply method
precip_data = {'Precip' : [ 1, 1, 0, 0]} # a simple example, more realistic would need to know day/time values to link the datasets
precip_df = pd.DataFrame(precip_data)
temp_df['Precip'] = precip_df['Precip']
print(temp_df) # now we added some precip data
# Say the Precip measurements are all off by 1
# We can create a new column that will hold the corrected values
# Using a lambda (a small anonymous inline function) along with apply to modify a DataFrame column
temp_df['Precip_plus_1'] = temp_df['Precip'].apply(lambda x: x + 1)
print("DataFrame with Precip + 1:\n", temp_df)
# key point:
# In these precip examples
# we are passing a function as an argument to another function; this time a lambda function was passed to apply
# Filter data based on a condition
# How to create a function that filters data based on a certain condition?
# Here we use the apply method to create a mask, a Series of True/False values
# Then we use that mask to filter the data
print(temp_df) # we start with the full set of data
# Define a function to select only those temps meeting a certain condition
def hightemps(row, temp_value):
    return row > temp_value # returns True for temps above temp_value
# Using the apply method to call the function hightemps, which filters for values where F_Temp is > 90
# Here apply works on each value of the column, returning True when the value is above temp_value
hi_temps= temp_df[temp_df['F_Temp'].apply(hightemps, temp_value=90)]
print('\n') # adds space between prints
print('DataFrame with high temperatures:\n', hi_temps)
# A simpler approach to filtering based on a condition, without using the apply function
print(temp_df) # we start with the full set of data
temp_value = 90
# Directly use boolean indexing for filtering
hi_temps = temp_df[temp_df['F_Temp'] > temp_value]
print('\n') # adds space between prints
print('DataFrame with high temperatures:\n', hi_temps)
# the advantage of using a function is that we do not have to
# specify the temp value to filter on each time we want to filter, or
# rewrite the filter line each time
# Another example, where a simple one-line boolean filter would be more cumbersome
data = {
'Day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
'City': ['CityA', 'CityB', 'CityA', 'CityB', 'CityA'],
'F_Temp': [89, 92, 88, 93, 95]
}
temp_df = pd.DataFrame(data)
print("Original DataFrame:")
print(temp_df)
# Define a function to select only those temps meeting a certain condition
# For example, filter for temperatures above a user-specified value on specific days or cities
def hightemps(row, temp_value):
    # Complex condition: Temp > user-specified value and either on Monday or in CityA
    return (row['F_Temp'] > temp_value) and (row['Day'] == 'Monday' or row['City'] == 'CityA')
# Using apply method to call the function hightemps that filters rows based on the complex condition
hi_temps = temp_df[temp_df.apply(hightemps, axis=1, temp_value=90)] # can make this a user input like this: float(input("Enter the temperature value: "))
print("\nDataFrame with high temperatures based on complex conditions:")
print(hi_temps)
Self-test Exercise: Functional Programming¶
Apply functional programming principles to find a moving average
Exercise 1: Tornado Data Analysis¶
- Many natural phenomena have short-term fluctuations that hide patterns or trends in the data.
- How do we find a trend when there are such short-term fluctuations?
Task: Determine the 10 year moving average for annual tornado counts in the United States
- Have tornado counts increased in the last 20 years? Why or Why not?
- Do the results surprise you?
Moving Averages
- Moving averages are a fundamental tool in time series analysis, smoothing out short-term fluctuations and highlighting longer-term trends or cycles
- They are widely used in weather forecasting, stock market analysis, and many other fields
- Basic Syntax: dataframe['Column'].rolling(window).mean()
- specify the dataset, which will usually be a column
- window sets the size of the window (the number of periods to include in each average)
- .mean() averages the data within the window, completing the moving average operation
Arguments:
- data: The dataset or Series on which the moving average is to be calculated
- window: The number of periods over which to calculate the average
- The window defines the "moving" part of the moving average, as this window slides over the data
Note: We can consider .rolling() as a form of higher-order function because it takes a function (mean()) and applies it to a window of data
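A minimal sketch of the syntax (with made-up numbers) before we apply it to the tornado data:
# pandas was imported earlier in this notebook as pd
s = pd.Series([2, 4, 6, 8, 10])
print(s.rolling(window=3).mean()) # NaN, NaN, 4.0, 6.0, 8.0 -- the first two values are NaN until the window fills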
# load the data
new_tordata= pd.read_csv(r'.\Data\1950-2022_actual_tornadoes.csv')
# the following code cell will give you a nice hover highlighting on your tables when used with itables library
%%html
<style>
.dataTables_wrapper tbody tr:hover {
background-color: #6495ED; /* Cornflower Blue */
}
</style>
<!-- #1E3A8A (Dark Blue) -->
<!-- #0D9488 (Teal) -->
<!-- #065F46 (Dark Green) -->
<!-- #4C1D95 (Dark Purple) -->
<!-- #991B1B (Dark Red) -->
<!-- #374151 (Dark Gray) -->
<!-- #B45309 (Deep Orange) -->
<!-- #164E63 (Dark Cyan) -->
<!-- #4A2C2A (Dark Brown) -->
<!-- #831843 (Dark Magenta) -->
<!-- #1E3A8A (Dark Blue ) -->
<!-- Suggested Light Colors for Light Backgrounds -->
<!-- #AED9E0 (Light Blue) -->
<!-- #A7F3D0 (Light Teal) -->
<!-- #D1FAE5 (Light Green) -->
<!-- #DDD6FE (Light Purple) -->
<!-- #FECACA (Light Red) -->
<!-- #E5E7EB (Light Gray) -->
<!-- #FFEDD5 (Light Orange) -->
<!-- #B2F5EA (Light Cyan) -->
<!-- #FED7AA (Light Brown) -->
<!-- #FBCFE8 (Light Magenta) -->
show(new_tordata,options={'hover': True})
# Ouuu lala, much easier to scan the rows now
new_tordata= new_tordata.set_index('date')
new_tordata # we changed the index to time. This makes it easier to organize the data by time
# hint:
# to set up the data and plot annual counts
# group by year and count tornadoes per year
# save the annual tornado counts to a new variable
annual_tors= new_tordata.groupby("yr")["om"].count()
annual_tors.plot(kind='line', xlabel='Year', ylabel='Tornadoes', title='US Total Tornado Count from 1950-2022') # plot a line plot for annual counts
print(annual_tors)
# we see annual fluctuations in tornado counts
# how can we smooth this out to discern a trend more clearly?
# answer:
# Find the moving average for annual tornado counts, averaging counts over a sliding 10-year window
# group the data by year and count tornado events per year
annual_tors= new_tordata.groupby("yr")["om"].count()
print(annual_tors.head(20))
# define a function that will calculate the moving average of our grouped data
# and always generate the same output for the same input
def movingAvg(series, window_size):
    return series.rolling(window=window_size).mean() # rolling is a built-in pandas higher-order function that we use within our function
tor10yrMA = movingAvg(annual_tors, window_size=10) # average annual tornado count over each 10-year window
print(tor10yrMA.head(20))
# Plotting the annual tornado count
ax = annual_tors.plot(kind='line', label='Annual Tornado Count')
# Plotting the 10-year moving average on the same Axes object
tor10yrMA.plot(kind='line', label='10 Year MA', ax=ax, xlabel='Year', ylabel='Tornadoes', title='US Total Tornado Count from 1950-2022')
# Displaying the legend
ax.legend()
# After removing short-term fluctuations by averaging each point over a 10-year window, we clearly see a general increase in annual tornadoes over time
# The reasons for this (not depicted here) are mostly the spread and adoption of radar technology, which enhanced detection of low-magnitude events
# Conclusion
# Taken together, in the last 20 years we have consistently seen over 1000 tornadoes in the US