Python Coffee Courses 103 - Basic SIR modeling and how to visualize it

Import the libraries we will need to execute on the calculations and modelling today.

import pandas as pd  # Loading the NYC dataset 
import numpy as np   # performing fast calculations to build our simulation
import plotly.express as px # Quick visuals for data exploration
from ipywidgets import widgets # sliders, buttons and visual layout of the dashboard
import plotly.graph_objects as go # plotly objects to place on our graph
import math # python mathematical functions

SIR Model

In order to prepare to visualize our SIR model we will need to calculate the number of people in the susceptible, infectious, and removed categories for every single day. Because we’re making this model we can make things simple for ourselves, in the end we will have 4 arrays with data in them which we can plot on a chart:

days          --->  [   1,   2,   3,   4,   5,   6,   7,   8,   9, ..... 196 ]
susceptible   --->  [1000, 999, 990, 960, 910, 850, 780, 700, 600, ..... 0   ]
infectious    --->  [   0,   1,  10,  35,  80, 140, 190, 250, 320, ..... 0   ]
removed       --->  [   0,   0,   0,   5,  10,  20,  30,  50,  80, ..... 1000]

From looking a this we can already understand some of the limits of the SIR model we’re going to build:

  1. At any given day, the sum of susceptible, infectious, and removed will be the same, that will be our population.
  2. We’re going to have to define how many days we want to simulate.
  3. Given that our model calculates what today’s values for S, I, and R will be based on yesterday’s this is a perfect opportunity for a for loop.

Given these insights let’s define our starting conditions now:

days = 50
population =  1000
initial_infection = 1

infection_rate = 1.6
removal_rate = .10

x_days = np.linspace(0,days, days)

y_susceptible = [population]
y_infected = [initial_infection]
y_removed = [0]

With the initial conditions of our model created, let’s look at how we can calculate how many people we need to move from Susceptible to Infectious and from Infectious to Removed every day.

Susceptible to Infectious

For Susceptible to Infectious, we’re going to multiply the infectious rate by the number of people who have been infected and by the ratio of people still susceptible.

$ \[\begin{align} \large \mathit{daily\_infected} = \mathit{ infection\_rate} * \mathit{previous\_day\_infected} * \frac{\mathit{previous\_day\_susceptible}}{\mathit{population}} \end{align}\] $

written in python as:

daily_infected = (infection_rate * y_infected[day-1] * (y_susceptible[day-1])/population)

Infectious to Removed

To move people from the infectious category to the removed category we’re going to multiply the number of infectious people on the day before by the removal rate.

$ \[\begin{align} \large \mathit{daily\_removed} = \mathit{removal\_rate} * \mathit{previous\_day\_infected} \end{align}\] $

written in python as:

daily_removed = removal_rate * y_infected[day-1]

With the figures for the number of people who have moved from Susceptible to Infectious and have moved from Infectious to Removed calculated, we can substract them from the previous day values to get our new Susceptible, Infectious, Removed numbers.

Written out in python it looks like the code written below:

for day in x_days[1:days-1]:
    
        # convert our day value to integer because numpy works with floating point
        # values by default instead of integers.
        day = int(day)

        # Calculate the number of people to move from one category to another category
        # as described above.
        daily_infected = (infection_rate * y_infected[day-1] * (y_susceptible[day-1])/population)
        daily_removed = removal_rate * y_infected[day-1]

        # Add and remove the calculated amounts from each category for thie calculated day
        y_susceptible.append( max(y_susceptible[day-1] - daily_infected, 0) )
        y_infected.append(        y_infected[day-1]    + daily_infected - daily_removed )
        y_removed.append(         y_removed[day-1]     + daily_removed )
        

We now have the data we need to build our visualization and show how the different groups of people interact with each other.


# This figure builder allows us to create an interactive figure to which 
# we are going to add the data we calculated one by one
fig = go.FigureWidget() 

# This visualization only looks good if the bars are stacked, so we're going to
# set the stacking option here so that the new series added to the graph will 
# stack.
fig.update_layout(barmode='stack'
                  , height = 900)
fig

The first data we’re going to add is the most common one you see in the press which is the model of the number of infected people by day.

# add the series data, in plotly this is called a trace.
fig.add_trace(
        # This series is going to be a bar chart
        go.Bar(
               x = x_days       # the x-values are defined by the list of days generated earlier
              ,y = y_infected   # the y-values are defined by the daily infected figures calculated in our loop
              , name='Infected' # name the series 
              , marker = dict(color='red') # color the series red
              ))

x=1 # ignore this, I put this here to avoid seing the output in the notebook

Next, we’ll add the susceptible numbers to see how they evolve in relation to the infected population

fig.add_trace(
        go.Bar( x = x_days
              , y = y_susceptible
              , name='Susceptible'
              , marker = dict(color='rgba(0,0,255,0.5)')  # interesting thing about the color values in plotly, 
                                                          # you can use rgba to set an alpha value to increase the 
                                                          # transparency
              )
              )

x=1 # ignore this, I put this here to avoid seing the output in the notebook

Finally, we’ll add the removed category to the figure.

fig.add_trace(
        go.Bar(x = x_days
              ,y = y_removed
              , name='Removed'
              , opacity=0.5))

x=1 # ignore this, I put this here to avoid seing the output in the notebook

SIR Interactive example

The first thing we’re going to do to make our calculator interactive is to put the model calculation code into a method that we can call when variables change.

We’ll take the starting values from the code above as the parameters for the function and return a dictionary which contains the three arrays we calculated which contain the number of infected, removed, and susceptible people for each day.

# We'll set some easy defaults for this function so that we immediately have outputs which make some sense when we execute this method.
def calculate_values_SIR(  removal_rate = .1
                         , infection_rate = 1.1
                         , population = 1000
                         , days = 150
                         , initial_infection  = 1
                         
                         # we've added this additional parameter which allows us to cut the 
                         # infection rate by 2 at a given day to simulate drastic policy action.
                         , intervention_day = 50
                        ):
    
    # build out starting positions
    x_days = np.linspace(0,days, days)
    y_susceptible = [population]
    y_infected = [initial_infection]
    y_removed = [0]
    
    
    for day in x_days[1:days-1]:
        
        # if this day is after the day drastic intervention is made, reduce the infection rate by 2.
        if day > intervention_day:
            infection_rate = infection_rate / 2
            
        day = int(day)
        
        daily_infected = min((infection_rate * y_infected[day-1] * (y_susceptible[day-1])/population), y_susceptible[day-1])
        daily_removed = removal_rate * y_infected[day-1]
        
        y_susceptible.append( y_susceptible[day-1] - daily_infected )
        y_infected.append(    y_infected[day-1] + daily_infected - daily_removed )
        y_removed.append(   y_removed[day-1] + daily_removed )
    
    return {'infected': np.rint(y_infected)
            , 'removed' : np.rint(y_removed)
            , 'susceptible': np.rint(y_susceptible)}
        

With the smarts of our model safely tucked away in a method let’s build some controls we can use to update our chart live.

We’re going to create a layout using the widgets library which looks like this:

+--------------------------------+
| first group of sliders         | 
+--------------------------------+
| second group of sliders        |
+--------------------------------+
| Chart                          |
|                                |
|                                |
|                                |
+--------------------------------+

Fortunately this is an easy layout to program, it’s a vertical layout where we’re stacking three layout elements. The first two layout elements will be collections of widgets arranged horizontally which is called a Horizontal Box denoted HBox and the final element is the chart.

Let’s create our two slider groups.

# This slider works with floating point numbers (hence being called Float Slider) and 
# allows us to set a variable with this slider. This is going to be the way we set the 
# infection rate.
ir = widgets.FloatSlider(
                value=1.187, # this is the initial value of our slider when it appears
                min=0.0,     # the minimum value we'll allow
                max=5.0,     # the maximum value we'll allow
                step=.001,   # by what increments the slider will change when we move it
                description='Infection_rate:', # the name of the slider
                continuous_update=False # Will this slider wait until it stops moving to 
                                        # call it's update function or does it call the 
                                        # update function immediately?
)

rr = widgets.FloatSlider(
                value=.46,
                min=0.1,     # this is set to greater than 0 because this is the denominator in the R0 calculation
                max=2.0,
                step=.01,
                description='Removal_Rate:',
                continuous_update=False
)


ii = widgets.IntSlider(
                value=1,
                min=1,
                max=50,
                step=1,
                description='Initially Infected:',
                continuous_update=False
)

ip = widgets.IntSlider(
                value=1000,
                min=500,
                max=10_000_000,
                step=500,
                description='Initial Population:',
                continuous_update=False
)


iday = widgets.IntSlider(
                value=15,
                min=1,
                max=500,
                step=1,
                description='Day of intervention (reducing infection rate):',
                continuous_update=False
)


first_slider_group = widgets.HBox(children=[ir, rr])
second_slider_group = widgets.HBox(children=[ii, ip, iday])

The the top part of our interactive chart built, let’s build the chart.


# First, we use the method created above to calculate a model using the initial
# values of the sliders we just created. Given that at this point we haven't 
# displayed the sliders yet, their values will be the default values we set above.

data = calculate_values_SIR(  removal_rate = rr.value
                            , infection_rate = ir.value
                            , population = ip.value
                            , days = 150
                            , initial_infection  = ii.value
                            , intervention_day = iday.value
                        )

# Next we add all the data traces to the chart

infected_trace =  go.Bar(x = list(range(1,len(data['infected'])))
              ,y = data['infected']
              , name='Infected'
              , marker = dict(color='red')
              )

susceptible_trace = go.Bar(x = list(range(1, len(data['susceptible'])))
              , y = data['susceptible']
              , name='Susceptible'
              , marker = dict(color='rgba(0,0,255,0.5)')
              , opacity=0.5
              )

removed_trace = go.Bar(x =list(range(1, len(data['removed'])))
              ,y = data['removed']
              , name='Removed'
              , marker = dict(color='rgba(0,128,0,0.5)')
              , opacity=0.5)

# This trace is interesting as it's basically just drawing a straight line on the 
# selected intervention day.

intervention_day = go.Scatter(x = [iday.value, iday.value]
              ,y = [0, ip.value]
              , name='Intervention day'
              , marker = dict(color='darkblue')
              , line = dict(width=5)
  )

# We create our figure adding all the traces we created to the data list, and setting some layout values in the layout parameter.
g = go.FigureWidget(data=[ infected_trace, removed_trace, susceptible_trace, intervention_day ],
                    layout=go.Layout(
                         title={
                                'text': f'R0 = {ir.value / rr.value} <br /> Post-Intervention R0: {ir.value/ 2* rr.value} <br />Infection_rate={ir.value} Removal_rate={rr.value}',
                                'y':.95,
                                'x':0.5,
                                'xanchor': 'center',
                                'yanchor': 'top'}
                        ,barmode='stack'
                        ,hovermode='x'
                        ,height=900
                        ,xaxis=dict(title='Number of Days')
                        ,yaxis=dict(title='Number of People')
                    ))

# This is to update the x-axis range to show only the days where we have cases
g.update_xaxes(range=[0,np.where(data['infected']==0)[0][0]])

x=1

With all of the elements of our quick analysis dashboard created, let’s link them up so that the buttons actually do something.


# This method will be called any time one of the sliders is modified. It will re-run our model calculation
# with the new values and update the data for the 4 traces we added to the figure. 
def response(change):

    num_days=150
    
    # recalculate the model using the new values defined by the sliders
    pop_values = calculate_values_SIR(removal_rate = rr.value
                                      , infection_rate=ir.value
                                     , initial_infection=ii.value
                                      , population=ip.value
                                      , days=num_days
                                     , intervention_day = iday.value)

    # Try to find the first day where we have no more infections, 
    # if that fails and we get an error, use the maximum number of 
    # days for which we've calculated the model. We use this later 
    # to update the x-axis range to keep our curve centered.
    try:
        end_infection = np.where(pop_values['infected']==0)[0][0]
    except IndexError:
        end_infection = num_days
    
    # plotly updates much faster if you send all the updates in one go, 
    # this construct allows us to open an update session on our chart
    # and when we leave it, it will send all the updates to the figure 
    # at once.
    with g.batch_update():
        # update the y-axis values from the model
        g.data[0].y = pop_values['infected']
        g.data[1].y = pop_values['removed']
        g.data[2].y = pop_values['susceptible']
        
        # update the x-axis values 
        g.data[0].x = list(range(1,num_days))
        g.data[1].x = list(range(1,num_days))
        g.data[2].x = list(range(1,num_days))
        
        # Add the intervention day line
        g.data[3].y = [0         , ip.value]
        g.data[3].x = [iday.value, iday.value]
        
        # update the title to show the R values as well as the infection rate and the removal rate
        g.layout.title = {
                        'text': f'R0 = {ir.value / rr.value} <br /> Post-Intervention R0: {ir.value/ 2* rr.value} <br /> Infection_rate={ir.value} Removal_rate={rr.value}',
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'}
        
        # change the x-axis range to keep the infection curve in scope
        g.layout.xaxis = dict(range=[0,end_infection])
        
        
        
# Update each of the widgets and register our update method as the method to 
# call when they change.
ir.observe(response, names='value')
rr.observe(response, names='value')
ii.observe(response, names='value')
ip.observe(response, names='value')
iday.observe(response, names='value')

# put the widgets, and our chart together into our layout
widgets.VBox([first_slider_group, second_slider_group, g])

Challenge 1: Add NYC data to the visualization

# here's some code to get you started.
df = pd.read_csv('us_jhu_data/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
df = df[(df['Province_State'] == 'New York')]  ## Fix a bug here to make sure you select only NY
df = df.melt(id_vars=['UID', 'iso2', 'iso3', 'code3', 'FIPS', 'Admin2', 'Province_State',
       'Country_Region', 'Lat', 'Long_', 'Combined_Key']
        , var_name='day'
        , value_name='confirmed_cases')

# Convert the loaded day to a date object
df['day'] = pd.to_datetime(df['day'])
df['Admin2'].unique()
array(['Albany', 'Allegany', 'Bronx', 'Broome', 'Cattaraugus', 'Cayuga',
       'Chautauqua', 'Chemung', 'Chenango', 'Clinton', 'Columbia',
       'Cortland', 'Delaware', 'Dutchess', 'Erie', 'Essex', 'Franklin',
       'Fulton', 'Genesee', 'Greene', 'Hamilton', 'Herkimer', 'Jefferson',
       'Kings', 'Lewis', 'Livingston', 'Madison', 'Monroe', 'Montgomery',
       'Nassau', 'New York', 'Niagara', 'Oneida', 'Onondaga', 'Ontario',
       'Orange', 'Orleans', 'Oswego', 'Otsego', 'Putnam', 'Queens',
       'Rensselaer', 'Richmond', 'Rockland', 'St. Lawrence', 'Saratoga',
       'Schenectady', 'Schoharie', 'Schuyler', 'Seneca', 'Steuben',
       'Suffolk', 'Sullivan', 'Tioga', 'Tompkins', 'Ulster', 'Warren',
       'Washington', 'Wayne', 'Westchester', 'Wyoming', 'Yates',
       'Out of NY', 'Unassigned'], dtype=object)
df = df[df['Admin2'].isin(['Bronx','Kings','New York','Queens','Richmond',])]
# collect all the confirmed cases for NY per day using groupby and agg
# after that replace the many days with 0 infected with NA so that we 
# can drop those rows with dropna
# We reset the index twice as a shorcut to quickly and easily get a sequential
# number representing the days from first to last because the day column
# became an index durign the group by and indexes in pandas are 'usually' sorted
df = df.groupby('day')\
       .agg({'confirmed_cases': 'sum'})\
       .replace(0, np.nan)\
       .dropna()\
       .reset_index()\
       .reset_index()

# Increment the index by 1 to have the first case start on the first day
df['index'] = df['index'] + 1

# Add the NY data to the plot with the add trace method.
g.add_trace(go.Scattergl(  x= df['index']
                         , y= df['confirmed_cases']
                         , name='NYC Cases'))

Challenge 2: Export NYC Data and predicted cases to an Excel

# Hint: you may need to play around with some numbers in the interactive sheet and then use the model method to get the data you need before you can start writing out to Excel.

# From the previous challenge we have the actual data, we can pull the data we generated by playing with the sliders
# from the graph by using the same methods we used to set the data in our update method:

#     g.data[0].y = pop_values['infected']
#     g.data[1].y = pop_values['removed']
#     g.data[2].y = pop_values['susceptible']

df['predicted_cases'] = g.data[0].y[:len(df)]
df['predicted_removed_cases'] = g.data[1].y[:len(df)]
df['predicted_susceptible'] = g.data[2].y[:len(df)]

df[['day'
    , 'confirmed_cases'
    , 'predicted_cases'
    , 'predicted_removed_cases'
    , 'predicted_susceptible']].to_excel('solution1.xlsx')

Challenge 3 (hard): Use the supplied write_to_sheet method below to write the NYC Data compared to the predicted data to a templated excel sheet with a coversheet.

def write_to_sheet(excel_file, in_scope_df
                   , sheet_name='1'
                   , title='Lorem'
                   , new_file=True
                   , new_sheet=True
                   , include_index=True):
    from openpyxl import Workbook
    from openpyxl import load_workbook
    from openpyxl.worksheet.table import Table, TableStyleInfo
    
    from openpyxl.utils.dataframe import dataframe_to_rows

    # Load template
    if new_file:
        wb = load_workbook('AR_pyTemplate.xlsx')
    else:
        wb = load_workbook(excel_file)
    
    if new_sheet:
        ws = wb.copy_worksheet(wb['basic_sheet'])
        ws.title = sheet_name
    else:
        try:
            ws = wb[sheet_name]
        except Exception as e:
            print("Exception: ", e)
            return

    # Convert the data frame to string
    for col in in_scope_df.select_dtypes(include=['datetime64']).columns:
        in_scope_df[col] = in_scope_df[col].astype('str')
    

    # set the title of the sheet
    ws['A1'] = title
    
    # add column headings. NB. these must be strings
    ws.append(in_scope_df.columns.tolist())
    
    # Add the data rows
    for idx, row in in_scope_df.iterrows():
        ws.append(row.tolist())

    def get_excel_col(number):
        cols = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        second_letter = (number % len(cols))-1
        first_letter = (number // len(cols))-1
        if first_letter  == -1:
            return cols[second_letter]
        else:
            return cols[first_letter] + cols[second_letter]
    
    # Create range to create table
    ref = "A2:" + get_excel_col(len(in_scope_df.columns)) + str(len(in_scope_df)+2)

    # Create the table
    tab = Table(displayName=sheet_name, ref=ref )

    # Add a default style with striped rows 
    style = TableStyleInfo(name="TableStyleLight12"
                           , showFirstColumn=False
                           , showLastColumn=False
                           , showRowStripes=True
                           , showColumnStripes=False)
    
    # Apply table style
    tab.tableStyleInfo = style
    
    # Apply the table to the sheet
    ws.add_table(tab)
    
    #
    wb.save(excel_file)
write_to_sheet("solution3.xlsx"
                   , df[['day'
                        , 'confirmed_cases'
                        , 'predicted_cases'
                        , 'predicted_removed_cases'
                        , 'predicted_susceptible']]
                   , title='case_prediction')
/Users/jvanderhoeven/miniconda/envs/analysis-101/lib/python3.7/site-packages/ipykernel_launcher.py:31: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy