Introduction to Python for Meteorology¶
Python is a powerful, high-level scripting language (its standard interpreter, CPython, is written in C) that allows us to work rapidly with large datasets, which are common across many scientific fields. As meteorology moves toward ever higher-resolution datasets, the ability to process these data is becoming an increasingly important skill for students to learn.
The remaining labs in this course will be devoted to learning how to use Python for basic meteorology-based applications.
Anaconda Jupyter Notebook¶
The tool we'll use to run Python scripts for now is the Jupyter Notebook application (you're looking at it!).
Jupyter Notebook lets us mix text (markdown) and code (Python) in one document, which makes it easy to demonstrate concepts and to document the steps our scripts carry out.
Download Anaconda Here: https://www.anaconda.com/download/
Let's start! When you click on a cell, the boundary on the left side of the screen will highlight in either green or blue. Green indicates edit mode (your cursor is inside the cell and you can type in it), while blue indicates command mode (the cell is selected but not being edited). A cell holds either markdown, which is text where we can explain things, or Python code, which can be executed by clicking the play button or by holding the shift key and pressing enter. Next to a code cell, an asterisk indicates the code is still running, blank brackets mean the cell has not been run yet, and a number indicates the cell has been executed.
To edit a code cell, click in the body once. To edit a markdown cell, click in the body twice. Pressing Shift+Enter or clicking the run button will save the changes on a markdown cell.
Let's introduce python now so you can get a feel for it.
Python Syntax: Introduction¶
If this is the first time you have ever seen computer programming, some of these topics may be challenging to master. If you've programmed in any way before, they will come relatively easily to you.
First and foremost, variables. A variable is a name that refers to a piece of data stored in your computer's memory, so you can think of it as a container for data. The data come in various forms: integers, floating point numbers, booleans, strings of characters, and then lists / arrays of these basic data types.
Let's look at a list of variables, and how these are defined.
a = 1
b = 1.2
c = 'c'
d = True
e = "Hello World"
First and foremost, you'll notice how the text block in this notebook appears different. Along the top bar of your screen, you'll see when you click a cell that it is labeled as "Code", or "Markdown". "Code" is python code that you can execute by holding shift and pressing enter (Or by pushing the run code button above), while markdown is just a fancy way to say text, or basically non-code information that I'm sharing with you.
Jupyter executes statements in the order you run them, so when you see In [1]: along the side, that code block was executed first, In [2]: would be next, and so on.
Let's get back to variables. To define a variable, we give it some kind of name, followed by the equals sign, and then whatever we're going to store inside of the variable. a is an integer (a non-decimal number), b is a floating point number (a decimal number), c is a single character (note the quotation marks), d is a boolean (True or False), and e is a string (a sequence of characters). Python has no separate character type, so c is simply a string of length one.
You can perform basic arithmetic on variables of compatible data types. For instance, you can add two integers together, an integer and a float (both numbers), or two strings (which joins them end to end), but you can't add a number to a string.
Python supports all of the basic arithmetic operations:
+ is Addition
- is Subtraction
* is Multiplication
/ is Division
** is Power (You can use ** 0.5 for square root)
Parentheses allow you to order operations however you like, while the remaining code follows the standard order of operations.
For example:
a + a
a + b
e + c
# NEITHER OF THESE TWO WILL WORK
a + c
b + c
So what happened with that last one there? The computer threw a TypeError at you. This is an error, which is what you get when your code is invalid. Broadly, there are two kinds of problems in programming. The first is an error the computer reports back to you: a syntax error occurs when you type something that Python cannot read at all, and a runtime error such as the TypeError above occurs when a readable instruction cannot be completed. The second is a logic error, which occurs when the code runs but the computer performs something you did not want to happen.
An example of a logic error would be if we wanted to perform (b * b) / 2 + a, which the first line below computes, but wrote it as the second line instead:
b * b / 2 + a
(b*b) / (2 + a)
See what happened there? The order of operations was specified incorrectly in the second line, which caused your code to execute in the wrong order. Except it didn't. You wrote it incorrectly, and the computer executed the instruction exactly as you specified. This is what we refer to as a logic error.
Learning how to interpret and debug these two types of errors will become fundamental to your programming experience.
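To make the distinction concrete, here is a small sketch that reuses the a, b, and c values from the first example. The try/except construct is not covered in this lab; it is used here only so the cell keeps running after the TypeError, and the names error_message, intended, and mistaken are made up for this illustration.

```python
a = 1
b = 1.2
c = 'c'

# An error Python reports: adding an integer and a string raises a TypeError.
try:
    a + c
except TypeError as err:
    error_message = str(err)
    print("Python reported: " + error_message)

# A logic error: both lines run without complaint, but only one is what we meant.
intended = (b * b) / 2 + a      # square b, halve it, then add a
mistaken = (b * b) / (2 + a)    # the extra parentheses divide by (2 + a) instead
print(intended)
print(mistaken)
```

Python only complains about the first problem. The second one is yours to catch, usually by checking the printed result against a value you worked out by hand.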
Python Syntax: Printing¶
The next bit is a rather short block of information on how to report information back. In the above, you'll notice that running a basic arithmetic statement on its own reports the value back when you run the code. But what if we'd like to combine multiple statements together, or print something else? This is where the print function comes into play.
a = 10
b = 20
c = 25
print("Hello World!")
print("a + b is: ")
print(a+b)
# But what if we want to combine it all on one line?
print("a + b is: " + str(a+b))
# Try it! Write a statement to print a * c to the console.
print("a * c is: ")
You can also calculate your results first, and then print them if it comes a bit easier to you:
result = a * c
print("a * c is: " + str(result))
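As an aside, joining strings with + and str() is not the only way to build an output line. The sketch below reuses a and c from above and shows the built-in string method format() as one alternative; the variable names line_concat and line_format are made up for this illustration.

```python
a = 10
c = 25
result = a * c

# str() converts the number so it can be joined to the label with +
line_concat = "a * c is: " + str(result)

# format() fills the {} placeholder and does the conversion for us
line_format = "a * c is: {}".format(result)

print(line_concat)
print(line_format)
```

Both lines print the same text, so use whichever reads more clearly to you.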
Python Syntax: Data Types¶
So what's with that fancy str() call? Joining text with + works just like your arithmetic in that it needs a consistent data type on both sides. Because the label is a string, we type cast our numerical answer to a string so that the two can be joined and handed to print.
Even though variables in Python are not tied to a specific type, the fundamental data types are still present in Python: integer (int), float, boolean (bool), and string (str). There is no separate character type; a single character is just a string of length one.
You can check the type of a variable with the type function. As we redefined all three variables above as integers, let's see what Python now says they are:
print(type(a))
print(type(b))
print(type(c))
If we wanted to perform division on these numbers, say c / a, we'd expect the result to read 2.5 in the console. Whether you get that depends on your Python version, so perform the division and see:
# Get into the habit of using print to output stuff!
print(c/a)
So what happened here? In Python 3 (the version current Anaconda releases install), / always performs true division, so you get 2.5. In Python 2, because both of the variables were of type integer, the computer assumed you wanted an integer result, therefore it truncated the decimal value off to give you 2. Even if the true result were 2.999999999, the integer division would make your number 2, so this behavior is mostly undesired. (If you do want it in Python 3, use the // operator.)
So, how do we guarantee a decimal result in either version? Easy, type cast the two integer variables to floating point numbers:
print(float(c) / float(a))
There are calls for each of the fundamental data types in Python for type casting; the block of code below shows each in action:
a = int(10)
b = float(2.3)
c = bool(True)
e = str("Hello")
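Type casting earns its keep when the data start out as the "wrong" type, for example numbers that were read from a text file and arrive as strings. A minimal sketch, with made-up values and variable names:

```python
# Text that looks like numbers, e.g. values read from a file (made-up values)
temperature_text = "23.5"
count_text = "12"

temperature = float(temperature_text)   # string -> floating point number
count = int(count_text)                 # string -> integer
truncated = int(temperature)            # int() drops the decimal part of a float
label = str(count) + " observations"    # number -> string so it can be joined with +

print(type(temperature), type(count), type(label))
print(truncated)
print(label)
```

Note that int() on a float simply drops the decimals rather than rounding to the nearest whole number.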
Python Syntax: Logic¶
Next up, we'll introduce the concept of logical statements in computer programming. A logical statement is a test that evaluates to either True or False (think of the entries in a truth table), and it allows you to run blocks of code based on conditions.
This important concept is used to establish conditional blocks of programming, or code that executes when certain conditions are met. Let's look at a simple example and then we'll explain:
a = 10
b = 20
if a == 10:
    print("a is 10")
else:
    print("a is not 10")
if b != 20:
    print("b is not 20")
else:
    print("b is 20")
This is also an important time to introduce the notion of blocks of code in Python. You'll notice at the end of my if statement, there is a colon (:), and then the next line is tabbed in one level. This is a code block in Python. Other languages such as C++ and Java use curly braces ({ and }) to block code together. Python, on the other hand, uses indentation (tabs or, more commonly, four spaces, but never a mix of the two). Each bit of code you would like to execute together needs to sit at the same tab level, and when you're done with the block, you bring the tab level back to the original. You can see in my above example how the else statement is back on the zero tab level, denoting the end of the block of code.
Play with the values of a and b and run the code again. You'll see that what is printed to the console depends on whether the values match the conditional statements. The if and else statements tell your computer to run a block of code only if a specific condition is met.
The conditional operators for python are:
== (Is Equal To)
!= (Is Not Equal To)
> (Greater Than)
< (Less Than)
>= (Greater Than or Equal To)
<= (Less Than or Equal To)
You can compound conditional statements together using the connective keywords and, or, and not (all lowercase in Python) to chain together more complex logical operations that check if multiple conditions are met.
Let's look at an example of one of these in action, along with another conditional test:
a = 10
b = 20
c = 25
if (a == 10 and b == 20 and c <= b):
    print("a is 10, b is 20, and c is less than or equal to b")
elif (a < b and b < c):
    print("The variables are in sequential order.")
else:
    print("Neither of the above cases are met.")
So what happened here? We looked at the first statement and tested if all three of our conditions were true. Since the last condition was false, the entire test returned false, so we moved to the elif (short for else if) statement. There we tested if the variables were in order numerically (a less than b, and b less than c), and this was true, so we printed the statement.
The last clause will execute only if neither statement is true. Play with the above numbers, and see how it all works out!
Sometimes in programming, you'll need to nest statements together to get more complex structures to work. It can be done, but we'll not worry about that right now!
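For the curious, here is a minimal sketch of the or and not connectives, plus one nested if. The observation values, the variable names, and the freezing-point rule are all made up for illustration; this is a toy test, not a real precipitation-type scheme.

```python
# Hypothetical surface observation (made-up numbers)
temperature_c = -2.0
wind_speed_kt = 25
precipitating = True

# "or" is True when at least one side is True
if temperature_c <= 0 or wind_speed_kt >= 30:
    hazard = "cold or windy"
else:
    hazard = "none"

# "not" flips True to False and vice versa
if not precipitating:
    sky = "dry"
else:
    # A nested if: only checked when precipitation is occurring
    if temperature_c <= 0:
        sky = "possible snow"
    else:
        sky = "possible rain"

print(hazard)
print(sky)
```

The nested block is tabbed in twice, once for the else and once more for the inner if, which is how Python knows which block each line belongs to.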
Python Syntax: Loops¶
Next up is looping structures, or pieces of code that run multiple times. Loops, like conditional statements above will continue to run until their condition is false, at which point the loop ends.
There are two types of loops in python, the while loop which follows the structure of an if statement and will run until the specified condition is False, and the for loop, which is a counter based loop and will execute an exact number of times. Let's look at examples of each.
The while loop is used less often than the for loop, mostly because of the danger of creating what is called an infinite loop, where the same block runs forever, ties up your CPU, and has to be interrupted by hand. But there are some applications of the while loop that are particularly useful in programming:
currentValue = 10
while currentValue > 0:
    # Remember, blocks of code need to be tabbed to the same level, both of the below
    # statements are on tab level 1, indicating they are in the same block.
    print("The current value is: " + str(currentValue))
    currentValue -= 1 # Try to guess what this line does before you run the code
# Once you run the code, piece together what happened here, how does each bit work together?
As I mentioned, however, while loops are potentially dangerous due to the infinite loop problem. Someone new to Python might, for instance, not get the notion of a block of code and move the currentValue -= 1 statement back to the original tab level. At that point, they have created an infinite loop: the loop never reaches this statement, so currentValue never changes and the computer will continue to print The current value is: 10 endlessly to the screen. This is bad!
Therefore, we prefer the usage of what is called a for loop, or a counter-oriented loop. To create the counter, we're going to employ the range function, which produces a sequence of numbers between two values. Here's the same example, starting with the range on its own:
# range() is NON-INCLUSIVE on the second value, which means if you want 10, you need to add one to the desired value.
# This is because of how computers handle index values of arrays (Later)
# Wrapping range() in list() shows the actual numbers (in Python 3, range by itself prints as range(1, 11)).
print(list(range(1,11)))
# You can count backwards too! More on that weird [::-1] block soon.
print(list(range(1,11)[::-1]))
So, now that you can create a list of numbers that count down from 10 to 1, let's go back and recreate our looping structure with a for loop this time:
for number in range(1,11)[::-1]:
    print("The current value is: " + str(number))
What makes for loops superior to while loops is that they store the value they are currently on in a variable that you can directly call within the body of your looping structure. In the above case, we called it number, which means we don't need to create a counter instance, we can just call number directly and go!
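As a small extension of the example above, range also accepts an optional third argument, the step, so the countdown can be written without the slice. The sketch below shows that form and then uses a second loop to keep a running total; the names steps and total are made up for this illustration.

```python
steps = 0
# range(start, stop, step): start at 10, stop before 0, count by -1
for number in range(10, 0, -1):
    print("The current value is: " + str(number))
    steps += 1

# A loop can also build up a result as it goes
total = 0
for number in range(1, 11):
    total += number
print("Sum of 1 through 10: " + str(total))
```

Because the stop value is non-inclusive, the countdown ends at 1 and never prints 0.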
There's a ton of things you can do with loops, and we'll talk more about that as we work more with python, but now let's introduce lists.
Python Syntax: Lists¶
When we analyze our data in meteorology, you'll be working with very large tables of information that usually contain things like lat/lon pairs along with temperature, pressure, or wind values for instance. Creating a new variable to store each would be vastly inefficient and consume way too much memory on your computer, so instead, what is employed is a data structure, or a container instance that holds your data.
A list is the first structure of data. A list can contain variables of all types stored together and can be iterated over (for loop) to perform rapid analysis. To define a list, you can use the list() statement, or the square bracket notation:
myFirstList = [10, 20, 30, 40, 50]
for item in myFirstList:
    print("Pulled " + str(item) + " from the list.")
Lists are powerful instances that have all sorts of functioning associated with them. They are a mutable instance, which means the data are not fixed and can be modified after you create the instance. To access individual locations in a list, you use the index of the list. This is a concept that will be strange to some at first, so try to follow closely.
In memory, the first element in a list or array is assigned the index of zero. Therefore, if I want to grab the number 10 from my above list, I need to use the following:
print("Give me the 10!")
print(myFirstList[0])
Each next index then continues up by one value at a time, so the 20 would be in index 1, 30 in 2, 40 in 3, and 50 in 4.
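Indexing also works for writing, which is what "mutable" means in practice. A short sketch on the same list (the replacement values 35 and 60 and the names first and last are arbitrary choices for this illustration):

```python
myFirstList = [10, 20, 30, 40, 50]

first = myFirstList[0]     # index 0 is the first element
last = myFirstList[-1]     # negative indices count from the end
print(first, last)

# Lists are mutable: replace a value in place, then add one to the end
myFirstList[2] = 35
myFirstList.append(60)
print(myFirstList)
print(len(myFirstList))    # number of elements now in the list
```

Asking for an index past the end of the list, such as myFirstList[10], raises an IndexError instead of returning a value.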
Slicing¶
One of the most powerful tools of Python is the concept of list slicing, which allows you to break up large lists into bits and pieces without needing complex C/C++ code. Slicing allows you to grab specific ranges of your data, or to step through it at specific intervals. Remember that weird [::-1] block you saw earlier? Well, we were actually slicing the array backwards with this block to make the list reverse. So let's see how this all works:
# Get me everything after the 10
print(myFirstList[1:])
# Get me everything up to the 40
print(myFirstList[:-1])
# Get me everything between 20 and 40
print(myFirstList[1:-1])
# Get me every other number
print(myFirstList[::2])
# Reverse the list
print(myFirstList[::-1])
# Reverse the list, and get me every other number
print(myFirstList[::-2])
An important lesson for Python is that assigning a list to a second variable does not make a new copy of the data. If I create a list, assign it to another variable, and then work with the second variable, I will also modify the original, because both names refer to the same list in memory. Here's what I mean:
myFirstList = [10, 20, 30, 40, 50]
print(myFirstList)
mySecondList = myFirstList
mySecondList.extend([60, 70, 80, 90])
print(mySecondList)
# OOPS!
print(myFirstList)
Oops! As you can see, we've unfortunately modified the original data now as well. In order to fix this for lists (and numpy arrays), we need to create a copy of the list instance to work with:
myFirstList = [10, 20, 30, 40, 50]
print(myFirstList)
# Instead of making a reference, define a new list.
mySecondList = list(myFirstList) # myFirstList[:] also works
mySecondList.extend([60, 70, 80, 90])
print(mySecondList)
# Much better!
print(myFirstList)
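If you are ever unsure whether two names point at the same list, Python's is operator will tell you. A minimal sketch (the names alias and separate are made up for this illustration):

```python
myFirstList = [10, 20, 30, 40, 50]

alias = myFirstList            # a second name for the SAME list
separate = list(myFirstList)   # a new list with the same contents

same_object = alias is myFirstList
copy_is_separate = separate is not myFirstList
print(same_object, copy_is_separate)

alias.append(60)               # shows up under both names
separate.append(99)            # only changes the copy
print(myFirstList)
print(separate)
```

Note that == compares the contents of two lists, while is checks whether they are literally the same object in memory.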
Lists are a powerful structure of data in python. Although we don't use it much for our applications, it's still something you should know how to work with.
For more information on python lists, the full reference is available here: https://docs.python.org/3/tutorial/datastructures.html
Python Syntax: Functions¶
In any computer programming language, you will want to get into the habit of breaking up your work into functions, or bits of code that you will execute multiple times over the lifespan of an entire program. Functions allow you to break up your work and then call the function to execute the block of code again.
def printHello():
    print("Hello World!")
print("Running my code!")
printHello()
Just like conditionals and loops, functions define a new code block, so don't forget to tab in for the block of code otherwise python will yell at you (error).
But, let's try doing something a bit more interesting, say adding some numbers up:
def addEmUp(a, b):
    print("Now adding "+ str(a) + ", and " + str(b))
    return a + b
What's that return statement? This is where the magic of functions comes into play in Python. Remember that all variables store values. The return statement in a function sends a value back from the function to whatever code called it.
Go ahead and run the above code, but nothing happens... This is because you defined the function in memory, but now you need to call it. In the original code block, I did that via printHello(). Therefore, I'll do the same with addEmUp(), right? Careful, you'll notice that the definition of addEmUp above has two variables (a and b), so we must give our function two parameters to use for a and b:
a = 10
b = 20
addEmUp(a, b)
You aren't limited by name either!
addEmUp(25, 50)
But you are limited by the count:
#This code will not work, you must provide two parameters
addEmUp(1)
Let's do something a little more inventive:
a = 10
b = 20
result1 = addEmUp(a, b)
result2 = addEmUp(result1, b)
print(result1)
print(result2)
This time, we're showing the usage of the return statement. I'm assigning the output of the addEmUp function to a variable, and then using this number to run the function a second time. I then print the two values out to the console at the end of the code block. This shows the power of the return statement and why functions are extremely useful.
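The same pattern shows up constantly in meteorology, for example in unit conversions. The sketch below is only an illustration: the function names are made up, and the formulas are the standard Celsius/Fahrenheit conversions.

```python
def celsiusToFahrenheit(tempC):
    # Standard conversion: multiply by 9/5, then add 32
    return tempC * 9.0 / 5.0 + 32.0

def fahrenheitToCelsius(tempF):
    # The reverse: subtract 32, then multiply by 5/9
    return (tempF - 32.0) * 5.0 / 9.0

freezingF = celsiusToFahrenheit(0)
boilingF = celsiusToFahrenheit(100)
roundTrip = fahrenheitToCelsius(freezingF)   # feed one result into another function

print(freezingF)
print(boilingF)
print(roundTrip)
```

Because each function returns its answer instead of only printing it, the result can be stored, reused, or passed straight into another function.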
Python Syntax: Variable Scope¶
The last stop on our quick intro to Python is the important concept of variable scope. Variable scope is the notion of where variables are allowed to be used relative to the code. So far, you've simply run the code and the variables are there and ready to be used, BUT! This is because you have been defining global variables, which by nature are valid in all places in the code. Local variables do exist, however, and losing track of where your variables are valid is a very common problem that beginning programmers struggle with.
Let's use the following example to explain:
# Assume I have these functions
def functionA(a):
    localA = 10 + a
    print("functionA is holding localA as " + str(localA))
def functionB(b):
    localB = b - 10
    print("functionB is holding localB as " + str(localB))
# Now, localA is a local variable that is only valid within the body of functionA, just as localB is only valid in the
# body of functionB. When I create a third function to run this piece of code, I cannot access either of these two!
def functionC():
    a = 10
    b = 20
    functionA(a)
    functionB(b)
    # INVALID! YOU CANNOT ACCESS LOCAL VARIABLES FROM OTHER FUNCTIONS
    c = localA + localB
functionC()
So see what happens at the end there? The two functions execute, and then Python errors out with a NameError when you try to use localA and localB in functionC, because they do not exist in the body of functionC, EVEN THOUGH they were created in memory while functionA and functionB ran prior to that line.
This is the flaw of variable scope that many novice programmers fail to capture. So, let's fix the above code so we can use it! There are two ways we can do it, and I'll run each and explain:
# FIX #1: Return the values from the functions
def functionA(a):
    localA = 10 + a
    print("functionA is holding localA as " + str(localA))
    return localA
def functionB(b):
    localB = b - 10
    print("functionB is holding localB as " + str(localB))
    return localB
# localA and localB are still local to functionA and functionB, but each function now returns its value,
# so a third function can store those returned values in local variables of its own.
def functionC():
    a = 10
    b = 20
    localA = functionA(a)
    localB = functionB(b)
    # This now works!
    c = localA + localB
    print(c)
functionC()
This above code works because now we can create a version of localA and localB that is local to functionC. Now even though the values are the same, these two variables we defined here are local to functionC, therefore a functionD cannot use it. functionB for instance could not see localA, and functionA could not see localB.
Using a return statement is the preferred way to address scope issues, but it's not the only way.
# FIX #2: Make the localA and localB variables global variables
localA = 0
localB = 0
def functionA(a):
    localA = 10 + a
    print("functionA is holding localA as " + str(localA))
def functionB(b):
    localB = b - 10
    print("functionB is holding localB as " + str(localB))
# localA and localB are now also defined at the global level, so a third function can read those names.
# Watch what values it actually sees, though!
def functionC():
    a = 10
    b = 20
    functionA(a)
    functionB(b)
    # This also works! But does it?
    c = localA + localB
    print(c)
functionC()
Wait a minute... The code runs, but what happened? Well, what happened is why you don't usually use this method. When you call functionA, the assignment defines a new localA within its own scope, separate from the global one. It is still local to the function, so once the function exits, that value is no longer saved, and functionC reads the untouched global values of 0. To make this approach work, functionA and functionB would each have to declare their variable with the global keyword before assigning to it, and code that quietly changes global variables is hard to follow and debug, so this is not desired behavior.
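For completeness, here is a minimal sketch of what assigning to a module-level variable from inside a function looks like with Python's global keyword. The names total, addWithoutGlobal, and addWithGlobal are made up for this illustration, and this is shown so you can recognize the pattern, not as a recommendation.

```python
total = 0

def addWithoutGlobal(value):
    # Assignment creates a NEW local variable named total;
    # the module-level total is untouched.
    total = value + 10
    return total

def addWithGlobal(value):
    # The global keyword tells Python to assign to the module-level name.
    global total
    total = value + 10

addWithoutGlobal(5)
afterLocal = total      # the module-level total has not changed
addWithGlobal(5)
afterGlobal = total     # now it has
print(afterLocal, afterGlobal)
```

Returning the value, as in Fix #1, keeps it obvious where every number comes from, which is why it remains the preferred approach.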
Variable scope is an important concept that you'll come across many times in your programming career, especially early on. It's better to learn it now, and understand it sooner rather than later so you don't struggle down the road.
Python Packages¶
One of the challenges of writing code is doing something once and then saving it so the same task can be run again. Computer programmers and scientists in general solved this problem long ago by creating "packages", which are libraries of functions that you can use to simplify tasks without needing to reinvent the wheel, per se.
Looking for image processing tools? Don't need to worry about the heavy lifting, just load up an image package and run with it! Meteorology applications for the most part work the same way. We import a few packages and get going. Let's talk about some of the more common ones we'll use and then work out an example problem.
To install any of these packages, after installing Anaconda, run 'Anaconda Prompt' as administrator (right click the program then "run as administrator"). Then, in each section below, copy the command code into the prompt and press enter. The package will begin to download. About half way through it'll stop, ask you something that has "(Y/N)?" just press enter and it should continue to download.
If you are unsure if you have a package installed, in the prompt, type "pip list" and you should get a list of every package you already have installed.
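The same check can be sketched from inside the notebook using the standard library's importlib.util.find_spec, which returns None when a top-level package cannot be found. This assumes Python 3, the helper name isInstalled is made up for this illustration, and the names in the list are the import names used later in this lab, which can differ from the install names.

```python
import importlib.util

def isInstalled(packageName):
    # find_spec returns None when Python cannot locate a top-level package
    return importlib.util.find_spec(packageName) is not None

# Import names used later in this lab
for name in ["numpy", "matplotlib", "pandas", "scipy", "netCDF4"]:
    print(name + ": " + str(isInstalled(name)))
```

Any package that reports False still needs to be installed from the Anaconda Prompt before its import statement will work.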
NumPy¶
NumPy is probably one of the most commonly used packages for scientific data processing out there. It provides an advanced version of a list called an array, with the same indexing and slicing syntax you just learned and then some more!
conda install -c anaconda numpy
Matplotlib¶
Matplotlib, specifically the PyPlot module within this package is an API with MATLAB style processing and tools to generate plots of various types. If you need something plotted in python, this package will almost always be somewhere on your list.
conda install -c conda-forge matplotlib
Pandas¶
Think Excel Jr. In Python. If you have large tables of data you need to work with, or will be performing statistical analysis on data, this is one package you may consider using.
conda install -c anaconda pandas
SciPy.Stats¶
Speaking of statistics, running any kind of significance testing will involve one of many statistical testing tools out there, and the SciPy.Stats module allows you to run all forms of statistical tests on your data.
conda install -c anaconda scipy
Basemap¶
So you have all of your data, but you're going to find out that maps don't just magically appear because you want them to. Basemap creates an overlay of geographical charts based on latitudes, longitudes, and even projects these data for you!
conda install -c anaconda basemap
NetCDF4¶
Most meteorological data are stored in large binary files called NetCDF files, this package will allow you to load these data into python to perform analysis or run plotting on it. We'll run an example here in a bit!
conda install -c anaconda netcdf4
MetPy¶
An early-development package with various meteorological tools for plotting different forms of analyses, which will make your isoplething lives so much easier (wait until you see the contour statement in Matplotlib).
conda install -c unidata metpy
Bringing in a package in python is really easy. Install it from anaconda (conda install x) or the python installation package module (pip install x) and then call up the import statement in a python script and you're good to go!
# Note the capitalization on some of these, python is case-sensitive!
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stat
from netCDF4 import Dataset
from mpl_toolkits.basemap import Basemap
Once you run the above block of code, we now have access to these six packages to run in any script we'd like within the rest of our notebook.
So, let's use some of these!
# Create a pair of numpy arrays: x counts from 0 up to (but not including) 100, and y holds 100 random integers from 0 to 9
x = np.arange(0, 100, 1) # Remember, what did we say about range?
y = np.random.randint(0, 10, 100)
print(x)
print(y)
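What makes arrays "advanced lists" is that arithmetic and comparisons apply to every element at once, with no loop required. A minimal sketch, using made-up temperature values and variable names:

```python
import numpy as np

# Hypothetical temperatures in Celsius (made-up values)
tempsC = np.array([12.0, 15.5, 9.0, 21.0, 18.5])

tempsF = tempsC * 9.0 / 5.0 + 32.0   # arithmetic applies to every element
warm = tempsC[tempsC > 15.0]         # a True/False mask keeps the matching values
firstThree = tempsC[:3]              # list-style slicing still works

print(tempsF)
print(warm)
print(firstThree)
print(tempsC.mean(), tempsC.max())
```

Doing the same conversion with a plain list would need a for loop over every item, which is part of why arrays are preferred for large datasets.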
Making Figures in Matplotlib¶
# That's cool... but let's do something even neater... how about a scatter plot of these?
plt.scatter(x,y)
plt.show()
# Or a dashed green line plot...
plt.plot(x,y,'g--')
plt.show()
# But wait, we're RESPONSIBLE scientists, We should make a title, and our axes should be labeled... right?
plt.figure() # Creates a new figure...
plt.plot(x,y,'r--') # What do you think 'r--' does?
plt.xlabel('x')
plt.ylabel('y')
plt.title('A Plot of Randomness!')
plt.show()
So as you can see, it's very easy to generate some very nice looking figures that you can use for your research papers or any kind of project you'll like to use... But let's say we'd like to embed that into a word document? Well, in order to do this, we'll need to save the figure. It's as easy as one line of code...
plt.figure()
plt.plot(x,y,'r--')
plt.xlabel('x')
plt.ylabel('y')
plt.title('A Plot of Randomness!')
# Call savefig before show()
plt.savefig("myPlot.png", dpi=600) # This saves your figure as myPlot.png at 600 DPI.
plt.show()
# Responsible coders close figures when they're done with them.... Get into this good habit!
plt.close()
And now a personal favorite of mine... How to simplify isoplething 101... Let's try plotting a two dimensional field using contour analysis... To start, we'll need a two-dimensional data array...
So far, we've only used one-dimensional lists of data, but you'll soon see that most scientific data is more like Excel spreadsheets, where you have rows, columns, and even in some cases sheets (2D - 3D). Fields such as pressure, for instance, use latitude, longitude, and height. Let's create a fake 2D field and contour it.
xi = np.linspace(-5, 5, 1000)
yi = np.linspace(-5, 5, 1000)
XI, YI = np.meshgrid(xi, yi)
myFakeData = np.sqrt(XI**2 + YI**2)
print(myFakeData)
What we have created is a 1000 x 1000 grid of x,y coordinates between -5 and 5, with the Z (height) at each point calculated as sqrt(x^2 + y^2), a classic math example of a sinkhole (a cone with its lowest point at the origin). We can then contour height as Z using matplotlib's contour function:
contours = plt.contour(myFakeData)
plt.clabel(contours, fontsize=10, inline=1,fmt = '%1.0f') # Responsible isoplething is labeled....
plt.show()
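If np.meshgrid felt like a magic step, it can help to run it on a grid small enough to print in full. The sketch below repeats the same calculation on a 3 x 3 grid; the grid size and the variable name distance are arbitrary choices for this illustration.

```python
import numpy as np

xi = np.linspace(-1, 1, 3)     # three evenly spaced values from -1 to 1
yi = np.linspace(-1, 1, 3)
XI, YI = np.meshgrid(xi, yi)   # two 3 x 3 arrays holding the x and y of every grid point

distance = np.sqrt(XI**2 + YI**2)

print(XI)
print(YI)
print(distance.shape)
print(distance[1, 1])          # centre of the grid, where x = 0 and y = 0
```

Each row of XI repeats the x values and each column of YI repeats the y values, so every grid point gets its own (x, y) pair for the height calculation.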
Now that we have the basic gist of things in matplotlib, let's work with some real data!
Try It Yourself!¶
Here are links to a Jupyter Notebook lab you may try on your own, test data for the lab, and the answers (just in case)! Enjoy!
Test Data: https://drive.google.com/open?id=1IEdT_qHEpRiGwtOxri7AVvPCsmvKoHfG
Lab (Blank): https://drive.google.com/open?id=1mT7r0VwlBYsdresEG3b7ZPwBS4XjFR1N
Lab (Answers): https://drive.google.com/open?id=1H9V4owrEqMfh1rbfVJwEBZIACzj_yqVN
Copy of Blog Post: https://drive.google.com/open?id=1lBFDSGPMj2NDEdVDW-7ytJ54AZF9biS5