A First Program: Straight Line Fitting
Introduction
So far, we have been executing single-line commands and examining the output of each command. This week is concerned with writing and using computer programs. A computer program is a list of commands that can be executed by a computer to perform a specific task. You will learn how to save and run programs and how to add comments that explain how the program works.
The program you will write will be a useful one. It will allow you to plot and analyze data, by calculating the best fit straight line (y = mx+c) fit, and associated errors, i.e. it will determine best fit values of m and c and best estimates of dm and dc.
In writing the program you will use skills learned in the earlier weeks particularly those associated with arrays, plotting and polynomials.
Syntax Summary
Comments
Function | Syntax |
---|---|
Comment out one line | # comment |
Comment out a block of lines | """ comment comment """ |
Example of a well presented code
Below is an example of a well presented code that can be easily read, understood, used and altered by other programmers. Note the formatting of the script into blocks with comments to explain what each block of code is doing.
'''
Created on 20 Sep 2020
@author: bob
Example Code: X-Y Fit
'''
# Data from experiment entered in arrays
x_array = array( [0, 1, 2, 3, 4, 5], dtype=float )
y_array = array( [2, 3.6, 4, 5, 6.7, 7.45] )
# Plot data points
plot( x_array, y_array, 'kx', markersize=10, label='data' )
# Find and plot 1st order line of best fit
p_coeff = polyfit( x_array, y_array, 1 )
p = poly1d( p_coeff )
x = linspace( 0, 5, 100 )
plot( x, p(x), label='Best Fit Line' )
# Add titles/labels and a legend to the graph
title( 'Best Fit of Experimental Data' )
xlabel( 'Width (m)' )
ylabel( 'Height (m)' )
legend( loc='best' )
# Format the graph to be easily readable
grid( )
xlim( 0, 6 )
ylim( 0, 8 )
Good code should have a description, authors name and date of last editing, as well as being well spaced to be easy to read and having appropriate comments to show what each part of the code is doing.
Worksheet
Setting up - Folder Organisation
When started, the Jupyter Notebook App can access only files within its start-up folder (including any sub-folder). If you want to change the default start-up folder, you can do so by following these instructions.
Any time you create a new Notebook, it will be saved in the same start-up folder and as a result after several weeks you will end up with a large number of files making it difficult to navigate through them. For this reason, it is always a good idea (and you are strongly encouraged) to keep your files organised in a logical way so that you can find them later.
Before starting you program or project, create a new folder that will contain your program, any input data files and the output files (eg, graph images, results text files) your program might produce.
Create a folder for your work
If you haven't done so already, create a new folder called PX1224 which will contain all of the code produced for this course. You can create a folder by clicking the "New" button in Jupyter's Dashboard and selecting "Folder"

Rename the new folder
The new folder will be named "Untitled Folder" so you will have to rename it to something meaningful, like Week 1. In order to rename the folder, select the folder you want to rename by clicking on the tick box right next to the folder's name and then click on the "Rename" button at the top of the files list. Change the folder's name to PX1224

Final folder structure
Once you created the PX1224 folder, double click on that folder to open it. Once inside the PX1224 folder, repeat the process to create another folder called "Week 1". The final folder structure should look something like in the following image. You can now create a new Notebook by clicking the "New" button.

Writing Scripts/Programs
A first program: fitting straight lines to data
In the following sections a program will be developed and tested that allows data to be input (by hand), plots the data and fits the best straight line to it. You will be taken through all the stages of writing and running a program. At the end, you will have written a useful (if rather crude) program for use in the 1st and 2nd year laboratories and beyond. Here, data from a Hooke’s Law experiment will be used but this can be easily overwritten with new data.
Hooke's Law data
The experiment is one in which the length of a spring is measured when different masses are suspended from it. If Hooke’s law is obeyed the extension of the spring will be proportional to the force applied to it. The results are given below:
- mass 0 kg; length 0.055 m
- mass 0.1 kg; length 0.074 m
- mass 0.2 kg, length 0.089 m
- mass 0.4 kg, length 0.124 m
- mass 0.5 kg, length 0.135 m
- mass 0.6 kg, length 0.181 m
- mass 0.8 kg, length 0.193 m
A graph of length versus mass is a reasonable straight line. Following the mathematics described in students’ physics laboratory books:
- the gradient (m) and its error are (0.180491329 +/- 0.013838801) m/kg
- the y intercept (c) and its error are (0.054531792 +/- 0.006320128) m
The above values will be used later to test any Python code that is developed.
(If given as the result of an experiment then a consideration of significant figures leads to m = 0.18+/-0.01 m/kg and c = 0.055 +/- 0.006 m. However, the full values are useful for you to test the validity of your code.)
Inputting and plotting data
Using what you have learnt previously, do the following in a single cell
- Make two arrays, one containing the masses (call this x_data) and the second containing the lengths (call this y_data). Check in the variable explorer that you have done this correctly.
- Make a plot of length in metres vs mass in kilograms. Choose a marker for the points but do not join the points with a line (as a fit line will be added later).
- Label the axes of the plot.
You should have a plot that looks something like this:

Figure 1. Length of spring versus mass for a Hooke's Law experiment.
Adding comments
When you are writing code, it is vital to add notes about what you are doing, to make it easier to read. In computer code these notes are known as comments. Comments are used to explain the code; what it’s trying to do and how it’s trying to do it (in other words to understand the code). As in a lab diary, aim to include comments that will make sense to another person. In reality, comments will be most useful for you when you come back to a program days or months later and try to remember what it does.
There are two ways to add comments to a python program.
- Use of the hash symbol, #
Anything that follows and is on the same line as a # is classed as a comment, and will not be read by the computer.
You can comment out an entire line, for example:
or just the end of a linex = 1 # Python will ignore this line of code, so you can write anything. y = x + 5
You can even write bits of code after a # symbol and they will not be processed (although you would rarely want to do this), soy = 2*x # Python will read the expression for y, but not these words
will return a value of 1, not 40000045. However, the # only applies to things appearing after it and on the same line.x = 1 # x = 40000045 print(x)
and give the error SyntaxError: invalid syntax.x=2 y=3*x # Python will not read this But it will read this, as no hash precedes it on this line!
- Use of 3 quotation marks """ To comment out blocks of text, spanning several lines, use three quotation marks at the start and end of the block of text. For example,
x=1
"""Python will completely
ignore this block of text. x=5"""
print(x)
This code will output the value 1.
The ability to comment out a block of code can be useful when developing a program. If the code isn't working quite as you expected, then you can comment out one (or a few lines) of code and try running again. This can be a useful way to track down the error.
Now add comments to your straight line graph program and re-save it (with a new version number).
Hint: at this point take a look at the example code in section 1.
Comment in
- a title for the program
- a description of what it does
- the purpose of each section. There are two at present; inputting data in to arrays and plotting the data.
Fitting polynomials to data
The function polyfit(x,y,order) is used to fit polynomials to a set of data. We used this last week to fit polynomials to a sine curve. The x here refers to an array of x values, y to an array of y values and order to the required order of the polynomial used in the fit. The x and y arrays must be of the same length. The function uses the mathematics described in the physics laboratory books to determine the coefficients of the best fit polynomial. Reminder: To fit data with a straight line graph (y = mx+c) requires an order 1 (first order) polynomial.
Fitting the Hooke’s law data
Expand your straight line graph program do the following:
- Check that the x_data and y_data arrays contain the correct data as given in section 3.2.2.
-
Determine the best fit 1st degree polynomial coefficients by typing
p_coeff = polyfit( x_data, y_data, 1 )
- Print p_coeff to see that it is the array “[ 0.18049133, 0.05453179]” in agreement with the expected values given in section 3.2.2. The first term is the best fit gradient, m and the second the best fit intercept, c. Define variables m and c that contain the values for the slope and intercept respectively.
- Determine the best fit polynomial using the coefficients as
p = poly1d( p_coeff )
-
We want to plot a best fit line with the data. To do this we need an appropriate array of x values. We could simply type in “x=linspace( 0, 0.9, 100 )”, however the aim is to incorporate this code into a useful program to analyze any straight line data set. So instead, we will create an array that runs from the minimum to maximum values of the x data:
x=linspace( min(x_data), max(x_data), 100 ) plot( x, p(x) )
-
If you don’t already have it, plot the data so that data and fit are on the same graph (it should look something like the graph below.
Figure 3: Graph of Hooke's Law data with best fit line.
Finding the errors in slope and intercept
It should be apparent from the teaching laboratory that being able obtain best fit values for gradient and intercept without their associated errors is almost useless. Therefore error calculations are required in the program being developed this week. Formulae for standard errors in m and c are given in the laboratory book and one approach would be to code in all the summations required (not a bad exercise in computing). Instead we’ll exploit the inbuilt functionality of Python and use some of the statistics functions that we learned last week.
From your lab book the equations for errors in m and c are given by:
where:
$$D = \Sigma x_i^2 -\frac{1}{n}(\Sigma x_i)^2$$Where, n is the number of data points, x̅ is the average x-value and di is the residual for a data point (di = yi - mxi - c, the difference between the y data value and the corresponding value of best fit). ∑i di2 is the sum of the squares of the di values. In order to calculate the errors, we first need to work out these four quantities (D, n, x̅ and ∑i di2). As before, we will work these out in the console and then transfer the code to the editor.
Determine n
The number of data points, n is equal to the number of elements in the x_data (or y_data) arrays.
n = len(x_data)
In this case, n should be equal to 7.
Calculate the value of D
D = sum(x_data**2) - 1./n * sum(x_data)**2
For the Hookes Law data given this should return a value of 0.494285714286
Calculate the value of xbar
x̅ is just the average value of x_data. You can calculate it as
x_bar = sum(x_data) / n
# or
x_bar = mean(x_data)
Determine Σdi2
This is hidden a little bit, but can be easily determined using the “polyfit” command.
p_coeff = polyfit(x_data, y_data, 1)
This returns the coefficients of the polynomial fit to the data in an array.
If you look at the online documentation for polyfit()
, you will see that there are additional options that can be set when calling this function, rcond
and full
. So far, when using this function, we have not set these and they have taken their default values, rcond=None
and full=False
. These are the assumed values if you don't set a value. It is quite common for the functions we use to have a number of additional options which are set to default values unless over-written. We have seen this already with arange: if you don't specify the start and step they are assumed to be 0 and 1 respectively.
We will not be interested in the rcond option for polyfit()
, but by setting full=True
the function returns more information - what is returned now has five components: the coefficients (in an array) plus in order: (2)residuals (array) , (3) rank (number), (4) singular_values (array), (5) rcond (number).
As explained in the function's documentation, “residuals” here is actually the sum of the squares of the residuals, i.e. ∑i di2 exactly what is required. The value in this case is 0.00047331.
So, from the “polyfit” function it is possible to extract both “p_coeff” as we have done previously and “residuals”. Assigning the labels “p_coeff” and “residuals” to the multiple output of polyfit(x_data, y_data, 1, full=True)
is not something we’ve done before. Remembering that there are now 5 outputs the way to do it is as follows:
p_coeff, residuals, _, _, _ = polyfit(x_data, y_data, 1, full=True)
This stores the coefficients and residuals as named variables and the underscore ( _ ) is used as a dummy variable to hold the other returned values overwriting each other (we don’t care about these).
Print “p_coeff” and “residuals” to confirm that this has worked correctly.
Finding the errors in m and c
All that is left is to bring everything together. Still in the console, the squares of the errors in m and c can be found as follows
dm_squared = 1./(n-2)*residuals/D
dc_squared = 1./(n-2)*(D/n + x_bar**2)*residuals/D
Finally,
- Write code to define errors in m and c as “dm” and “dc”.
- Print these values and check that they are correct. (The values given at the start of the worksheet were 0.013838801 m/kg and 0.006320128 m for dm and dc respectively). Note: The code snippet above, calculates the square of the error.
Printing out the best fit values and errors
We have now calculated the best fit slope and intercept as well as their errors. We would like to print out these values to the console. As well as printing out the values, it is useful to print out a description of what they are, for example "best fit slope = 0.18". Here, we will introduce the basics of how to do this. Next week, we cover input and output in much more detail.
We have been using the print()
command to print out numbers, arrays, etc for a while now. It can also be used to print out text by simply typing
print( "hello world" )
By default, each time we call print()
, it will print the output on a new line. If you want to print two things on the same line, it can be done easily enough by putting a comma after the first print command. So
print( "The slope of the best fit line is " ) ,
print( m )
Completing the straight line fitting program
Make sure:
- You have a completed program that is tested
- Your program works correctly and is suitable for future use
- Your program is properly commented
- Your program is saved with a suitable name and in a suitable location.
The program is still rather crude - editing the program to input new data and change axis labels isn’t exactly elegant. As you learn more programming you will want to improve it. We will return to this code in coming weeks and add new features as we learn them.
Checking that your program is complete
It is very easy to accidentally leave something out of your program. For example, if you have defined a variable in a different cell, which you have since deleted, you might not notice that you haven't defined it in your program. In order to check that your program has everything required, it is useful to start afresh by restarting the Python Kernel by clicking on the restart icon as shown below, and running your code again.

Excercises
For each question please create code contained a single Code Cell. You can use markdown headings above each cell, to specify which question each cell corresponds to. Make sure your Notebook is saved in an appropriate folder on OneDrive as you will be asked to modify this code in the coming weeks.
1. Straight line fitting: Hooke's law [3 marks]
Write a code that fits a best fit line to the Hooke's law data given in section 3.2.2. The code should give the best fit line with uncertainties on the slope and intercept and plot a graph showing the data and best fit line. Ensure that the code is well commented.
2. Straight line fitting: Current vs Amplitude [1 mark]
An experiment is performed to measure the resistance of a component. The voltage and current are measured:
Voltage (V) | Current (mA) |
0.0 | 2 |
1.0 | 104 |
2.0 | 212 |
3.0 | 302 |
4.0 | 398 |
5.0 | 507 |
6.0 | 606 |
7.0 | 692 |
Copy of your code Hooke's Law program into a new Code Cell and then:
- Enter the above data into your program in place of the Hooke's Law data and update the answers and labels on the graph to have the correct names and units.
- Run your program to calculate the gradient and intercept with errors.
- Calculate the resistance and the error in the resistance.
- The component tested was marked as a 10 Ω resistor. Are the experimental results consistent with this?
Note: that the plot must be made with voltage on the x-axis as it’s the quantity we control. Marks will be deducted otherwise – the final error must be propagated from that plot.
3. Fitting exponential decay [1 mark]
A radioactive source of Lead-214 was analysed and the number of radioactive decays in the last 10 seconds of each minute were recorded. The data are given in the table below.
Time (minutes) | Counts(in 10 second interval) |
1 | 20 |
2 | 16 |
3 | 13 |
4 | 10 |
5 | 8 |
6 | 7 |
7 | 5 |
8 | 4 |
9 | 3 |
The radioactivity equation is N(t) = N0 e-λt where N(t) is the number of radioactive nuclei at time t, N0 is the number of radioactive nuclei at time t = 0 and λ is the decay constant. The activity (counts) of the sample at a time t is directly proportional to the number of radioactive nuclei at time t. So, the number of observed counts as a function of t should satisfy a similar equation.
Use this to calculate the decay constant λ by
- Entering this data into your straight line fit program
- Manipulating the data arrays so that the data should lie on a straight line
- Calculate the gradient and intercept of the best fit line, and the associated uncertainties.
- Determine the decay constant λ, quote your result along with the correposnding uncertainy.