Strings and Data Input/Output
Introduction
This week we will learn how to read in and print out data using Python. These are vital skills when writing computer programs. To do this, we will first introduce strings (computer speak for text) and learn how they can be entered and manipulated. In previous weeks, we have entered data (e.g. from the Hooke's law example) directly into the console or a script, and we have learned how to produce basic output using the print command. This week, we will explore both input and output in more detail, learning first how to enter and print more complex information in the console and then looking at input and output of data contained in files.
Useful Links
- String Methods - All the basic Python methods that can be used on strings
- Input and Output - Python input and output tutorial
- input() function - Details of the input() function
- Strings - Some more advanced string methods
Syntax Summary
Strings are one of the basic Python variable types, so all of the string manipulation that follows is built into Python itself. Similarly, file opening and input/output is in Python. However, once we come to input and output with arrays, this is (as usual) in NumPy along with all of the other array commands we have used.
Strings
Function | Syntax |
---|---|
Make a string variable s be all upper case | s.upper() |
Make a string variable s be all lower case | s.lower() |
Swap the case of each letter in a string s | s.swapcase() |
Combine two strings, s and t | s + t |
Split a string into words (separate at spaces) | s.split() |
Join a set of words, introducing a separator sep between them. | sep.join(words) |
Test if "word" is in string s (returns Boolean) | "word" in s |
Count instances of letter or word l in s | s.count("l") |
Replace all instances of x in string s with y | s.replace("x", "y") |
Convert variable x of any data type to a string | str(x) |
Line break in string | "\n" |
Tab in string | "\t" |
String Formatting
Function | Syntax |
---|---|
print any variable inside a string | print( f"The value is {x}" ) |
print a float (with 2 decimal places) | print( f"The value is {x:.2f}" ) |
print a float using exponential format | print( f"The value is {x:.2e}" ) |
print a number, using exponential format if necessary | print( f"The value is {x:.2g}" ) |
Data Input
Function | Syntax |
---|---|
Print "prompt" to console and wait for string input | input("prompt") |
File Handling Methods
Function | Syntax |
---|---|
Open a file in writing mode and call it "f" | f = open("FileName.txt","w") |
Open a file in reading mode and call it "f" | f = open("FileName.txt","r") |
Write a line to a currently open file "f" | f.write("Text to write\n") |
Read the next unread line from a currently open file "f" | f.readline() |
Read an entire currently open file "f" | f.read() |
Close a currently open file "f" | f.close() |
Reading and writing numpy arrays
Function | Syntax |
---|---|
Read data from a file (with name "filename.txt") into an array a, and skip the first row | a = loadtxt("filename.txt", skiprows=1) |
Read data from a file object f into an array a | a = loadtxt(f) |
Save an array a into a file named "filename.txt" | savetxt("filename.txt", a) |
Save an array a into a file object f | savetxt(f, a) |
Stack two 1D arrays a and b into a 2D array c | c = column_stack((a,b)) |
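As a minimal sketch of the array rows in the table above: column_stack() expects its arrays bundled into a single tuple, and the resulting 2D array can be saved and re-loaded as two columns. The file name "two_columns_demo.txt" and the np. prefix (from import numpy as np) are assumptions made only for this illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Note the inner parentheses: the arrays are passed as one tuple argument
c = np.column_stack((a, b))

# Save as two columns (one decimal place), then read the file back in
np.savetxt("two_columns_demo.txt", c, fmt="%.1f")
d = np.loadtxt("two_columns_demo.txt")

print(c.shape)
print(d)
```

Each row of the saved file holds one entry from a followed by the matching entry from b.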
Worksheet
Strings
So far, most of the variables we have defined have been either integers or floats or arrays of integers and floats. Today, we will explore the use of strings of characters, particularly formatting and printing them. The classic first computer code is to print "Hello, World". In python, it's as easy as
print( "Hello, World" )
We have used print before to print numbers and arrays, now we are printing the string "Hello, World". You can also define a variable as a string, for example
s = "Hello, World"
print(s)
Note: You can enter strings enclosed in either single or double quotes; it doesn't make any difference. The only requirement is that you open and close with the same quotation mark. Choose the one you prefer and stick with it.
You can also force other variables to be strings (just like you did with integers and floats), for example
t = str(1)
p = str(pi)
a = str(arange(10))
Accessing strings
You can access a certain character in a string, just as you can access entries in an array. For example
print(s[0]) #print the first letter
print(s[-1]) # print the last letter
print(s[:5]) #print the first 5 letters
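A slightly fuller sketch of the same idea, storing each result in a variable so it can be inspected. The variable names are chosen only for this illustration.

```python
s = "Hello, World"

first = s[0]      # index 0 is the first character
last = s[-1]      # negative indices count back from the end
start = s[:5]     # characters 0 to 4 (the end index is not included)
rest = s[7:]      # from index 7 to the end of the string
length = len(s)   # number of characters, including the comma and the space

print(first, last, start, rest, length)
```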
A brief introduction to methods
The syntax that we will use in formatting strings is different than what we have seen before. Previously, we have used functions (e.g. sin(x)), now we will use something known as methods. Let's not worry too much about it for now. But note that we could repeat much of what we have done before using methods. For example
x = rand(10)
print(max(x)) # function max()
print(x.max()) #array method max()
gives two identical ways of finding the maximum of an array. At this stage, we won't worry about the difference between a function and a method, just note that a method explicitly acts on a given object (in this example the array x). It is not uncommon that the parentheses following the method are empty, although that is not always the case. If they are empty, it signifies that the method acts only on the object and doesn't need any extra arguments.
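The distinction can be sketched with a small fixed array (so the result is predictable) and one method that does take an argument. The np. prefix assumes import numpy as np, which may differ from how your notebook imports NumPy.

```python
import numpy as np

x = np.array([3.0, 7.6, 1.2])

largest_function = np.max(x)   # function: the array is passed in as an argument
largest_method = x.max()       # method: called on the array itself, empty parentheses

# A method that needs an extra argument: round every entry to 0 decimal places
rounded = x.round(0)

print(largest_function, largest_method, rounded)
```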
Upper/lower case
You can make a string entirely upper or lower case, or swap the case of every letter in the string. In the following, the methods will act on the string s="Hello, World" that we defined earlier.
The syntax for making a string all upper or lower case or swapping the case is:
u = s.upper()
print(u)
l = s.lower()
print(l)
swap = s.swapcase()
print(swap)
Note: these operations do not change the original string s, but rather return a new string with different capitalization. Some methods do change the object they act on. We will see this later when dealing with files.
Splitting and Joining strings
Here, we will split the string "Hello, World" and then put it back together again. To split a string up into its component "words" do
words = s.split()
print(words)
The result is a list of two entries, "Hello," and "World". The default behaviour is to split at spaces and remove the space, but you can split (and remove) at any character. Try splitting on the letter "o" by typing
not_words = s.split('o')
print(not_words)
You can join strings using the + symbol. It just puts the content of the second string after the first, for example
j = words[0] + words[1]
print(j)
You'll notice that we didn't quite get back the string we started with since we lost the space. Instead we should do
j = words[0] + ' ' + words[1]
print(j)
You can also join the words together using the join() method. The syntax is
sep = " "
j = sep.join(words)
This works by joining together all the elements of words, inserting the separator sep between them. The syntax can take a while to get used to, but it is quite powerful once you get the hang of it.
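The round trip of splitting and joining can be sketched as follows. The variable names rebuilt, dashed and pieces are illustrative only.

```python
s = "Hello, World"

words = s.split()            # split at spaces: ['Hello,', 'World']
rebuilt = " ".join(words)    # put a space back between the words
dashed = "-".join(words)     # same words, different separator
pieces = s.split("o")        # split at (and remove) every letter "o"

print(words)
print(rebuilt)
print(dashed)
print(pieces)
```

Note that the separator is the string the method is called on, and the list of words is the argument.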
Find, Count and Replace
You can do other things with strings, including testing whether a certain string is contained in another one
print("Hel" in s)
print("Help" in s)
The first should return True, since "Hello, World" contains "Hel". The second should return False, since "Hello, World" does not contain the string "Help".
You can also count the number of times that something appears in your string:
print(s.count('l'))
print(s.count('ll'))
You can do a find and replace on a string using
p = s.replace("Hello,", "Physics")
print(p)
This does not change the original string s, but creates a new string p with "Hello," replaced by "Physics".
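The three operations in this section can be collected into one short sketch. The variable names are chosen for this illustration.

```python
s = "Hello, World"

has_hel = "Hel" in s       # membership test returns a Boolean
has_help = "Help" in s
n_l = s.count("l")         # counts every single letter "l"
n_ll = s.count("ll")       # counts the pair "ll"
p = s.replace("Hello,", "Physics")   # returns a new string, s is untouched

print(has_hel, has_help, n_l, n_ll)
print(p)
print(s)
```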
String Formatting
There are various ways of formatting a string to make it appear as you want.
New Lines and Tabs
One of the most basic ways is by adding new lines and tabs into the string. You can include special characters in the string to signify new lines (\n) and tabs (\t). For example
s = "This is a string that \n is two lines long and \t contains a tab."
print(s)
[Out]This is a string that
is two lines long and contains a tab.
Printing Numbers
Often we will want to display the results of a calculation. Rather than just printing the raw result, it is preferable to give a description of the result before the value. For example, it would be better to say "The answer is 42" rather than just "42". We also want to give answers to a reasonable number of significant figures.
There are a few ways to print numbers.
- As we saw last week, you can simply write two print statements (and make sure there's no new line between them). Here the end="" argument stops the first print from starting a new line.
ans = 42
print("The answer is ", end="")
print(ans)
This way is a bit cumbersome, and doesn't work well for non-integer results as there's no way to specify the number of significant figures to use.
- We can force the answer to be a string and then print it.
print("The answer is " + str(ans))
This is a bit more convenient than the first method, but again doesn't work well for non-integer results, as we can't specify how the number is printed.
- We can tell Python to expect a value at a given place in the string and specify which variable supplies it. Try the following
print(f"The answer is {ans}")
Notice the f at the start of the string, which signifies that the following string is of a special type (an f-string) that allows the inclusion of Python variables and expressions when these are inside curly brackets { }.
The third method will probably take a bit more getting used to, but it is the most versatile method.
Significant Figures
Often, we will want to display a number with a fixed number of significant figures. To do this, we first need to specify that the number is printed as a float:
ans = 42.00001
print( f"The answer is {ans:f}")
This will print the number as a float and will give six decimal places (42.000010). We can specify the number of decimal places using:
print( f"The answer is {ans :.2f}")
print( f"The answer is {ans :.5f}")
The first will print the answer with 2 decimal places, and the second will use 5.
More generally, we may wish to print an answer as a float, restrict the number of decimal places displayed, or require scientific notation. This is all possible by specifying a different format code after the colon, for example:
a = 42
print(f"The answer is {a}") # prints an integer
a = 42.00001
print(f"The answer is {a:f}") # prints a float, 6 decimal places
print(f"The answer is {a :.2f}") # prints a float, 2 decimal places
a = 42.1
print(f"The answer is {a :.3e}") # prints in scientific notation, 3 decimal places
a = 42
print(f"The answer is {a:g}") # prints a number, format depends on value
a = 42.01
print(f"The answer is {a:g}")
a = 42000000000.1
print(f"The answer is {a:g}")
a ="forty two"
print(f"The answer is {a}") # prints a string
You can also specify several things to fill in as in
a= 11.235
e = 0.004
print(f'The answer is {a :.3f}, the uncertainty is {e :.3f}')
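As a compact sketch of the format codes above, the formatted strings can be stored in variables and compared. The value 3.14159 and the variable names are chosen only for this illustration; note that the g code counts significant figures rather than decimal places.

```python
x = 3.14159
big = 42000000000.1

fixed = f"{x:.2f}"         # 2 decimal places
scientific = f"{x:.3e}"    # scientific notation, 3 decimal places
sig_figs = f"{x:.3g}"      # 3 significant figures
general_big = f"{big:g}"   # g switches to exponential form for large values

print(fixed, scientific, sig_figs, general_big)
```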
Requesting an input
You can request user input using the input() function. The input() function just reads whatever the user types and returns it as a string. For example
name = input("Please tell me your name\n")
print(f"Greetings {name}")
If you want to enter a series of numbers, and create a NumPy array out of them, you can do:
x_data = input("Please enter your x data with each entry separated by a ,\n")
# 1, 2, 3, 4, 5, 6, 7, 8
x_data = array( x_data.split(',') , dtype = float)
print(x_data)
The third line is needed as the split() method just creates a Python list of strings, which in turn is converted into a NumPy float array via the array() function.
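The conversion step can be tried without typing anything in by wrapping it in a small helper. The function name parse_numbers is hypothetical, and the sketch converts each entry with float() explicitly rather than relying on dtype=float; in a notebook you would pass it the string returned by input().

```python
import numpy as np

def parse_numbers(text):
    """Convert a comma-separated string such as "1, 2, 3" into a float array."""
    entries = text.split(",")                     # list of strings
    values = [float(entry) for entry in entries]  # float() ignores surrounding spaces
    return np.array(values)

# Stand-in for the string a user might type at the prompt
x_data = parse_numbers("1, 2, 3, 4.5")
print(x_data)
```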
File Input and Output
It is convenient to use computers to analyze large data sets that would be impractical to study using pen and paper. In order to do this, you need to be able to read in the data then perform whatever analysis you want before writing out the results. We have already learned how to make plots to show the results. Here we will deal with reading and writing text files, which provide a convenient way to store numerical data. There are other ways of storing data, but for now we will concentrate on text files that contain lists or spreadsheets of data. Here, we will cover the basics of reading and writing text files using Python.
Before starting on the examples below, make sure you are in an appropriate working directory, i.e. your files are somewhere that you will find them. As suggested previously, you should have created a /PX1224/Week_5/ folder where you can store this week's notebook and any output text files you will be producing.
Writing files
To begin, let's have a quick look at reading and writing a text file. To write to a file it must first be opened in write mode. The command will either open an existing file or (if there isn't a file of the name specified) create a new one. Here, let's open/create a new file called "rainfall.txt" in write mode and identify it as the file object f. We have added the .txt suffix to indicate that this is a plain text file.
f = open("rainfall.txt", 'w')
print(f)
The computer confirms that f is an open file with name "rainfall.txt" opened in write ('w') mode. As an example of writing something to the file, type the following.
f.write("My made up data\n")
f.write("Day\t Rainfall (mm)\n")
f.write("Monday\t 10\n")
f.write("Tuesday\t 10\n")
f.write("Wednesday\t 100\n")
Using f.write() is very similar to the print() command that you have been using, except that write() takes a single string and does not add a new line at the end for you. So, much of what you have done with print carries over directly to writing to a file. In particular, the "\t" characters add tabs to the file and the "\n" generates a new line. Once you are done writing to the file, you need to close it
f.close()
Trying to write to the file after that will result in an error, because the file is closed and no longer available. Make sure to close files once you are done with them, otherwise the changes you have made may not be saved.
Note: It is possible to have multiple files open at the same time, just make sure that you give the file objects different names. For example (there's no need to do this) you could open two files as
f = open( "rainfall.txt", "w" )
g = open( "snowfall.txt", "w" )
However, it is generally considered good practice to keep a file open for only as long as you need to. So, if you want to write some data you should open the file, write the data and then close the file.
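One common way to keep a file open only as long as needed is Python's with statement, which closes the file automatically when the indented block ends. This is an alternative to calling f.close() yourself, not something the worksheet requires; the file name "rainfall_demo.txt" is hypothetical.

```python
# Open, write and close in one block; the file is closed when the block ends
with open("rainfall_demo.txt", "w") as f:
    f.write("Day\tRainfall (mm)\n")
    f.write("Monday\t10\n")

is_closed = f.closed   # True: no explicit f.close() was needed

# Read the file back to check what was written
with open("rainfall_demo.txt", "r") as f:
    text = f.read()

print(is_closed)
print(text)
```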
Reading files
Now let's have a go at reading the simple file that you've just generated. Before we look at the file in Python, open it up with a program that you are familiar with. It should be possible to open the rainfall.txt file in Word, Excel or Notepad. You can also open the file by double-clicking on it in the Jupyter Dashboard. You will see that it contains the text that we typed in spread over 5 lines.
To open the file in python and read what it contains, type
f = open("rainfall.txt","r")
text = f.read()
f.close()
print(text)
The first line opens the file in read mode ("r"), the second line reads the contents of the file and stores it to the variable text and the third line closes the file. Finally, we print the contents of the file. You should see that text contains the data that you entered a few minutes ago.
If you didn't want to read everything at once, then (after opening the file) you can use the command
text=f.readline()
to read a single line. Typing it again will read in the next line, and so on. Again, remember to close the file when you're done.
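A short sketch of what repeated readline() calls return. The file "readline_demo.txt" is a hypothetical two-line file written first so that the example is self-contained.

```python
# Write a small file so there is something to read
f = open("readline_demo.txt", "w")
f.write("first line\n")
f.write("second line\n")
f.close()

f = open("readline_demo.txt", "r")
line1 = f.readline()   # the newline character at the end is kept
line2 = f.readline()   # the next call moves on to the next unread line
line3 = f.readline()   # an empty string signals the end of the file
f.close()

print(line1.strip())   # strip() removes the trailing newline before printing
print(line2.strip())
```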
Reading and writing NumPy arrays
OK, that was the very basics of file IO (input/output). For most of the course so far, we have been using arrays. You can read and write arrays using the basic commands that we just saw above, but it's quite a hassle to turn the text that you read into an array. Luckily, numpy has specialized functions that are designed to read and write numpy arrays to text files.
We will read in the data with the loadtxt() command, and write it back out with the savetxt() command. These commands have quite a few different options. We will explore them with the help of some examples. All of the files can be downloaded from their links, below.
One column of numbers
First, we'll take the most basic data file: just a list of numbers. Download the files "numbers.txt" and "numbers.csv" into your Week_5 folder. These were generated from an Excel spreadsheet using the save as command and selecting either text (tab separated) or comma separated (csv). You can open either of these in Excel. Alternatively, open them up using Notepad or in Jupyter. You will see that both of the files look the same (since there's only one entry per line).
To load in the numbers, make sure the two files are saved in the same folder as your current notebook, then simply do:
data1 = loadtxt("numbers.txt")
print(data1)
data2 = loadtxt("numbers.csv")
print(data2)
to load in the txt or csv files respectively. In both cases, you will have an array of numbers that you can analyze with all the methods that we have introduced previously.
To write the data to a file of a different name, use the savetxt() command
savetxt( "new_numbers.txt", data1, fmt="%.2f" )
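The effect of the fmt argument can be seen by saving an array and loading it straight back in. The array values and the file name "roundtrip_demo.txt" are made up for this sketch, and the np. prefix assumes import numpy as np.

```python
import numpy as np

data = np.array([1.234, 5.678, 9.1011])

# fmt="%.2f" writes each number with 2 decimal places, one number per line
np.savetxt("roundtrip_demo.txt", data, fmt="%.2f")

# The re-loaded values are the rounded ones stored in the file
reloaded = np.loadtxt("roundtrip_demo.txt")
print(reloaded)
```

The extra digits are lost once the file is written, so choose a format with enough precision for your data.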
Reading files containing text headers
A more diligent student recorded the same numbers, but actually remembered to put a heading so that we'd have a clue what the numbers were about. These are contained in the files named "bus", again saved in "bus.txt" and "bus.csv" format. The heading tells us that the numbers are actually the amount of time that the student had to wait for the bus. If you try to read this in, you get a ValueError reporting that the heading text (starting with "Wait") could not be converted to a float. This is because the loadtxt() function is expecting a list of numbers, not text.
In this case, you can tell loadtxt() to skip one row
bus_data = loadtxt( "bus.txt", skiprows=1)
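Both behaviours can be reproduced with a small stand-in file. The file "bus_demo.txt" and its contents are invented for this sketch and are not the course data.

```python
import numpy as np

# Create a file with one heading line followed by numbers
f = open("bus_demo.txt", "w")
f.write("Wait time (min)\n")
f.write("5\n12\n7.5\n")
f.close()

# Skipping the heading row works
bus_demo = np.loadtxt("bus_demo.txt", skiprows=1)

# Without skiprows, the heading text cannot be converted to a number
try:
    np.loadtxt("bus_demo.txt")
    failed = False
except ValueError:
    failed = True

print(bus_demo)
print(failed)
```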
Writing files containing text headers
It is also possible to write a file containing a header before the data. As an example, we can create a file containing the data and a header (which also gives the average wait):
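header = "Bus wait time (min)"
f = open("new_bus.txt", "w")
f.write(header + "\n")  # end each text line with \n so the data starts on a new line
f.write(f"average wait is {mean(bus_data)} minutes\n")
savetxt(f, bus_data, "%.2f")  # writes the array after the two text lines
f.close()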
Note: In this case, you are passing the file object, f, to savetxt() rather than the file name. This will write the array to the file at the current position in the file.
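As an alternative sketch, savetxt() also accepts a header argument that writes the heading for you. By default the heading is prefixed with "# ", and loadtxt() ignores lines starting with "#", so the file can be read back without skiprows. The data and the file name "bus_header_demo.txt" are invented for this illustration.

```python
import numpy as np

waits = np.array([5.0, 12.0, 7.5])

# header= adds a first line, prefixed with "# " by default
np.savetxt("bus_header_demo.txt", waits, fmt="%.2f", header="Bus wait time (min)")

# Look at the first line of the file as plain text
f = open("bus_header_demo.txt", "r")
first_line = f.readline()
f.close()

# loadtxt treats lines starting with "#" as comments and skips them
reloaded = np.loadtxt("bus_header_demo.txt")

print(first_line.strip())
print(reloaded)
```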
Exercises
For the exercises below, please write a program to perform the requested tasks for each question. When the demonstrators come to mark your work, they will ask you to run the programs you have written to demonstrate that they work.
1. Strings and Input/Output [2 marks]
Write a program that uses input() and asks the user to input
- a string
- a set of three numbers, separated by commas
Create a float array, that contains the three numbers above. Then write an output file containing:
- the original string, the string in upper case, everything before the first letter “e” in the string
- the first number as an integer, the second as a float with 2 decimal places, the third as a float in scientific notation with three sig figs.
Tip: Have the input part of the code in a separate Code Cell, so that you don't have to input the numbers every time.
2. Input/Output [1 ½ marks]
Make a copy of the straight line fitting script you made in week 4. Open it and add a data input section that will ask the user to input the following:
- The x and y data in comma separated lists that will then be stored in arrays.
- The label for the x axis.
- The label for the y axis.
(Remember not to call your variables xlabel and ylabel as this will overwrite the functions.) Modify the rest of your code to use these variables at the appropriate points.
Also, edit the code to print out the slope, intercept and errors to 3 significant figures.
Run your code, entering the Hooke's Law data to check that it still works and add comments to highlight the changes you have made.
3. Reading and analyzing 1 column data [1 ½ marks]
In this week's folder on LC there is a file called "two_gauss.txt" (also linked here). Copy this file to your current folder and read the data into Python. This is a large table of data with one column.
- How many data points are there?
- Make a histogram of the data with enough bins to resolve the features of the data.
- The data contains samples drawn from two different Gaussian distributions. By looking at the plot, choose a value to split the data into two parts corresponding to the two peaks. For each Gaussian, estimate
- The number of samples
- The mean
- The standard deviation