In the last few chapters we’ve seen how you can store values in variables and group them into collections of lists and dictionaries. As your scripts grow more complicated, you’ll find yourself working with a shared set of these data structures that you access repeatedly.
When just starting out, it’s natural to include your data right in the script (for instance by building a list of words or a dictionary mapping names to coordinates). But after these initialization blocks grow beyond a certain size, your script can get a little ungainly.
Once your data has taken on a life of its own, it makes sense to ‘serialize’ it (i.e., ‘write it out’) to a file. You can think of the file as holding the freeze-dried version of a variable. PlotDevice gives you an easy way to ‘rehydrate’ files using the read() command. In addition to reading Unicode text, read() can also unpack structured data from a pair of useful formats:
- json for arbitrarily-nested dictionaries & lists
- csv for spreadsheet-style ‘tabular’ data
When you call read() with a file path as its first arg, it will use the file extension to recognize .json or .csv files and give you options for how they are unpacked into variables.
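Reading plain text is the simplest case: read() just hands back the file’s contents as a Unicode string. Here’s a minimal sketch (the file name poem.txt is a hypothetical stand-in for any text file saved next to your script):

# read() returns the contents of an ordinary text file as a string
# ("poem.txt" is a hypothetical file sitting alongside the script)
poem = read("poem.txt")
print("%i characters" % len(poem))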
Using json for Data Structures
Over the last decade or so, the JSON format has become something of a lingua franca for exchanging structured data between systems. Its popularity owes a lot to how naturally its syntax and semantics map onto the native collection classes of the so-called ‘dynamic’ languages.
In fact, the representations of most values look identical to those used in Python literals. This makes json a great choice for moving custom data out of your scripts and into a file. For instance, here’s a list of numbers in json:
[1, 2, 3, 4, 5]
Looks just like a Python list, right? Now here’s a dictionary (note that it’s a little different):
{"address":"123 Elm St.", "apt":null, "garage":false}
The structure of the dictionary itself looks just like Python’s {"key":val} syntax, but you’ll notice the values are ‘spelled’ a little differently. First, Python’s None value should be written as null inside the json file. Second, true & false aren’t capitalized. When you read() from the file, these will be translated back to their Python equivalents automatically. Also keep in mind that strings in json always use double-quotes.
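To see the translation in action, suppose the little address dictionary above were saved to a file (call it address.json, a hypothetical name); reading it back gives you ordinary Python values:

# a sketch assuming the dictionary above was saved as "address.json"
home = read("address.json")
print(home["apt"])      # json’s null comes back as Python’s None
print(home["garage"])   # json’s false comes back as Python’s False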
Things start to get interesting when you combine nested containers to represent more-complex ‘records’. For instance, if we were writing a nostalgic computer game, we might deal with data whose json representation looked like:
{ "player":{ "room":"Flood Control Dam #3", "score":53, "inventory":["lamp", "leaflet", "sword"] }, "thief":{ "room":"West of House", "score":null, "inventory":["egg"] }, "moves":153 }
If we save this text into a file called zork.json, we can access it in PlotDevice by reading it into a variable. All of the ‘fields’ in the data can then be accessed using normal dictionary syntax:
game = read("zork.json") print("Game saved after %i moves" % game["moves"]) print("(%i points so far)" % game["player"]["score"]) if "lamp" in game["player"]["inventory"]: print("Player has not yet been eaten by a grue...") >>> Game saved after 153 moves >>> (53 points so far) >>> Player has not yet been eaten by a grue...
When you read() from a json file, you can also supply an optional keyword argument: dict. This lets you use one of the ‘specialized’ dictionaries like adict or odict if that’s more convenient. For instance, we can use an ‘Attribute Dictionary’ to cut down on some of the punctuation noise in the previous example:
game = read("zork.json", dict=adict) print("Game saved after %i moves" % game.moves) print("(%i points so far)" % game.player.score) if "lamp" in game.player.inventory: print("Player has not yet been eaten by a grue...")
An ‘Ordered Dictionary’ is an appropriate choice if preserving the file’s key ordering is important to you:
print("normal:", list(read("zork.json").keys())) print("ordered:", list(read("zork.json", dict=odict).keys())) >>> normal: [u'player', u'moves', u'thief'] >>> ordered: [u'player', u'thief', u'moves']
Behind the scenes, the read() command is using the json module from the standard library. Take a look at the official docs to learn about generating json as well as parsing it.
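Going the other direction isn’t something read() handles, but the standard library makes it easy. As a sketch, here’s how you might write the game dictionary from the example above back out to disk with json.dump():

import json

# serialize the `game` dictionary back to disk
# (indent=2 pretty-prints the output so the file stays human-readable)
with open("zork.json", "w") as f:
    json.dump(game, f, indent=2)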
Using csv for Tabular Data
While json is great for the kinds of structured data you generate through code, data that comes from the ‘real world’ often started its life in a spreadsheet or SQL table. Rather than having a variable structure, these formats are rigidly row- & column-oriented. Comma-Separated Value files represent these ‘tabular’ structures in a simple text format. Each line of text corresponds to a row, and columns are separated by "," characters.
Reading from rows & columns
In a somewhat contrived example, imagine we have a file called timestable.csv with the following set of rows and columns:
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30
0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40
0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50
0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60
0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70
0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80
0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90
0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
When we load this file using read(), it returns a list with one element for each of the rows. Each row is itself a list containing the column values as its elements. You can then use list-index notation to read from a particular cell or even pull out a ‘slice’ of values within a row:
mult = read('timestable.csv')
print("%i rows and %i columns" % (len(mult), len(mult[0])))
print("6 times 8 is", mult[6][8])
print("multiples of 9:", mult[9][2:10])
>>> 11 rows and 11 columns
>>> 6 times 8 is 48
>>> multiples of 9: [u'18', u'27', u'36', u'45', u'54', u'63', u'72', u'81']
Note that the parser doesn’t do any interpretation of the contents of the cells. Every value is read in as a unicode string (whether it’s number-like or not) and it’s up to you to call int() or float() to perform the conversion.
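For instance, to do arithmetic on the multiplication table you’d cast a row’s strings to integers first. A small sketch continuing the timestable example:

# every cell comes back as a string, so convert before doing math
nines = [int(v) for v in mult[9]]
print("sum of the nines row:", sum(nines))
>>> sum of the nines row: 495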
Reading from ‘records’
Often, each row represents a distinct ‘object’ and the columns correspond to ‘attributes’. For instance, here’s a csv-formatted table of data relating to the months of the year. Each row represents a single month and has attributes for the number of days and the name in a few different languages. We’ll save the following into a file called months.csv:
days, english, french, german
31, January, Janvier, Januar
28, February, Février, Februar
31, March, Mars, März
30, April, Avril, April
31, May, Mai, Mai
30, June, Juin, Juni
31, July, Juillet, Juli
31, August, Août, August
30, September, Septembre, September
31, October, Octobre, Oktober
30, November, Novembre, November
31, December, Décembre, Dezember
A common convention in csv files is to treat the first row of the document as a ‘header’ row. Rather than containing data, its values label what each column represents. If you set your file up in this manner, the read() command makes it easy to unpack each row into a dictionary with keys set to the names in the header row. All you need to do is set the cols arg to True:
year = read("months.csv", cols=True) dec = year[-1] print("%i months in a year" % len(year)) print("The month of", dec['english']) print("lasts for %i hours" % (24 * int(dec['days']))) >>> 12 months in a year >>> The month of December >>> lasts for 744 hours
Just as with json files, you can tell read() to use a particular kind of dictionary if you don’t want the standard one. An adict will let you access the columns with dot-notation:
year = read("months.csv", cols=True, dict=adict) print(year[9].german) >>> Oktober
An odict will let you address columns by order as well as by name:
year = read("months.csv", cols=True, dict=odict) june = year[5] print("column order:", list(june.keys())) print("column values:", list(june.values())) >>> column order: [u'days', u'english', u'french', u'german'] >>> column values: [u'30', u'June', u'Juin', u'Juni']