Sõnastik

Valige vasakul üks märksõnadest ...

Programming in JuliaSets and Dictionaries

Lugemise aeg: ~25 min

Sets

Sets are unordered collections of unique values. The main advantage of having a special type for sets is that the design of the data structure can be optimized for membership checking. Figuring out whether a given value is in an array requires going through each element in the array, so the amount of time it takes increases with the length of the array. By contrast, checking membership in a set can be done very quickly even if the set is large.

A = [1,2,3]
S = Set(A)
2 in S # evaluates to true
pop!(S, 2) # removes 2
push!(S, 11) # puts 11 in the set
2 in S # evaluates to false now

Exercise
Make a set which contains the first 10,000 prime numbers.

Hint: It suffices to look for primes among the first 110,000 integers. Compare how long it takes to check whether a given number is in that set to the time it takes to compute whether the number is prime using Primes.isprime.

using Primes: isprime
primes = # add code here
primes_set = Set(primes)
@time 98779 in primes
@time 98779 in primes_set
@time isprime(98779)

Solution. To get exactly 10,000 primes, we index the list obtained by filtering out the composite numbers:

using Primes: isprime
primes = [k for k in 2:110_000 if isprime(k)][1:10000]
primes_set = Set(primes)
@time 98779 in primes
@time 98779 in primes_set
@time isprime(98779)

Put the three methods in order from fastest to slowest:

List membership checking
Set membership checking
Computing from scratch

Dictionaries

The internal mechanism that sets use to check membership extremely fast is also useful when the information you want to retrieve is more complex than just true or false.

For example, suppose you want to store a collection of color names together with the RGB values for each one. We'll store the names as and the RGB triples as .

It's possible to do this by putting the names in an array and the values in a list of the same length:

names = ["fuchsia", "firebrick", "goldenrod"]
rgbs = [(256, 0, 256), (178, 34, 34), (218, 165, 32)]

However, this solution gets very tedious quickly. For example, modifying this structure requires .

The Julia data structure tailored to the problem of encoding a map from one finite set to another is called a dictionary. Dictionaries are created by supplying pairs to the Dict function. For example, the dictionary encoding the map described above looks like this:

rgb = Dict(
  "fuchsia" => (256, 0, 256),
  "firebrick" => (178, 34, 34),
  "goldenrod" => (218, 165, 32)
)

The domain elements "fuchsia", "firebrick" and "goldenrod" are called the keys of the dictionary, and the codomain elements (256,0,256), (178,34,34), and (218,165,32) are called the values.

We can also form new dictionaries from lists of pairs using the dict function:

Dict([
  ("fuchsia", (256, 0, 256)),
  ("firebrick", (178, 34, 34)),
  ("goldenrod", (218, 165, 32))
])

We can perform a dictionary lookup using indexing syntax: rgb["fuchsia"] returns (256,0,256). We can also change the value associated with a given key or introduce a new key-value pair using indexing and assignment:

rgb = Dict(
  "fuchsia" => (256, 0, 256),
  "firebrick" => (178, 34, 34),
  "goldenrod" => (218, 165, 32)
)
rgb["crimson"] = (220, 20, 60)
rgb

keys and values functions may be used to obtain the keys and values.

rgb = Dict(
  "fuchsia" => (256, 0, 256),
  "firebrick" => (178, 34, 34),
  "goldenrod" => (218, 165, 32)
)
collect(keys(rgb))

Exercise
Consider a dictionary which encodes flight arrival times:

import Dates
arrival_times = Dict(
  "JetBlue 924" => Dates.Time(7,9),
  "United 1282" => Dates.Time(7,42),
  "Southwest 196" => Dates.Time(7,3)
)

You can most easily use this dictionary to .

Suppose you want to reverse the lookup direction: for any given time, you want to see which flight arrives at that time. One problem is that .

Assuming that the codomain values are distinct, however, you can form a new dictionary that allows you to look up keys for values by using an array comprehension that iterates over the key-value pairs of the dictionary (obtainable using the pairs function).

Implement this idea in the block below. Check that your dictionary works by indexing it with Dates.time(7,9).

import Dates
arrival_times = Dict(
  "JetBlue 924" => Dates.Time(7,9),
  "United 1282" => Dates.Time(7,42),
  "Southwest 196" => Dates.Time(7,3)
)    

Solution. We use the Dict function to convert the list of pairs back into a dictionary: Dict([(b,a) for (a,b) in pairs(arrival_times)]).

Exercises

Exercise
You can construct dictionaries using a comprehension in Julia. For example, here's a dictionary that maps each one-digit positive integer to its square:

square_dict = Dict(k => k*k for k in 1:9)

Use a dictionary comprehension to make a dictionary which maps each of the first 100 powers of 2 to its units digit. Note: you'll need to use big(2) instead of 2 to calculate its 100th power, because 2^{100} is larger than the largest 64-bit integer.

Solution. We convert to a string, get the last character, and convert back to an integer:

  Dict([big(2)^k => parse(Int64, string(big(2)^k)[end]) for k in 1:100])

Exercise
Suppose you want to store student IDs in a part of a web application where the main thing you need to do is check whether an ID input by a student is a valid student ID (so you can flag it if it has been mistyped). Among the given options, the best data structure for this purpose would be a .

Solution. This is an ideal use case for sets. Lists and tuples will be slower for checking membership, and dictionaries aren't quite appropriate because it isn't clear what the values would be.

Bruno
Bruno Bruno