Matthew Yee-King MSc thesis

Newsflash - this was first written in 2000 and based on a Java Applet. In 2015, I reimplemented it as a JavaScript web application using the Web Audio API.


Implementation

PREAMBLE

This system has been programmed in Java with the exception of the server-side software, which is written in Perl. It can run on any medium specification PC or Mac system (200MHz +) with the Internet Explorer web browser (v5+) and the JSyn plug-in [4]. AudioServe has a simple Graphical User Interface that makes it easy and intuitive to use. The user interacts with the program through a set of clickable lists and buttons. To overcome current speed limitations of the Java virtual machine architecture the program uses libraries developed by Phil Burk [10] to generate its audio output. These libraries use the Java Native Interface [11] to run fast, CPU-native C code to generate audio in real time. The Java client program is engineered with a hierarchical, object-oriented design, where the population is an object, the members of the population are objects, their genomes are objects and so on. The server-side Perl software consists of a cgi-bin script that serves requests for immigrants and accepts submissions of fit circuits.

user perspective

>>the GUI

This has three clickable lists, three information readout panels, several buttons and a visual circuit display. These features are in a compact single screen layout which is initially quite complex but soon becomes intuitive once the user gains an understanding of what the program is doing.

Clickable lists:

Local population list [Fig 1 reference 1]

This list represents the members of the current generation. Each member of the population has a user definable name and it can be heard by clicking on its name. Clicking on its name also displays a diagram of the circuit in the circuit display window. The list format is used as it provides a compact display of the population including the names of the circuits and a rapid way of previewing the members of the population since the user can use the arrow keys to flick through the sounds.

Immigrant list [Fig 1 reference 11]

This list represents the circuits being sent from the server to the program. Their names are displayed and they can be clicked on to preview them, in a similar fashion to the local population list.

Generation list [Fig 1 reference 8]

The generation list represents the last 10 generations. At the bottom of the list is a generation labelled latest. Clicking on one of these generations recalls that version of the population from memory. This allows the user to go backwards and forwards through the generations of circuits they create.

Information readout panels:

Network info panel [Fig 1 reference 9]

This panel displays messages relating to network activity. When a circuit is sent or received a short message is displayed here. If there is an error connecting to the server, a warning is displayed here.

General info panel [Fig 1 reference 12]

This panel displays general information about program events. When a circuit is playing it displays a message to this effect along with some parameters relating to the circuit (e.g. grid size). When the program is generating new populations, the display informs the user of the current activity. This means that the user knows what is going on during longer operations and which circuit they are listening to at all times.

Circuits to be sent panel [Fig 1 reference 10]

This contains the list of circuits that are waiting to be sent to the server. As discussed in the programming details section, the network system runs in a separate thread so it does not interrupt the main program flow. This means the circuits are not sent immediately; they are buffered. This display keeps the user informed as to what is going to be sent.

Buttons and controls:

Restart

Generates a fresh population of random circuits

Mutate

Generates a population of mutant versions of whatever circuit is playing.

Submit

Sends the currently selected member of the local population to the immigration system.

Name

Sets the name of the current circuit in the local population to the displayed text.

Mutation Rate

This slider sets the mutation rate from 0-100%. This is the probability of a mutation occurring at any locus in the genome. This allows the user to control the rate of evolution. If they have a good sound they can set the mutation rate low and gradually tweak it. If they are starting from scratch they can make swift movements with a high mutation rate until the population starts to contain reasonable circuits.

Circuit display window:

This graphical display panel was originally used to debug the code but I felt it added something to the experience of using the software and left it in. It displays the layout of the modules in a circuit and the connection ranges for each module. Fig 2 shows a screen shot.

>>a session with AudioServe

A typical session with AudioServe may run something like this:

Program initialisation:

The GUI fires up. The local population is initialised with random circuits. Several immigrants are downloaded into the immigrant population. The process of immigrant downloading continues on a timer so a new immigrant pops on the end of the list every 15 seconds or so.

The user starts using the controls:

Several circuits from the available set are auditioned and one is selected for mutation. The local population is replaced with mutant versions of the selected sound. These are auditioned until the user finds one they prefer to the parent sound. Mutate is clicked again, and so on. At some point the user gets a sound they really like. They click submit and it is sent to the server. The server adds the necessary details to the database.

functional description of the machinery

>>the circuit

As previously mentioned, the sounds AudioServe makes are generated from modular circuits based on established principles of FM/AM synthesis [2]. This system was chosen after the author's previous success with a similar system called Audiomorph [12]. This work showed that a circuit made from as few as 4 oscillators could generate complex, dynamic sounds. The AudioServe circuits represent the development of these ideas.

The circuits are made up from a selection of interconnected modules in a 2-d space. The modules can be sine or square wave oscillators or state variable filters (see fig 3). The sine wave is classically used as a volume or frequency modulating oscillator in an FM circuit due to its smooth shape. For example, in its volume modulating guise it can be used to generate pleasing tremolo effects, and in its frequency modulating guise, vibrato. The square wave was chosen for its dense harmonic content. Fig 6 shows a spectrum analysis of a square wave with a fundamental of 400 hertz. The wide range of audible frequencies in a square wave is quite apparent from this. The filter modules were chosen to complement this feature of square waves. In subtractive audio synthesis setups, filters are used to subtract or remove some frequencies from the sound. Resonant filters subtract some frequencies from the sound and emphasise or resonate others through feedback. The state variable filters used here have frequency cut off and resonance control inputs and they can work in low, high or band pass modes. Fig 7 shows how these forms of filtering work. Low pass filters make the sound appear increasingly muffled as the frequency cut off is lowered (since the high frequencies are blocked). Band pass filters can totally change the nature of the sound as the cut off is swept. High pass filters make the sound appear increasingly tinny as the frequency cut off is increased. The combined potential of these modules means the circuits can make a varied selection of sounds.
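The three filter modes can be illustrated with the textbook Chamberlin form of the state variable filter. This is a minimal sketch and not the JSyn implementation: the class name, the constructor parameters and the use of a Q-style resonance value are all assumptions made for illustration.

```java
/** Hypothetical sketch of a Chamberlin state variable filter (not JSyn's code). */
final class StateVariableFilter {
    private final double f;        // frequency coefficient derived from the cut off
    private final double damping;  // 1/Q: lower damping means stronger resonance
    private double low;            // low pass state
    private double band;           // band pass state

    StateVariableFilter(double cutoffHz, double sampleRate, double resonanceQ) {
        this.f = 2.0 * Math.sin(Math.PI * cutoffHz / sampleRate);
        this.damping = 1.0 / resonanceQ;
    }

    /** Processes one sample and returns {lowPass, bandPass, highPass}. */
    double[] process(double in) {
        low += f * band;                              // integrate the band pass output
        double high = in - low - damping * band;      // what is left is the high pass
        band += f * high;                             // integrate the high pass output
        return new double[] {low, band, high};
    }
}
```

All three outputs are available on every sample, which is why one module can serve in low, high or band pass mode. The internal feedback through `band` is the localised, controlled feedback that the resonance input adjusts.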

The audio output is connected to an output on one of the modules in the circuit so it can be heard. This is the first module that becomes instantiated. The developmental process biases the first module to have the most connections made to it, so it is likely to be producing the most complex waveform (see the developmental system section further down).

>>the genetic encoding scheme

The genome consists of an array of integer values. They fall in the range 0-360 and are normalised according to what the value will be used for, after [13]. The length of the genome is variable, but the work reported here did not exploit that feature. The genome can be viewed as a collection of genes where each gene defines the properties of a module in the circuit. A gene defines these properties (see fig 4) :

1. The module's x, y position in a 2-d grid

These two integers place the module in the grid. Their values are normalised so they fall in the range 0-maximum grid size.

2. A numerical ID that defines what type of module it will be

As mentioned above, there are three types of module in the present architecture. Fig 3 shows these modules. Note that they have different numbers of in and out ports. Parameter 6 defines which port the module makes connections to. The module ID value is normalised between 0 and 2.

3. Two angles that define bearings for lines that make the sides of the connection segment (connection range)

The circuit undergoes a developmental process where each module is instantiated and it scans a specified connection range for other modules. This connection range takes the form of a segment. The bearings of the two lines bounding the segment are these two integers, normalised between 0 and 360.

4. The radius of the connection range

This parameter defines the lengths of the sides of the segments. It is normalised between 0 and the maximum radius value. (A maximum radius of 2 to 4 times the maximum grid size worked well.)

5. The bias to place on connections made to other modules in the circuit

When a connection is made, a bias is placed on it. The bias is the same for all connections made by a particular module and this parameter defines its value. It is converted to a double variable normalised between 0 and 1. Fig 5 shows how the bias is implemented.

6. The port to connect to on other modules

This parameter dictates which port a given module will connect to when another module falls in its connection range. Since different modules have different numbers of ports, it is normalised between 0 and the number of ports available on the target module.
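The six properties above can be read off the genome roughly as follows. This is a sketch under stated assumptions, not the code from Appendix A: it assumes eight loci per gene (x, y, type, two bearings, radius, bias, port), linear scaling of each raw 0-360 value into its target range, and hypothetical class and method names.

```java
/** Decoded module parameters (hypothetical holder class). */
final class ModuleGene {
    final int x, y, type, angleA, angleB, radius, rawPort;
    final double bias;

    ModuleGene(int x, int y, int type, int angleA, int angleB,
               int radius, double bias, int rawPort) {
        this.x = x; this.y = y; this.type = type;
        this.angleA = angleA; this.angleB = angleB;
        this.radius = radius; this.bias = bias; this.rawPort = rawPort;
    }
}

/** Hypothetical decoder that normalises raw loci according to their use. */
final class GeneDecoder {
    static final int RAW_MAX = 360;      // raw locus range stated in the text
    static final int LOCI_PER_GENE = 8;  // assumption inferred from the property list

    /** Scales a raw value into 0..maxInclusive using equal-width bins. */
    static int toInt(int raw, int maxInclusive) {
        return raw * (maxInclusive + 1) / (RAW_MAX + 1);
    }

    /** Scales a raw value into a double between 0 and 1. */
    static double toUnit(int raw) {
        return raw / (double) RAW_MAX;
    }

    static ModuleGene decode(int[] genome, int geneIndex, int maxCoord, int maxRadius) {
        int o = geneIndex * LOCI_PER_GENE;
        return new ModuleGene(
                toInt(genome[o], maxCoord),       // 1. x position
                toInt(genome[o + 1], maxCoord),   // 1. y position
                toInt(genome[o + 2], 2),          // 2. module type, 0..2
                genome[o + 3],                    // 3. first bearing, already 0..360
                genome[o + 4],                    // 3. second bearing
                toInt(genome[o + 5], maxRadius),  // 4. connection radius
                toUnit(genome[o + 6]),            // 5. connection bias
                genome[o + 7]);                   // 6. port, resolved per target
    }

    /** The port can only be resolved once the target's port count is known. */
    static int portIndex(int rawPort, int portCount) {
        return toInt(rawPort, portCount - 1);
    }
}
```

Keeping the port value raw until a connection is made reflects the point in item 6: the valid range depends on the target module, so it cannot be fixed when the gene is first read.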

The normalised parameter integer string genetic encoding scheme is popular amongst GA researchers as it can undergo mutation and crossover without becoming garbled. In AudioServe I chose this scheme for this robustness as well as for the ease of transmitting and storing such data. A more complex, symbolic genome may have been more difficult to transmit across the internet and store on the server. Further, the integer string genome has proven capable of encoding circuits that produce interesting activity patterns [13].

>>the developmental system

Basics:

The circuits undergo a deterministic developmental process. Fig 5 shows two stages in the development of a circuit. Each module is instantiated in the grid in turn. If a module falls at the same grid reference as a previously created module it is not instantiated. This leads to dormant genetic information, i.e. genetic information that is not expressed. A later mutation may awaken this information: if the competing module moves, the information will be expressed. When it is instantiated, a module makes all its connections. This means that modules can only connect to and modulate previously created modules, as further connections are not made to subsequently instantiated modules.
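The developmental pass can be sketched as below. This is an illustration only, with hypothetical class names. It assumes bearings are measured anticlockwise from the positive x axis and that the segment sweeps anticlockwise from the first bearing to the second; the thesis does not state either convention.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Position and connection range of one module (hypothetical holder class). */
final class DevModule {
    final int x, y, angleA, angleB, radius;

    DevModule(int x, int y, int angleA, int angleB, int radius) {
        this.x = x; this.y = y;
        this.angleA = angleA; this.angleB = angleB; this.radius = radius;
    }
}

final class Development {
    /** True if grid point (tx, ty) lies inside the connection segment of m. */
    static boolean inRange(DevModule m, int tx, int ty) {
        int dx = tx - m.x;
        int dy = ty - m.y;
        if (dx == 0 && dy == 0) return false;                       // no self connection
        if (dx * dx + dy * dy > m.radius * m.radius) return false;  // beyond the radius
        double bearing = Math.toDegrees(Math.atan2(dy, dx));
        if (bearing < 0) bearing += 360.0;
        // Assumed convention: sweep anticlockwise from angleA to angleB.
        // Equal bearings (including 0 and 360) give a zero-width segment here.
        double sweep = ((m.angleB - m.angleA) % 360 + 360) % 360;
        double offset = ((bearing - m.angleA) % 360 + 360) % 360;
        return offset <= sweep;
    }

    /** Returns wires as {sourceIndex, targetIndex} pairs over the instantiated modules. */
    static List<int[]> develop(List<DevModule> genes) {
        List<DevModule> placed = new ArrayList<>();
        Set<String> occupied = new HashSet<>();
        List<int[]> wires = new ArrayList<>();
        for (DevModule g : genes) {
            if (!occupied.add(g.x + "," + g.y)) continue;  // cell taken: gene stays dormant
            for (int i = 0; i < placed.size(); i++) {      // scan earlier modules only
                DevModule t = placed.get(i);
                if (inRange(g, t.x, t.y)) wires.add(new int[] {placed.size(), i});
            }
            placed.add(g);
        }
        return wires;
    }
}
```

Because each new module scans only the modules already placed, every wire runs from a later module to an earlier one, so no loop can form.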

Feedback:

Feedback (self modulation directly or via modulation of a module that is modulating the self) is therefore impossible: there are just successive layers of modulation with no retrospective connections. Earlier work [12] suggested feedback tends to kill the signal in FM circuits. There is another kind of feedback available in the circuits however. The resonance control of the state variable filters does allow a little localised feedback to occur. This happens in a controlled way within the architecture of the module itself, though.

Factors affecting connectivity:

The first module in the genome is instantiated in the grid, then the next. If the first module falls in the connection range of the second, a connection is made. Another module is added. If the second or first modules fall in the connection range of the third, connection(s) are made, and so on. It has already been stated that the developmental process is biased such that more connections are likely to be made to the first module. This is intuitively clear: the first module connects to no others. The second module may connect to the first, the third module may connect to the second and the first. If m is the total number of modules, the first module has an opportunity of gaining modulation

m - 1

times and the second module

m - 2

times etc. This makes the output of the first module likely to be the most heavily modulated hence making it likely to produce the most interesting waveform. In a non-toroidal grid with limited radii for connection ranges, positions near the centre of the grid are likely to fall in more connection ranges. Of all the possible connection ranges coming from all the possible grid positions, more include the centre region of the grid.

>>the population system

Client program:

The AudioServe client maintains a local population, an immigrant population and a population history. The local population is a set of 25 sounds that can be heard and selected for mutation. Its member sounds only change as a result of user input. The immigrant population is a set of 8 sounds that can also be heard and selected for mutation. Its member sounds are only changed automatically by the program, which keeps this immigrant population in constant flux by requesting new immigrants from the server every 15 seconds or so. When the program is first started it gets 8 immigrants from the server. From then on, each time a new immigrant comes in, the oldest immigrant is removed from this population. If the user clicks mutate with an immigrant sound or a local population sound running, the local population is entirely replaced with mutant versions of the audible sound. The population history is a collection of the last 10 generations of the local population. Each time the local population is replaced with mutants of a chosen sound, the program stores the old population. This allows the user to move backwards and forwards through the mutation steps they have taken. When a different generation is selected it replaces the local population, in the sense that its members become available for auditioning rather than the current local population's members. The most recent generation is always labelled in the GUI as latest. Calling mutate on a member of an older generation than the latest one will generate a population which becomes the latest one.
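The immigrant population and the population history both behave like fixed-size, oldest-out stores. A minimal sketch of that bookkeeping follows. The class names are hypothetical, circuits are stood in for by their names, and whether the stored history includes the latest generation is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Fixed-capacity store that evicts its oldest entry when full (hypothetical helper). */
final class BoundedStore<T> {
    private final int capacity;
    private final Deque<T> items = new ArrayDeque<>();

    BoundedStore(int capacity) {
        this.capacity = capacity;
    }

    void add(T item) {
        if (items.size() == capacity) {
            items.removeFirst();  // drop the oldest entry
        }
        items.addLast(item);
    }

    /** Returns a copy of the contents, oldest first. */
    List<T> snapshot() {
        return new ArrayList<>(items);
    }
}

/** Sketch of the three client-side populations. */
final class ClientPopulations {
    static final int IMMIGRANT_SIZE = 8;  // sizes taken from the text
    static final int HISTORY_SIZE = 10;

    final BoundedStore<String> immigrants = new BoundedStore<>(IMMIGRANT_SIZE);
    final BoundedStore<List<String>> history = new BoundedStore<>(HISTORY_SIZE);
    private List<String> local = new ArrayList<>();

    /** Called when the user clicks mutate: the old generation is archived first. */
    void replaceLocal(List<String> mutants) {
        history.add(local);
        local = new ArrayList<>(mutants);
    }

    List<String> local() {
        return new ArrayList<>(local);
    }
}
```

Using one eviction rule for both stores keeps memory use bounded however long a session runs.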

A small local population system with dynamics where each generation completely replaces the previous is a proven way to implement populations in interactive evolution systems [12, 14, 15]. With respect to the local population size, the user can only be expected to effectively audition a small number of candidates at each iteration of the GA. In terms of the population dynamics, this is certainly not the only way to do it. Options involving persistent local population members are discussed in the user feedback section.

Server program:

The AudioServe server program works on a text file database of genome information that represents a persistent, central population. When the client requests an immigrant, the server program chooses a random genome from the database and sends it back. When the client submits a fit genome to the server its details are added to the database. The use of the server population as a library of ready-made circuits allows the user to start from a fit sound rather than having to evolve from scratch each time. This is similar to the system in [15] but it has a deliberately more random feel. This randomness is implemented to take control away from the user a little and give the system a dynamic of its own. This makes the system less predictable as the user does not know what new sounds may appear in their immigrant population.

Island hopping:

The client programs can be viewed as islands which can be seeded by the central population. The server program provides an island hopping dynamic and a persistent store for fit genomes. The decision as to which genomes are sent back from the islands to the server is made by the user based on their opinion of the quality of the sounds. This means there should be a flow of novel, fit genomes back to the server evolved by varying fitness criteria. In practice the quality of the genomes that have been sent to the server is high and the sounds are very varied. Viewed as a whole, the system could be summarised thus:

A transient set of local populations with variable, user dependent fitness criteria submit highly fit genomes to a persistent central population that can optionally seed further local populations. The result is that the system provides a good degree of parallel hyperplane sampling combined with transient, local hill climbing search stages.

>>the mutation function

Point mutation:

In the current implementation mutation is carried out by a fairly simple point mutation function. Previous work by the author with these kinds of systems [12] has proved point mutation to be effective in providing the required evolutionary properties: a workably smooth fitness landscape with tangibly gradual change in the sounds. When a user mutates a sound they expect to be presented with a population of mutants that mostly produce sounds related to the parent sound. Having a smooth fitness landscape implies a gradual change in the phenotype as the genotype is adjusted (e.g. by mutation). Such a landscape should yield tangible variants of the parental phenotype. Moving around a rough fitness landscape is characterised by sudden large changes in the phenotype. This is not a desirable property when the aim is to provide a user with sounds that appear to evolve in a gradual way.

Implementation details:

The mutation function parses the genome generating a random number (0-1) at each point. If this number is smaller than the mutation rate, that locus will be mutated. The mutation can increase or decrease the value of the integer at this locus. Another random number decides this:

If > 0.5 -> go up

If <= 0.5 -> go down

To decide the magnitude of the change, the distance from the original random number to the mutation rate is calculated as a percentage of the distance from 0 to the mutation rate. This percentage is then taken of the smaller of these two:

1. Maximum locus value minus locus value

2. Locus value

This value is added or subtracted from the locus according to the earlier increase/ decrease decision.

The result of this is a conservational dynamic, with three related effects:

1. high locus values decrease or increase gradually under mutation

2. low locus values increase or decrease gradually.

3. medium locus values can go either way.
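The mutation rule described above can be written out as follows. This is a sketch reconstructed from the prose rather than the code in Appendix A. The class and method names are hypothetical, the mutation rate is expressed as a fraction between 0 and 1, and rounding the change to the nearest integer is an assumption.

```java
import java.util.Random;

/** Hypothetical reconstruction of the point mutation rule described in the text. */
final class PointMutation {
    static final int MAX_LOCUS = 360;  // upper bound of a raw locus value

    /**
     * Mutates one locus.
     * r is the random number (0-1) that decides whether the locus mutates;
     * d is the second random number that decides the direction.
     */
    static int mutateLocus(int value, double rate, double r, double d) {
        if (r >= rate) return value;  // no mutation at this locus
        // Distance from r to the rate, as a fraction of the distance from 0 to the rate.
        double fraction = (rate - r) / rate;
        // The fraction is taken of the smaller of (max - value) and value.
        int room = Math.min(MAX_LOCUS - value, value);
        int delta = (int) Math.round(fraction * room);
        return d > 0.5 ? value + delta : value - delta;
    }

    /** Returns a mutated copy of the genome; the parent is left untouched. */
    static int[] pointMutate(int[] genome, double rate, Random rng) {
        int[] child = genome.clone();
        for (int i = 0; i < child.length; i++) {
            child[i] = mutateLocus(genome[i], rate, rng.nextDouble(), rng.nextDouble());
        }
        return child;
    }
}
```

For example, with a rate of 0.5 and a draw of 0.25, a locus at 100 moves by half of min(260, 100), so it becomes 150 or 50. Read literally, the rule can never push a value outside 0-360, and a locus sitting exactly at 0 or 360 has no room to move at all.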

This is not the only possible mutation function that the system can support. Other options are discussed in the final chapter.

programming details

The java program code is included in full in Appendix A.

>>the JSyn SDK

The JSyn SDK provides Java developers with access to real time audio synthesis through their Java code. Since Java runs in a virtual machine it can be a bit slow on lower spec systems. Real time audio synthesis is a CPU intensive process and JSyn overcomes this by performing the audio synthesis with native C code that runs outside the virtual machine. A plug-in for web browsers allows JSyn applets that generate audio to be embedded in web pages. To access the JSyn libraries the Java Native Interface (JNI) [11] is used. This allows instantiation of and communication with non-Java objects from Java code. In JSyn these non-Java objects are the modules that make up the audio circuit. More information about JSyn is available in [10].

>>the major objects in the java client program

The program is engineered with a hierarchical structure:

AudioServe:

The top level object. This is essentially formed from GUI and event code. It translates GUI events into calls to lower level objects to perform functions such as mutation, circuit auditioning, restoration of old populations etc.

Major fields:

Population object

This is the local population referred to above. Operations that affect the local population are mediated through this

PopulationHistory object

This is the population history. This is updated whenever the local population changes. Also it provides access to the stored generations

Immigration object

This object deals with incoming and outgoing circuits. It maintains the immigrant population and talks to the server.

Population:

This consists of an array of Individual objects and a set of methods to manipulate them. It provides methods to change fields in the individuals such as name, and methods to mutate the population.

Individual:

This consists of a Circuit, a CircuitSpec, a CircuitGraphic and a Genome object. The Circuit object is used to make the sound. The Genome deals with the genetic information. The CircuitGraphic deals with the graphic display of circuits. The CircuitSpec object is an interpreted form of the genome that is used by several objects.

Circuit:

This object has the JSyn fields in it. It is used to generate the audio. It is instantiated when a sound is auditioned and destroyed when a different sound is auditioned. This is necessary as Circuit objects take up a lot of memory and CPU time. It has another important object as one of its fields, the PortMatrix object. The circuit contains the oscillators and filters in a circuit; the PortMatrix contains the wiring system in the circuit.

PortMatrix:

The wiring system in a circuit. It consists of a 3-d array of pointers to in and out ports on the modules in the circuit. It can be seen as a virtual patch bay. The first and second dimensions are x and y in the grid. The third dimension is a stack of ports. The Circuit build method generates a PortMatrix at the same time as instantiating the main modules in the circuit. The PortMatrix object provides methods to add wires and modules to the patch bay. The flexibility of this core object makes the whole system highly extensible. It is easy to add extra module types or increase the size of the grid.
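The patch bay idea can be sketched with a plain 3-d array. This is an illustration only: the class name, the method signatures and the use of strings in place of JSyn port objects are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical patch bay: ports[x][y][n] is the n-th port of the module at (x, y). */
final class PortMatrixSketch {
    private final String[][][] ports;
    private final List<String> wires = new ArrayList<>();

    PortMatrixSketch(int gridSize, int maxPorts) {
        ports = new String[gridSize][gridSize][maxPorts];
    }

    /** Registers a module's ports in the stack at its grid cell. */
    void addModule(int x, int y, String... portNames) {
        System.arraycopy(portNames, 0, ports[x][y], 0, portNames.length);
    }

    /** Adds a wire between two ports; returns false if either port does not exist. */
    boolean addWire(int fromX, int fromY, int fromPort, int toX, int toY, int toPort) {
        String source = ports[fromX][fromY][fromPort];
        String target = ports[toX][toY][toPort];
        if (source == null || target == null) {
            return false;  // empty cell or unused slot in the port stack
        }
        wires.add(source + " -> " + target);
        return true;
    }

    List<String> wires() {
        return new ArrayList<>(wires);
    }
}
```

Addressing ports purely by grid position and stack index is what would make new module types cheap to add: a new type only needs to register a different number of ports.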

Genome:

This contains the integer array that is the genome. Its most important methods are pointMutate and develop. pointMutate generates a mutant version of the genome. develop generates a CircuitSpec object from the genome. An integer string is used as it is easy to manipulate by operations such as mutation.

CircuitSpec:

An interpreted version of the genome used by the CircuitGraphic and Circuit objects. Its design is optimised to provide quick access to circuit parameters. It has arrays containing the parameters for each module and an array of parameters for each wire in the circuit. It is useful as it keeps the genome interpreting code in one place and it makes generation of running circuits and the circuit display quicker than it would be were the information taken directly from the genome.

Immigration:

This object maintains the immigrant population, sends circuits to the server and receives immigrants from it. It has two sorts of methods: network connection methods and normal methods. This object runs independently of the main program in its own thread so the (potentially time consuming) network methods can run without interrupting the flow of the main program.
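The buffering of outgoing circuits on a separate thread can be sketched with a blocking queue. This is an assumption about structure, not the Appendix A code. The names are hypothetical, the modern `java.util.concurrent` classes used here postdate the original applet, and the network call is replaced by a stand-in that records what would have been sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical worker that drains a buffer of outgoing circuits off the GUI thread. */
final class OutboxWorker implements Runnable {
    private static final String STOP = "\u0000stop";  // sentinel that ends the loop
    private final BlockingQueue<String> outbox = new LinkedBlockingQueue<>();
    final List<String> sent = new CopyOnWriteArrayList<>();

    /** Called from the GUI thread; returns immediately. */
    void submit(String genome) {
        outbox.add(genome);
    }

    void shutdown() {
        outbox.add(STOP);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String next = outbox.take();  // blocks until something is queued
                if (STOP.equals(next)) return;
                sent.add(next);               // stand-in for the slow network send
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Small driver: queues the genomes, waits for the worker, returns what was sent. */
    static List<String> demo(String... genomes) {
        OutboxWorker worker = new OutboxWorker();
        Thread thread = new Thread(worker, "immigration");
        thread.start();
        for (String g : genomes) worker.submit(g);
        worker.shutdown();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(worker.sent);
    }
}
```

The contents of `outbox` at any moment correspond to what the "circuits to be sent" panel displays.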

PopulationHistory:

This consists of an array of Population objects. These objects are stored in a compact form to keep memory usage down. When the user requests a different generation to the one they currently have, the appropriate Population object is expanded and returned.

>>the server-side system

The server-side system consists of a Perl script and a text file. The Perl script is in the cgi-bin of a web server. It runs in two modes, submit and request. If the user submits an Individual to the server, the client program will send the essential details of that individual to the Perl script by the POST method. Upon receipt of these details the Perl script formats them and appends them to the end of the text file. When the client program requests an individual's details (an immigrant request) the Perl script reads the text file into memory, chooses a random entry from the file and sends it back to the client as text.
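The two modes can be mirrored in a few lines. The real server is the Perl script in Appendix B; the Java sketch below only illustrates the logic, with an in-memory list standing in for the text file. The class name and the "name:v1,v2,..." record format are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical in-memory stand-in for the server's text file database. */
final class GenomeDatabase {
    private final List<String> lines = new ArrayList<>();
    private final Random rng;

    GenomeDatabase(Random rng) {
        this.rng = rng;
    }

    /** Formats one record; the layout is an assumption, not the real file format. */
    static String format(String name, int[] genome) {
        StringBuilder sb = new StringBuilder(name).append(':');
        for (int i = 0; i < genome.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(genome[i]);
        }
        return sb.toString();
    }

    /** Submit mode: format the details and append them to the end of the store. */
    void submit(String name, int[] genome) {
        lines.add(format(name, genome));
    }

    /** Request mode: return one entry chosen uniformly at random, or "" if empty. */
    String request() {
        if (lines.isEmpty()) return "";
        return lines.get(rng.nextInt(lines.size()));
    }

    int size() {
        return lines.size();
    }
}
```

Since the store is append-only and selection is uniform, every submitted circuit is equally likely to be sent out as an immigrant.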

The current version of the text file is included in Appendix C and the Perl script is in Appendix B.

>>testing

Different platforms:

It is important to test any application aimed at a web audience on a variety of platforms. Here is a list of machines that I have personally run the software happily on:

Windows 2000 Server PC with twin Pentium III 600MHz CPUs (development machine)

Windows 95 PC with Pentium II 233MHz CPU.

Apple PowerBook with MacOS 9 and G3 400MHz CPU.

Apple iBook with MacOS 9 and G3 266MHz CPU.

Most digital audio software recommends a spec at least as good as the second machine in this list. The AudioServe client software ran flawlessly on this machine.

Stability:

Since the client software is for public use rather than for experimental purposes it has to be reasonably crash proof. A proof of its stability is that I have run 8 versions of it at the same time on my development machine without any problems (apart from an overheated PSU!). The classic test of clicking on as many things as possible in a short space of time failed to crash it. Also running the software for many successive generations fails to cause any memory or other problems. This is proof of the stability of the garbage collection system in Java as much as anything else. The Perl software has been running without problems since it was uploaded to the web server.