More about connection problems

Introduction to connections

Solocontutti almost always works really well, with great quality and low latency, but sometimes there are problems, and when that happens there are plenty of tools to solve them. This post follows on from the previous post and takes a deep dive into what causes problems and how they can be resolved.

Just a quick warning - this does not cover the process of establishing a connection; that's a whole other topic.

This post is aimed at people who don't already understand digital audio and internet routing, but are willing to dig a bit deeper technically.

What is going on?

What exactly is going on when Solocontutti sends and receives audio? You may think that this works in just the same way as, for example, Zoom, but this is less than half true. The fundamental reason that apps like Solocontutti exist is that realtime collaborative online performance (meaning playing together online, which I will now abbreviate to RCOP) requires a latency of around 30ms, which the public internet cannot guarantee. Zoom does not have this restriction, so it can use other methods that give it a relatively high latency, which makes it useless for RCOP. But wait a minute - if you can't guarantee 30ms, how does Solocontutti work at all? Well, basically by pretending that everything is fine and making stuff up.

Throwing away the data

As you probably know, digital audio works by sampling analogue audio and converting it into chunks of data. You can choose how big these chunks are, and for efficiency you would generally make them quite large. However, the longer the chunk, the longer you have to wait for all of its audio to arrive before you can do anything with it, which means longer latency. This is why Solocontutti works best with smaller chunks (or blocks) of about 128 samples. At a 48kHz sampling rate this equates to about 3ms (128/48000 = 0.00267 seconds).
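To make the trade-off concrete, here is a small Python sketch that works out how long one block takes to fill for a few common block sizes at 48kHz. It is just the arithmetic from the paragraph above; the function name is made up for illustration.

```python
def block_duration_ms(block_size: int, sample_rate: int) -> float:
    """Time in milliseconds needed to collect one block of audio."""
    return 1000.0 * block_size / sample_rate


# Typical block sizes at a 48kHz sampling rate
for size in (64, 128, 256, 512, 1024):
    print(f"{size:5d} samples: {block_duration_ms(size, 48000):6.2f} ms")
```

Doubling the block size doubles the wait, so a 1024-sample block alone would use up a large part of the roughly 30ms budget before the audio has even left your computer.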

When you send blocks of data over the internet the travel time is highly unreliable: if the average is 15ms, some packets will take 2ms to arrive and some 100ms. The trouble is that when you play these out as sound you are on the clock: you need another block every 3ms, otherwise there is a gap in the sound. Blocks that arrive too early are not a problem - you just save them up until they are needed (this is called buffering). Blocks that arrive too late must be discarded, because they no longer fit into the timeline of the sound that you are playing. But this causes another problem: what do you play in the interval when you needed that block, it didn't arrive on time, and now you have nothing? The answer is simple - you make stuff up.
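The buffering-and-discarding rule can be sketched in a few lines of Python. This is a simplified model, not Solocontutti's code: it assumes every block is sent exactly on time, that each block is due at the speaker a fixed playout delay after it was sent, and the travel times are invented numbers.

```python
def playout_schedule(delays_ms, block_ms, playout_delay_ms):
    """Return True for each block that is in time for its playout slot.

    Simplified model: block i is sent at i * block_ms and is due at the
    speaker playout_delay_ms later. Early blocks simply wait in the buffer;
    late blocks are discarded, leaving a gap that has to be filled somehow.
    """
    in_time = []
    for i, delay in enumerate(delays_ms):
        sent_at = i * block_ms
        arrived_at = sent_at + delay
        due_at = sent_at + playout_delay_ms
        in_time.append(arrived_at <= due_at)
    return in_time


delays = [12, 15, 2, 100, 14, 18, 16]  # invented travel times in ms
block = 128 / 48                       # 128 samples at 48kHz, in ms
print(playout_schedule(delays, block, playout_delay_ms=20))
print(playout_schedule(delays, block, playout_delay_ms=10))
```

With a 20ms playout delay only the 100ms straggler misses its slot; shrink the delay to 10ms and most blocks arrive too late. Note also that the third block (2ms) overtakes the second (15ms), so blocks really can arrive in the wrong order.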

Fake it till you make it

If you listen to music it tends to change a lot from moment to moment - this is what makes it interesting. However, as you zoom in to very short timescales it becomes more and more predictable. The sound that a guitar makes is very similar to the sound that it made 3ms ago - so similar that software can make a good guess at what the next block should sound like and play this guess out when no data has arrived on time. This is not perfect: if you choose to strum a new chord at that moment the guess won't be right, and there may be small differences from reality that cause a (usually inaudible) glitch, but generally this works very well.
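As an illustration of the idea (and nothing like as clever as what a real app does), here is a crude concealment scheme in Python: a missing block is replaced by a quieter copy of the block played just before it. The function name, the fade factor and the four-sample toy blocks are all assumptions for the example; this post does not describe Solocontutti's actual algorithm.

```python
def conceal(blocks, block_size, fade=0.5):
    """Replace missing blocks (None) with a faded copy of the previous block.

    Crude stand-in for packet loss concealment: it bets that the sound a few
    milliseconds ago is a good guess for the sound now. Each consecutive
    made-up block is quieter, so a long dropout fades out instead of buzzing.
    """
    output = []
    previous = [0.0] * block_size  # nothing received yet: guess silence
    for block in blocks:
        if block is None:
            block = [sample * fade for sample in previous]
        output.append(block)
        previous = block
    return output


# Toy 4-sample blocks; None marks blocks that arrived too late or not at all.
received = [[0.2, 0.4, 0.2, 0.0], None, None, [0.1, 0.3, 0.1, 0.0]]
for block in conceal(received, block_size=4):
    print(block)
```

One or two made-up blocks sound close to the real thing; a long run of them fades away, which is roughly why a bad connection turns into audible dropouts.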

The more unreliable your connection, the more blocks you will lose, and the more blocks you have to make up, the greater the chance that you will start to hear it. As the connection gets worse you will hear audible glitches, then distortion or noise, until at some point the signal is unusable. This usually doesn't happen, and Solocontutti is very good at dealing with it, but when even Solocontutti can't keep up there are still quite a few tools in your arsenal to help you cope.

You may ask why I say that the internet is so unreliable, when from your point of view it seems stable, reliable and quick. It does seem that way, because the modern internet uses all sorts of clever mechanisms to ensure that your data arrives quickly, in the right order and without loss. However, this comes at a price, and that price is - you guessed it - extra latency. To achieve low latency you have to forgo these protections, and you are back to the law of the jungle, where blocks arrive at wildly different times, sometimes in the wrong order and sometimes not at all.

When is fast not really fast at all?

One important thing to realise is that what internet providers call "speed" has little to do with the time it takes for data to get from A to B. What they call speed would be better termed bandwidth: the maximum amount of data per second that the connection can deliver. So even if your connection is rated at 1Gbps, it could in principle take a week for data to travel from A to B, so long as 1Gb was delivered every second. Obviously a week would be absurd, but the point is that the "speed" of your internet connection has only a passing relationship with the travel time of your data, which is much more influenced by how the data is routed from A to B. This also means that your connections with different people in different locations can behave differently.
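A rough worked example shows why bandwidth barely matters for a single audio block. The Python below adds the time needed to push the bits onto the link (size divided by bandwidth) to a fixed route latency. It assumes an uncompressed 128-sample block of 16-bit mono audio (256 bytes), ignores packet headers and queuing, and uses a made-up 15ms route latency; Solocontutti may well send its data differently.

```python
def delivery_time_ms(payload_bytes, bandwidth_bps, path_latency_ms):
    """One-way delivery time: route latency plus time to put the bits on the wire."""
    serialisation_ms = 1000.0 * payload_bytes * 8 / bandwidth_bps
    return path_latency_ms + serialisation_ms


# Assumption: one uncompressed 128-sample block of 16-bit mono audio.
block_bytes = 128 * 2
for label, bps in (("10 Mbps", 10e6), ("1 Gbps", 1e9)):
    total = delivery_time_ms(block_bytes, bps, path_latency_ms=15.0)
    print(f"{label}: {total:.4f} ms")
```

Going from 10Mbps to 1Gbps saves only about 0.2ms on each block; the 15ms spent travelling the route is untouched.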

Tools to help out

The key tool for diagnosing problems is the Tuning window. You can call this up by enabling advanced functions and then selecting the Tuning Dialogue in the Advanced menu (or the Tools menu in versions 3.0 and above). Here is an example below - this may need some explanation. These graphs are shown for a single connection/channel. If you have more than one connection you can choose which one you want to monitor in the bottom left.

Taking the graphs one at a time, we have:

Jitter Buffer Samples

Solocontutti does actually buffer a small number of blocks, as few as possible, to keep the sound smooth without compromising on the latency requirements. The buffer is called a jitter buffer and it is "adaptive", which means it changes size according to network conditions. There is a lot to be said about this buffer, but there are two main characteristics that you need to know:

The "prefetch" size (red line) is the number of samples that the buffer thinks it currently needs to hold in order to be able to use 95% of the incoming data (this is adjustable, but more on that later). This should be as low as possible, but will grow as network conditions get worse. By default it is limited to a maximum of 200ms (9600 samples at 48kHz), which is far higher than a workable latency. The number in the top right-hand corner of the graph shows the current value in samples. In this example it is 111 samples, which is about 2.3ms - this is very good.
The green line shows the actual fill of the buffer, in other words how successful the buffer is at keeping itself filled. It should be very close to the red line; if it isn't, this indicates that the network is unstable, and normally the buffer will be lengthened automatically.
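One way to estimate a prefetch size like the red line is to look at how much later than the fastest block each recent block arrived, and pick the value that covers 95% of them. The Python below is a hypothetical sketch of that idea (a nearest-rank percentile, capped at 200ms); it is not taken from Solocontutti, whose adaptive algorithm is not described here.

```python
import math


def prefetch_samples(delays_ms, sample_rate=48000, target=0.95, max_ms=200.0):
    """Estimate how many samples to hold so that `target` of blocks arrive in time.

    Hypothetical sketch: measure how much later than the fastest block each
    recent block arrived (its jitter), take the nearest-rank percentile that
    covers `target` of them, cap it at max_ms and convert to samples.
    """
    fastest = min(delays_ms)
    jitter = sorted(d - fastest for d in delays_ms)
    rank = max(1, math.ceil(target * len(jitter)))
    needed_ms = min(jitter[rank - 1], max_ms)
    return round(needed_ms * sample_rate / 1000)


# Twenty invented travel times in ms, with one 45ms straggler at the end.
recent = [10, 11, 10, 12, 10, 11, 13, 10, 11, 10,
          12, 10, 11, 10, 14, 10, 11, 12, 10, 45]
print(prefetch_samples(recent))
```

Here the single 45ms straggler falls outside the 95% and is simply given up on, so the estimate is 192 samples (4ms) instead of the 1680 samples (35ms) it would take to catch every block.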

Jitter Overflow Samples

This graph shows how much data cannot be used. When blocks arrive at the buffer and cannot be used, they "overflow" and must be discarded. The more often this happens, the more unreliable the arrival time of the blocks is.

Ping latency

This graph shows how long it takes a block to get from you to the other side and back (the round trip), and is an indication of the real speed of the connection in terms of latency.
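Since ping measures a round trip, a rough one-way figure is half of it, assuming the route is about the same in both directions (it often isn't exactly). The hypothetical sketch below adds that to the block length and the jitter buffer to get an approximate network-plus-buffering latency that you can compare with the roughly 30ms needed for RCOP. It ignores sound card and driver latency at both ends, so the real figure will be higher.

```python
import statistics


def estimated_latency_ms(ping_ms, block_ms, jitter_buffer_ms):
    """Rough one-way latency: half the median ping plus audio buffering.

    Assumes the route is roughly symmetric, and ignores sound card and
    driver latency at both ends.
    """
    network_ms = statistics.median(ping_ms) / 2
    return network_ms + block_ms + jitter_buffer_ms


pings = [18, 20, 22, 19, 45]  # invented ping readings in ms
latency = estimated_latency_ms(pings, block_ms=128 / 48, jitter_buffer_ms=111 / 48)
print(f"about {latency:.1f} ms; within a 30ms RCOP budget: {latency <= 30}")
```

The median is used rather than the average so that one slow ping (the 45ms reading here) does not dominate the estimate.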

Frame Substitution Rate

This graph tells you how often the app has had to "make up" some sound to replace data that was missing or didn't arrive on time. This will never be zero, but most of the time the app can compensate almost inaudibly. As this rate gets larger, the limits of the compensation algorithm will be reached and you will hear distortion and clicks.
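A substitution rate like the one in this graph can be computed by remembering, for each recent frame, whether it was real or made up. The Python class below is a hypothetical sketch using a sliding window of 375 frames, which is about one second of 128-sample frames at 48kHz; the post does not say how the app itself measures the rate.

```python
from collections import deque


class SubstitutionRate:
    """Fraction of recent frames that had to be made up (hypothetical helper)."""

    def __init__(self, window=375):
        # 375 frames x 128 samples = 48000 samples, about one second at 48kHz
        self.recent = deque(maxlen=window)

    def record(self, substituted: bool) -> None:
        self.recent.append(substituted)

    def rate(self) -> float:
        if not self.recent:
            return 0.0
        return sum(self.recent) / len(self.recent)


meter = SubstitutionRate()
for frame in range(375):
    meter.record(frame % 50 == 0)  # pretend every 50th frame was made up
print(f"{meter.rate():.1%} of frames substituted")
```

Because the window slides, old substitutions drop out after about a second, so the figure reflects current network conditions rather than the whole session.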

Interframe ms

This tells you the time in milliseconds between successive frames of data. Ideally you would expect this to be constant, but of course it never is in the real world. The expected gap between successive frames can be calculated as frame size / sampling rate. For example, if you choose a frame size of 128 samples and a sampling rate of 48kHz, then you would expect the gap between frames to be 128/48000 seconds = 0.00267 seconds, or 2.67 milliseconds. When this value starts varying a lot, a lot of data will be substituted and the quality of the sound will be compromised.
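The sketch below compares measured arrival times with the expected gap of frame size / sampling rate. The arrival times are invented, and the function is only an illustration of how you might summarise such numbers yourself, not what the Tuning window computes internally.

```python
import statistics


def interframe_summary(arrival_ms, frame_size=128, sample_rate=48000):
    """Compare measured gaps between frame arrivals with the ideal gap.

    Returns (expected gap, mean measured gap, largest deviation), all in ms.
    """
    expected = 1000.0 * frame_size / sample_rate
    gaps = [later - earlier for earlier, later in zip(arrival_ms, arrival_ms[1:])]
    worst = max(abs(gap - expected) for gap in gaps)
    return expected, statistics.mean(gaps), worst


arrivals = [0.0, 2.5, 5.5, 8.0]  # invented arrival times in ms
expected, mean_gap, worst = interframe_summary(arrivals)
print(f"expected {expected:.2f} ms, mean {mean_gap:.2f} ms, worst deviation {worst:.2f} ms")
```

In this example the average gap is exactly the expected 2.67ms, yet individual gaps are off by a third of a millisecond: it is the variation, not the average, that causes substitutions.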

The "Tune" controls

These controls let you choose how the app balances sound quality against latency. The controls are:

min % of data - this control determines how much of the data Solocontutti tries to capture on time. If the network were perfect you would always be able to capture 100%. In reality, however, capturing 100% of the data needs a larger buffer, which introduces latency. By reducing this value you can reduce latency, but at some point it will start affecting sound quality.
min buffer length - this determines the smallest buffer size that Solocontutti can use to buffer data. The value is in milliseconds, so a value of 3 will be enough for a single 128-sample frame at 48kHz. This does not fix the buffer size; it only tells the app how low it is allowed to go when trying to reduce latency. If you have a good connection you can set this as low as you like. If you are having problems, making it higher should help.
Adaptive - this tells Solocontutti whether or not to vary the buffer length according to internet conditions. If you have very stable connections you may want to turn adaptive off, in which case the buffer will be fixed at the value in the "min buffer length" field.
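To show how the three controls could fit together, here is a hypothetical Python sketch: with Adaptive off the buffer is simply the minimum length; with it on, the buffer grows to cover the chosen percentage of recent extra delays, but never drops below the minimum. The function and its inputs are assumptions for illustration, not Solocontutti's actual logic.

```python
import math


def target_buffer_ms(extra_delays_ms, min_percent, min_buffer_ms, adaptive):
    """Pick a jitter buffer length from the three Tune settings (hypothetical logic).

    extra_delays_ms: how late recent blocks were compared with the fastest one.
    min_percent:     the "min % of data" setting, e.g. 95.
    min_buffer_ms:   the "min buffer length" setting.
    adaptive:        the "Adaptive" setting.
    """
    if not adaptive:
        return min_buffer_ms  # fixed buffer at the minimum length
    ordered = sorted(extra_delays_ms)
    rank = max(1, math.ceil(min_percent / 100 * len(ordered)))
    return max(min_buffer_ms, ordered[rank - 1])


extra = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10, 12, 15, 80]  # invented
for percent in (95, 100):
    print(percent, target_buffer_ms(extra, percent, min_buffer_ms=3, adaptive=True))
print("fixed", target_buffer_ms(extra, 95, min_buffer_ms=3, adaptive=False))
```

With these made-up numbers, covering 95% of the blocks needs a 15ms buffer, but insisting on 100% pushes it to 80ms because of a single straggler, which is why lowering "min % of data" can save so much latency.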

Remember - these settings are for the connection that you have selected, and will not affect other connections.

Solocontutti