Data Protection and Synchronization in C#

Performing more than one operation at once provides a valuable facility to a program, but it also increases the complexity of the programming task.

1. A Slightly Broken Example

Consider the following code:

using System;

class Val
{
    int number = 1;

    public void Bump()
    {
        int temp = number;
        number = temp + 2;
    }

    public override string ToString()
    {
        return(number.ToString());
    }

    public void DoBump()
    {
        for (int i = 0; i < 5; i++)
        {
            Bump();
            Console.WriteLine("number = {0}", number);
        }
    }
}

class Test
{
    public static void Main()
    {
        Val v = new Val();
        v.DoBump();
    }
}

In this example, the Val class holds a number and has a way to add 2 to it. When this program runs, it generates the following output:

number = 3
number = 5
number = 7
number = 9
number = 11

While that program is being executed, the operating system may be performing other tasks simultaneously. The code can be interrupted at any point, but after the interruption, everything will be in the same state as before, and you have no way of knowing that the interruption took place.

You can modify the program to use some threads:

using System;
using System.Threading;

class Val
{
    int number = 1;

    public void Bump()
    {
        int temp = number;
        number = temp + 2;
    }

    public override string ToString()
    {
        return(number.ToString());
    }

    public void DoBump()
    {
        for (int i = 0; i < 5; i++)
        {
            Bump();
            Console.WriteLine("number = {0}", number);
        }
    }
}

class Test
{
    public static void Main()
    {
        Val v = new Val();
        for (int threadNum = 0; threadNum < 5; threadNum++)
        {
            Thread thread = new Thread(new ThreadStart(v.DoBump));
            thread.Start();
        }
    }
}

In this code, a ThreadStart delegate is created that refers to the function the thread should execute. When this program runs, it generates the following output:

number = 3
number = 5
number = 7
number = 9
number = 11
number = 13
number = 15
number = 17
number = 19
number = 21
number = 23
number = 25
number = 27
number = 29
number = 31
number = 33
number = 35
number = 37
number = 39
number = 41
number = 43
number = 45
number = 47
number = 49
number = 51

Can you find the error in the output? No?

This example illustrates one of the common problems with writing multithreaded programs. The example has a latent error that might show up in some situations, but it doesn’t show up when the example runs under normal conditions. Bugs such as these are some of the worst to find, and they usually show up only under stressful conditions (such as at a customer’s site). You can change the code a bit to simulate an interruption by the operating system:

public void Bump()
{
    int temp = number;
    Thread.Sleep(1);
    number = temp + 2;
}

This small change leads to the following output:

number = 3
number = 3
number = 3
number = 3
number = 3
number = 5
number = 5
number = 5
number = 5
number = 5
number = 7
number = 7
number = 7
number = 7
number = 7
number = 9
number = 9
number = 9
number = 9
number = 9
number = 11
number = 11
number = 11
number = 11
number = 11

This isn’t exactly the desired result.

The call to Thread.Sleep() causes the current thread to sleep for one millisecond before it has stored the bumped value. While the first thread sleeps, another thread comes in and fetches the same current value.

The underlying bug in the code is that you have no protection against this situation happening. Unfortunately, it’s rare enough that it’s hard to find. Creating multithreaded applications is one area where good design techniques are important.

2. Protection Techniques

You can use several techniques to prevent problems like this. Code that’s written with these issues in mind is known as thread-safe.

In general, most code isn’t thread-safe, because there’s usually a performance penalty in writing thread-safe code.

2.1. Don’t Share Data

One of the best techniques to prevent such problems is to not share data in the first place.

It’s often possible to architect an application so that each thread has its own data to deal with. An application that fetches data from several Web sites simultaneously can create a separate object for each thread.

This is obviously the best option, as it imposes no performance penalty and doesn’t clutter the code. It requires some care, however, since later modifications may introduce the very errors you carefully avoided. For example, a programmer who doesn’t know that a class uses threading might add shared data.
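As a sketch of this approach (the Counter class is a hypothetical variant of the earlier example): each thread gets its own object, so no locking is needed even though five threads run concurrently.

```csharp
using System;
using System.Threading;

class Counter
{
    int number = 1;

    public void DoBump()
    {
        // This thread is the only one touching this Counter,
        // so no synchronization is required.
        for (int i = 0; i < 5; i++)
        {
            number = number + 2;
        }
        Console.WriteLine("final number = {0}", number);
    }

    public override string ToString()
    {
        return(number.ToString());
    }
}

class Test
{
    public static void Main()
    {
        for (int threadNum = 0; threadNum < 5; threadNum++)
        {
            // Each thread gets its own Counter; nothing is shared.
            Counter c = new Counter();
            Thread thread = new Thread(new ThreadStart(c.DoBump));
            thread.Start();
        }
    }
}
```

Every thread prints "final number = 11", and no interleaving of the threads can change that, because no data is shared between them.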

3. Immutable Objects

Immutable objects are thread-safe by definition. Multiple threads can safely read a piece of data simultaneously; it’s only when modifying operations take place that precautions need to be taken in a multithreaded scenario. Because immutable objects don’t allow modifying operations, you don’t need any other thread-safety measures.

The string type is a great example of achieving thread-safety through immutability. Every modifying operation on a string, such as ToUpper(), returns a new string instance. If one thread is completing an enumeration of the characters of a string at the same time as another thread calls ToUpper(), the thread conducting the enumeration is unaffected because the new uppercase string is an entirely separate object that isn’t physically related to the original string.

Immutability does place a higher design and implementation burden on a type. The methods of the type must be designed so it’s apparent to the users of the type that a modifying operation returns a new instance rather than modifying the instance it was called on, and it’s generally wise to provide a mutable equivalent of the immutable type to support high-performance modification operations. For string, StringBuilder is the equivalent mutable type.
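A minimal sketch of an immutable type (the Temperature class is invented for illustration): the only field is readonly, and the one "modifying" operation returns a new instance, just as string’s ToUpper() does.

```csharp
using System;

// An immutable temperature value: the field is readonly, and
// no method changes state after construction.
sealed class Temperature
{
    readonly double celsius;

    public Temperature(double celsius)
    {
        this.celsius = celsius;
    }

    public double Celsius
    {
        get { return celsius; }
    }

    // A "modifying" operation returns a new instance,
    // leaving the original untouched.
    public Temperature Add(double degrees)
    {
        return new Temperature(celsius + degrees);
    }
}
```

Any number of threads can read a shared Temperature without locking; a thread that calls Add() gets its own new object and cannot disturb the others.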

3.1. Exclusion Primitives

The System.Threading namespace contains a number of useful classes for preventing the problems in the earlier example. The most commonly used one is the Monitor class. You can modify the slightly broken example by surrounding the problem region of code with exclusion primitives:

public void Bump()
{
    Monitor.Enter(this);
    int temp = number;
    Thread.Sleep(1);
    number = temp + 2;
    Monitor.Exit(this);
}

The call to Monitor.Enter() passes in the this reference for this object. The monitor’s job is to make sure that if a thread has called Monitor.Enter() with a specific value, any other call to Monitor.Enter() with the same value will block until the first thread has called Monitor.Exit(). When the first thread calls Thread.Sleep(), the second thread will call Monitor.Enter() and pass the same object as the first thread did, and therefore the second thread will block.

The implementation of Bump() has a slight problem. If an exception was thrown in the block that’s protected, Monitor.Exit() will never be called, which is bad. To make sure Monitor.Exit() is always called, the calls need to be wrapped in a try-finally. This is important enough that C# provides a special statement to do just that.

3.2. The lock Statement

The lock statement is simply a thin wrapper around calls to Monitor.Enter() and Monitor.Exit(). The following code:

object lockObj = new object();
lock(lockObj)
{
    // statements
}

translates to the following:

object lockObj = new object();
try
{
    System.Threading.Monitor.Enter(lockObj);
    // statements
}
finally
{
    System.Threading.Monitor.Exit(lockObj);
}

The object that’s used in the lock statement reflects the granularity at which the lock should be obtained. If the data to be protected is instance data, it’s typical to create a private member variable and use it in the lock statement to prevent concurrent access.

If the data to be protected is a static data item, it will need to be locked using a unique static reference object. You do this simply by adding a static field of type object to the class:

static object staticLock = new object();

This object is then used in the lock statement.
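As a sketch of that pattern (the RequestCounter class and its counter are invented for illustration), a single static lock object protects the static data no matter how many instances of the class exist:

```csharp
using System;
using System.Threading;

class RequestCounter
{
    // Static data shared by every instance of the class...
    static int totalRequests = 0;

    // ...protected by a single static lock object.
    static object staticLock = new object();

    public static void RecordRequest()
    {
        lock (staticLock)
        {
            totalRequests = totalRequests + 1;
        }
    }

    public static int TotalRequests
    {
        get
        {
            lock (staticLock)
            {
                return totalRequests;
            }
        }
    }
}
```

Because the lock object is static, two threads calling RecordRequest() through different instances (or with no instance at all) still contend for the same lock.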

For types that will be widely distributed, creating a private member variable to use in lock statements is preferable to taking out a lock on this. Code anywhere inside an application domain can also take out a lock on the this object reference, which can cause contention and deadlocks. Using a private member variable for locking is extremely simple to implement:

public class WidelyDistributedType
{
    private object wdtLock = new object();

    public void ThreadSafeMethod()
    {
        lock(wdtLock) { /* code here */ }
    }
}

3.3. Synchronized Methods

An alternative to using the lock statement is to mark the entire method as synchronized, which has the same effect as enclosing the entire method in lock(this). To do this, mark the method with the following attribute:

[MethodImpl(MethodImplOptions.Synchronized)]

You can find this attribute in the System.Runtime.CompilerServices namespace.
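Applied to the running example, the attribute version of Bump() would look roughly like this:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

class Val
{
    int number = 1;

    // Equivalent to wrapping the whole method body in lock(this):
    // only one thread at a time can be inside Bump() for a given instance.
    [MethodImpl(MethodImplOptions.Synchronized)]
    public void Bump()
    {
        int temp = number;
        Thread.Sleep(1);
        number = temp + 2;
    }

    public override string ToString()
    {
        return(number.ToString());
    }
}
```

The Thread.Sleep() call that exposed the race earlier is now harmless, because a second thread cannot enter Bump() while the first is sleeping inside it.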

In general, you should use lock over the attribute for two reasons. First, the performance is better because the region in which a lock exists is often a subset of the whole method. Second, it’s easy to miss the attribute (or forget what it does) when reading code; using lock is more explicit.

3.4. Interlocked Operations

Many processors support some instructions that can’t be interrupted. These are useful when dealing with threads, as no locking is required to use them. In the CLR, these operations are encapsulated in the Interlocked class in the System.Threading namespace. This class exposes the Increment(), Decrement(), Exchange(), and CompareExchange() methods, which can be used on int or long data types.

You could rewrite the problem example using these instructions:

public void Bump()
{
    // Each Increment() is atomic, so two of them safely add 2 to number,
    // though another thread may observe the value between the two increments.
    Interlocked.Increment(ref number);
    Interlocked.Increment(ref number);
}

The runtime guarantees the increment operations won’t be interrupted. If Interlocked works for an application, it can provide a nice performance boost, as it avoids the overhead of locking.
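For updates that Increment() and Decrement() don’t cover directly, CompareExchange() can be used in a retry loop. A sketch of adding 2 in a single atomic step (this rewrite is an illustration, not from the original text):

```csharp
using System;
using System.Threading;

class Val
{
    int number = 1;

    public void Bump()
    {
        int original, updated;
        do
        {
            // Snapshot the current value and compute the new one...
            original = number;
            updated = original + 2;
            // ...then store it only if no other thread changed number
            // in the meantime; CompareExchange returns the value it saw,
            // so any mismatch means we lost a race and must retry.
        } while (Interlocked.CompareExchange(ref number, updated, original) != original);
    }

    public override string ToString()
    {
        return(number.ToString());
    }
}
```

This compare-and-swap loop is the standard way to build lock-free read-modify-write operations out of CompareExchange().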

4. Mutexes and Semaphores

The exclusion primitives discussed so far are all application-domain-specific locking techniques. A Monitor object in one application domain is totally unaffected by a Monitor object in another application domain or process, even if they have the same name. When protecting against concurrent access to resources that are specific to an application domain, such as C# objects, this is the desired behavior. At times, however, you’ll need to protect globally available resources such as files, ports, and hardware devices. In these cases, you’ll need systemwide exclusion primitives. This is where the Mutex and Semaphore types come in.

Mutex acts in the same way as Monitor but is unique across all processes on a machine. Acquiring a lock on a Mutex in one process blocks all other processes from acquiring a lock on the same mutex. The global nature of Mutex makes it much more expensive to acquire than a Monitor lock—about two orders of magnitude slower. Mutex has a different syntax for acquiring a lock and doesn’t have a C# language helper equivalent to a Monitor’s lock statement. The following code shows a Mutex in use:

// create a Mutex called "PORT_1234_PROTECT". The mutex is not initially
// owned by this thread
Mutex portProtector = new Mutex(false, "PORT_1234_PROTECT");
try
{
    // acquire the mutex
    portProtector.WaitOne();

    // operations on port that need to be protected
}
finally
{
    portProtector.ReleaseMutex();
}

.NET 2.0 introduces a new exclusion primitive called a Semaphore. A Semaphore is similar to a Mutex, but where a Mutex protects access only to a single resource, a Semaphore protects access to multiple instances of a similar resource. Consider a call center: a customer’s call can be routed to any available call center staffer, but as there are usually more callers than staff and a staffer can take only a single call at a time, access to the staffer needs to be locked until any of the staffers become available. A pool of objects shares the same requirement—any object from the pool is as good as the next from a client’s perspective, but because the number of objects in the pool is finite, a locking mechanism needs to be put in place to prevent two clients from acquiring the same pooled object.
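The call-center analogy can be sketched with a local (unnamed) Semaphore; the class and the numbers here are invented for illustration:

```csharp
using System;
using System.Threading;

class CallCenter
{
    // Three staffers: at most three calls can be in progress at once.
    // A local (unnamed) Semaphore scopes the limit to this process.
    static Semaphore staffers = new Semaphore(3, 3);

    public static void HandleCall(object callerId)
    {
        staffers.WaitOne();        // block until a staffer is free
        try
        {
            Console.WriteLine("handling caller {0}", callerId);
            Thread.Sleep(100);     // simulate the call
        }
        finally
        {
            staffers.Release();    // the staffer becomes available again
        }
    }
}
```

If Release() were ever skipped, the pool of available staffers would shrink with each call until every caller blocked forever, which is why the acquire/release pair is wrapped in try-finally.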

Using a Semaphore is almost identical to using a Mutex. The only difference from a programming-model perspective is that a maximum count needs to be set when the Semaphore is created:

// create a Semaphore called "PORT_12xx_PROTECT" with a maximum count of 6;
// all 6 entries are initially available (the initial count is 6)
Semaphore multiplePortProtector = new Semaphore(6, 6, "PORT_12xx_PROTECT");
try
{
    // acquire one of the 6 entries
    multiplePortProtector.WaitOne();

    // operations on port that need to be protected
}
finally
{
    multiplePortProtector.Release();
}

Source: Gunnerson Eric, Wienholt Nick (2005), A Programmer’s Introduction to C# 2.0, Apress; 3rd edition.
