The Weird and Wonderful World of Garbage Collection

Fundamentals

Each process has its own address space. The CLR allocates a segment of memory in which to store and manage objects; this region is called the managed heap, as opposed to the native heap. All threads in the process allocate objects on the same managed heap.

How Garbage Collection Works

Garbage collection is the automatic process of reclaiming memory from objects that are no longer in use. It provides several benefits:
· Enables you to develop without having to explicitly free memory using ‘delete’, eliminating the bugs associated with that manual process, whereby a developer deletes an object that is still in use or forgets to delete it at all, leading to memory leaks.

· Significantly improves allocation performance. To allocate an object, all the CLR has to do is advance the next-object pointer, relying on the fact that the heap is kept compacted.

A garbage collection occurs when the system becomes low on physical memory, when the size of the managed heap surpasses an acceptable threshold, or when ‘GC.Collect()’ is called explicitly, triggering a collection on demand.
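
For illustration, here is a small sketch of triggering a collection explicitly and observing the effect on the managed heap (not something you would normally do in production code):

long before = GC.GetTotalMemory(false);    // approximate number of bytes currently allocated
GC.Collect();                              // explicitly trigger a collection of all generations
GC.WaitForPendingFinalizers();             // let any pending finalizers run
long after = GC.GetTotalMemory(false);
Console.WriteLine("Heap before: {0:N0} bytes, after: {1:N0} bytes", before, after);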

Generations

The managed heap is further segregated into a large object heap (LOH) and a small object heap. The small object heap is split into three generations: Gen0, Gen1 and Gen2. This depends on the platform, however; for example, if you’re developing with Xamarin for Android you only have two generations.

The generations dictate how often garbage collection is performed.

Generation 0

This is the youngest generation. Almost all objects are initially allocated in this generation, unless they are larger than 85,000 bytes, in which case they are allocated on the LOH.

Most objects are reclaimed by garbage collection in generation 0; the ones that survive are promoted to generation 1.

Generation 1

This generation acts as a buffer between short-lived objects and long-lived objects.

Generation 2

This generation contains long-lived objects. The GC collects this area of memory quite infrequently.
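
As a rough illustration (exact behaviour varies by runtime and GC mode), you can watch an object being promoted with ‘GC.GetGeneration’:

var survivor = new byte[1000];                   // small object, allocated in Gen0
Console.WriteLine(GC.GetGeneration(survivor));   // typically 0
GC.Collect();
Console.WriteLine(GC.GetGeneration(survivor));   // typically 1 after surviving one collection
GC.Collect();
Console.WriteLine(GC.GetGeneration(survivor));   // typically 2

var large = new byte[100000];                    // over 85,000 bytes, goes to the LOH
Console.WriteLine(GC.GetGeneration(large));      // LOH objects report generation 2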

What Happens During Garbage Collection

A garbage collection has the following steps:
· Marking phase – traverses the object graph starting from the roots and marks every reachable object as live. Each reference type (‘class’) instance has a header with a flag that enables this; structs do not have this field, since (unless boxed) they don’t live on the heap.

· Relocation of the references to the objects that will be compacted. Note that the GC works around pinned objects. Pinned objects are usually those referenced by a fixed(…) statement, which pins the object in memory so that a pointer to it can be passed to unmanaged code with a guarantee that the address won’t change (see the sketch after this list). While pinning is critical for correct interop, it hinders the performance of the garbage collector, so the use of pinned objects should be kept to a minimum.

· Compacting phase – reclaims the space occupied by the dead objects. It moves the surviving objects to the beginning of the memory segment and makes the next-object pointer point just past the last live object.
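
For reference, a minimal sketch of pinning with ‘fixed(…)’; ‘SomeNativeApi.Fill’ stands in for a hypothetical P/Invoke call, and the code requires an unsafe context:

byte[] buffer = new byte[1024];
unsafe
{
    fixed (byte* p = buffer)                  // the array is pinned for the duration of this block
    {
        SomeNativeApi.Fill(p, buffer.Length); // hypothetical unmanaged call taking a raw pointer
    }                                         // pin released; the GC is free to move the array again
}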

Originally the LOH was never compacted, which led to fragmentation and excessive memory usage; however, since version 4.5.1 the CLR provides the ability to defragment the LOH by setting ‘GCSettings.LargeObjectHeapCompactionMode’.
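
For example, requesting a one-off LOH compaction with the standard System.Runtime APIs looks like this:

GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();   // the LOH is compacted during the next full blocking collection, then the setting resets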

The garbage collector uses the following information to determine whether objects are live:
· Stack roots. Stack variables provided by the just-in-time (JIT) compiler and stack walker.

· Garbage collection handles. Handles that point to managed objects and that can be allocated by user code or by the common language runtime.

· Static data. Static objects in application domains that could be referencing other objects. Each application domain keeps track of its static objects.

Finalizers and Managing Unmanaged Resources

If your object uses any unmanaged resources, it has to provide a way to free them. The common pattern is to implement the ‘IDisposable’ interface so unmanaged resources can be disposed of deterministically. In case the ‘Dispose’ method is never called, the developer should provide a backup in the form of a finalizer. The finalizer should only run if the client code never called ‘Dispose()’, so your logic should ensure that once ‘Dispose’ has been called the finalizer never executes; you achieve that by calling ‘GC.SuppressFinalize(this)’. If you don’t, the object sits in the finalization queue and later the freachable queue, and even though it may be in Gen0 it will only be cleaned up after several garbage collections. ‘GC.SuppressFinalize(this)’ tells the GC that the object no longer requires finalization, eliminating almost all of the performance drawbacks of having a finalizer.

Why almost? There is a small catch. When an object with a finalizer is allocated, it adds itself to the finalization queue. If such objects are allocated from different threads and all add themselves to that queue, it creates contention, theoretically slowing allocation down due to synchronisation.
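
To make this concrete, here is a minimal sketch of the standard dispose pattern with a finalizer as the backup (the ‘IntPtr’ handle simply stands in for some unmanaged resource):

public class ResourceHolder : IDisposable
{
    private IntPtr _handle;            // stands in for an unmanaged resource
    private bool _disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);     // Dispose was called, so finalization is no longer needed
    }

    ~ResourceHolder()                  // backup: runs only if Dispose was never called
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        if (disposing)
        {
            // release managed resources here
        }
        // release the unmanaged resource here
        _handle = IntPtr.Zero;
        _disposed = true;
    }
}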

To be continued…

Mongo – DateTime Issue

There is an issue with MongoDB (it is only an issue if you don’t want the behaviour): whenever you pass a DateTime in an entity to a MongoDB server in a different timezone, the driver will automatically adjust the DateTime value of that property.

The problem is very easy to solve by decorating the DateTime property with

[BsonDateTimeOptions(Kind = DateTimeKind.Local)]
public DateTime TradeTime { get; set; }

where you do need the time component or

[BsonDateTimeOptions(DateOnly = true)]
public DateTime ReportingDate { get; set; }

when you only care about the date.

An even better solution is to set a global setting of the C# driver:

BsonSerializer.RegisterSerializer(typeof(DateTime), new DateTimeSerializer(DateTimeSerializationOptions.LocalInstance));

string.Intern(…) – Minimising Memory Footprint

I had one particular view (in the UI) that loaded denormalised data with many repetitive strings. There was not much we could do to reduce the volume of data loaded; however, during profiling I realised that most of the memory was consumed by strings, and most of them were identical.

I remembered that ‘string’ has a feature to intern a value, i.e. two or more strings with the same value point to the same memory location instead of being duplicated. Applying this technique allowed me to save about 2/3 of the memory consumed. There was one downside though: this memory is never reclaimed, because interned strings live for the lifetime of the process.
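
A tiny sketch of what interning does:

string a = string.Intern(new string(new[] { 'A', 'B', 'C' }));   // returns the pooled instance of "ABC"
string b = string.Intern("ABC");
Console.WriteLine(object.ReferenceEquals(a, b));                 // True – a single copy in the intern pool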

So now I’m thinking of writing my own interning using ‘WeakReference’.
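
A possible shape for it, purely as a hypothetical sketch (keyed by hash code so the table itself does not keep the strings alive; dead entries would still need occasional pruning):

public sealed class WeakStringPool
{
    private readonly Dictionary<int, List<WeakReference<string>>> _buckets =
        new Dictionary<int, List<WeakReference<string>>>();

    public string Intern(string value)
    {
        if (value == null) return null;

        List<WeakReference<string>> bucket;
        if (!_buckets.TryGetValue(value.GetHashCode(), out bucket))
        {
            bucket = new List<WeakReference<string>>();
            _buckets[value.GetHashCode()] = bucket;
        }

        foreach (var weak in bucket)
        {
            string existing;
            if (weak.TryGetTarget(out existing) && existing == value)
                return existing;                        // reuse the instance already in the pool
        }

        bucket.Add(new WeakReference<string>(value));   // weakly referenced, so the GC can reclaim it
        return value;
    }
}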

Anonymous Disposable – Adapter that makes any object IDisposable

There are quite a few APIs out there that have, for example, a ‘Close()’ method but don’t implement ‘IDisposable’. Or sometimes you want to put something inside a ‘using’ statement and perform a certain action when the ‘using’ block finishes.

Here is a very simple adapter that allows you to turn anything into an IDisposable.

public static class Disposable
{
    public static IDisposable Create(Action action)
    {
        return new AnonymousDisposable(action);
    }
    private struct AnonymousDisposable : IDisposable
    {
        private readonly Action _dispose;
        public AnonymousDisposable(Action dispose)
        {
            _dispose = dispose;
        }
        public void Dispose()
        {
            if (_dispose != null)
            {
                _dispose();
            }
        }
    }
}

Here is an example:

var resource = ...;
using (Disposable.Create(() => resource.Close()))
{
    // ...
}

Roslyn – C# Compiler

I’m obviously very glad that the compiler has now been rewritten from scratch allowing
it to support many new features. Some I consider very handy and some I just don’t get (not in the sense that I don’t know what they do, but rather what benefit they bring to the .NET world) As a C# developer I will focus on C# features.

Null-Conditional Operator

One of the improvements, they say, is the conditional check for null. Whereas before you had to write something like this to check for null:

public static string Truncate(string value, int length)
{
    string result = value;
    if (value != null) // Skip empty string check for elucidation
    {
        result = value.Substring(0, Math.Min(value.Length, length));
    }
    return result;
}


Now it's much shorter:

public static string Truncate(string value, int length)
{
    return value?.Substring(0, Math.Min(value.Length, length));
}

At first it seems like a very handy tool, but my worry is that this feature will be greatly abused. Some developers would be tempted to write this type of code:

public static string GetCustomerFirstName(Tuple<Tuple<Order, int>, int> order)
{
    return order?.Item1?.Item1?.Customer?.FirstName;
}

Obviously the alternative would be much bulkier; however, this approach takes away the 'NullReferenceException' that would otherwise surface a bug. Say we always expect 'Customer' to be non-null: if it is null, that is a bug, so an exception should be thrown. With the shorthand notation it's very easy to miss that.

However, in this case it's quite handy, even though only a couple of lines of code are eliminated:

OnTemperatureChanged?.Invoke(this, value);

Auto-Property Initialisers

I have mixed feelings about this. It allows you to create an auto-property with only a 'get' accessor while still being able to supply a value:

public class FingerPrint
{
    public DateTime TimeStamp { get; } = DateTime.UtcNow;
}

"As the code shows, property initializers allow for assigning the property an initial value as part of the property declaration. The property can be read-only (only a getter) or read/write (both setter and getter). When it’s read-only, the underlying backing field is automatically declared with the read-only modifier. This ensures that it’s immutable following initialization."

When I first saw this I was unsure whether it would return the current value of 'DateTime.UtcNow' on every read or the value captured at construction; it is the latter, since the initializer runs only once, when the object is constructed.

Nameof Expressions

This one I think is a very useful feature; it gives you the ability to get the name of a parameter (or any other program element) as a string.

public void Foo(string message)
{
    if (string.IsNullOrWhiteSpace(message))
    {
       throw new ArgumentException(nameof(message));
    }
}

This would also come in handy when implementing the 'INotifyPropertyChanged' interface. Currently you either have to use expression trees or the '[CallerMemberName]' attribute.
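
A minimal sketch of what that might look like with 'nameof' ('Thermostat' and 'Temperature' are made-up names):

public class Thermostat : INotifyPropertyChanged
{
    private double _temperature;

    public event PropertyChangedEventHandler PropertyChanged;

    public double Temperature
    {
        get { return _temperature; }
        set
        {
            _temperature = value;
            // nameof survives renames, unlike the magic string "Temperature"
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Temperature)));
        }
    }
}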

Primary Constructors

This allows you to write constructors in the following manner:

public struct Pair<T>(T first, T second)
{
    public T First { get; } = first;
    public T Second { get; } = second;
}

The whole purpose is to eliminate code bloat and I think this is quite intuitive.

Expression Bodied Functions and Properties

public override string ToString() => string.Format("{0}, {1}", First, Second);

I can see its use. However, some developers will be inclined to implement more complex logic using this technique, so the rule should be that it is reserved for cases where the override is very simple. If you find yourself writing

public override string ToString()
{
    //complex logic
}

then it makes more sense to write the override the old-fashioned way.

I'm glad that Microsoft decided not to go ahead with index properties; that was just another way of doing something you could already do.

Using Static Methods

I like this feature. Quite often you have to work with a static class, for example Math, and you end up with Math.Min(..) * Math.Pow(..) and so on. The new feature eliminates the repeated 'Math.' by importing the class like you would import a namespace:

using static System.Math;

public static void Main(string[] args)
{
    double x = 0.3, y = 2;
    var result = Min(x, 0.5) * Max(y, 1);
}

References:
http://msdn.microsoft.com/en-us/magazine/dn802602.aspx
https://roslyn.codeplex.com/

Dos and Don’ts – Best Code Practises

Today I will be presenting my ideas based on an analysis of the code issues I found while refactoring. Some of the issues are not too serious, but there are many areas where things could be done better (much better). So here are the notes:

* The solution has to build. It might seem obvious, but the developers here don’t follow one of the most important rules: when checking code into a branch, especially if more than one person is working on it, it has to build! So many times I come in in the morning to find that someone has checked in code that doesn’t compile, which in turn means I cannot do much until it is fixed.
Rule #1 – the code you check in must compile, full stop!

* Tuples – I was never in favour of tuples; they have their use, but that use should be constrained to internal implementation. Tuples should never be exposed in a public interface, so that other developers working with the code don’t have to guess what Item1, Item2 … ItemN stand for.
Rule #2 – do not expose tuples in public interfaces; use a class or a struct, whichever is more appropriate.

* Singletons and static members – don’t use them. If you need singleton-like behaviour, achieve it with your preferred IoC container. There is a valid case for static methods: if a method is stateless you should declare it as static, but that is a different matter.
Rule #3 – Do not use static members and singletons.

* Separation of concerns – don’t create classes that do ‘everything’, and don’t create a facade over the whole database. Instead, split the class into smaller chunks, each responsible for doing one thing and doing it well. That simplifies development and lets you trace what is being used and where. Quite often you will then realise that not everything is used and some code can be removed.
Rule #4 – Split large classes into smaller ones.

* Exception handling – only catch the exceptions that you can actually handle. There are valid reasons for catching all exceptions, but before doing that, put in catch blocks for the exceptions you can do something about; the generic catch block at the end can then be used just for logging (see the sketch after this list).
Rule #5 – Catch the most specific exceptions first and generic exceptions last.

* Dispose and finalizers – you should always implement IDisposable if any of the fields are disposable. Think hard before implementing a finalizer: when objects with finalizers are instantiated they are placed on the finalization queue. That is fine when the application is single-threaded, but it can create contention on the finalization queue when those objects are instantiated from multiple threads. If you are doing that, though, it is a definite sign that something is wrong with the design.
Rule #6 – Implement IDisposable if any of the fields is IDisposable. Use finalisers with caution.
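
As a sketch of Rule #5 (the operations and helpers here – 'ProcessFile', 'CreateDefaultFile', 'NotifyUser', 'Log' – are hypothetical):

try
{
    ProcessFile(path);
}
catch (FileNotFoundException)
{
    CreateDefaultFile(path);     // a specific case we can actually recover from
}
catch (UnauthorizedAccessException ex)
{
    NotifyUser(ex.Message);      // another specific, handleable case
}
catch (Exception ex)
{
    Log(ex);                     // the generic catch is only for logging
    throw;
}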

Misunderstanding of Defensive Coding – Timere Erroris Syndrome

While going through the codebase of the counterparty risk reporting system I came across thousands of cases where you can sense that the developer was afraid to write code that throws an exception.

The most primitive example is returning true/false to indicate whether the operation completed successfully; this one is very easy to spot and therefore fix. The same applies to handling all exceptions indiscriminately.

But the one that doesn’t stand out as often is the use of ‘as’ instead of a plain cast, or a combination of ‘as’ and a null check. Without the null check there is still a chance of spotting the bug at the early stages of testing, even though you get a ‘NullReferenceException’ instead of an ‘InvalidCastException’. With a null check in place, however, the bug goes pretty much unnoticed. The question should then be: why should a certain parameter, or item in an array, not be of the expected type? Is there a valid reason, or are you just “scared”?
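
To illustrate the difference (the ‘Trade’ type, ‘item’ and ‘Process’ method are hypothetical):

// "Scared" version: if item is of the wrong type, trade is silently null and the bug goes unnoticed
var trade = item as Trade;
if (trade != null)
{
    Process(trade);
}

// If item is always expected to be a Trade, let the cast throw:
// an InvalidCastException here is the bug surfacing early, not something to hide
Process((Trade)item);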

Why Put Code Inside a finally Block with an Empty try Block

When browsing .NET source code you might come across an empty try clause and some code in the finally clause:

try
{
}
finally
{
    // few lines of code here
}

The answer is to guarantee the execution of the code in case something calls Abort() on the thread. Since .NET 2.0, execution of the code in the finally block is guaranteed even if something calls Abort() on the thread; in earlier versions of .NET it was possible for the finally clause to never execute. I don’t expect to ever write anything that leverages this, but it is still nice to know.

RangeObservableCollection – Pragmatic Implementation

This implementation improves on the behaviour of the standard ObservableCollection when dealing with a large number of elements: instead of firing an event for each inserted element, it fires a single Reset event after the bulk operation completes.

    public class RangeObservableCollection<T> : ObservableCollection<T>
    {
        private static readonly NotifyCollectionChangedEventArgs ResetChangedArgs = new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset);
        private static readonly PropertyChangedEventArgs CountChangedEventArgs = new PropertyChangedEventArgs("Count");
        private static readonly PropertyChangedEventArgs ItemChangedEventArgs = new PropertyChangedEventArgs("Item[]");
 
        public RangeObservableCollection(IEnumerable<T> items)
            : base(items)
        {
        }
 
        public RangeObservableCollection(List<T> list)
            : base(list)
        {
        }
 
        public RangeObservableCollection()
        {
        }
 
        public void RemoveRange(IEnumerable<T> list)
        {
            if (list == null) throw new ArgumentNullException("list");
 
            foreach (T item in list)
            {
                Items.Remove(item);
            }
 
            OnCollectionChanged(ResetChangedArgs);
            OnPropertyChanged(CountChangedEventArgs);
            OnPropertyChanged(ItemChangedEventArgs);
        }
 
        public void AddRange(IEnumerable<T> list)
        {
            if (list == null) throw new ArgumentNullException("list");
 
            foreach (T item in list)
            {
                Items.Add(item);
            }
            OnCollectionChanged(ResetChangedArgs);
            OnPropertyChanged(CountChangedEventArgs);
            OnPropertyChanged(ItemChangedEventArgs);
        }
 
        public void RemoveAll()
        {
            Items.Clear();
            OnCollectionChanged(ResetChangedArgs);
            OnPropertyChanged(CountChangedEventArgs);
            OnPropertyChanged(ItemChangedEventArgs);
        }
 
        public void ReplaceAll(IEnumerable<T> list)
        {
            if (list == null) throw new ArgumentNullException("list");
            Items.Clear();
            this.AddRange(list);
        }
    }
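
A quick usage sketch:

var prices = new RangeObservableCollection<decimal>();
prices.AddRange(new[] { 1.1m, 2.2m, 3.3m });   // the bound UI receives a single Reset notification
prices.ReplaceAll(new[] { 4.4m });             // clear + add, again with one Reset at the end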
