High Concept Performance Oriented Alternatives to System.Collections

May 11, 2011 at 3:51 PM
Edited May 11, 2011 at 5:55 PM
namespace AddonStudio.FrameXml.Serialization.Support
{
	/// <summary>
	/// Used to set the inheritance mode for inheritance enabled functions.
	/// </summary>
	public enum InheritMode
	{
		NullOp = 0,		// neither this nor children, no actions are performed
		This = 1,				// act on this dictionary only
		Children = 2,		// act on children only
		All = 3			// act on this dictionary and children
	}

	/// <summary>
	/// A specialized Dictionary class for high performance applications.
	/// </summary>
	/// <remarks>
	/// This infrastructure out of the box is useful for short "token like" string lookups 
	/// where table fill rate and lookup speed are paramount. The underlying numeric key
	/// backbone is useful as a place to start for many other kinds of lookups, including
	/// type/type key pair lookups, as the hash is always differnt for Type, and so on, 
	/// including even regular numeric lookups.
	/// 
	/// The intent of this TokenDictionary is to serve as a replacement for the more generic 
	/// framework Dictionary class in a wide variety of circumstances where performance is
	/// critical, while staying in the realm of safe managed code and with a simplistic and
	/// more importantly self-contained and easily alterable and extensible set of code.
	/// 
	/// TokenDictionary is of the "enough rope to hang yourself" mind set, where the overhead
	/// inherent in Dictionary is stripped away and you are left with more or less a streight
	/// shot right to the core of the set and lookup mechanisms.
	///
	/// Since the underlying generics for reference types in .Net simply uses erasure, where
	/// the type is stripped, and because box transitions are expensive, sometimes it better
	/// jsut to have striehgt access to the underling boxed form or bare reference until you
	/// really need the value. This is the basis for how this class operates, and is the product
	/// of many many cycles usinig a variety of profilers and looking at JIT'd and IL code.
	/// 
	/// Keys:
	/// String keys with TokenDictionary are not designed to be infinatly arbitrary. The default
	/// numeric key generator will provide fast case-insensitive lookups fully coherent for alpha
	/// ascii strings up to 12 charecters long, with some degredation if numbers are used. With the
	/// default generator, characters are compressed to 5-bit values, 0-31 decimal each, and are
	/// compiled into a 64-bit integer, leaving 3 bits for whatever other use, as the sign bit is 
	/// set to 1 to denote a key vs. an empty bucket value for lookup speed. This translates to 
	/// 26 alpha coherent without going into a more elaborate compression scheme. Other characters
	/// can be used such as '_' etc...
	/// See: http://www.cdrummond.qc.ca/cegep/informat/Professeurs/Alain/files/ascii.htm
	/// for an idea which chars can be used together outside of a..z.  Each row will show which 
	/// charecters are considered the to be the same for a value in the numeric key. For example '_'
	/// is the same as '?' and both '[' and ']' are the same as '{' and '}' respectivly.
	/// 
	/// Numeric Keys:
	/// Functions are typically paired to provide different levels of performance and convienence.
	/// In a typical Generics.Dictionary usage a string or other key will be hashed each call, which
	/// especially for strings and some other classes, can be extrememly non-performant. With 
	/// TokenDictionary you can ask for the underlying key ahead of time, for multiple calls even to
	/// differnt dictionaries solong as they were keyed using the same function. A workable string
	/// key generator is supplied already with matching convienence overloads for each major fucntion.
	/// If you are generating your own numeric keys, be aware that the highest order bit must be set,
	/// which is sometimes refered to as the sign bit, for numeric keys to function correctly.
	/// 
	/// Nulls:
	/// TokenDictionary allows use of nulls for values and can determine if a key was set.
	/// 
	/// Exceptions and error trapping:
	/// There are few explicit error trapping or exception handling conventions, for performance
	/// reasons. There are many old school mechanisms however that implicity keep things safe
	/// and coherent. One caviat is that Add is not tolerant of adding keys that are already added.
	/// This is a simple trade off for performance and is the responsibility of the caller to either
	/// know, or use Set, Has or TryGet instead.  Set will add a key if it doesnt exist. Add is useful
	/// when you *do* know and you need the extra performance, which is what this class is all about.
	/// 
	/// Function Naming Conventions:
	/// Any function with 'If' in the name is an inheritance enabled function, which means that it will
	/// use the inheritance mode, child, and children fields for actions it takes. Functions without
	/// 'If' in the name are more performant, but not necessarily by much depending on what is goign on.
	/// 
	/// Cross Assembly Compilation:
	/// The "patch opt out" attribute allows the JIT to consider inlining that function across assembly
	/// boundries. This class is pre-tuned using this attribute, as it does make more than a considerable
	/// difference in performance. However this is also a way to get yourself in a great deal of trouble
	/// very quickly. The intent as written is to use this class in a way where you are always compiling
	/// it with the any other assemblies which will use that specific version of this class. If you make
	/// changes to this class you *must* recompile everyting else...  You must not plan to ever deliver
	/// the assembly that contains this class seperatly, to production, customers, etc... unless you only
	/// change the functions that do not have this attribute set. This is the same as what MS has to do
	/// as far as awareness with thier framework code.  If you are concerned or unsure about the optout
	/// attribute, please comment them out.
	/// 
	/// A few general observances:
	/// - Never transition a boxed element if you dont have to, and generic unbox.any is painful.
	/// - Use if (!((object)a == (object)b)) for not equals for refences if you are in a tight loop.
	///   the (object) forces a reference compare of the type which bypasses Equals and != will be
	///   slower as it will call the == rather than operate on != directly... i swear... lol. The !()
	///   will get JIT'ed to almsot nothing.
	/// - Its sometimes better to box and cart the boxed struct around than to pass and copy structs
	///   if you dont need the value for a while. The struct copy operations take time and sometiems
	///   cause deep CLR action, like the memcpy helper gets invoked over and over. However, if
	///   you are using unsafe code, stucts (that dont hold references) on the stack are your best
	///   friend.
	/// - Its better to box asap rather than pass a nullable around, if you don need nullable
	///   symantics anytime soon, like you dont need the "signature" of a nullable.
	/// - Numeric compares are **way** faster than refernce compares.  What I mean by this is that
	///   its a *huge* chunk of code that gets run for references, and if you have a way to index
	///   a referenced objexct rather than operate on the reference, do it.. it will get compiled into
	///   real JIT'ed code rather than run way down into the clr core, even for null compare.  As a
	///   counter example, java will typically directly compare numerically two references in the
	///   JIT'ed code, or a null and a ref, so its almost free.  Like var isNull = (bob == null);
	///   instead of bob == null, over and over.
	/// - Break functions in to peices that will give the JIT some choices about inlining, which is
	///   almost the same as writing good clean code that shoudl be anyway.  The new JIT does not 
	///   operate on codesize by itself, and will travel only to a certain depth of iterations to 
	///   keep compile time sane. But... it will make some marvelously good choices if you give it
	///   room and clean code, which equates to, if you do it right, extremly terse and good machine
	///   code, almost like mama would have made with c++, and often better. If you do it wrong you
	///   will feel like your program is running on an Atari 2600.
	/// - Avoid unnessarily doign things that cause deep CLR functionality to run, meaning avoid anytng
	///   that will invoke the static c++ code burried deep in the CLR. A few things that do are: refernce
	///   compares, Object.GetHash(), Unbox.Any, explicit string intern, Casting to a subclass (casting
	///   to a super class is almsot free, and use 'as' and 'is' instead), etc... reference compares can
	///   fall back to, up to, compareing the vtables and other meta data, its nto like a pointer compare.
	/// - Conversely, of those type of things, GetType() is is very fast, and always returns the same
	///   object and the same hash.
	/// - If its a short function or a fucntion that really doesnt need a lot of vars, try to keep the
	///   number of stack vars to minumum. JIT wont always be able to figure out from the IL whats
	///   transient and not, and sometimes end up with a register suffling carnival. Like dont always
	///   hoist bob.Lenght out of the loop into a stack var. The JIT will know exactly what that is and 
	///   its exact context and do a much better job for you if you dont. Also, 
	///   prefer: "for(){ var scopevar = blah; for() { var scopevar2 = blahblah", 
	///		over: "var scopevar; var scopevar2; for(){ for(){"  
	/// - Conversely *do* perfer stack vars for class members. JIT can sometimes throw away some concerns
	///   and produce better code if you put a number or refence in function scope.  
	///   like: "var animal = this.animal; for () { animal animal animal...}".
	/// - List<> is suprisingly docile performance wise. if you do
	///   this: "for(int i=0; i  list.Count; i++ ) var o = list[i]; o o o o ..."
	///   the actual jit for the list[i] will be something
	///   like: " my jit'd code; if (i>const) throw; o = array[i]; more my jit'd code;" right inline in
	///   machine code. Infact the copy for native types with list all inlined and jitted is nearly
	///   what you would or could have done with hand assembly, overflow check not wiht standing.
	///   So if you are all XNA-ing things up, or doing someting thats killing perf otherwise List<>
	///   is fine. Hats off to MS for that one.
	/// - Perfer non-Linq if it realy matters.  Idioms for any supoprting <> type list like thing
	///   (if you want to put as much in real JIT code as possible) do:
	///   <code>
	///   var controls = Controls;		// function call
	///   var count = controls.Count;	// probably function call
	///   UserControl control;			// jit to nothing - exact destination type you want
	///	  for(int i=0;i<>count;i++)		// shuffle three register vars
	///     if (!((object)control=controls[i] as MyUserControl) == null)) //- exact filter type you want
	///		{	// get_Item func call, 2x downcast(nop), 'as' almost inline, ref compare (CLR static code)
	///			control control control .... 
	///		}
	///		
	///   or:
	///   
	///   var list = SomeWoudBeEnumberableWithDeligateAndFinallyAnd3PlusFunctionCallsPerIteration.Whatevers;
	///   var count = list.Count;	// probably func call, often not
	///	  for(int i=0;i<>count;i++)
	///		if (Equals.String(name, list[i].Name, ...IgnoreCase)
	///			break;				// get_Item, Name (possibly inline if its yours), StrEq call 
	///   </code>
	///       
	/// </remarks>
	public class TokenDictionary
	{
		#region Initialization

		// inheritance

		/// <summary>
		/// Sets the inheritance mode for inheritance enabled functions.
		/// </summary>
		internal InheritMode mode = InheritMode.All;
		/// <summary>
		/// Sets a single child for single inheritance.
		/// </summary>
		internal TokenDictionary child;
		/// <summary>
		/// Sets a multiple children for multiple inheritance.
		/// </summary>
		/// <remarks>
		/// The child and children fields are mutually exclusive.  If child field is set then
		/// children field will be ingnored. This is due to the inherent expense of setting up
		/// the loop for checking multiple children and is much faster just to manage these two
		/// explicitly when they are set.
		/// </remarks>
		internal TokenDictionary[] children;

		// tracking
		private int defcap;
		private int defgrow;
		protected int size;
		protected int capacity;
		private int overflow;

		// storage
		internal ulong[] keys;
		internal object[] values;
		internal int[] compile;

		/// <summary>
		/// Creates a new TokenDictionary.
		/// </summary>
		/// <param name="capacity">Initial capacity.</param>
		/// <param name="grow">Amount to grow during resize.</param>
		/// <remarks>
		/// TokenDictionaries will only shrink if manually cleared, using Clear().
		/// TokenDictionaries are meant for very high get and fill rates with minimal GC pressure. 
		/// </remarks>
		public TokenDictionary(int capacity = 10, int grow = 10)
		{
			this.capacity = this.defcap = capacity;
			Resize();
			this.defgrow = grow;
		}

		/// <summary>
		/// Resize the TokenDictionary.
		/// </summary>
		// this is built for fast lookups
		//   its ok to completely refabricate the dictionary as bigger bucket size means less collisions
		private void Resize()
		{
			this.capacity += defgrow;

			var n = capacity + capacity;		// must account for possiblity of gethash always returning same value
			var oldkeys = keys;
			var oldvalues = values;
			var oldoverflow = overflow;
			keys = new ulong[n];
			values = new object[n];
			compile = new int[n];
			size = 0; overflow = capacity;

			ulong key;
			for (int i = 0; i < oldoverflow; i++)
				if ((key = oldkeys[i]) != 0)
					Add(key, oldvalues[i]);
		}

		/// <summary>
		/// Clears the current keys for this Dictionary.
		/// </summary>
		/// <remarks>
		/// Does not affect inherited dictionaries, or clear their associations.
		///	Will shrink the dictionary storage back to its initial capacity.
		/// </remarks>
		private void Clear()
		{
			size = overflow = 0;
			capacity = defcap;
			Resize();
		}
		#endregion

		#region Core Functions
		/// <summary>
		/// Gets an Xor'd hashed 32bit version of a 64bit Key.
		/// Useful for keying a type+name value pair.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public static uint GetKey32(ulong key)
		{
			return unchecked((uint)key ^ (uint)(key >> 32));
		}

		/// <summary>
		/// Gets an id for a named key for use with other function.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public ulong GetKey(string key)
		{
			return StringGenerator.GetKey2(key);
			//unchecked
			//{
			//    var p = key;
			//    var n = p.Length;
			//    if (n > 12) n = 12;
			//    ulong k = 0x8000000000000000;
			//    for (int i = 0; i < n; i++)
			//        k = k | ((ulong)(p[i] & 0x001F) << (i * 5));
			//    return k;
			//}
		}

		/// <summary>
		/// Gets the bucket index for a key.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public int GetBucket(ulong key)
		{
			return unchecked((int)(key % (uint)capacity));
		}

		/// <summary>
		/// Adds a value key pair.
		/// </summary>
		/// <remarks>
		/// *Not* safe if key is already added. Use Set or Has if the state is unknown.
		/// Add is faster than Set if a key does not already exist.
		/// </remarks>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public void Add(string key, object value)
		{
			Add(GetKey(key), value);
		}
		public void Add(ulong key, object value)
		{
			if (size++ == capacity)
				Resize();

			//var ibucket = (int)((uint)RuntimeHelpers.GetHashCode(key) % (uint)capacity);
			var ibucket = unchecked((int)(key % (uint)capacity));
			if (keys[ibucket] != 0)
			{
				int bucket = compile[ibucket];					// get next value
				while (bucket != 0)								// find the last one on the chain
					bucket = compile[ibucket = bucket];			// ibucket should end up on last good node
				ibucket = compile[ibucket] = overflow++;		// ibucket should now be on the new element
			}
			values[ibucket] = value;
			keys[ibucket] = key;
		}

		/// <summary>
		/// Determine if this Dictionary contains the key.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool Has(string key)
		{
			return (size > 0) && Has(GetKey(key));
		}
		public bool Has(ulong key)
		{
			var ibucket = unchecked((int)(key % (uint)capacity));

			while (!(keys[ibucket] == key))
				if ((ibucket = compile[ibucket]) == 0)
					return false;

			return true;
		}

		/// <summary>
		/// Gets the value for a key.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public object Get(string key)
		{
			return (size > 0) ? Get(GetKey(key)) : null ;
		}
		public object Get(ulong key)
		{
			var ibucket = unchecked((int)(key % (uint)capacity));

			while (!(keys[ibucket] == key))
				if ((ibucket = compile[ibucket]) == 0)
					return null;

			return values[ibucket];
		}

		/// <summary>
		/// Sets a value for the key.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public void Set(string key, object value)
		{
			Set(GetKey(key), value);
		}
		public void Set(ulong key, object value)
		{
			var ibucket = unchecked((int)(key % (uint)capacity));

			while (!(keys[ibucket] == key))
				if ((ibucket = compile[ibucket]) == 0)
					{ Add(key, value); return; }

			values[ibucket] = value;
		}

		/// <summary>
		/// Removes a key.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public void Remove(string key)
		{
			if (size > 0)
				Remove(GetKey(key));
		}
		public void Remove(ulong key)
		{
			int ibucket = unchecked((int)(key % (uint)capacity)), libucket = ibucket;

			while (!(keys[ibucket] == key))
				if ((ibucket = compile[libucket=ibucket]) == 0)
					return;

			compile[libucket] = compile[ibucket];		// re-link compile
			keys[ibucket] = 0;
			values[ibucket] = null;
			size--;
		}

		/// <summary>
		/// Gets a value for a key, if the key exists.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool TryGet(string key, out object value)
		{
			if (size == 0)
			{
				value = null;
				return false;
			}
			return TryGet(GetKey(key), out value);
		}
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool TryGet(ulong key, out object value)
		{
			var ibucket = unchecked((int)(key % (uint)capacity));
			while (!(keys[ibucket] == key))
				if (size == 1 || (ibucket = compile[ibucket]) == 0)
				{ value = null; return false; }

			value = values[ibucket];
			return true;
		}
		#endregion

		#region Convienience Functions
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool ContainsKey(string key)
		{
			return Has(key);
		}
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool ContainsKey(ulong key)
		{
			return Has(key);
		}

		public object this[string key]
		{
			[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
			get { return Get(key); }
			[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
			set { Set(key, value); }
		}
		public object this[ulong key]
		{
			[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
			get { return Get(key); }
			[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
			set { Set(key, value); }
		}

		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool TryGetValue(string key, out object value)
		{
			return TryGet(key, out value);
		}
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool TryGetValue(ulong key, out object value)
		{
			return TryGet(key, out value);
		}
		#endregion

		#region Inheritance Enabled Functions
		/// <summary>
		/// Determines if a value is set. Follows the inheritance chain.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool IfHas(string key)
		{
			return IfHas(GetKey(key));
		}
		public bool IfHas(ulong key)
		{
			// See if property bag contains a value for the property
			if ((mode & InheritMode.This) != 0)
				if (size != 0 && Has(key))
					return true;

			// If property bag inherits from another property bag and inheritance enabled, delegate GetValue
			if ((mode & InheritMode.Children) != 0)
			{
				var t = this;
				for (var n = t.child; (object)n != null; t = n, n = n.child)
					if (n.size != 0 && n.Has(key))
						return true;

				if ((object)t.children != null)
					foreach (var i in t.children)
						if (IfHas(key))
							return true;
			}
			return false;
		}

		/// <summary>
		/// Gets a value if set. Follows the inheritance chain.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool GetIfHas<T>(string key, out T value)
		{
			object o;
			if (GetIfHas(GetKey(key), out o))
			{
				value = (T)o;
				return true;
			}
			value = default(T);
			return false;
		}
		public bool GetIfHas(ulong key, out object value)
		{
			// See if property bag contains a value for the property
			if ((mode & InheritMode.This) != 0)
				if (TryGet(key, out value))
					return true;
	
			// If property bag inherits from another property bag and inheritance enabled, delegate GetValue
			if ((mode & InheritMode.Children) != 0)
			{
				var t = this;
				for (var n = t.child; (object)n != null; t = n, n = n.child)
					if (n.TryGet(key, out value))
							return true;

				if ((object)t.children != null)
					foreach (var i in t.children)
						if (i.GetIfHas(key, out value))
							return true;
			}

			value = null;
			return false;
		}

		/// <summary>
		/// Gets only an inherited value, if set. Follows the inheritance chain.
		/// </summary>
		/// <remarks>
		/// This function is a convience function and is the same as calling GetIfHas
		/// with the exception that it sets the current inheritance mode to
		/// InheritMode.Children for you, then sets it back to the previous mode.
		/// </remarks>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool GetInheritedIfHas<T>(string key, out T value)
		{
			object o;
			if (GetInheritedIfHas(GetKey(key), out o))
			{
				value = (T)o;
				return true;
			}
			value = default(T);
			return false;
		}
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool GetInheritedIfHas(ulong key, out object value)
		{
			var oldmode = mode;
			mode = InheritMode.Children;

			var rv = GetIfHas(key, out value);

			mode = oldmode;
			return rv;
		}

		/// <summary>
		/// Get all values including inherited values, if any are set. Follows the inheritance chain.
		/// </summary>
		[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
		public bool GetAllIfHas<T>(string key, List<T> result)
		{
			return GetAllIfHas<T>(GetKey(key), result);
		}
		public bool GetAllIfHas<T>(ulong key, List<T> result)
		{
			bool rv = false;
			object o;

			// See if property bag contains a value for the property
			if ((mode & InheritMode.This) != 0)
			{
				if (TryGet(key, out o) || o != null)
				{
					result.Add((T)o);
					rv = true;
				}
			}

			// If property bag inherits from another property bag and inheritance enabled, delegate GetValue
			if ((mode & InheritMode.Children) != 0)
			{
				var t = this;
				for (var n = t.child; (object)n != null; t = n, n = n.child)
					if (n.TryGet(key, out o) || o != null)
					{
						result.Add((T)o);
						rv = true;
					}

				if ((object)t.children != null)
					foreach (var i in t.children)
						if (i.GetAllIfHas<T>(key, result))
							rv = true;
			}

			return rv;
		}
		#endregion
	}

Coordinator
May 11, 2011 at 5:59 PM

Hey Celess, did you get the email I sent you a few weeks back about AddOn Studio?