Wednesday, November 21, 2012

The slow and steady tsunami of Open Source

Recently I started reading some material I should have read over a decade ago: the essay collection The Cathedral & the Bazaar by Eric S. Raymond and the multi-author collection Open Sources. Perhaps it is lucky that I didn't read the books back then, because they might have led me to grow an even bushier beard and run down the street screaming at people that the source code should always be available (it always should).

When I was a teenager I was dimly aware of the Open Source revolution that was happening. I had a friend who was more in tune with what was happening than I was, and he introduced me to Linux. It didn't take long for me to get hooked, and I have been a proponent of open source software ever since.

However, supporting an idea and keeping track of its progress are two quite separate things, and I have not followed the development of the open source movement closely. Watching webcasts from the Build event in Redmond made me realize that the open source movement might actually be winning (yes, with tiger blood dripping from its teeth). Two of the talks that gave me this feeling are The Future of C++ and It’s all about performance: Using Visual C++ 2012 to make the best use of your hardware. The first presentation talks about the renaissance of C++ and mentions Microsoft's commitment to the C++11 standard. It also mentions the site isocpp.org, which is heavily funded by Microsoft. If MS is ready to support a portable language with an open standard without trying to invent its own conflicting standard, something must be going right in the world. The second talk covers the new C++ AMP API, which is only available in Visual Studio 2012 right now, but it is an openly documented API that anyone can port.

Microsoft has even made it possible to use the new release of Team Foundation Server under Linux; here is a blog that shows how to set it up on a virtual machine running Ubuntu.

I think we are seeing more and more of the code that used to be hidden away. APIs are pushed closer to the core functionality and more code is shared. Just a while back Facebook shared its internal bag of C++ hacks called folly. Google has always been big on the open source scene with its project hosting and the Summer of Code event, and more companies are doing similar things.

If new developers manage not to be torn apart by all the choices that are available, I think there is a bright future for those who choose the open source path.

Saturday, September 22, 2012

Compilers should be more evil

Recently I ran into one of those occasions where you do something incorrectly that should not work, and it still does. In my case it had to do with inheritance from forward declared classes. A simplification of what was done can be seen below.

Penguin.h
class Bird;  // forward declaration only, no definition

class Penguin : Bird {
public:
    Penguin();
    virtual ~Penguin();
    void BirdCall();
};

Penguin.cpp
#include <iostream>
#include "Bird.h"     // Bird is fully defined before Penguin.h is read
#include "Penguin.h"

Penguin::Penguin() : Bird(false) {
}

Penguin::~Penguin() {
}

void Penguin::BirdCall() {
    std::cout << "Awk awk awk!" << std::endl;
}

Bird.h
class Bird {
    Bird() {}
    bool canFly;
public:
    Bird (bool canFly) { this->canFly = canFly; }
    virtual ~Bird() {}
    virtual void BirdCall() = 0;
};

My real code was slightly more complicated than this and had a lot less to do with birds. The interesting thing was that the compiler managed to compile this into working code. A class that has only been forward declared, such as Bird on the first line of Penguin.h, cannot be used as a base class. If it could, it would be possible to create circular inheritance, and that might very well be the end of the universe.

What most likely happened was that, as in Penguin.cpp above, Bird.h was included before Penguin.h, so the compiler had already parsed the full Bird class and was aware of it when it came to parsing the Penguin class. However, when the compiler quietly accepts things like this, it creates a confusing mess of how things work. The next file that includes Penguin.h without Bird.h in front of it will not compile, and the programmer has no clue why.
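A minimal sketch of a layout that does not depend on include order, assuming the fix is simply to let Penguin.h include Bird.h itself instead of forward declaring Bird. Both headers are shown in one listing so the example stands alone; the comments mark where each file would begin. The include guard names, the CanFly accessor, and the choice to return the call as a string rather than print it are assumptions made for illustration.

```cpp
#include <string>

// ---- Bird.h (hypothetical layout) ----
#ifndef BIRD_H
#define BIRD_H
class Bird {
    bool canFly;
public:
    explicit Bird(bool canFly) : canFly(canFly) {}
    virtual ~Bird() {}
    bool CanFly() const { return canFly; }
    virtual std::string BirdCall() const = 0;
};
#endif

// ---- Penguin.h (hypothetical layout) ----
// In a real project this header would start with #include "Bird.h",
// so Bird is a complete type no matter which file includes Penguin.h.
#ifndef PENGUIN_H
#define PENGUIN_H
class Penguin : public Bird {
public:
    Penguin() : Bird(false) {}
    std::string BirdCall() const override { return "Awk awk awk!"; }
};
#endif
```

With this layout a source file only needs to include Penguin.h, and the order of its other includes no longer matters.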

This kind of problem leaves a certain amount of room for "luck" when writing software. There is nothing wrong with being lucky; some people build entire careers on it (stock brokers, palm readers and so on). However, most people distrust luck and prefer that things behave in a consistent manner, so that they can learn how to get the same result every time.

Many things can be said about the error messages of C++ compilers, but I would pick a cryptic error message any day over inconsistent behavior. I think compilers should be more evil and tell us loud and clear when we do something erroneous.

Monday, August 13, 2012

C++/CLI compile options

When compiling C++/CLI code there are several different compiler modes which set the level of interaction between managed and unmanaged code. Depending on what should be achieved with the code, the correct compiler option can be the difference between a finished component and the compiler throwing a tantrum.

This post will deal with the options /clr, /clr:pure, and /clr:safe. There is also the option /clr:oldSyntax, which is used for compiling code that uses the managed C++ extension. The managed C++ extension is a predecessor to C++/CLI and uses a completely different syntax.

clr:safe

This option requires that only verifiably type safe code is included in the component. This means that the code cannot use unsafe arrays (arrays without bounds checks) or unsafe pointers. Any calls to native code must therefore be marshaled, and the code cannot contain any instances of native types.

#pragma once
using namespace System;
namespace ClrSafe {
    public ref class ClrSafeClass
    {
    private:
        double offset;
    public:
        ClrSafeClass(double offsetParam) : offset(offsetParam) {}
        ~ClrSafeClass() {}
        double Translate(double position) { return position + offset; }
    };
}

The code above does not contain any references to native code and can hence be compiled with the /clr:safe option. In C# terms this is similar to compiling with the Any CPU flag, because the code does not reference any specific architecture.

clr:pure

This option, like the /clr:safe option, generates only IL (Intermediate Language) output. The difference is that native types and classes are allowed to exist within the component. The compiler transforms the calls to native functions into IL. There are some limitations to this, such as that the native components cannot be declared to export their methods from the DLL. Neither __declspec(dllexport) nor .def files will export native calls, because all exported methods are internally declared with __clrcall.

1:  #pragma once  
2:  #include <cmath>  
3:  #include <iostream>  
4:  #include <msclr\marshal_cppstd.h>  
5:  using namespace System;  
6:  using namespace msclr::interop;  
7:  namespace ClrPure {  
8:       public ref class PureClass  
9:       {  
10:       private:  
11:            double exponent;  
12:       public:  
13:            PureClass(double exponentParam) : exponent(exponentParam) {   
14:                 std::string mess = "Using exponential";   
15:                 Console::WriteLine(marshal_as<String^>(mess));   
16:            }  
17:            ~PureClass() {}  
18:            double Power(double base) { return pow(base, exponent); }  
19:       };  
20:  }  

The above code can be compiled with /clr:pure and the Power method, on row 18, that uses a native function in the cmath library generates the following IL code:

1:  .method public hidebysig instance float64   
2:      Power(float64 base) cil managed  
3:  {  
4:   // Code size    15 (0xf)  
5:   .maxstack 2  
6:   .locals ([0] float64 V_0)  
7:   IL_0000: ldarg.1  
8:   IL_0001: ldarg.0  
9:   IL_0002: ldfld   float64 ClrPure.PureClass::exponent  
10:   IL_0007: call    float64 modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) pow(float64,  
11:                                                    float64)  
12:   IL_000c: stloc.0  
13:   IL_000d: ldloc.0  
14:   IL_000e: ret  
15:  } // end of method PureClass::Power  

On row 10 there is a call to the native function expressed in IL. Code compiled with /clr:pure or /clr:safe has a managed entry point and is slightly more efficient, since both options produce a non-mixed assembly.

clr

The /clr option allows for complete mixing of unmanaged and managed types. This type of assembly has a mixture of IL and native code. The entry point of the DLL is native code that then loads the CLR when it is needed.

#pragma once
using namespace System;
namespace Clr {
    class __declspec(dllexport) NativeClass
    {
    private:
        double scaleFactor;
    public:
        NativeClass(double scaleFactor) : scaleFactor(scaleFactor) {}
        virtual ~NativeClass() {}
        double scale(double factor) { return factor*scaleFactor; }
    };
    public ref class ClrClass {
    private:
        NativeClass* nativeClass;
    public:
        ClrClass(double scaleFactor) { nativeClass = new NativeClass(scaleFactor); }
        ~ClrClass() { delete nativeClass; }
        double Scale(double factor) { return nativeClass->scale(factor); }
    };
}

The above code can be called from both managed and unmanaged components; the managed class is merely a wrapper for the native class.

This concludes the C++/CLI compiler options that can be used to create mixed mode and managed DLLs. An MSDN article on the compiler options can be found here

Friday, August 10, 2012

Should private data and methods be unit tested?


A central part of the OOP (Object Oriented Programming) model is encapsulation, which hinders the external usage of some of the object’s components, such as its methods and data. Some OOP languages do not have real encapsulation. Python is an example of this, but there is a loose agreement that internal data whose name begins with an underscore should not be accessed directly.

When it comes to unit testing such objects it can be quite tempting to check that the accessible methods modify the private data in a manner that is according to how the class should work. There are usually ways to achieve this in the programming language without breaking the encapsulation principle totally.

In C# it is possible to mark components as public, protected, private, and internal. Protected components are available only to the declaring class and the classes that inherit from it, while private components are available only within the class in which they are declared. Components declared as internal are available only within the same assembly; however, it is possible to give other assemblies access to them by adding an attribute to the AssemblyInfo.cs file.

For example if we have a class named AClass that has internal members and we want to have access to them in the unit test project called AClassUnitTest the following line should be added to the AssemblyInfo.cs file which houses the AClass file:

[assembly: InternalsVisibleTo("AClassUnitTest")]

Declaring an internal constructor which allows for injecting mocks into the class that should be tested is good usage of this functionality. The private and protected components can also be accessed through what are called Accessors; there is an article on using Accessors on MSDN.

In C++ there is the friend keyword that can be used to access private components of another object.

1:  #include <iostream>  
2:  using namespace std;  
3:  // Forward declaration of AClass so that BClass is aware of its existence.  
4:  class AClass;   
5:  class BClass {  
6:  private:  
7:       int data;  
8:       void printData() { cout << data << endl; }  
9:  public:  
10:       BClass (int insertData) : data(insertData) {}  
11:       friend AClass;  
12:  };  
13:  class AClass {  
14:  private:  
15:       BClass* dataClass;  
16:  public:  
17:       AClass (BClass *insertDataClass) : dataClass(insertDataClass) {}  
18:       virtual ~AClass() { delete dataClass; }  
19:       void fireDataClass() {   
20:            if (dataClass)  
21:                 dataClass->printData(); // Calling the private method on BClass  
22:       }  
23:  };  
24:  int main(int argc, char* argv[]) {  
25:       AClass friendClass(new BClass(10));  
26:       friendClass.fireDataClass();  
27:       return 0;  
28:  }  

In the example above AClass can access both the private function and the private data of BClass, because BClass declares AClass as a friend on row 11. Simply by forward declaring a unit test class and making it a friend, it is possible to prepare a class so that its internal members are available for unit testing. A more in-depth description of the friend keyword and its usage can be found here
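As a rough sketch of that idea, a class can name a not yet defined test class as a friend. The names Counter and CounterTest are made up for this illustration, and the checks return a bool instead of using any particular unit testing framework.

```cpp
class CounterTest; // hypothetical test class, declared before it is defined

class Counter {
    int count;                  // private state the test wants to inspect
    void reset() { count = 0; } // private helper
public:
    Counter() : count(0) {}
    void increment() { ++count; }
    friend class CounterTest;   // grants the test access to private members
};

class CounterTest {
public:
    // True when increment() changed the private counter as expected.
    static bool incrementAddsOne() {
        Counter c;
        c.increment();
        return c.count == 1;
    }

    // True when the private reset() helper clears the counter.
    static bool resetClearsCount() {
        Counter c;
        c.increment();
        c.reset();
        return c.count == 0;
    }
};
```

The production class only needs the one friend line; the test class itself can live in the test project.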

However, testing private data with unit tests makes the black box, which objects usually are, transparent. Changing the internal functionality of the object will then carry a high probability of breaking the unit test even though the functional contract of the object has not been violated. There are good grounds to treat objects as black boxes even during testing.

Unit tests should only verify the externally visible components of an object, so that it is possible to safely refactor the code of the object without destroying its usability. Rewriting a test should only be necessary when the object has had a breaking change to its interface, and this should indeed be a time for reflection on what has been done.
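A small sketch of a black box check, assuming a made-up RunningMean class. The check touches only add() and mean(), so the private members could be replaced (for example by an incrementally updated mean) without the check having to change.

```cpp
#include <cmath>

class RunningMean {
    double sum;   // internal detail, free to change
    int samples;  // internal detail, free to change
public:
    RunningMean() : sum(0.0), samples(0) {}
    void add(double value) { sum += value; ++samples; }
    double mean() const { return samples == 0 ? 0.0 : sum / samples; }
};

// Black box check: only the public interface is used.
bool meanOfTwoAndFourIsThree() {
    RunningMean m;
    m.add(2.0);
    m.add(4.0);
    return std::fabs(m.mean() - 3.0) < 1e-9;
}
```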

Thursday, August 9, 2012

Unit Testing and Native Code

Old legacy code often comes with a technical debt. Even if the code currently works correctly, feature add-ons and bug corrections can cause breaking changes. Rewriting large chunks of legacy code in C# or some other .NET language can be quite expensive. A good solution is to add unit tests, which reduce the risks when altering the code.

There are many good unit testing frameworks, and in this article we will be looking at using Microsoft's framework to test native code. Microsoft's unit testing framework is based on .NET, but by using C++/CLI it can be used to test native (unmanaged) code.

I have used Visual Studio and selected to create an empty Visual C++ project. To the project I have added one class called AClass, which has two private members and a couple of functions described in the header file.

AClass.h
#pragma once
#include <iostream>
#include <string>
using namespace std;
class __declspec(dllexport) AClass
{
private:
    string name;
    double test;
public:
    AClass(void) : name(""), test(0.0) {}
    virtual ~AClass(void) {}
    bool setName(const string&);
    bool getName(string&) const;
    void setDouble(double);
    double getDouble(void);
};

The implementation of the class is pretty much straightforward. Setting a new name in the DLL is always successful; however, getting the name when no name has been set returns false.

AClass.cpp
#include "AClass.h"
bool AClass::setName(const string &name) {
    this->name = name;
    return true;
}
bool AClass::getName(string &name) const {
    if (this->name.empty())
        return false;
    name = this->name;
    return true;
}
void AClass::setDouble(double value) {
    this->test = value;
}
double AClass::getDouble(void) {
    return this->test;
}
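Before any test project is wired up, the contract described above can be sketched as a plain C++ check. This is only an illustration: the class below is a stand-in for AClass without the DLL export, and checkNameContract is a hypothetical helper, not part of any framework.

```cpp
#include <string>

// Stand-in for the exported AClass, without __declspec(dllexport).
class AClass {
    std::string name;
    double test;
public:
    AClass() : name(""), test(0.0) {}
    bool setName(const std::string& newName) { name = newName; return true; }
    bool getName(std::string& out) const {
        if (name.empty())
            return false;
        out = name;
        return true;
    }
    void setDouble(double value) { test = value; }
    double getDouble() const { return test; }
};

// Hypothetical helper: true when the described name contract holds.
bool checkNameContract() {
    AClass target;
    std::string out;
    if (target.getName(out))           // no name set yet, must report false
        return false;
    if (!target.setName("The Class"))  // setting a name always succeeds
        return false;
    return target.getName(out) && out == "The Class";
}
```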

To create a unit test for this DLL I created a new Visual C++ project in Visual Studio. In the Visual C++ menu the CLI sub-menu contains a template for creating unit tests. In the code below I have removed some of the auto generated code, which was unnecessary for this example, to make the example more readable.

For the unit test class to find the AClass.h header file, so that we can create instances of the class, the project must be told the location of the header file. This is done by selecting Properties for the AClassUnitTest project. A path to the header file should be added to the Additional Include Directories property in the C/C++ menu.

To use the routines in the DLL it must be referenced. There are two options for this:

  1. If the unit test project is in the same solution file as the DLL, it is possible to open the Properties from the project menu and expand the Common Properties node. By selecting References and clicking Add New Reference... it is possible to reference the project for the DLL. 
  2. Open the Properties from the project menu and expand the Linker node. Under the Input node the name of the .lib file, which was generated when the DLL was compiled, should be added to the Additional Dependencies property. In this example the name is AClass.lib. The path to the LIB file should be added to the Additional Library Directories property under the General node, which is also in the Linker category. The path to the LIB file is the same as the compile directory for the DLL, which most likely is not the same directory as the output directory where the final DLL is located.  

A good description of creating and using DLLs can be found on MSDN.

For those who are used to creating unit tests in C#, all the same functions are available under C++/CLI. However, static methods of a class are referenced with C++ syntax, using "::" instead of ".". By using asserts the functionality of the class is verified. 

AClassUnitTest.cpp
#include "stdafx.h"
#include <string>
#include "AClass.h"
using namespace System;
using namespace System::Text;
using namespace System::Collections::Generic;
using namespace Microsoft::VisualStudio::TestTools::UnitTesting;
namespace AClassUnitTest
{
    [TestClass]
    public ref class UnitTest1
    {
    public:
        // Setting a name always reports success.
        [TestMethod]
        void TestMethod1()
        {
            AClass* target = new AClass();
            string theName = "The Class";
            Assert::IsTrue(target->setName(theName));
            delete target;
        };
        // Reading the name before one has been set reports failure.
        [TestMethod]
        void TestMethod2()
        {
            AClass* target = new AClass();
            string response = "";
            Assert::IsFalse(target->getName(response));
            delete target;
        };
        // The stored double is returned unchanged (expected value first).
        [TestMethod]
        void TestMethod3()
        {
            AClass* target = new AClass();
            target->setDouble(10.0);
            Assert::AreEqual(10.0, target->getDouble());
            delete target;
        };
    };
}

A very similar example to the one I have described above can be found here. Something worth noting is the first unit test, TestMethod1. The setName method takes a reference to the native type string; in my first attempt this was not a reference, and the string was copied instead. However, when the copy of the string went out of scope in setName, an exception was thrown as it was being destroyed. I was not able to find the exact reason why this occurred, but it seemed to be due to calling delete on an already deleted object. This was only a problem when the DLL was used from a unit test; using a C++/CLI console application or an ordinary CLR console application to call the DLL did not cause the error. 

This may be because the native string class is in fact a template class, which can cause problems when instances pass across the native to managed border. This is usually not a big problem, since it is better to pass a string as a constant reference anyway because it can become quite large. However, if you are not in possession of the source code, this throws a serious wrench in the machinery.

Thursday, July 26, 2012

C++/CLI and Heap Compaction

I'm currently looking into C++/CLI, a C++ extension developed by Microsoft which allows for writing managed code with C++ syntax. Technically this is not an extension, since several new keywords have been added to the language and these do not follow the C++ specification of how new keywords should be formatted. There was an attempt at creating a C++ extension, called managed C++, that followed the guidelines; however, the syntax became complicated and ugly, and the extension was recreated as the programming language C++/CLI, which was designed to produce cleaner and prettier code.

The C++/CLI language can give good insight into what happens within the CLR (Common Language Runtime). C++/CLI also allows for great managed to unmanaged (that is, code that does not have automatic memory handling) transitions, as well as the other way around. The reason for this is that it allows handling of both native C++ objects and types and managed .NET objects and types.

There is a lot to write about this subject and hopefully I will produce more blog posts about it; in this post I will just talk a bit about the difference in how managed and unmanaged objects are handled.

When an object is created in C++ using the new keyword, memory is allocated on the heap and the address of the object's location on the heap is returned. This address is the pointer that the coder must keep track of to use the object, and finally call delete on when the object is no longer needed. When delete is called on an object the memory is freed and can be used by other objects.

Managed objects are always added to the managed heap when they are created. When they are no longer used they are collected by the automated GC (Garbage Collector). A big difference between the managed and unmanaged heap is that the managed heap is also compacted. This means that when the GC is running, objects on the heap are moved to new memory addresses to create a contiguous block of memory. 

[Figure: Managed Heap Compaction]

Doing this reduces the risk of heap fragmentation, which can lead to out of memory errors even though there is unallocated memory left on the heap. The problem with this is that since the objects are moved around, you can no longer use pointers to keep track of them. In managed C++/CLI (and all languages that use Microsoft's CLR) this is done with Tracking Handles, or handles for short. In most of the .NET languages this is handled behind the scenes, but in C++/CLI you have to declare your handle by using a caret, ^, which is similar to the asterisk, *, used for native C++ pointers. The downside is that when the heap is compacted, all tracking handles must be updated with the new memory addresses of the allocated objects. This is one of the reasons why an unmanaged language such as C or C++ will often have a slight performance advantage over a managed language. In many cases simplicity in the development process outweighs the need for high performance, but selecting the right tool for a job is not a judgment call that should be taken lightly. 
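To make the bookkeeping concrete, here is a toy model in plain C++ of a heap that compacts itself and patches its handles. It is only a sketch under heavy assumptions: ToyHeap and ToyHandle are invented names, objects are single ints, and the real CLR collector works nothing like this in detail. The point is only that every surviving handle has to be updated when its object moves.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy model only: a real garbage collector is far more involved.
struct ToyHandle {
    std::size_t address; // current index of the object inside the heap
    bool live;           // set to false once the object is garbage
};

class ToyHeap {
    std::vector<int> cells;                          // the "memory"
    std::vector<std::unique_ptr<ToyHandle>> handles; // every handle handed out
public:
    // Place a value at the end of the heap and return a handle to it.
    ToyHandle* allocate(int value) {
        cells.push_back(value);
        handles.push_back(
            std::unique_ptr<ToyHandle>(new ToyHandle{cells.size() - 1, true}));
        return handles.back().get();
    }

    // Mark an object as garbage. Its handle must not be used after the
    // next compact(), which destroys it.
    void release(ToyHandle* handle) { handle->live = false; }

    int read(const ToyHandle* handle) const { return cells[handle->address]; }

    std::size_t size() const { return cells.size(); }

    // Slide live objects together and patch every surviving handle.
    void compact() {
        std::vector<int> packed;
        std::vector<std::unique_ptr<ToyHandle>> survivors;
        for (auto& handle : handles) {
            if (!handle->live)
                continue;                         // garbage is dropped
            packed.push_back(cells[handle->address]);
            handle->address = packed.size() - 1;  // the handle follows the move
            survivors.push_back(std::move(handle));
        }
        cells.swap(packed);
        handles.swap(survivors);
    }
};
```

After releasing an object in the middle and calling compact(), the handles of the objects behind it report new addresses while still reading the same values, which is exactly what a raw pointer could not do.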

Wednesday, July 11, 2012

Member Functions



In my daily work I usually use C#; however, there are some interactions with C++ in the legacy code. Recently a change woke a dormant bug which took a while to track down due to some misdirection.

A simplified but similar error is displayed in this code.
 #include <iostream>  
 #include <list>  
 using namespace std;  
 class A {  
 private:  
      list<int> IntegerList;  
 public:  
      int getListCount() {  
           return IntegerList.size();  
      }  
 };  
 class B {  
 public:  
      A* AClass;  
 };  
 int main() {  
      B* BClass = new B();  
      if (BClass) {  
           cout << BClass->AClass->getListCount() << endl;  
      }  
      return 0;  
 }  

Member functions of classes are about the same thing as ordinary free functions; they just hide that they take a this pointer to the object they are called on as their first argument. Calling a member function through a null or uninitialized pointer is undefined behavior, but in practice the call itself usually does not crash. The crash first occurs when you try to work with the non-existent memory that you think your class contains.

You always want to crash early when a problem occurs, since this makes the problem easier to locate. In the code above the crash does not occur until we try to get the size of the integer list. This problem could easily have been avoided with some defensive programming and RAII.
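The hidden argument can be made visible by writing the same function as a free function. This is only an illustration of the idea, not what any particular compiler emits; the free function getListCount and its self parameter are names chosen for this sketch.

```cpp
#include <list>

struct A {
    std::list<int> IntegerList;
};

// Roughly what a member function looks like from the outside: an ordinary
// function whose first parameter is the pointer to the object.
int getListCount(const A* self) {
    // Touching self->IntegerList is the point where a null or dangling
    // pointer would finally fail, not the call to getListCount itself.
    return static_cast<int>(self->IntegerList.size());
}
```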

In the code below I have made sure that the AClass gets assigned when the BClass gets created and even though I'm pretty certain AClass now exists an assert will make sure we crash at a more logical position in the code.  
 #include <iostream>  
 #include <list>  
 #include <cassert>  
 using namespace std;  
 class A {  
 private:  
      list<int> IntegerList;  
 public:  
      int getListCount() {  
           return IntegerList.size();  
      }  
 };  
 class B {  
 public:  
      B() : AClass(new A()) {}  
      virtual ~B() { delete AClass; }  
      A* AClass;  
 };  
 int main() {  
      B* BClass = new B();  
      if (BClass) {  
           assert(BClass->AClass != NULL);  
           cout << BClass->AClass->getListCount() << endl;  
      }  
      delete BClass;  // release B, whose destructor releases A  
      return 0;  
 }  
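A possible variation on the same idea, sketched with std::unique_ptr from C++11 so that the cleanup cannot be forgotten. The add method and the getA accessor are additions made for this sketch; handing out a reference instead of a pointer removes the null case from the caller's side.

```cpp
#include <list>
#include <memory>

class A {
    std::list<int> IntegerList;
public:
    void add(int value) { IntegerList.push_back(value); }
    int getListCount() const { return static_cast<int>(IntegerList.size()); }
};

class B {
    std::unique_ptr<A> AClass; // owned by B and freed automatically with it
public:
    B() : AClass(new A()) {}
    A& getA() { return *AClass; } // a reference cannot be null
};
```

With this layout B needs no hand written destructor, and a B created on the stack cleans up everything when it goes out of scope.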


Tuesday, July 10, 2012

Greetings


I started programming in my teens; my first experience was copying an AMOS program from a computer magazine article. Now, a couple of decades later, I work as a programmer for a Swedish telecommunications company.

I'm not a guru at any programming language or technique, and this blog is not meant to be a shining beacon on the sea of bad code that is out there. This is a place for me to put down thoughts about the things that I come across, forcing me to think about them more closely. However, if my way of explaining techniques, useful components, or similar is of benefit to someone else, I wouldn't consider that a bad thing.

I will also use this blog to vent some of my personal opinions, which is perhaps not ideal for a technical blog, but I would never have the discipline to continuously update two blogs. This means that future posts might contain ranting on topics that have nothing to do with programming, so consider this a fair warning.