Archive

C++ - Part 2



Tony Houghton

Last month, I introduced the features of C++ which can be used for improved C programming. This month will start my look at Object Oriented Programming with C++'s classes.

Encapsulation

All programs need data as well as procedures, and some of that data has to be global as opposed to local. Consider the various ways C allows us to store, globally, some information about the current screen mode and the consequences when we need to change the way that data is stored. Looking at some simplified example header files, the simplest way would be:

/* modeinfo.h */
extern int screen_mode;
extern int screen_width;
extern int screen_height;
/* Read mode and calculate dimensions from VDU variables */
extern void read_mode_info(void); 

With a bit of experience, you would probably be inclined to tidy this slightly with a struct. (When declaring a single object of a certain type, I find it convenient to use capital initials in the class/struct definition and lower case for the variable itself.)

/* modeinfo.h */
struct ModeInfo {
  int mode;
  int width;
  int height;
};
extern struct ModeInfo mode_info;
/* Read mode and calculate dimensions from VDU variables */
extern void read_mode_info(void); 

In both these cases, you would call read_mode_info() at every mode change, and every time you wanted to know the mode number, you would access the variable directly with statements such as x=screen_mode; or x=mode_info.mode;

What happens when you want your program to work on a Risc PC? For its exciting new modes, it uses pointers to mode selector blocks in place of mode numbers, and the data these point to are not guaranteed to remain valid. This could cause your program to crash, so you would then have to read the mode number from the operating system every time you needed it. Rather than calling read_mode_info() every time, you would add a function, int get_mode(void). In a very long program with many files, it would be quite a nuisance replacing every screen_mode or mode_info.mode with get_mode().

To avoid having to change hundreds of lines of code every time a trivial implementation detail changes, large projects are developed with small functions to access global variables instead of accessing the variables directly. You may have noticed this in Acorn's C libraries. It also aids debugging if you know that something can only be accessed by a small group of related functions. Obviously C++'s inline functions make the technique far more efficient but there are further advantages in C++. By using classes, the data can be encapsulated with their associated functions, and there is greater control over what parts of a program can access members of a class. A class for modeinfo would look like:

// modeinfo.h
class ModeInfo {
  int width;
  int height;
public:
  void read_info(void);
  int get_mode(void);
  int get_width(void){return width;}
  int get_height(void){return height;}
};

The most obvious change is that the functions are now part of the class definition, and there is also a new keyword, public. Incidentally, it seems to be conventional to call functions that read a member get_<member name>(), and functions that write a member set_<member name>(). Before explaining classes in more detail, I need to define some of the terminology.

Terminology

A class, analogous to a struct, defines the way an object is represented. The term class can also be used to cover structs and even unions. In fact, a struct is now a class in C++, and you can add functions to structs. The only difference is that the members of a struct are, by default, public and those of a class are private (see Access control). An object is a variable created from a class (such as mode_info above). A member is the term for anything which belongs to a class or object (e.g. width, read_info()), and a member function is also known as a method. Calling a member function is sometimes referred to as sending a message to the object.

There is also the term translation unit. This means a single .c++ or .c file plus all its headers.

Reference to members

Member functions of a class can refer to other members of the class with just their member name, as width and height are referred to above. Outside the class, its members must be qualified with the . operator, as for structs - this goes for member functions as well as variables.

Defining member functions

Member functions can be defined in two main ways. get_width() and get_height() are inline functions. By including their definitions with their declarations, it is unnecessary to use the keyword inline. To define the functions elsewhere, you use the same syntax as for ordinary functions except that the name is qualified with the class name and the :: qualifier:

void ModeInfo::read_info(void)
{
  // Use OS_ReadVduVariables or
  // OS_ReadModeVariable to calculate 
  // dimensions. Actual code is not
  // needed to make the point.
}
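int ModeInfo::get_mode(void)
{
  _kernel_swi_regs r;      // declared in "kernel.h"
  r.r[0] = 135;            // OS_Byte 135 reads character at cursor and mode
  _kernel_swi(6, &r, &r);  // SWI 6 is OS_Byte
  return r.r[2];           // the mode number is returned in R2
}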

As an alternative to including a method definition in the class definition (which may make it look cluttered), you can use the ordinary declaration/definition syntax, but prefix the method definition with inline. Unlike ordinary inline functions, inline methods do not have static linkage - all class members must have global linkage.

Member functions, or methods, behave exactly like ordinary functions in the way they can take arguments and return values.

Access control

There are three types of access that can be applied to members: private, public and protected. These are the actual keywords that C++ uses. private members can only be accessed by other members of the same class. protected members are similar, but they can also be accessed by any class which is derived from the base class. I will discuss derived classes (inheritance) in a future article. public members can be accessed by any part of the program that has 'seen' the class definition.

One of these keywords followed by a colon specifies that all the following members have that particular access restriction. You can use the keywords over again in the same class in any order. For instance, you may want to start off with some public members, then define some private inline methods which refer to the first members, then switch back to some public methods.
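As a small sketch of switching access more than once, consider the class below. Counter and all its member names are invented for illustration and do not come from any library.

```cpp
#include <cassert>

// Hypothetical class, invented purely to show access keywords being repeated.
class Counter {
public:
  void reset() { count = 0; }
private:
  // Private helper: only other members of Counter may call it.
  void step(int by) { count += by; }
  int count;
public:
  void increment() { step(1); }
  int get_count() const { return count; }
};

// Small self-check: returns the count after a reset and two increments.
int counter_demo()
{
  Counter c;
  c.reset();
  assert(c.get_count() == 0);
  c.increment();
  c.increment();
  // c.step(5);   // would not compile: step() is private
  return c.get_count();
}
```

Calling counter_demo() should give 2. The layout is a matter of taste; many programmers prefer a single public section followed by a single private one.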

Members in a class start off as private by default, and members in a struct default to public. It is common for all member variables to be private and all member functions to be public. There is usually no reason for using public member variables that justifies the loss of the protection that encapsulation offers, but private methods can be useful to perform functions that are needed by several public methods, but not by any other parts of the program.

As a quick example of how to access class members:

#include "modeinfo.h"
ModeInfo mode_info;
int main()
{
  mode_info.read_info();          // same syntax for class methods
                                  // as for struct variable members
  int x = mode_info.width;        // error, width is private
  int y = mode_info.get_height(); // OK
}

Initialisation

C++ introduces a new way of initialising objects. Where you would have written:

int a = 4;

you can now write:

int a(4);

The new parameter notation is important for initialisation of classes, but I prefer to stick to the old way (=) where possible. Whilst it is usually quite easy to distinguish between assignment = and initialisation =, it is possible, in some cases, for initialisation by parameter notation to be confused with an actual function call.
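The following sketch shows how the parameter notation can read like a function call. The function names twice() and init_demo() are invented for the example, and the last declaration shows a related trap rather than anything specific to this article.

```cpp
#include <cassert>

// Hypothetical helper, used only to give the example something to call.
int twice(int n) { return 2 * n; }

int init_demo()
{
  int a = 4;        // initialisation written with =
  int b(4);         // the same initialisation in parameter notation
  int c(twice(a));  // reads like two nested calls, but only twice() is
                    // called; c is a new int initialised with the result
  int d();          // a related trap: this declares a function d returning
                    // int, it does not create an int variable
  assert(a == b);
  return c;
}
```

Calling init_demo() should give 8. The declaration of d compiles without complaint on most compilers, which is exactly why it is easy to miss.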

Constructors and destructors

Sometimes, using an object before it has been initialised can be disastrous. C++ classes can be provided with constructors which are guaranteed to be called for each object as it is created. If the object is a permanent variable, it will be constructed before main() is executed. Overloading (covered in a later article) allows a class to have more than one constructor.

Conversely, you can provide destructors to deallocate memory used by objects, etc, when they are no longer in use. For a local object, the destructor is called when it goes out of scope (at the end of a function or block); for a permanent object it is called during the program's exit handler (after executing main() or calling exit(), but not if abort() is called); for an object created by new, its destructor is called by delete. When a group of objects goes out of scope simultaneously (e.g. all global variables), the objects are destructed in the reverse order from which they were created. Constructors can take arguments, but destructors cannot, and neither returns a value. The class listed below, to implement a 'safe' array, will illustrate the above and more:

Working from the top, the first new thing is the const member, size. The only time this can be written to is at the start of construction. Member array is set to point to the actual array during construction. check_subscript() is an example of when a private method is useful. It checks whether a subscript is within the range of the array. get_element() and set_element() use this to make sure you cannot cause a crash by trying to access a part of the array which does not exist. Actually, as this method has no possible harmful effects, it would be more useful if it were public; you never know whether another part of the program might want to check subscripts before attempting access. Note the word const after its declaration.

class Array {
  const int size;
  int *array;
  int check_subscript(int subscript) const
    {return (subscript>=0 && subscript<size);}
public:
  Array(int size);// Constructor
  ~Array(void) {if (size>0) delete[] array;}// Destructor
  int get_size(void) const {return size;}
  void set_element(int subscript, int value);
  int get_element(int subscript) const;
};
Array::Array(int size) : Array::size(size)
{
  if ((size<=0) || (array = new int[size], !array))
  {
   // raise error
  }
}
void Array::set_element(int subscript, int value)
{
  if (!check_subscript(subscript))
  {
   // raise error
  }
  else array[subscript] = value;
}
int Array::get_element(int subscript) const
{
  if (!check_subscript(subscript))
  {
   // raise error
   return 0; // placeholder value so that every path returns something
  }
  return array[subscript];
}
Array array1; // Error: no parameter for constructor
Array array2(100); // An array of 100 ints
void f(void)
{
  Array temp_array(10); // This is constructed every time f() is called
  // Do something with temp_array
}
// temp_array is destructed at the end of f()
int main()
{
  Array *new_array = new Array(200); // Note similar notation for new
  f();
  // Do something with new_array and array2
  delete new_array; // Destructor is called
}
// After executing main() array2 is destructed

The const after a method's declaration, as on check_subscript() and get_element(), tells the compiler that the method will not alter anything in the object. This allows you to define whole objects that are const e.g.

const Array carray(10);

Only const methods (apart from constructors and destructors) can be called for const objects. Similar restrictions apply to volatile - the compiler needs to know that a method is volatile to avoid applying optimisations. Constructors and destructors may not be const or volatile. Constructors are allowed to write members of const objects.
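A minimal sketch of the rule, using a hypothetical Point class rather than the Array listing:

```cpp
#include <cassert>

// Hypothetical class, not part of the article's listing.
class Point {
  int x;
public:
  Point(int x0) : x(x0) {}
  int get_x() const { return x; }  // promises not to alter the object
  void set_x(int v) { x = v; }     // makes no such promise
};

int const_demo()
{
  const Point origin(0);
  Point p(3);
  p.set_x(5);                    // fine: p is not const
  // origin.set_x(1);            // would not compile: set_x() is not const
  assert(origin.get_x() == 0);   // fine: get_x() is a const method
  return p.get_x();
}
```

Calling const_demo() should give 5. Removing the comment markers from the origin.set_x(1) line should produce a compile-time error rather than a run-time one.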

A method with the same name as its class is its constructor, but if preceded by a ~, it is its destructor. Note the lack of a type before the constructor and destructor, both in their declarations and definitions.
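The order of construction and destruction described earlier can be observed with a class that records each event. This is only a sketch: Tracer, event_log and the function names are invented, and it uses the standard library's string class purely as a convenient log.

```cpp
#include <cassert>
#include <string>

std::string event_log;  // hypothetical global used to record events

class Tracer {
  char id;
public:
  Tracer(char c) : id(c) { event_log += id; }       // logs e.g. "a"
  ~Tracer() { event_log += '~'; event_log += id; }  // logs e.g. "~a"
};

void trace_scope()
{
  Tracer first('a');
  Tracer second('b');
  {
    Tracer inner('c');
  } // inner is destructed here, at the end of its block
}   // second, then first, are destructed here: reverse order of creation

std::string trace_demo()
{
  event_log = "";
  trace_scope();
  return event_log;
}
```

Calling trace_demo() should return "abc~c~b~a": the inner object is destroyed at the end of its block, then the remaining two in the reverse of the order in which they were created.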

Now look at the definition of Array's constructor. After its argument list (int size), there is a colon followed by an initialisation list before the body of the function. The initialisation list is where you initialise const members, and base classes in the case of derived classes. In fact, any members can be initialised at this point, but methods cannot be called. If there is a list of initialisers, they must be separated by commas, not further colons. The order in which members are initialised depends on the order in which they are originally declared, not their order in the initialisation list - this allows C++ to ensure they are destructed in the correct order.
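To see that declaration order, not list order, decides the order of initialisation, here is a sketch with a hypothetical Rect class whose initialisation list is deliberately written backwards. Some compilers warn about such a list, which is a good reason to write it in declaration order in real code.

```cpp
#include <cassert>

// Hypothetical class, invented for this example.
class Rect {
  const int width;  // declared first, so initialised first
  const int area;   // declared second, so initialised second
public:
  // The list names area before width, but width is still initialised
  // first, so it already holds w when area is calculated from it.
  Rect(int w, int h) : area(width * h), width(w) {}
  int get_area() const { return area; }
};

int order_demo()
{
  Rect r(3, 4);
  assert(r.get_area() == 12);
  return r.get_area();
}
```

If the two member declarations were swapped, area would be calculated from a width that had not yet been initialised, whatever the list said.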

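The listing qualifies the size member with Array:: because its name is hidden by the argument of the same name. In standard C++ this qualification is not needed in an initialisation list: the name before the parentheses is always looked up in the class, so size(size) initialises the member from the argument, and some current compilers reject the qualified form. In the body of the constructor, however, a plain size would refer to the argument, and Array::size would be needed to reach the member. :: is C++'s way of resolving ambiguities with duplicated variable names. Also, a class name followed by :: explicitly associates a member with a class - this is the class qualifier notation.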

A constructor can be used explicitly to create a temporary object e.g.:

Array a = Array(100);
return Array(100);

These statements could cause dangerous side effects; see Caveat below.

A destructor can be called explicitly as if it were any other method, but this is rarely useful.

Classes with constructors taking arguments, but no constructors without arguments, cannot be included in an array:

Array array_of_Arrays[10];
         // Error: Array constructors 
         // need arguments

A class with any constructor at all cannot be part of a union.

Caveat

Consider the following dangerous program:

// Definition of Array or include etc
int main()
{
  Array array1(100);
  Array array2 = array1;
  // ...
}

The initialisation of array2 from array1 is perfectly valid; instead of calling the Array(int) constructor for array2, each member of array1 is copied to the corresponding member of array2 (memberwise copy), as for a C struct. The problem occurs when array1 and array2 are destructed. The destructor will call delete[] for array2's array member, then do the same for array1. The trouble is that both objects share the same pointer, so deleting it twice may cause a crash. This would be resolved by overloading the constructor (for initialisation, as here), the operator = (for assignment), or both. Furthermore, if array2 had been defined separately and then assigned from array1, its original data would be left floating around on the free store with nothing pointing to it - harmless in itself, but wasteful.
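One possible cure for the initialisation case is sketched below. SafeArray is a cut-down, hypothetical stand-in for the Array class, with no error checking, and the extra constructor taking a reference to another SafeArray is the one the compiler uses for a definition such as SafeArray b = a;.

```cpp
#include <cassert>

// Cut-down, hypothetical stand-in for Array; assumes n is positive.
class SafeArray {
  const int size;
  int *array;
public:
  SafeArray(int n) : size(n), array(new int[n])
  {
    for (int i = 0; i < size; i++) array[i] = 0;
  }
  // Copy constructor: allocates a fresh block and copies the elements
  // instead of sharing the other object's pointer.
  SafeArray(const SafeArray &other)
    : size(other.size), array(new int[other.size])
  {
    for (int i = 0; i < size; i++) array[i] = other.array[i];
  }
  ~SafeArray() { delete[] array; }
  void set_element(int i, int v) { array[i] = v; }
  int get_element(int i) const { return array[i]; }
};

int copy_demo()
{
  SafeArray a(3);
  a.set_element(0, 7);
  SafeArray b = a;       // uses the copy constructor, not memberwise copy
  b.set_element(0, 9);   // does not disturb a
  assert(a.get_element(0) == 7);
  return a.get_element(0) * 10 + b.get_element(0);
} // a and b each delete their own block, so nothing is deleted twice
```

Calling copy_demo() should give 79. The sketch does not cover assignment between existing objects, which needs an overloaded operator = as well.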

Self reference

Sometimes it is necessary for an object to be able to pass on some sort of reference to itself. This is done by the keyword this. In the scope of a class X, including the scope of its members, this is predefined as:

X *const this;

i.e. a const pointer to a non-const object. For const objects this is:

const X *const this;
     // const pointer to const object

Casting this can be used to cheat on const. Sometimes, you may have an object which you want to appear const from the outside, but whose methods need to write to some hidden internal data. This can be done by casting away the const:

X *non_const_this = (X*) this;
non_const_this->member = ...

or by reference (see Archive 8.11):

X &non_const_this = *((X*) this);
non_const_this.member = ...
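A complete, if contrived, sketch of the technique follows. Lookup and its members are invented names. Note one assumption: the object itself is not defined const, only viewed through a const reference, because writing through the cast to an object that really was defined const is not guaranteed to work.

```cpp
#include <cassert>

// Hypothetical class: looks read-only from outside, but counts its reads.
class Lookup {
  int value;
  int reads;   // hidden bookkeeping
public:
  Lookup(int v) : value(v), reads(0) {}
  int get_value() const
  {
    Lookup *non_const_this = (Lookup *) this;  // cast away the const
    non_const_this->reads++;
    return value;
  }
  int get_reads() const { return reads; }
};

int cast_demo()
{
  Lookup table(42);            // the object itself is not const
  const Lookup &view = table;  // but this reference treats it as const
  assert(view.get_value() == 42);
  view.get_value();
  return view.get_reads();
}
```

Calling cast_demo() should give 2, one for each call of get_value().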

Static members

Class members can be declared static. This means that there will only be one member of that name shared between all objects. Member functions, as well as variables, can be static. static methods can only access static members - there is no this pointer for static methods. Do not confuse static as applied to members with static applied to plain functions and variables. The latter is for restricting scope to a single translation unit. All class members have global linkage. The keyword static appears only in the member's declaration inside the class, not in its definition outside the class.

static member variables are not implicitly created when an object is created, so they must be defined exactly once. This is done, in the same way as for methods, by defining the member as if it were an ordinary variable but prefixing its name with the class name and :: (i.e. the class qualifier).
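A short sketch of a static member variable and a static method, using a hypothetical Window class that counts how many of its objects currently exist:

```cpp
#include <cassert>

// Hypothetical class, invented for this example.
class Window {
  static int open_count;   // declaration only: no storage is created here
public:
  Window() { open_count++; }
  ~Window() { open_count--; }
  static int get_open_count() { return open_count; }
};

// The one definition, written once in one source file, with the class
// qualifier and without the keyword static.
int Window::open_count = 0;

int static_demo()
{
  Window a;
  int inner;
  {
    Window b;
    inner = Window::get_open_count();   // two windows exist here
  }
  assert(inner == 2);
  return inner * 10 + Window::get_open_count();
}
```

Calling static_demo() should give 21. A static method can be called with just the class qualifier, as here, since it needs no object to work on.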

Friends

A friend of a class is a function, or another class, which can access the private members of the class that declares it a friend. A friend of a class is declared by declaring it within the class definition prefixed by friend. Suppose we needed a function to multiply a vector class with a matrix class. It cannot be a member of both classes, but it can be a friend of both. In both class declarations, you would include the line:

friend vector multiply(const matrix &,const vector &);

depending on the actual definition of multiply(). The argument types have to be included, in case of overloading.
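As a sketch of how this might look in full, here are two tiny hypothetical classes, Vector2 and Matrix2 (two-dimensional, holding ints, and named differently from the vector and matrix above to stress that they are invented), sharing one friend function:

```cpp
#include <cassert>

class Matrix2;  // forward declaration so Vector2's friend line can name it

class Vector2 {
  int x, y;
  friend Vector2 multiply(const Matrix2 &, const Vector2 &);
public:
  Vector2(int x0, int y0) : x(x0), y(y0) {}
  int get_x() const { return x; }
  int get_y() const { return y; }
};

class Matrix2 {
  int a, b, c, d;   // the rows are (a b) and (c d)
  friend Vector2 multiply(const Matrix2 &, const Vector2 &);
public:
  Matrix2(int a0, int b0, int c0, int d0) : a(a0), b(b0), c(c0), d(d0) {}
};

// A friend of both classes, so it can read the private members of each.
Vector2 multiply(const Matrix2 &m, const Vector2 &v)
{
  return Vector2(m.a * v.x + m.b * v.y, m.c * v.x + m.d * v.y);
}

int multiply_demo()
{
  Vector2 r = multiply(Matrix2(1, 2, 3, 4), Vector2(5, 6));
  assert(r.get_x() == 17);
  return r.get_x() * 100 + r.get_y();
}
```

Calling multiply_demo() should give 1739, i.e. the vector (17, 39).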

Methods of one class can be friends of another. This is done in the same way as functions, but with the full method name (qualified by its class). A class X can be a friend of class Y by including:

friend class X;

in Y's definition. A friend class behaves simply as if all its members are friends. The keyword class allows classes to be made friends when they have not yet been defined.

Friend functions are very useful for RISC OS event handlers. You will often declare an event handler as a friend of a class and register it with an object's pointer as its handle. Within the handler function, its handle can be cast back to a pointer to class, and operations performed on the object's data.
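The pattern can be sketched without any real RISC OS library. Everything here is hypothetical: the EventHandler signature, the dispatch() function standing in for the library's event dispatcher, and the event code 1 meaning 'click'. A real library defines its own handler type and registration call.

```cpp
#include <cassert>

// Hypothetical handler signature, assumed for this sketch only.
typedef void (*EventHandler)(int event_code, void *handle);

class Button {
  int clicks;
  friend void button_handler(int event_code, void *handle);
public:
  Button() : clicks(0) {}
  int get_clicks() const { return clicks; }
};

void button_handler(int event_code, void *handle)
{
  Button *button = (Button *) handle;  // cast the handle back to the class
  if (event_code == 1)
    button->clicks++;                  // a friend may touch private data
}

// Stand-in for the library: simply calls the handler with its handle.
void dispatch(EventHandler handler, int event_code, void *handle)
{
  handler(event_code, handle);
}

int friend_demo()
{
  Button ok;
  dispatch(button_handler, 1, &ok);
  dispatch(button_handler, 2, &ok);  // ignored: not a click
  dispatch(button_handler, 1, &ok);
  assert(ok.get_clicks() == 2);
  return ok.get_clicks();
}
```

The handler has to be an ordinary function because the library knows nothing about C++ objects; the handle is what ties each call back to a particular object.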

Nested classes

Class definitions can be nested within other classes:

class Array {
  class LinkedList {
    // Implementation of a linked list
  };
  LinkedList linked_list;
  // rest of Array definition
};

It is almost always better to define non-trivial classes separately: a LinkedList is highly likely to be useful in other parts of the program; in the example, its definition is only available within Array. If you did keep the definition local, you would be more likely to reduce the two LinkedList expressions to class { ... } linked_list;.

As far as access restrictions are concerned (public, private, etc), the same rules apply to nested classes as for other members.

One sort of type that can usefully be defined in a class is an enum. Suppose the LinkedList can be one of several different types of linked list and we need a way to differentiate between them:

class LinkedList {
public:
  enum linkage {single_linked, double_linked};
// ...
};

Outside LinkedList's scope, linkage values can only be referred to by qualifying them, i.e. LinkedList::single_linked or LinkedList::double_linked. (The shorter names single and double cannot be used, because double is a reserved word.) Furthermore, this is only possible if the enum is public, so you will usually see enums in the public part of a class. Nested enums are used frequently in the streams libraries for flag values.

In fact, any type can be defined within a class and, if it is public, referred to outside the class by qualifying it with the class name.
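A brief sketch of public nested types being used from outside their class. TextFile, open_mode and position are hypothetical names chosen for the example.

```cpp
#include <cassert>

// Hypothetical class, invented for this example.
class TextFile {
public:
  enum open_mode {read_only, read_write};  // nested enum
  typedef long position;                   // nested typedef
  TextFile(open_mode m) : mode(m), where(0) {}
  open_mode get_mode() const { return mode; }
  position get_position() const { return where; }
private:
  open_mode mode;
  position where;
};

int nested_type_demo()
{
  // Outside the class, both the types and the enum values need TextFile::
  TextFile file(TextFile::read_write);
  TextFile::open_mode m = file.get_mode();
  TextFile::position p = file.get_position();
  assert(p == 0);
  return m == TextFile::read_write;
}
```

Calling nested_type_demo() should give 1. Inside TextFile's own members the names can be used without qualification.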

Pointers to members

Suppose we had a class representing a real life object, holding two sound samples, an effect for the noise that the object makes and the spoken name of the object:

class LifeObject {
  // ...
public:
  enum sample_type {effect, spoken};
  void *effect_data;
  void *spoken_data;
  void play(void *sample /* , other data e.g. volume */);
  // ...
};

This is actually a poorly designed class being used in a silly way. The pointers should be private and the following carried out within the class. This is just to demonstrate the syntax for pointers to members.

A pointer to one of the void * members would have the type (called lo_dt_ptr):

typedef void *LifeObject::*lo_dt_ptr;

A pointer to the play method would have the type:

typedef void (LifeObject::*lo_fn_ptr)(void * /* , other args */);

and be used as follows:

void play_a_sample(LifeObject *obj, LifeObject::sample_type which)
{
  lo_dt_ptr sample_ptr;
  lo_fn_ptr func_ptr = &LifeObject::play;
  switch (which)
  {
    case LifeObject::effect:
      sample_ptr = &LifeObject::effect_data;
      break;
    case LifeObject::spoken:
      sample_ptr = &LifeObject::spoken_data;
      break;
  }
  (obj->*func_ptr)(obj->*sample_ptr /* , ... */);
  // Parentheses necessary to avoid interpretation as
  // obj->*(func_ptr(...))
}

Note that when assigning to a pointer to member the syntax is:

<pointer> = &<class name>::<member name>;

The pointer is not bound to any particular object. To use it with an object, the syntax is:

<object>.*<pointer to member> or <pointer to object>->*<pointer to member>

The member pointed to by a pointer to member can only be used by specifically attaching the pointer to an object. Casting a pointer to member to a pointer to a real object or function and then attempting to use it would probably cause a run-time crash.
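Since the LifeObject example cannot be compiled as it stands, here is a smaller self-contained sketch of the same syntax. The Animal class, its members and the two typedef names are all invented for the purpose.

```cpp
#include <cassert>

// Hypothetical class with public data, purely to demonstrate the syntax.
class Animal {
public:
  int legs;
  int wings;
  Animal(int l, int w) : legs(l), wings(w) {}
  int twice(int n) const { return 2 * n; }
};

typedef int Animal::*animal_data_ptr;             // pointer to an int member
typedef int (Animal::*animal_fn_ptr)(int) const;  // pointer to a method

int member_pointer_demo()
{
  Animal dog(4, 0);
  Animal *p = &dog;
  animal_fn_ptr fn = &Animal::twice;

  animal_data_ptr field = &Animal::legs;  // not bound to any object yet
  int a = dog.*field;                     // object .* pointer to member
  field = &Animal::wings;
  int b = p->*field;                      // pointer to object ->* pointer
  int c = (p->*fn)(a);                    // parentheses are required
  assert(a == 4 && b == 0);
  return a * 100 + b * 10 + c;
}
```

Calling member_pointer_demo() should give 408: four legs, no wings, and twice(4) is 8.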

