Data Representation and Data Types

Data Representation

Most of us write numbers in Arabic form, i.e. 1, 2, 3, ..., 9. Some people write them differently, such as I, II, III, IV, ..., IX. No matter which representation is used, most human beings can understand it, at least the two types I mentioned. Unfortunately the computer can't. The computer is the most stupid thing you will ever encounter in your life.

Modern computers are built from transistors. Whenever an electric current passes into a transistor, either an ON or an OFF state is established. Therefore the computer can only recognize two digits, 0 for OFF and 1 for ON, each of which is referred to as a BIT (binary digit). There is nothing in between Bit 0 and Bit 1 (e.g. Bit 0.5 doesn't exist). Hence computers can be said to be discrete machines. The number system that consists of only these two digits is called the Binary System. To distinguish the different numbering systems, the numbers humans use, i.e. 1, 2, 3, 4, ..., will be called Decimals (since they are base-10 numbers) from now on.

How, therefore, can the computer understand numbers larger than 1? The answer is simple: 2 is simply 1+1 (like 10 = 9+1 for humans); the digits are added and the overflow digit is carried over to the next position on the left. So (decimal) 2 is represented in Binary as 10. To further illustrate the relationship, I have listed the numbers 0 to 9 in both systems for comparison:

Decimal	Binary
0	0000 0000
1	0000 0001
2	0000 0010
3	0000 0011
4	0000 0100
5	0000 0101
6	0000 0110
7	0000 0111
8	0000 1000
9	0000 1001

You may ask why I always put 8 binary digits there. Well, the smallest unit in the computer's memory used to store data is called a BYTE, which consists of 8 BITS. One Byte allows up to 256 different combinations (2⁸ = 256), so it can hold the values 0 to 255. What happens when we have numbers greater than 255? The computer simply uses more Bytes to hold the value: 2 Bytes allow 65536 combinations (2¹⁶), i.e. the values 0 to 65535, and so forth.

ASCII FORMAT

Not only does the computer not understand the (decimal) numbers you use, it doesn't even understand letters like "ABCDEFG...". The fact is, it doesn't care. Whatever letters you input into the computer, the computer just saves them and delivers them to you when you instruct it to. It saves these letters in the same Binary format as digits, in accordance with a pattern. On PCs (including DOS, Windows 95/98/NT, and UNIX), the pattern is called ASCII (pronounced ask-ee), which stands for American Standard Code for Information Interchange.

In this format, the letter "A" is represented by "0100 0001", or, as it is most often referred to, decimal 65 in the ASCII Table. The standard coding under ASCII is here. When comparing characters, the computer actually compares their associated ASCII values instead of the characters themselves. Therefore the letter "B", which has an ASCII value of 66, is greater than the letter "A", which has an ASCII value of 65.

Data Types

The computer stores data in different formats or types. The number 10 can be stored as a numeric value, as in "10 dollars", or as characters, as in the address "10 Main Street". So how can the computer tell? Once again the computer doesn't care; it is your responsibility to ensure that you get the correct data out of it. (For illustration, character 10 and numeric 10 are represented by 0011-0001-0011-0000 and 0000-1010 respectively. You can see how different they are.) Different programming languages have different data types, although the fundamental ones are usually very similar.

C++ Basic Data Types (C++ specific)

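C++ has many data types. The following are some basic data types you will be facing in these chapters. The sizes and ranges shown are typical of 16-bit compilers; they vary between compilers and platforms (on most current systems, for instance, int occupies 4 bytes). Note that there are more complicated data types. You can even create your own data types. Some of these will be discussed later in the tutorial.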

Data Type	Bytes	Data Range	Remarks
char	1	ASCII -128 to 127
unsigned char	1	ASCII 0 to 255	including high ASCII chars
int	2	-32768 to 32767	integer
unsigned (unsigned int)	2	0 to 65535	non-negative integer
long int	4	about ±2 billion	double-sized integer
unsigned long int	4	0 to about 4 billion	non-negative long integer
float	4	±3.4e38	6 significant digits
double	8	±1.7e308	15 significant digits

char is basically used to store alphanumerics (numbers are stored in character form). Recall that a character is stored as its ASCII representation on a PC. ASCII -128 to -1 do not exist, so char accommodates data from ASCII 0 (null zero) to ASCII 127 (the DEL key). The original C++ does not have a String data type (but string is available through the inclusion of a library, to be discussed later). A string can be stored as a one-dimensional array (list) with a "null zero" (ASCII 0) stored in the "cell" that follows the last character. Unsigned char effectively accommodates the Extended ASCII characters, which represent most special characters like the copyright sign ©, the registered trademark sign ®, etc., plus some European letters like è, é, etc. Both char and unsigned char are stored internally as integers, so they can be compared (as greater or less than).

Whenever you write a char (letter) in your program you must enclose it in single quotes. When you write strings (words or sentences) you must enclose them in double quotes. Otherwise C++ will treat these letters/words/sentences as tokens (to be discussed in Chapter 4). Remember that in C/C++, A, 'A' and "A" are all different. The first, A (without quotes), means a variable or constant (discussed in Chapter 4). The second, 'A' (in single quotes), means the character A, which occupies one byte of memory. The third, "A" (in double quotes), means a string containing the letter A followed by a null character, which occupies 2 bytes of memory (it will use more memory if stored in a variable/constant of a bigger size). See these examples:
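letter = 'A';              // store the character A in the char variable letter
cout << 'A';               // display the single character A
cout << "10 Main Street";  // display a string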

int (integer) represents whole numbers, i.e. numbers without a fractional part. Since int has a relatively small range (up to 32767), whenever you need to store a value that could go beyond this limit, long int should be used instead. The beauty of using int is that, since it has no fractional part, its value is exact and calculations on int are extremely accurate. However, note that dividing one int by another may result in truncation, e.g. int 10 / int 3 will result in 3, not 3.3333 (more on this will be discussed later).

float, on the other hand, contains fractions. However, real fractional numbers are not possible in computers since they are discrete machines (they can only handle the numbers 0 and 1, not 1.5 or 1.75 or anything in between 0 and 1). No matter how many digits your calculator can show, you cannot produce a result for 2/3 without rounding, truncating, or approximating. Mathematicians always write 2/3 instead of 0.66666... when they need the EXACT value.

Since the computer cannot produce real fractions, the issue of significant digits comes into view. For most applications a certain number of significant digits is all you need. For example, when you talk about money, $99.99 is no different from $99.988888888888 (rounded to the nearest cent); when you talk about the wealth of Bill Gates, it makes little sense to say $56,123,456,789.95 instead of just saying approximately $56 billion (these figures are not real, I have no idea how much money Bill has, although I wish he would give me the roundings). As you may see from the above table, float has only 6 significant digits, so for some applications it may not be sufficient, especially in scientific calculations, in which case you may want to use double or even long double to handle the numbers.

There is also another problem in using float/double. Since numbers are represented internally as binary values, whenever a fractional number is calculated or translated to/from binary there will be a rounding/truncation error. So if you have a float 0, add 0.01 to it 100 times, then subtract 1.00 from it, you will not get 0 as you should; rather you will get a value close to zero, but not really zero. Using double or long double will reduce the error but will not eliminate it. However, as I mentioned earlier, the error may not matter in real life; it just means you need to exercise caution when programming with floating point numbers.

There is another C++ data type I haven't included here: the bool (Boolean) data type, which can only store a value of either 0 (false) or 1 (true). I will be using int (integer) to handle logical comparisons, which poses more of a challenge and offers more variety of use.

Escape Sequences

Escape Sequences are not data types, but I feel I had better discuss them here. I mentioned earlier that you have to include a null zero at the end of a "string" when using an array of char to represent a string. The easiest way to do this is to write the escape sequence '\0', which is understood by C++ as null zero. The following are the Escape Sequences in C++:

Seq	Meaning	Seq	Meaning	Seq	Meaning
\a	Alarm	\t	Tab	\"	Double Quote
\b	Backspace	\v	Vertical Tab	\000	Octal Num
\f	Form Feed	\\	Backslash	\xhh	Hex number
\n	New Line	\?	Question Mark	\0	Null Zero
\r	Carriage Return	\'	Single Quote

Type Definition

Earlier I said you can create your own data types. Here I will show you how. In fact, you can not only create new data types but also create an alias of an existing data type. For example, suppose you are writing a program which deals with dollar values. Since dollar values have fractional parts, you have to use either the float or the double data type (e.g. assign the double data type to salary by writing double salary). You can create an alias of the same data type called MONEY and write MONEY salary. You do this by adding the following type definition into your program:

typedef double MONEY;

You can also create new data types. I will discuss more on this when we come to Arrays in Chapter 10. But the following illustrates how you create a new data type of array from a base data type: