CSE 202: Computer Science II, Winter 2018
Strings
Null-terminated strings

A null-terminated string is an array of characters in which the end of the string is marked by a character with the value zero.

There are actually different types of characters, hence there are different types of null-terminated strings. In this course we will focus on strings and null-terminated strings that use char as the element type.

Whenever you use a string literal such as "hello world", the type of that literal is an array of N const char, where N is the number of characters in the literal plus one for the terminating null (zero-valued) character.
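As a small sketch of how the terminating null character counts toward the array size of a string literal, the helper below deduces the array length at compile time. The name array_length is hypothetical and used only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: deduces the number of elements N in a char array,
// such as the array behind a string literal.
template <std::size_t N>
constexpr std::size_t array_length(const char (&)[N]) {
    return N;
}

// "hello world" has 11 visible characters plus the terminating '\0'.
static_assert(array_length("hello world") == 12, "the terminator is counted");
static_assert(sizeof("hello world") == 12, "sizeof reports the same size");
```

Because the element type is char, whose size is 1, sizeof on the literal gives the same number as the element count.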

C++ has several standard functions for working with null-terminated strings; they are declared in <cstring>.

One of these functions is std::strcpy, which copies the characters of one null-terminated string, including the terminating null character, into another character array. The destination array must be large enough to hold the copy.

char message[] = "hello world!";
char message_copy[32];
std::strcpy(message_copy, message);

Another function, std::strlen, calculates and returns the length of a null-terminated string. The length is the number of characters up to but not including the terminating null character.
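The two functions can be combined to guard a copy against overflowing the destination. The following is a minimal sketch; copy_if_fits is a hypothetical helper name, not a standard function.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dest only when dest has room for
// every character of src plus the terminating null character.
bool copy_if_fits(char* dest, std::size_t dest_size, const char* src) {
    if (std::strlen(src) + 1 > dest_size) {
        return false;  // not enough room; dest is left untouched
    }
    std::strcpy(dest, src);
    return true;
}
```

With a 32-element destination, copying "hello world!" succeeds and std::strlen on the copy reports 12. With a 4-element destination, copying "hello" is refused because 6 elements are needed.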

std::string

C++ offers a class called std::string which stores a resizable sequence of characters. A null-terminated string can be assigned to a std::string object, and std::string has a member function, c_str, for converting back to a null-terminated string.
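The conversion in both directions might look like the sketch below. The function names from_c_string and c_string_length are hypothetical and exist only to illustrate the two conversions.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a std::string by assigning a
// null-terminated string to it.
std::string from_c_string(const char* text) {
    std::string s;
    s = text;
    return s;
}

// Hypothetical helper: views a std::string as a null-terminated string
// through c_str() and measures it with std::strlen.
std::size_t c_string_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For a string with no embedded null characters, c_string_length(s) and s.size() give the same result.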

In C++, null-terminated strings are the older alternative to the easier-to-use and more modern std::string. In the context of C++, "string" usually refers to the std::string type, and "null-terminated string" or "C-string" refers to a null-terminated character array.

To use the std::string type you need to include <string>. Here are a few expressions on a std::string variable s and their meaning:

Expression Meaning
s = s2 Assigns the string or null-terminated string s2 to s
s[i] Accesses the character at index i in s
s.c_str() Returns the null-terminated string equivalent to s
s.empty() Returns true if s contains no characters
s.size() Returns the number of characters in s
s.clear() Deletes all characters in s
s.insert(i, s2) Inserts the characters from s2 into s starting at index i
s.erase(i, n) Deletes n characters from s starting at index i
s += s2 Appends the characters from s2 to s (This operator can also be used to append a single character).
s.replace(i, n, s2) Equivalent to s.erase(i, n) followed by s.insert(i, s2)
s.substr(i, n) Returns a new string that contains a copy of up to n characters from s starting at index i
s.find(s2) Returns the index of the first occurrence in s of the character sequence s2. Returns std::string::npos if no match was found. The type of the value returned is std::string::size_type, an unsigned integer type that is usually an alias for std::size_t.
s + s2 Returns a new string that is the concatenation of s and s2
s == s2 Returns true if the character sequence in s matches that in s2
s != s2 Returns true if the character sequence in s does not match that in s2
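Several of these expressions are often used together. As a minimal sketch, the hypothetical helper replace_first below combines find, the npos check, and replace.

```cpp
#include <string>

// Hypothetical helper: returns a copy of text in which the first
// occurrence of from has been replaced by to. If from does not occur,
// the text is returned unchanged.
std::string replace_first(std::string text,
                          const std::string& from,
                          const std::string& to) {
    const std::string::size_type pos = text.find(from);
    if (pos != std::string::npos) {
        text.replace(pos, from.size(), to);
    }
    return text;
}
```

For example, replace_first("hello world", "world", "there") yields "hello there". Storing the result of find in std::string::size_type rather than int keeps the comparison with npos reliable.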
Converting from strings to arithmetic types

<cstdlib> contains functions for converting null-terminated strings to integer or floating-point values.

Expression Meaning
std::atoi(s) Returns the integer value represented in s as a value of type int.
std::atol(s) Returns the integer value represented in s as a value of type long int.
std::atof(s) Returns the floating-point value represented in s as a value of type double.

Remember, these functions expect a null-terminated string. If you are using an object of type std::string, you must invoke the c_str member function first.
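A minimal sketch of calling these functions on a std::string is shown below. The wrapper names to_int and to_double are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical wrapper: converts a std::string to int. std::atoi needs a
// null-terminated string, so c_str() is called first.
int to_int(const std::string& s) {
    return std::atoi(s.c_str());
}

// Hypothetical wrapper: the same idea for floating-point values.
double to_double(const std::string& s) {
    return std::atof(s.c_str());
}
```

Note that std::atoi and std::atof return zero when no conversion can be performed, so an input such as "abc" cannot be told apart from "0" with these functions alone.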

Character classification

The type char, which is the element type of a string, actually counts as an integral type.

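On nearly all modern platforms char is 8 bits wide. Whether plain char is signed or unsigned depends on the implementation: signed char can hold a value from -128 to 127, and unsigned char can hold a value from 0 to 255.

Every character literal such as 'A', '$', and '8' maps to an integer value, which on most systems is determined by the ASCII table.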

See http://en.cppreference.com/w/cpp/language/ascii for a list of characters and their corresponding integer values.
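Because char is an integral type, a character can be converted to its integer value and used in arithmetic. The sketch below uses two hypothetical helpers, code_of and digit_value. The specific code values mentioned afterwards assume an ASCII-compatible platform.

```cpp
// Hypothetical helper: returns the integer value of a character.
int code_of(char c) {
    return static_cast<int>(c);
}

// Hypothetical helper: converts a digit character '0'..'9' to its numeric
// value. The digit characters are guaranteed to be contiguous and in
// increasing order, so subtracting '0' works on every platform.
int digit_value(char c) {
    return c - '0';
}
```

On an ASCII platform code_of('A') is 65 and code_of('8') is 56, while digit_value('8') is 8.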

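Character values can be categorized as alphabetic, digit, whitespace, lowercase, uppercase, punctuation, and so on.

The following functions from <cctype> can be used to classify characters. Each one returns an int that is nonzero (and so converts to true) when the test passes and zero otherwise: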

Expression Meaning
std::isalnum(c) Returns true if c is an alphanumeric character
std::isalpha(c) Returns true if c is an alphabetic character
std::islower(c) Returns true if c is a lowercase character
std::isupper(c) Returns true if c is an uppercase character
std::isspace(c) Returns true if c is a whitespace character (space, tab, newline, etc.)
std::ispunct(c) Returns true if c is a punctuation character

<cctype> also has functions for converting characters to uppercase or lowercase. Both return an int, hence the cast back to char:

Expression Meaning
(char)std::toupper(c) Converts c to uppercase
(char)std::tolower(c) Converts c to lowercase
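As a minimal sketch of the <cctype> functions applied to a whole string, the hypothetical helpers below count alphabetic characters and build an uppercase copy. Each character is first converted to unsigned char, since passing a negative value to these functions is not allowed.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts the alphabetic characters in s.
std::size_t count_alpha(const std::string& s) {
    std::size_t count = 0;
    for (char c : s) {
        if (std::isalpha(static_cast<unsigned char>(c))) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: returns a copy of s with every character
// converted to uppercase.
std::string to_upper(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```

With the default "C" locale, to_upper("Hello, World 42!") yields "HELLO, WORLD 42!" and count_alpha on the same input yields 10.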
unsigned char and the byte

In C++, the "object representation" of an object refers to the sequence of unsigned char objects occupying the same location in memory as that object.

unsigned char is the type that represents a byte. A byte is at least 8 bits wide; the exact width is given by CHAR_BIT and is 8 on nearly all modern platforms. Every object in your program is ultimately stored as a sequence of bytes, and its type determines how those bytes are interpreted.
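One way to look at the object representation is to copy an object's bytes into a sequence of unsigned char. The following is a sketch only; object_bytes is a hypothetical helper, and it is meant for simple types such as int, double, or plain structs.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies the object representation of value into a
// vector of bytes. The vector always has sizeof(T) elements.
template <typename T>
std::vector<unsigned char> object_bytes(const T& value) {
    std::vector<unsigned char> bytes(sizeof(T));
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}
```

For an int, the result has sizeof(int) elements. The order in which the bytes of a multi-byte value appear depends on the platform.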