Skip to content

Latest commit

 

History

History
940 lines (789 loc) · 20.7 KB

character-strings.md

File metadata and controls

940 lines (789 loc) · 20.7 KB
sidebar_position
1

Character Strings (C string)

Learning Outcomes

After reading this section, you will be able to:

  • Design data collections using arrays to manage information efficiently
  • Stream data using standard library functions to interact with users

Introduction

Although some original programming languages focused on processing numerical information, most languages include extensive features for processing textual data. Textual data involves sets of characters and these sets are often referred to as character strings. The C language libraries provide facilities for processing character strings, treated as arrays of characters with a special delimiter.

This chapter introduces these C-style strings, highlights their distinguishing feature and notes the advantage of using strings to pass textual data from one function to another. This chapter include the conversion specifiers for input and output of character strings.

Definition (review)

A string is a char array with a special property: a terminator element follows the last meaningful character in the string. We refer to this terminator as the null terminator and identify it by the escape sequence '\0'.

char
                                  \0

Term Definition
The term "Meaningful Characters" in these notes refers to the actual data content you want to manage in the C string character array.

The null terminator has the integral value of 0 on any host platform (in its collating sequence). All of its bits are 0's. The null terminator occupies the first position in the ASCII and EBCDIC.

The index identifying the null terminator element is the same as the number of meaningful characters in the string.

char
name
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
M y   n a m e   i s   A r n o l d \0

HINT
The number of memory locations (char array size) occupied by a C string is one more than the number of meaningful characters in the string.

Allocating Memory

We allocate memory for a C string in the same way that we allocate memory for an array. Since the null terminator is one of the elements in the array, we must allocate memory for one extra character than the number of meaningful characters.

For example, to allocate memory for a string with up to 30 meaningful characters, we write:

char name[31]; // 30 chars plus 1 char for the null terminator byte

Initializing Memory

To initialize a string at the time of memory allocation, we follow the definition with the assignment operator and the set of initial characters enclosed in braces.

const char name[31] = {'M','y',' ','n','a','m','e',' ','i','s',' ', 
                       'A','r','n','o','l','d','\0'};

For a more compact form we enclosed the list of meaningful characters in double quotes.

const char name[31] = "My name is Arnold";  // null-byte is automatically appended

The C compiler copies the characters in the string literal into the character string and appends the null-byte terminator after the last character copied.

char
name
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
M y   n a m e   i s   A r n o l d \0                          

Since the number of initializers (18) is less than the number of elements (31) available, the compiler fills the uninitialized elements with 0's.

String Handling

Arrays of numbers require a separate variable to hold the number of elements that are filled. Unlike arrays of numbers, character strings do not require a separate variable for sizing. In iterations on the characters in a string, we check for the presence of the null terminator in our test conditions.

Iterations

The following program displays the string stored in name[31] character by character:

// Iterations on Strings
// string_iterations.c

#include <stdio.h>

int main(void)
{
        int i;
        const char name[31] = "My name is Arnold"; 

        for (i = 0; name[i] != '\0'; i++)
                printf("%c", name[i]);

        putchar('\n');

        return 0;
}

The above program produces the following output:

My name is Arnold

Functions

Using a character string instead of an array of characters with a separate sizing variable achieves a more compact argument list for function calls. For example:

// Strings To Functions
// string_to_function.c

#include <stdio.h>
void print(const char name[]);

int main(void)
{
        int i;
        const char name[31] = "My name is Arnold"; 

        print(name);
        return 0;
}

void print(const char name[])
{
        int i;

        for (i = 0; name[i] != '\0'; i++)
                printf("%c", name[i]);

        putchar('\n');
}

The above program produces the following output:

My name is Arnold

Formatted String Input

The scanf() and fscanf() library functions support conversion specifiers particularly designed for character string input. These specifiers are:

  • %s - whitespace delimited set
  • %[] - rule delimited set

The corresponding argument for these specifiers is the address of the string to be populated from the input stream.

%s

The %s conversion specifier

  • reads all characters until the first whitespace character
  • stores the characters read in the char array identified by the corresponding argument
  • stores the null terminator in the char array after accepting the last character
  • leaves the delimiting whitespace character and any subsequent characters in the input buffer

For example:

char name[31];
scanf("%s", name);  // <=== User enters: My name is Arnold

The scanf function will stop accepting input after the character y and stores the following:

char
name
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
M y \0                                                        

The characters ' name is Arnold' remain in the input buffer.

A qualifier on the conversion specifier limits the number of characters accepted. For instance, %10s reads no more than 10 characters:

char name[31];
scanf("%10s", name); // <=== User enters: Schwartzenegger

The scanf function will stop accepting input after the character n and stores the following:

char
name
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
S c h w a r t z e n \0                                        

By specifying the maximum number of characters to be read at less than 31, we ensure that scanf() does not exceed the memory allocated for the string.

%s discards all leading whitespace characters.

For example, if enter many spaces before the Schwartzenegger value:

char name[31];
scanf("%10s", name);  // <=== User enters: '          Schwartzenegger'

Just as before, the scanf function will stop accepting input after the character n but will also discard the leading spaces entered and stores the following:

char
name
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
S c h w a r t z e n \0                                        

Because %s discards leading whitespace, it cannot accept an empty string; that is, %s does not treat a '\n' in an otherwise empty input buffer as an empty string. If the buffer only contains '\n', scanf("%10s", name) discards the '\n' and waits for non-whitespace input followed by another '\n'.

%[]

The %[] conversion specifier accepts input consisting only of a set of pre-selected characters. The brackets contain the admissible and/or inadmissible characters. The symbol ^ prefaces the list of inadmissible characters. The symbol - identifies a range of characters in an inclusive set.

For example, the %[^\n] conversion specifier:

  • reads all characters until the newline ('\n')
  • stores the characters read in the char array identified by the corresponding argument
  • stores the null terminator in the char array after accepting the last character
  • leaves the delimiting character (here, '\n') in the input buffer

For example:

char name[31];
scanf("%[^\n]", name);  // <=== User enters: My name is Arnold

The scanf function accepts the full line an stores:

char
name
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
M y   n a m e   i s   A r n o l d \0                          

A qualifier on this conversion specifier before the opening bracket limits the number of characters accepted. For instance, %10[^\n] reads no more than 10 characters:

char name[31];
scanf("%10[^\n]", name);  // <=== User enters: My name is Arnold

The scanf function will store:

char
name
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
M y   n a m e   i s \0                                        

We specify the maximum number of characters as the qualifier to ensure that scanf() does not store more characters than the allocated memory for the array size.

%[ ], like %s, ignores any leading whitespace characters.

For example:

char name[31];
scanf("%10[^\n]", name);  // <=== User enters: '          My name is Arnold'

The scanf function will store:

char
name
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
M y   n a m e   i s \0                                        

Caution

Because %[ ] ignores leading whitespace, it cannot accept an empty string; that is, %[^\n] does not treat a '\n' in an otherwise empty input buffer as an empty string. If the input buffer only contains '\n', scanf("%[^\n]", name), unlike %s, returns 0 and leaves name unchanged.

Example:

Consider a text file named spring.dat that contains:

Light Jacket
Long-Sleeved Shirts
Large Skateboards

The following program reads and displays this data:

// Reading from a file
// readFromFile.c

#include <stdio.h>

int main(void)
{
        FILE *fp = NULL;
        char phrase[61];

        fp = fopen("spring.dat","r");

        if (fp != NULL)
        {
                while (fscanf(fp, "%60[^\n]%*c", phrase) != EOF)
                        printf("%s\n", phrase);
                
                fclose(fp);
        } 
        else
        {
                printf("Failed to open file\n"); 
        }

        return 0;
}

The above program produces the following output:

Light Jacket
Long-Sleeved Shirts
Large Skateboards

String Output

Formatted Output

The printf() and fprintf() library functions support the %s conversion specifier for character string output. The corresponding argument is the address of the character string or string literal. Under this specifier printf() displays all of the characters from the address provided up to but excluding the null terminator byte. For example:

// Displaying Strings
// displayStrings.c

#include <stdio.h>

int main(void)
{
        const char name[31] = "My name is Arnold"; 

        printf("%s\n", name);

        return 0;
}

The above program produces the following output:

My name is Arnold
// Writing to a File
// writeToFile.c

#include <stdio.h>

int main(void)
{
        FILE *fp = NULL;
        const char phrase[] = "My name is Arnold";

        fp = fopen("alpha.txt","w");

        if (fp != NULL) 
        {
                fprintf(fp, "%s\n", phrase);
                fclose(fp);
        } 
        else 
        {
                printf("Failed to open file\n"); 
        }

        return 0;
}

Qualifiers

Qualifiers on the %s specifier add detail control:

  • %20s displays a string right-justified in a field of 20
  • %-20s displays a string left-justified in a field of 20
  • %20.10s displays the first 10 characters of a string right-justified in a field of 20
  • %-20.10s displays the first 10 characters of a string left-justified in a field of 20

Unformatted Output

The puts() and fputs() library functions output a character string to the standard or specified output device respectively.

puts

The prototype for puts() is:

int puts(const char *);

The parameter receives the address of the character string to be displayed. For example:

// Displaying Lines
// puts.c

#include <stdio.h>

int main(void)
{
        const char name[31] = "My name is Arnold"; 

        puts(name);

        return 0;
}

The above program produces the following output:

My name is Arnold

fputs

fputs() writes a null-terminated string to a file. The prototype for fputs() is:

int fputs(const char *str, FILE *fp);

str receives the address of the string to be written and fp receives the address of the FILE object. fputs() returns a non-negative value if successful; EOF in the event of an error.