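It's fairly common for programs to need some simple kinds of lexical
analysis and parsing, such as splitting a command string up into
tokens. You can do this with the strtok function, declared in the
header file `string.h'.
Function: char * strtok (char *newstring, const char *delimiters)
A string can be split into tokens by making a series of calls to the
function strtok.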
The string to be split up is passed as the newstring argument on
the first call only. The strtok
function uses this to set up
some internal state information. Subsequent calls to get additional
tokens from the same string are indicated by passing a null pointer as
the newstring argument. Calling strtok
with another
non-null newstring argument reinitializes the state information.
It is guaranteed that no other library function ever calls strtok
behind your back (which would mess up this internal state information).
The delimiters argument is a string that specifies a set of delimiters that may surround the token being extracted. All the initial characters that are members of this set are discarded. The first character that is not a member of this set of delimiters marks the beginning of the next token. The end of the token is found by looking for the next character that is a member of the delimiter set. This character in the original string newstring is overwritten by a null character, and the pointer to the beginning of the token in newstring is returned.
On the next call to strtok, the searching begins at the next
character beyond the one that marked the end of the previous token.
Note that the set of delimiters, delimiters, does not have to be the
same on every call in a series of calls to strtok.
If the end of the string newstring is reached, or if the remainder of
the string consists only of delimiter characters, strtok returns a
null pointer.
Warning: Since strtok alters the string it is parsing, you should
always copy the string to a temporary buffer before parsing it with
strtok. If you allow strtok to modify a string that came from another
part of your program, you are asking for trouble; that string might
be used for other purposes after strtok has modified it, and it would
not have the expected value.
The string that you are operating on might even be a constant. Then
when strtok
tries to modify it, your program will get a fatal
signal for writing in read-only memory. See section Program Error Signals.
This is a special case of a general principle: if a part of a program does not have as its purpose the modification of a certain data structure, then it is error-prone to modify the data structure temporarily.
The function strtok
is not reentrant. See section Signal Handling and Nonreentrant Functions, for
a discussion of where and why reentrancy is important.
Here is a simple example showing the use of strtok.
```c
#define _GNU_SOURCE   /* strdupa is a GNU extension */
#include <string.h>
#include <stddef.h>

...

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *token, *cp;

...

cp = strdupa (string);                /* Make writable copy.  */
token = strtok (cp, delimiters);      /* token => "words" */
token = strtok (NULL, delimiters);    /* token => "separated" */
token = strtok (NULL, delimiters);    /* token => "by" */
token = strtok (NULL, delimiters);    /* token => "spaces" */
token = strtok (NULL, delimiters);    /* token => "and" */
token = strtok (NULL, delimiters);    /* token => "punctuation" */
token = strtok (NULL, delimiters);    /* token => NULL */
```
The GNU C library contains two more functions for tokenizing a string which overcome the limitation of non-reentrancy.
Function: char * strtok_r (char *newstring, const char *delimiters, char **save_ptr)
Just like strtok, this function splits the string into several tokens
which can be accessed by successive calls to strtok_r.
The difference is that the information about the next token is stored in
the space pointed to by the third argument, save_ptr, which is a
pointer to a string pointer. Calling strtok_r
with a null
pointer for newstring and leaving save_ptr between the calls
unchanged does the job without hindering reentrancy.
This function is defined in POSIX.1 and can be found on many systems which support multi-threading.
Function: char * strsep (char **string_ptr, const char *delimiter)
This function has similar functionality to strtok_r, with the
newstring argument replaced by the save_ptr argument. The
initialization of the moving pointer has to be done by the user.
Successive calls to strsep move the pointer along the tokens separated
by delimiter, returning the address of the next token and updating
string_ptr to point to the beginning of the next token.
If the input string contains more than one character from delimiter
in a row, strsep returns an empty string for each pair of adjacent
delimiter characters (and likewise for a delimiter at the very start
or end of the string). This means that a program should normally
test whether strsep returned an empty string before processing it.
This function was introduced in 4.3BSD and therefore is widely available.
Here is how the above example looks when strsep is used.
```c
#define _GNU_SOURCE   /* strsep and strdupa are not ISO C */
#include <string.h>
#include <stddef.h>

...

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *running;
char *token;

...

running = strdupa (string);
token = strsep (&running, delimiters);    /* token => "words" */
token = strsep (&running, delimiters);    /* token => "separated" */
token = strsep (&running, delimiters);    /* token => "by" */
token = strsep (&running, delimiters);    /* token => "spaces" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "and" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "punctuation" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => NULL */
```