str: yet another string library for C language.
Bored with developing the same functionality over and over again, unsatisfied with existing libraries, so decided to make the right one, once and forever. 🙂
Just clone the project and copy (or symlink) the files `str.h` and `str.c` into your project, but please respect the license.
String composition:
str s = str_null;
str_join(&s, str_lit(", "), str_lit("Here"), str_lit("there"), str_lit("and everywhere"));
str_cat(&s, s, str_lit("..."));
assert(str_eq(s, str_lit("Here, there, and everywhere...")));
str_free(s);
Same as above, but writing to a file:
FILE* const stream = fopen(...);
int err = str_join(stream, str_lit(", "), str_lit("Here"), str_lit("there"), str_lit("and everywhere..."));
if(err != 0) { /* handle the error */ }
Disclaimer: This is the good old C language, not C++ or Rust, so nothing can be enforced at the language level, and a certain discipline is required to make sure that using this library does not result in corrupt or leaked memory.
A string is represented by the type `str` that maintains a pointer to some memory containing the actual string. Objects of type `str` are small enough (a struct of a `const char*` and a `size_t`) to be cheap to create, copy (pass by value), and move. The `str` structure should be treated as opaque (i.e., do not attempt to directly access or modify the fields in this structure). The strings are assumed to be immutable, like those in Java or Go, but only by means of `const char*` pointers, so it is actually possible to write to such a string, although the required type cast to `char*` offers at least some (mostly psychological) protection from modifying a string by mistake.
This library focusses only on handling strings, not on gradually composing them like the StringBuffer class in Java.
All string objects must be initialised. Uninitialised objects will cause undefined behaviour. Use the provided constructors, or `str_null` for empty strings.
There are two kinds of `str` objects: those actually owning the memory they point to, and non-owning references. This property can be queried using the `str_is_owner` and `str_is_ref` functions; otherwise such objects are indistinguishable.
Non-owning string objects are safe to copy and assign to each other, as long as the memory they refer to is valid. They do not need to be freed.
`str_free` is a no-op for reference objects. A reference object can be cheaply created from a C string, a string literal, or from a range of bytes.
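The owner/reference distinction can be illustrated with a small self-contained sketch. It does not use this library: the type `demo_str` and all `demo_*` functions are hypothetical, and the explicit `owner` flag is an assumption made for readability (the real `str` is opaque and may record ownership differently).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// hypothetical stand-in for an opaque string handle (not the library's "str")
typedef struct
{
	const char* ptr;	// start of the string
	size_t len;			// length in bytes
	bool owner;			// assumption: ownership kept as an explicit flag
} demo_str;

// non-owning reference to existing memory; nothing is allocated
static demo_str demo_ref_chars(const char* const s, const size_t n)
{
	return (demo_str){ s, n, false };
}

// owning object over a fresh heap copy of "s";
// falls back to an empty reference if the allocation fails
static demo_str demo_dup(const char* const s)
{
	const size_t n = strlen(s);
	char* const p = malloc(n + 1);

	if(!p)
		return (demo_str){ "", 0, false };

	memcpy(p, s, n + 1);
	return (demo_str){ p, n, true };
}

static bool demo_is_owner(const demo_str s)
{
	return s.owner;
}

// releases memory only for owning objects; a no-op for references
static void demo_free(const demo_str s)
{
	if(s.owner)
		free((void*)s.ptr);
}
```

In this sketch a reference created with `demo_ref_chars` over the bytes of an owning object can be "freed" any number of times without effect, while the owning object must be freed exactly once, and only after the last use of its references.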
Owning objects require special treatment, in particular:
* It is a good idea to have only one owning object for each allocated string, but such a string can have many references to its underlying string, as long as those references do not outlive the owning object. Sometimes this rule may be relaxed for code clarity, like in the above example where the owning object is passed directly to a function, but only if the function does not store or release the object. When in doubt, pass such an object via `str_ref`.
* Direct assignments (like `s2 = s1;`) to owning objects will certainly leak memory; use the `str_assign` function instead. In fact, this function can assign to any string object, owning or not, so it can be used everywhere, just to avoid any doubt.
* There is no automatic memory management in C, so every owning object must be released at some point, either directly by using the `str_free` function, or indirectly when the object is overwritten by `str_assign` or a similar function.
* An owning object can be passed over to another location by using the `str_move` function. The function resets its source object to an empty string.
It is technically possible to create a reference to a string that is not null-terminated. The library accepts strings without null-terminators, but every new string allocated by the library is guaranteed to be null-terminated.
A string object can be constructed from any C string, string literal, or a range of bytes.
The provided constructors are computationally cheap to apply. Depending on the constructor,
the new object can either own the actual string it refers to, or be a non-owning reference.
Constructors themselves do not allocate any memory. Importantly, constructors are the only
functions in this library that return a string object, while others assign their results
through a pointer to a pre-existing string. This makes constructors suitable for initialisation
of new string objects. In all other situations one should combine construction with assignment,
for example:
str_assign(&dest, str_acquire_chars(buff, n));
Querying a property of a string object (like the length of the string via `str_len`) is a cheap operation.
The C language does not allow for operator overloading, so this library provides a function `str_assign` that takes a string object and assigns it to the destination object, freeing any memory owned by the destination. It is generally recommended to use this function everywhere outside object initialisation.
An existing object can be moved over to another location via the `str_move` function. The function resets the source object to `str_null` to guarantee the correct move semantics. The value returned by `str_move` may be either used to initialise a new object, or assigned to an existing object using `str_assign`.
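The move semantics described above (save the source, reset it, return the saved value) can be sketched in a few lines. This is an illustration only: `demo_str`, `demo_null` and `demo_move` are hypothetical names, and the struct layout is an assumption, not the library's actual definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// hypothetical stand-in for a string handle (not the library's "str")
typedef struct
{
	const char* ptr;
	size_t len;
	bool owner;
} demo_str;

// hypothetical empty value, playing the role of an empty non-owning string
static const demo_str demo_null = { "", 0, false };

// returns the source value and leaves an empty string behind,
// so the ownership is never held by two objects at once
static demo_str demo_move(demo_str* const ps)
{
	const demo_str saved = *ps;

	*ps = demo_null;
	return saved;
}
```

Because the source is left empty rather than dangling, releasing it later is harmless, which is what makes a move safer than a plain struct copy of an owning object.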
String composition functions can write their results to different destinations, depending on the type of their `dest` parameter:
* `str*`: result is assigned to the string object;
* `int`: result is written to the file descriptor;
* `FILE*`: result is written to the file stream.
The composition functions return 0 on success, or the value of `errno` as retrieved at the point of failure (including `ENOMEM` on memory allocation error).
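Dispatch on the type of the destination can be expressed in C11 with a `_Generic` selection. The sketch below illustrates that technique together with the "0 or `errno`" return convention; it is not taken from the library's source. All `demo_*` names are hypothetical, only `FILE*` and `int` destinations are handled, and the `int` branch assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L	// assumption: POSIX write(2) is available

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// write to a stdio stream; returns 0 on success, or errno on failure
static int demo_put_stream(FILE* const dest, const char* const s)
{
	return fputs(s, dest) == EOF ? errno : 0;
}

// write to a file descriptor, retrying on partial writes and EINTR;
// returns 0 on success, or errno on failure
static int demo_put_fd(const int dest, const char* s)
{
	size_t n = strlen(s);

	while(n > 0)
	{
		const ssize_t m = write(dest, s, n);

		if(m < 0)
		{
			if(errno == EINTR)
				continue;

			return errno;
		}

		s += m;
		n -= (size_t)m;
	}

	return 0;
}

// selects the implementation at compile time from the type of "dest"
#define demo_put(dest, s) _Generic((dest), \
	FILE*: demo_put_stream, \
	int:   demo_put_fd)((dest), (s))
```

With this macro, `demo_put(stdout, "text")` and `demo_put(STDOUT_FILENO, "text")` compile to calls of different functions, and passing a destination of any other type is a compile-time error.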
Just to make things more clear, here is the same code as in the example above, but with comments:
```C
// declare a variable and initialise it with an empty string
str s = str_null;

// join the given string literals around the separator (second parameter),
// storing the result in object "s" (first parameter); in this example we do not check
// the return values of the composition functions, thus ignoring memory allocation failures,
// which is probably not the best idea in general.
str_join(&s, str_lit(", "), str_lit("Here"), str_lit("there"), str_lit("and everywhere"));

// create a new string concatenating "s" and a literal; the function does not modify its
// destination object "s" before the result is computed, also freeing the destination
// before the assignment, so it is safe to use "s" as both a parameter and a destination.
// note: we pass a copy of the owning object "s" as the second parameter, and here it is
// safe to do so because this particular function does not store or release its arguments.
str_cat(&s, s, str_lit("..."));

// check that we have got the expected result
assert(str_eq(s, str_lit("Here, there, and everywhere...")));

// finally, free the memory allocated for the string
str_free(s);
```
There are some useful code snippets provided to assist with writing code using this library.
typedef struct { ... } str;
size_t str_len(const str s)
const char* str_ptr(const str s)
const char* str_end(const str s)
For any given string `s` the following condition is always satisfied: `str_end(s) == str_ptr(s) + str_len(s)`.
bool str_is_empty(const str s)
bool str_is_owner(const str s)
bool str_is_ref(const str s)
str_null
str str_lit(s)
str str_ref(s)
Creates a non-owning reference from either a C string or another `str` object. Implemented as a macro.
str str_ref_chars(const char* const s, const size_t n)
str str_acquire_chars(const char* const s, const size_t n)
The pointer `s` should be safe to pass to the `free(3)` function.
str str_acquire(const char* const s)
The string `s` should be safe to pass to the `free(3)` function.
str str_move(str* const ps)
Saves the given object, resets the source to `str_null`, and then returns the saved object.
void str_assign(str* const ps, const str s)
Assigns the object `s` to the object pointed to by `ps`. Any memory owned by the target object is freed before the assignment.
void str_clear(str* const ps)
Sets the target object to `str_null` after freeing any memory owned by the target.
void str_swap(str* const s1, str* const s2)
int str_from_file(str* const dest, const char* const file_name)
Reads the entire file (size limited by `STR_MAX_FILE_SIZE`) into the destination string. Returns 0 on success, or the value of `errno` on error.
int str_cmp(const str s1, const str s2)
bool str_eq(const str s1, const str s2)
int str_cmp_ci(const str s1, const str s2)
Case-insensitive comparison of two strings; see `strncasecmp(3)`.
bool str_eq_ci(const str s1, const str s2)
bool str_has_prefix(const str s, const str prefix)
Tests if the string `s` starts with the specified prefix.
bool str_has_suffix(const str s, const str suffix)
Tests if the string `s` ends with the specified suffix.
int str_cpy(dest, const str src)
Copies the string `src` to the generic destination `dest`. Returns 0 on success, or the value of `errno` on failure.
int str_cat_range(dest, const str* src, size_t count)
Concatenates `count` strings from the array starting at address `src`, and writes the result to the generic destination `dest`. Returns 0 on success, or the value of `errno` on failure.
int str_cat(dest, ...)
Concatenates a variable list of `str` arguments, and writes the result to the generic destination `dest`. Returns 0 on success, or the value of `errno` on failure.
int str_join_range(dest, const str sep, const str* src, size_t count)
Joins around `sep` the `count` strings from the array starting at address `src`, and writes the result to the generic destination `dest`. Returns 0 on success, or the value of `errno` on failure.
int str_join(dest, const str sep, ...)
Joins a variable list of `str` arguments around the `sep` delimiter, and writes the result to the generic destination `dest`. Returns 0 on success, or the value of `errno` on failure.
bool str_partition(const str src, const str patt, str* const prefix, str* const suffix)
Splits the string `src` on the first match of `patt`, assigning a reference to the part of the string before the match to the `prefix` object, and the part after the match to the `suffix` object. Returns `true` if a match has been found, or `false` otherwise, also setting `prefix` to reference the entire `src` string, and clearing the `suffix` object. An empty pattern `patt` never matches.
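The partition rules above can be made concrete with a small self-contained sketch over plain byte ranges. It does not call the library: `demo_span` and `demo_partition` are hypothetical names, and "clearing" the suffix is modelled here as an empty range, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// hypothetical non-owning byte range (not the library's "str")
typedef struct
{
	const char* ptr;
	size_t len;
} demo_span;

// splits "src" on the first match of "patt", following the rules described above
static bool demo_partition(const demo_span src, const demo_span patt,
                           demo_span* const prefix, demo_span* const suffix)
{
	// an empty pattern, or one longer than the source, never matches
	if(patt.len > 0 && patt.len <= src.len)
	{
		for(size_t i = 0; i <= src.len - patt.len; ++i)
		{
			if(memcmp(src.ptr + i, patt.ptr, patt.len) == 0)
			{
				*prefix = (demo_span){ src.ptr, i };
				*suffix = (demo_span){ src.ptr + i + patt.len, src.len - i - patt.len };
				return true;
			}
		}
	}

	// no match: the prefix covers the whole source, the suffix is empty
	*prefix = src;
	*suffix = (demo_span){ src.ptr + src.len, 0 };
	return false;
}
```

Both outputs are references into the source bytes, so nothing is allocated and nothing needs to be freed. Splitting `key=value` on `=` in this sketch yields the prefix `key` and the suffix `value`.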
void str_sort_range(const str_cmp_func cmp, str* const array, const size_t count)
Sorts the given array of `str` objects using the given comparison function. A number of typically used comparison functions are also provided:
* `str_order_asc` (ascending sort)
* `str_order_desc` (descending sort)
* `str_order_asc_ci` (ascending case-insensitive sort)
* `str_order_desc_ci` (descending case-insensitive sort)
const str* str_search_range(const str key, const str* const array, const size_t count)
Searches for the given key in the array, which must be sorted using `str_order_asc`. Returns a pointer to the string matching the key, or `NULL`.
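The relationship between sorting and searching (the array must be sorted with the same ascending order that the search uses) can be sketched with the standard `qsort(3)` and `bsearch(3)` functions. This is an illustration under assumptions, not the library's implementation: `demo_span` and the `demo_*` functions are hypothetical, and the ordering shown is a plain byte-wise comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// hypothetical non-owning byte range (not the library's "str")
typedef struct
{
	const char* ptr;
	size_t len;
} demo_span;

// ascending byte-wise order; a shorter string sorts before its extensions
static int demo_order_asc(const void* const a, const void* const b)
{
	const demo_span* const x = a;
	const demo_span* const y = b;
	const size_t n = x->len < y->len ? x->len : y->len;
	const int res = n > 0 ? memcmp(x->ptr, y->ptr, n) : 0;

	if(res != 0)
		return res;

	return (x->len > y->len) - (x->len < y->len);
}

// sorts the array in ascending order
static void demo_sort(demo_span* const array, const size_t count)
{
	qsort(array, count, sizeof(demo_span), demo_order_asc);
}

// binary search; the array must have been sorted with demo_order_asc
static const demo_span* demo_search(const demo_span key,
                                    const demo_span* const array, const size_t count)
{
	return bsearch(&key, array, count, sizeof(demo_span), demo_order_asc);
}
```

Searching an array that was sorted with a different comparison (for example, a descending or case-insensitive one) gives unreliable results, which is why the search is tied to one specific ordering.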
size_t str_partition_range(bool (*pred)(const str), str* const array, const size_t count)
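Reorders the string objects in the given array so that all elements for which the predicate `pred` returns "true" precede the elements for which the predicate `pred` returns "false". Returns the number of preceding objects.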
size_t str_unique_range(str* const array, const size_t count)
for_each_codepoint(var_name, src_string)
A macro that expands to a loop iterating over the given string `src_string` (of type `str`) by UTF-32 code points. On each iteration the variable `var_name` (of type `char32_t`) is assigned the value of the next valid UTF-32 code point from the source string. Upon exit from the loop the variable has one of the following values:
* `CPI_END_OF_STRING`: the iteration has reached the end of the source string;
* `CPI_ERR_INCOMPLETE_SEQ`: an incomplete byte sequence has been detected;
* `CPI_ERR_INVALID_ENCODING`: an invalid byte sequence has been detected.
The source string is expected to be encoded in the current program locale, as set by the most recent call to `setlocale(3)`.
Usage pattern:
```c
...
str s = ...
...
char32_t c;	// variable to receive UTF-32 values on each iteration

for_each_codepoint(c, s)
{
	/* process c */
}

if(c != CPI_END_OF_STRING)
{
	/* handle error */
}
```
The tokeniser interface provides functionality similar to the `strtok(3)` function. The tokeniser is fully re-entrant with no hidden state, and its input string is not modified while being parsed.
// declare and initialise tokeniser state
str_tok_state state;
str_tok_init(&state, source_string, delimiter_set);

// object to receive tokens
str token = str_null;

// token iterator
while(str_tok(&token, &state)) { /* process "token" */ }
void str_tok_init(str_tok_state* const state, const str src, const str delim_set)
bool str_tok(str* const dest, str_tok_state* const state)
Retrieves the next token and stores it in the `dest` object. Returns `true` if the token has been read, or `false` if the end of input has been reached. The retrieved token is always a reference to a slice of the source string.
void str_tok_delim(str_tok_state* const state, const str delim_set)
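What "re-entrant with no hidden state" means in practice can be shown with a self-contained sketch of a tokeniser that keeps everything in a caller-supplied state object and returns tokens as slices of the unmodified input. This is not the library's code: all `demo_*` names are hypothetical, the delimiter set is a plain C string for brevity, and skipping empty tokens (as `strtok(3)` does) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// hypothetical non-owning byte range (not the library's "str")
typedef struct
{
	const char* ptr;
	size_t len;
} demo_span;

// all tokeniser state lives here, so there is nothing hidden or global
typedef struct
{
	const char* pos;	// next unread byte
	const char* end;	// one past the last byte of the input
	const char* delims;	// null-terminated set of delimiter bytes
} demo_tok_state;

static void demo_tok_init(demo_tok_state* const st, const demo_span src,
                          const char* const delims)
{
	st->pos = src.ptr;
	st->end = src.ptr + src.len;
	st->delims = delims;
}

static bool demo_is_delim(const demo_tok_state* const st, const char c)
{
	// strchr() would also match the terminating zero, hence the extra check
	return c != 0 && strchr(st->delims, c) != NULL;
}

// stores a reference to the next token in *dest; the input is never modified
static bool demo_tok(demo_span* const dest, demo_tok_state* const st)
{
	// skip leading delimiters
	while(st->pos < st->end && demo_is_delim(st, *st->pos))
		++st->pos;

	if(st->pos == st->end)
		return false;

	const char* const start = st->pos;

	// advance to the end of the token
	while(st->pos < st->end && !demo_is_delim(st, *st->pos))
		++st->pos;

	*dest = (demo_span){ start, (size_t)(st->pos - start) };
	return true;
}
```

Since the state is an ordinary object, several tokenisers can run at the same time (even nested over each other's tokens) without interfering, which is not possible with `strtok(3)`.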
All the tools are located in the `tools/` directory. Currently, there are the following tools:
file-to-str: The script takes a file (text or binary) and a C variable name, and writes to `stdout` C source code where the variable (of type `str`) is defined and initialised with the content of the file.
gen-char-class: Generates character classification functions that do the same as their `isw*()` counterparts under the current locale as specified by the `LC_ALL` environment variable. Run `tools/gen-char-class --help` for further details, or `tools/gen-char-class --space` to see an example of its output.
The library requires at least a C11 compiler. So far it has been tested on Linux Mint 19.3 and 20, with `gcc` versions up to 9.3.0, and `clang` versions up to 10.0.0; it is also reported to work on ALT Linux 9.1 for Elbrus, with `lcc` version 1.25.09.