One of the first things I wanted to do in the REPL was some string manipulation. But it was tedious.
To trim whitespace, and I mean all whitespace characters, we had to define the list #\Space #\Newline #\Backspace #\Tab #\Linefeed #\Page #\Return #\Rubout ourselves.
To concatenate two strings, we either had to give an unusual 'string argument to concatenate, like this:
(concatenate 'string "fo" "o")
or use a format construct, which is another source of frustration for (impatient) beginners and is neither straightforward nor self-explanatory.
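For comparison, here is roughly what the format route looks like in plain Common Lisp, with no library involved. As soon as the strings come in a list, the ~{ ~} iteration directive is needed too:

```lisp
;; FORMAT with a NIL destination returns a new string.
;; Each ~a consumes one argument and prints it without quotes.
(format nil "~a~a" "fo" "o")                  ; => "foo"

;; For a list of strings, ~{ ... ~} iterates over a single list argument.
(format nil "~{~a~}" (list "fo" "o" "bar"))   ; => "foobar"
```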
Many common operations were scattered across external libraries (cl-ppcre), and many others were made more difficult than necessary (the weird format construct again; entering a regexp, and thus escaping what's necessary, when all you want is a simple search and replace; dealing with strings' lengths and corner cases; a lack of verbs… see below).
And all of that with many inconsistencies (the string as the first argument, then as the last, etc.).
So I just gathered everything in a little library, which now has more features. Let's go through its code and its tests to learn, at the same time, the canonical way to do things, its shortcomings, and the library's API.
I honestly don't know why this library didn't exist yet.
str
You can install it with
(ql:quickload "str")
See https://github.com/vindarel/cl-str.
Package definition
(in-package #:asdf-user)
(defsystem :str
:source-control (:git "git@github.com:vindarel/cl-s.git")
:description "Modern, consistent and terse Common Lisp string manipulation library."
:depends-on (:prove :cl-ppcre) ;; prove is for the tests; the library code itself only needs cl-ppcre.
:components ((:file "str")))
Trim
(defvar *whitespaces* '(#\Space #\Newline #\Backspace #\Tab
#\Linefeed #\Page #\Return #\Rubout))
(defun trim-left (s)
"Remove whitespaces at the beginning of s."
(string-left-trim *whitespaces* s))
(defun trim-right (s)
"Remove whitespaces at the end of s."
(string-right-trim *whitespaces* s))
(defun trim (s)
"Remove whitespaces at the beginning and at the end of s."
(string-trim *whitespaces* s))
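To see why the full character list matters, here is a small check with the built-in string-trim alone. *sample* is just a hypothetical test string, and the character bag in the second call is a subset of *whitespaces* picked for the example:

```lisp
;; A string with a leading space and a trailing tab + newline.
(defparameter *sample* (format nil " foo~C~%" #\Tab))

;; Trimming only #\Space leaves the tab and the newline in place.
(string-trim " " *sample*)        ; => "foo" followed by a tab and a newline

;; A fuller character bag cleans both ends.
(string-trim '(#\Space #\Tab #\Newline #\Return) *sample*)   ; => "foo"
```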
Concat
(defun concat (&rest strings)
"Join all the string arguments into one string."
(apply #'concatenate 'string strings))
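One caveat with apply: the standard only guarantees call-arguments-limit to be at least 50, so applying concatenate to a very long list is not portable in principle (many implementations allow far more). A minimal sketch that sidesteps it writes into a string stream instead; concat-list is a hypothetical name, not part of str:

```lisp
(defun concat-list (strings)
  "Concatenate a list of strings into one fresh string, without APPLY."
  (with-output-to-string (out)
    (dolist (s strings)
      (write-string s out))))

;; No function receives one argument per string, so the list length
;; is not bounded by CALL-ARGUMENTS-LIMIT.
(length (concat-list (make-list 1000 :initial-element "ab")))   ; => 2000
```

Writing to a stream is also linear in the total length, unlike concatenating pairs of strings over and over with reduce.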
Join
Snippets in the old Cookbook or on Stack Overflow advised using a format construct, which is weird and causes problems if your separator contains the ~ character, since format reads it as the start of a directive.
(defun join (separator strings)
(let ((separator (replace-all "~" "~~" separator)))
(format nil
(concatenate 'string "~{~a~^" separator "~}")
strings)))
Now:
(is "foo~bar"
(join "~" '("foo" "bar")))
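The ~ problem disappears entirely if format is not involved at all. Here is a minimal sketch of a join that writes to a string stream; join-sketch is a hypothetical name, not the library's implementation:

```lisp
(defun join-sketch (separator strings)
  "Join STRINGS with SEPARATOR between consecutive elements.
No FORMAT directives are involved, so SEPARATOR needs no escaping."
  (with-output-to-string (out)
    ;; (s . rest) destructures each successive tail of STRINGS.
    (loop for (s . rest) on strings
          do (write-string s out)
             ;; Only write a separator if another element follows.
             (when rest
               (write-string separator out)))))
```

With it, (join-sketch "~" '("foo" "bar")) returns "foo~bar", and an empty list gives "".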
Split
cl-ppcre
takes a regexp, but we don’t need this for the basic cases
of split
. And disabling this regexp was not straightforward:
(defun split (separator s &key omit-nulls)
"Split s into substrings by separator (cl-ppcre takes a regex, we do not)."
;; cl-ppcre:split doesn't return an empty string if the separator appears at the end of s,
(let* ((val (concat s
(string separator)
;; so we need an extra character, but not the user's.
(if (string-equal separator #\x) "y" "x")))
(res (butlast (cl-ppcre:split (cl-ppcre:quote-meta-chars (string separator)) val))))
(if omit-nulls
(remove-if #'empty? res)
res)))
Now (split "." "foo.bar") just works and returns ("foo" "bar").
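For the simple, literal-separator case, cl-ppcre is not strictly needed either. Here is a minimal regex-free sketch using the built-in search, assuming that empty fields (including a trailing one) should be kept, as in the split above. split-sketch is a hypothetical name and it has no omit-nulls option:

```lisp
(defun split-sketch (separator s)
  "Split S on every occurrence of SEPARATOR, taken literally (no regexp).
Empty fields, including a trailing one, are kept."
  (let ((separator (string separator)))   ; accept #\. as well as "."
    (when (zerop (length separator))
      ;; An empty separator would match everywhere and never advance.
      (error "SEPARATOR must not be empty."))
    (loop with step = (length separator)
          with start = 0
          for end = (search separator s :start2 start)
          ;; END is NIL after the last separator: SUBSEQ then runs to the end.
          collect (subseq s start end)
          while end
          do (setf start (+ end step)))))
```

(split-sketch "." "foo.bar") returns ("foo" "bar"), and (split-sketch #\, "a,,b,") returns ("a" "" "b" "").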
Repeat
(defun repeat (count s)
"Make a string of S repeated COUNT times."
(let ((result nil))
(dotimes (i count)
(push s result))
(apply #'concat result)))
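Without the library, the built-ins only cover part of this: make-string repeats a single character, and repeating a whole string with format takes exactly the kind of directive puzzle this library is meant to avoid:

```lisp
;; Repeating a single character:
(make-string 3 :initial-element #\a)    ; => "aaa"

;; Repeating a string: ~v takes the iteration count from the arguments,
;; ~@{ ... ~} iterates over the remaining arguments, ~A prints the string,
;; and ~:* backs up one argument so the same string is printed again.
(format nil "~v@{~A~:*~}" 3 "ab")       ; => "ababab"
```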
Replace-all
This required cl-ppcre, plus one of its switches (*allow-quoting*) to avoid regexps.
(defun replace-all (old new s)
"Replace `old` by `new` in `s`. Arguments are not regexs."
(let* ((cl-ppcre:*allow-quoting* t)
(old (concatenate 'string "\\Q" old))) ;; treat metacharacters as normal.
(cl-ppcre:regex-replace-all old s new)))
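Another way to get the "arguments are not regexps" guarantee is to skip cl-ppcre altogether. Here is a minimal sketch based on search, with replace-all-sketch as a hypothetical name. It replaces non-overlapping occurrences from left to right, and since new never goes through cl-ppcre, backslash sequences in it (which cl-ppcre's replacement strings interpret, such as \& for the whole match) are copied literally:

```lisp
(defun replace-all-sketch (old new s)
  "Replace every non-overlapping occurrence of OLD in S by NEW, scanning
left to right. OLD and NEW are plain strings: no regexp, no escapes."
  (when (zerop (length old))
    ;; An empty OLD would match at every position.
    (error "OLD must not be empty."))
  (with-output-to-string (out)
    (loop with step = (length old)
          with start = 0
          for pos = (search old s :start2 start)
          ;; Copy the text before the match (or the rest of S when POS is NIL).
          do (write-string s out :start start :end pos)
             (when pos (write-string new out))
          while pos
          do (setf start (+ pos step)))))
```

For example, (replace-all-sketch "." "!" "a.b.c") returns "a!b!c".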
starts-with?, ends-with?
The Lisp way was to check whether the beginning of the string matches start: taking its length, dealing with corner cases,…
(defun starts-with? (start s &key (ignore-case nil))
"Return t if s starts with the substring 'start', nil otherwise."
(when (>= (length s) (length start))
(let ((fn (if ignore-case #'string-equal #'string=)))
(funcall fn s start :start1 0 :end1 (length start)))))
;; An alias:
;; Serapeum defines a "defalias".
(setf (fdefinition 'starts-with-p) #'starts-with?)
(defun ends-with? (end s &key (ignore-case nil))
"Return t if s ends with the substring 'end', nil otherwise."
(when (>= (length s) (length end))
(let ((fn (if ignore-case #'string-equal #'string=)))
(funcall fn s end :start1 (- (length s) (length end))))))
(setf (fdefinition 'ends-with-p) #'ends-with?)
Usage illustrated by the tests:
(subtest "starts-with?"
(ok (starts-with? "foo" "foobar") "default case")
(ok (starts-with? "" "foo") "with blank start")
(ok (not (starts-with? "rs" "")) "with blank s")
(ok (not (starts-with? "foobar" "foo")) "with shorter s")
(ok (starts-with? "" "") "with everything blank")
(ok (not (starts-with? "FOO" "foobar")) "don't ignore case")
(ok (starts-with-p "f" "foo") "starts-with-p alias")
(ok (starts-with? "FOO" "foobar" :ignore-case t) "ignore case"))
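The standard function mismatch can also express these checks without computing offsets by hand. A sketch, with starts-with-sketch and ends-with-sketch as hypothetical names; it should satisfy the same cases as the tests above:

```lisp
(defun starts-with-sketch (start s &key ignore-case)
  "True if S begins with START.
MISMATCH returns NIL when the two strings are equal, otherwise the first
index in START where they differ; that index equals (length start)
exactly when START is a prefix of S."
  (let ((m (mismatch start s :test (if ignore-case #'char-equal #'char=))))
    (or (null m) (= m (length start)))))

(defun ends-with-sketch (end s &key ignore-case)
  "True if S ends with END.
With :FROM-END T the strings are aligned on their right ends and MISMATCH
returns one plus the rightmost differing index in END, so 0 means END is
a suffix of S."
  (let ((m (mismatch end s :from-end t
                           :test (if ignore-case #'char-equal #'char=))))
    (or (null m) (zerop m))))
```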
Predicates: empty? blank?
There was no built-in to tell these cases apart.
(defun empty? (s)
"Is s nil or the empty string?"
(or (null s) (string-equal "" s)))
(defun emptyp (s)
"Is s nil or the empty string?"
(empty? s))
(defun blank? (s)
"Is s nil or does it contain only whitespace?"
(or (null s) (string-equal "" (trim s))))
(defun blankp (s)
"Is s nil or does it contain only whitespace?"
(blank? s))
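A variant of blank? that does not need a trimmed copy of the string checks every character against a whitespace list instead. In this sketch blank-sketch-p and *whitespace-bag* are hypothetical names, and the bag is shorter than *whitespaces*:

```lisp
(defparameter *whitespace-bag*
  '(#\Space #\Newline #\Tab #\Return #\Page))

(defun blank-sketch-p (s)
  "True if S is NIL or made only of characters from *WHITESPACE-BAG*."
  (or (null s)
      ;; EVERY is true on an empty string, so "" counts as blank.
      (every (lambda (c) (member c *whitespace-bag*)) s)))
```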
words, unwords, lines, unlines
Classic stuff:
(defun words (s &key (limit 0))
"Return list of words, which were delimited by white space. If the optional limit is 0 (the default), trailing empty strings are removed from the result list (see cl-ppcre)."
(if (not s)
nil
(cl-ppcre:split "\\s+" (trim-left s) :limit limit)))
(defun unwords (strings)
"Join the list of strings with a whitespace."
(join " " strings))
(defun lines (s &key omit-nulls)
"Split the string by newline characters and return a list of lines."
(split #\NewLine s :omit-nulls omit-nulls))
(defun unlines (strings)
"Join the list of strings with a newline character."
(join (make-string 1 :initial-element #\Newline) strings))
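The built-in read-line offers another way to get lines, through a string input stream. A sketch, lines-sketch being a hypothetical name. Note one difference with the split-based lines above: with this version, a final newline does not produce a trailing empty string.

```lisp
(defun lines-sketch (s)
  "Return the lines of S, read one by one with READ-LINE."
  (with-input-from-string (in s)
    (loop for line = (read-line in nil nil)   ; NIL at end of input
          while line
          collect line)))
```

(lines-sketch (format nil "a~%b~%")) returns ("a" "b"), while empty lines in the middle are kept as "".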
Substring
The built-in subseq is much poorer than what we have in other languages.
Take Python, where we can do:
"foo"[:-1] # negative index, counting from the end
"foo"[0:100] # end is too large, so it returns the entire string.
This was not possible with subseq, which signals an error instead. I found nothing in Alexandria or other helper libraries either.
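Here is what that looks like with the built-in alone, at SBCL's default safety settings (other implementations should behave the same way in safe code). Python's negative index has to be spelled out with length:

```lisp
;; END past the length of the string: SUBSEQ signals an error.
(handler-case (subseq "foo" 0 100)
  (error () :out-of-bounds))              ; => :OUT-OF-BOUNDS

;; No negative indexing: "foo"[:-1] becomes an explicit length computation.
(let ((s "foo"))
  (subseq s 0 (1- (length s))))           ; => "fo"
```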
(defun substring (start end s)
"Return the substring of `s' from `start' to `end'.
It uses `subseq' with differences:
- argument order, s at the end
- `start' and `end' can be lower than 0 or bigger than the length of s.
- for convenience `end' can be nil or t to denote the end of the string.
"
(let* ((s-length (length s))
(end (cond
((null end) s-length)
((eq end t) s-length)
(t end))))
(setf start (max 0 start))
(if (> start s-length)
""
(progn
(setf end (min end s-length))
(when (< end (- s-length))
(setf end 0))
(when (< end 0)
(setf end (+ s-length end)))
(if (< end start)
""
(subseq s start end))))))
Usage:
(subtest "substring"
(is "abcd" (substring 0 4 "abcd") "normal case")
(is "ab" (substring 0 2 "abcd") "normal case substring")
(is "bc" (substring 1 3 "abcd") "normal case substring middle")
(is "" (substring 4 4 "abcd") "normal case")
(is "" (substring 0 0 "abcd") "normal case")
(is "d" (substring 3 4 "abcd") "normal case")
(is "abcd" (substring 0 t "abcd") "end is t")
(is "abcd" (substring 0 nil "abcd") "end is nil")
(is "abcd" (substring 0 100 "abcd") "end is too large")
(is "abc" (substring 0 -1 "abcd") "end is negative")
(is "b" (substring 1 -2 "abcd") "end is negative")
(is "" (substring 2 1 "abcd") "start is bigger than end")
(is "" (substring 0 -100 "abcd") "end is too low")
(is "" (substring 100 1 "abcd") "start is too big")
(is "abcd" (substring -100 4 "abcd") "start is too low")
(is "abcd" (substring -100 100 "abcd") "start and end are too low and big")
(is "" (substring 100 -100 "abcd") "start and end are too big and low")
)
See also
Afterwards, I discovered cl-strings, which does help but has its own shortcomings.
The Cookbook has been updated: https://lispcookbook.github.io/cl-cookbook/strings.html