Regular expressions with the search engine

Regular expressions are a powerful pattern-matching tool originating with Unix. This search engine lets expert users enter regular expression syntax by checking the "refined" box when doing a search. Readers who do not feel comfortable with their understanding of this explanation are advised not to attempt it; the ordinary search should be sufficient for almost all purposes.

The most obvious reason to choose the "refined" search is to get an "exact" match on multiple input words in the given order. Typing "john smith" and checking "refined" will match the string "john smith" exactly, rather than any file that merely contains "john" and "smith" somewhere. Also, common words ignored by the ordinary search will not be ignored by the "refined" search.
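The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the engine's actual code: the function names, the stop-word list, and the choice of case-insensitive matching are all assumptions.

```python
import re

# Hypothetical list of "common words"; the engine's real list is not given here.
STOP_WORDS = {"the", "and", "of"}

def ordinary_match(text, query):
    """True if every non-common word of the query occurs somewhere in the text."""
    lowered = text.lower()
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    return all(w in lowered for w in words)

def refined_match(text, query):
    """True if the query, read as a regular expression, matches inside the text."""
    # Case-insensitive matching is an assumption made for this sketch.
    return re.search(query, text, re.IGNORECASE) is not None

doc_a = "Recorded by John Smith in 1994."
doc_b = "Smith, John (conductor)"

# Both files contain both words, but only doc_a contains the exact phrase.
print(ordinary_match(doc_a, "john smith"), refined_match(doc_a, "john smith"))
print(ordinary_match(doc_b, "john smith"), refined_match(doc_b, "john smith"))

# A common word is skipped by the ordinary search but kept by the refined one.
print(ordinary_match("Art history", "the art"), refined_match("Art history", "the art"))
```

In this sketch the ordinary search accepts both files, while the refined search accepts only the one in which the words appear together and in order.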

The ordinary search (i.e. not "refined") finds CD files that contain each of the words listed, in any order and with any intervening strings. In addition, a search word also matches when it appears as a substring of a longer word in the file. With regular expressions, the user can refine this search result by specifying a precise pattern to be matched in the file.
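A few illustrative patterns, tried here with Python's re module. These are not the engine's own examples; the sample text is invented, and the sketch assumes that the engine's regular expression dialect handles these basic constructs (optional characters, groups, character classes, ".*") the same way Python does.

```python
import re

# Invented sample text standing in for the contents of one CD file.
text = "Smith, John Q. - Lute works, recorded 1987, reissued 1994"

def matches(pattern):
    """True if the pattern is found anywhere in the sample text (case ignored)."""
    return re.search(pattern, text, re.IGNORECASE) is not None

examples = [
    r"smith,? john",          # surname first, with or without a comma
    r"john (q\. )?smith",     # given name first, optional middle initial
    r"reissued 19[0-9][0-9]", # "reissued" followed by a year in the 1900s
    r"lute.*reissued",        # "lute" somewhere before "reissued"
]

for pattern in examples:
    print(pattern, "->", matches(pattern))
```

Only the second pattern fails on this text, because the names appear surname first; the ordinary search would not distinguish the two orders.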

Several books (and probably web pages) explain regular expressions in depth, and they will open up many other searching possibilities.

Note that the "refined" search is only invoked as a refinement of the ordinary search, so only those regular expressions will work in which each desired match actually contains each word listed (a word being defined as any alphanumeric string of three characters or more). In particular, this makes some kinds of "or" searches difficult (but certainly speeds up the engine), although you can use any alphanumeric string of fewer than three characters without problems. However, there must be at least one ordinary word on which to search, and expressions of too much complexity will be returned as errors. It is possible that some refinements to the algorithm can be introduced in order to allow some other kinds of expressions, so please contact me if there is a specific example of something you would like to be able to do but currently cannot.
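The two-stage behaviour described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the engine's implementation: the names required_words and refined_search are hypothetical, and the way words are pulled out of the expression (every run of three or more alphanumeric characters) is a guess based on the definition given above.

```python
import re

# A "word" is any alphanumeric string of three characters or more.
WORD = re.compile(r"[A-Za-z0-9]{3,}")

def required_words(query):
    """Words that the ordinary search must find before the regex is tried."""
    return [w.lower() for w in WORD.findall(query)]

def refined_search(text, query):
    words = required_words(query)
    if not words:
        # There must be at least one ordinary word on which to search.
        raise ValueError("query needs at least one word of three or more characters")
    lowered = text.lower()
    # Stage 1: the ordinary search, every word must be present in the file.
    if not all(w in lowered for w in words):
        return False
    # Stage 2: the regular expression refines the files that survived stage 1.
    return re.search(query, text, re.IGNORECASE) is not None

doc = "Dr Smith, lute recital"

# "or" between two long words fails: stage 1 demands both "smith" and "jones".
print(refined_search(doc, "smith|jones"))
# "or" between short strings works: only "smith" is required in stage 1.
print(refined_search(doc, "(mr|dr) smith"))
```

Under these assumptions the first query is rejected even though the expression itself would match, which is the difficulty with "or" searches mentioned above.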

A more complex search will take longer to complete. The server itself will interrupt searches of too high a complexity before they are allowed to consume too much computing power. Even so, please do not attempt absurdly complex searches.

Finally, note that newline characters and HTML code are stripped from the CD files before checking for a match. Some 8-bit characters may not yet be handled properly; these will be fixed as any special cases are noticed and corrected.
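The practical effect is that a pattern can match across line breaks and markup in the original file. A rough sketch, assuming that tags are simply deleted and that each newline is replaced by a single space (the engine's exact rules are not stated here, and the normalize function is hypothetical):

```python
import re

TAG = re.compile(r"<[^>]*>")

def normalize(raw):
    """Remove HTML tags, then turn each line break into a single space."""
    without_tags = TAG.sub("", raw)
    return re.sub(r"\s*\n\s*", " ", without_tags).strip()

raw = "<p>John\n<b>Smith</b>, lute</p>\n"
clean = normalize(raw)

print(repr(clean))
# The phrase is split by a newline and a tag in the raw file,
# but it is contiguous in the normalized text.
print(re.search("john smith", raw, re.IGNORECASE) is not None)
print(re.search("john smith", clean, re.IGNORECASE) is not None)
```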

Todd M. McComb