Parse Strings with Regular Expressions

Learn to parse strings with regular expressions.

Regular expressions (often abbreviated as regex) are widely used for lexical analysis and pattern matching on streams of text. Lexical analysis, also known as tokenization, is the process of breaking a sequence of characters into meaningful units called tokens, which then serve as input for parsing or further processing, for example in a compiler. Regular expressions are common in Unix text-processing utilities, such as grep, awk, and sed, and are an integral part of the Perl language. There are a few common variations in the syntax. A POSIX standard was approved in 1992, while other common variations include the Perl and ECMAScript (JavaScript) dialects. The C++ regex library defaults to the ECMAScript dialect, that is, the regular expression syntax defined by ECMAScript, the standardized specification behind scripting languages such as JavaScript.

The regex library was first introduced to the STL with C++11. It can be very useful for finding patterns in text files.

How to do it

For this recipe, we will extract hyperlinks from an HTML file. A hyperlink is coded in HTML as an <a> element whose href attribute holds the target URL and whose content is the link text.
