
Javascript regex multiline text between two tags

javascript
regex
multiline
flags
by Nikita Barsukov · Jan 15, 2025
TLDR

To find and extract text between any two HTML tags over multiple lines in JavaScript, you can use the following regex:

const regex = /<startTag>([\s\S]*?)<\/endTag>/;

Remember to substitute startTag and endTag with the actual tag names you're targeting. Using regex.exec(), you can capture the content between tags:

// The closing tag must match the one in the regex, otherwise exec() returns null.
const str = `<startTag>line1\nline2</endTag>`;
const match = regex.exec(str);
console.log(match ? match[1] : 'No match');

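Here, the pattern [\s\S]*? lazily matches any character (including newlines). Because the quantifier is lazy, the match stops at the first closing tag it finds, so you capture everything between the tags while avoiding any "reckless capture" that runs on to a later closing tag.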
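If the input holds more than one pair of tags, exec() on a regex without the g flag only ever returns the first one. A minimal sketch of collecting them all, assuming a plain tag name with no attributes (extractBetweenTags is a hypothetical helper, not something from a library):

```javascript
// Hypothetical helper: returns the inner text of every <tagName>...</tagName> pair.
// Assumes tagName is a simple name (no attributes, no regex metacharacters).
function extractBetweenTags(input, tagName) {
  // matchAll requires the g flag; [\s\S]*? lazily spans newlines.
  const pattern = new RegExp(`<${tagName}>([\\s\\S]*?)</${tagName}>`, 'g');
  // Keep only capture group 1 (the text between the tags) from each match.
  return Array.from(input.matchAll(pattern), (m) => m[1]);
}

const sample = '<note>first\nline</note> filler <note>second</note>';
console.log(extractBetweenTags(sample, 'note')); // [ 'first\nline', 'second' ]
```

String.prototype.matchAll needs an ES2020 engine; on older ones, a while loop around regex.exec() with the g flag should do the same job.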

Dotall or multiline: a guide to choosing flags

When working with multiline strings, two flags often come up as candidates: the dotall (/s) and multiline (/m) flags. However, they don't play the same roles. The /m flag only changes the anchors, making ^ and $ match at the start and end of every line; it does nothing for the dot. The /s flag affects how the dot (.) works, allowing it to match newline characters. Thus, if you have a multiline string and want to grab everything between the tags, newlines included, this is your guy:

let regex = /<tag>.*?<\/tag>/s; // "Dot, you now have the power of inclusivity. Include newline characters!"

But don't be caught off guard: the /s flag is a new kid on the block, added in ECMAScript 2018. Older browsers that don't know it will reject the regex with a SyntaxError. [\s\S] to the rescue!

let regex = /<tag>[\s\S]*?<\/tag>/; // "Hey, [\s\S], we love each other—including newlines!"
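To see the difference between the flags side by side, here is a small sketch that runs the same input through each variant (the variable names are only for illustration):

```javascript
const text = '<tag>line1\nline2</tag>';

// /m only changes ^ and $; the dot still stops at the newline, so this fails.
const withMultiline = /<tag>(.*?)<\/tag>/m.test(text);

// /s lets the dot cross the newline.
const withDotAll = /<tag>(.*?)<\/tag>/s.test(text);

// [\s\S] needs no flag at all.
const withCharClass = /<tag>([\s\S]*?)<\/tag>/.test(text);

console.log(withMultiline, withDotAll, withCharClass); // false true true

// Where /m does earn its keep: anchoring to the start of every line.
const lineStarts = 'a1\nb2\nc3'.match(/^\w/gm);
console.log(lineStarts); // [ 'a', 'b', 'c' ]
```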

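Now, the wildcard (.) is indeed a "wild" card. To keep it in check, we use the non-greedy *? (or +?) quantifier, which stops at the first closing tag instead of running on to the last one. When you build the regex with the RegExp constructor, pass s as the flags argument and check the dotAll property to confirm the flag is actually set: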

const pattern = '<tag>.+?</tag>';
const flags = 's';
const regex = new RegExp(pattern, flags);
console.log(regex.dotAll); // "Did we remember to invite Mr dotAll? true or false!"
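If you can't be sure the runtime knows the /s flag, one option is to probe for it and fall back to [\s\S]. This is only a sketch under that assumption; supportsDotAll and buildTagRegex are hypothetical names, and the tag name is assumed to contain no regex metacharacters:

```javascript
// Returns true when the engine accepts the s (dotAll) flag.
function supportsDotAll() {
  try {
    new RegExp('.', 's'); // throws SyntaxError on engines without the flag
    return true;
  } catch (err) {
    return false;
  }
}

// Builds a lazy "between the tags" regex, preferring /s when it is available.
function buildTagRegex(tagName) {
  return supportsDotAll()
    ? new RegExp(`<${tagName}>(.*?)</${tagName}>`, 's')
    : new RegExp(`<${tagName}>([\\s\\S]*?)</${tagName}>`);
}

const tagRegex = buildTagRegex('tag');
const found = tagRegex.exec('<tag>a\nb</tag>');
console.log(found ? JSON.stringify(found[1]) : 'No match'); // "a\nb"
```

The probe uses the constructor rather than a regex literal on purpose: an unsupported flag in a literal is a parse error for the whole script, which a try/catch cannot intercept.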

Keep these in mind to avoid sinking your regex ship:

  • Overlapping tags: If the same tag shows up more than once, the match ends at the first closing tag it meets. Make your start and end tags distinct to dodge any unintended captures.
  • Nested tags: Remember, JavaScript regex isn't a fan of recursive patterns. If working with nested tags, a parser might be your better ally.
  • Performance: Running your regex on large strings might be like running a marathon for it. Tread lightly and always benchmark!
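The first two pitfalls are easy to reproduce. A short sketch with made-up sample strings, showing what greedy and lazy quantifiers capture and where nesting goes wrong:

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: [\s\S]* runs to the LAST closing tag.
const greedy = html.match(/<b>([\s\S]*)<\/b>/)[1];
console.log(greedy); // one</b> and <b>two

// Lazy: [\s\S]*? stops at the FIRST closing tag.
const lazy = html.match(/<b>([\s\S]*?)<\/b>/)[1];
console.log(lazy); // one

// Nested tags: the lazy match ends at the inner closing tag,
// so the capture is unbalanced. This is where a parser is the better ally.
const nested = '<div>outer <div>inner</div> tail</div>';
const nestedCapture = nested.match(/<div>([\s\S]*?)<\/div>/)[1];
console.log(nestedCapture); // outer <div>inner
```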

Applying Regex: real scenarios

Deploy your new regex powers in these common scenarios:

  1. Scraping data: Extract valuable data from HTML/XML documents like Indiana Jones mining artifacts!
  2. Templating engines: Find placeholders to switch with actual data—just like a game of tag.
  3. Log parsing: Pick out specific entries from multiline logs.
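As an illustration of the templating scenario, here is a rough sketch that swaps placeholders for data. The {{ key }} syntax and the render function are assumptions made up for this example, not the API of any particular templating engine:

```javascript
// Hypothetical placeholder syntax: {{ key }}, which may span several lines.
function render(template, data) {
  // [\s\S]*? lazily captures everything between {{ and }}, newlines included.
  return template.replace(/\{\{([\s\S]*?)\}\}/g, (whole, key) => {
    const name = key.trim();
    // Leave unknown placeholders untouched rather than guessing a value.
    return Object.prototype.hasOwnProperty.call(data, name)
      ? String(data[name])
      : whole;
  });
}

const template = 'Hello, {{ user }}! You have {{\n count \n}} new messages. {{missing}}';
console.log(render(template, { user: 'Ada', count: 3 }));
// Hello, Ada! You have 3 new messages. {{missing}}
```

The same pattern carries over to log parsing: swap the {{ and }} delimiters for whatever markers open and close an entry in your logs.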