Regex for URL Extraction
URL Extraction Regex is a regex pattern that matches URLs starting with http:// or https://, followed by the domain (letters, numbers, dots, hyphens), a TLD of 2+ letters, and an optional path with query parameters. Formula Genius generates and validates this formula automatically from a plain-English prompt.
Find and extract URLs from unstructured text. Match full URLs, domains, or specific URL components with tested patterns.
The Formula
"Extract all HTTP and HTTPS URLs from a block of text"
https?://[\w.-]+(?:\.[a-zA-Z]{2,})(?:[/\w.\-?=&#%+]*)?
Matches URLs starting with http:// or https://, followed by the domain (letters, numbers, dots, hyphens), a TLD of 2+ letters, and an optional path with query parameters.
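As a minimal sketch of how the pattern might be applied, the Python snippet below runs it with re.findall. The names URL_PATTERN and extract_urls and the sample text are illustrative assumptions, not part of the original formula.

```python
import re

# The pattern from this page, compiled once for reuse.
URL_PATTERN = re.compile(r"https?://[\w.-]+(?:\.[a-zA-Z]{2,})(?:[/\w.\-?=&#%+]*)?")


def extract_urls(text):
    """Return every http/https URL found in text, in order of appearance."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return URL_PATTERN.findall(text)


sample = "Docs at https://example.com/page?id=1 and http://sub.domain.co.uk/path today"
print(extract_urls(sample))
# ['https://example.com/page?id=1', 'http://sub.domain.co.uk/path']
```

Compiling the pattern once is a small convenience when the same expression is applied to many blocks of text.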
Step-by-Step Breakdown
- https? matches http or https (s is optional)
- :// matches the protocol separator literally
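- [\w.-]+ matches the host name (letters, digits, underscores, dots, hyphens)
- (?:\.[a-zA-Z]{2,}) matches the final dot plus a TLD of 2+ letters (.com, .org); for .co.uk it matches only .uk, because .co is already consumed by the preceding class
- (?:[/\w.\-?=&#%+]*)? optionally matches the path, query string, and fragment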
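To see which part of a URL each piece consumes, the breakdown can be sketched in Python with re.VERBOSE and named groups. The group names (scheme, host, tld, path) and the helper split_url are assumptions added for illustration; the original pattern uses non-capturing groups only.

```python
import re

# Same building blocks as the pattern above, one per line.
# Named groups are an illustrative addition so each part can be inspected.
URL_PARTS = re.compile(
    r"""
    (?P<scheme>https?)            # "http" with an optional "s"
    ://                           # literal protocol separator
    (?P<host>[\w.-]+)             # word characters, dots, hyphens
    (?P<tld>\.[a-zA-Z]{2,})       # final dot plus 2+ letters
    (?P<path>[/\w.\-?=&\#%+]*)    # optional path, query, fragment
    """,
    re.VERBOSE,
)


def split_url(text):
    """Return the named parts of the first URL in text, or None if none is found."""
    match = URL_PARTS.search(text)
    return match.groupdict() if match else None


print(split_url("Link: http://sub.domain.co.uk/path"))
# {'scheme': 'http', 'host': 'sub.domain.co', 'tld': '.uk', 'path': '/path'}
```

The # inside the last character class is escaped as a precaution, since # otherwise starts a comment in verbose mode. The output shows the greedy host class taking sub.domain.co and leaving only .uk for the TLD group.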
Edge Cases & Warnings
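- Won't match URLs without a protocol (e.g., www.example.com); add a www\. alternative to the prefix if needed
- A period or question mark at the end of a sentence is captured, because both characters are in the path class; trim trailing punctuation after matching
- Doesn't validate that the domain actually exists; it only checks the format
- Parentheses are not in the path class, so URLs that contain them (like some Wikipedia links) are cut off at the first parenthesis; extend the class if you need them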
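The trailing-punctuation and parenthesis cases can be reproduced with a short Python sketch. Trimming with rstrip is one possible workaround, not the only one, and the names TRAILING and extract_clean_urls and the example.com URLs are assumptions made for illustration.

```python
import re

URL_PATTERN = re.compile(r"https?://[\w.-]+(?:\.[a-zA-Z]{2,})(?:[/\w.\-?=&#%+]*)?")

# Characters treated as sentence punctuation when they end a match (an assumption).
TRAILING = ".,;:!?"


def extract_clean_urls(text):
    """Find URLs, then trim punctuation that most likely belongs to the sentence."""
    return [url.rstrip(TRAILING) for url in URL_PATTERN.findall(text)]


# The raw match keeps the sentence-ending period.
print(URL_PATTERN.findall("See https://example.com/page."))
# ['https://example.com/page.']

# Trimming afterwards removes it.
print(extract_clean_urls("See https://example.com/page."))
# ['https://example.com/page']

# Parentheses are outside the path class, so the match stops before "(".
print(URL_PATTERN.findall("https://example.com/wiki/Regex_(disambiguation)"))
# ['https://example.com/wiki/Regex_']
```

Note that rstrip would also remove a question mark or period that genuinely ends a URL, so this trade-off may not suit every input.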
Examples
"Visit https://example.com/page?id=1"
https://example.com/page?id=1
"Link: http://sub.domain.co.uk/path"
http://sub.domain.co.uk/path
"No protocol: example.com"
No match (needs http/https)
Frequently Asked Questions
How do I also match URLs without http/https?
Add an alternative: (https?://|www\.)[\w.-]+... This also catches domains prefixed with www. Write the group as (?:https?://|www\.) if you don't want the prefix captured separately.
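A possible Python version of this alternative is sketched below. It assumes the elided remainder is the TLD and path portion of the main pattern, and it uses a non-capturing group, because re.findall returns only the group contents when a pattern contains exactly one capturing group. The name URL_OR_WWW and the sample text are illustrative.

```python
import re

# Prefix alternative: either a protocol or a bare "www." start.
# The rest is assumed to be the same as the main pattern on this page.
URL_OR_WWW = re.compile(
    r"(?:https?://|www\.)[\w.-]+(?:\.[a-zA-Z]{2,})(?:[/\w.\-?=&#%+]*)?"
)

text = "Try www.example.com/docs or https://example.org"
print(URL_OR_WWW.findall(text))
# ['www.example.com/docs', 'https://example.org']
```

Bare domains without either prefix, such as example.com on its own, are still not matched.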
How do I extract just the domain from a URL?
Use a capture group: https?://([\w.-]+). The first group contains just the domain.
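In Python, for instance, the capture group could be used as in the sketch below; DOMAIN_PATTERN and extract_domains are hypothetical names chosen for this example.

```python
import re

# One capturing group around the host part of the URL.
DOMAIN_PATTERN = re.compile(r"https?://([\w.-]+)")


def extract_domains(text):
    """Return the host part of every http/https URL in text."""
    # With a single capturing group, findall returns the group, not the whole match.
    return DOMAIN_PATTERN.findall(text)


sample = "Visit https://example.com/page?id=1 or http://sub.domain.co.uk/path"
print(extract_domains(sample))
# ['example.com', 'sub.domain.co.uk']
```

Because the class includes the dot, a URL that ends a sentence (https://example.com.) would yield the domain with a trailing dot, so trimming may still be needed.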