Best way to check if a URL is valid

All we need is an easy explanation of the problem, so here it is.

I want to use PHP to check, if string stored in $myoutput variable contains a valid link syntax or is it just a normal text. The function or solution, that I’m looking for, should recognize all links formats including the ones with GET parameters.

A solution, suggested on many sites, to actually query string (using CURL or file_get_contents() function) is not possible in my case and I would like to avoid it.

I thought about regular expressions or another solution.

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

You can use a native Filter Validator

filter_var($url, FILTER_VALIDATE_URL);

Validates value as URL (according to » http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.

Example:

if (filter_var($url, FILTER_VALIDATE_URL) === FALSE) {
    die('Not a valid URL');
}

Method 2

Here is the best tutorial I found over there:

http://www.w3schools.com/php/filter_validate_url.asp

<?php
$url = "http://www.qbaki.com";

// Remove all illegal characters from a url
$url = filter_var($url, FILTER_SANITIZE_URL);

// Validate url
if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
echo("$url is a valid URL");
} else {
echo("$url is not a valid URL");
}
?>

Possible flags:

FILTER_FLAG_SCHEME_REQUIRED - URL must be RFC compliant (like http://example)
FILTER_FLAG_HOST_REQUIRED - URL must include host name (like http://www.example.com)
FILTER_FLAG_PATH_REQUIRED - URL must have a path after the domain name (like www.example.com/example1/)
FILTER_FLAG_QUERY_REQUIRED - URL must have a query string (like "example.php?name=Peter&age=37")

Method 3

Using filter_var() will fail for urls with non-ascii chars, e.g. (http://pt.wikipedia.org/wiki/Guimarães). The following function encode all non-ascii chars (e.g. http://pt.wikipedia.org/wiki/Guimar%C3%A3es) before calling filter_var().

Hope this helps someone.

<?php

function validate_url($url) {
    $path = parse_url($url, PHP_URL_PATH);
    $encoded_path = array_map('urlencode', explode('/', $path));
    $url = str_replace($path, implode('/', $encoded_path), $url);

    return filter_var($url, FILTER_VALIDATE_URL) ? true : false;
}

// example
if(!validate_url("https://somedomain.com/some/path/file1.jpg")) {
    echo "NOT A URL";
}
else {
    echo "IS A URL";
}

Method 4

function is_url($uri){
    if(preg_match( '/^(http|https):\\/\\/[a-z0-9_]+([\\-\\.]{1}[a-z_0-9]+)*\\.[_a-z]{2,5}'.'((:[0-9]{1,5})?\\/.*)?$/i' ,$uri)){
      return $uri;
    }
    else{
        return false;
    }
}

Method 5

Personally I would like to use regular expression here. Bellow code perfectly worked for me.

$baseUrl     = url('/'); // for my case https://www.xrepeater.com
$posted_url  = "home";
// Test with one by one
/*$posted_url  = "/home";
$posted_url  = "xrepeater.com";
$posted_url  = "www.xrepeater.com";
$posted_url  = "http://www.xrepeater.com";
$posted_url  = "https://www.xrepeater.com";
$posted_url  = "https://xrepeater.com/services";
$posted_url  = "xrepeater.dev/home/test";
$posted_url  = "home/test";*/

$regularExpression  = "((https?|ftp)\:\/\/)?"; // SCHEME Check
$regularExpression .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)[email protected])?"; // User and Pass Check
$regularExpression .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP Check
$regularExpression .= "(\:[0-9]{2,5})?"; // Port Check
$regularExpression .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path Check
$regularExpression .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query String Check
$regularExpression .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor Check

if(preg_match("/^$regularExpression$/i", $posted_url)) { 
    if(preg_match("@^http|https://@i",$posted_url)) {
        $final_url = preg_replace("@(http://)[email protected]",'http://',$posted_url);
        // return "*** - ***Match : ".$final_url;
    }
    else { 
          $final_url = 'http://'.$posted_url;
          // return "*** / ***Match : ".$final_url;
         }
    }
else {
     if (substr($posted_url, 0, 1) === '/') { 
         // return "*** / ***Not Match :".$final_url."<br>".$baseUrl.$posted_url;
         $final_url = $baseUrl.$posted_url;
     }
     else { 
         // return "*** - ***Not Match :".$posted_url."<br>".$baseUrl."/".$posted_url;
         $final_url = $baseUrl."/".$final_url; }
}

Method 6

Actually… filter_var($url, FILTER_VALIDATE_URL); doesn’t work very well.
When you type in a real url, it works but, it only checks for http://
so if you type something like “http://weirtgcyaurbatc“, it will still say it’s real.

Method 7

You can use this function, but its will return false if website offline.

  function isValidUrl($url) {
    $url = parse_url($url);
    if (!isset($url["host"])) return false;
    return !(gethostbyname($url["host"]) == $url["host"]);
}

Method 8

Given issues with filter_var() needing http://, I use:

$is_url = filter_var($filename, FILTER_VALIDATE_URL) || array_key_exists('scheme', parse_url($filename));

Method 9

Another way to check if given URL is valid is to try to access it, below function will fetch the headers from given URL, this will ensure that URL is valid AND web server is alive:

function is_url($url){
        $response = array();
        //Check if URL is empty
        if(!empty($url)) {
            $response = get_headers($url);
        }
        return (bool)in_array("HTTP/1.1 200 OK", $response, true);
/*Array
(
    [0] => HTTP/1.1 200 OK 
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    [ETag] => "3f80f-1b6-3e1cb03b"
    [Accept-Ranges] => bytes
    [Content-Length] => 438
    [Connection] => close
    [Content-Type] => text/html
)*/ 
    }   

Method 10

Came across this article from 2012. It takes into account variables that may or may not be just plain URLs.

The author of the article, David Müeller, provides this function that he says, “…could be worth wile [sic],” along with some examples of filter_var and its shortcomings.

/**
 * Modified version of `filter_var`.
 *
 * @param  mixed $url Could be a URL or possibly much more.
 * @return bool
 */
function validate_url( $url ) {
    $url = trim( $url );

    return (
        ( strpos( $url, 'http://' ) === 0 || strpos( $url, 'https://' ) === 0 ) &&
        filter_var(
            $url,
            FILTER_VALIDATE_URL,
            FILTER_FLAG_SCHEME_REQUIRED || FILTER_FLAG_HOST_REQUIRED
        ) !== false
    );
}

Method 11

public function testing($Url=''){
    $ch = curl_init($Url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return ($httpcode >= 200 && $httpcode < 300) ? true : false;
}

Method 12

if anyone is interested to use the cURL for validation. You can use the following code.

<?php 
public function validationUrl($Url){
        if ($Url == NULL){
            return $false;
        }
        $ch = curl_init($Url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $data = curl_exec($ch);
        $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        return ($httpcode >= 200 && $httpcode < 300) ? true : false; 
    }

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply