I get a URL from a user. I need to know:
a) is the URL a valid RSS feed?
b) If not, is there a valid feed associated with that URL?
I'd like to do this using PHP, JavaScript, or something similar.
(For example, http://techcrunch.com fails a), but b) would return its RSS feed.)
How to solve :
Method 1
Found something that I wanted:
Google’s AJAX Feed API has a load feed function and a lookup feed function (see its documentation).
a) Load feed provides the feed (and feed status) in JSON.
b) Lookup feed provides the RSS feed for a given URL.
There’s also a find feed function that searches for RSS feeds based on a keyword.
I’m planning to use this with jQuery’s $.getJSON.
Method 2
The Zend_Feed class of the Zend Framework can automatically parse a web page and list the feeds it advertises.
Example:
$feedArray = Zend_Feed::findFeeds('http://www.example.com/news.html');
Method 3
The W3C feed validator lets you check a URL against the RSS/Atom specifications, but you have to enter the URL manually.
There are a number of ways to do this programmatically, depending on your choice of language. In PHP, parsing the file as XML is a good way to start; you can then check it against the relevant DTD.
For b), if the link itself isn’t a feed, you can parse the page and look for a feed declared in its <head> section, searching for a link whose type is “application/rss+xml”, e.g.:
<link rel="alternate" title="RSS Feed"
href="http://www.example.com/rss-feed.xml" type="application/rss+xml" />
This type of link is the one most browsers use to “auto-discover” feeds (causing the RSS icon to appear in the address bar).
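As a rough illustration of that auto-discovery step, here is a minimal JavaScript sketch. The function name findFeedLinks is made up for this example, accepting “application/atom+xml” alongside “application/rss+xml” is my own assumption, and the regex-based tag scan is only a stand-in for a real HTML parser.

```javascript
// Hypothetical helper: returns the href of every <link> tag that advertises a feed.
// Regex-based on purpose to stay dependency-free; a real HTML parser is more robust.
function findFeedLinks(html) {
  // Assumption: Atom feeds are treated the same way as RSS feeds.
  const feedTypes = ['application/rss+xml', 'application/atom+xml'];
  const links = [];
  // Match each <link ...> tag, self-closing or not.
  const tagPattern = /<link\b[^>]*>/gi;
  for (const tag of html.match(tagPattern) || []) {
    const attrs = {};
    // Collect name="value" or name='value' pairs from the tag.
    const attrPattern = /([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    let m;
    while ((m = attrPattern.exec(tag)) !== null) {
      attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
    }
    const rel = (attrs.rel || '').toLowerCase().split(/\s+/);
    const type = (attrs.type || '').toLowerCase().trim();
    if (rel.includes('alternate') && feedTypes.includes(type) && attrs.href) {
      links.push(attrs.href);
    }
  }
  return links;
}
```

Calling findFeedLinks on the page source returns an array of feed URLs, which is empty when the page declares none. Relative href values would still need to be resolved against the page URL.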
Method 4
a) Retrieve it and try to parse it. If you can parse it, it’s valid.
b) Test whether it’s an HTML document (the server sent the text/html MIME type). If so, run it through an HTML parser and look for <link> elements with RSS feed relations.
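The MIME-type test in b) could look something like the sketch below. The function name classifyContentType and the list of XML-ish types are assumptions made for illustration; the only type the answer itself names is text/html.

```javascript
// Hypothetical helper: decide how to treat a response from its Content-Type
// header value, e.g. "text/html; charset=utf-8".
function classifyContentType(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const mime = (contentType || '').split(';')[0].trim().toLowerCase();
  if (mime === 'text/html' || mime === 'application/xhtml+xml') {
    return 'html'; // step b): scan the page for feed <link> elements
  }
  // Assumed set of types a feed is likely to be served with.
  const xmlTypes = ['application/rss+xml', 'application/atom+xml',
                    'application/xml', 'text/xml'];
  if (xmlTypes.includes(mime)) {
    return 'feed-candidate'; // step a): try to parse it as a feed
  }
  return 'unknown'; // the header may be missing or wrong, so parsing is still worth a try
}
```

In Node 18 or later, the header value could come from something like (await fetch(url)).headers.get('content-type'). Since the header can be missing or wrong, it is probably safest to treat the result as a hint rather than a verdict.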
Method 5
For Perl, there is Feed::Find, which automates the discovery of syndication feeds from a web page. The usage is quite simple:
use Feed::Find;
my @feeds = Feed::Find->find('http://example.com/');
It first tries the <link> tags and then scans the <a> tags for files named .rss and the like.
Method 6
Are you doing this in a specific language, or do you just want details about the RSS specification?
In general, look for the XML prolog:
<?xml version="1.0" encoding="UTF-8"?>
followed by an <rss> element, but you might want to validate it as XML, fully validate it against a DTD, or verify, for example, that each URL it refers to is valid. More detail would help.
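A quick first-pass version of that prolog-plus-root-element check might look like this in JavaScript. The name looksLikeFeed is made up, and recognising Atom's <feed> root is my own addition; this only sniffs the start of the document and validates nothing.

```javascript
// Hypothetical helper: cheap first-pass check, not a validator.
// Returns 'rss', 'atom', or null depending on the root element name.
function looksLikeFeed(text) {
  // Strip a byte order mark and leading whitespace.
  let rest = text.replace(/^\uFEFF/, '').trimStart();
  // The XML declaration is optional in XML 1.0, so accept its absence.
  rest = rest.replace(/^<\?xml[^>]*\?>/i, '');
  // Skip comments, processing instructions and a DOCTYPE before the root element.
  let previous;
  do {
    previous = rest;
    rest = rest.trimStart()
      .replace(/^<!--[\s\S]*?-->/, '')
      .replace(/^<\?[\s\S]*?\?>/, '')
      .replace(/^<!DOCTYPE[^>]*>/i, '');
  } while (rest !== previous);
  // Read the name of the first element, which is the document root.
  const root = /^<([A-Za-z_][\w.:-]*)/.exec(rest);
  if (!root) return null;
  const name = root[1].toLowerCase();
  if (name === 'rss') return 'rss';
  if (name === 'feed') return 'atom'; // assumption: Atom counts as a feed too
  return null;
}
```

A non-null result only says the document starts like a feed. It would still need to go through a real XML parser before being treated as valid.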
UPDATE: Ah – PHP. I’ve found this library to be pretty useful: MagpieRSS
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.