XML Sitemap & Google News for WordPress

XML and Google News Sitemaps to feed the hungry spiders. Multisite, WP Super Cache, Polylang and WPML compatible.

Description

This plugin dynamically creates feeds that comply with the XML Sitemap and the Google News Sitemap protocols. It is compatible with Multisite, Polylang and WPML, and no static files are created.

There are options to control which sitemaps are enabled, which Post Types and archive pages (like taxonomy terms and author pages) are included, how Priority and Lastmod are calculated and which search engines to ping. Additional robots.txt rules can be set from within the WordPress admin.

The main advantage of this plugin over other XML Sitemap plugins is simplicity. No need to change file or folder permissions, move files or spend time tweaking difficult plugin options.

You, or site owners on your Multisite network, will not be bothered with overly complicated settings like most other XML Sitemap plugins. The default settings will suffice in most cases.

An XML Sitemap Index becomes instantly available on yourblog.url/sitemap.xml (or yourblog.url/?feed=sitemap if you’re not using a ‘fancy’ permalink structure) containing references to posts and pages by default, ready for indexing by search engines like Google, Bing, Yahoo, AOL and Ask. When the Google News Sitemap is activated, it will become available on yourblog.url/sitemap-news.xml (or yourblog.url/?feed=sitemap-news), ready for indexing by Google News. Both are automatically referenced in the dynamically created robots.txt on yourblog.url/robots.txt to tell search engines where to find your XML Sitemaps. Google and Bing will be pinged on each new publication.
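
As a rough illustration of the two URL schemes described above, the following Python sketch maps a site URL to the expected sitemap locations. The function name sitemap_urls and the pretty_permalinks flag are made up for this example and are not part of the plugin; only the URL patterns come from the text above.

```python
def sitemap_urls(site_url, pretty_permalinks=True):
    """Return the expected sitemap locations for a site.

    Illustrative helper only, not plugin code.
    """
    base = site_url.rstrip("/")
    if pretty_permalinks:
        # 'Fancy' permalink structure: sitemaps are served on file-like URLs.
        return {
            "index": base + "/sitemap.xml",
            "news": base + "/sitemap-news.xml",
        }
    # Default (plain) permalinks: the sitemaps are only reachable as feeds.
    return {
        "index": base + "/?feed=sitemap",
        "news": base + "/?feed=sitemap-news",
    }
```

Calling sitemap_urls("http://yourblog.url", pretty_permalinks=False) gives the ?feed= variants, which are the URLs to reference in a manually created robots.txt.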

Please read the FAQs for info on how to get your articles listed on Google News.

Compatible with caching plugins like WP Super Cache, W3 Total Cache and Quick Cache that cache feeds, allowing a faster serving to the impatient (when hungry) spider.

NOTES:

  1. If you do not use fancy URLs or you have WordPress installed in a subdirectory, a dynamic robots.txt will NOT be generated. You’ll have to create your own and upload it to your site root! See the FAQs.
  2. On large sites, it is advised to use a good caching plugin like WP Super Cache, Quick Cache, W3 Total Cache or another to improve your site and sitemap performance.

Features

  • Compatible with multi-lingual sites using Polylang or WPML to allow all languages to be indexed equally.
  • Option to add new robots.txt rules. These can be used to further control (read: limit) the indexing of various parts of your site and the subsequent spread of pagerank across your site’s pages.
  • Includes XSL stylesheets for human readable sitemaps.
  • Sitemap templates and stylesheets can be overridden by theme template files.

XML Sitemap

  • Sitemap Index includes posts, pages and authors by default.
  • Optionally include sitemaps for custom post types, categories and tags.
  • Sitemap with custom URLs optional.
  • Custom/static sitemaps can be added to the index.
  • Works out-of-the-box, even on Multisite installations.
  • Include featured images or attached images with title.
  • Pings Google, Bing & Yahoo on new post publication.
  • Options to define which post types and taxonomies get included in the sitemap.
  • Updates Lastmod on post modification or on comments.
  • Set Priority per post type, per taxonomy and per individual post.
  • Exclude individual posts and pages.

Google News Sitemap

  • Required news sitemap tags: Publication name, language, title and publication date.
  • Set a News Publication Name or use the site name.
  • Supports custom post types.
  • Limit inclusion to certain post categories.
  • Pings Google on new publications, once per 5 minutes.

Pro Features

Google News Advanced

  • Multiple post types – Include more than one post type in the same News Sitemap.
  • Keywords – Add the keywords tag to your News Sitemap. Keywords can be created from Tags, Categories or a dedicated Keywords taxonomy.
  • Stock tickers – Add stock tickers tag to your News Sitemap. A dedicated Stock Tickers taxonomy will be available to manage them.
  • Ping log – Keep a log of the latest pings to Google with exact date and response status.

Privacy / GDPR

This plugin does not collect any user or visitor data nor set browser cookies. Using this plugin should not impact your site privacy policy in any way.

Data that is published

An XML Sitemap index, referencing other sitemaps containing your web site’s public post URLs of selected post types that are already public, along with their last modification date and associated image URLs, and any selected public archive URLs.

A Google News Sitemap containing your web site’s public and recent (last 48 hours) URLs of the selected news post type, along with their publication time stamp and associated image URL.

An author sitemap can be included, which will contain links to author archive pages. These URLs contain author/user slugs, and the author archives can contain author bio information. If you wish to keep this out of the public domain, deactivate the author sitemap and use an SEO plugin to add noindex headers.
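
The 48-hour window mentioned above for the Google News Sitemap can be pictured with a small Python sketch. This is not the plugin’s code: the post dictionaries with "url" and "published" keys and the function name recent_news_posts are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

# Window taken from the description above: only the last 48 hours are listed.
NEWS_WINDOW = timedelta(hours=48)


def recent_news_posts(posts, now=None):
    """Keep only posts published within the news window.

    `posts` is a list of dicts with hypothetical keys "url" and "published"
    (a timezone-aware datetime).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - NEWS_WINDOW
    return [post for post in posts if post["published"] >= cutoff]
```

If no post falls inside the window, the result is an empty list, which corresponds to a News Sitemap without any URL entries.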

Data that is transmitted

Data actively transmitted to search engines is your sitemap location and time of publication. This happens upon each post publication when at least one of the Ping options on Settings > Writing is enabled. In this case, the selected search engines are alerted of the location and updated state of your sitemap.

Contribute

If you’re happy with this plugin as it is, please consider writing a quick rating or helping other users out on the support forum.

If you wish to help build this plugin, you’re very welcome to translate it into your language or contribute code on Github.

Credits

XML Sitemap Feed was originally based on the discontinued plugin Standard XML Sitemap Generator by Patrick Chia. Since then, it has been completely rewritten and extended in many ways.

Installation

WordPress

I. If you have been using another XML Sitemap plugin before, check your site root and remove any created sitemap.xml, sitemap-news.xml and (if you’re not managing this one manually) robots.txt files that remained there.

II. Install plugin by:

Quick installation via Covered Web Services !

… OR …

Search for “xml sitemap feed” and install it from the slick Plugins > Add New admin page.

… OR …

Follow these steps:

  1. Download archive.
  2. Upload the zip file via the Plugins > Add New > Upload page … OR … unpack and upload with your favourite FTP client to the /wp-content/plugins/ folder.

III. Activate the plugin on the Plugins page.

Done! Check your sparkling new XML Sitemap by visiting yourblogurl.tld/sitemap.xml (adapted to your domain name of course) with a browser or any online XML Sitemap validator. You might also want to check if the sitemap is listed in your yourblogurl.tld/robots.txt file.

WordPress 3+ in Multi Site mode

Same as above but do a Network Activate to make an XML Sitemap available for each site on your network.

Installed alongside WordPress MU Sitewide Tags Pages, XML Sitemap Feed will not create a sitemap.xml nor change robots.txt for any tag blogs. This is done deliberately because they would be full of links outside the tag blog’s own domain and subsequently ignored (or worse: penalised) by Google.

Uninstallation

Upon uninstalling the plugin from the Admin > Plugins page, plugin options and meta data will be cleared from the database. See notes in the uninstall.php file.

On multisite, the uninstall.php can loop through all sites in the network to perform the uninstallation process for each site. However, this does not scale for large networks so it only does a per-site uninstallation when define('XMLSF_MULTISITE_UNINSTALL', true); is explicitly set in wp-config.php.

Frequently Asked Questions

Where are the options?

On Settings > Reading you can enable the XML Sitemap Index and (if needed) the Google News Sitemap. There is also an Additional robots.txt rules field.

Once a sitemap is enabled, its options can be found on Settings > XML Sitemap or on Settings > Google News.

Ping settings can be found on Settings > Writing.

How do I get my latest articles listed on Google News?

Go to Suggest News Content for Google News and submit your website info there in as much detail as possible. Give them the URL(s) of your fresh new Google News Sitemap in the text field ‘Other’ at the bottom.

You will also want to add the sitemap to your Google Search Console account to check its validity and performance. Create an account if you don’t have one yet.

Can I manipulate values for Priority and Changefreq?

You can find default settings for Priority on Settings > XML Sitemap. A fixed priority can be set on a post by post basis too.

Changefreq has been dropped since version 4.9 because it is no longer taken into account by Google.

Do I need to submit the sitemap to search engines?

No. In normal circumstances, your site will be indexed by the major search engines before you know it. The search engines will be looking for a robots.txt file and (with this plugin activated) find a pointer in it to the XML Sitemap on your blog. The search engines will return on a regular basis to see if your site has updates.

Besides that, Google and Bing are pinged upon each new publication by default.

NOTE: If you have a server without rewrite rules, use your blog without fancy URLs (meaning, you have WordPress Permalinks set to the old default value) or have it installed in a subdirectory, then read Do I need to change my robots.txt for more instructions.

Does this plugin ping search engines?

Yes, Google and Bing are pinged upon each new publication, unless you disable this feature on Settings > Writing.

Do I need to change my robots.txt?

In normal circumstances, if you have no static robots.txt file in your site root, the new sitemap url will be automatically added to the dynamic robots.txt that is generated by WordPress.

But if you use a static robots.txt file in your website root, you will need to open it in a text editor. If there is already a line with Sitemap: http://yourblogurl.tld/sitemap.xml you can just leave it like it is. But if there is no sitemap reference there, add it (adapted to your site URL) so search engines can find your XML Sitemap.

The same goes if you have WP installed in a subdirectory, on a server without rewrite rules, or if you do not use fancy URLs in your Permalink structure settings. In these cases, WordPress will need a little help in getting ready for XML Sitemap indexing. Read on in the WordPress section for more.
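
For the static robots.txt case described above, the check (“is there already a matching Sitemap line, and if not, add one”) could be sketched in Python as follows. The function name ensure_sitemap_line is hypothetical and nothing here is part of the plugin; it only works on the text of a robots.txt file that you would still have to upload yourself.

```python
def ensure_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt content that references sitemap_url.

    If a matching "Sitemap:" line already exists, the text is returned
    unchanged; otherwise the line is added at the top.
    """
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons stay intact.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt
    wanted = "Sitemap: " + sitemap_url
    body = robots_txt.rstrip("\n")
    if not body:
        return wanted + "\n"
    return wanted + "\n\n" + body + "\n"
```

Other Sitemap lines that point somewhere else are left in place; the new line is simply added above them.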

My WordPress powered blog is installed in a subdirectory. Does that change anything?

That depends on where the index.php and .htaccess of your installation reside. If they are in the root while the rest of the WP files are installed in a subdir, so the site is accessible from your domain root, you do not have to do anything. It should work out of the box.

But if the index.php is together with your wp-config.php and all other WP files in a subdir, meaning your blog is only accessible via that subdir, you need to manage your own robots.txt file in your domain root. It has to be in the root (!) and needs a line starting with Sitemap: followed by the full URL to the sitemap feed provided by XML Sitemap Feed plugin. Like:
Sitemap: http://yourblogurl.tld/subdir/sitemap.xml

If you already have a robots.txt file with another Sitemap reference like it, just add the full line below or above it.

Do I need to use a fancy Permalink structure?

No. While I would advise you to use any one of the nicer Permalink structures for better indexing, you might not be able to (or don’t want to) do that. If so, you can still use this plugin:

Check to see if the URL yourblog.url/?feed=sitemap does produce a feed. Now manually upload your own robots.txt file to your website root containing:

Sitemap: http://yourblog.url/?feed=sitemap

User-agent: *
Allow: /

You can also choose to notify major search engines of your new XML sitemap manually. Start by getting a Google Search Console account and submit your sitemap for the first time from there to enable tracking of sitemap downloads by Google! Or head over to XML-Sitemaps.com and enter your site’s sitemap URL.

Can I change the sitemap name/URL?

No. If you have fancy URLs turned ON in WordPress (Permalinks), the sitemap URL is yourblogurl.tld/sitemap.xml, but if you have the Permalink Default option set, the feed is only available via yourblog.url/?feed=sitemap.

I see no sitemap.xml file in my site root!

There is no actual file created. The sitemap is dynamically generated just like a feed.

I see a sitemap.xml file in site root but it does not seem to get updated!

You are most likely looking at a sitemap.xml file that has been created by another XML Sitemap plugin before you started using this one. Remove that file and let the plugin dynamically generate it just like a feed. There will not be any actual files created.

If that’s not the case, you are probably using a caching plugin or your browser does not update to the latest feed output. Please verify.

I use a caching plugin but the sitemap is not cached

Some caching plugins have the option to switch on/off caching of feeds. Make sure it is turned on.

Frederick Townes, developer of W3 Total Cache, says: “There’s a checkbox option on the page cache settings tab to cache feeds. They will expire according to the expires field value on the browser cache setting for HTML.”

The Google News sitemap is designed to NOT be cached.

I get an ERROR when opening the sitemap or robots.txt!

The absolute first thing you need to check is your blog’s privacy settings. Go to Settings > Privacy and make sure you are allowing search engines to index your site. If they are blocked, your sitemap will not be available.

Then, you might want to make sure that there is at least ONE post published. WordPress is known to send 404 status headers with feed requests when there are NO posts. Even though the plugin tries to prevent that, in some cases the wrong status header will get sent anyway and Google Search Console will report a vague message like:

We encountered an error while trying to access your Sitemap.
Please ensure your Sitemap follows our guidelines and can be
accessed at the location you provided and then resubmit.

If that did not solve the issue, check the following errors that might be encountered along with their respective solutions:

404 page instead of my sitemap.xml

Try to refresh the Permalink structure in WordPress. Go to Settings > Permalinks and re-save them. Then reload the XML Sitemap in your browser with a clean browser cache. (Try Ctrl+F5 or Ctrl+Shift+R to bypass the browser cache; this works on most but not all browsers.)

404 page instead of both sitemap.xml and robots.txt

There are plugins like Event Calendar (at least v.3.2.beta2) known to mess with rewrite rules, causing problems with WordPress internal feeds and robots.txt generation and thus conflicting with the XML Sitemap Feed plugin. Deactivate all plugins and see if you get a basic robots.txt file showing:

User-agent: *
Disallow:

Reactivate your plugins one by one to find out which one is causing the problem. Then report the bug to the plugin developer.

404 page instead of robots.txt while sitemap.xml works fine

There is a known issue with WordPress (at least up to 2.8) not generating a robots.txt when there are no posts with published status. If you use WordPress as a CMS with only pages, this will affect you.

To get around this, you can either write at least one post and give it Private status, or create your own robots.txt file containing:

Sitemap: http://yourblog.url/sitemap.xml

User-agent: *
Allow: /

and upload it to your web root…

Error loading stylesheet: An unknown error has occurred

On some setups (usually using the WordPress MU Domain Mapping plugin) this error occurs. The problem is known, the cause is not… Until I find out why this is happening, please take comfort in knowing that this only affects reading the sitemap in normal browsers but will NOT affect any spidering/indexing on your site. The sitemap is still readable by all search engines!

XML declaration allowed only at the start of the document

This error occurs when blank lines or other output are generated before the start of the actual sitemap content. This can be caused by blank lines at the beginning of wp-config.php or your theme’s functions.php, or by another plugin that generates output where it shouldn’t. You’ll need to test by disabling all other plugins, switching to the default theme and manually inspecting your wp-config.php file.
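
To see what exactly is being sent before the XML declaration, the raw response body can be inspected. The following Python sketch assumes you have already saved the sitemap response as bytes (for example from your browser’s ‘View source’ or a download tool). The function name output_before_declaration is made up for this example.

```python
import codecs
import re


def output_before_declaration(feed):
    """Return the bytes that precede the XML declaration.

    Returns b"" for a clean feed and None if no declaration is found.
    A leading UTF-8 byte order mark is ignored, since that is allowed.
    """
    if feed.startswith(codecs.BOM_UTF8):
        feed = feed[len(codecs.BOM_UTF8):]
    # Match "<?xml" followed by whitespace, so "<?xml-stylesheet" is skipped.
    match = re.search(rb"<\?xml\s", feed)
    if match is None:
        return None
    return feed[:match.start()]
```

Anything other than an empty result (blank lines, PHP notices, stray HTML) is the output to track down in wp-config.php, the theme or another plugin.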

I see only a BLANK (white) page when opening the sitemap

There are several cases where this might happen.

Open your sitemap in a browser and look at the source code. This can usually be seen by hitting Ctrl+U or right-click then select ‘View source…’ Then scan the produced source (if any) for errors.

A. If you see strange output in the first few lines (head tags) of the source, then there is a conflict or bug occurring on your installation. Please go to the Support forum for help.

B. If the source is empty or incomplete then you’re probably experiencing an issue with your server’s PHP memory limit. In those cases, you should see a message like PHP Fatal error: Allowed memory size of xxxxxx bytes exhausted. in your server/account error log file.

This can happen on large sites. To avoid these issues, there is an option to split posts over different sitemaps on Settings > XML Sitemap. Try different settings, each time revisiting the main sitemap index file and opening different sitemaps listed there to check.

Read more on Increasing memory allocated to PHP (try a value higher than 256M) or ask your hosting provider what you can do.

Can I run this on a WPMU / WP3+ Multi-Site setup?

Yes. In fact, it has been designed for it. Tested on WPMU 2.9.2 and WPMS 3+ both with normal activation and with Network Activate / Site Wide Activate.

620 Comments

Dear,

RavanH firstly I would like to thank you for producing such a great plugin and making it available to us all.

I have an issue with http://freeonlinebingocash.co.uk/sitemap-news.xml

Firstly it’s not viewable in Chrome or I.E. but is in Firefox, which says it has no stylesheet; that probably explains why it doesn’t display in the other browsers. I have just read a comment on your blog from the 28/12/2010 so I will try this in the first instance.

Secondly, more worryingly, for http://freeonlinebingocash.co.uk/sitemap-news.xml Google Webmaster says:

line 6

Missing XML tag

This required tag is missing. Please add it and resubmit.
Parent tag: urlset
Tag: url
Problem detected on: Jan 12, 2011

We haven’t made a blog post since the 9th Jan so maybe this is why?

Hi Dean, the fact that you do not have any posts younger than 2 days is indeed the cause of the ‘error’. I have not found any protocol for what to do when there are no recent posts: serve no sitemap-news.xml at all (returning a 404 error) or a sitemap-news.xml without any URLs (returning a missing tag error) … I chose the latter.

Your Google News sitemap should be fine as soon as you make a new post, but then after two days it will show that error in the Webmaster Tools validation again. There is no real problem or harm in that. But if you get any complaints from the official Google News service, please let me know.
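
To tell this harmless “empty sitemap” situation apart from a really broken feed, one could count the url entries in the sitemap source. The Python sketch below uses the standard sitemap namespace; the function name count_url_entries is hypothetical and the check is only an illustration, not something the plugin or Google provides.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol, used by the urlset root element.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def count_url_entries(sitemap_xml):
    """Count the <url> children of the <urlset> root in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("{%s}url" % SITEMAP_NS))
```

A result of 0 on a well-formed document matches the case described above: a valid urlset that simply has no recent posts to list.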

Hi ravanH,

By adding your other code recommendation it seemed to fix it:

echo ‘< ?xml-stylesheet type="text/xsl" href="'.plugins_url('',__FILE__).'/sitemap.xsl.php?v='.XMLSF_VERSION.'"

Thanks!

Hi Marcus, I think you are confusing two things here.

Firstly, the edit you did to the xml sitemap template has made your sitemap on http://www.allpetnews.com/sitemap.xml invalid. Remove all the white spaces between each < and ? in that line you edited to make it valid again.

Secondly, the edit did not affect your News Sitemap on /sitemap-news.xml which uses a different template file. Anyway, it looks all right to me. Keep in mind you are using a caching plugin that might be causing a delay in any new posts appearing in your xml and news sitemaps.

Hi,
The publication name is the name by which the blog was submitted for google sitemaps. We had submitted it as OnlyGizmos (blog) but the plugin (obviously) picks up OnlyGizmos from the site title.
Can you add an option in the backend where people can change their publication name?

Well, I could but since this plugin does not have any options at all, I would be creating a whole backend thing for just that one option… I’m not going to do that for the public version but if you’d like to buy some of my time for a custom version, please contact me 🙂

Hi Cor,

What kind of images were you thinking of? I can add images to the stylesheet but not to the sitemap itself. The protocol does not allow images. But it does allow links to images. Do you mean that? You want the attached images to be referenced in the sitemap as well as regular post/pages?

It can be done but I read it is unnecessary because Google will index the images along with the post content (using the image alt – and link title tags) … Or do you have images that you do not show in your posts but you do want them in the sitemap?

Allard

Hello Allard,

Sorry for the late reply. I was referring to an image sitemap (like the one Yoast’s includes, for example).

As there was talk that including video in a sitemap is a good thing for SEO, I didn’t know it was already handled by using image alt and title tags for images.

Well, if Joost says so, it must have some validity (no irony here) … I’ll have a look into the subject and consider implementation.

Hi Ciprian, your sitemap.xml is the XML Sitemap that search engines will be looking at for indexing your site. They should be able to find it themselves and you don’t have to do anything at all but if you like you can open a Google Webmaster Tools account and add your sitemap there. It will tell you how many of the URLs it has indexed and a lot more valuable info 🙂

The sitemap-news.xml is a dedicated News Sitemap and will only show posts from the last two days. This is (so far) only meant for Google News and your site will need to be accepted as a new provider to be included. Read more on http://www.google.com/support/news_pub/bin/topic.py?topic=8909 or apply for inclusion on http://www.google.com/support/news_pub/bin/answer.py?hl=en&answer=191208&rd=1

sitemap.xml is formatted correctly
sitemap-news.xml is just a paragraph that looks exactly like this:
http://droidism.net/android/pedal-harder-your-phone-is-depending-on-you/Droidismen2011-02-02T21:51:24+00:00Pedal harder, your phone is depending on you!android, nokia, phone chargerBloghttp://droidism.net/android/android-apps/android-market-webstore-live-um-almost/Droidismen2011-02-02T19:39:20+00:00Android Market Webstore LIVE! Um… Almostandroid, Apps, Market, WebstoreBloghttp://droidism.net/android/smart-phones/verizons-emergency-text/Droidismen2011-02-02T19:31:24+00:00Verizon’s Emergency Textemergency, fascinate, Samsung, VerizonBloghttp://droidism.net/android/smart-phones/t-mobile-good-vibrations-launching-vibrant-fastest-4g-phone-in-the-world/Droidismen2011-02-02T08:05:50+00:00T-Mobile good Vibrations. Launching Vibrant. Fastest 4G phone in the World!4G, Announcement, HSPA+, Release Date, Samsung, T-mobile, Vibrant 4GBloghttp://droidism.net/android/android-os/update-for-the-xperia/Droidismen2011-02-02T00:47:59+00:00Update For The XperiaFroYo, Moxier, sony ericsson, X10, X10 mini, XperiaBlog

Is that not what you are seeing?

That’s probably your browser doing that… Open the page source (in IE: right click > View source, I think) to see the real code as seen by Google News.

Hey Ravan,

When I use your plugin to generate a Sitemap feed for Google News, it gives me an error when submitted to Google Webmaster Tools: “URL not allowed. This url is not allowed for a Sitemap at this location.” The plugin is generating URLs like: http://feedproxy.google.com/~r/PopularFidelity/~3/MaOYASNgeho/

When I click on the above URL it displays fine, routing through Feedburner like my RSS feed normally does. Why is it not an acceptable sitemap for Google? Do I need to disable feedburner or set up a second clean RSS feed?

It looks like either your theme or a plugin is redirecting ALL feeds to your FeedBurner account. And since your XML and News Sitemaps are actually two new feeds, any requests for them get redirected to your main RSS feed via FeedBurner too. This completely voids the specific XML Sitemap markup and all other rules that both an XML Sitemap and a News Sitemap have to live by… If there is no way for you to disable this redirection for ALL feeds (for your RSS feeds it is fine, but not the two that are generated by my plugin) you will be better off switching to another XML Sitemap plugin.

Do you know what exactly is causing that FeedBurner redirection?

I think all the feeds are being redirected by Feedburner Feedsmith plugin. It redirects all feeds to Feedburner automatically, and I’m not sure how to change it to affect only the RSS feed. Is there another feedburner plugin I should try to use?

The only way to do that currently is by editing the plugin file feed-sitemap-news.php … The posts query starts on line 26. Insert a new line on line 27 like 'cat' => array( -12, -3 ), to exclude all the categories you want to. Other query parameters can be found on http://codex.wordpress.org/Function_Reference/query_posts

But remember that this will be overwritten on the next plugin upgrade.

There seem to be two things going on there. You have some feed redirection going on, so the sitemap-news.xml lands on FeedBurner. However, that does not seem to interfere with the sitemap.xml feed. Something else is breaking the feed before it can show any post links. I suspect it is the memory limit so I will send you an e-mail with an adapted version of feed-sitemap.php. If you’d be so kind to test it for me… 🙂

We are seeing errors in our log like this:
==============
PHP Notice: Undefined variable: wpdb in /wp-content/plugins/xml-sitemap-feed/XMLSitemapFeed.class.php on line 9
==============

I’ve fixed it by adding the missing ‘global $wpdb;’ to function go() in class XMLSitemapFeed.

Otherwise, works great!

Yep, it’s already in the development version but as it throws only a Notice, I have not made a bugfix release for it… Thanks for reporting anyway and good to hear you are happy with the plugin 🙂

I am using your plugin in multi-site mode, right now I need to use robots.txt to control how Google index my site.

How can I append my custom rules to the dynamically generated robots.txt?

Thank you.

The easiest is to create a static robots.txt file and copy the current rules into it along with the new ones… But that will probably not perform very well on a multi-site setup because it will show the same rules across all sites in your network.

If that’s a problem, you should probably use a plugin that controls the robots.txt content. I suppose there are some available out there but I cannot suggest any.

But you can hire me to write one from scratch for you if you need it. Then you are sure it will be compatible with XML Sitemap Feed 🙂

A quick note to thank you Raven. Just installed your plugin and it is working well. Your commitment to responding to all posts is impressive. Keep up the great work (:

Raven, for three days I have been trying to submit our site map to google and even though I can fetch the site map as googlebot and find no issues under diagnostics, I keep getting the google error message:

We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines

This morning I installed your plugin and resubmitted only to find the same issue. Any help would be greatly appreciated.

thank you

Site: http://www.ensmartsolutions.com

Thanks Ravan. The errors remain. There are two unusual things I notice.
1) If I try to change the preferred domain setting I get a message to verify my ownership even though this was already done and I have access to all webmaster functions on the site.
2) I have tried fetching the sitemap.xml as the googlebot multiple times. Out of 5 attempts 3 succeeded and 2 failed.

Have spent hours looking at the forums and it looks like there are many people with my issue and no solid solution.

The “preferred domain” setting is just an indication of your preference, of how the URLs are visible in Google Search results. Nothing more. The real problem is probably the URL you used when adding your site in the Webmaster Tools account.

So, if you did not do this already, add your site with www in the URL and then:
1. Set the www as your preferred domain under Settings.
2. Check the robots.txt content under Crawler Access to see if the URLs match (also with www).
3. Go to the Sitemaps list and look at the tab ‘Others’ or ‘All’. The sitemap is probably already listed there. If not, submit it.

If it’s still showing an error, select it, hit “Resubmit” and come back later to see if it has been processed.

If you cannot add your site with www in the URL in the Webmaster Tools account (for whatever reason) consider switching to the NON www url in your WordPress settings and set the preferred domain to non www in the Webmaster Tools account.

It really does not matter, as long as it matches 🙂

Ravan,
Thanks again for your great care and response. I have tried the suggestions about the www version with no success. Will try the non www as well. Hope I can get this thing to work!

Mathew, http://www.xml-sitemaps.com/validate-xml-sitemap.html shows no errors for your sitemap so maybe Google just needs some time to adjust. You can also try to delete the sitemap you added in GWT (Google Webmaster Tools) and let it find the sitemap for itself — which it should be able to. In any case, give it some time before trying the new approach….

I’ve installed xml sitemap generator, but there is a blank line at the top of the feed that is messing up submission to google.

Any idea where the blank line at the top has come from or how to remove it?

Thanks

Ok, so the attempt to increase the available memory from 90M to 256M failed and you still run into the memory limit. It sounds like your provider, Dreamhost (right?) is blocking the increase and on the dreamhost forums I see some posts like http://discussion.dreamhost.com/thread-47135.html about the subject. If it turns out to be impossible to increase the memory_limit value (either in a custom php.ini file or after recompiling PHP5 with new settings like is suggested in one answer) then I’m afraid I cannot help you. Your site is simply too big for my plugin with that particular PHP memory limit.

Unless you want to switch hosts or hire me to code you a custom version of the plugin for it 🙂

I had problems with the Google XML Sitemap plug in, running out of memory too. I don’t quite understand why the sitemap problem takes so much memory. I think I will make something to run on my desktop that connects with the database remotely and then uploads the sitemap. That way I have giggle-bytes of memory to play with.

Thanks.

That’s because WP loads all posts and pages in one query. And although for the sitemap we only need the URL and some data like number of comments and last modified date to calculate priority, WP loads everything including content and irrelevant meta-data with that one query. So if your site holds vast amounts of posts and/or has on average a very large post content, the process will take a lot of memory.

Sadly, it’s hard to predict when it’s going to run into the provider imposed memory limit. Most providers allow you to adjust that limit from within PHP to temporarily increase it for memory intensive processes like file uploads (especially zipped files like plugins or WP core updates) and something like this sitemap generation. Apparently, Dreamhost does not 🙁

Hi Sylwia, it might be that Google is looking at cached content and needs some time to get up-to-date… Or it might have trouble with that trailing slash that seems to appear on your install. Not sure why that is happening though… What Permalink rules have you set? Do you know why your post type “External Videos” seems to force a different permalink structure than normal pages?

If the problem (in the eyes of Google) persists for more than a few hours, you might try the development version: http://downloads.wordpress.org/plugin/xml-sitemap-feed.zip

Thanks for your answer. For reasons unknown to me the sitemap was fine today and Google accepted it. Indeed, the trailing slash disappeared, although I don’t understand why.

Thanks anyway!

I installed XML Sitemap Feed plugin, but something strange happens. When you go to the sitemap URLs, they get redirected to the URLs but with a trailing slash at the end. This isn’t a problem with the English sitemaps, but with the Spanish ones, they redirect to the English files. Check it out:
http://www.artstroll.com/sitemap.xml
http://www.artstroll.com/es/sitemap.xml (redirects to English, with a trailing slash)
http://www.artstroll.com/sitemap-news.xml
http://www.artstroll.com/es/sitemap-news.xml (redirects to English)

Since the robots.txt file shows the URLs without a trailing slash, I’m afraid that Google (or other search engines) is not finding the Spanish sitemap.

Any ideas why this might be happening?

Thank you in advance for your help.

-Eduardo

Hi Eduardo, I’ve seen this trailing slash thing happen before… It has to do with the Permalink structure and maybe another plugin but I’m not clear on what actually causes it. Something (like another plugin or the theme you are using) is adding a trailing slash that causes WordPress to try and strip it out again, redirecting you to the root sitemap.xml in the process. And yes, search engines will get redirected too 🙁

I suggest the following steps (each time verifying if anything has changed) to debug:
1. Check your permalink structure and resave (!)
2. Temporarily switch to the default Twenty Ten theme
3. Switch off all other plugins

I have not been able to reproduce the issue so it would be great if you can find what actually causes this.

If the above steps do not get any results, put your site back to the original state (theme and plugins) and upgrade the XML Sitemap Feed plugin to the latest development version from http://downloads.wordpress.org/plugin/xml-sitemap-feed.zip and then retrace the steps above…

I hope that is not too complicated for you and thanks for taking the trouble 🙂
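When retracing the debugging steps above, it can help to tell a harmless trailing-slash redirect apart from a redirect to a different sitemap. The following is a minimal sketch: `classify_redirect` is a hypothetical helper, and it assumes you already know the requested URL and the final URL after redirects (for example from your browser's address bar).

```python
from urllib.parse import urlsplit


def classify_redirect(requested, final):
    """Compare the requested URL with the URL that was finally served."""
    req, fin = urlsplit(requested), urlsplit(final)
    same_host_and_query = (req.netloc, req.query) == (fin.netloc, fin.query)
    if same_host_and_query and req.path == fin.path:
        return "none"  # no redirect at all
    if same_host_and_query and fin.path == req.path + "/":
        return "trailing-slash"  # same resource, only a slash was added
    return "different-resource"  # landed on another URL entirely
```

For the URLs reported above, `http://www.artstroll.com/sitemap.xml` ending up at `http://www.artstroll.com/sitemap.xml/` would be classified as `trailing-slash`, while `http://www.artstroll.com/es/sitemap.xml` ending up at that same English URL would be `different-resource`.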

Thank you for your help, Ravan.

I followed suggestions 1 and 2, and they didn’t fix the problem. I’m reluctant to try suggestion 3, but will do so if all else fails.

Another strange thing is that it used to work fine on another website that I have: http://www.bahiadelmarrincon.com/

But now it doesn’t. I haven’t added any new plugins to that website, but I HAVE upgraded to a newer version of WordPress. I suspect that the trailing slash problem was introduced by WordPress (not by plugins). And it seems to happen only with XML — it doesn’t happen with other types of URLs, e.g.:
http://www.artstroll.com/wp-content/uploads/2010/03/ArtStroll2011_PotentialVenues.pdf
http://www.artstroll.com/wp-content/uploads/2011/05/ArtStroll2011_Postcard.png

-Eduardo

The URLs you give as an example are files, not WordPress-managed URLs… but anyway, it is strange. What WP version are you using? If it is the latest (3.1.3) then you might notice this site is using that version too without your particular issue.

I understand you hesitate to switch off plugins on a live site. Maybe you can skip that step and try the development version of XML Sitemap Feed 🙂

Up until the latest WP update everything was working fine but now Webmaster Tools is showing a red cross against /sitemap.xml and /sitemap.xml.gz

I’ve turned off all the plugins and even done a fresh install and nothing works. Is there a conflict with the latest WP update?

Hi john, I’m looking at your sitemap on http://photo-journ.com/sitemap.xml and it’s Google Sitemap Generator by Arne Brachhold that is responsible for that sitemap… Not my plugin so I can’t help you with that one, sorry.

Well, actually I could but you will be better off asking the plugin developer’s support first. Or switch to mine, of course 😉

Sorry about that. I took yours off and tried the other to see if it would make a difference and forgot to change back before posting the support request.

I went back and deleted the other plugin and manually deleted the sitemap.xml files using FTP and also edited the Robots text file.

I reinstalled yours but it hasn’t generated a new sitemap. Could this have something to do with Feedburner?

In any event I disabled the feedburner email widget and Feedsmith extended plugins and did a fresh install of yours and still no sitemap.

I DO see a sitemap on http://photo-journ.com/sitemap.xml (this time indeed generated by my plugin) and as far as I can tell, there is nothing wrong with it… Re-submit it to Google Webmaster Tools and let it chew on it for a while.

About /sitemap.xml.gz – there is no such version so if you submitted that to Google Webmaster Tools, it will be marked as broken (404).

About /robots.txt – it will be best to leave it up to WordPress to generate the robots.txt output instead of having your own robots.txt file in your site root. My advice would be to delete the file you have there.

About FeedBurner – it depends on what plugin (or theme) you are using to redirect feed requests to FeedBurner. Some redirect ALL feed requests and that will then include requests for the sitemap (since that is actually a feed with my plugin) but others will only redirect RSS/Atom feed requests and will leave the sitemap alone. I have used FD FeedBurner plugin successfully in combination with XML Sitemap Feed so I can recommend that one…
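The difference between the two kinds of redirect plugins can be sketched as a simple predicate. This is only an illustration of the idea, not code from any of the plugins mentioned: `should_redirect_to_feedburner` is a hypothetical name, and the set of feed names assumes WordPress' standard syndication feeds.

```python
# Assumption: the standard WordPress syndication feed names.
SYNDICATION_FEEDS = {"rss", "rss2", "atom", "rdf"}


def should_redirect_to_feedburner(feed_name):
    """Redirect only real syndication feeds; leave sitemap feeds alone."""
    return feed_name.lower() in SYNDICATION_FEEDS
```

A plugin that behaves like this would redirect `rss2` but not `sitemap` or `sitemap-news`, whereas a plugin that redirects every feed request would also swallow the sitemap.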

Feedsmith, I don’t know. I’ll be happy to take a look if/when you activate that plugin 🙂

Well… you need news (a post) less than 48 hours old to appear in the Google News Sitemap 🙂

Hey there, great, great plugin btw. How can I list categories? is there a hack or something? Ty!

Sorry, this plugin does not do that… Pages and posts but no archives, the only exception being the front page.

Hi

I host a few WordPress sites on Windows Server 2003/IIS 6.0, and I’m having trouble getting your sitemap plugin to work.

I’m using XML Sitemap Feed Version 3.9.1, I’m running WordPress 3.1.3, I’ve got Permalinks set to /%postname% using a 404 handler (details available if required), and I’m using a plugin called Legacy URL Forwarding as well.

Oh, and I made the modification to line 11 of feed-sitemap.php.

I can see /robots.txt, which says this:

User-agent: *
Disallow: /

But there’s no sitemap.xml.

Any idea what’s happening?

Thanks,
Paul

Hi Paul, using a permalink like simply %POSTNAME% sounds to me like asking for trouble. A conflict between a post and a page with the same slug is easily encountered… Anyway, the fact that the sitemap is not mentioned in robots.txt (as long as it’s not a static file you’re looking at) indicates the plugin is not active. Did you get an error on plugin activation? Do you see anything related in your server error logs?

Another option: try the development version http://downloads.wordpress.org/plugin/xml-sitemap-feed.zip 🙂
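The robots.txt check described above (no sitemap reference means the plugin is probably not active, or a static file is being served) can be sketched in a few lines. `sitemap_lines` is a hypothetical helper that takes the robots.txt body as text; fetching the file is left out on purpose.

```python
def sitemap_lines(robots_txt):
    """Return the URLs from all 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls
```

Applied to the two-line robots.txt quoted above (`User-agent: *` and `Disallow: /`), this returns an empty list, which matches the diagnosis that no sitemap is being announced.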

Hi Glenn, for most WordPress installs that’s basically it, yes… But in your case, you need to do one thing more: start using ‘Pretty URLs’ by activating any one of the other Permalink options than the one you are using now. It will be good for your SEO (keywords in the URLs are considered with priority by search engines) and your /robots.txt, /sitemap.xml and even /sitemap-news.xml will come live instantly 🙂

How can I know whether everything is setup correctly for my website to be recognised?

I don’t see a way I can view my sitemap (so that I have the peace of mind there is one) and I don’t see a robots.txt in my root

As far as I know, this plugin is doing nothing for my site.

Hi Charlie, if your site is not generating a robots.txt, it will not generate a sitemap.xml… To get a dynamic robots.txt you will need to (1) use any of the fancy Permalink structures and (2) remove any static robots.txt from your site root.

If my plugin is working correctly it will ALWAYS generate a sitemap feed via /index.php?feed=sitemap whatever Permalink structure you are using. If you do not get a feed via that URL, the plugin is not functioning.

Are we talking about backin5mins.com? Because you are running another xml sitemap plugin there…

thanks RavanH,

I think it was the “/index.php?feed=sitemap” that I needed to see.

It appears to be working, but there is still no robots.txt in my websites directory or the root of my hosting account.

Hi there!

I’m using your plugin and it seems to be almost perfect 🙂
My only issue is that when there is no article in the feed, google thinks there is an error with it, saying the format is not correct, and therefore seems to stop fetching it regularly or visits it less frequently which is problematic when adding fresh content after that.

Could you have a look ?
Thanks a lot !

Hi Benjamin,

I suppose you are talking about the News Sitemap? It lists only posts less than 48 hours old as this is what Google News wants. So yes, when there are NO posts, it lists no URLs.

Google Webmaster Tools reports this as an error (based on general XML sitemap rules) but it is unclear to me and many others, since the question remains unanswered by any Google staff, whether this is actually a problem for the Google News crawler. And if it IS a problem, what should be the alternative… So when designing my plugin, I chose to let it present a news sitemap without any URLs.

It seems many people get worried about the error reported by Google Webmaster Tools. So I might opt for an alternative approach in the upcoming plugin release: presenting at least the latest post URL when there are no posts less than two days old.

Do you have proof (or any indication) that the Google News crawler (not the one from Google Webmaster Tools!) stops coming back after finding an empty news sitemap? If so, I really need to find out what Google News wants as response when there is NO news…

Thanks for the quick answer!
Yes talking about the News Sitemap 🙂

I have no proof but what I usually see is that when the sitemap has been recognized by Google as faulty, it takes more time to get a new post published on Google News then.

The Error displayed on Google Webmaster tool just in case:
“Missing XML tag
This required tag is missing. Please add it and resubmit.”

What I usually do is submit the sitemap again so that my articles are published, though it’s still longer than when I still have content in the feed.

Hope it helps 🙂

I’ve just committed a change to the news sitemap template in the new development version which should make it default to at least one (the latest) post when there are no posts less than 48 hours old. I would really appreciate it if you could test it to see if this changes this delay in indexing after a ‘stale’ news sitemap…

You can download it from http://downloads.wordpress.org/plugin/xml-sitemap-feed.zip
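The fallback described above can be sketched as follows. This is an illustration of the selection rule only, not the plugin's template code: `news_sitemap_posts` is a hypothetical function, and posts are assumed to be simple `(url, published)` pairs with timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone

NEWS_WINDOW = timedelta(hours=48)  # Google News only wants recent posts


def news_sitemap_posts(posts, now):
    """Pick the posts to list in the news sitemap.

    posts: list of (url, published) tuples.
    Returns all posts less than 48 hours old, newest first; if there are
    none, falls back to the single latest post so the sitemap is not empty.
    """
    ordered = sorted(posts, key=lambda p: p[1], reverse=True)
    recent = [p for p in ordered if now - p[1] < NEWS_WINDOW]
    if recent:
        return recent
    return ordered[:1]  # empty list only when the site has no posts at all
```

The fallback keeps at least one URL in the sitemap, which should avoid the “Missing XML tag” report for a stale news sitemap; whether that changes how often the news crawler comes back is exactly what needs testing.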

Thanks 🙂

Hi Ravan,
Maybe communicating by mail would be simpler 😉

So I had no articles in my feed and just published one, added a Google news search feed (with site:iostv.fr) on Google Reader to try to have an estimate of the time needed for google to add the article compared to when it has been published in the sitemap, once it’s published I’ll install the other version of your plugin and will test (hopefully this weekend if no hot news comes along).

Cheers,
Benj

Benjamin, thanks for reporting back. I wrote you an e-mail so please check your spam folder if you would like to communicate further via mail 🙂

Can you please improve your addon to work with gtranslate pro plugin?
http://edo.webmaster.am/gtranslate
If user bought pro version, it creates links like site.com/es/?p=73, site.com/ru/?p=73 – around 55 languages.
Please write me directly – I will provide any information you need and will give you gtranslate pro author contact information.
Thank you for your work so much!

Hi Dmitry, thanks for your heads-up. When I get some time, I’ll take a look at the gtranslate plugin. I suppose the free version is not so different from the pro version but if I need more info, I’ll contact you via e-mail. The main question will be: what does Google do with pages that it has basically just translated itself? Will it see them as duplicates or will the search engine spider be fooled by the translator bot output as being real content? It will be an interesting experiment 🙂

The free version translates with jQuery, so the result is not available to search engine bots. The pro version can translate and cache pages so that search engine bots read them like an originally written article. For that, in the free version you must enable “redirect” and “use pro version”, then set up your Apache/nginx to redirect /es/, /en/, /ru/ and so on to the gtranslate PHP file, which will give the translated page as output.

Hi, the sitemap is only for SE. The page visible for visitors on the same location is generated with a stylesheet which is ignored by the SE spiders. But breaking up the list of links will be noticed by the spiders. It can be done via a sitemap index file referencing multiple sitemaps, each containing part of the site’s page URLs, but there really is no need for it if your site has fewer than 50,000 posts and pages… There are better plugins around that are specially designed for creating a nice site map page for human eyes that allow you to group URLs any way you like 🙂
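For sites that do exceed that limit, the sitemap index approach could look roughly like the sketch below. The part file names (`sitemap-1.xml`, `sitemap-2.xml`, …) and both function names are hypothetical; only the 50,000 URL limit per sitemap and the `sitemapindex` format come from the sitemap protocol.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50000  # maximum number of URLs allowed in a single sitemap


def chunk_urls(urls, size=MAX_URLS):
    """Split the full URL list into parts of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_index(base_url, n_parts):
    """Build a sitemap index that references n_parts partial sitemaps."""
    entries = []
    for i in range(1, n_parts + 1):
        loc = escape(f"{base_url}/sitemap-{i}.xml")  # hypothetical file naming
        entries.append(f"  <sitemap><loc>{loc}</loc></sitemap>\n")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "".join(entries)
        + "</sitemapindex>\n"
    )
```

A site with 120,001 URLs would be split into three parts this way, and the index would list three partial sitemaps.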

Hey RavanH,
one question please. My blog isn’t in Google News. Is it a problem that they look for the sitemap-news for every post? Webmaster Tools said this blog is not in the News and crossed out the sitemap.

Hi Thomas, do I understand you correctly in saying that you had submitted your Google News Sitemap (sitemap-news.xml) in your Google Webmaster Tools account but it has been removed again with the remark that your site is not registered or accepted by Google News? I have not seen that before… What does GWT say about your regular sitemap.xml? If there is no error there, all should be fine with your site.

Are we talking about your site blogger-world.de? If so, you are not using my sitemap plugin there.

Hey RavanH,
Take a look now, I had to restore a file 🙂 I do use it, but I was spammed hard in the last 30 minutes, so I restored the wrong folder. Look now: I am using it.

My blog isn’t accepted in Google News, so there is a red cross and the message I wrote about. The normal sitemap is fine. Only my old plugin made a .gz, and that must be deleted from GWT.

I don’t understand why Google searches for the Google News sitemap in my root when I am not accepted in Google News.

I suspect it is because there is a reference in http://blogger-world.de/robots.txt to it. This is not a problem.

The fact that your site is not listed by Google News does not mean you are prohibited from having a news sitemap 😉 but it looks like Webmaster Tools is kind enough to report what Google does with the news sitemap. It will be ignored until your site is accepted. This is not a problem because you will still make your latest posts (along with all the others) known to Google via the normal sitemap where they will appear at the top (right after the home URL) as soon as you post.

You can manually remove the sitemap-news.xml from the list of sitemaps in GWT if you are tired of that red cross…

Thanks for spending time on my question.
Red is a warning color for me, but if you say it’s no problem, then it’s no problem for me 🙂

I can’t remove the news sitemap in GWT: it isn’t listed under my sitemaps, only under all. There is no delete function there. So I’ll wait, and Google can do what it wants.

Greetings
Thomas
