Removing wp_head() elements (rel=’start’, etc.)

When customising WordPress you may occasionally need to remove (or add to) the link elements that WordPress automatically outputs via the wp_head() function call. I recently needed to remove the rel=’prev’ and rel=’next’ link elements and, wanting to avoid customising the core WordPress functions, the following solution works.

Ensure the theme you are using has a functions.php file in its directory; if not, create it and open it for editing. The following lines will remove selected output from the wp_head() function:

remove_action( 'wp_head', 'feed_links_extra', 3 ); // Removes the links to the extra feeds such as category feeds
remove_action( 'wp_head', 'feed_links', 2 ); // Removes links to the general feeds: Post and Comment Feed
remove_action( 'wp_head', 'rsd_link'); // Removes the link to the Really Simple Discovery service endpoint, EditURI link
remove_action( 'wp_head', 'wlwmanifest_link'); // Removes the link to the Windows Live Writer manifest file.
remove_action( 'wp_head', 'index_rel_link'); // Removes the index link
remove_action( 'wp_head', 'parent_post_rel_link'); // Removes the prev link
remove_action( 'wp_head', 'start_post_rel_link'); // Removes the start link
remove_action( 'wp_head', 'adjacent_posts_rel_link'); // Removes the relational links for the posts adjacent to the current post.
remove_action( 'wp_head', 'wp_generator'); // Removes the WordPress version i.e. - WordPress 2.8.4

Don’t remove these items unless you have a reason to. Removing the generator tag can be useful if you are not religiously upgrading your WordPress install, as it hides the WordPress version from potential attackers to a certain degree.
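For reference, here is a minimal functions.php sketch that groups the calls together; wrapping them in a function hooked on init is optional (the calls also work on their own in functions.php), and the function name is purely illustrative:

<?php
// functions.php – illustrative sketch; the function name is arbitrary
function my_remove_wp_head_links() {
    remove_action( 'wp_head', 'adjacent_posts_rel_link' ); // rel='prev' / rel='next'
    remove_action( 'wp_head', 'start_post_rel_link' );     // rel='start'
    remove_action( 'wp_head', 'index_rel_link' );          // rel='index'
    remove_action( 'wp_head', 'wp_generator' );            // WordPress version meta
}
add_action( 'init', 'my_remove_wp_head_links' );
?>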

HTTP Response Header Checker


<?php
// HTTP response header checker – fetches the raw response headers for a URL.
// NOTE: the opening of this listing is an assumed reconstruction – the $_GET
// parameter name and the parse_url()/fsockopen() setup are inferred from the
// rest of the script.
$message = "";
$headers = array();
$html = "";

if (isset($_GET["url"]) && $_GET["url"] != "") {
    $url = $_GET["url"];
    $info = parse_url($url);

    // Open a plain HTTP connection to the host on port 80
    $fp = @fsockopen($info["host"], 80, $errno, $errstr, 30);

    if (!$fp) {
        $message = "FAIL, ".$url." does not exist";
    } else {
        if (empty($info["path"])) {
            $info["path"] = "/";
        }
        $query = "";
        if (isset($info["query"])) {
            $query = "?".$info["query"];
        }

        // Build and send a HEAD request so only the headers come back
        $out  = "HEAD ".$info["path"].$query." HTTP/1.1\r\n";
        $out .= "Host: ".$info["host"]."\r\n";
        $out .= "Connection: close\r\n";
        $out .= "User-Agent: RJPargeter.com Response Header Checker v1.0 – rjpargeter.com/contact for feedback\r\n\r\n";
        fwrite($fp, $out);

        // Read the full response
        while (!feof($fp)) {
            $html .= fread($fp, 8192);
        }
        fclose($fp);
    }

    if (!$html) {
        $message = "FAIL, ".$url." does not exist";
    } else {
        // Split the raw response into one header line per array entry
        $headers = explode("\r\n", $html);
        unset($html);
        //for ($i = 0; isset($headers[$i]); $i++) {
        //    if (preg_match("/HTTP\/[0-9A-Za-z +]/i", $headers[$i])) {
        //        $status .= preg_replace("/http\/[0-9]\.[0-9]/i", "", $headers[$i]);
        //    }
        //}
        //$message = $status." ".$response_array[$status];
    }
}
?>
The HTTP response is the information a web server returns over the HTTP protocol when you access URLs on the Internet. Google, Yahoo and indeed all browsers rely on this information to determine whether the resource you are trying to access has been found or, if not, what may have happened to it.

The full HTTP response contains a variety of information that a web server sends back for an HTTP request. It can reveal interesting details such as the web server software a site is hosted on, the scripting language used and, most importantly, the response code. The search box below allows you to enter a URL and see the full HTTP response.

Why is this useful, you may be thinking? Google and the other search engines rely on response codes to determine whether to index your site. For a resource to be indexed you will more often than not be looking for a ‘200 OK’ response. If a page is missing you may get a ‘404 Not Found’. If a page has been permanently removed you may want a ‘410 Gone’ response to be sent back.
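If you just want to check a response code from your own PHP scripts, the built-in get_headers() function will fetch the headers in a couple of lines; a minimal sketch (the URL is purely illustrative):

<?php
// Fetch the response headers for a URL using PHP's built-in get_headers()
$headers = get_headers("http://www.example.com/");
if ($headers !== false) {
    echo $headers[0]; // e.g. "HTTP/1.1 200 OK" – the status line
} else {
    echo "Request failed";
}
?>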

Feel free to use this tool I have developed to test your URLs’ HTTP responses:

<!-- Assumed reconstruction of the input form – the field name "url" matches
     the GET parameter assumed in the script above -->
<form method="get" action="">
Domain: <input type="text" name="url" size="40" />
<input type="submit" value="Check" />
</form>
<?php
// Output section – print each header line, or the failure message.
// NOTE: the opening of this block is an assumed reconstruction – the if/else
// condition and the <br /> separators are inferred from what remains.
if (!empty($headers)) {
    echo "Response Code:";
    for ($i = 0; isset($headers[$i]); $i++) {
        echo "<br />", $headers[$i];
    }
} else {
    echo "Response Code:<br />";
    echo $message;
}
?>

The long forgotten robots.txt

I am still amazed at how many websites don’t employ a robots.txt file at the root of their web server. Even SEO firms, or people claiming to be SEO experts, are missing one, which I find very funny. There are also countless arguments about whether you still need a robots.txt, but my advice is: if the search engine robots still request it, I’d rather have it there as the welcome mat to the site.

For those of you who don’t know the history of the robots.txt file, I’d suggest a quick Google or Wikipedia search. In short, it’s a text file that specifies which parts of a website robots may ‘crawl’ and ‘index’ and/or which parts they should not. You can also get specific and set up rules for particular spiders and crawlers.

To start with, create a text file called robots.txt and place it in the root of your web host. You should be able to access it through your web browser at www.yourdomain.com/robots.txt

You can view other websites’ robots.txt files by requesting robots.txt at the root of their domain.
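If you prefer to do this from a script, a couple of lines of PHP will pull back any site’s robots.txt for inspection; a minimal sketch, assuming allow_url_fopen is enabled (the domain is just an example):

<?php
// Fetch and display another site's robots.txt (domain is illustrative)
$robots = @file_get_contents("http://www.example.com/robots.txt");
if ($robots !== false) {
    echo "<pre>" . htmlspecialchars($robots) . "</pre>";
} else {
    echo "No robots.txt found";
}
?>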

If you want Google, etc. to come into your site and index everything then things are very easy.  Simply add the following to your robots.txt file and away you go:

User-agent: *
Disallow:

Alternatively if you wish to stop all pages in your site being indexed then the following should be present in your file:

User-agent: *
Disallow: /

To stop robots indexing a folder called images and another called private you would add a Disallow line for each folder:

User-agent: *
Disallow: /images/
Disallow: /private/

The above would still allow the rest of the site to be indexed, but anything in those folders would be excluded from search engine results.

To disallow a single file, specify the file in the same way as a folder:

User-agent: *
Disallow: /myPrivateFile.htm

If you only want Google to access your site, specify the following (Google’s crawler identifies itself as ‘Googlebot’):

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

If you are looking at getting your site fully indexed then I would put the first example in your robots.txt file.

PageRank no longer important?

I keep hearing people say that Google’s PageRank is no longer important and that X or Y matters much more. I still believe PageRank is one of the most important factors in a site ranking well in Google’s results.

Firstly, Google still includes a PageRank indicator on the Google toolbar – OK, this is not updated immediately, but I would assume that is down to infrastructure and speed constraints.

Secondly, Google went to all the trouble of patenting the PageRank algorithm through Stanford University, and I doubt they are going to just leave it behind. PageRank has been the foundation of Google’s success and, although it is undoubtedly tweaked and used alongside other algorithms, it is still in my opinion the basis of the initial search engine rankings.

Thirdly, when you hear Google employee Matt Cutts discussing PageRank in such detail you know it is still fundamental to how Google works – http://www.mattcutts.com/blog/more-info-on-pagerank/. As Matt discusses in that article, the PageRank shown in the toolbar is usually late to reflect how your site is ranking, but that is just the toolbar representation. Behind the scenes, PageRank is changing all the time as pages are added to the index, pages are removed and backlinks are recalculated.

So there you have it – PageRank is still alive and soldiering on. Whilst Mr Cutts still talks about PageRank, we know it remains one of the main factors in Google search engine rankings.

‘Black-Hat’ SEO Techniques

The following techniques are known as ‘black-hat’ SEO and should be avoided on your website. They can sometimes provide very quick improvements in search engine results, but over time they are known to get sites banned and removed from indexes altogether.

Hidden Text
An old technique for increasing the keywords on a page was to include long lists of keywords and key phrases as hidden text. This was sometimes achieved by placing the text far below the main content or by displaying it in the same colour as the background, i.e. white on white or black on black. This technique goes against Google’s webmaster guidelines.

Cloaking or Doorway Pages
Don’t deceive your users or present different content to search engines than you display to users. Matt Cutts (Google employee) describes a classic case of this on his blog (http://www.mattcutts.com/blog/ramping-up-on-international-webspam/), whereby BMW displayed one set of text to search engine robots while normal web users were shown other content via a JavaScript redirect. It is important not to use quick redirects so that search engine crawlers see certain content and a user’s browser is then redirected to another page.

Link schemes
Avoid link schemes that provide a massive increase in incoming links from bad neighbourhoods and other sites of dubious content. Participating in such schemes can result in your site being penalised.

Automated Search Engine Submission Software
Avoid using search engine submission software such as ‘WebPosition Gold’ or other similar products. As long as your site is linked to from other sites and is up and running with a valid robots.txt file, your pages and content will be indexed without the need for this software.

Duplicate Content
Avoid duplicating the same content on different pages. If Google detects large numbers of duplicated pages across different subdomains or main domains, you risk a ‘duplicate content penalty’, which can result in the site losing rankings. Subdomains are treated as separate websites, and if duplicate content is found then both sites can suffer ranking problems until Google’s algorithms determine the original source of the content.

Content Solely for Search Engines
Avoid publishing any content that is solely for search engine spiders. Content with an unnaturally high keyword density that is unintelligible to a human can be detected by Google and other search engines and result in site penalties. Always write content firstly for humans and secondly for crawlers and robots. As long as the pages read as proper English there is no problem with including keywords.

Frames
Although not strictly a ‘black-hat’ technique, frames should be avoided as they cause huge problems for crawlers indexing sites. Search engines can only index individual pages, so each page within a frameset is indexed on its own, which causes problems when someone clicks through to that individual page from search engine results and sees it outside its frameset.

SEO Part 14 – Web Based Sitemap

A sitemap enables search engines to easily find all the content on a site by following the links from the sitemap page(s).

With the implementation of Google Sitemaps, discussed in the ‘Google webmaster tools’ section, there is no longer a huge need for a web-based sitemap from a Google SEO point of view. For other search engines, however, it is beneficial to have a few pages on the site that link to all currently published content. On large sites a single sitemap page can put overheads on the system and can be too big for the end user to download and view – so several pages of links could make up such a sitemap.

The sitemap start page should be linked to from the homepage, as this is the most common page that search engine spiders will access.
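If your site runs on WordPress, a simple web-based sitemap can be built as a page template; a minimal sketch, assuming your theme supports custom page templates (the template name is purely illustrative):

<?php
/*
Template Name: HTML Sitemap (illustrative)
*/
get_header();
?>
<h1>Sitemap</h1>
<ul>
    <?php wp_list_pages( 'title_li=' ); // list every published page as a link ?>
</ul>
<ul>
    <?php wp_list_categories( 'title_li=' ); // link to each category archive ?>
</ul>
<?php get_footer(); ?>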

SEO Part 13 – ‘Linkbait’ Content

The earlier ‘Incoming Links’ section described how incoming links to the site can greatly improve PageRank and therefore help pages rank higher in the search engines. It also described how to aim for links to your site and for keywords in the link text. One of the best ways to attract incoming links is to publish content that is commonly referred to as ‘link bait’. This content has two principal aims:

1. Informative content for the site visitor
2. Attract external sources to reference or link to it

With this in mind, it can pay to encourage articles to be created and published on your site, even by external authors. If you run a site on organic fertilisers, write some articles about the field and link to them from your front page. Over time, similarly themed sites will start to link to them if they contain relevant and useful information. Encourage your business partners to link to them as well.

Blog sites such as SEOSolutions.co.uk are an excellent way to provide fresh new content, which will ultimately attract external links – you could therefore look at adding a blog to your site, even if it is only to discuss your latest product releases.

SEO Part 12 – Valid HTML and Accessibility

The World Wide Web Consortium (W3C) publishes specifications for web-based content and how it should be structured for consistent representation across different platforms and browsers. The W3C also develops guidelines widely regarded as international standards for web accessibility – the Web Accessibility Initiative (WAI).

If web-based content conforms to W3C standards for valid HTML, XHTML and CSS and to the WAI guidelines, that content should be accessible to a wider audience. For this reason, search engines are likely to give more weight to ‘valid’ sites than to sites that do not conform.

Google is beginning to move in the direction of favouring valid sites with ‘Accessible Search’ – http://labs.google.com/accessible/. This search ranks sites for a search string based on how accessible Google judges each site to be. Although it is still a beta – or ‘Google Labs’ – project, it is widely expected that such technology will feed into the main Google search.

You should begin working towards making more of your content valid and accessible by following the specifications and guidelines on the W3C website and by using the CSS and HTML validators.

SEO Part 11 – TITLE and ALT tags

The HTML TITLE and ALT attributes (commonly referred to as ‘tags’) are used on text links and images respectively to give a description of the link or image. ALT text describes an image, while the TITLE attribute suggests a title for the link’s destination. Both are displayed by different browsers in different ways, and auditory browsers can read the text aloud.

TITLE Tags
It has long been known that some search engines assign ranking and relevance to a target page based on the TITLE attributes of the links pointing to it. This ties in with the text used to link to pages: the TITLE is meant to be a more descriptive passage about the link target than the link text itself. So here again we should try to incorporate keywords and key phrases for the target page in the TITLE attribute.

To add text into the USC TITLE attributes for a given link, the text needs to be added into the link’s Tooltip field.

Alt Tags
ALT attributes have a similar effect to TITLE attributes, but only for the images they relate to. The text entered in an image’s ALT attribute can be used by search engines for image indexing and is also an indication of what the image content on that page relates to.
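As an illustration, a link and an image marked up with descriptive TITLE and ALT text might look like the following (the URLs and filenames are purely illustrative):

<!-- Keyword-rich but natural TITLE text on a link -->
<a href="/organic-fertilisers/" title="Guide to choosing organic fertilisers">Organic fertiliser guide</a>

<!-- Descriptive ALT text on an image -->
<img src="/images/compost-bin.jpg" alt="Wooden compost bin used for organic fertiliser" />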

SEO Part 10 – Internal Linking

As discussed in the external links section above, links are one of the main mechanisms by which search engines find new pages and determine their relevance and importance. As with external links to your site, the internal links within a site play an important role in passing on PageRank and relevance.

Firstly ensure you have a very clear structure on your site such as the following:
(Diagram: Web-Structure – an example site navigation hierarchy)
It is important to consider that pages linked to from the homepage will inherit some of the PageRank (PR) importance – as can be seen with the Google PR shown on the Google toolbar. What you will commonly notice is that the further you traverse down the navigation tree, the more the PR degrades. Pages linked to directly from the homepage will commonly have a PR just under the homepage’s PR, while pages that are only linked to from lower-level pages will have a low PR.

Why does this matter, you might ask? Well, the higher the PageRank of those underlying pages, the more chance they have of coming top for a given search that matches the TITLE, META data and content of that page.
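One practical way to counter the degradation is to link your most important deep pages directly from the homepage navigation; a simple illustrative sketch (the paths are made up):

<!-- Homepage navigation: important deep pages sit one click from the
     highest-PR page instead of several levels down (illustrative paths) -->
<ul>
    <li><a href="/products/">Products</a></li>
    <li><a href="/products/organic-fertiliser/">Organic fertiliser</a></li>
    <li><a href="/articles/soil-health-guide/">Soil health guide</a></li>
</ul>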

The PR degradation discussed here can also be offset by external high-PR pages linking to a page somewhere other than your homepage. These links in effect pass PR on to your pages, which can in some circumstances lift a page low in the navigation tree to a PR level the same as your homepage.