Solr, DataImportHandler, UUID and SQL Server

I’ve recently been setting up Apache Lucene/Solr to index static PDF files and also import data to the collection from MS SQL Server.

After successfully indexing PDF files and providing them with a unique id via UUID, I wanted to import several SQL tables that each had an ID column called ‘id’. These tables would obviously have overlapping IDs at some stage, so I wanted to use UUIDs on these documents as well.

I struggled to find much documentation on Solr, SQL Server and UUIDs, but after successfully setting up UUID generation via http://wiki.apache.org/solr/UniqueKey, you also need to apply the UUID update processor chain to the DataImportHandler as well. Therefore the following code:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

changed to this

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>

then auto-creates unique IDs when importing en masse from SQL Server tables.
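For reference, the uuid update chain referenced above has to be defined in solrconfig.xml. A minimal sketch, assuming your uniqueKey field is called id as per the wiki page (adjust the fieldName to match your schema):

```xml
<updateRequestProcessorChain name="uuid">
  <!-- Generates a UUID for documents that arrive without an id value -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```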

SQL Server: create failed for user/ User, group, or role already exists in the current database

If you experience the following errors in SQL Server:

Create failed for User 'xyz'. (Microsoft.SqlServer.Smo)

User, group, or role 'xyz' already exists in the current database. (Microsoft SQL Server, Error: 15023)

The above error can sometimes occur when migrating a SQL Server database from one staged environment to another. If both environments have the same users present, you can get issues when trying to add users to the recently restored DB because those users are already present within it.

A solution I found is to map the DB users to the accounts already within the DB. This is achieved using the stored procedure sp_change_users_login:

USE AdventureWorks
GO
EXEC sp_change_users_login 'Update_One', 'xyz', 'xyz'
GO

The above takes the specified database user {parameter 2} and maps it to the login {parameter 3}.
Further information and other actions are available here: http://msdn.microsoft.com/en-us/library/ms174378(v=sql.105).aspx
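One of the other actions covered on that MSDN page is ‘Report’, which lists the orphaned users in the current database before you fix them:

```sql
USE AdventureWorks
GO
-- Lists database users not mapped to any server login
EXEC sp_change_users_login 'Report'
GO
```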

Note: MSDN states this feature will be removed in a “future version”.

Resize VMWare Disk

I was looking for a simple slider to resize a disk in VMware Server recently, and it turns out it’s not that simple. The solution is covered in more detail here, but essentially you need to run the following command on the host of the VMware machine:

vmware-vdiskmanager -x 30GB D:\VirtualMachines\virtualMachine1\virtualMachine1.vmdk

The above command resizes the disk for virtualMachine1 to 30GB. Note that the virtual machine should be powered off, and any snapshots removed, before you resize its disk.

vmware-vdiskmanager is located in the installation directory as detailed here, but the following table gives common locations under 32-bit Windows:

Workstation \Program Files\VMware\VMware Workstation
Player / ACE Instance \Program Files\VMware\VMware Player
VMware Server \Program Files\VMware\VMware Server
GSX \Program Files\VMware\VMware GSX Server
Converter \Program Files\VMware\VMware Converter
Capacity Planner \Program Files\VMware\VMware Capacity Planner
Lab Manager \Program Files\VMware\VMware Lab Manager Server
Stage Manager \Program Files\VMware\VMware Stage Manager Server
Virtual Desktop Manager \Program Files\VMware\VMware VDM\Server
Consolidated Backup \Program Files\VMware\VMware Consolidated Backup Framework
VirtualCenter 2.5.x \Program Files\VMware\Infrastructure\VirtualCenter Server
VirtualCenter 2.0.x \Program Files\VMware\VMware VirtualCenter 2.0
Virtual Infrastructure Client 2.5.x \Program Files\VMware\VMware Virtual Infrastructure Client 2.0
Virtual Infrastructure Client 2.0.x \Program Files\VMware\Infrastructure\Virtual Infrastructure Client
Server Console (VMware Server) \Program Files\VMware\VMware Server Console
Remote Console (GSX) \Program Files\VMware\VMware Remote Console

The original Discount Voucher/voucher code sites

Online discount vouchers and discount codes are now accepted as the norm in online shopping. The number of ‘voucher codes’ sites is near saturation point in the UK, so it’s worth remembering which web sites started the trend and what they are offering online consumers.

One of the original sites I used was UK Frenzy (http://www.ukfrenzy.co.uk/), back in around 2002/2003. UK Frenzy was one of the first sites for online vouchers and regularly featured Amazon, Dell, Dixons and Comet vouchers. Today they are still offering information in pretty much the same format and also have a fairly active forum of users sharing vouchers and online bargains.

The second site, which I started to use around 2005, was HotUKDeals (http://www.hotukdeals.com/). This site started small and rapidly grew into a massive list of product offers, deals and merchant-specific voucher codes. Over the years the categorisation has changed, but the site still offers a vast amount of fantastic buyer’s advice.

A third site that featured heavily in my online purchasing for sourcing bargains and money saving opportunities was Martin Lewis’s Money Saving Expert (http://forums.moneysavingexpert.com). In particular it was the user forums and specifically the Discount Vouchers and Discount Codes forum. Over the years Martin’s site has grown from strength to strength and is regularly featured on the BBC, but the true value of this site is its users. I regularly check back here to search for merchant vouchers.

The following two sites are close to my heart but also worth a mention.

Greedymoose (http://www.greedymoose.co.uk) was a site I set up in 2004 and ran till 2005/06. It hasn’t received much in the way of updates since then, but in its time it was a heavily visited voucher code site that produced some good revenue. In its heyday it ran several unique discount vouchers that can still be found on some of the later ‘voucher code’ sites – but at the time these were unique to GreedyMoose. Maybe soon GreedyMoose will have a revamp – watch this space.

The final site worth mentioning is UK Discount Vouchers (http://www.ukdiscountvouchers.co.uk/). UK Discount Vouchers (UKDV) was formed in mid-2004 as a users’ forum for sharing discount vouchers easily with fellow shoppers. This worked to a certain degree, but forum spammers eventually got too much and the site was changed to a blog in August 2006. Since then the site has grown and has now transformed into a store-specific site where users can access their favourite shops and easily check for discount vouchers and codes for that store.

So that’s a brief round-up of my original voucher code sites that I used and developed.

Honda HR-V automatic shudder/shake from start off

I’ve now come across this problem twice on Honda HR-V/HRV automatics, whereby the car will give a big shudder/shake/judder/vibration when pulling away from a standstill.

Having done my own research into it at the time, in most instances it relates to automatic transmission fluid that is past its best: the clutch brake sticks when the car comes to a standstill. When you then try to pull away, the brake does not disengage in time and you get the sudden judder and shake of the car. As you can imagine, this will probably not do the car much good in the long term, so the solution is as follows:

1) Buy official Honda Automatic transmission fluid. This will either be Honda CVT Fluid or Honda ATF Premium fluid. Your Honda dealer will advise the current recommendation.
2) Drain the transmission fluid as follows:
2.1) Bring the transmission up to normal operating temperature. Drive the vehicle to do this or run the engine until the radiator fan comes on.
2.2) Park the car on a level surface and turn the engine off.
2.3) Remove the transmission drain plug as pictured below and drain fluid into a suitable container.
Honda HRV Transmission Plug Removal/Change
2.4) Reinstall the drain plug (or a new one if it looks damaged/corroded).
3) Refill with the transmission fluid from 1) to the recommended level.

You should then find the HR-V’s shudder/judder has stopped. If you still experience the issue then the transmission will probably need to be looked at by a qualified Honda technician.

Good luck and let me know if this helps anyone.

Removing wp_head() elements (rel=’start’, etc.)

In customising WordPress you may occasionally need to remove or add to the link elements that WordPress automatically outputs via the function call wp_head(). I’ve recently had a need to remove the rel=’prev’ and rel=’next’ link elements, and in trying to avoid customising the core WordPress functions, the following solution works.

Ensure you have a functions.php file in the theme directory you are using; if not, create it. The following lines will remove select elements from your wp_head() output:

remove_action( 'wp_head', 'feed_links_extra', 3 ); // Removes the links to the extra feeds such as category feeds
remove_action( 'wp_head', 'feed_links', 2 ); // Removes links to the general feeds: Post and Comment Feed
remove_action( 'wp_head', 'rsd_link'); // Removes the link to the Really Simple Discovery service endpoint, EditURI link
remove_action( 'wp_head', 'wlwmanifest_link'); // Removes the link to the Windows Live Writer manifest file.
remove_action( 'wp_head', 'index_rel_link'); // Removes the index link
remove_action( 'wp_head', 'parent_post_rel_link'); // Removes the prev link
remove_action( 'wp_head', 'start_post_rel_link'); // Removes the start link
remove_action( 'wp_head', 'adjacent_posts_rel_link'); // Removes the relational links for the posts adjacent to the current post.
remove_action( 'wp_head', 'wp_generator'); // Removes the WordPress version i.e. - WordPress 2.8.4

Don’t remove these items unless you have a need to. The WordPress generator removal could be useful if you are not religiously upgrading your WordPress install, as it helps hide the WP version from potential hackers to a certain degree.

RGB to HEX Converter

Working with CSS I am constantly trying to convert RGB values from Photoshop to their hex equivalent. There is probably a setting in Photoshop that I have missed, but the following small form will quickly convert RGB values to their HEX equivalent. You can then use these values in CSS with a # at the beginning.

Let me know if you find this useful:

[Form: enter R, G and B values to see the HEX equivalent]
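Under the hood the conversion just formats each 0–255 component as two hexadecimal digits. A minimal sketch of the same logic in Python (function name my own):

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Convert 0-255 RGB components to a CSS hex colour string."""
    for v in (r, g, b):
        if not 0 <= v <= 255:
            raise ValueError("RGB components must be in the range 0-255")
    # {:02x} renders each component as two lowercase hex digits
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

print(rgb_to_hex(255, 99, 71))  # #ff6347
```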

HTTP Response Header Checker


<?php
$html = "";
$message = "";
if (isset($_GET["url"])) { // form field name assumed
    $url = $_GET["url"];
    $info = parse_url($url);
    $fp = @fsockopen($info["host"], 80, $errno, $errstr, 10);
    if (!$fp) {
        $message = "FAIL, " . $url . " does not exist";
    } else {
        if (empty($info["path"])) {
            $info["path"] = "/";
        }
        $query = "";
        if (isset($info["query"])) {
            $query = "?" . $info["query"];
        }
        // Build a raw HTTP/1.1 HEAD request for the resource
        $out = "HEAD " . $info["path"] . $query . " HTTP/1.1\r\n";
        $out .= "Host: " . $info["host"] . "\r\n";
        $out .= "Connection: close\r\n";
        $out .= "User-Agent: RJPargeter.com Response Header Checker v1.0 - rjpargeter.com/contact for feedback\r\n\r\n";
        fwrite($fp, $out);
        while (!feof($fp)) {
            $html .= fread($fp, 8192);
        }
        fclose($fp);
    }
    if (!$html) {
        $message = "FAIL, " . $url . " does not exist";
    } else {
        // Split the raw response into individual header lines
        $headers = explode("\r\n", $html);
        unset($html);
    }
}
?>
The HTTP response is the information returned by the HTTP protocol when you access URLs over the Internet. Google, Yahoo and in fact all browsers rely on this information to determine whether the information you are trying to access has been found or, if not, what may have happened to it.

The full HTTP response contains a variety of information that a web server will send in response to an HTTP request. This information can yield interesting details such as the web server a site is hosted upon, the scripting language used and, most importantly, the response code. The following search box allows you to enter a URL and see the full HTTP response.

Why is this useful, you may be thinking? Well, Google etc. rely on the response codes to determine whether they index your site. For a resource to be indexed you will more often than not be looking for a ‘200 OK’ response. If a page is missing you may get a ‘404 Not Found’. If a page has gone you may look for a ‘410 Gone’ response to be sent back.
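At its core a checker like this simply assembles a raw HEAD request and reads back the status line and headers. A rough sketch of the request-building step in Python (function name my own):

```python
from urllib.parse import urlparse

def head_request(url: str) -> str:
    """Build a raw HTTP/1.1 HEAD request string for the given URL."""
    info = urlparse(url)
    path = info.path or "/"          # default to the root document
    if info.query:
        path += "?" + info.query     # preserve any query string
    return (
        "HEAD " + path + " HTTP/1.1\r\n"
        + "Host: " + info.netloc + "\r\n"
        + "Connection: close\r\n\r\n"
    )

print(head_request("http://example.com/page?x=1"))
```

Writing this string to a plain TCP socket on port 80 and reading until EOF yields the response headers, which is exactly what the PHP above does.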

Feel free to use this tool I have developed to test your URL’s HTTP response:

[Form: enter a domain/URL to check]

<?php
if (isset($headers)) {
    echo "Response Code:";
    // Echo each header line of the response on its own line
    for ($i = 0; isset($headers[$i]); $i++) {
        echo "<br />" . $headers[$i];
    }
} else {
    echo "Response Code:<br />";
    echo $message;
}
?>

The long forgotten robots.txt

I am still amazed at how many web sites still don’t employ a robots.txt file at the root of their web server.  Even SEO firms, or people claiming to be SEO experts, have them missing, which I find very funny.  There are also countless arguments over whether you still need a robots.txt, but my advice is: if the search engine robots still request it, then I’d rather have it there as the welcome mat to the site.

For those of you who don’t know the history of the robots.txt file, I’d suggest you have a Google or Wikipedia for it.  In short, it’s a text file that specifies which parts of a web site robots should ‘crawl’ and ‘index’ and/or which parts they should not.  You can also get specific and set up rules for particular spiders and crawlers.

To start with, create a text file called robots.txt and place it in the root of your web host.  You should be able to access it through your web browser at www.yourdomain.com/robots.txt

You can view other web sites’ robots.txt files by accessing robots.txt at the root of their domain.

If you want Google, etc. to come into your site and index everything, then things are very easy.  Simply add the following to your robots.txt file and away you go:

User-agent: *
Disallow:

Alternatively if you wish to stop all pages in your site being indexed then the following should be present in your file:

User-agent: *
Disallow: /

To stop robots indexing a folder called images and another called private you would add a Disallow line for each folder:

User-agent: *
Disallow: /images/
Disallow: /private/

The above would still index the rest of the site, but anything in those folders would be excluded from search engine results.
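You can sanity-check rules like these before deploying them; for example, Python’s built-in robots.txt parser:

```python
from urllib.robotparser import RobotFileParser

# The folder-blocking example from above
rules = """\
User-agent: *
Disallow: /images/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/index.html"))     # True  (rest of the site is allowed)
print(rp.can_fetch("Googlebot", "/private/a.htm"))  # False (folder is disallowed)
```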

To disallow a file, you specify the file just as you would a folder:

User-agent: *
Disallow: /myPrivateFile.htm

If you only wanted Google to access your site, you would specify the following (note that Google’s crawler identifies itself as Googlebot):

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

If you are looking to get your site fully indexed, then I would put the first example in your robots.txt file.