Scanned documents now join Flash and PDF files on the list of non-text-based formats that Google's robots can index. In an announcement today on Google's Official Blog, Google revealed that it is now using Optical Character Recognition (OCR) technology to convert images of text into actual text that can be searched and indexed. This significant addition to Google's arsenal should make many previously inaccessible documents on the internet readily available and easily searchable by the masses.
Google's previous methods of indexing scanned documents relied on page and file titles and other unreliable metadata in an attempt to index these search-engine-unfriendly images. If your Google search does return a scanned document, you'll still be able to view it in its original form as a PDF file, as well as the OCR'd text version, through a 'View As HTML' link.
This week Google added another source of useful data to the crawl errors section of its Webmaster Tools. When this feature was first added to the popular tools in August 2006, it let account holders view the types and counts of crawl errors, such as URLs not found, not followed, restricted, and timed out. Due to popular demand, the feature is now complemented by the internal or external sources of 'Not found' (404) crawl errors.
Whether they view the data in the online application or download all of their crawl errors for later analysis, webmasters can use it to track down exactly where 404 errors are coming from, fix the internal ones, and try to get the external ones fixed.
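For anyone working with a downloaded export, here is a minimal sketch of the "internal vs. external" split. It assumes each 404 has already been reduced to a hypothetical `(missing_url, linked_from)` pair; the actual Webmaster Tools export may use different columns, and `split_404_sources` is an illustrative name, not part of any Google tool. A linking page counts as internal when its host matches your own site, treating `www.` and the bare domain as the same site.

```python
from collections import defaultdict
from urllib.parse import urlparse


def _normalize_host(host):
    """Lower-case a host name and drop a leading 'www.' so both forms match."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def split_404_sources(rows, site_host):
    """Group broken URLs by whether the page linking to them is on our site.

    rows: iterable of hypothetical (missing_url, linked_from) pairs.
    Returns two dicts mapping each broken URL to the pages that link to it.
    """
    own = _normalize_host(site_host)
    internal = defaultdict(list)  # fix these yourself
    external = defaultdict(list)  # contact the linking site or 301-redirect
    for missing_url, linked_from in rows:
        host = _normalize_host(urlparse(linked_from).hostname or "")
        bucket = internal if host == own else external
        bucket[missing_url].append(linked_from)
    return dict(internal), dict(external)


rows = [
    ("http://example.com/old-page", "http://example.com/blog/post-1"),
    ("http://example.com/prodcts", "http://partner.example.org/links"),
]
internal, external = split_404_sources(rows, "www.example.com")
```

The internal list is your own to-do list; the external list tells you which site owners to contact, or which misspelled URLs are worth redirecting.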
Crawl Error Source Data Benefits Search Engine Optimization
If your server returns a 404 error because an external site linked to a mistyped URL, the valuable link juice need not go to waste. Knowing the source of the 404 lets you either contact the site owner to request a correction or simply 301-redirect the misspelled URL to the correct version. Presto: the vote for your site as an authoritative source of content is restored, earning you the natural PageRank that the incoming link should pass to your site.
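To illustrate the 301 approach, here is a minimal sketch using Python's standard `http.server` module. The `REDIRECTS` table and its paths are hypothetical; on a real site you would normally configure the same mapping in your web server or CMS. The key idea is that a known misspelled path gets a `301 Moved Permanently` response whose `Location` header points at the correct page, while any other path in this toy server still returns a 404.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical map of misspelled inbound paths (found via crawl error
# sources) to the pages the linking sites actually meant.
REDIRECTS = {
    "/prodcts": "/products",
    "/aboutus.htm": "/about/",
}


def resolve(raw_path):
    """Return the redirect target for a request path, or None if unknown."""
    parts = urlsplit(raw_path)
    target = REDIRECTS.get(parts.path)
    if target is None:
        return None
    # Keep any query string the visitor arrived with.
    return target + ("?" + parts.query if parts.query else "")


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = resolve(self.path)
        if location is None:
            self.send_error(404, "Not Found")
            return
        # 301 tells browsers and crawlers the move is permanent.
        self.send_response(301)
        self.send_header("Location", location)
        self.end_headers()

    do_HEAD = do_GET


def run(port=8000):
    """Serve redirects on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `run()` and requesting `http://127.0.0.1:8000/prodcts` should yield a 301 pointing at `/products`. A permanent (301) redirect is used rather than a temporary (302) one because the misspelled URL is never coming back.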
Just before noon (EDT) on September 6, 2008, GeoEye (formerly ORBIMAGE) successfully launched GeoEye-1, the highest-resolution and most accurate commercial imaging satellite, into orbit from Vandenberg Air Force Base in California. This week, on October 7, the satellite returned an image of Kutztown University in Pennsylvania, the first location it captured when its camera doors opened, 423 miles above the surface of the earth while orbiting at about 17,000 miles per hour (roughly 4.7 miles per second).
Google, which co-sponsored the $502 million project along with the National Geospatial-Intelligence Agency, will have exclusive online mapping rights to the images. GeoEye-1's photo resolution is 41 cm (16 in), meaning objects of that size can be clearly seen; however, government restrictions will allow Google to use imagery only at 50 cm (20 in). Currently, Google Earth's highest-resolution images are approximately 60 cm (24 in).
According to Google spokeswoman Kate Hurowitz, the visual difference between the current 60 cm resolution and the upcoming 50 cm resolution will be like the difference between clearly seeing rooftops and clearly seeing vehicles.
GeoEye-2 is scheduled to launch in 2011 or 2012, and has a planned resolution of 25 cm (9.8 in).
Click the image below for a closer view of the Kutztown campus from GeoEye-1's first image (1.26 MB).
Microsoft Research has released a new social search engine prototype called “uRank”. Social search engines offer personalization features, such as letting users rearrange search results to better suit their tastes and share information with others.
In Microsoft Research's words, uRank “…allows people to organize, edit and annotate search results…to better support people as they are exploring a topic, comparing information, keeping track of what they’re learning, and collaborating with others…”. Microsoft Research proposes using these interactions to shape the perceived relevance of search results, as opposed to the traditional algorithmic methods search engines use to rank results. uRank is currently accessible to US users only; apparently they're ‘working on that’.