Thursday, 21 May 2015

How To Specify An HTML Web Document Language For Good SEO

So you just wrote a beautiful essay on James Joyce's Ulysses - in Irish Gaelic. Will Yahoo, Google, Microsoft and Ask recognize it as Gaelic, hosted as it is on your .co.uk domain? Maybe. But you can give them a hint!

The trick is to use every available HTTP and HTML setting to make sure your documents are not misidentified. This article considers the HTTP and HTML aspects of website internationalization for search engine optimization.
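
As a rough illustration of what such a hint can look like, here is a minimal Python sketch that declares a page's language in both places: the HTTP Content-Language response header and the lang attribute on the html element. The function names language_hints and render_page are hypothetical, and "ga" is the ISO 639-1 code for Irish. This is only a sketch under those assumptions, not a complete recipe.

```python
from html import escape


def language_hints(lang):
    """Return the HTTP headers and opening <html> tag that declare `lang`.

    `lang` is a language tag such as "ga" (Irish) or "de" (German).
    """
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # HTTP-level hint: the intended audience language of the response.
        "Content-Language": lang,
    }
    # HTML-level hint: the language of the document's content.
    html_open = f'<html lang="{escape(lang, quote=True)}">'
    return headers, html_open


def render_page(title, body_html, lang):
    """Build a small HTML page plus the headers to send with it.

    `body_html` is assumed to be trusted, already-escaped markup.
    """
    headers, html_open = language_hints(lang)
    page = "\n".join([
        "<!DOCTYPE html>",
        html_open,
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{escape(title)}</title>",
        "</head>",
        f"<body>{body_html}</body>",
        "</html>",
    ])
    return headers, page


if __name__ == "__main__":
    headers, page = render_page("Ulysses", "<p>...</p>", "ga")
    for name, value in headers.items():
        print(f"{name}: {value}")
    print()
    print(page)
```

Sending the same language tag at both levels keeps the two hints from contradicting each other, which matters when a crawler weighs them against the weaker clues such as domain suffix or hosting location.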
Why is language recognition a problem?

Search engines try to match the language of a web searcher (based on IP geolocation or user-specified preferences) to Web documents when determining the best matches for a search query. In some cases, a user can specify that results be limited to a specific language. Left to their own devices, search engines have a few clues for determining the human language of a document:
  •     The site's country-code domain suffix
  •     The country where the site is hosted
  •     The language of documents linking to the document
  •     A text pattern analysis of the document

Each approach is fraught with difficulties. Consider a few:

Country domain suffix of a website: Although a site with a .de extension is likely to be in German, there is always the possibility that a German company has published its content in other languages for an international audience. Some country-code domains, such as .ch for Switzerland, belong to countries with several official languages: in this case German, French, Italian and Romansh.

Where the site is hosted: Many sites are hosted in geographic areas far from their target audience because hosting is cheaper there.

The language of linking documents: While the Internet is indeed a set of hubbed networks, it is quite common for web pages to cite an authority even when that authority is in another language (English, for example).

Text pattern analysis: This is probably the most accurate method, especially for longer documents. While search engines do not reveal their approach(es), consider the Perl Lingua::Identify module, which currently recognizes 33 languages. Lingua::Identify uses a combination of four text pattern matching methods; here we quote the Lingua::Identify documentation:

    Small Word Technique

    The "Small Word Technique" searches the text for the most common words of each active language. These words are usually articles, pronouns, etc., that happen to be (usually) the shortest words in the language; hence the name of the method. This is usually a good method for large texts.
    Prefix Analysis

    This method analyzes text for common prefixes of each active language.
    Suffix Analysis

    Similar to Prefix Analysis, but analyzing common suffixes instead.
    N-gram Categorization

    N-grams are sequences of tokens. You can think of them as syllables, but they are also more than that, because they consist not only of characters but also of the spaces that separate words. N-gram data is available from Google.

    N-grams are a very good way to identify languages, as the most common n-grams of each language are usually not very common in others.
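
To make two of the quoted techniques concrete, here is a small Python sketch of the Small Word Technique and of character n-gram counting. It is not how Lingua::Identify or any search engine is implemented. The word lists are tiny hand-picked assumptions, and the function names (small_word_scores, char_ngrams, ngram_overlap, guess_language) are hypothetical.

```python
from collections import Counter

# Tiny illustrative word lists (assumption: a real detector would use much
# larger, corpus-derived lists for every supported language).
SMALL_WORDS = {
    "en": {"the", "of", "and", "to", "in", "is", "it"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "ein"},
    "fr": {"le", "la", "et", "les", "des", "est", "un"},
}

PUNCTUATION = ".,;:!?\"'()"


def small_word_scores(text):
    """Small Word Technique: share of tokens that are common short words."""
    tokens = [tok.strip(PUNCTUATION) for tok in text.lower().split()]
    tokens = [tok for tok in tokens if tok]
    if not tokens:
        return {lang: 0.0 for lang in SMALL_WORDS}
    return {
        lang: sum(tok in words for tok in tokens) / len(tokens)
        for lang, words in SMALL_WORDS.items()
    }


def guess_language(text):
    """Pick the best-scoring language (ties go to the first one listed)."""
    scores = small_word_scores(text)
    return max(scores, key=scores.get)


def char_ngrams(text, n=3):
    """Count character n-grams, keeping spaces so word boundaries count too."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_overlap(text, sample, n=3):
    """Share of the text's n-grams that also occur in a reference sample."""
    grams = char_ngrams(text, n)
    known = char_ngrams(sample, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    matched = sum(count for gram, count in grams.items() if gram in known)
    return matched / total


if __name__ == "__main__":
    print(guess_language("Der Hund ist nicht in dem Haus."))
    print(ngram_overlap("the", "the theory"))
```

In practice the reference sample passed to ngram_overlap would be a large body of text per language, and the scores from several methods would be combined rather than trusted individually, which is the approach the quoted documentation describes.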

Wednesday, 6 May 2015

Form Inputs: The Browser Support Issue You Didn’t Know You Had

The humble input. It has been a part of HTML for as long as HTML has had a formal specification; but before HTML5, developers were hamstrung by its limited types and attributes. As the use of smartphones and their on-screen keyboards has flourished, however, inputs have taken on a new and very important role - but they are also riddled with browser and device inconsistencies.

The eight original input types were brilliant in their simplicity. (Well, OK, maybe <input type="image"> has not aged well.)

Think about it: by inserting a single element in your markup, you can tell any web browser to render an interaction control, and you can completely change that interaction - from a text field to a checkbox to a radio button - simply by changing a keyword. Now imagine a world where creating these interactions also meant building the custom interaction controls yourself, and you begin to realize how much inputs are taken for granted.

Unfortunately, even Tim Berners-Lee and company could not have foreseen the strain that mobile devices and interaction-hungry web applications would place on these original concepts for user input.

That is what the HTML 5.0 specification set out to solve, by expanding the concept of a text input to accommodate specific types of data, such as numbers and email addresses, as well as rich interactions, such as task-specific on-screen keyboards and date and color pickers. Most were designed with graceful degradation at their core, adding enhancements in newer browsers while falling back to basic text inputs in older ones.


Or at least that was the intention. In the real world, many of these new inputs and attributes - even seemingly innocuous types such as <input type="number"> - do not always behave as you might expect.
Identifying the problem

Although not as fierce as the battles of yore, input types are the cause of a new, small-scale browser war. Despite the existence of standards, browser and device makers have supported these input types selectively and have taken different approaches to implementing these enhanced interactions.

Take dates. Using <input type="date"> is a boon to any developer who has had to add a JavaScript-based date picker to a website ... or at least it would be if every browser supported it. Desktop versions of Chrome and most mobile browsers display a native date picker: