XHTML / HTML / CSS Validity Testing Grounds

The idea behind this test is to launch a dozen almost-identical pages. By introducing non-standards-compliant code where needed, the HTML/XHTML and the CSS of each page have intentionally been made either valid or invalid. The plan is to determine which combination Google prefers.

We have assumed that Google will give highest priority to pages containing the most standards-compliant, forward-thinking code, i.e. valid XHTML 1.0 Strict / valid CSS, and lowest priority to invalid HTML / invalid CSS.

Over time we will be checking the Google results pages and plotting the relative success of the pages in the test. These results can be seen below.

In the test results below, the greener results bars show higher Google rankings - the greyer results show lower rankings. The "Code" column is largely for our own reference as this is how we have chosen to name the pages to make classification easier.
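
As a rough illustration of the naming scheme, the dozen page codes can be generated from three document types, two markup states and two CSS states. This is only a sketch in Python (chosen for illustration, not something used in the test); the decoding of the codes is inferred from the results table, and `DOCTYPES`, `VALIDITY` and `page_variants` are made-up names.

```python
from itertools import product

# Assumed decoding of the page codes, inferred from the results table:
# document-type prefix + v/i (valid/invalid markup), then "c" + v/i for the CSS.
DOCTYPES = {
    "xs": "XHTML 1.0 Strict",
    "xt": "XHTML 1.0 Transitional",
    "h": "HTML 4.01",
}
VALIDITY = {"v": "valid", "i": "invalid"}


def page_variants():
    """Return {code: description} for every markup/CSS combination."""
    variants = {}
    for (dt, name), (m, markup), (c, css) in product(
        DOCTYPES.items(), VALIDITY.items(), VALIDITY.items()
    ):
        variants[f"{dt}{m}_c{c}"] = f"{markup} {name} / {css} CSS"
    return variants


if __name__ == "__main__":
    for code, description in page_variants().items():
        print(code, "->", description)
```

Three document types times two markup states times two CSS states gives the twelve pages in the test.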

This test was launched on 13th April 2010 on a well-established domain name.

Order of Google results:

| Code Validity | Code | 21 Apr 2010 | 27 Apr 2010 | 13 May 2010 | 09 Aug 2010 |
|---|---|---|---|---|---|
| valid XHTML 1.0 Strict / valid CSS | xsv_cv | 4 | 2 | 5 | 4 |
| valid XHTML 1.0 Transitional / valid CSS | xtv_cv | 7 | 10 | 7 | 10 |
| valid HTML 4.01 / valid CSS | hv_cv | 8 | 3 | 9 | 8 |
| valid XHTML 1.0 Strict / invalid CSS | xsv_ci | 3 | 8 | 6 | 7 |
| valid XHTML 1.0 Transitional / invalid CSS | xtv_ci | 12 | 12 | 8 | 11 |
| valid HTML 4.01 / invalid CSS | hv_ci | 2 | 7 | 4 | 5 |
| invalid XHTML 1.0 Strict / valid CSS | xsi_cv | 11 | 5 | 10 | 6 |
| invalid XHTML 1.0 Transitional / valid CSS | xti_cv | 9 | 1 | 2 | 2 |
| invalid HTML 4.01 / valid CSS | hi_cv | 10 | 11 | 3 | 3 |
| invalid XHTML 1.0 Strict / invalid CSS | xsi_ci | 5 | 9 | 12 | 9 |
| invalid XHTML 1.0 Transitional / invalid CSS | xti_ci | 1 | 4 | 1 | 1 |
| invalid HTML 4.01 / invalid CSS | hi_ci | 6 | 6 | 11 | 12 |

The first check of the Google SERPs was done just a couple of weeks after launch, and it was discovered that the pages had already been indexed. The results were surprising - the highest ranking was given to one of the worst pages: invalid XHTML 1.0 Transitional / invalid CSS. We will be watching the results over time to see if this shuffles out in the way we have been predicting.

The current leader chart is as follows:

| Code Validity | Code | Result | Worst Pos | Best Pos | +ve Var | -ve Var |
|---|---|---|---|---|---|---|
| invalid XHTML 1.0 Transitional / invalid CSS | xti_ci | 1 | 4 | 1 | +3 | 0 |
| invalid XHTML 1.0 Transitional / valid CSS | xti_cv | 2 | 9 | 1 | +7 | -1 |
| invalid HTML 4.01 / valid CSS | hi_cv | 3 | 11 | 3 | +8 | 0 |
| valid XHTML 1.0 Strict / valid CSS | xsv_cv | 4 | 5 | 2 | +1 | -2 |
| valid HTML 4.01 / invalid CSS | hv_ci | 5 | 7 | 2 | +2 | -3 |
| invalid XHTML 1.0 Strict / valid CSS | xsi_cv | 6 | 11 | 5 | +5 | -1 |
| valid XHTML 1.0 Strict / invalid CSS | xsv_ci | 7 | 8 | 3 | +1 | -4 |
| valid HTML 4.01 / valid CSS | hv_cv | 8 | 9 | 3 | +1 | -5 |
| invalid XHTML 1.0 Strict / invalid CSS | xsi_ci | 9 | 12 | 5 | +3 | -4 |
| valid XHTML 1.0 Transitional / valid CSS | xtv_cv | 10 | 10 | 7 | +0 | -3 |
| valid XHTML 1.0 Transitional / invalid CSS | xtv_ci | 11 | 12 | 8 | +1 | -3 |
| invalid HTML 4.01 / invalid CSS | hi_ci | 12 | 12 | 6 | +0 | -6 |

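I think a quick explanation of the columns is required: going by the figures, "Result" is the position at the latest check (09 Aug 2010), "Worst Pos" and "Best Pos" are the lowest and highest positions recorded across all checks, "+ve Var" is the number of places gained since the worst position, and "-ve Var" is the number of places dropped since the best position.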
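
As a rough sketch, the leader chart can be recomputed from the four checks in the results table. The Python snippet below is for illustration only and is not what was used for the test; `rankings` and `leader_chart` are made-up names, and the column definitions in the comments are inferred from the published figures rather than stated anywhere.

```python
# Positions at each check, in date order: 21 Apr, 27 Apr, 13 May, 09 Aug 2010.
rankings = {
    "xsv_cv": [4, 2, 5, 4],
    "xtv_cv": [7, 10, 7, 10],
    "hv_cv": [8, 3, 9, 8],
    "xsv_ci": [3, 8, 6, 7],
    "xtv_ci": [12, 12, 8, 11],
    "hv_ci": [2, 7, 4, 5],
    "xsi_cv": [11, 5, 10, 6],
    "xti_cv": [9, 1, 2, 2],
    "hi_cv": [10, 11, 3, 3],
    "xsi_ci": [5, 9, 12, 9],
    "xti_ci": [1, 4, 1, 1],
    "hi_ci": [6, 6, 11, 12],
}


def leader_chart(rankings):
    """Build leader-chart rows, sorted by the latest position."""
    rows = []
    for code, positions in rankings.items():
        result = positions[-1]   # assumed: position at the most recent check
        worst = max(positions)   # a larger number is a lower ranking
        best = min(positions)
        rows.append({
            "code": code,
            "result": result,
            "worst": worst,
            "best": best,
            "pos_var": worst - result,  # places gained since the worst position
            "neg_var": best - result,   # places dropped since the best position
        })
    return sorted(rows, key=lambda row: row["result"])


if __name__ == "__main__":
    for row in leader_chart(rankings):
        print(row["code"], row["result"], row["worst"], row["best"],
              row["pos_var"], row["neg_var"])
```

Under these assumed definitions the computed values agree with every row of the leader chart, which suggests (but does not prove) that this is how the columns were derived.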

Results

OK, so it's been almost four months since this experiment was launched and, to be honest, the results aren't as expected.

  1. The very best page, xsv_cv (valid XHTML Strict, valid CSS), finished up in 4th spot, beaten even by hi_cv, a page with invalid HTML 4.01.
  2. One of the worst pages, xti_ci (invalid XHTML Transitional and invalid CSS), has been the best performer.
  3. Although hi_cv (invalid HTML, valid CSS) has been an unexpected high-climber, leaping 8 places up the chart to 3rd spot, there has been very little general movement since Google first picked up the pages.
  4. At least the worst page of the lot performed as expected: hi_ci (invalid HTML 4.01, invalid CSS) finished in last place.

Conclusion

I'm not quite sure what conclusions to derive from this experiment. The results certainly aren't what I expected. There may be a few reasons for this:

  1. Google doesn't care how well a page validates;
  2. The pages could have been more "broken" than they were. To break the validation, a simple <font> tag was introduced to the XHTML / HTML (the HTML validator also didn't like the self-closing <hr> tag in the code), and to break the CSS I just introduced an opacity declaration. Perhaps I could have been a little more ham-fisted about the whole affair;
  3. The experiment needs to run a little longer; or
  4. There are external pressures influencing the results.

Perhaps this experiment should be run again with the code truly broken! Perhaps Google could see through my simple veil of deceit and could tell that I do actually know how to create good, valid code and let me off the hook!