HTSS: HyperText Semantic Subset
This is a draft for a proposed subset of (X)HTML5. None of this information is final; all of it is subject to change without notice.
Why this instead of AMP?
- AMP encourages/requires use of JS.
- AMP utilises nonstandard HTML tags that require the use of JS to interpret.
- AMP requires assets from a centrally-operated entity (cdn.ampproject.org).
- AMP is currently designed to be served alongside the "canonical" HTML content. The goal with HTSS is to provide a regular subset of HTML expressive enough to serve as the canonical version.
- AMP does not address accessibility concerns.
As an aside, AMP seems to invent a solution to a problem that only surfaced in the Web's later years. To load pages quickly, web developers and designers simply need to emphasise simplicity and make more use of the features of bare HTML5. Advertisers, likewise, must find less intrusive ways to market their products without compromising usability. There is plenty of literature online about AMP if you are still not convinced of its uselessness and potential harm to the Web ecosystem.
Key points
- HTML documents are for HTML; they should not consist of mixed MIME types. While JS and CSS are allowed in HTML, reserved HTML tokens still need to be entity-escaped or enclosed in a CDATA section. The ideal solution is to serve non-HTML as separate files. This also aids client-side caching and capabilities: the client does not have to download assets it already has, and it does not have to download assets it cannot display (CSS and images inside a text browser or screen reader, for instance). Additionally, this makes HTML cleaner by removing the use of [onmouseover], [style], and other tag attributes.
- Content should be the primary focus. Content should either be at the very top of the page, before any navigational or supplementary site information, or accessible via a "skip to content" anchor located at the top of the page.
- Websites should be easily navigable without the aid of CSS or JS. These assets should be additive, not required.
- Making full use of the HTML5 standard is desirable over accepting third-party additions to the specification. Make use of standard markup when possible.
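The key points above can be sketched as a small page skeleton. This is only an illustration under assumptions: the file names /css/site.css and /js/site.js, the id value content, and the page text are hypothetical and not part of HTSS.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Example article</title>
    <!-- Styles and scripts live in separate files (hypothetical paths),
         so a client can cache them or skip them entirely. -->
    <link rel="stylesheet" href="/css/site.css" />
    <script src="/js/site.js" defer="defer"></script>
  </head>
  <body>
    <!-- First element on the page: lets readers jump past the navigation. -->
    <a href="#content">Skip to content</a>
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/articles/">Articles</a></li>
      </ul>
    </nav>
    <main id="content">
      <h1>Example article</h1>
      <p>The page remains readable and navigable with CSS and JS disabled.</p>
    </main>
  </body>
</html>
```

Placing the main element before the nav element would satisfy the same key point without the skip link; the sketch shows the anchor variant because it keeps a conventional page order.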
What is required?
- HTSS is a subset of HTML5 or XHTML5, therefore the document must begin with an HTML5-compatible doctype, e.g. <!DOCTYPE html> or <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html>. All other (X)HTML5 rules also apply.
- HTSS must be served with an HTML- or XML-compatible MIME type such as text/html or application/xhtml+xml.
- HTSS suggests the use of XHTML5 rather than the SGML-like HTML syntax, but for now this is not a requirement, and the full syntax of both HTML5 and XHTML5 is allowed.
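As a minimal sketch of these requirements, the following document uses the XHTML5 form and is meant to be served as application/xhtml+xml. The title, the text, and the en-GB language tag are placeholders chosen for illustration. When serving as text/html instead, the plain <!DOCTYPE html> form without the XML declaration would be the usual choice.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<!-- The xmlns attribute is what makes this valid XHTML5;
     lang and xml:lang carry the same value. -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-GB" xml:lang="en-GB">
  <head>
    <title>Minimal HTSS document</title>
  </head>
  <body>
    <p>Hello, world.</p>
  </body>
</html>
```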
Tag and attribute requirements
Which tags and attributes have specific requirements in HTSS? (CSS selectors are used to refer to tags and their attributes in this section.)
- Global attributes:
  - [lang] ([xml:lang] is also acceptable for XHTML5) should be used whenever an element's language is different from its parent's: for instance, html[lang=en-US] defines a document as United States English, which may have a Latin excerpt as blockquote[lang=la].
- html: html[xmlns] is required for XHTML5-valid HTSS.
- b: Matches the HTML5 semantic meaning, when strong and mark are not appropriate.
- i: Matches the HTML5 semantic meaning, in that it is intended to set off foreign language text, jargon, or internal dialogue.
  - Use i[lang] when offsetting inline foreign terms.
- u: Matches the HTML5 semantic meaning.
- q must be used for quotations rather than the " or ' characters or any smartquote or localised variants (including « » “ ” ‘ ’). Use CSS to ensure the correct quotation characters are shown in the browser.
- div, span: Used for non-semantic sectioning of the HTML document, primarily useful for CSS. These must not be used when an alternative tag is available to convey the semantic meaning of the content.
- pre: Following the HTML5 recommendation, pre tags denote preformatted text (text that should not wrap or have its whitespace condensed). pre alone is not for source code; a code tag shall be used within a pre tag to denote a block of code.
- code: Following the HTML5 recommendation, [class^="language-"] (e.g. <code class="language-html"/>) may be used to denote the programming language. This has the side effect of being recognised by popular syntax highlighting scripts.
- kbd, samp, var follow their HTML5 semantic meanings and should be used in place of code when it makes sense.
- table: Must be used semantically for tabular data and not simply for styling (use div and CSS for that).
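The fragment below is one possible reading of the tag requirements above, not a normative example. All of the text content is invented for illustration. The quotation marks around the q element are not typed into the markup; they are expected to come from the browser or from a stylesheet (the CSS quotes property).

```html
<article lang="en-US">
  <p>
    <!-- i[lang] sets off an inline foreign term; q marks the quotation. -->
    The motto <i lang="la">carpe diem</i> is usually rendered as
    <q>seize the day</q>.
  </p>
  <!-- A longer excerpt in another language declares its own lang. -->
  <blockquote lang="la">
    <p>Carpe diem, quam minimum credula postero.</p>
  </blockquote>
  <!-- kbd and samp are used instead of code where they fit better. -->
  <p>
    Press <kbd>Ctrl</kbd>+<kbd>S</kbd> to save;
    the editor replies <samp>File written</samp>.
  </p>
  <!-- A block of source code: code inside pre, with a language class. -->
  <pre><code class="language-html">&lt;q&gt;quoted text&lt;/q&gt;</code></pre>
  <!-- table holds tabular data only, never page layout. -->
  <table>
    <caption>Tags used in this fragment</caption>
    <thead>
      <tr><th scope="col">Tag</th><th scope="col">Use</th></tr>
    </thead>
    <tbody>
      <tr><td><code>q</code></td><td>Inline quotation</td></tr>
      <tr><td><code>kbd</code></td><td>Keyboard input</td></tr>
    </tbody>
  </table>
</article>
```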
Forbidden tags and attributes
Which tags and attributes are forbidden in HTSS?
- Global attributes:
  - [style] violates the exclusion of text/css content inside HTSS. Use separate stylesheets with appropriate selectors, for instance the use of [id] or [class] attributes.
  - Event handler attributes ([on*] JavaScript attributes such as [onclick]). These violate the exclusion of application/javascript inside HTSS. Include script files as separate assets and register callbacks from within the script.
- Any HTML5 tag or attribute marked deprecated.
- Any HTML5 tag or attribute marked nonstandard and/or vendor-specific.
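To illustrate the alternative to inline styles and handlers, here is a hedged sketch. The file names /css/menu.css and /js/menu.js, the id menu-toggle, the class warning, and the function name toggleMenu are all hypothetical. The assumption is that the stylesheet targets the class and that the script looks up the button by its id and registers its own click listener (for example with addEventListener).

```html
<!-- Not allowed in HTSS (kept inside a comment for comparison):
     <button style="color: red" onclick="toggleMenu()">Menu</button> -->

<!-- In the head: styling and behaviour come from separate files. -->
<link rel="stylesheet" href="/css/menu.css" />
<script src="/js/menu.js" defer="defer"></script>

<!-- In the body: the markup carries only hooks, no CSS and no JS. -->
<button type="button" id="menu-toggle" class="warning">Menu</button>
```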
Suggestions for automated HTSS linting
Just as validators and linters exist for HTML and AMP, it is useful to have a preliminary linting pass for documents trying to conform to HTSS. Note that since many of the rules are semantic in nature, an automated system cannot be expected to discern correct usage of tags, as it cannot understand the content of a document in the same capacity that a human can. Such an HTSS linter could only catch low-hanging fruit: obvious syntactic violations of HTSS, such as invalid tags.
Such information is also useful in creating other software that prioritises HTSS, such as an HTSS-conformant Web browser.
- As HTSS is a subset of HTML5 or XHTML5, HTSS must first validate as HTML5 or XHTML5.
- Elements that HTML5 defines as deprecated or obsolete are not allowed in HTSS.
- Event handler attributes (as mentioned above, the [on*] global attributes for JavaScript) and the [style] attribute are not allowed in HTSS.
Other
Not in scope for HTSS, but additional points to consider:
- Use of normalised URIs that are easily mapped to the underlying filesystem; meaning:
  - the use of file extensions such as .html or .htm, .xhtml or .xht, .css, .svg, et cetera; and
  - directories requiring a trailing slash and displaying a predefined index page, for example /videos/ for the video section of a website.
- On UNIX-based systems, the use of file permissions to denote executable (CGI) resources.
- A simple subset of HTTP as well; see HTTP/0.2 for a potential solution.