WHAT IS HTML LANGUAGE |FULL DETAIL ITS EXTRA KNOWLEDGE

Hypertext Markup Language (HTML) is the standard markup language for documents displayed in web browsers. It defines the content and structure of web content. It is often supported by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.
Web browsers receive HTML documents from a web server or from local storage and render them into multimedia web pages. HTML describes the structure of a web page semantically and originally included cues for its appearance.
HTML elements are the building blocks of HTML pages. With HTML constructs, images and other objects, such as interactive forms, can be embedded in the rendered page. HTML provides a means of creating structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, quotes, and other items. HTML elements are delineated by tags, written using angle brackets.
Tags such as <img> and <input> introduce content into the page directly. Other tags, such as <p> and </p>, surround the document text and provide information about it, and may include sub-element tags. Browsers do not display HTML tags, but use them to interpret the content of the page.
HTML can embed programs written in a scripting language such as JavaScript, which affect the behavior and content of web pages. The inclusion of CSS defines the look and layout of content.
The World Wide Web Consortium (W3C), the former maintainer of HTML and the current maintainer of the CSS standards, has encouraged the use of CSS over explicit presentational HTML since 1997.[3] A form of HTML known as HTML5 is used to display video and audio, primarily in combination with JavaScript.
History
In 1980, physicist Tim Berners-Lee, a contract employee at CERN, proposed and prototyped ENQUIRE, a system for CERN researchers to access and share documents. In 1989, Berners-Lee wrote a memo proposing an Internet-based hypertext system. [ 4 ] Berners-Lee specified HTML and wrote the browser and server software in late 1990.
That same year, Berners-Lee and CERN data systems engineer Robert Cailliau collaborated on a request for joint funding, but the project was not formally adopted by CERN. In his personal notes from 1990, Berners-Lee listed “some of the many areas in which hypertext is used”; an encyclopedia is the first entry. [ 5 ]
The first publicly available description of HTML was a document called “HTML Tags”,[6] first mentioned on the Internet by Tim Berners-Lee in late 1991.[7][8] It described 18 elements included in the initial, relatively simple design of HTML.
Except for the hyperlink tag, these were strongly influenced by SGMLguid, an in-house Standard Generalized Markup Language (SGML)-based documentation format at CERN. Eleven of these elements still exist in HTML 4.[9]
HTML is a markup language that web browsers use to transform text, images, and other content into visual or audible web pages. Default attributes for each item of HTML markup are defined in the browser, and web page designers can change or enhance these attributes using CSS.
Many of the text elements are described in the 1988 ISO technical report TR 9537, “Techniques for Using SGML”, which covers the features of early text formatting languages, such as the one used by the RUNOFF command developed in the early 1960s for the CTSS (Compatible Time-Sharing System) operating system.
These formatting commands were derived from those used by typesetters to manually format documents. However, SGML’s concept of generalized markup is based on elements (nested annotated ranges with attributes) having distinct structure and markup, rather than just print effects. HTML, along with CSS, has gradually moved in this direction.
Berners-Lee considered HTML to be an application of SGML. It was formally defined as such by the Internet Engineering Task Force (IETF) with the mid-1993 publication of the first proposal for an HTML specification, the “Hypertext Markup Language (HTML)” Internet Draft by Berners-Lee and Dan Connolly, which included an SGML Document Type Definition to define the syntax.[10][11]
This draft expired after six months, but was notable for its acknowledgment of the NCSA Mosaic browser’s custom tag for embedding in-line images, reflecting the IETF’s philosophy of basing standards on successful prototypes. Similarly, Dave Raggett’s competing Internet Draft, “HTML+ (Hypertext Markup Format)”, published in late 1993, proposed standardizing already implemented features such as tables and fill-out forms.[12]
After the HTML and HTML+ drafts expired in early 1994, the IETF formed an HTML Working Group. In 1995, this working group completed “HTML 2.0”, the first HTML specification intended to be treated as a standard on which future implementations should be based.[13]
Further development under the auspices of the IETF stalled due to competing interests. Since 1996, the HTML specifications have been maintained by the World Wide Web Consortium (W3C) with input from commercial software vendors. [14] In 2000, HTML became an international standard (ISO/IEC 15445:2000).
HTML 4.01 was published in late 1999. In 2004, development of HTML5 began in the Web Hypertext Application Technology Working Group (WHATWG); it became a joint deliverable with the W3C in 2008 and was completed and standardized on October 28, 2014.[15]
HTML version timeline
HTML 2
November 24, 1995
HTML 2.0 was published as RFC 1866. Supplemental RFCs added additional capabilities:
November 25, 1995: RFC 1867 (Form-Based File Upload)
May 1996: RFC 1942 (Tables)
August 1996: RFC 1980 (Client-Side Image Maps)
January 1997: RFC 2070 (Internationalization)
HTML 3
January 14, 1997
HTML 3.2 [16] was published as a W3C Recommendation. It was the first version developed and standardized exclusively by the W3C, as the IETF closed its HTML Working Group on September 12, 1996. [17]
Initially codenamed “Wilbur,” [18] HTML 3.2 completely removed mathematical formulas, resolved overlap between various proprietary extensions, and adopted most of Netscape’s visual markup tags. Netscape’s blink element and Microsoft’s marquee element were removed by mutual agreement between the two companies.[14] A markup for mathematical formulas similar to HTML was standardized 14 months later in MathML.
HTML 4
December 18, 1997
HTML 4.0 was published as a W3C Recommendation. It offers three variations: Strict, Transitional, and Frameset.
Initially codenamed “Cougar”,[18] HTML 4.0 adopted several browser-specific element types and attributes, but also sought to phase out Netscape’s visual markup features by marking them as deprecated in favor of style sheets. HTML 4 is an SGML application that conforms to ISO 8879 (SGML).[20]
April 24, 1998
HTML 4.0[21] was reissued with minor revisions without incrementing the version number.
December 24, 1999
HTML 4.01 [22] was published as a W3C Recommendation. It provides the same three variants as HTML 4.0, and its final bug fix [23] was published on May 12, 2001.
May 2000
ISO/IEC 15445:2000 [24] (“ISO HTML”, based on HTML 4.01 Strict) was published as an ISO/IEC International Standard. [25] Within ISO, this standard is under the scope of ISO/IEC JTC 1/SC 34 (ISO/IEC Joint Technical Committee 1, Subcommittee 34 – Document Description and Processing Languages). [24]
After HTML 4.01, no new version of HTML appeared for several years, because the W3C’s HTML Working Group spent its time developing a parallel, XML-based language, XHTML.
HTML 5
October 28, 2014
HTML5 [26] was published as a W3C Recommendation. [27]
November 1, 2016
HTML 5.1 [28] was published as a W3C Recommendation. [29] [30]
December 14, 2017
HTML 5.2 [31] was published as a W3C Recommendation. [32] [33]
HTML Draft Version Timeline
October 1991
“HTML Tags”,[7] an informal CERN document listing 18 HTML tags, was first mentioned in public.
June 1992
The first unofficial draft of the HTML DTD, [34] with seven subsequent revisions (July 15, August 6, August 18, November 17, November 18, November 20, and November 22) [35] [36] [37]
November 1992
HTML DTD 1.1 (the first with a version number, based on RCS revisions, starting with 1.1 instead of 1.0), an unofficial draft [37]
June 1993
Hypertext Markup Language [38] was published as an Internet Draft (a rough proposal for a standard) by the IETF IIIR Working Group. It was superseded a month later by the second edition [39].
November 1993
HTML+ was published by the IETF as an Internet Draft and was a competing proposal to the Hypertext Markup Language draft. It expired in July 1994. [40]
November 1994
The first draft (Revision 00) of HTML 2.0 (referred to as “HTML 2.0” starting with Revision 02 [42]) was published by the IETF [41], which ultimately led to the publication of RFC 1866 in November 1995. [43]
April 1995 (written in March 1995)
HTML 3.0 [44] was proposed as a standard to the IETF, but the proposal expired five months later (September 28, 1995) [45] without further action. It included many of the capabilities present in Raggett’s HTML+ proposal, such as support for tables, text flow around figures, and the display of complex mathematical formulas.
The W3C began development of its own Arena browser as a test platform for HTML3 and Cascading Style Sheets,[46][47][48] but HTML 3.0 was not successful for several reasons. The 150-page draft was considered too large, and the pace of browser development, as well as the number of interested parties, far exceeded the IETF’s resources.[14] Browser vendors at the time, including Microsoft and Netscape, implemented various subsets of the HTML3 draft features, as well as adding their own extensions.[14] (See Browser Wars.)
These included extensions to control stylistic aspects of documents, which “contradicted the academic engineering community’s belief that things like text color, background texture, font size, and font face were definitely outside the scope of a language when their sole purpose was to specify how a document would be organized.”[14] Dave Raggett, who has been a W3C Fellow for many years, has commented, for example: “To some extent, Microsoft built its business on the Web by extending HTML features.”[14]
January 2008
HTML5 was published as a working draft by the W3C. [49]
Although its syntax closely resembles SGML, HTML5 has abandoned any attempt to be an SGML application and has explicitly defined its own “html” serialization, in addition to an optional XML-based XHTML5 serialization. [50]
2011 HTML5 – Last Call
On February 14, 2011, the W3C extended the charter of its HTML Working Group with clear milestones for HTML5. In May 2011, the working group advanced HTML5 to “Last Call”, an invitation to communities inside and outside the W3C to confirm the technical soundness of the specification. The W3C developed a comprehensive test suite to achieve broad interoperability for the full specification by 2014, the target date for Recommendation.[51] In January 2011, the WHATWG renamed its “HTML5” living standard to simply “HTML”. Nevertheless, the W3C continued its project to release HTML5.[52]
2012 HTML5 – Candidate Recommendation
In July 2012, the WHATWG and W3C decided on a degree of separation. The W3C would continue the HTML5 specification work, focusing on a single definitive standard, which the WHATWG regards as a “snapshot”. The WHATWG would continue its work on HTML5 as a “living standard”. The concept of a living standard is that it is never complete and is always being updated and improved. New features can be added, but functionality will not be removed.[53]
In December 2012, the W3C designated HTML5 as a Candidate Recommendation.[54] The criterion for advancement to W3C Recommendation is “two 100% complete and fully interoperable implementations”.[55]
2014 HTML5 – Proposed Recommendation and Recommendation
In September 2014, the W3C moved HTML5 to a Proposed Recommendation.[56]
On October 28, 2014, HTML5 was released as a stable W3C Recommendation,[57] meaning the specification process is complete.[58]
XHTML Versions
XHTML is a separate language that began as a reformulation of HTML 4.01 using XML 1.0. It is now referred to as the XML syntax for HTML and is no longer being developed as a separate standard.[59]
XHTML 1.0 was published as a W3C Recommendation on January 26, 2000,[60] and was later revised and republished on August 1, 2002. It offers the same three variations as HTML 4.0 and 4.01, reformulated in XML with minor restrictions.
XHTML 1.1 [61] was published as a W3C Recommendation on May 31, 2001. It is based on XHTML 1.0 Strict, but includes minor changes, can be customized, and is reformulated using modules as described in the W3C Recommendation “Modularization of XHTML”, which was published on April 10, 2001. [62]
XHTML 2.0 was a working draft. Work on it was abandoned in 2009 in favor of work on HTML5 and XHTML 5. [63] [64] [65] XHTML 2.0 was incompatible with XHTML 1.x and, therefore, can be more accurately described as a new language inspired by XHTML rather than an update to XHTML 1.x.
Transition of HTML publication to WHATWG
On May 28, 2019, the W3C announced that the WHATWG would be the sole publisher of the HTML and DOM standards.[66][67][68][69] The W3C and WHATWG had been publishing competing standards since 2012. While the W3C standard was identical to the WHATWG one in 2007, the two have since progressively diverged because of different design decisions.[70] The WHATWG “Living Standard” had been the de facto web standard for some time.[71] The W3C periodically reviews the WHATWG HTML specification and publishes snapshots as W3C Recommendations.[72]
Markup
HTML markup consists of several key components, including tags (and their attributes), character-based data types, character references, and entity references. HTML tags most commonly come in pairs, such as <h1> and </h1>, although some represent empty elements and so are unpaired, for example <img>. The first tag in such a pair is the start tag and the second is the end tag (they are also called the opening tag and closing tag). Another important component is the HTML document type declaration, which triggers standards mode rendering.
The text between <html> and </html> describes the web page, and the text between <body> and </body> is the visible content of the page. The <title> element defines the browser page title shown on browser tabs and window titles, and the <div> tag defines a division of the page, used for easy styling. Between <head> and </head>, a <meta> element can be used to define webpage metadata. The document type declaration <!DOCTYPE html> is for HTML5. If no declaration is included, various browsers will fall back to “quirks mode” for rendering.[73]
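The pieces described above can be sketched as one minimal HTML5 document. This is only an illustration: the title text, the meta element, and the "Hello world!" paragraph are placeholder assumptions, not content taken from any particular page.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Metadata about the page; not part of the visible content -->
    <meta charset="utf-8">
    <title>This is a title</title>
  </head>
  <body>
    <!-- A division that groups content so it can be styled together -->
    <div>
      <p>Hello world!</p>
    </div>
  </body>
</html>
```

Saved as a file with the .html extension and opened in a browser, this should show a single paragraph, with the title text appearing on the browser tab.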
Elements
An HTML document contains a structure of nested HTML elements. These are represented within the document by HTML tags enclosed in angle brackets.[74]
Generally, the boundaries of an element are represented by two tags: a “start tag” and an “end tag.” If the element contains text, it is placed between these tags.
There may also be further tag markup between the start and end tags, including a mixture of tags and text. This indicates further (nested) elements, which are children of the parent element.
The start tag can also include element attributes. These represent other information, such as identifiers for sections within the document, identifiers used to bind style information to the document’s presentation, and for some tags, such as the img tag used to embed images, a reference to the image resource.
Some elements, such as the line break <br>, do not permit any embedded content, whether text or other tags. These require only a single empty tag (similar to a start tag) and do not use an end tag.
Many tags, especially the closing tag for the very commonly used paragraph element <p>, are optional. An HTML browser or other agent can infer the closing tag for the end of an element from the context and from the structural rules defined by the HTML standard. These rules are complex and not widely understood by most HTML authors.
The general form of an HTML element is <tag attribute1="value1" attribute2="value2">content</tag>. Some HTML elements are defined as empty elements and take the form <tag attribute1="value1" attribute2="value2">. An empty element cannot enclose any content; examples are the <br> tag and the inline <img> tag. The name of an HTML element is the name used in its tags. The name in the end tag is preceded by a slash character (/), and in empty elements the end tag is neither required nor allowed. If attributes are not specified, default values are used in each case.
Element examples
Headings
HTML headings are defined with the <h1> to <h6> tags, with h1 being the highest (or most important) level and h6 the least important.
CSS can significantly alter rendering.
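As a short sketch of the six heading levels (the heading text is placeholder wording):

```html
<h1>Heading level 1</h1>
<h2>Heading level 2</h2>
<h3>Heading level 3</h3>
<h4>Heading level 4</h4>
<h5>Heading level 5</h5>
<h6>Heading level 6</h6>
```

With default browser styles, the text typically gets smaller from h1 down to h6, although a stylesheet can override this.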
Paragraphs
Paragraphs are defined with the <p> tag.
Line breaks
The difference between <br> and <p> is that <br> breaks a line without altering the semantic structure of the page, whereas <p> divides the page into paragraphs. The element <br> is an empty element: although it may have attributes, it cannot take any content and must not have an end tag.
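The contrast between paragraphs and line breaks can be sketched as follows; the sentence text is placeholder wording.

```html
<!-- Two separate paragraphs: two structural units -->
<p>Paragraph 1</p>
<p>Paragraph 2</p>

<!-- One paragraph displayed on three lines: still a single structural unit -->
<p>This <br> is a paragraph <br> with line breaks</p>
```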
Links
The <a> tag is used to create a link. Its href attribute holds the URL address of the link.
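A minimal sketch of a link, using the reserved placeholder domain example.com as an assumed target:

```html
<!-- The text between the tags becomes the clickable link text -->
<a href="https://www.example.com/">A link to an example site</a>
```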
Comments
Comments, written between <!-- and -->, can help in understanding the markup and are not displayed on the webpage.
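For instance, in the following sketch only the paragraph would be rendered (both lines are placeholder text):

```html
<!-- This is a comment; the browser does not display it -->
<p>This paragraph is displayed.</p>
```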
Several types of markup elements are used in HTML:
Structural markup indicates the purpose of the text:
For example, <h2>Golf</h2> establishes “Golf” as a second-level heading. Structural markup does not imply any specific rendering, but most web browsers have default styles for element formatting. Content can be further styled using Cascading Style Sheets (CSS).[75]
Presentation markup indicates the formatting of text, regardless of its purpose:
For example, <b>bold text</b> indicates that visual output devices should render “bold text” in boldface, but gives little indication of what devices that are unable to do so (such as aural devices that read text aloud) should do.
In the case of both <b>bold text</b> and <i>italic text</i>, there are other elements that may have a similar visual appearance but are more semantic in nature, namely <strong>strong text</strong> and <em>emphasized text</em>, respectively. It is easier to see how an aural user agent should interpret the latter two elements. However, they are not equivalent to their presentational counterparts: for example, it would be undesirable for a screen reader to emphasize the name of a book, but on screen such a name would be in italics. Most presentational markup elements were deprecated in the HTML 4.0 specification in favor of using CSS for styling.
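The distinction can be sketched with two short paragraphs. The wording of the sentences is an assumption made for illustration.

```html
<!-- Presentational: states only how the text should look -->
<p>Click <b>Save</b> to keep your changes to <i>Moby-Dick</i>.</p>

<!-- Semantic: states what the text means; visual browsers usually
     still render strong as bold and em as italic -->
<p><strong>Warning:</strong> this action <em>cannot</em> be undone.</p>
```

A screen reader can reasonably stress the words marked with strong and em, while the book title in the first paragraph is merely italicized on screen.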
Hypertext markup turns parts of a document into links to other documents:
The anchor element, a, creates a hyperlink in the document, and its href attribute sets the link’s target URL. For example, the HTML markup <a href="https://en.wikipedia.org/">Wikipedia</a> renders the word “Wikipedia” as a hyperlink. To render an image as a hyperlink, an img element is inserted as content into the a element. Like br, img is an empty element: it has attributes but no content or closing tag.
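An image used as a hyperlink might look like the sketch below. The target URL, the file name image.gif, and the dimensions are hypothetical placeholders.

```html
<!-- The img element is the content of the a element, so the image is clickable -->
<a href="https://example.org/">
  <img src="image.gif" alt="descriptive text" width="50" height="50">
</a>
```

The alt text gives non-visual user agents something to present in place of the image.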
Attributes
Most of an element’s attributes are name-value pairs, with the name and value separated by = and written in the element’s start tag after the element name. Values can be enclosed in single or double quotation marks, although values consisting of certain characters can be left unquoted in HTML (but not in XHTML).[76][77] Leaving attribute values unquoted is considered unsafe.[78] In contrast with name-value pair attributes, some attributes affect an element simply by their presence in its start tag,[7] such as the ismap attribute for the img element.[79]
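The attribute forms described above can be sketched as follows. The field names, the /map path, and the file name map.png are hypothetical, chosen only for illustration.

```html
<!-- Name-value pairs: double-quoted, single-quoted, and unquoted values -->
<input type="text" name='username' maxlength=20>

<!-- A presence-only attribute: checked takes effect without any value -->
<input type="checkbox" name="subscribe" checked>

<!-- ismap on an img inside a link: click coordinates are sent to the server -->
<a href="/map"><img src="map.png" alt="Site map" ismap></a>
```

Quoting every value, as in the first two attributes, avoids the problems that unquoted values can cause when a value contains spaces or special characters.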
Several common attributes may appear across multiple elements:
The id attribute provides a document-wide unique identifier for an element. It is used to identify the element so that stylesheets can alter its presentational properties, and scripts can alter, animate, or delete its contents or presentation. Appended to the URL of the page as a fragment, it provides a globally unique identifier for the element, which is typically a subsection of the page. For example, a URL ending in #Attributes refers to the element with the ID “Attributes”.
The class attribute provides a way of classifying similar elements. It can be used for semantic or presentational purposes. For example, an HTML document might use the designation class="notation" to indicate that all elements with this class value are subordinate to the main text of the document.
In presentational terms, such elements might be gathered together and presented as footnotes on the page instead of appearing where they occur in the HTML source.
Class attributes are used semantically in microformats. Multiple class values can be specified; for example, class="notation important" puts the element into both the notation and the important classes.
An author can use the style attribute to assign presentational properties to a particular element. It is considered better practice to use an element’s id or class attributes to select the element from within a stylesheet, though sometimes this can be too cumbersome for simple, specific, or ad hoc styling.
The title attribute is used to attach a supplementary explanation to an element. In most browsers, this attribute is displayed as a tooltip.
The lang attribute identifies the natural language of the element’s content, which may differ from the rest of the document. For example, in an English-language document, a French phrase can be marked up with lang="fr".
The abbreviation element, abbr, can be used to demonstrate some of these attributes: <abbr id="anId" class="jargon" style="color:purple;" title="Hypertext Markup Language">HTML</abbr>
This example displays as HTML; in most browsers, hovering over the abbreviation should display the title text “Hypertext Markup Language.”
Most elements take a language-related attribute (dir) to specify text direction, such as “rtl” for right-to-left text in Arabic, Persian, or Hebrew. [80]
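The common attributes discussed in this section can be sketched together. All text content and the tooltip wording are illustrative assumptions, and the class names follow the notation and important examples used in the text.

```html
<!-- id: a unique identifier, reachable as a URL fragment (#Attributes) -->
<h2 id="Attributes">Attributes</h2>

<!-- class: two space-separated class values on one element -->
<p class="notation important">A note that is also marked important.</p>

<!-- style and title: inline presentation and a tooltip -->
<p style="color: purple;" title="Shown as a tooltip in most browsers">Styled text.</p>

<!-- lang: a French phrase inside an English-language document -->
<p>Oh well, <span lang="fr">c'est la vie</span>, as they say in France.</p>

<!-- dir: right-to-left text direction for an Arabic word -->
<p dir="rtl" lang="ar">مرحبا</p>
```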
Character and entity references

As of version 4.0, HTML defines a set of 252 character entity references and a set of 1,114,050 numeric character references, both of which allow individual characters to be written through simple markup rather than literally.
A literal character and its markup equivalent are considered equivalent and are rendered identically. The ability to “escape” characters in this way allows the characters < and & (when written as &lt; and &amp;, respectively) to be interpreted as character data rather than markup.
For example, a literal < normally indicates the start of a tag, and & normally indicates the start of a character entity reference or numeric character reference; writing & as &amp; or &#x26; or &#38; allows it to be included in the content of an element or in the value of an attribute.
The double-quote character ("), when not used to quote an attribute value, must also be escaped as &quot; or &#x22; or &#34; when it appears within an attribute value. Similarly, the single-quote character ('), when not used to quote an attribute value, must be escaped as &#x27; or &#39; (or as &apos; in HTML5 or XHTML documents[81][82]) when it appears within an attribute value. If document authors overlook the need to escape such characters, some browsers can be very forgiving and try to use context to guess their intent.
Escaping also allows characters that cannot be easily typed, or that are not available in the document’s character encoding, to be represented within element and attribute content. For example, the acute-accented e (é), a character typically found only on Western European and South American keyboards, can be written in any HTML document as the entity reference &eacute; or as the numeric references &#xE9; or &#233;, using characters that are available on all keyboards and supported in all character encodings. Unicode character encodings such as UTF-8 are compatible with all modern browsers and provide direct access to characters from almost all of the world’s writing systems.
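A few escapes of the kinds described above, as a rough sketch; the sentence text is placeholder wording.

```html
<!-- &lt; and &gt; display angle brackets instead of starting a tag -->
<p>Use &lt;p&gt; to start a paragraph.</p>

<!-- &amp; displays a literal ampersand -->
<p>Fish &amp; chips</p>

<!-- &quot; allows a double quote inside a double-quoted attribute value -->
<p title="She said &quot;hello&quot;">Hover to see a quoted title.</p>

<!-- Three equivalent ways to write the acute-accented e -->
<p>caf&eacute;, caf&#xE9;, caf&#233;</p>
```

In a document saved and served as UTF-8, the last line could also contain the literal character é.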
Data Types
HTML defines several data types for element content, such as script data and stylesheet data, and several types for attribute values, including IDs, names, URIs, numbers, length units, languages, media descriptors, colors, character encodings, dates and times, etc. All of these data types are special forms of character data.
Document Type Declaration
HTML documents must have a document type declaration (informally, “doctype”) at the beginning. In browsers, the doctype helps define the rendering mode—specifically, determining whether to use quirks mode.
The original purpose of the doctype was to enable the parsing and validation of HTML documents by SGML tools based on a Document Type Definition (DTD). The DTD to which the doctype refers contains a machine-readable grammar that specifies the permitted and prohibited content for a document conforming to that DTD. Browsers, on the other hand, do not implement HTML as an application of SGML and consequently do not read DTDs.
HTML5 does not define a DTD; therefore, the doctype declaration in HTML5 is simpler and shorter:[84] <!DOCTYPE html>
An example of an HTML 4 doctype is <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
This declaration refers to the DTD for the “Strict” version of HTML 4.01. SGML-based validators read the DTD to correctly parse and validate the document. In modern browsers, a valid doctype activates Standards Mode instead of Quirks Mode.
Additionally, HTML 4.01 provides Transitional and Frameset DTDs, as described below. The Transitional type is the most inclusive, including current tags as well as older or “deprecated” tags, while the Strict DTD excludes deprecated tags. The Frameset DTD contains all the tags needed to create frames on a page, as well as the tags included in the Transitional type.[85]
Delivery
HTML documents can be delivered by the same means as any other computer file. However, they are most often delivered either via HTTP from a web server or by email.
HTTP
The World Wide Web is primarily composed of HTML documents, which are sent from web servers to web browsers using the Hypertext Transfer Protocol (HTTP). However, in addition to HTML, HTTP is also used to serve images, sound, and other content.
To help web browsers understand how to handle each document they receive, other information is sent along with the document. This metadata typically includes the MIME type (e.g., text/html or application/xhtml+xml). In modern browsers, the MIME type sent with an HTML document can affect how the document is initially interpreted. A document sent with the XHTML MIME type is expected to be well-formed XML; syntax errors may cause the browser to fail to display it. The same document sent with the HTML MIME type may display successfully because some browsers are more lenient with HTML.
The W3C Recommendation states that XHTML 1.0 documents that follow the guidelines set out in Appendix C of the Recommendation may be labeled with either MIME type.[91] XHTML 1.1 also states that XHTML 1.1 documents[92] should be labeled with either MIME type.
HTML Email
Most graphical email clients allow the use of a subset of HTML (often ill-defined), which provides formatting and semantic markup not available in plain text. This can include typographical information such as colored headings, emphasized and quoted text, inline images, and diagrams.
Many such clients provide both a GUI editor for composing HTML email messages and a rendering engine for displaying them. The use of HTML in email is criticized by some because it causes compatibility issues, can facilitate phishing attacks, can be difficult to access for people with visual impairments or low vision, can confuse spam filters, and results in larger message sizes than plain text.
Naming Conventions
The most common filename extension for files containing HTML is .html. A common abbreviation of this is .htm, which arose because some early operating systems limited file extensions to three characters.[94]
HTML Application

An HTML application (HTA; file extension .hta) is a Microsoft Windows application that provides a graphical interface for an application in a browser using HTML and Dynamic HTML.
A regular HTML file is confined to the web browser’s security model, communicating only with web servers and manipulating only web page objects and site cookies. An HTA runs as a fully trusted application and therefore has more privileges, such as creating, editing, and deleting files and Windows registry entries. Because they operate outside the browser’s security model, HTAs cannot be executed via HTTP, but must be downloaded (like an EXE file) and executed from the local file system.
HTML4 Variations
From its inception, HTML and its associated protocols gained acceptance relatively quickly. However, no clear standards existed in the language’s early years. Although its creators originally conceived of HTML as a semantic language devoid of presentation details,[95] practical uses pushed many presentational elements and attributes into the language, driven largely by the various browser vendors. The latest standards surrounding HTML reflect efforts to overcome the sometimes chaotic development of the language[96] and to create a rational foundation for building meaningful and well-presented documents. To return HTML to its role as a semantic language, the W3C has developed style languages such as CSS and XSL to carry the burden of presentation. In conjunction, the HTML specification has gradually reined in the presentational elements.
There are two axes separating the various forms of HTML currently specified: SGML-based HTML versus XML-based HTML (called XHTML) on one axis, and strict versus transitional (loose) versus frameset on the other axis.
SGML-based versus XML-based HTML
One difference in the latest HTML specifications lies in the distinction between the SGML-based specification and the XML-based specification. The XML-based specification is usually called XHTML to distinguish it clearly from the more traditional definition. However, the root element name continues to be “html” even in XHTML-specified HTML. The W3C intended XHTML 1.0 to be identical to HTML 4.01, except where limitations of XML over the more complex SGML require workarounds. Because XHTML and HTML are closely related, they are sometimes documented in parallel. In such circumstances, some authors combine the two names as (X)HTML or X(HTML).
Like HTML 4.01, XHTML 1.0 has three sub-specifications: Strict, Transitional, and Frameset.
Aside from the different opening declarations for a document, the differences between an HTML 4.01 document and an XHTML 1.0 document, in each of the corresponding DTDs, are primarily syntactic.
The underlying syntax of HTML allows many shortcuts that XHTML does not, such as elements with optional opening or closing tags, and even empty elements, which must not have an end tag.
In contrast, XHTML requires all elements to have an opening tag and a closing tag. However, XHTML also introduces a new shortcut: an XHTML tag may be opened and closed within the same tag by including a slash before the end of the tag, like this: <br/>. The introduction of this shorthand, which is not used in the SGML declaration for HTML 4.01, may confuse older software unfamiliar with this new convention. A fix for this is to include a space before the slash that closes the tag, like this: <br />.
To understand the subtle differences between HTML and XHTML, consider converting a valid and well-formed XHTML 1.0 document that follows Appendix C (see below) into a valid HTML 4.01 document. This conversion requires the following steps:
Specify the language of an element with the lang attribute rather than the XHTML xml:lang attribute. XHTML uses XML’s built-in language-defining attribute.
Remove the XML namespace (xmlns=URI). HTML has no facility for namespaces.
Change the document type declaration from XHTML 1.0 to HTML 4.01.
If present, remove the XML declaration. Typically, it looks like this: <?xml version="1.0" encoding="utf-8"?>.
Ensure the document’s MIME type is set to text/html. For both HTML and XHTML, this comes from the HTTP Content-Type header sent by the server.
Change the XML empty-element syntax to HTML-style empty elements (<br /> to <br>).
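The result of the steps above can be sketched as a small HTML 4.01 Strict document, with comments noting what each line would have looked like in the XHTML 1.0 original. The page title and paragraph text are placeholder assumptions.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<!-- Before: an optional XML declaration and the XHTML 1.0 Strict doctype -->
<html lang="en">
<!-- Before: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> -->
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <!-- Before: the same meta element, closed with " />" -->
    <title>Converted page</title>
  </head>
  <body>
    <p>First line<br>second line</p>
    <!-- Before: <br /> in place of <br> -->
  </body>
</html>
```

The server would also need to send the document with the text/html MIME type, which cannot be shown in the markup itself.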
These are the main changes required to translate a document from XHTML 1.0 to HTML 4.01. Translating from HTML to XHTML would also require adding any omitted opening or closing tags. Whether coding in HTML or XHTML, it may be best always to include the optional tags in the document rather than trying to remember which tags can be omitted.
A well-formed XHTML document follows all the syntactic requirements of XML. The W3C recommends several conventions to ease migration between HTML and XHTML; the following apply to XHTML 1.0 documents only:
Include both the xml:lang and lang attributes on any element that specifies a language.
Use empty-element syntax only for elements specified as empty in HTML.
Include an extra space in empty-element tags: for example, <br /> instead of <br/>.
Include explicit close tags for elements that allow content but are left empty (for example, <div></div>, not <div />).
Omit the XML declaration.
By carefully following the W3C’s compatibility guidelines, a user agent should be able to interpret the document equally well as either HTML or XHTML. For XHTML 1.0 documents that have been made compatible in this way, the W3C allows them to be served as HTML (with the text/html MIME type) or XHTML (with the application/xhtml+xml MIME type). When served as XHTML, the browser must use an XML parser that strictly adheres to the XML specifications to parse the document’s content.
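An XHTML 1.0 Strict document that follows the conventions listed above might look like this sketch; the title and body text are placeholder assumptions.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- No XML declaration precedes the doctype -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <!-- Empty elements use a space before the closing slash -->
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Compatible page</title>
  </head>
  <body>
    <p>First line<br />second line</p>
    <!-- An element that permits content gets an explicit end tag, even when empty -->
    <div></div>
  </body>
</html>
```

Written this way, the same file is intended to be readable by an HTML parser when served as text/html and by an XML parser when served as application/xhtml+xml.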
Transitional vs. Strict
HTML 4 defined three different versions of the language: Strict, Transitional (formerly called Loose), and Frameset. The Strict version is for new documents and is considered best practice, while the Transitional and Frameset versions were developed to easily transition documents that followed older HTML specifications to the HTML 4 version. The Transitional and Frameset versions allow presentational markup, which is not included in the Strict version. Instead, Cascading Style Sheets are encouraged to improve the presentation of HTML documents. Since XHTML 1 only defines XML syntax for the language defined by HTML 4, the same differences apply to XHTML 1.
The Transitional version allows the following parts of the vocabulary, which are not included in the Strict version:
A looser content model
Inline elements and plain text are allowed directly in body, blockquote, form, noscript, and noframes.
Presentation-related elements
Underline (u) (deprecated; it can confuse visitors with hyperlinks)
Strike-through (s)
center (deprecated; use CSS instead)
font (deprecated; use CSS instead)
basefont (deprecated; use CSS instead)
Presentation-related attributes
The background and bgcolor attributes (both deprecated; use CSS instead) on the body element, which the W3C specifies as a required element.
The align attribute (deprecated; use CSS instead) on div, form, paragraph (p), and heading (h1…h6) elements.
The align, noshade, size, and width attributes (all deprecated; use CSS instead) on the hr element.
The align (deprecated; use CSS instead), border, vspace, and hspace attributes on the img and object elements.
The align attribute (deprecated; use CSS instead) on the legend and caption elements.
The align and bgcolor attributes (both deprecated; use CSS instead) on the table element.
The nowrap and bgcolor attributes (both deprecated; use CSS instead) and the width and height attributes on the td and th elements.
The bgcolor attribute (deprecated; use CSS instead) on the tr element.
The clear attribute (obsolete) on the br element.
The compact attribute on the dl, dir, and menu elements.
The type, compact, and start attributes (all deprecated; use CSS instead) on the ol and ul elements.
The type and value attributes on the li element.
The width attribute on the pre element.
Additional elements in the Transitional specification
menu (deprecated; use CSS instead)
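The move from Transitional-style presentational markup to Strict markup with a stylesheet can be sketched as follows. The colors, sizes, class name, and text are illustrative assumptions, and the comments show the Transitional equivalents being replaced.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <title>Strict markup with CSS</title>
    <style type="text/css">
      /* Replaces <body bgcolor="#ffffff"> */
      body { background-color: #ffffff; }
      /* Replaces <h1 align="center"> */
      h1 { text-align: center; }
      /* Replaces <font color="#cc0000" size="+1"> */
      .note { color: #cc0000; font-size: 120%; }
    </style>
  </head>
  <body>
    <h1>Page heading</h1>
    <p class="note">A highlighted note.</p>
  </body>
</html>
```

Keeping the presentation in the style element leaves the body with structural markup only, which is what the Strict version requires.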
