bennypass.com - CHARACTER CODING US-ASCII

CHARACTER CODING

US-ASCII - CP850 - ISO-8859-1 - CP1252

The abbreviation "ASCII" stands for "American Standard Code for Information Interchange". It was proposed in 1963 by the standards body now known as ANSI (American National Standards Institute) and became final in 1968.

The ASCII code was devised for communication between teletypes (which is why it contains command codes that look almost incomprehensible today but served a purpose at the time) and gradually became a worldwide standard. It is a 7-bit encoding; to distinguish it from the 8-bit extensions proposed later, it came to be called US-ASCII. Initially the eighth bit of each byte, which US-ASCII leaves unused, served as a parity bit for detecting transmission errors.
The original ASCII table therefore defines 128 characters, 33 of which are non-printable and usually called control characters:

Binary	Dec	Hex	Abbr	C	Description
000 0000	0	00	NUL	\0	Null character
000 0001	1	01	SOH		Start of Header
000 0010	2	02	STX		Start of Text
000 0011	3	03	ETX		End of Text
000 0100	4	04	EOT		End of Transmission
000 0101	5	05	ENQ		Enquiry
000 0110	6	06	ACK		Acknowledgment
000 0111	7	07	BEL	\a	Bell
000 1000	8	08	BS	\b	Backspace
000 1001	9	09	HT	\t	Horizontal Tab
000 1010	10	0A	LF	\n	Line feed
000 1011	11	0B	VT	\v	Vertical Tab
000 1100	12	0C	FF	\f	Form feed
000 1101	13	0D	CR	\r	Carriage return
000 1110	14	0E	SO		Shift Out
000 1111	15	0F	SI		Shift In
001 0000	16	10	DLE		Data Link Escape
001 0001	17	11	DC1		Device Control 1 (oft. XON)
001 0010	18	12	DC2		Device Control 2
001 0011	19	13	DC3		Device Control 3 (oft. XOFF)
001 0100	20	14	DC4		Device Control 4
001 0101	21	15	NAK		Negative Acknowledgement
001 0110	22	16	SYN		Synchronous Idle
001 0111	23	17	ETB		End of Trans. Block
001 1000	24	18	CAN		Cancel
001 1001	25	19	EM		End of Medium
001 1010	26	1A	SUB		Substitute
001 1011	27	1B	ESC	\e	Escape
001 1100	28	1C	FS		File Separator
001 1101	29	1D	GS		Group Separator
001 1110	30	1E	RS		Record Separator
001 1111	31	1F	US		Unit Separator
111 1111	127	7F	DEL		Delete
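
As a rough illustration of the two points made above (33 control codes out of 128, and the eighth bit used for parity), here is a minimal Python sketch. The function names `is_control` and `with_even_parity` are hypothetical helpers invented for this example, and even parity is only one of the conventions that were used on serial links.

```python
# Hypothetical helpers for illustration; they are not part of any standard.

def is_control(code: int) -> bool:
    """True for the non-printable US-ASCII codes: 0-31 and 127 (DEL)."""
    return 0 <= code <= 31 or code == 127


def with_even_parity(code: int) -> int:
    """Return an 8-bit value whose top bit makes the number of 1 bits even."""
    if not 0 <= code <= 127:
        raise ValueError("US-ASCII codes are 7-bit (0-127)")
    parity = bin(code).count("1") % 2  # 1 when the 7-bit code has an odd number of 1 bits
    return code | (parity << 7)


control_codes = [code for code in range(128) if is_control(code)]
print(len(control_codes))        # 33 control characters
print(128 - len(control_codes))  # 95 printable characters

# 'A' is 1000001 (two 1 bits), so the parity bit stays 0.
print(format(with_even_parity(ord("A")), "08b"))  # 01000001
# 'C' is 1000011 (three 1 bits), so the parity bit is set to 1.
print(format(with_even_parity(ord("C")), "08b"))  # 11000011
```

A receiver using the same convention would count the 1 bits of each incoming byte and flag the byte as corrupted when the count is odd.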

The US-ASCII encoding thus gives alphanumeric characters, punctuation marks and other symbols a numeric representation. A numeric code is necessary because a computer can only "understand" sequences of bits. For example, the "@" character is represented by ASCII code 64, "Y" by 89, "+" by 43, and so on.
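
The mapping between characters and numbers can be checked with a few lines of Python. This is only a small sketch using the built-in `ord`, `chr` and `str.encode`; the variable names are chosen for the example.

```python
# Look up the numeric code of each sample character and show its 7-bit pattern.
samples = "@Y+"
codes = [ord(ch) for ch in samples]

for ch, code in zip(samples, codes):
    print(ch, code, format(code, "07b"))
# @ 64 1000000
# Y 89 1011001
# + 43 0101011

# Encoding the text as ASCII yields exactly those numbers as bytes,
# and chr() maps a code back to its character.
encoded = samples.encode("ascii")
print(list(encoded))  # [64, 89, 43]
print(chr(64))        # @
```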

When someone asks for information in ASCII format (for example your resume or an article), they want plain text saved in a standard form that any operating system and program can read easily.
The ASCII format is recognized by virtually all computers, which is not the case for "formatted" text, i.e. text that carries typographic features such as underlining, styles or bold.

Below is the list of printable US-ASCII characters:

[Image: table of the printable US-ASCII characters]
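
The printable characters can also be listed programmatically. The sketch below assumes only that the printable range is codes 32 (space) through 126 (tilde), i.e. everything that is not one of the 33 control codes; the 16-per-row layout is an arbitrary choice for this example.

```python
# Codes 32..126 are the 95 printable US-ASCII characters (32 is the space).
printable = "".join(chr(code) for code in range(32, 127))
print(len(printable))  # 95

# Print them 16 per row, each row prefixed with the code of its first character.
for start in range(32, 127, 16):
    row = range(start, min(start + 16, 127))
    print(f"{start:3d}  " + " ".join(chr(code) for code in row))
```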

Because natural languages use far more symbols than US-ASCII encodes, the character set had to be extended. The various extensions added 128 characters, encoded by using the eighth bit available in each byte.

IBM then introduced 8-bit encodings on its IBM PCs, with variants for different countries. The IBM encodings were ASCII-compatible, since the first 128 characters of each set kept their original US-ASCII values. The various encodings were organized into "code pages".
The code pages differed in the additional 128 characters encoded with the eighth bit set. PCs built for North America used code page 437, those for Greece code page 737, and those for Italy and France code page 850.
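
To make the idea concrete, the following sketch decodes the same bytes with Python's built-in `cp437`, `cp737` and `cp850` codecs. The sample byte 0x9B is an arbitrary choice for this example; any byte on which the three pages disagree would do.

```python
import unicodedata

CODE_PAGES = ["cp437", "cp737", "cp850"]

# The lower half (0x00-0x7F) decodes identically under every code page.
low_half = bytes(range(128))
for name in CODE_PAGES:
    assert low_half.decode(name) == low_half.decode("ascii")

# A byte with the eighth bit set means something different on each page.
sample = bytes([0x9B])
decoded = {name: sample.decode(name) for name in CODE_PAGES}
for name, ch in decoded.items():
    print(name, ch, unicodedata.name(ch))
```

This is why a text file written on a machine using one code page shows the wrong accented letters or symbols when opened under another: the bytes are unchanged, only their interpretation differs.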

To see the active code page in DOS, use the chcp command. Here is the set of characters (excluding the US-ASCII ones) of code page 850:

[Image: table of the code page 850 characters above the US-ASCII range]

Following the proliferation of proprietary encodings, ISO released a standard called ISO/IEC 8859, which defines 8-bit extensions of the ASCII set. The most important part was ISO/IEC 8859-1, also called Latin-1, containing the characters for the languages of Western Europe. To be precise, this specification encodes 191 graphic characters.

A special feature of ISO/IEC 8859 compared with other extended character sets is that codes 128 to 159, whose lower 7 bits correspond to ASCII control characters, are left unused to avoid compatibility problems.

The codes 00-1F and 7F-9F are therefore not assigned to any character by ISO / IEC 8859-1.
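
The arithmetic behind these ranges can be verified with a short sketch. The range names below are labels chosen for this example; the only assumption is the pair of unassigned ranges stated above (00-1F and 7F-9F).

```python
# Ranges that ISO/IEC 8859-1 leaves unassigned, as stated above.
C0_RANGE = range(0x00, 0x20)          # 00-1F
DEL_AND_HIGH_RANGE = range(0x7F, 0xA0)  # 7F-9F

graphic = [
    code for code in range(256)
    if code not in C0_RANGE and code not in DEL_AND_HIGH_RANGE
]
print(len(graphic))  # 191 (256 - 32 - 33)

# Clearing the eighth bit of any code in 128..159 lands on an ASCII control code 0..31.
mirrors_controls = all((code & 0x7F) in C0_RANGE for code in range(128, 160))
print(mirrors_controls)  # True
```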

[Image: ISO/IEC 8859-1 character table]

The ISO/IEC 8859 standard is the starting point for the ISO-8859-1 and Windows-1252 encodings. Both are supersets of ISO/IEC 8859-1: they add other characters to its 191 standard graphic characters.

Under the original HTTP/1.1 specification, ISO-8859-1 is the default encoding of documents, HTML included, delivered over HTTP with a MIME type of the "text/*" family. Many browsers and mail clients interpret ISO-8859-1 as Windows-1252 in order to fix some encoding errors, but strictly speaking this is not correct behavior and is therefore to be avoided (by those who develop browsers).

[Image: ISO-8859-1 character table]

Windows-1252 was created by Microsoft as a character set compatible with ISO 8859-1 and was used as the default encoding in Western European versions of Windows. Windows-1252 matches ISO-8859-1 in the ranges 0x00 to 0x7F and 0xA0 to 0xFF, but not in the range 0x80 to 0x9F.

[Image: Windows-1252 character table]
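
The difference between the two encodings can be located by decoding every possible byte with both. This is a minimal sketch using Python's built-in `latin-1` and `cp1252` codecs; note that Python's `cp1252` codec leaves five bytes in the 0x80-0x9F range undefined, so the example passes `errors="replace"` to avoid an exception on those bytes.

```python
# Collect every byte value that decodes differently under the two encodings.
differences = []
for byte in range(256):
    raw = bytes([byte])
    as_latin1 = raw.decode("latin-1")
    as_cp1252 = raw.decode("cp1252", errors="replace")  # undefined bytes become U+FFFD
    if as_latin1 != as_cp1252:
        differences.append(byte)

print(hex(min(differences)), hex(max(differences)), len(differences))
# 0x80 0x9f 32

# Example: byte 0x80 is the euro sign in Windows-1252
# but an invisible control code (U+0080) in ISO-8859-1.
euro = bytes([0x80]).decode("cp1252")
print(euro, hex(ord(bytes([0x80]).decode("latin-1"))))  # € 0x80
```

This is the practical reason mislabeled text matters: curly quotes and the euro sign written as Windows-1252 turn into control codes when the bytes are read strictly as ISO-8859-1.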

A new encoding called Unicode was developed in 1991 to encode more characters in a standard way and to allow multiple extended character sets (e.g. Greek and Cyrillic) to be used in a single document; it is now widely used. Initially it provided for 65,536 characters (code points); it was later extended to 1,114,112 (= 2^20 + 2^16), of which about 101,000 had been assigned at the time of writing. The first 256 code points match those of ISO 8859-1 exactly. Most codes are used for languages such as Chinese, Japanese and Korean. The complete list of Unicode tables is available at: http://www.unicode.org/charts/
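
Two of the figures above can be checked directly in Python 3, whose strings are sequences of Unicode code points. The sketch relies on `sys.maxunicode`, which holds the largest code point the interpreter supports.

```python
import sys

# Size of the Unicode code space: 2^20 + 2^16 code points.
total = 2**20 + 2**16
print(total)                       # 1114112
print(sys.maxunicode + 1 == total)  # True (largest code point is 0x10FFFF)

# Decoding each byte 0..255 as ISO-8859-1 gives the code point with the same number,
# which is what "the first 256 code points match ISO 8859-1" means in practice.
latin1_text = bytes(range(256)).decode("latin-1")
same_numbering = all(ord(ch) == i for i, ch in enumerate(latin1_text))
print(same_numbering)  # True
```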
