Tuesday, July 27, 2004

When Does Cake Expire

encoding: a look at Jena2

Since the most recent posts have focused on encoding issues, I would like to say something about the new (almost a year old :)) version of the HP Jena framework, version 2.1.
I had looked at the old version and have now upgraded to the new one, and one of the changes that struck me has to do precisely with encoding.
http://jena.sourceforge.net/IO/iohowto.html

It is interesting to read the following:

"What went wrong with character encoding?

The java.io.* classes based around Reader's and Writer's are intended to help us avoid encoding problems. The encoding attribute in the XML declaration at the top of an XML document is intended to help us avoid encoding problems. Unfortunately, these are two different approaches; and Jena1 went with the Java conventions, whereas the Web scalable conventions are those used by XML.

The Java approach is that the machine on which the Java is running has some default encoding. I/O done with FileReader's and PrintWriter's etc, then is done using that encoding, unless there is a specific user instruction when the Reader or Writer is created. It is not possible to change the encoding used by a Reader or Writer while it is being used.

The XML approach is that XML documents are in UTF-8 or UTF-16 unless they say otherwise in the first line of the document (this first line is sufficiently restricted to make it possible to read it without knowing the encoding). Hence, an XML reader should start by looking at the first few bytes and work out from those whether it is UTF-8 UTF-16 or some other encoding as declared in the first line. From then on, it uses that encoding.


And here is the most important passage:

The Java approach is designed for ease of use on a single machine, which uses a single encoding; often being a one-byte encoding, e.g. for European languages which do not need thousands of different characters.

The XML approach is designed for the Web which uses multiple encodings, and some of them requiring thousands of characters."
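To make the Java side of the quoted passage concrete, here is a minimal sketch using only the JDK (no Jena). The class name ReaderEncodingSketch and the helper decode are my own, made up for illustration. It shows that a Reader gets its encoding once, when it is created, and then applies it to every byte it sees, right or wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jena or of the quoted how-to.
final class ReaderEncodingSketch {

    // Decode bytes through a Reader whose charset is fixed at construction time.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caffè" written out as UTF-8: the è takes two bytes.
        byte[] utf8 = "caff\u00e8".getBytes(StandardCharsets.UTF_8);

        // What a Reader would use if no charset were given explicitly.
        System.out.println("platform default: " + Charset.defaultCharset());

        // Same bytes, two Readers, two different results.
        System.out.println("read as UTF-8:      " + decode(utf8, StandardCharsets.UTF_8));
        System.out.println("read as ISO-8859-1: " + decode(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

Read with the right charset the word comes back intact; read as ISO-8859-1 the two bytes of the è turn into two separate characters ("caffÃ¨"). Nothing in the byte stream itself tells the Reader which of the two is correct.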


In practice, the dilemma is now resolved: the two are simply different designs suited to different purposes.
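For the XML side, a small sketch of what "the document says which encoding it uses" looks like in practice. Again this is plain JDK (the standard javax.xml.parsers API), not Jena, and the names XmlEncodingSketch and rootText are hypothetical. The point is that the parser is handed raw bytes rather than a Reader, so it can read the encoding from the XML declaration itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of Jena or of the quoted how-to.
final class XmlEncodingSketch {

    // Parse raw bytes and return the text of the root element.
    // No charset is passed in: the parser works it out from the document.
    static String rootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String latin1Doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><w>caff\u00e8</w>";
        String utf8Doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><w>caff\u00e8</w>";

        // Each document is serialized in the encoding it declares.
        byte[] latin1Bytes = latin1Doc.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8Bytes = utf8Doc.getBytes(StandardCharsets.UTF_8);

        // Different bytes on the wire, same text after parsing.
        System.out.println(rootText(latin1Bytes));
        System.out.println(rootText(utf8Bytes));
        System.out.println(rootText(latin1Bytes).equals(rootText(utf8Bytes)));
    }
}
```

The two byte arrays differ (the è is one byte in the first, two in the second), yet both parse to the same string, because the choice of encoding travels with the document instead of depending on the machine that reads it.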
