That seems to be scoop processing.

The way that it is encoding the Unicode is with numeric character entities, such as "&#22885;" for 奥.
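A minimal Python sketch of that kind of encoding. It uses the standard library's "xmlcharrefreplace" error handler, and the helper name to_numeric_entities is made up for illustration. It shows the form of the output only, not how scoop itself produces it.

```python
def to_numeric_entities(text: str) -> str:
    # Every character outside ASCII is replaced by &#<decimal code point>;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("奥"))    # &#22885;
print(ord("奥"), hex(ord("奥")))     # 22885 0x5965
```

ASCII text passes through unchanged, so only the non-ASCII characters turn into entities.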

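However, the "159;" is missing the leading &# and the value is much too low ... it's one less than 160 (U+00A0, the no-break space), where the printable upper half of Latin-1 begins. 159 itself (U+009F) is a C1 control code, not a printable character.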
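A quick check of the two neighbouring values using Python's standard unicodedata module. The helper describe is hypothetical and just reports the Unicode general category ("Cc" means control character, "Zs" means space separator).

```python
import unicodedata


def describe(cp: int) -> tuple:
    # (decimal value, hex value, Unicode general category)
    return cp, hex(cp), unicodedata.category(chr(cp))


for cp in (159, 160):
    print(*describe(cp))
# 159 0x9f Cc
# 160 0xa0 Zs
```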

Unicode was originally 16-bit (UCS-2), but that was found not to provide enough code points, so the 16-bit range was redefined as the first "plane" (the Basic Multilingual Plane) of a larger code space that runs up to U+10FFFF and is stored at 32 bits per character in UCS-4.
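A sketch comparing a character inside the first plane with one outside it. U+20000, the first character of CJK Extension B, is chosen as an example, and the helper name forms is hypothetical. In UTF-16, the successor to UCS-2, a character beyond U+FFFF needs two 16-bit units (a surrogate pair). In UTF-32/UCS-4 every character is a single 32-bit unit.

```python
def forms(ch: str) -> tuple:
    cp = ord(ch)
    fits_in_16_bits = cp <= 0xFFFF  # representable as one UCS-2 unit?
    return (
        f"U+{cp:04X}",
        fits_in_16_bits,
        ch.encode("utf-16-be").hex(),  # one or two 16-bit units
        ch.encode("utf-32-be").hex(),  # always one 32-bit unit
    )


print(forms("奥"))           # ('U+5965', True, '5965', '00005965')
print(forms("\U00020000"))   # ('U+20000', False, 'd840dc00', '00020000')
```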

Chinese is one of the scripts that spills over outside of UCS-2 (CJK Extension B, for example, sits in plane 2 starting at U+20000). So it may be that scoop stores characters in a 16-bit integer and it's being messed up by one of the Chinese characters beyond the first plane.
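A sketch of that hypothesis only. How scoop actually stores characters is not known here, and store_in_16_bits is a made-up name that models an unsigned 16-bit field silently dropping the high bits of a code point.

```python
def store_in_16_bits(cp: int) -> int:
    # Hypothetical 16-bit character field: only the low 16 bits survive
    return cp & 0xFFFF


for cp in (0x5965, 0x20000, 0x2009F):
    print(f"U+{cp:04X} -> {store_in_16_bits(cp)}")
# U+5965 -> 22885
# U+20000 -> 0
# U+2009F -> 159
```

Characters in the first plane come through intact, while anything beyond it is mangled. A CJK Extension B character such as U+2009F would come out as 159, but the arithmetic only shows the hypothesis is consistent with that value, not that it is the cause.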

I've been accused of being a Marxist, yet while Harpo's my favourite, it's Groucho I'm always quoting. Odd, that.

by BruceMcF (agila61 at netscape dot net) on Sun Feb 21st, 2010 at 10:58:37 PM EST