Welcome to European Tribune. It's gone a bit quiet around here these days, but it's still going.
It's mostly okay, but for some reason, some characters (or maybe character combinations) get corrupted and show up with numbers and weird symbols.

For example, in the first line of the Chinese text in the diary, do you see the "340;" as follows:

奥巴马总统本周与达赖喇嘛 340;会面充满争议

Or do you see a boxed symbol followed by 159; following it approximately in the second line as follows:

中国表示,西藏 159;

Those numbers shouldn't be appearing.  (I am guessing they are Unicode numbers for certain characters, but I don't see why the glyphs themselves are not getting displayed and are throwing up the Unicode numbers instead.)

Partially to see what would happen on DailyKos, I put this up as a diary over there, and while it looked fine in Preview mode, the same problem appears once the diary is published.  Since it also still runs on Scoop (I think), I am betting this is an issue with Scoop or a configuration setting on Scoop.

(Since I can't recall seeing this problem on other websites, I am tentatively ruling out issues with my own computer's configuration.  But if no one else sees these numbers cropping up, then I'll double-check.)

The march of civilizations is a series of defenses that man has put up against the dread of pure existence.

by marco on Sun Feb 21st, 2010 at 01:45:21 PM EST
Right, there are numbers as you show.
by afew (afew(a in a circle)eurotrib_dot_com) on Sun Feb 21st, 2010 at 02:19:49 PM EST
Thanks for checking that out.

The march of civilizations is a series of defenses that man has put up against the dread of pure existence.
by marco on Sun Feb 21st, 2010 at 02:28:33 PM EST
The source looks like this:

The numeric codes at the ends of lines are getting chopped, and their second part is displayed as plain numerals (see the beginning of every other line).

Perhaps the table width is doing it, but I don't know how to overcome that.

by afew (afew(a in a circle)eurotrib_dot_com) on Sun Feb 21st, 2010 at 02:29:35 PM EST
OK, managed to edit the line breaks.

Tibet found in translation.

by afew (afew(a in a circle)eurotrib_dot_com) on Sun Feb 21st, 2010 at 02:32:38 PM EST
Wow!  You rock, afew!

The march of civilizations is a series of defenses that man has put up against the dread of pure existence.
by marco on Sun Feb 21st, 2010 at 02:36:40 PM EST
nah, nah, nah! <attempts rocking bow, falls over>

I expect it's tribext's Translate function's line break management that can't find a space anywhere in the long line of codes, and so puts a break in the middle of a code.
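
To illustrate that guess, here is a minimal Python sketch. It is not tribext's actual code, which I haven't seen; `naive_wrap` and `wrap_keeping_refs_whole` are hypothetical names made up for the example. The first function breaks a space-free line at a fixed width and so cuts codes in half; the second only breaks between complete codes.

```python
import re

def naive_wrap(text: str, width: int) -> list[str]:
    """Break every `width` characters, ignoring what is being cut."""
    return [text[i:i + width] for i in range(0, len(text), width)]

def wrap_keeping_refs_whole(text: str, width: int) -> list[str]:
    """Break only between tokens, where a token is a whole &#NNNNN; code
    or a single other character. A code longer than `width` gets a line
    of its own rather than being split."""
    lines, current = [], ""
    for token in re.findall(r"&#\d+;|.", text, flags=re.S):
        if current and len(current) + len(token) > width:
            lines.append(current)
            current = ""
        current += token
    if current:
        lines.append(current)
    return lines

# A long run of codes with no spaces, as in the diary source.
codes = "".join(f"&#{ord(ch)};" for ch in "奥巴马")

chopped = naive_wrap(codes, 10)             # some lines start mid-code
intact = wrap_keeping_refs_whole(codes, 10)  # every line holds whole codes
```

With the naive version, the leftover digits at the start of a line are exactly the stray numerals that show up on the page.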

For you, the hassle is to check that out in the editing window before posting. Make sure the line breaks happen after a semi-colon so no odd numerals are carried over to the next line. Lines should begin with an ampersand. So if you see

............................&#35
340;&#26548;......................

place the cursor at the end of the first line and use Delete (or at the beginning of the second line and use Backspace), so the code joins up again:

............................&#35340;&#26548;........
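
If doing that by hand gets tedious, the same repair could be scripted before pasting the text into the editing window. This is only a sketch in Python, assuming the only damage is a line break inside a `&#...;` code; `rejoin_split_refs` is a name invented here, not a Scoop or tribext function.

```python
import re

# A line break that landed inside a numeric code: "&#" plus zero or more
# digits, then the break, then the remaining digits and the semi-colon.
_SPLIT_REF = re.compile(r"(&#\d*)\r?\n(\d*;)")

def rejoin_split_refs(text: str) -> str:
    """Remove line breaks that fall inside a code such as &#35340;."""
    return _SPLIT_REF.sub(r"\1\2", text)

broken = "...&#35\n340;&#26548;..."
fixed = rejoin_split_refs(broken)
```

Line breaks that already fall after a semi-colon are left alone, so running it on clean text changes nothing.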

by afew (afew(a in a circle)eurotrib_dot_com) on Sun Feb 21st, 2010 at 02:55:31 PM EST
Interestingly, this solution does not seem to work on DailyKos: the bad line-breaks just keep getting reinserted when the diary is re-published -- even though everything looks fine in preview mode.  Whatever:  my main concern was figuring out how to get simplified Chinese characters working on ET, and even if it's a bit of work, at least we have a solution.  Thanks very much.

The march of civilizations is a series of defenses that man has put up against the dread of pure existence.
by marco on Sun Feb 21st, 2010 at 02:51:21 PM EST
Spurious line breaks?

Always use HTML formatted when copying material from other sites.

En un viejo país ineficiente, algo así como España entre dos guerras civiles, poseer una casa y poca hacienda y memoria ninguna. -- Gil de Biedma

by Migeru (migeru at eurotrib dot com) on Sun Feb 21st, 2010 at 03:53:45 PM EST
That seems to be scoop processing.

It is encoding the Unicode characters as numeric entities, such as &#22885; for 奥.

However, the "159;" is missing the leading &# and the value is much too low ... it's one less than 160, where the printable Latin-1 code-points start.
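
A quick way to check both points, as a small Python sketch using only the standard library (the variable names are just for illustration):

```python
import html
import unicodedata

# Encode a character as a decimal numeric entity, and decode it again.
encoded = "奥".encode("ascii", "xmlcharrefreplace").decode("ascii")
decoded = html.unescape(encoded)

# 159 (0x9F) is the last of the C1 control codes; 160 (0xA0) is the first
# printable character of the Latin-1 supplement.
category_159 = unicodedata.category(chr(159))  # "Cc" means control character
name_160 = unicodedata.name(chr(160))
```

So a bare 159 would not be a displayable character in its own right, which fits with it being a fragment of something longer.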

Unicode was originally 16-bit (UCS-2), but that was found not to provide enough character code points, so it was redefined to be the first 16-bit "page" (the Basic Multilingual Plane) of multiple pages in the 32-bit UCS-4.

Chinese is one of the character sets that spills over outside of UCS-2, so it may be that scoop has a 16-bit integer for characters and it's being messed up by one of the Chinese characters in the next page of UCS-4.
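
That guess can be tested against the text itself. The Python sketch below (with `outside_bmp` as a made-up helper name) lists any characters whose code point does not fit in 16 bits. The sample is the visible part of the first line quoted upthread, and as far as I can tell all of it sits inside the first 16-bit page, so the overflow theory may not apply to this particular diary.

```python
def outside_bmp(text: str) -> list[str]:
    """Return the characters whose code point needs more than 16 bits."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

# Visible characters from the first line of the diary, as quoted above.
sample = "奥巴马总统本周与达赖喇嘛会面充满争议"
beyond_16_bit = outside_bmp(sample)
highest = max(ord(ch) for ch in sample)

# For contrast: U+20000, the first code point of CJK Extension B,
# does fall outside the 16-bit range.
rare = outside_bmp("奥" + chr(0x20000))
```

Only rarer ideographs, like the Extension B example, would actually need the wider range.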


I've been accused of being a Marxist, yet while Harpo's my favourite, it's Groucho I'm always quoting. Odd, that.

by BruceMcF (agila61 at netscape dot net) on Sun Feb 21st, 2010 at 10:58:37 PM EST