printing java unicode

Discussion in 'Computer Science & Culture' started by DNA100, Mar 10, 2009.

Thread Status:
Not open for further replies.
  1. DNA100 Registered Senior Member

    Messages:
    259
    My OS is Windows XP, and I think I loaded all (or at least most) of the language packs while installing Windows. Also, all the Japanese, Korean and Chinese sites display fine when I browse the internet.


    But I am having trouble printing Unicode characters with Java. For example, I still can't print the Greek pi symbol, although I can print things like 'g' or 'p'. As far as I know, Unicode characters are stored in 16 bits, and you print one using a 4-digit hexadecimal number prefixed by \u. The lower hexadecimal values work, but the higher ones simply print '?'.

    For example:
    Code:
    public class UnicodeTest {
        public static void main(String[] argv) {
            // U+01AB is LATIN SMALL LETTER T WITH PALATAL HOOK
            char unicodeMessage = '\u01AB';
            System.out.println(unicodeMessage);
        }
    }
    It prints ?,
    but with \u00AB it prints <<.
    The funny thing is that \u01AB is not the only value that prints '?': anything bigger than 01AB prints '?' too.
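    A likely cause of the '?' (an assumption about this particular setup, not something the thread confirms): System.out turns each char into bytes using a platform encoding, and any character that encoding cannot represent is replaced by '?'. The sketch below checks this with the standard CharsetEncoder.canEncode and String.getBytes methods; windows-1252 is only a stand-in for whatever encoding the console really uses, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodabilityCheck {
    // Assumption: windows-1252 stands in for the encoding System.out uses on this machine
    static final Charset CONSOLE_GUESS = Charset.forName("windows-1252");

    // True if the charset has a byte sequence for this char
    static boolean canPrint(char c, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(c);
    }

    // String.getBytes substitutes the charset's replacement byte ('?') for unmappable chars
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        System.out.println(canPrint('\u00AB', CONSOLE_GUESS)); // true: U+00AB is byte 0xAB
        System.out.println(canPrint('\u01AB', CONSOLE_GUESS)); // false: no byte for U+01AB
        byte[] bytes = encode("\u01AB", CONSOLE_GUESS);
        System.out.println(bytes.length + " byte(s), value " + bytes[0]); // 1 byte(s), value 63, i.e. '?'
    }
}
```

    If this is what is happening, whether a character prints depends on whether the console's encoding contains it, not on how large its code value is.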


    But I also heard that there are Unicode characters beyond the 16-bit range; that is, 0000 to FFFF isn't enough to denote them. So how do you print them? I don't think it's possible to use the same \u prefix for those characters, since they need more than 4 hexadecimal digits.
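    For reference on characters above U+FFFF: a Java char is a 16-bit UTF-16 code unit, so such a character is stored as two chars (a surrogate pair) and cannot be written as a single four-digit escape. A minimal sketch using the standard Character.toChars and String.codePointAt methods; U+1D11E (MUSICAL SYMBOL G CLEF) is just an example, the class name is hypothetical, and whether the character shows up on screen still depends on the console's encoding and fonts.

```java
public class SupplementaryChars {
    // Build a String from a code point that may lie above U+FFFF
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D11E);  // MUSICAL SYMBOL G CLEF
        String sameClef = "\uD834\uDD1E";      // the same character as a surrogate pair of Unicode escapes

        System.out.println(clef.equals(sameClef));                 // true
        System.out.println(clef.length());                         // 2 (UTF-16 code units)
        System.out.println(clef.codePointCount(0, clef.length())); // 1 (actual character)
        System.out.printf("U+%X%n", clef.codePointAt(0));          // U+1D11E
    }
}
```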

    I also tried something from the internet:
    Code:
    import java.io.PrintStream;
    import java.io.UnsupportedEncodingException;

    public class UnicodeTest {
        public static void main(String[] argv) throws UnsupportedEncodingException {
            // Japanese greeting written as Unicode escapes
            String unicodeMessage =
                "\u7686\u3055\u3093\u3001\u3053\u3093\u306b\u3061\u306f";

            // Wrap System.out so chars are encoded as UTF-8 bytes (autoflush on)
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            out.println(unicodeMessage);
        }
    }
    The claim was that the corrected code has the proper output of:
    皆さん、こんにちは
    which is Japanese for "Hello, everyone".

    But the funny thing was that JPad printed this:
    皆ã•ã‚“ã€ã“ã‚“ã«ã¡ã¯
    JCreator printed this:
    皆ã•ã‚“ã€ã“ã‚“ã«ã¡ã¯
    (It looks the same as the JPad output when I paste it here, but slightly different in its own window.)
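    To see what the UTF-8 PrintStream actually sends, you can point it at an in-memory buffer instead of the console. This sketch (hypothetical class and method names) uses the same PrintStream(OutputStream, boolean, String) constructor as the code above and dumps the bytes for 皆 (U+7686):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ShowUtf8Bytes {
    // Capture what a UTF-8 PrintStream writes for the given text
    static byte[] utf8Output(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buffer, true, "UTF-8");
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be supported", e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : utf8Output("\u7686")) {
            System.out.printf("%02X ", b & 0xFF); // E7 9A 86
        }
        System.out.println();
    }
}
```

    Those three bytes are presumably the same whichever window shows them; what differs is how each window decodes them.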

    But running it from the command prompt (Run -> cmd) gave a very different result, full of weird letters including many distorted Greek pi symbols. I don't know how to paste that result here.

    I even tried changing UTF-8 to UTF-16 and UTF-32, but again I got weird, different results like:
    þÿv†0U0“00S0“0k0a0o etc
    and the command prompt result was full of emoticons.
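    The "þÿ" at the start of the UTF-16 output fits a byte-order mark: Java's "UTF-16" charset writes FE FF before big-endian text, and bytes FE and FF show up as þ and ÿ when read as windows-1252 (that last part is an assumption about how the IDE window decodes its input). A small sketch that lists the bytes; the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Utf16Bytes {
    // Hex dump of the bytes produced by Java's "UTF-16" charset
    static String hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_16)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u7686\u3055")); // FE FF 76 86 30 55
    }
}
```

    Read byte by byte as windows-1252, FE FF 76 86 30 55 becomes þÿv†0U, which matches the start of the output above.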


    How the hell do I get those Japanese letters printed?



    And what about the characters beyond the 2^16 standard Unicode characters?
     
    Last edited: Mar 10, 2009
  3. Stryder Keeper of "good" ideas. Valued Senior Member

    Messages:
    13,105
    A simple test to see if your OS has all the fonts/language packs is to click your Start button, navigate to All Programs, then Accessories, then System Tools, and finally the program Character Map.
    (You could type %SystemRoot%\system32\charmap.exe into the Run... dialog to get there quicker, if of course it's installed.)

    You should be able to scroll through all the characters of a font to identify if they have been installed.

    Chinese/Japanese are extra font packages that are "Optional" from Windows Update, so as standard they won't be installed.

    As for using them with Java, from what I found online, you have to define which font you are using. ....
    (Taken from: http://www.javaworld.com/javaworld/jw-12-2000/jw-1201-print.html?page=4)

     
  5. RubiksMaster Real eyes realize real lies Registered Senior Member

    Messages:
    1,646
    This only applies if you are using a GUI.

    System.out is just a stream that goes to standard out. Since you are using System.out.println, you are constrained by what is printable in your output console. It's not a direct problem with your code; it's a problem with the environment under which it's running. Either way, your program is pushing the same data into the stream, but it's being interpreted differently depending on how you run it.
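    This can be reproduced without any console: take the UTF-8 bytes the program writes and decode them twice, once as UTF-8 and once as windows-1252. The windows-1252 choice is an assumption about the IDE windows, picked because it turns the first character into "çš†", which is how the JPad output above begins. Class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SameBytesTwoReadings {
    // Decode a byte sequence with the given charset, as a console or IDE window would
    static String readAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String hello = "\u7686\u3055\u3093\u3001\u3053\u3093\u306b\u3061\u306f";
        byte[] sent = hello.getBytes(StandardCharsets.UTF_8); // what a UTF-8 PrintStream emits

        String correct = readAs(sent, StandardCharsets.UTF_8);
        String garbled = readAs(sent, Charset.forName("windows-1252")); // assumed IDE encoding

        System.out.println(correct.equals(hello));                    // true
        System.out.println(garbled.startsWith("\u00E7\u0161\u2020")); // true: same start as the JPad output
    }
}
```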

    Unfortunately, I don't know of a solution (I'm sure you can find one if you google hard enough). Working with different character encodings can be annoying.
     
  7. river-wind Valued Senior Member

    Messages:
    2,671
    I just ran into this today in my Java class.

    The font was installed, and it would even display correctly in the Eclipse console when pasted in from character map. However, reading the line in and then spitting it back out using System.out would not work.

    UTF-8 characters printed fine, but the rest of the Unicode character set came back as question marks.
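    An echo test like this involves two conversions, not one: System.in bytes are decoded into chars using some charset, and System.out encodes them back into bytes, so a wrong guess on either side loses characters. A sketch that makes both conversions explicit, assuming the console (or a redirected file) really supplies and expects UTF-8, which a default Windows XP console does not; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class EchoUtf8 {
    // Decode one line of input explicitly as UTF-8 instead of the platform default
    static String readLineUtf8(InputStream in) {
        try {
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = readLineUtf8(System.in);
        // Encode the output explicitly as UTF-8 too (autoflush on println)
        PrintWriter out =
            new PrintWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
        if (line != null) {
            out.println(line);
        }
    }
}
```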


    RubiksMaster is likely right in part, though I think there may also be a limitation in the System.out methods themselves (at least under Windows), given the behavior I saw.

    I'm going to try this in Swing tomorrow to see if avoiding the console avoids the problem.
     
  8. DNA100 Registered Senior Member

    Messages:
    259
    Ah, thanks for your replies, guys.

    The thing is, it's a little hard to tell whether all the characters are in Character Map. From what I counted, they aren't: there are only around 350 letters in Character Map, and the Chinese/Japanese letters aren't there. But as far as I remember, I did install them when installing Windows.

    And I do have the Windows installation CD. So if it isn't installed, can I install it now without reinstalling Windows? But when I go to Chinese/Japanese websites, they display fine. If these scripts weren't installed, would that be possible?

    And what's the problem with the environment? Are you telling me that it should work fine in a GUI such as Swing, but can't be displayed in the standard console?
     
  9. river-wind Valued Senior Member

    Messages:
    2,671
    In Character Map, check "Advanced view", and you should see "Unicode" as the character set. There should be other character sets available, so perhaps your kanji-type characters are available there?


    As for the Java-specific issue, it does seem that the characters outside of UTF-8 appear in Swing. I think there's a bug in System.out via the console. Perhaps it's a limit of the Windows console itself?
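    For the Swing route, a minimal sketch: put the text in a JLabel and give it the logical font "Dialog", which the JRE may map to several installed fonts covering different scripts (whether Japanese glyphs actually appear still depends on the fonts installed). The class and method names are hypothetical.

```java
import java.awt.Font;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

public class SwingUnicode {
    // "Dialog" is a Java logical font rather than a specific installed font file
    static JLabel makeLabel(String text) {
        JLabel label = new JLabel(text);
        label.setFont(new Font("Dialog", Font.PLAIN, 24));
        return label;
    }

    public static void main(String[] args) {
        String hello = "\u7686\u3055\u3093\u3001\u3053\u3093\u306b\u3061\u306f";
        // Swing components should be created on the event dispatch thread
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Unicode test");
            frame.add(makeLabel(hello));
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

    Here the text never passes through the console's byte encoding; Swing draws the chars directly with whatever font covers them.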


    edit:
    This Google search seems to suggest it's a limitation of the Windows console:
    http://www.google.com/search?q=wind...s=org.mozilla:en-US:official&client=firefox-a
     
  10. stbalaji2u Registered Member

    Messages:
    21
    Thanks for the link mate


     
