Shoutbox

unicode problem


unicode problem by ceceboy on 02-02-2007 at 10:10 PM

I'm having problems opening a file that is saved as Unicode.
Is there a way to read its contents? I tried AJAX and the FileSystemObject (fsObj), but both only return the first character of the file's contents. When the file is not saved as Unicode, it works fine.
And yes, the file does need to be in Unicode.

Hopefully someone can help me.
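The "only the first character" symptom is typical of reading UTF-16LE bytes (what Windows calls "Unicode") as an ANSI, null-terminated string: each ASCII character is stored as its byte followed by 0x00, so a reader that stops at the first null byte stops after one character. Below is a minimal plain-JavaScript model of that, assuming a BOM-less UTF-16LE file; `encodeUtf16le` and `readAnsiUntilNull` are hypothetical helpers for illustration, not Messenger Plus! or AJAX APIs.

```javascript
// Hypothetical helper: encode a string as UTF-16LE bytes (no BOM),
// the layout Windows "Unicode" text files use.
function encodeUtf16le(text) {
  var bytes = [];
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    bytes.push(code & 0xFF, (code >> 8) & 0xFF); // low byte first
  }
  return bytes;
}

// Hypothetical helper: read bytes as one-byte-per-character text,
// stopping at the first null byte like a C-style ANSI string reader.
function readAnsiUntilNull(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length && bytes[i] !== 0; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

var fileBytes = encodeUtf16le("Hello"); // [0x48, 0x00, 0x65, 0x00, ...]
var seen = readAnsiUntilNull(fileBytes); // "H": stops at the 0x00 after 'H'
```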


RE: unicode problem by J-Thread on 02-03-2007 at 09:05 AM

You can do it with the Scripting.FileSystemObject.

See this page for more info.
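With the FileSystemObject, the relevant switch is the fourth (format) argument of OpenTextFile: -1 (TristateTrue) opens the file as Unicode, 0 (TristateFalse) as ASCII, and -2 (TristateUseDefault) uses the system default. A minimal sketch follows; `readUnicodeTextFile` is a hypothetical helper, and the FileSystemObject is passed in as a parameter (an assumption of this sketch) so the function can also be exercised with a stand-in object outside Windows.

```javascript
var ForReading = 1;    // iomode: open for reading only
var TristateTrue = -1; // format: open the file as Unicode

// Hypothetical helper: read a whole Unicode text file through the
// FileSystemObject passed in as `fso`.
function readUnicodeTextFile(fso, path) {
  var stream = fso.OpenTextFile(path, ForReading, false, TristateTrue);
  try {
    // ReadAll raises an error on an empty file, so check first.
    return stream.AtEndOfStream ? "" : stream.ReadAll();
  } finally {
    stream.Close();
  }
}

// Usage inside a Messenger Plus! script (Windows only):
// var fso = new ActiveXObject("Scripting.FileSystemObject");
// var text = readUnicodeTextFile(fso, "C:\\test-unicode.txt");
```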


RE: unicode problem by matty on 02-03-2007 at 05:47 PM

code:
/*

    _readFile coded by Matty

    This function reads a file and returns its contents, whether the file is Unicode or ANSI.
    If you use this function please make sure to give credit where credit is due.

*/


function OnEvent_Initialize(MessengerStart) {
    _readFile('C:\\WndSelect.handler.js', true);
}

function _readFile(_s_file, _is_debugging){
    /* Use the windows api function CreateFileW to open the file ONLY if it already exists */
    var _h_file = Interop.Call('kernel32', 'CreateFileW', '\\\\?\\'+_s_file,
                                                          0x80000000 /* GENERIC_READ */,
                                                          0x01 /* FILE_SHARE_READ */,
                                                          0,
                                                          3 /* OPEN_EXISTING */,
                                                          0,
                                                          0);
    /* Check that we got a valid file handle rather than INVALID_HANDLE_VALUE (-1) */
    if (_h_file != -1 /* INVALID_HANDLE_VALUE */) {
        if (_is_debugging === true) Debug.Trace('_h_file : '+_h_file);
        /* Now that we have a handle to the file, it's time to get the file size so we know how much to read */
        var _n_size = Interop.Call('kernel32', 'GetFileSize', _h_file, 0);
        if (_is_debugging === true) Debug.Trace('_n_size : '+_n_size);
        /* Create a buffer to read in the data and another buffer to store the number of bytes that have been read */
        var _n_buffer = Interop.Allocate((2*_n_size)+2);
        var _n_bytes_read = Interop.Allocate(4);
        /* Read the file */
        Interop.Call('kernel32', 'ReadFile', _h_file, _n_buffer, _n_size, _n_bytes_read, 0);
        /* Close the file */
        Interop.Call('kernel32', 'CloseHandle', _h_file);
       
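        /* Check if the file is unicode: a UTF-16LE file starts with the BOM bytes 0xFF 0xFE, which read as ansi give 'ÿþ'
           (assumes a Windows-1252 ansi code page). If so, read _n_buffer as unicode, otherwise read it as ansi */
        if (_n_buffer.ReadString(0, false).substr(0, 2) === '\xFF\xFE'){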
            if (_is_debugging === true) Debug.Trace('_n_buffer : '+_n_buffer.ReadString(0, true));
            return _n_buffer.ReadString(0, true);
        } else {
            if (_is_debugging === true) Debug.Trace('_n_buffer : '+_n_buffer.ReadString(0, false));
            return _n_buffer.ReadString(0, false);
        }
    }
}

RE: unicode problem by Eljay on 02-03-2007 at 10:53 PM

As usual, I am going to take Matty's code apart and optimize it!

code:
/*----------
Title:       _ReadFile
Description: Read the contents of a file, whether it is Unicode or ASCII.
Author:      Eljay <leejeffery@gmail.com>
----------*/


function _ReadFile(Path){
  var Handle = Interop.Call("kernel32", "CreateFileW", Path, 0x80000000, 1, 0, 3, 0, 0);
  if(Handle == -1) return; //If file doesn't exist, just exit function.
  var FileSize = Interop.Call("kernel32", "GetFileSize", Handle, 0);
  var Buffer = Interop.Allocate(FileSize + 1);
  var BytesRead = Interop.Allocate(4);
  Interop.Call("kernel32", "ReadFile", Handle, Buffer, FileSize, BytesRead, 0);
  var IsUnicode = Interop.Call("advapi32", "IsTextUnicode", Buffer, FileSize, 0);
  Interop.Call("kernel32", "CloseHandle", Handle);
  if(IsUnicode) return Buffer.ReadString(2, true, (FileSize / 2) - 1);
  else return Buffer.ReadString(0, false, FileSize);
}


Changes:
+Uses IsTextUnicode API to perform more accurate tests for file encoding.
+Changed file buffer size: it doesn't need to be twice the file size for Unicode, as the size is in bytes, not characters.
+No need to put everything inside an if block when there is no else branch; just exit early.
+Removed first 2 bytes from Unicode files (this doesn't contain file data, just encoding information).
+Read whole file, not just up to first null byte (third parameter of ReadString).
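IsTextUnicode decides the encoding with a set of BOM and statistical tests whose exact rules are Windows internals. The sketch below is NOT that algorithm; it is a deliberately simplified stand-in heuristic, assuming mostly-ASCII text, meant only to show the kind of evidence such a test looks at: a leading FF FE BOM, or a zero high byte at most odd offsets. `looksLikeUtf16le` is a hypothetical name.

```javascript
// Simplified stand-in heuristic (NOT the real IsTextUnicode algorithm):
// guess whether an array of byte values holds UTF-16LE text.
function looksLikeUtf16le(bytes) {
  if (bytes.length < 2 || bytes.length % 2 !== 0) return false;
  if (bytes[0] === 0xFF && bytes[1] === 0xFE) return true; // UTF-16LE BOM
  // Without a BOM: mostly-ASCII UTF-16LE has a zero high byte
  // at every odd offset, so count how many of those are zero.
  var zeroHigh = 0;
  for (var i = 1; i < bytes.length; i += 2) {
    if (bytes[i] === 0) zeroHigh++;
  }
  return zeroHigh / (bytes.length / 2) > 0.5;
}
```

A real detector has to cope with non-Latin text (where high bytes are not zero) and with ANSI text that happens to contain many nulls, which is why the thread relies on the API rather than a hand-rolled check.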

RE: unicode problem by CookieRevised on 02-04-2007 at 12:39 AM

quote:
Originally posted by Eljay
+Removed first 2 bytes from Unicode files (this doesn't contain file data, just encoding information).
1) Not all Unicode files have a BOM. Result: this will actually remove content from such files.
2) Not all BOMs are 2 bytes; they can be 4 bytes too (UTF-32). Result: there will be 2 bytes left in the output that don't belong to the actual 'textual' contents of the file.

See the MSDN docs about Unicode file handling.
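Both points come down to checking which BOM, if any, is present before skipping bytes. A minimal plain-JavaScript sketch over an array of byte values; `detectBom` is a hypothetical helper. A file without a BOM gets a skip length of 0, and the skip length follows the BOM actually found (3 bytes for UTF-8, 4 for UTF-32). One assumption: a UTF-16LE file whose first character is U+0000 would be reported as UTF-32LE, since both start with FF FE.

```javascript
// Detect a byte order mark (BOM) at the start of an array of bytes.
// Returns the encoding name and how many bytes to skip; no BOM means
// length 0, so no file content is removed.
function detectBom(bytes) {
  function startsWith(sig) {
    if (bytes.length < sig.length) return false;
    for (var i = 0; i < sig.length; i++) {
      if (bytes[i] !== sig[i]) return false;
    }
    return true;
  }
  // Check the 4-byte UTF-32 marks before UTF-16: the UTF-32LE BOM
  // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
  var boms = [
    { name: "utf-32le", sig: [0xFF, 0xFE, 0x00, 0x00] },
    { name: "utf-32be", sig: [0x00, 0x00, 0xFE, 0xFF] },
    { name: "utf-8",    sig: [0xEF, 0xBB, 0xBF] },
    { name: "utf-16le", sig: [0xFF, 0xFE] },
    { name: "utf-16be", sig: [0xFE, 0xFF] }
  ];
  for (var i = 0; i < boms.length; i++) {
    if (startsWith(boms[i].sig)) {
      return { encoding: boms[i].name, length: boms[i].sig.length };
    }
  }
  return { encoding: null, length: 0 };
}
```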

quote:
Originally posted by Eljay
+Read whole file, not just up to first null byte (third parameter of ReadString).
ReadString will read x bytes (3rd parameter) if it does not encounter a null byte sooner.

In other words: this will not return x bytes (including null bytes); it will still only return data up to the first null byte (and if there is no null byte, it will return only x bytes).
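A model of that behaviour for the ANSI case, in plain JavaScript over an array of byte values: the size argument only caps how much is read; it does not make the reader continue past a null byte. `readStringModel` is a hypothetical stand-in written from the description above, not Messenger Plus!'s actual implementation.

```javascript
// Model of ReadString(offset, false, size) as described above:
// read at most `size` one-byte characters, but still stop early at
// the first null byte. Hypothetical; based only on this thread.
function readStringModel(bytes, offset, size) {
  var out = "";
  for (var i = offset; i < bytes.length && i < offset + size; i++) {
    if (bytes[i] === 0) break; // null byte ends the string regardless of size
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

var data = [0x41, 0x42, 0x00, 0x43, 0x44]; // "AB", null, "CD"
readStringModel(data, 0, 5); // "AB": the null byte wins over size
readStringModel(data, 3, 5); // "CD"
readStringModel(data, 0, 1); // "A": size caps the read
```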

You must use ReadBSTR for this if the contents are Unicode. If the contents are ANSI, you first have to convert them to Unicode in order to use ReadBSTR.

RE: unicode problem by Eljay on 02-04-2007 at 08:35 AM

quote:
Originally posted by CookieRevised
quote:
Originally posted by Eljay
+Read whole file, not just up to first null byte (third parameter of ReadString).
ReadString will read x bytes (3rd parameter) if it does not encounter a null byte sooner.

Thanks for that, although the docs are rather unclear about this.

quote:
Size
[number,optional] The size (in characters) of the string to read. If not specified, Messenger Plus! stops at the first null character.

I assumed this meant if you DID specify the size of the string, it would read past the null bytes. Oh well...
RE: unicode problem by CookieRevised on 02-04-2007 at 03:20 PM

Yeah,...

Personally I still see this as an (annoying :p) bug. However, Patchou does not, and I can see his point of view too. ReadString works the way C/C++ handles strings, where a null byte always marks the end of a string.

Hence why he included ReadBSTR (after some whining from some people :p). But this is still not a 'good' solution, as you can see in the "read contents of file" example code above.
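ReadBSTR sidesteps the null-byte problem because a BSTR carries its own length: a 4-byte byte count sits just before the UTF-16 character data, and a two-byte null terminator follows it, so a reader can take exactly that many bytes, embedded nulls included. A plain-JavaScript model of that layout; `makeBstrBytes` and `readBstrBytes` are hypothetical helpers, not the Plus! API, and the array stands in for memory with the length prefix at index 0.

```javascript
// Hypothetical helper: lay out a string the way a BSTR is stored:
// 4-byte little-endian byte count, UTF-16LE data, then two null bytes.
function makeBstrBytes(text) {
  var byteCount = text.length * 2;
  var bytes = [byteCount & 0xFF, (byteCount >> 8) & 0xFF,
               (byteCount >> 16) & 0xFF, (byteCount >>> 24) & 0xFF];
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    bytes.push(code & 0xFF, (code >> 8) & 0xFF);
  }
  bytes.push(0, 0); // terminator; not counted in the length prefix
  return bytes;
}

// Hypothetical helper: read the string back using the length prefix
// instead of scanning for a null, so embedded nulls survive.
function readBstrBytes(bytes) {
  var byteCount = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
  var out = "";
  for (var i = 4; i < 4 + byteCount; i += 2) {
    out += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
  }
  return out;
}

var stored = makeBstrBytes("AB\u0000CD");
readBstrBytes(stored); // "AB\u0000CD": all 5 characters, null included
```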

more about this:
[BUG] WriteString() doesn't handle BSTR's as BSTR's
(only accessible for beta testers)