unicode problem
ceceboy
New Member
Posts: 1
Joined: Feb 2007
O.P. unicode problem
I'm having problems opening a file that is saved as Unicode.
Is there a way to read its contents? I tried AJAX and fsObj, but both return only the first character of the file. When the file is not saved as Unicode, it works fine.
And yes, the file has to be in Unicode.

Hopefully someone can help me.
02-02-2007 10:10 PM
J-Thread
Full Member
Posts: 467
Reputation: 8
– / Male / –
Joined: Jul 2004
RE: unicode problem
You can do it with the Scripting.FileSystemObject.

See this page for more info.
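
The linked page is not preserved here, so the following is only a guess at what was meant: a minimal sketch that opens the file through FileSystemObject with the format argument of OpenTextFile set to Unicode. The function name readUnicodeFile is made up for illustration, and the FileSystemObject is passed in as a parameter so the function can be tried outside Messenger Plus!.

```javascript
// Constants from the FileSystemObject documentation.
var FOR_READING = 1;     // iomode: open for reading only
var TRISTATE_TRUE = -1;  // format: open the file as Unicode (UTF-16)

// Hypothetical helper: reads a whole UTF-16 text file via a FileSystemObject.
// Returns null if the file does not exist and '' if it is empty.
function readUnicodeFile(fso, path) {
    if (!fso.FileExists(path)) return null;
    var stream = fso.OpenTextFile(path, FOR_READING, false, TRISTATE_TRUE);
    try {
        // ReadAll raises an error on an empty file, so check AtEndOfStream first.
        return stream.AtEndOfStream ? '' : stream.ReadAll();
    } finally {
        stream.Close();
    }
}

// In a script this would presumably be called as:
//   readUnicodeFile(new ActiveXObject('Scripting.FileSystemObject'), 'C:\\WndSelect.handler.js');
```

Opening with the default format instead would treat the file as ANSI, which would explain getting only the first character back.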
02-03-2007 09:05 AM
matty
Scripting Guru
Posts: 8336
Reputation: 109
39 / Male / Flag
Joined: Dec 2002
Status: Away
RE: unicode problem
code:
/*

    _readFile coded by Matty

    This function will read the contents of a file and print out the contents regardless of being unicode or ansi.
    If you use this function please make sure to give credit where credit is due.

*/


function OnEvent_Initialize(MessengerStart) {
    _readFile('C:\\WndSelect.handler.js', true);
}

function _readFile(_s_file, _is_debugging){
    /* Use the windows api function CreateFileW to open the file ONLY if it already exists */
    var _h_file = Interop.Call('kernel32', 'CreateFileW', '\\\\?\\'+_s_file,
                                                          0x80000000 /* GENERIC_READ */,
                                                          0x01 /* FILE_SHARE_READ */,
                                                          0,
                                                          3 /* OPEN_EXISTING */,
                                                          0,
                                                          0);
    /* Check to make sure the pointer is actually to a file instead of an invalid handle */
    if (_h_file > -1 /* INVALID_HANDLE_VALUE */) {
        if (_is_debugging === true) Debug.Trace('_h_file : '+_h_file);
        /* Now that we have a pointer to the file its time to read the size of the file so we know how much to read */
        var _n_size = Interop.Call('kernel32', 'GetFileSize', _h_file, 0);
        if (_is_debugging === true) Debug.Trace('_n_size : '+_n_size);
        /* Create a buffer to read in the data and another buffer to store the number of bytes that have been read */
        var _n_buffer = Interop.Allocate((2*_n_size)+2);
        var _n_bytes_read = Interop.Allocate(4);
        /* Read the file */
        Interop.Call('kernel32', 'ReadFile', _h_file, _n_buffer, _n_size, _n_bytes_read, 0);
        /* Close the file */
        Interop.Call('kernel32', 'CloseHandle', _h_file);
       
        /* Check if the file is unicode, if it is read the contents of _n_buffer as unicode, otherwise read it as ansi */
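        /* A UTF-16 LE file starts with the bytes FF FE, which read as 'ÿþ' on a Western (Windows-1252) ANSI code page; compare only those two characters */
        if (_n_buffer.ReadString(0, false).substr(0, 2) === '\xFF\xFE'){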
            if (_is_debugging === true) Debug.Trace('_n_buffer : '+_n_buffer.ReadString(0, true));
            return _n_buffer.ReadString(0, true);
        } else {
            if (_is_debugging === true) Debug.Trace('_n_buffer : '+_n_buffer.ReadString(0, false));
            return _n_buffer.ReadString(0, false);
        }
    }
}

This post was edited on 02-03-2007 at 05:50 PM by matty.
02-03-2007 05:47 PM
Eljay
Elite Member
Posts: 2949
Reputation: 77
– / Male / –
Joined: May 2004
RE: unicode problem
As usual, I am going to take Matty's code apart and optimize it!

code:
/*----------
Title:       _ReadFile
Description: Read the contents of a file, whether it is Unicode or ASCII.
Author:      Eljay <leejeffery@gmail.com>
----------*/


function _ReadFile(Path){
  var Handle = Interop.Call("kernel32", "CreateFileW", Path, 0x80000000, 1, 0, 3, 0, 0);
  if(Handle == -1) return; //If file doesn't exist, just exit function.
  var FileSize = Interop.Call("kernel32", "GetFileSize", Handle, 0);
  var Buffer = Interop.Allocate(FileSize + 1);
  var BytesRead = Interop.Allocate(4);
  Interop.Call("kernel32", "ReadFile", Handle, Buffer, FileSize, BytesRead, 0);
  var IsUnicode = Interop.Call("advapi32", "IsTextUnicode", Buffer, FileSize, 0);
  Interop.Call("kernel32", "CloseHandle", Handle);
  if(IsUnicode) return Buffer.ReadString(2, true, (FileSize / 2) - 1);
  else return Buffer.ReadString(0, false, FileSize);
}


Changes:
+Uses IsTextUnicode API to perform more accurate tests for file encoding.
+Changed file buffer size; it doesn't need to be twice the file size for Unicode, as the size is in bytes, not characters.
+No need for everything to be inside if block if you don't do anything with else.
+Removed first 2 bytes from Unicode files (this doesn't contain file data, just encoding information).
+Read whole file, not just up to first null byte (third parameter of ReadString).
02-03-2007 10:53 PM
CookieRevised
Elite Member
Posts: 15517
Reputation: 173
– / Male / Flag
Joined: Jul 2003
Status: Away
RE: unicode problem
quote:
Originally posted by Eljay
+Removed first 2 bytes from Unicode files (this doesn't contain file data, just encoding information).
1) Not all unicode files have a BOM. Result: this will actually remove content of such files.
2) Not all BOMs are 2 bytes, they can be 4 bytes too.... Result: there will be 2 bytes left in the output which don't belong to the actual 'textual' contents of the file.

see MSDN docs about unicode file handling.
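
To make both points concrete, here is a minimal sketch of a BOM check that reports how many bytes to skip, or zero when there is no BOM. It works on a plain array of byte values; the function name detectBom and the shape of its return value are invented for illustration.

```javascript
// Hypothetical helper: returns the encoding implied by a byte order mark
// and the BOM length in bytes. 'bytes' is an array of numbers (0-255)
// holding the start of the file.
function detectBom(bytes) {
    var b0 = bytes[0], b1 = bytes[1], b2 = bytes[2], b3 = bytes[3];
    // Test the 4-byte marks first: the UTF-32 LE mark begins with the UTF-16 LE one.
    if (b0 === 0xFF && b1 === 0xFE && b2 === 0x00 && b3 === 0x00) return { encoding: 'UTF-32LE', length: 4 };
    if (b0 === 0x00 && b1 === 0x00 && b2 === 0xFE && b3 === 0xFF) return { encoding: 'UTF-32BE', length: 4 };
    if (b0 === 0xEF && b1 === 0xBB && b2 === 0xBF) return { encoding: 'UTF-8', length: 3 };
    if (b0 === 0xFF && b1 === 0xFE) return { encoding: 'UTF-16LE', length: 2 };
    if (b0 === 0xFE && b1 === 0xFF) return { encoding: 'UTF-16BE', length: 2 };
    return { encoding: null, length: 0 }; // no BOM: nothing to strip
}
```

Note the ambiguity: a UTF-16 LE file whose first character after the BOM is a null looks the same as a UTF-32 LE BOM, so this is a heuristic, not a guarantee.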

quote:
Originally posted by Eljay
+Read whole file, not just up to first null byte (third parameter of ReadString).
ReadString will read x bytes (3rd parameter) if it does not encounter a null byte sooner.

In other words: this will not return x bytes (including null bytes), but will still only return up to the first null byte (and if no null bytes exist, it will return only x bytes).

You must use ReadBSTR for this if the contents are Unicode. If the contents are ANSI, you must first convert them to Unicode in order to use ReadBSTR.
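
For comparison, a minimal sketch of decoding the bytes by hand, which keeps embedded nulls because it never treats 0x0000 as a terminator. It assumes the file's bytes have already been copied into a plain array of numbers; that copying step is not shown, and the name decodeUtf16LE is hypothetical, not part of the Messenger Plus! API.

```javascript
// Hypothetical helper: decodes UTF-16 little-endian bytes into a string.
// 'bytes' is an array of numbers (0-255); 'offset' is where decoding starts
// (for example 2 to skip a UTF-16 BOM). A trailing odd byte is ignored.
function decodeUtf16LE(bytes, offset) {
    var chars = [];
    for (var i = offset || 0; i + 1 < bytes.length; i += 2) {
        // Low byte first, then high byte; 0x0000 is kept like any other code unit.
        chars.push(String.fromCharCode(bytes[i] | (bytes[i + 1] << 8)));
    }
    return chars.join('');
}
```

Decoding one code unit at a time in script is slow for large files, so this is meant to show the idea rather than to replace ReadBSTR.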

This post was edited on 02-04-2007 at 12:55 AM by CookieRevised.
.-= A 'frrrrrrrituurrr' for Wacky =-.
02-04-2007 12:39 AM
Eljay
Elite Member
Posts: 2949
Reputation: 77
– / Male / –
Joined: May 2004
RE: unicode problem
quote:
Originally posted by CookieRevised
quote:
Originally posted by Eljay
+Read whole file, not just up to first null byte (third parameter of ReadString).
ReadString will read x bytes (3rd parameter) if it does not encounter a null byte sooner.

Thanks for that, although the docs are rather unclear about this.

quote:
Size
[number,optional] The size (in characters) of the string to read. If not specified, Messenger Plus! stops at the first null character.

I assumed this meant if you DID specify the size of the string, it would read past the null bytes. Oh well...
02-04-2007 08:35 AM
CookieRevised
Elite Member
Posts: 15517
Reputation: 173
– / Male / Flag
Joined: Jul 2003
Status: Away
RE: unicode problem
Yeah,...

Personally I still see this as an (annoying :p) bug. However, Patchou does not, and I can see his POV too. ReadString works the way C++ handles strings, where a null byte is always considered the end of a string.

Hence why he included ReadBSTR (after some whining of some people :p). But this is still not a 'good' solution, as you see in the "read contents of file" example code above.

more about this:
[BUG] WriteString() doesn't handle BSTR's as BSTR's
(only accessible for beta testers)

This post was edited on 02-04-2007 at 03:21 PM by CookieRevised.
02-04-2007 03:20 PM