unicode problem


ceceboy
New Member

Posts: 1
Joined: Feb 2007

O.P. unicode problem

I'm having trouble opening a file that is saved as Unicode.
Is there a way to read its contents? I tried AJAX and fsObj, but both return only the first character of the file. When the file is not saved as Unicode, it works fine.
And yes, the file has to be in Unicode.

Hopefully someone can help me.

02-02-2007 10:10 PM
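
A likely explanation for the "first character only" symptom, sketched in plain JavaScript rather than with the Messenger Plus! objects: in a UTF-16 LE file every basic Latin character is followed by a zero byte, so a reader that treats the data as a null-terminated single-byte string stops right after the first character. The helper names `toUtf16leBytes` and `readAnsiCString` are made up for this illustration and are not part of any API mentioned in the thread.

```javascript
// Encode a string as UTF-16LE bytes, preceded by the FF FE byte order mark.
function toUtf16leBytes(text) {
  var bytes = [0xFF, 0xFE];
  for (var i = 0; i < text.length; i++) {
    var unit = text.charCodeAt(i);      // one UTF-16 code unit
    bytes.push(unit & 0xFF, unit >> 8); // low byte first (little-endian)
  }
  return bytes;
}

// Read single-byte characters until the first zero byte, the way a
// null-terminated ANSI read behaves.
function readAnsiCString(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length && bytes[i] !== 0; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

var fileBytes = toUtf16leBytes('var x = 1;');
var seen = readAnsiCString(fileBytes);
// seen holds the two BOM bytes plus the first character only: 'ÿþv'
```

The rest of the file is still in `fileBytes`; only the null-terminated read loses it.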

J-Thread
Full Member

Posts: 467
Reputation: 8
– / Male

/ –
Joined: Jul 2004

RE: unicode problem

You can do it with the Scripting.FileSystemObject.

See this page for more info.

NickChange Plugin / NickChange Script | NoEmoticons

02-03-2007 09:05 AM

matty
Scripting Guru

Posts: 8327
Reputation: 109
41 / Male

Joined: Dec 2002
Status: Away

RE: unicode problem

code:
/*

_readFile coded by Matty

This function will read the contents of a file and print out the contents regardless of being unicode or ansi.
If you use this function please make sure to give credit where credit is due.

*/

function OnEvent_Initialize(MessengerStart) {
    _readFile('C:\\WndSelect.handler.js', true);
}

function _readFile(_s_file, _is_debugging) {
    /* Use the Windows API function CreateFileW to open the file ONLY if it already exists */
    var _h_file = Interop.Call('kernel32', 'CreateFileW', '\\\\?\\' + _s_file,
        0x80000000 /* GENERIC_READ */,
        0x01 /* FILE_SHARE_READ */,
        0,
        3 /* OPEN_EXISTING */,
        0,
        0);
    /* Check to make sure we got a handle to a file instead of an invalid handle */
    if (_h_file > -1 /* INVALID_HANDLE_VALUE */) {
        if (_is_debugging === true) Debug.Trace('_h_file : ' + _h_file);
        /* Now that we have a handle to the file, read its size so we know how much to read */
        var _n_size = Interop.Call('kernel32', 'GetFileSize', _h_file, 0);
        if (_is_debugging === true) Debug.Trace('_n_size : ' + _n_size);
        /* Create a buffer to read in the data and another buffer to store the number of bytes that have been read */
        var _n_buffer = Interop.Allocate((2 * _n_size) + 2);
        var _n_bytes_read = Interop.Allocate(4);
        /* Read the file */
        Interop.Call('kernel32', 'ReadFile', _h_file, _n_buffer, _n_size, _n_bytes_read, 0);
        /* Close the file */
        Interop.Call('kernel32', 'CloseHandle', _h_file);

        /* Check if the file is Unicode: read as ANSI, a UTF-16 LE file starts with the BOM bytes FF FE ('ÿþ').
           Only those two characters are compared, so the check does not depend on the file's first letter.
           If it is Unicode, read the contents of _n_buffer as Unicode, otherwise read it as ANSI. */
        if (_n_buffer.ReadString(0, false).substr(0, 2) === 'ÿþ') {
            if (_is_debugging === true) Debug.Trace('_n_buffer : ' + _n_buffer.ReadString(0, true));
            return _n_buffer.ReadString(0, true);
        } else {
            if (_is_debugging === true) Debug.Trace('_n_buffer : ' + _n_buffer.ReadString(0, false));
            return _n_buffer.ReadString(0, false);
        }
    }
}

This post was edited on 02-03-2007 at 05:50 PM by matty.

02-03-2007 05:47 PM

Eljay
Elite Member

:O

Posts: 2945
Reputation: 77
– / Male

/ –
Joined: May 2004

RE: unicode problem

As usual, I am going to take Matty's code apart and optimize it!

code:
/*----------
Title: _ReadFile
Description: Read the contents of a file, whether it is Unicode or ASCII.
Author: Eljay <leejeffery@gmail.com>
----------*/

function _ReadFile(Path){
    var Handle = Interop.Call("kernel32", "CreateFileW", Path, 0x80000000, 1, 0, 3, 0, 0);
    if(Handle == -1) return; //If file doesn't exist, just exit function.
    var FileSize = Interop.Call("kernel32", "GetFileSize", Handle, 0);
    var Buffer = Interop.Allocate(FileSize + 1);
    var BytesRead = Interop.Allocate(4);
    Interop.Call("kernel32", "ReadFile", Handle, Buffer, FileSize, BytesRead, 0);
    var IsUnicode = Interop.Call("advapi32", "IsTextUnicode", Buffer, FileSize, 0);
    Interop.Call("kernel32", "CloseHandle", Handle);
    if(IsUnicode) return Buffer.ReadString(2, true, (FileSize / 2) - 1);
    else return Buffer.ReadString(0, false, FileSize);
}

Changes:
+Uses IsTextUnicode API to perform more accurate tests for file encoding.
+Changed file buffer size; it doesn't need to be twice the file size for Unicode, as the size is in bytes, not characters.
+No need for everything to be inside if block if you don't do anything with else.
+Removed first 2 bytes from Unicode files (this doesn't contain file data, just encoding information).
+Read whole file, not just up to first null byte (third parameter of ReadString).

02-03-2007 10:53 PM

CookieRevised
Elite Member

Posts: 15494
Reputation: 173
– / Male

Joined: Jul 2003
Status: Away

RE: unicode problem

quote:
Originally posted by Eljay
+Removed first 2 bytes from Unicode files (this doesn't contain file data, just encoding information).

1) Not all Unicode files have a BOM. Result: for such files this removes actual content.
2) Not all BOMs are 2 bytes; they can be 4 bytes too. Result: 2 bytes that are not part of the file's actual text are left in the output.

see MSDN docs about unicode file handling.
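
One way to act on both points is to look at the leading bytes before skipping anything. The sketch below is plain JavaScript over an array of byte values, not Messenger Plus! code; `detectBom` is a hypothetical helper, and the signatures are the standard Unicode byte order marks (which also include a 3-byte one for UTF-8).

```javascript
// Return the encoding suggested by a byte order mark and how many bytes to skip.
// A result with length 0 means no BOM was found, so nothing may be stripped.
function detectBom(bytes) {
  function startsWith(signature) {
    if (bytes.length < signature.length) return false;
    for (var i = 0; i < signature.length; i++) {
      if (bytes[i] !== signature[i]) return false;
    }
    return true;
  }
  // UTF-32 LE must be tested before UTF-16 LE because both start with FF FE.
  if (startsWith([0xFF, 0xFE, 0x00, 0x00])) return { encoding: 'utf-32le', length: 4 };
  if (startsWith([0x00, 0x00, 0xFE, 0xFF])) return { encoding: 'utf-32be', length: 4 };
  if (startsWith([0xEF, 0xBB, 0xBF])) return { encoding: 'utf-8', length: 3 };
  if (startsWith([0xFF, 0xFE])) return { encoding: 'utf-16le', length: 2 };
  if (startsWith([0xFE, 0xFF])) return { encoding: 'utf-16be', length: 2 };
  return { encoding: null, length: 0 };
}

var withBom = detectBom([0xFF, 0xFE, 0x76, 0x00]); // UTF-16 LE "v" with a BOM
var withoutBom = detectBom([0x76, 0x00]);          // the same text without a BOM
```

A UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32 LE here. That ambiguity is inherent in the signatures, so a BOM check like this is a hint to combine with other tests, not a proof.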

quote:
Originally posted by Eljay
+Read whole file, not just up to first null byte (third parameter of ReadString).

ReadString will read x bytes (3rd parameter) if it does not encounter a null byte sooner.

In other words: this will not return x bytes (including null bytes); it will still return only up to the first null byte (and if there are no null bytes, it will return only x bytes).

You must use ReadBSTR for this if the content is Unicode. If the content is ANSI, you must first convert it to Unicode in order to use ReadBSTR.
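
The difference between the two reading styles can be illustrated without the Interop objects. In this plain JavaScript sketch, `readTerminated` mimics the behaviour described above for ReadString (stop at the first null or after a maximum number of characters, whichever comes first), while `readCounted` mimics a length-based read that keeps embedded nulls. Both functions are stand-ins written for this example, not the real DataBloc methods.

```javascript
// Decode UTF-16LE code units, stopping at the first 0x0000 unit or after
// maxChars units, whichever comes first.
function readTerminated(bytes, offset, maxChars) {
  var out = '';
  for (var n = 0; n < maxChars; n++) {
    var i = offset + 2 * n;
    if (i + 1 >= bytes.length) break;
    var unit = bytes[i] | (bytes[i + 1] << 8);
    if (unit === 0) break;
    out += String.fromCharCode(unit);
  }
  return out;
}

// Decode exactly `count` UTF-16LE code units, keeping any embedded nulls.
function readCounted(bytes, offset, count) {
  var out = '';
  for (var n = 0; n < count; n++) {
    var i = offset + 2 * n;
    out += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
  }
  return out;
}

// "ab", a null character, then "cd", as UTF-16LE without a BOM.
var sample = [0x61, 0, 0x62, 0, 0, 0, 0x63, 0, 0x64, 0];
var terminated = readTerminated(sample, 0, 5); // stops at the null: "ab"
var counted = readCounted(sample, 0, 5);       // all five code units
```

Passing a size to the terminated reader only caps the result; it never makes the read continue past a null.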

This post was edited on 02-04-2007 at 12:55 AM by CookieRevised.

.-= A 'frrrrrrrituurrr' for Wacky =-.

02-04-2007 12:39 AM

Eljay
Elite Member

:O

Posts: 2945
Reputation: 77
– / Male

/ –
Joined: May 2004

RE: unicode problem

quote:
Originally posted by CookieRevised

quote:
Originally posted by Eljay
+Read whole file, not just up to first null byte (third parameter of ReadString).
ReadString will read x bytes (3rd parameter) if it does not encounter a null byte sooner.

Thanks for that, although the docs are rather unclear about this.

quote:
Size
[number,optional] The size (in characters) of the string to read. If not specified, Messenger Plus! stops at the first null character.

I assumed this meant if you DID specify the size of the string, it would read past the null bytes. Oh well...

02-04-2007 08:35 AM

CookieRevised
Elite Member

Posts: 15494
Reputation: 173
– / Male

Joined: Jul 2003
Status: Away

RE: unicode problem

Yeah,...

Personally I still see this as an (annoying) bug. However, Patchou does not, and I can see his POV too. ReadString works the way C++ handles strings, where a null byte is always considered the end of a string.

Hence why he included ReadBSTR (after some whining from some people). But this is still not a 'good' solution, as you can see in the "read contents of file" example code above.

more about this:
[BUG] WriteString() doesn't handle BSTR's as BSTR's
(only accessible to beta testers)

This post was edited on 02-04-2007 at 03:21 PM by CookieRevised.

.-= A 'frrrrrrrituurrr' for Wacky =-.

02-04-2007 03:20 PM
