unicode problem |
Author: |
Message: |
ceceboy
New Member
Posts: 1
Joined: Feb 2007
|
O.P. unicode problem
I'm having problems opening a file that is saved as Unicode.
Is there a way to read its contents? I tried AJAX and fsObj, but both return only the first character of the file. When the file is not saved as Unicode, it works fine.
Yes, the file has to be in Unicode.
Hopefully someone can help me.
|
|
02-02-2007 10:10 PM |
|
|
J-Thread
Full Member
Posts: 467 Reputation: 8
Joined: Jul 2004
|
RE: unicode problem
You can do it with the Scripting.FileSystemObject.
See this page for more info.
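A minimal sketch of the FileSystemObject route, assuming the file is UTF-16 ("Unicode" in Windows terms). `readTextFile` is a made-up helper name, and the `fso` object is passed in so the function does not depend on how the host creates COM objects. The fourth argument of `OpenTextFile` selects the format: -1 opens the file as Unicode, 0 as ASCII.

```javascript
// Constants as documented for Scripting.FileSystemObject.OpenTextFile.
var FOR_READING = 1;
var TRISTATE_TRUE = -1;  // open the file as Unicode (UTF-16)
var TRISTATE_FALSE = 0;  // open the file as ASCII/ANSI

// readTextFile is a hypothetical helper; fso is a Scripting.FileSystemObject.
function readTextFile(fso, path, isUnicode) {
    var format = isUnicode ? TRISTATE_TRUE : TRISTATE_FALSE;
    var stream = fso.OpenTextFile(path, FOR_READING, false, format);
    try {
        // ReadAll raises an error on an empty file, so check for that first.
        return stream.AtEndOfStream ? '' : stream.ReadAll();
    } finally {
        stream.Close();
    }
}

// In a Windows script host this could be called as:
// var fso = new ActiveXObject('Scripting.FileSystemObject');
// var text = readTextFile(fso, 'C:\\WndSelect.handler.js', true);
```

Opening a Unicode file with the default (ASCII) format is a likely cause of the "only the first character" symptom, since every second byte of plain UTF-16 text is a null.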
|
|
02-03-2007 09:05 AM |
|
|
matty
Scripting Guru
Posts: 8336 Reputation: 109
Joined: Dec 2002
Status: Away
|
RE: unicode problem
code:
/*
    _readFile coded by Matty
    This function will read the contents of a file and return them, regardless of the file being Unicode or ANSI.
    If you use this function please make sure to give credit where credit is due.
*/
function OnEvent_Initialize(MessengerStart) {
    _readFile('C:\\WndSelect.handler.js', true);
}

function _readFile(_s_file, _is_debugging) {
    /* Use the Windows API function CreateFileW to open the file ONLY if it already exists */
    var _h_file = Interop.Call('kernel32', 'CreateFileW', '\\\\?\\' + _s_file,
        0x80000000 /* GENERIC_READ */,
        0x01 /* FILE_SHARE_READ */,
        0,
        3 /* OPEN_EXISTING */,
        0,
        0);
    /* Check to make sure the handle actually refers to a file instead of being an invalid handle */
    if (_h_file > -1 /* INVALID_HANDLE_VALUE */) {
        if (_is_debugging === true) Debug.Trace('_h_file : ' + _h_file);
        /* Now that we have a handle to the file it's time to read the size of the file so we know how much to read */
        var _n_size = Interop.Call('kernel32', 'GetFileSize', _h_file, 0);
        if (_is_debugging === true) Debug.Trace('_n_size : ' + _n_size);
        /* Create a buffer to read in the data and another buffer to store the number of bytes that have been read */
        var _n_buffer = Interop.Allocate((2 * _n_size) + 2);
        var _n_bytes_read = Interop.Allocate(4);
        /* Read the file */
        Interop.Call('kernel32', 'ReadFile', _h_file, _n_buffer, _n_size, _n_bytes_read, 0);
        /* Close the file */
        Interop.Call('kernel32', 'CloseHandle', _h_file);
        /* Check if the file is Unicode: a UTF-16 LE file starts with the byte order mark FF FE,
           which reads as 'ÿþ' in ANSI (code page 1252 assumed). Only the first two characters are
           compared, so the test does not depend on the first character of the file. */
        if (_n_buffer.ReadString(0, false, 2) === 'ÿþ') {
            if (_is_debugging === true) Debug.Trace('_n_buffer : ' + _n_buffer.ReadString(0, true));
            return _n_buffer.ReadString(0, true);
        } else {
            if (_is_debugging === true) Debug.Trace('_n_buffer : ' + _n_buffer.ReadString(0, false));
            return _n_buffer.ReadString(0, false);
        }
    }
}
This post was edited on 02-03-2007 at 05:50 PM by matty.
|
|
02-03-2007 05:47 PM |
|
|
Eljay
Elite Member
:O
Posts: 2949 Reputation: 77
Joined: May 2004
|
RE: unicode problem
As usual, I am going to take Matty's code apart and optimize it!
code:
/*----------
Title: _ReadFile
Description: Read the contents of a file, whether it is Unicode or ASCII.
Author: Eljay <leejeffery@gmail.com>
----------*/
function _ReadFile(Path) {
    var Handle = Interop.Call("kernel32", "CreateFileW", Path, 0x80000000, 1, 0, 3, 0, 0);
    if (Handle == -1) return; // If file doesn't exist, just exit function.
    var FileSize = Interop.Call("kernel32", "GetFileSize", Handle, 0);
    var Buffer = Interop.Allocate(FileSize + 1);
    var BytesRead = Interop.Allocate(4);
    Interop.Call("kernel32", "ReadFile", Handle, Buffer, FileSize, BytesRead, 0);
    var IsUnicode = Interop.Call("advapi32", "IsTextUnicode", Buffer, FileSize, 0);
    Interop.Call("kernel32", "CloseHandle", Handle);
    if (IsUnicode) return Buffer.ReadString(2, true, (FileSize / 2) - 1);
    else return Buffer.ReadString(0, false, FileSize);
}
Changes:
+Uses IsTextUnicode API to perform more accurate tests for file encoding.
+Changed file buffer size; it doesn't need to be twice the file size for Unicode, as the size is in bytes, not characters.
+No need for everything to be inside if block if you don't do anything with else.
+Removed first 2 bytes from Unicode files (this doesn't contain file data, just encoding information).
+Read whole file, not just up to first null byte (third parameter of ReadString).
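To illustrate the byte-versus-character arithmetic behind these changes, here is a small sketch in plain JavaScript. It works on an ordinary array of byte values rather than a Messenger Plus! buffer, and `decodeUtf16LE` is a made-up helper name, so treat it as an illustration of the counting and not as a replacement for the Interop code.

```javascript
// Decode UTF-16 LE bytes (an array of values 0..255) into a string.
// If skipBom is true, a leading FF FE byte order mark is dropped when present.
function decodeUtf16LE(bytes, skipBom) {
    var start = 0;
    if (skipBom && bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
        start = 2;
    }
    var chars = [];
    // Each character is two bytes, low byte first.
    for (var i = start; i + 1 < bytes.length; i += 2) {
        chars.push(String.fromCharCode(bytes[i] | (bytes[i + 1] << 8)));
    }
    return chars.join('');
}

var sample = [0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00]; // BOM followed by "Hi"
var text = decodeUtf16LE(sample, true);
// 6 bytes in the file, 2 characters of text: (6 / 2) - 1
```

The file size already counts every byte, so a buffer of that size (plus room for a terminator) holds the whole file; the character count is half the byte count, minus one for the BOM when there is one.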
|
|
02-03-2007 10:53 PM |
|
|
CookieRevised
Elite Member
Posts: 15517 Reputation: 173
Joined: Jul 2003
Status: Away
|
RE: unicode problem
quote: Originally posted by Eljay
+Removed first 2 bytes from Unicode files (this doesn't contain file data, just encoding information).
1) Not all Unicode files have a BOM. Result: this will remove actual content from such files.
2) Not all BOMs are 2 bytes; they can be 4 bytes too. Result: 2 bytes that don't belong to the actual textual content of the file will be left in the output.
see MSDN docs about unicode file handling.
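A rough sketch of what a BOM check could look like, again on a plain array of byte values. `detectBom` is a hypothetical helper, not part of Messenger Plus!. The signatures are the standard Unicode byte order marks; the 3-byte UTF-8 mark is included as well, since it is another size the fixed "skip 2 bytes" approach would get wrong.

```javascript
// Return the encoding implied by a byte order mark and how many bytes it occupies.
// bytes is an array of values 0..255; length 0 means no BOM was found.
function detectBom(bytes) {
    function startsWith(signature) {
        if (bytes.length < signature.length) return false;
        for (var i = 0; i < signature.length; i++) {
            if (bytes[i] !== signature[i]) return false;
        }
        return true;
    }
    // The 4-byte UTF-32 LE mark must be tested before UTF-16 LE: both begin with FF FE.
    if (startsWith([0xFF, 0xFE, 0x00, 0x00])) return { encoding: 'UTF-32LE', length: 4 };
    if (startsWith([0x00, 0x00, 0xFE, 0xFF])) return { encoding: 'UTF-32BE', length: 4 };
    if (startsWith([0xEF, 0xBB, 0xBF])) return { encoding: 'UTF-8', length: 3 };
    if (startsWith([0xFF, 0xFE])) return { encoding: 'UTF-16LE', length: 2 };
    if (startsWith([0xFE, 0xFF])) return { encoding: 'UTF-16BE', length: 2 };
    return { encoding: null, length: 0 };
}
```

With something like this, the read offset comes from the detected length instead of a hard-coded 2, and a file without a BOM keeps its first bytes. One known ambiguity: a UTF-16 LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32 LE by the mark alone.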
quote: Originally posted by Eljay
+Read whole file, not just up to first null byte (third parameter of ReadString).
ReadString will read x bytes (3rd parameter) if it does not encounter a null byte sooner.
aka: this will not return x bytes (including null bytes); it will still return only up to the first null byte (and if no null byte exists, it will return x bytes).
You must use ReadBSTR for this if the content is Unicode. If the content is ANSI, you must first convert it to Unicode in order to use ReadBSTR.
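The difference between the two read styles can be modelled in plain JavaScript. This is only a simulation of the behaviour as described in this post, not the Messenger Plus! implementation; `readUntilNull` and `readCounted` are made-up names, and the array of character codes stands in for the buffer.

```javascript
// Null-terminated read: stops at the first 0 even if maxCount allows more.
// This models the ReadString behaviour described above.
function readUntilNull(units, maxCount) {
    var out = [];
    for (var i = 0; i < units.length && i < maxCount; i++) {
        if (units[i] === 0) break;
        out.push(String.fromCharCode(units[i]));
    }
    return out.join('');
}

// Counted read: returns exactly count characters, embedded nulls included.
// This models a length-based read such as the one attributed to ReadBSTR.
function readCounted(units, count) {
    var out = [];
    for (var i = 0; i < units.length && i < count; i++) {
        out.push(String.fromCharCode(units[i]));
    }
    return out.join('');
}

var units = [0x61, 0x62, 0x00, 0x63]; // 'a', 'b', null, 'c'
var terminated = readUntilNull(units, 4); // 'ab'
var counted = readCounted(units, 4);      // 'ab\u0000c', 4 characters long
```

In the first function the size only acts as an upper limit, which is why passing the file size does not get past a null in the middle of the data.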
This post was edited on 02-04-2007 at 12:55 AM by CookieRevised.
.-= A 'frrrrrrrituurrr' for Wacky =-.
|
|
02-04-2007 12:39 AM |
|
|
Eljay
Elite Member
:O
Posts: 2949 Reputation: 77
Joined: May 2004
|
RE: unicode problem
quote: Originally posted by CookieRevised
quote: Originally posted by Eljay
+Read whole file, not just up to first null byte (third parameter of ReadString).
ReadString will read x bytes (3rd parameter) if it does not encounter a null byte sooner.
Thanks for that, although the docs are rather unclear about this.
quote: Size
[number,optional] The size (in characters) of the string to read. If not specified, Messenger Plus! stops at the first null character.
I assumed this meant if you DID specify the size of the string, it would read past the null bytes. Oh well...
|
|
02-04-2007 08:35 AM |
|
|
CookieRevised
Elite Member
Posts: 15517 Reputation: 173
Joined: Jul 2003
Status: Away
|
RE: unicode problem
Yeah,...
Personally, I still see this as an (annoying) bug. However, Patchou does not, and I can see his POV too. ReadString works the way C++ handles strings, where a null byte is always considered the end of a string.
Hence why he included ReadBSTR (after some whining from some people). But this is still not a 'good' solution, as you can see in the "read contents of file" example code above.
More about this:
[BUG] WriteString() doesn't handle BSTR's as BSTR's
(only accessible to beta testers)
This post was edited on 02-04-2007 at 03:21 PM by CookieRevised.
.-= A 'frrrrrrrituurrr' for Wacky =-.
|
|
02-04-2007 03:20 PM |
|
|