[SOLVED] Remove HTML Content
Spunky
Former Super Mod
Posts: 3658
Reputation: 61
36 / Male / Flag
Joined: Aug 2006
O.P. [SOLVED] Remove HTML Content
I'm reading information from a website and have managed to break it all down into the chunks I need using a regex (the first one I've successfully written on my own). The chunks have a few <DIV> and <SPAN> tags, though, that I want to get rid of if possible, and I'm not sure if they are always the same length. Is there any way to remove text between 2 markers (such as <DIV> and </DIV>)?

This post was edited on 12-03-2006 at 04:31 PM by Spunky.
<Eljay> "Problems encountered: shit blew up" :zippy:
12-03-2006 03:23 PM
markee
Veteran Member
Posts: 1621
Reputation: 50
36 / Male / Flag
Joined: Jan 2006
RE: [?] Remove HTML Content
I did something like this in my Message Converter script, if you want to have a look. It isn't the easiest of things, but here is an extract of what I did.

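code:
[...]
var bb = /\[b\]/gi;   // opening tag
var eb = /\[\/b\]/gi; // closing tag
[...]
function OnEvent_ChatWndSendMessage(ChatWnd,Message){
    var arr, arr1;
    // RegExp.leftContext / RegExp.rightContext hold the text before / after the most recent match.
    Outer3:
    while ((arr = eb.exec(Message)) != null){
        var prev = RegExp.leftContext;
        var MessageEnd = RegExp.rightContext;
        Inner3:
        while ((arr1 = bb.exec(prev)) != null){
            var extract = RegExp.rightContext;
            var MessageBeg = RegExp.leftContext;
            if (eb.exec(extract)!=null){
                prev = MessageBeg;
                // Run bb until it fails so that its lastIndex is reset.
                while(bb.exec(prev)!=null){
                }
                continue Inner3;
            }
        }
        // The "" literals presumably held the IRC bold control character, lost in this archive.
        Message = MessageBeg+""+extract+""+MessageEnd;
        // Run eb until it fails so that its lastIndex is reset.
        while(eb.exec(Message)!=null){
        }
        continue Outer3;
    }

[...]
    return Message;
}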

There are probably better ways of going about some of this, but I know it works as a starting point. This extract finds the last instance of [/b], then finds the corresponding [b], excluding any complete [b]...[/b] pair in the middle (EDIT: realised it doesn't do this; only the colour and background parts did, as the likes of bolding don't need it, though I should really check for any and make them do nothing rather than add 2 pointless characters). It then replaces the beginning and end with the bold and end-of-bold characters, IRC style. I hope this helps, and if you need any help understanding my code you can add me on WLM.

EDIT2: as I look over the code again, I probably don't need the continues in there as it should do the same thing with or without them there. Sorry for the edits btw.
* markee writes a mental note about his dodgy scripting ability

This post was edited on 12-03-2006 at 03:52 PM by markee.
12-03-2006 03:39 PM
Spunky
O.P. RE: [?] Remove HTML Content
Thanks for the code, but I couldn't quite understand how to use it... Luckily, I've just really got the hang of RegExps (about time too), so I can remove formatting tags and use wildcards on things like DIVs :D
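
For anyone landing here with the same question, a minimal sketch of what that regex approach might look like. The function names and the sample string are made up for illustration, and it assumes the <DIV> blocks are not nested; it is not the code Spunky actually used.

```javascript
// Hypothetical helpers, not from the thread.

// Remove the <div>/<span> tags themselves but keep the text they wrap.
// [^>]* is the "wildcard" that swallows any attributes inside the tag.
function stripTags(html) {
    return html.replace(/<\/?(?:div|span)\b[^>]*>/gi, "");
}

// Remove each <div>...</div> block together with its contents.
// [\s\S]*? is used instead of .*? so a block may span line breaks.
// Assumes the blocks are not nested.
function removeDivBlocks(html) {
    return html.replace(/<div\b[^>]*>[\s\S]*?<\/div>/gi, "");
}

var chunk = '<DIV class="x">Hello <SPAN>world</SPAN></DIV> and more';
var textOnly = stripTags(chunk);          // "Hello world and more"
var withoutDiv = removeDivBlocks(chunk);  // " and more"
```

The first helper answers "get rid of the tags"; the second answers "remove text between 2 markers". Neither is a real HTML parser, so both are only reasonable for small, predictable chunks like the ones described above.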
12-03-2006 04:30 PM
-dt-
Scripting Contest Winner
Posts: 1819
Reputation: 74
35 / Male / Flag
Joined: Mar 2004
RE: [SOLVED] Remove HTML Content
quote:
Originally posted by markee
I did something like this with my Message Converter script if you have a look. [...]

....

code:
Message = Message.replace(/\[b\](.*?)\[\/b\]/g, "$1");


That's all I'm going to say.
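
A short sketch of why the one-liner works, using a made-up sample message. The key part is the non-greedy `.*?`; the greedy variant is shown only for contrast.

```javascript
// Sample input invented for illustration.
var message = "a [b]bold[/b] b [b]more[/b] c";

// Non-greedy (.*?): each [b] pairs with the nearest following [/b],
// and "$1" puts back only the captured text between the two tags.
var lazy = message.replace(/\[b\](.*?)\[\/b\]/g, "$1");
// "a bold b more c"

// Greedy (.*): the first [b] pairs with the LAST [/b],
// so only the outermost tags are removed and the inner ones are left behind.
var greedy = message.replace(/\[b\](.*)\[\/b\]/g, "$1");
// "a bold[/b] b [b]more c"
```

Adding the `i` flag (`/gi`) would also catch `[B]...[/B]`, in line with the `bb` and `eb` patterns quoted above.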
12-03-2006 05:10 PM
markee
RE: RE: [SOLVED] Remove HTML Content
quote:
Originally posted by -dt-
code:
Message = Message.replace(/\[b\](.*?)\[\/b\]/g, "$1");


thats all im going to say

The reason I didn't do that is that I had to match up the corresponding beginnings and ends when it came to the colouring of the text and background, and I just copied the part I needed rather than think it all through again. Thanks for that, though.
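
One way the "corresponding ends and beginnings" problem can be handled with a regex is to convert the innermost pair first and repeat until nothing is left. The sketch below assumes nested `[c=N]...[/c]` colour tags; the `convertNested` and `render` names and the `<N:text>` output are hypothetical, and this is not what the Message Converter script does.

```javascript
// Matches a [c=N]...[/c] pair whose body contains no further "[c=" opening,
// i.e. an innermost pair. No "g" flag: one pair is handled per pass.
var innermost = /\[c=(\d+)\]((?:(?!\[c=)[\s\S])*?)\[\/c\]/i;

// Hypothetical helper: keeps replacing innermost pairs until the text stops changing.
// render(colour, body) decides what a matched pair turns into; its result should
// not itself contain a complete [c=N]...[/c] pair, or the loop would never settle.
function convertNested(message, render) {
    var previous;
    do {
        previous = message;
        message = message.replace(innermost, function (match, colour, body) {
            return render(colour, body);
        });
    } while (message !== previous);
    return message;
}

var out = convertNested("[c=4]red [c=12]blue[/c] red[/c]", function (colour, body) {
    return "<" + colour + ":" + body + ">";
});
// out is "<4:red <12:blue> red>"
```

Because the inner pair is rewritten before the outer one is matched, each closing tag is always paired with its own opening tag, which a single non-greedy replace cannot guarantee for nested tags.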

* markee goes back to see where else he made his code too long...
12-04-2006 12:15 AM