Shoutbox

[SOLVED] Remove HTML Content

Forum: Shoutbox (https://shoutbox.menthix.net) > MsgHelp Archive > Messenger Plus! for Live Messenger > Scripting

[SOLVED] Remove HTML Content by Spunky on 12-03-2006 at 03:23 PM

I'm reading information from a website and have managed to break it all down into the chunks I need using a regex (the first one I've successfully written on my own). The chunks still contain a few <DIV> and <SPAN> tags that I'd like to get rid of if possible, and I'm not sure they are always the same length. Is there any way to remove the text between two markers (such as <DIV> and </DIV>)?
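For the question as asked, here is a minimal sketch of removing a whole element (the tags and everything between them) with a lazy regex. It assumes the elements are not nested, because a lazy match pairs each opening tag with the first closing tag after it, and that tagName is a plain tag name. removeElement is a hypothetical helper. It sticks to ES3-style JavaScript so it should also suit the JScript engine that Plus! scripts run on; the console.log line is only for trying it outside Messenger.

```javascript
// Sketch: remove whole elements, i.e. the tags AND everything between them.
// Assumes the elements are not nested: the lazy [\s\S]*? pairs each opening
// tag with the FIRST matching closing tag after it.
function removeElement(html, tagName) {
  // <tag ...> ... </tag>, any case; [\s\S] also matches line breaks, "." does not
  var pattern = new RegExp(
    "<" + tagName + "\\b[^>]*>[\\s\\S]*?</" + tagName + "\\s*>", "gi");
  return html.replace(pattern, "");
}

var chunk = 'Price: <DIV class="ad">Buy now!</DIV>42 <span>GBP</span>';
console.log(removeElement(chunk, "div")); // Price: 42 <span>GBP</span>
```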


RE: [?] Remove HTML Content by markee on 12-03-2006 at 03:39 PM

I did something like this in my Message Converter script, if you want to have a look. It isn't the easiest thing to do, but here is an extract of what I did.

code:
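[...]
var bb = /\[b\]/gi;   // opening bold tag
var eb = /\[\/b\]/gi; // closing bold tag
[...]
function OnEvent_ChatWndSendMessage(ChatWnd, Message){
    var arr, arr1, prev, MessageEnd, extract, MessageBeg;
    // eb.exec() finds the next [/b]; leftContext/rightContext split Message around it
    Outer3:
    while ((arr = eb.exec(Message)) != null){
        prev = RegExp.leftContext;
        MessageEnd = RegExp.rightContext;
        // look through the [b] tags before that [/b] for the corresponding one
        Inner3:
        while ((arr1 = bb.exec(prev)) != null){
            extract = RegExp.rightContext;
            MessageBeg = RegExp.leftContext;
            // fresh regex, so eb's lastIndex from the outer loop is not reused here
            if (/\[\/b\]/i.test(extract)){
                prev = MessageBeg;
                while (bb.exec(prev) != null){} // run bb to the end so its lastIndex resets to 0
                continue Inner3;
            }
        }
        // per the post, the IRC bold codes belong in the two "" (they may have been lost when posting)
        Message = MessageBeg + "" + extract + "" + MessageEnd;
        while (eb.exec(Message) != null){} // run eb to the end so its lastIndex resets to 0
        continue Outer3;
    }

[...]
    return Message;
}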

There are probably better ways of going about some of this, but I know it works as a starting point. This extract just finds the last instance of [/b], finds the corresponding [b], skips any complete [b]...[/b] in between, and replaces the opening and closing tags with the IRC-style bold start and end codes. (EDIT: I realised it doesn't do the skipping; only the colour and background parts do, as bolding doesn't need it. I should really check for any nested pairs and make them do nothing rather than add two pointless characters.) I hope this helps, and if you need any help understanding my code, you can add me on WLM.

EDIT2: Looking over the code again, I probably don't need the continue statements; it should do the same thing with or without them. Sorry for the edits, by the way.
* markee writes a mental note about his dodgy scripting ability
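markee's extract works backwards from a [/b] to find the corresponding [b]. The same pairing can be sketched as one left-to-right scan with a stack, where each [/b] closes the most recent unmatched [b]. This is a substitute technique, not his script: bbBoldToIrc is a hypothetical name, "\x02" (Ctrl+B) is assumed to be the IRC-style bold toggle, and unmatched tags are deliberately left as they are.

```javascript
// Sketch: pair every [/b] with the nearest unmatched [b] before it (a stack),
// then swap each matched pair for the IRC-style bold toggle.
// ASSUMPTION: "\x02" is the IRC bold code; unmatched tags are left alone.
var IRC_BOLD = "\x02";

function bbBoldToIrc(message) {
  var tag = /\[(\/?)b\]/gi; // matches [b] and [/b], any case
  var open = [];            // offsets of [b] tags still waiting for a [/b]
  var isPaired = {};        // offset -> true for every tag that found a partner
  var m;
  while ((m = tag.exec(message)) !== null) {
    if (m[1] === "") {
      open.push(m.index);            // opening tag: remember where it is
    } else if (open.length > 0) {
      isPaired[open.pop()] = true;   // closing tag: pair with latest opener
      isPaired[m.index] = true;
    }
  }
  // Second pass: replace only the tags that were paired up.
  return message.replace(tag, function (whole, slash, offset) {
    return isPaired[offset] ? IRC_BOLD : whole;
  });
}
```

For example, bbBoldToIrc("[b]x[/b] [/b]") turns the first pair into bold codes and keeps the stray "[/b]" as plain text, which avoids adding pointless characters for tags that have no partner.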
RE: [?] Remove HTML Content by Spunky on 12-03-2006 at 04:30 PM

Thanks for the code, but I couldn't quite work out how to use it... Luckily, I've finally got the hang of RegExps (about time too), so I can remove the formatting tags and use wildcards on things like DIVs :D
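For the "remove the formatting tags, keep the text" approach, the wildcard can be a [^>]* that swallows any attributes (class, style and so on). A minimal sketch; the tag list is only an example and stripTags is a hypothetical name. It suits small, predictable chunks like these, but it is not an HTML parser.

```javascript
// Sketch: strip opening and closing formatting tags but keep the text inside.
// [^>]* is the "wildcard" that swallows any attributes on the tag.
// ASSUMPTION: the tag list is only an example; adjust it to the page.
// The \b stops "b" from also matching tags such as <br>.
function stripTags(html) {
  return html.replace(/<\/?(?:div|span|b|i|u|font)\b[^>]*>/gi, "");
}
```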


RE: [SOLVED] Remove HTML Content by -dt- on 12-03-2006 at 05:10 PM

quote:
Originally posted by markee
I did something like this with my Message Converter script if you have a look.  It isn't the easiest of things but here is an extract of what I did.

....

code:
Message = Message.replace(/\[b\](.*?)\[\/b\]/g, "$1");


that's all I'm going to say
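-dt-'s one-liner works as long as each pair sits on one line and is not nested. Here is a sketch with two assumed tweaks that are not part of his post: [\s\S]*? so a pair may span line breaks, since "." does not match them, and the "i" flag so [B]...[/B] is caught too. removeBoldTags is a hypothetical name.

```javascript
// Sketch based on -dt-'s one-liner, with two assumed changes:
//  - [\s\S]*? instead of .*? so a pair may span several lines;
//  - the "i" flag so upper-case [B]...[/B] is handled as well.
function removeBoldTags(message) {
  return message.replace(/\[b\]([\s\S]*?)\[\/b\]/gi, "$1");
}
```

Nested pairs still defeat it: removeBoldTags("[b]a[b]b[/b]c[/b]") returns "a[b]bc[/b]", because the lazy match pairs the first [b] with the first [/b].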
RE: RE: [SOLVED] Remove HTML Content by markee on 12-04-2006 at 12:15 AM

quote:
Originally posted by -dt-
code:
Message = Message.replace(/\[b\](.*?)\[\/b\]/g, "$1");


that's all I'm going to say

I didn't do it that way because, for the text and background colours, I had to match each closing tag with its corresponding opening tag, and I just copied the part I needed rather than thinking it all through again. Thanks for that, though.

* markee goes back to see where else he made his code too long...
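markee's point about corresponding beginnings and ends matters once colour tags nest: a single lazy pass over "[c=red]a [c=blue]b[/c] c[/c]" would pair [c=red] with the first [/c]. One way around it, sketched here, is to keep rewriting the innermost pair until nothing changes. The [c=colour] tag syntax, the HTML <span> output and the name colourToHtml are assumptions for illustration only.

```javascript
// Sketch: convert nested [c=...]...[/c] pairs by always rewriting the
// innermost pair first, so every [/c] meets its own opening tag.
// ASSUMPTIONS: [c=colour] syntax and <span> output are only illustrative.
function colourToHtml(text) {
  // The body (?:(?!\[c=)[\s\S])*? may not contain another "[c=",
  // so only an innermost pair can match.
  var innermost = /\[c=([^\]]+)\]((?:(?!\[c=)[\s\S])*?)\[\/c\]/i;
  var previous;
  do {
    previous = text;
    // Non-global regex: replaces one innermost pair per pass.
    text = text.replace(innermost, '<span style="color:$1">$2</span>');
  } while (text !== previous);
  return text;
}
```

Each pass removes one [/c], so the loop always ends; with the example above it produces an outer red span correctly wrapping the inner blue one.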