| Author |
Message |
|
SupG
1st Sergeant
Joined: Wed Jan 23, 2002 3:00 am Posts: 33
|
 ANSI Control Characters
Does anyone know of a good regex to identify all ANSI control characters? The one I'm currently using doesn't catch the cursor movement codes.
|
| Thu Apr 04, 2013 12:27 pm |
|
 |
|
Mongoose
Commander
Joined: Mon Oct 29, 2001 3:00 am Posts: 1096 Location: Tucson, AZ
|
 Re: ANSI Control Characters
The REs I used for the JTX demo app are here: https://sourceforge.net/p/jtx/code/15/t ... exer.jplex. Note the "unknownEscape" catch-all.
_________________ Suddenly you're Busted!
|
| Thu Apr 04, 2013 12:42 pm |
|
 |
|
Micro
Ambassador
Joined: Wed Apr 20, 2011 1:19 pm Posts: 2559 Location: Oklahoma City, OK 73170 US
|
 Re: ANSI Control Characters
This is what I am using:
Code:
//regular expression to fit ANSI control sequences
string ansiControlRegEx = (char)27 + @"\[" + "[^@-~]*" + "[@-~]";
_________________ Regards, Micro Website: http://www.microblaster.net TWGS2.20b/TW3.34: telnet://twgs.microblaster.net:2002
ICQ is Dead Jim! Join us on Discord: https://discord.gg/zvEbArscMN
|
| Thu Apr 04, 2013 12:51 pm |
|
 |
|
SupG
1st Sergeant
Joined: Wed Jan 23, 2002 3:00 am Posts: 33
|
 Re: ANSI Control Characters
Thanks, this helped - I ended up using one of the strings in there, though I found some of them unnecessary. I've come up with 3 regular expressions that seem to match most if not all of what I'll need to filter out.
Micro wrote: This is what I am using: Code: //regular expression to fit ANSI control sequences string ansiControlRegEx = (char)27 + @"\[" + "[^@-~]*" + "[@-~]";
I initially thought something like this would work after reading the definition of the control sequences, but this regex actually misses quite a bit. Here are the 3 expressions I found work quite well for my purposes.
Code:
"\x1b[^m]*m"
"\x1b\[([0-9]+(;[0-9]+)*)?[Hf]"
"\x1b\[[0-9]+[A-HJKST]"
Thanks for the help guys.
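For anyone wanting to try those three expressions, here is a minimal Python sketch (Python is my choice here; the thread's own snippet is C#). Combining them with alternation, the name `strip_ansi`, and the sample string are all assumptions made for illustration, not something from the post.

```python
import re

# The three expressions from the post, joined with '|' (an assumption;
# the post does not say how they were applied). \x1b is the ESC character.
PATTERNS = [
    r"\x1b[^m]*m",                      # attribute/colour codes ending in 'm'
    r"\x1b\[([0-9]+(;[0-9]+)*)?[Hf]",   # cursor positioning
    r"\x1b\[[0-9]+[A-HJKST]",           # cursor movement / erase with a count
]
ANSI_RE = re.compile("|".join(PATTERNS))

def strip_ansi(text: str) -> str:
    """Remove every substring matched by one of the three patterns."""
    return ANSI_RE.sub("", text)

sample = "\x1b[0;31mHello\x1b[0m\x1b[2J\x1b[10;5H World\x1b[3A"
print(repr(strip_ansi(sample)))  # 'Hello World'
```

One thing to watch: the first pattern's `[^m]*` is not limited to parameter characters, so on input such as `"\x1b[2JHi\x1b[0m"` it runs from the first escape to the final `m` and removes the `Hi` as well.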
|
| Thu Apr 04, 2013 2:33 pm |
|
 |
|
Micro
Ambassador
Joined: Wed Apr 20, 2011 1:19 pm Posts: 2559 Location: Oklahoma City, OK 73170 US
|
 Re: ANSI Control Characters
I copied mine from some sample source code, and it worked, so I didn't give much more thought to it.
You are missing AutoWrap Mode (^[?7h), save/restore cursor position (^[s / ^[u), and hide cursor (^[l). TheDraw likes to put ^[?7h at the beginning of every file it creates.
Would the following catch everything in a single expression?
Code:
"\x1b\[([0-9]+(;[0-9]+)*)(?)[A-HJKSTsuflhm]"
or simpler
Code:
"\x1b\[([0-9]+(;[0-9]+)*)(?)[A-z]"
or simplest
Last edited by Micro on Thu Apr 04, 2013 5:03 pm, edited 1 time in total.
|
| Thu Apr 04, 2013 4:33 pm |
|
 |
|
Mongoose
Commander
Joined: Mon Oct 29, 2001 3:00 am Posts: 1096 Location: Tucson, AZ
|
 Re: ANSI Control Characters
You can't compress [A-Za-z] to [A-z]. There are some non-alphabetic characters in between.
Also, the "\[*" in your expression says match any number of '['. I assume you meant match any number of any character, which would be ".*". But that's still probably not what you want, because of the principle of maximal munch. The regex "\x1b\[.*[A-z]" would match the entirety of "\x1b[0;31mHello, World!\x1b[0m", not just the "\x1b[0;31m".
The catch-all in the lexer spec I linked roughly reads, "match an escape, followed by a '[', followed by any number of non-alphabetic characters, followed by any alphabetic character." This matches an infinite number of strings that aren't valid ANSI codes, but the important thing is that it matches all strings that are valid ANSI codes. (Or at least, all the common ones used in BBS games.)
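Both points can be checked with a short Python sketch. The variable names are made up for illustration, and the catch-all is written here as the description above reads (escape, '[', non-letters, one letter), which is an assumption about the exact lexer rule.

```python
import re

# Characters that [A-z] accepts but that are not letters: the code points
# sitting between 'Z' (0x5A) and 'a' (0x61).
extras = [chr(c) for c in range(ord("A"), ord("z") + 1)
          if re.fullmatch(r"[A-z]", chr(c)) and not chr(c).isalpha()]
print(extras)

text = "\x1b[0;31mHello, World!\x1b[0m"

# Greedy '.*' grabs as much as it can, then backs off just enough for
# [A-z] to match, so the match runs to the final 'm'.
greedy = re.match(r"\x1b\[.*[A-z]", text)

# Excluding letters from the middle stops at the first letter instead.
catch_all = re.match(r"\x1b\[[^a-zA-Z]*[a-zA-Z]", text)

print(repr(greedy.group()))
print(repr(catch_all.group()))
```

The greedy version returns the whole sample string, while the catch-all returns only the leading colour code.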
|
| Thu Apr 04, 2013 5:00 pm |
|
 |
|
Micro
Ambassador
Joined: Wed Apr 20, 2011 1:19 pm Posts: 2559 Location: Oklahoma City, OK 73170 US
|
 Re: ANSI Control Characters
It's still confusing to me :/ How about
Code:
"\x1b\[.*/^[A-Za-z]+$/"
Last edited by Micro on Thu Apr 04, 2013 5:09 pm, edited 2 times in total.
|
| Thu Apr 04, 2013 5:04 pm |
|
 |
|
Mongoose
Commander
Joined: Mon Oct 29, 2001 3:00 am Posts: 1096 Location: Tucson, AZ
|
 Re: ANSI Control Characters
Hmm... some of the notation doesn't survive being quoted.
Micro, what do the '/' and '@' mean in your regexes? I'm not familiar with that notation. And outside of a character range, I know '^' as the beginning-of-line operator.
|
| Thu Apr 04, 2013 5:08 pm |
|
 |
|
Micro
Ambassador
Joined: Wed Apr 20, 2011 1:19 pm Posts: 2559 Location: Oklahoma City, OK 73170 US
|
 Re: ANSI Control Characters
I don't know exactly, I'm confused :/ My original example was from a terminal emulation tutorial somewhere on the web, and the / came from here: http://stackoverflow.com/questions/6067 ... characters
"[^@-~]*" + "[@-~]"
In my original example, @-~ is a range that would include much more than the alpha characters needed. I don't understand what the ^ does.
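To see what the @-~ range actually covers, here is a small Python sketch. The pattern is the one from the original example, rebuilt piece by piece; the names `final_chars` and `CSI` and the sample string are assumptions for illustration. Inside the brackets, a leading ^ negates the set, so "[^@-~]*" means "any run of characters that are not in the range @ through ~".

```python
import re

# '@' is 0x40 and '~' is 0x7E, so [@-~] covers every character in between:
# upper- and lower-case letters plus some punctuation.
final_chars = "".join(chr(c) for c in range(ord("@"), ord("~") + 1))
print(final_chars)

# Same pieces as the C# example: ESC, a literal '[', any run of characters
# outside @-~, then exactly one character inside @-~.
CSI = re.compile("\x1b" + r"\[" + "[^@-~]*" + "[@-~]")

print(CSI.findall("\x1b[1;5H\x1b[?7hText\x1b[0m"))
```

Digits, ';' and '?' all sort below '@', which is why they fall into the negated middle part and the sequence ends at the first character from the range.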
|
| Thu Apr 04, 2013 5:14 pm |
|
 |
|
GsuP
Staff Sergeant
Joined: Tue Jul 08, 2008 2:51 pm Posts: 12
|
 Re: ANSI Control Characters
Thanks Micro, good stuff. I think the last two might cover too much though - could run the risk of matching characters you don't want. I guess I was just too lazy to combine my 3 expressions into one, but I like your first one - I'll use that with a slight modification. Here is what I'm gonna go with.
Code:
\x1b\[([0-9]+(;[0-9]+)*)?\??[A-HJKSTsuflhm]
|
| Thu Apr 04, 2013 5:27 pm |
|
 |
|
Mongoose
Commander
Joined: Mon Oct 29, 2001 3:00 am Posts: 1096 Location: Tucson, AZ
|
 Re: ANSI Control Characters
If '^' is the first character in a range or set, it means "not these characters". If it's any other position in a range or set, it's a literal '^'. And outside of a range or set, if it's not escaped as a literal it means "beginning of a line". Not all regex languages support beginning and end of line operators. Part of the confusion is that there's not a universal regex language. I don't recognize the language they're using in that SO post. This site gives a pretty good overview; it was helpful to me when I was first learning about REs.
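The three roles of '^' can be tried out in Python's `re` module, used here only as one example flavour; other regex languages may differ, as noted above. The sample strings are made up for illustration.

```python
import re

# 1. First character inside a set: negation ("not these characters").
not_digits = re.findall(r"[^0-9]", "a1b2")

# 2. Any other position inside a set: a literal caret.
with_caret = re.findall(r"[0-9^]", "2^3")

# 3. Outside a set: an anchor. By default Python anchors at the start of
#    the string; with re.MULTILINE it also matches after each line break.
start_only = re.findall(r"^ab", "ab\nab")
each_line = re.findall(r"^ab", "ab\nab", flags=re.MULTILINE)

print(not_digits, with_caret, start_only, each_line)
```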
|
| Thu Apr 04, 2013 5:29 pm |
|
 |
|
Micro
Ambassador
Joined: Wed Apr 20, 2011 1:19 pm Posts: 2559 Location: Oklahoma City, OK 73170 US
|
 Re: ANSI Control Characters
Mongoose, this is from your code, will it match any ANSI string?
Code:
"\[[^a-zA-Z]*[a-zA-Z]"
What if you limit the middle a little more?
Code:
"\[[0-9;?]*[a-zA-Z]"
What you said about the ^ makes sense. What I read elsewhere did not make sense. According to http://www.regular-expressions.info/reference.html:
^ (caret) Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well.
Last edited by Micro on Thu Apr 04, 2013 5:44 pm, edited 2 times in total.
|
| Thu Apr 04, 2013 5:35 pm |
|
 |
|
Micro
Ambassador
Joined: Wed Apr 20, 2011 1:19 pm Posts: 2559 Location: Oklahoma City, OK 73170 US
|
 Re: ANSI Control Characters
GsuP wrote: Thanks Micro, good stuff. I think the last two might cover too much though - could run the risk of matching characters you don't want. I guess I was just too lazy to combine my 3 expressions into one, but I like your first one - I'll use that with a slight modification. Here is what I'm gonna go with. Code: \x1b\[([0-9]+(;[0-9]+)*)?\??[A-HJKSTsuflhm]
I don't know much about regex, but I just learned a lot from this thread. Mongoose seems to have a much better understanding. What does ?\?? do?
|
| Thu Apr 04, 2013 5:42 pm |
|
 |
|
Mongoose
Commander
Joined: Mon Oct 29, 2001 3:00 am Posts: 1096 Location: Tucson, AZ
|
 Re: ANSI Control Characters
Micro wrote: Mongoose, this is from you code, will it match any ANSI string? Code: "\[[^a-zA-Z]*[a-zA-Z]" What if you limit the middle a little more? Code: "\[[0-9;?]*[a-zA-Z]"
I think that would work. Some languages might make you escape the '?' in the range... not sure.
Micro wrote: what does ?\?? do?
'?' means zero or one of something. So "?\??" would be zero or one of whatever was right before it, followed by zero or one literal question mark.
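A quick way to compare the two candidates from this thread is to run them over a few sequences. This is only a sketch in Python (where '?' inside a set needs no escaping); the names `FINAL`, `LOOSE`, and `samples` are assumptions for illustration, and the sample list is far from exhaustive.

```python
import re

# GsuP's combined expression and Micro's narrowed catch-all (with the
# leading escape added), both copied from the posts above.
FINAL = re.compile(r"\x1b\[([0-9]+(;[0-9]+)*)?\??[A-HJKSTsuflhm]")
LOOSE = re.compile(r"\x1b\[[0-9;?]*[a-zA-Z]")

samples = {
    "colour": "\x1b[0;31m",
    "clear screen": "\x1b[2J",
    "cursor home": "\x1b[H",
    "save cursor": "\x1b[s",
    "autowrap": "\x1b[?7h",
}

# For each sequence, record whether each pattern matches it completely.
results = {name: (bool(FINAL.fullmatch(seq)), bool(LOOSE.fullmatch(seq)))
           for name, seq in samples.items()}
for name, (final_ok, loose_ok) in results.items():
    print(f"{name:13} final={final_ok} loose={loose_ok}")
```

In this run the combined expression does not match the autowrap code: as written, "\??" sits after the digits, while in ^[?7h the question mark comes before the 7. Moving the optional question mark ahead of the digit group would be one way to cover it.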
|
| Thu Apr 04, 2013 5:50 pm |
|
 |
|
SupG
1st Sergeant
Joined: Wed Jan 23, 2002 3:00 am Posts: 33
|
 Re: ANSI Control Characters
I usually try to avoid using regular expressions because of the physical pain they cause me. Thus I don't know a lot about them.
I've learned stuff today AND my wife made me a sammich.
Must be a good day.
|
| Thu Apr 04, 2013 5:59 pm |
|
 |
|