 ANSI Control Characters 
1st Sergeant

Joined: Wed Jan 23, 2002 3:00 am
Posts: 33
Unread post ANSI Control Characters
Does anyone know of a good regex to identify all ANSI control characters?

I'm currently using
Code:
\x1b[^m]*m
but it doesn't catch the cursor movement codes.
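
A minimal Python sketch of why that pattern falls short. Python's `re` module is an assumption here (the post does not say which regex engine is in use), and both sample strings are made up. Because `[^m]*` accepts any character except `m`, the pattern either skips a cursor code entirely or swallows it together with the visible text that follows:

```python
import re

# The pattern from the post: ESC, any run of non-'m' characters, then 'm'.
SGR_ONLY = re.compile(r"\x1b[^m]*m")

# Hypothetical sample 1: a cursor-position code with no 'm' anywhere after it.
cursor_only = "\x1b[5;10HHi"
# Hypothetical sample 2: clear-screen, text, then a color code.
mixed = "\x1b[2JHello \x1b[0;31mWorld\x1b[0m"

# No 'm' follows the ESC, so nothing matches and the cursor code survives.
print(repr(SGR_ONLY.sub("", cursor_only)))
# The match runs from the first ESC to the first 'm', taking "Hello " with it.
print(repr(SGR_ONLY.sub("", mixed)))
```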


Thu Apr 04, 2013 12:27 pm
Commander

Joined: Mon Oct 29, 2001 3:00 am
Posts: 1096
Location: Tucson, AZ
Unread post Re: ANSI Control Characters
The REs I used for the JTX demo app are here: https://sourceforge.net/p/jtx/code/15/t ... exer.jplex. Note the "unknownEscape" catch-all.

_________________
Suddenly you're Busted!


Thu Apr 04, 2013 12:42 pm
Ambassador

Joined: Wed Apr 20, 2011 1:19 pm
Posts: 2559
Location: Oklahoma City, OK 73170 US
Unread post Re: ANSI Control Characters
This is what I am using:
Code:
//regular expression to fit ANSI control sequences
string ansiControlRegEx = (char)27 + @"\[" + "[^@-~]*" + "[@-~]";

_________________
Regards,
Micro

Website: http://www.microblaster.net
TWGS2.20b/TW3.34: telnet://twgs.microblaster.net:2002

ICQ is Dead Jim! Join us on Discord:
https://discord.gg/zvEbArscMN


Thu Apr 04, 2013 12:51 pm
1st Sergeant

Joined: Wed Jan 23, 2002 3:00 am
Posts: 33
Unread post Re: ANSI Control Characters
Mongoose wrote:
The REs I used for the JTX demo app are here: https://sourceforge.net/p/jtx/code/15/t ... exer.jplex. Note the "unknownEscape" catch-all.


Thanks, this helped. I ended up using one of the strings in there, though I found some of them unnecessary. I've come up with 3 regular expressions that seem to match most, if not all, of what I'll need to filter out.


Micro wrote:
This is what I am using:
Code:
//regular expression to fit ANSI control sequences
string ansiControlRegEx = (char)27 + @"\[" + "[^@-~]*" + "[@-~]";


I initially thought something like this would work after reading the definition of the control sequences, but this regex misses quite a bit actually.

Here are the 3 expressions I found work quite well for my purposes.

Code:
"\x1b[^m]*m"
"\x1b\[([0-9]+(;[0-9]+)*)?[Hf]"
"\x1b\[[0-9]+[A-HJKST]"


Thanks for the help guys.
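
One way the three expressions above could be applied, sketched in Python. The language, the `strip_ansi` helper, the sample string, and the order of application are all assumptions, not something stated in the post. The two cursor patterns run first here because the `m` pattern will consume anything up to the next `m`:

```python
import re

# The three patterns from the post, as raw strings.
PATTERNS = [
    re.compile(r"\x1b\[([0-9]+(;[0-9]+)*)?[Hf]"),  # cursor position
    re.compile(r"\x1b\[[0-9]+[A-HJKST]"),          # numbered move/erase/scroll codes
    re.compile(r"\x1b[^m]*m"),                     # color codes, ending in 'm'
]

def strip_ansi(text: str) -> str:
    """Remove every match of each pattern, one pattern at a time."""
    for pattern in PATTERNS:
        text = pattern.sub("", text)
    return text

# Hypothetical sample: home cursor, clear screen, then red text.
sample = "\x1b[H\x1b[2J\x1b[0;31mSector 1\x1b[0m"
print(repr(strip_ansi(sample)))
```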


Thu Apr 04, 2013 2:33 pm
Ambassador

Joined: Wed Apr 20, 2011 1:19 pm
Posts: 2559
Location: Oklahoma City, OK 73170 US
Unread post Re: ANSI Control Characters
I copied mine from some sample source code, and it worked, so I didn't give much more thought to it. You are missing AutoWrap Mode (ESC[?7h), save/restore cursor position (ESC[s / ESC[u), and hide cursor (ESC[?25l). TheDraw likes to put ESC[?7h at the beginning of every file it creates.

Would the following catch everything in a single expression?

Code:
"\x1b\[([0-9]+(;[0-9]+)*)(?)[A-HJKSTsuflhm]"

or simpler
Code:
"\x1b\[([0-9]+(;[0-9]+)*)(?)[A-z]"

or simplest
Code:
"\x1b\[*[A-z]"



Last edited by Micro on Thu Apr 04, 2013 5:03 pm, edited 1 time in total.



Thu Apr 04, 2013 4:33 pm
Commander

Joined: Mon Oct 29, 2001 3:00 am
Posts: 1096
Location: Tucson, AZ
Unread post Re: ANSI Control Characters
Code:
"\x1b\[*[A-z]"


You can't compress [A-Za-z] to [A-z]. There are some non-alphabetic characters in between.
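
A quick way to see which characters sit in between, sketched in Python (the choice of language is an assumption; the ASCII ordering is the same everywhere):

```python
import re

# Every ASCII character from 'A' (65) through 'z' (122).
ascii_range = "".join(chr(code) for code in range(ord("A"), ord("z") + 1))

# Characters that [A-z] accepts but [A-Za-z] rejects.
extras = [ch for ch in ascii_range
          if re.fullmatch(r"[A-z]", ch) and not re.fullmatch(r"[A-Za-z]", ch)]
print(extras)
```

The six extras are the characters between 'Z' and 'a' in ASCII, and '[' is one of them, which matters for an expression that is meant to follow an ESC '['.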

Also, the "\[*" in your expression says match any number of '['. I assume you meant match any number of any character, which would be ".*". But that's still probably not what you want, because of the principle of maximal munch: quantifiers are greedy and match as much as they can. The regex "\x1b\[.*[A-z]" would match the entirety of "\x1b[0;31mHello, World!\x1b[0m", not just the "\x1b[0;31m".
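
Both effects can be checked with a short Python sketch (Python's `re` module is an assumption; other engines with greedy quantifiers should behave the same way on this input):

```python
import re

# Sample string: red "Hello, World!" followed by a reset code.
text = "\x1b[0;31mHello, World!\x1b[0m"

# "\[*" only repeats '['. It backs off to zero, and [A-z] then accepts the '['.
brackets_only = re.search(r"\x1b\[*[A-z]", text)
print(repr(brackets_only.group()))

# ".*" is greedy: it runs to the end and backs up only as far as it must.
greedy = re.search(r"\x1b\[.*[A-z]", text)
print(repr(greedy.group()))
```

The first search stops after the ESC and '[' and never reaches the 'm'. The second returns the whole string, visible text included.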

The catch-all in the lexer spec I linked roughly reads, "match an escape, followed by a '[', followed by any number of non-alphabetic characters, followed by any alphabetic character." This matches an infinite number of strings that aren't valid ANSI codes, but the important thing is that it matches all strings that are valid ANSI codes. (Or at least, all the common ones used in BBS games.)
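
A rough Python rendering of that description. This is a sketch written from the sentence above, not a copy of the linked lexer spec, and the sample string is made up:

```python
import re

# Catch-all as described: ESC, '[', any non-letters, then one letter.
CATCH_ALL = re.compile(r"\x1b\[[^A-Za-z]*[A-Za-z]")

# Hypothetical sample mixing autowrap, erase, cursor, and color codes.
sample = "\x1b[?7h\x1b[2J\x1b[5;10H\x1b[1;33mCommand?\x1b[0m\x1b[K"
print(repr(CATCH_ALL.sub("", sample)))

# It also accepts strings that are not real ANSI codes.
print(bool(CATCH_ALL.fullmatch("\x1b[!!!z")))
```

Only the visible text "Command?" is left after the substitution, and the nonsense sequence on the last line matches too, which is the trade-off described above.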



Thu Apr 04, 2013 5:00 pm
Ambassador

Joined: Wed Apr 20, 2011 1:19 pm
Posts: 2559
Location: Oklahoma City, OK 73170 US
Unread post Re: ANSI Control Characters
It's still confusing to me :/

how about

Code:
"\x1b\[.*/^[A-Za-z]+$/"



Last edited by Micro on Thu Apr 04, 2013 5:09 pm, edited 2 times in total.



Thu Apr 04, 2013 5:04 pm
Commander

Joined: Mon Oct 29, 2001 3:00 am
Posts: 1096
Location: Tucson, AZ
Unread post Re: ANSI Control Characters
Hmm... some of the notation doesn't survive being quoted.

Micro, what do the '/' and '@' mean in your regexes? I'm not familiar with that notation. And outside of a character range, I know '^' as the beginning-of-line operator.



Thu Apr 04, 2013 5:08 pm
Ambassador

Joined: Wed Apr 20, 2011 1:19 pm
Posts: 2559
Location: Oklahoma City, OK 73170 US
Unread post Re: ANSI Control Characters
I don't know exactly, I'm confused :/
My original example was from a terminal emulation tutorial somewhere on the web and the / came from here:

http://stackoverflow.com/questions/6067 ... characters

In my original example, "[^@-~]*" + "[@-~]", the range @-~ would include much more than the alpha characters needed. I don't understand what the ^ does.



Thu Apr 04, 2013 5:14 pm
Staff Sergeant

Joined: Tue Jul 08, 2008 2:51 pm
Posts: 12
Unread post Re: ANSI Control Characters
Thanks Micro, good stuff. I think the last two might cover too much though - could run the risk of matching characters you don't want. I guess I was just too lazy to combine my 3 expressions into one, but I like your first one - I'll use that with a slight modification. Here is what I'm gonna go with.

Code:
\x1b\[([0-9]+(;[0-9]+)*)?\??[A-HJKSTsuflhm]
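
A quick check of that expression, sketched in Python (the language and the sample string are assumptions). One thing it turns up: private-mode sequences such as ESC[?7h put the '?' before the digits, while the expression as written expects it after them, so those sequences are left in place. The `VARIANT` pattern below moves `\??` ahead of the digits; it is only one possible adjustment and does not come from the thread:

```python
import re

# The single expression from the post above.
COMBINED = re.compile(r"\x1b\[([0-9]+(;[0-9]+)*)?\??[A-HJKSTsuflhm]")

# Hypothetical variant: the optional '?' comes before the digits.
VARIANT = re.compile(r"\x1b\[\??([0-9]+(;[0-9]+)*)?[A-HJKSTsuflhm]")

# Hypothetical sample: save cursor, position, color, reset, restore, autowrap.
sample = "\x1b[s\x1b[10;20H\x1b[1;32mOK\x1b[0m\x1b[u\x1b[?7h"

print(repr(COMBINED.sub("", sample)))  # the ESC[?7h at the end survives
print(repr(VARIANT.sub("", sample)))
```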


Thu Apr 04, 2013 5:27 pm
Commander

Joined: Mon Oct 29, 2001 3:00 am
Posts: 1096
Location: Tucson, AZ
Unread post Re: ANSI Control Characters
If '^' is the first character in a range or set, it means "not these characters". If it's in any other position in a range or set, it's a literal '^'. And outside of a range or set, if it's not escaped as a literal, it means "beginning of a line". Not all regex languages support beginning and end of line operators.
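
The three readings of '^' side by side, as a small Python sketch (Python's `re` flavor is an assumption; as noted, other regex languages can differ, and the input strings are made up):

```python
import re

# First inside a set: negation. Finds the runs of characters that are not digits.
print(re.findall(r"[^0-9]+", "ab12cd"))

# Elsewhere inside a set: a literal '^'.
print(re.findall(r"[0-9^]", "2^10"))

# Outside a set: an anchor. With re.MULTILINE it matches at each line start.
print(re.findall(r"^x", "xa\nxb", flags=re.MULTILINE))
```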

Part of the confusion is that there's not a universal regex language. I don't recognize the language they're using in that SO post. This site gives a pretty good overview; it was helpful to me when I was first learning about REs.



Thu Apr 04, 2013 5:29 pm
Ambassador

Joined: Wed Apr 20, 2011 1:19 pm
Posts: 2559
Location: Oklahoma City, OK 73170 US
Unread post Re: ANSI Control Characters
Mongoose, this is from your code. Will it match any ANSI string?
Code:
"\[[^a-zA-Z]*[a-zA-Z]"

What if you limit the middle a little more?
Code:
"\[[0-9;?]*[a-zA-Z]"


What you said about the ^ makes sense. What I read elsewhere did not make sense:

according to:
http://www.regular-expressions.info/reference.html

^ (caret) Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well.



Last edited by Micro on Thu Apr 04, 2013 5:44 pm, edited 2 times in total.



Thu Apr 04, 2013 5:35 pm
Ambassador

Joined: Wed Apr 20, 2011 1:19 pm
Posts: 2559
Location: Oklahoma City, OK 73170 US
Unread post Re: ANSI Control Characters
GsuP wrote:
Thanks Micro, good stuff. I think the last two might cover too much though - could run the risk of matching characters you don't want. I guess I was just too lazy to combine my 3 expressions into one, but I like your first one - I'll use that with a slight modification. Here is what I'm gonna go with.

Code:
\x1b\[([0-9]+(;[0-9]+)*)?\??[A-HJKSTsuflhm]


I don't know much about regex, but I just learned a lot from this thread. Mongoose seems to have a much better understanding.

what does ?\?? do?



Thu Apr 04, 2013 5:42 pm
Commander

Joined: Mon Oct 29, 2001 3:00 am
Posts: 1096
Location: Tucson, AZ
Unread post Re: ANSI Control Characters
Micro wrote:
Mongoose, this is from your code. Will it match any ANSI string?
Code:
"\[[^a-zA-Z]*[a-zA-Z]"

What if you limit the middle a little more?
Code:
"\[[0-9;?]*[a-zA-Z]"


I think that would work. Some languages might make you escape the '?' in the range... not sure.
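
For what it's worth, here is how one flavor handles it. This sketch assumes Python's `re` module, where '?' inside a set is a literal whether or not it is escaped. The leading `\x1b` is also an assumption, added so the pattern can be run on a raw string with the ESC included:

```python
import re

# Inside a set, Python treats '?' as a literal character, escaped or not.
unescaped = re.compile(r"\x1b\[[0-9;?]*[a-zA-Z]")
escaped = re.compile(r"\x1b\[[0-9;\?]*[a-zA-Z]")

sample = "\x1b[?7h\x1b[0;31mHi\x1b[0m"  # hypothetical sample
print(repr(unescaped.sub("", sample)))
print(unescaped.sub("", sample) == escaped.sub("", sample))
```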

Micro wrote:
what does ?\?? do?


'?' means zero or one of something. So "?\??" would be zero or one of whatever was right before it, followed by zero or one literal question mark.
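
A tiny Python sketch of the same idea, using a made-up pattern rather than the ANSI one: "(ab)?" is zero or one "ab", and "\??" is zero or one literal question mark.

```python
import re

# Optional group, then optional literal '?', then a required 'c'.
pattern = re.compile(r"(ab)?\??c")

for candidate in ["c", "abc", "?c", "ab?c", "??c"]:
    print(candidate, bool(pattern.fullmatch(candidate)))
```

Only the last candidate fails, because it has two question marks where at most one is allowed.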



Thu Apr 04, 2013 5:50 pm
1st Sergeant

Joined: Wed Jan 23, 2002 3:00 am
Posts: 33
Unread post Re: ANSI Control Characters
I usually try to avoid using regular expressions because of the physical pain they cause me. Thus I don't know a lot about them.

I've learned stuff today AND my wife made me a sammich.

Must be a good day.


Thu Apr 04, 2013 5:59 pm