More RegEx
I'm kinda stuck on a RegEx at the moment, so I think I'll give you a tutorial on it. Generally, when I ramble on in this thread it actually turns into something on my end. If you don't want to learn RegEx, don't read any further.
I'd like to start this off with the fact that I technically don't know shit about RegEx. I approach RegEx the same way I approach .bat files: I learn as much as I need to get my current goal accomplished and no more.
RegEx is short for Regular Expression. I don't know who came up with that name, but it is incredibly deceiving. It sounds easy; it's not. It's some confusing, annoying crap that drives me gabonkers.
RegEx is a very symbolic language. The same symbol can mean a lot of different things depending on how it's used. Here's a for-instance:
Code:
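/(:\{)/ vs /[a-zA-Z]+(?=:)/
Now you may be looking at that thinking "How is that a versus? They are totally different." Well, that's because I haven't explained it yet. Hold your horses.
In the first example the parentheses make a group. The regex engine will look for the first instance of :{. In the second example the parentheses are more like an if statement; it's technically called a lookahead. It matches as many letters in a row as it can, then checks whether the next character is a colon. If it is, it returns the word (without the colon, because a lookahead only checks and doesn't grab anything). If it isn't, the engine doesn't quit right there: it backs off and keeps trying further along the string, and you only get null if no run of letters anywhere is followed by a colon.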
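Here's a minimal sketch of those two patterns in action. It's TypeScript, assuming the regexes run through JavaScript's String.prototype.match (the post never says which language it uses), and the helper names firstOpenDelimiter and firstLabel are made up for illustration.

```typescript
// Hypothetical helpers, assuming the patterns run through String.prototype.match
// (no g flag, so only the first match is reported).

// Capturing group: position of the first ":{" in the text, or -1 if none.
function firstOpenDelimiter(text: string): number {
  const m = text.match(/(:\{)/);
  // m[0] is the whole match and m[1] is the group; both are ":{" here.
  return m === null || m.index === undefined ? -1 : m.index;
}

// Lookahead: first run of letters immediately followed by ":".
// The colon is only checked, not consumed, so it isn't in the result.
function firstLabel(text: string): string | null {
  const m = text.match(/[a-zA-Z]+(?=:)/);
  return m === null ? null : m[0];
}

const pos = firstOpenDelimiter("a = b:{ c }"); // 5
const label = firstLabel("one; two: three");   // "two" ("one" is followed by ";", so the engine moves on)
const none = firstLabel("no colons here");     // null
```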

Another example of a character with multiple uses is the question mark. In the lookahead above, the ? is part of the (?=...) syntax, which checks whether something is there on a true/false level. In the example below, a ? right after a closing parenthesis means everything in those parentheses might not exist... and that's OK, just look for the next stuff.
Code:
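/[a-zA-Z]+(\[\d\])?:/
The above would find word: or word[0]:. Since \d is exactly one digit, word[10]: would not match; you'd need \d+ inside the brackets for that.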
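A quick TypeScript sketch of the optional group, again assuming JavaScript's match(). The findLabel helper and the \d+ variant are hypothetical additions for comparison, not something from the post.

```typescript
// The optional-group pattern: letters, then an optional one-digit
// index like [0], then a colon.
const labelPattern = /[a-zA-Z]+(\[\d\])?:/;
// Hypothetical variant: \d+ lets the index have more than one digit.
const multiDigitPattern = /[a-zA-Z]+(\[\d+\])?:/;

// Hypothetical helper: the first full match, or null.
function findLabel(text: string, pattern: RegExp): string | null {
  const m = text.match(pattern);
  return m === null ? null : m[0];
}

const plain = findLabel("word: 1", labelPattern);           // "word:"
const indexed = findLabel("word[0]: 1", labelPattern);      // "word[0]:"
const tooLong = findLabel("word[10]: 1", labelPattern);     // null: \d is exactly one digit
const fixed = findLabel("word[10]: 1", multiDigitPattern);  // "word[10]:"
```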
That's a little bit of the complex stuff. Let's look at some simpler stuff. All along I have been throwing a chunk of the alphabet between some brackets and I never explained it.
Code:
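/[a-zA-Z]/
The above regex will find one letter: anything from a to z or from A to Z. It isn't a case-insensitive flag; it works for both cases because both ranges are listed. By adding a plus sign after it, it will go to the first letter and then get all the letters that come after it, until it hits something that isn't a letter.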
Code:
/[a-zA-Z]+/
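Here's a small TypeScript sketch of what those two character classes grab, assuming JavaScript's match() with no g flag (so it stops at the first hit). The helper name first is made up, and the lowercase-only [a-z]+ line is an extra comparison that isn't in the post.

```typescript
// Hypothetical helper: the first full match, or null.
function first(text: string, pattern: RegExp): string | null {
  const m = text.match(pattern);
  return m === null ? null : m[0];
}

const oneLetter = first("123 Hello-World", /[a-zA-Z]/);  // "H": one letter
const letterRun = first("123 Hello-World", /[a-zA-Z]+/); // "Hello": stops at the "-"
const lowerOnly = first("123 Hello-World", /[a-z]+/);    // "ello": lowercase-only range skips "H"
const noLetters = first("123 456", /[a-zA-Z]+/);         // null: no letters at all
```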
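There are more symbols we can add here, but I don't really understand them, and every time I add them into my regex it breaks everything. For shits and giggles, here's something you can do that I don't get.
Code: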
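/^[a-zA-Z]+$/
Supposedly that is to get a whole word. When I add those symbols I get null, and when I get rid of them I get a whole word... ooooooh wait, hmm, I think I just realized something, but the shit doesn't ever work so it doesn't really matter. (What's actually going on: ^ and $ don't mean the start and end of a word. They mean the start and end of the whole string, or of each line if you add the m flag. So this pattern only matches when the entire string is nothing but letters, which is why it comes back null on anything longer. To grab a whole word out of a longer string, use the word boundary \b instead.)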
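A TypeScript sketch of the anchors, assuming JavaScript's regex engine. The m-flag and \b lines are extra comparisons added here, not part of the original post.

```typescript
// ^ and $ anchor to the start and end of the whole string (not a word).
const onlyLetters = /^[a-zA-Z]+$/;

const wholeString = onlyLetters.test("Hello");     // true: the entire string is letters
const withSpace = onlyLetters.test("Hello world"); // false: the space breaks it

// With the m flag, ^ and $ also match at the start and end of each line.
const perLine = "first line\nsecond".match(/^[a-zA-Z]+$/m)?.[0]; // "second"

// To grab a whole word out of a longer string, use \b instead.
const wholeWord = "Hello world".match(/\b[a-zA-Z]+\b/)?.[0];     // "Hello"
```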
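lookahead not true - only get the word if it isn't followed by a colon. Careful with this one: [a-zA-Z]+ is allowed to give back letters until the lookahead passes, so on word: it actually matches wor (wor is followed by d, not a colon). Putting word boundaries around the letters, /\b[a-zA-Z]+\b(?!:)/, stops it from shortening the word.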
Code:
/[a-zA-Z]+(?!:)/
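A TypeScript sketch of that backtracking gotcha, assuming JavaScript's match(). The \b version is a suggested fix, not the post's own pattern.

```typescript
// Negative lookahead. The greedy [a-zA-Z]+ can give a letter back so the
// lookahead passes, which is how "name:" turns into a match on "nam".
const naive = /[a-zA-Z]+(?!:)/;
// Word boundaries on both sides stop the engine from shortening the word,
// so the first hit is the next whole word that isn't followed by ":".
const bounded = /\b[a-zA-Z]+\b(?!:)/;

const input = "name: Bob";
const naiveHit = input.match(naive)?.[0];     // "nam"
const boundedHit = input.match(bounded)?.[0]; // "Bob"
```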
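lookbehind - can't get this to work for me. This was my AHA! earlier in this post. Maybe if I use the whole-word symbols this will work, because it will look behind the whole word, as opposed to the last letter, which is what I fear is really happening. (That fear is right. A lookbehind checks what comes right before the spot where it sits in the pattern, so tacked onto the end of [a-zA-Z]+ it only ever sees the last letter of the match, never a comma. In the two patterns below, (?<!,) therefore always passes and (?<=,) never does. The lookbehind has to go in front of the letters instead, e.g. /(?<=,)[a-zA-Z]+/. Also, some engines don't support lookbehind at all; JavaScript only got it in ES2018.)
Only get words that do not have a comma before them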
Code:
/[a-zA-Z]+(?<!,)/
Only get words that do have a comma before them
Code:
/[a-zA-Z]+(?<=,)/
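Here's a TypeScript sketch comparing the lookbehind after the letters (as in the two patterns above) with the lookbehind moved in front. It assumes a JavaScript engine with ES2018 lookbehind support (any current Node.js has it). The regexes are built with new RegExp from strings, each equivalent to the literal in its comment, and the [,a-zA-Z] trick in the last pattern is one way to do it, not the only one.

```typescript
// Lookbehind AFTER the letters: it only ever looks at the last letter
// matched, never at a comma.
const afterNot = new RegExp("[a-zA-Z]+(?<!,)"); // /[a-zA-Z]+(?<!,)/
const afterYes = new RegExp("[a-zA-Z]+(?<=,)"); // /[a-zA-Z]+(?<=,)/

// Lookbehind moved in FRONT of the letters.
// Words with a comma directly before them:
const commaBefore = new RegExp("(?<=,)[a-zA-Z]+", "g"); // /(?<=,)[a-zA-Z]+/g
// Words without a comma directly before them. Also ruling out a letter
// before the match keeps the engine from starting mid-word
// (otherwise ",word" would still give you "ord").
const noCommaBefore = new RegExp("(?<![,a-zA-Z])[a-zA-Z]+", "g"); // /(?<![,a-zA-Z])[a-zA-Z]+/g

const text = "red,green, blue";
const stillMatches = text.match(afterNot)?.[0]; // "red": the lookbehind never blocks anything
const neverMatches = text.match(afterYes);      // null: the lookbehind never passes
const withComma = text.match(commaBefore);      // ["green"]
const withoutComma = text.match(noCommaBefore); // ["red", "blue"]
```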
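There are shorthand expressions as well:
\d = 1 digit (0-9)
\d+ = as many digits as there are in a row
\w = one word character: a letter, a digit, or an underscore. It's not a whole word, which is probably why it never worked for me.
\w+ = as many word characters as there are in a row, so roughly one whole word (digits and underscores count too)
\s = one whitespace character (space, tab, newline, and so on)
\r = carriage return
\t = tab
\n = new line
\b = word boundary. Used like this: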
Code:
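/\bis\b/
A word boundary doesn't match a character at all. It matches the position between a word character and a non-word character (or the very start or end of the string). Let's take the next sentence and apply it.
"This is an island" - the word boundary version will only match the standalone word "is", because the "is" in "This" has a word character (h) right before it, and the "is" in "island" has a word character (l) right after it.
[space]\bis\b[space] = yes, because a space is not a word character
Th\bis\b[space] = the last part is good but the first part is bad, so this will not match
[space]\bis\bland = the opposite of the above
Note: the above 3 lines are not a regex. They show where the regex falls in the string and why it will or will not match.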
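A TypeScript sketch of the shorthand classes and the word boundary, assuming JavaScript's match() and search(). The sample strings other than "This is an island" are made up.

```typescript
// Shorthand classes, with g so match() returns every hit.
const line = "order_66 shipped\tat 9:05";

const digitRuns = line.match(/\d+/g); // ["66", "9", "05"]
const wordRuns = line.match(/\w+/g);  // ["order_66", "shipped", "at", "9", "05"]
const whitespace = line.match(/\s/g); // [" ", "\t", " "]

// Word boundary on the sentence from the post.
const sentence = "This is an island";
const plainIs = sentence.match(/is/g);       // ["is", "is", "is"]: This, is, island
const boundedIs = sentence.match(/\bis\b/g); // ["is"]: only the standalone word
const where = sentence.search(/\bis\b/);     // 5
```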
Anyway, I told you I don't really know regEx and now you know as much as I do. Oh wait, no you don't.
We can get every instance of a pattern with the global flag
Code:
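/(:\{)/g
That pattern will return an array of every :{ in the string. An array like array(":{",":{",":{") may seem completely useless, but its length (the count) is actually the backbone of my system for finding the proper close delimiter. Also, in my experience, having the regex spit the count back to you (via the array length) is way faster than counting them yourself in a loop. One gotcha if you're using JavaScript's match(): with no matches at all you get null, not an empty array, so check for that before reading .length.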
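A minimal TypeScript sketch of counting with the global flag, assuming JavaScript's match(). countOpenDelimiters is a hypothetical helper; it only does the counting and doesn't reproduce the close-delimiter system described above. The parentheses are dropped because, with g, match() returns the full matches either way.

```typescript
// Hypothetical helper: how many times ":{" appears in the text.
// match() with g returns every match, or null when there are none,
// so fall back to an empty array before reading the length.
function countOpenDelimiters(text: string): number {
  const matches = text.match(/:\{/g);
  return (matches ?? []).length;
}

const three = countOpenDelimiters("a:{ b:{ c } } d:{ }"); // 3
const zero = countOpenDelimiters("no delimiters");        // 0
```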
I hope you enjoyed my lousy little tutorial. Here is the site that I use to learn regex. It is very thorough. [edit: I just bought his book: a $4.99 donation for what is "arguably the most comprehensive regEx information on earth". I've been using his site for little regexes for years. He deserves $5 even if I never download his book. I also get to use his site without ads. I'll be honest, I never noticed his ads until he told me I could read the site without them. I think my brain works better than an adblocker.]