Java PatternSyntaxException: Unmatched closing ')'
I need to remove all URLs found in Twitter messages. I have around 200,000 messages, so speed is important! I am using Java for this; here is an example of my code:
    public String performStrip() {
        String tweet = this.getRawTweet();
        String urlPattern = "((| | | | http |): // (bit \\. Ly | t \\. Co | lnkd \\. | | | |;;; T] \\ S *) \\ b";
        Pattern p = Pattern.compile(urlPattern, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tweet);
        int i = 0;
        while (m.find()) {
            tweet = tweet.replaceAll(m.group(i), "").trim();
            i++;
        }
        return tweet;
    }

It works fine in the following cases:
    "http://t.co/nhWp9hldEH" -> "" (empty string)
    "http://t.co/nhWp9hldEH" -> ""
    "http://t.co/nhWp9hldEH)aaa" -> "aaa"
    "aaa(http://t.co/nhWp9hldEH" -> "aaa("
    "aaa(http://t.co/nhWp9hldEH)" -> "aaa()"

However, when I get a case like this:

    "http://t.co/nhWp9hldEH) aaa"

I get the following error:
    java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 21
        at java.util.regex.Pattern.error(Pattern.java:1924)
        at java.util.regex.Pattern.compile(Pattern.java:1669)
        at java.util.regex.Pattern.<init>(Pattern.java:1337)
        at java.util.regex.Pattern.compile(Pattern.java:1022)
        at java.lang.String.replaceAll(String.java:210)
        at com.anturo.preprocess.url.UrlStripper.performStrip(UrlStripper.java:47)
        at com.anturo.preprocess.testing.ReadIn.<init>(ReadIn.java:35)
        at com.anturo.preprocess.testing.Main.main(Main.java:6)

I have already seen this error in many similar questions, but none of the answers has worked for me so far. I hope someone here can help.
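The problem is that the matched URL may contain regex special characters, as you have seen here. String.replaceAll() compiles its first argument as a regex, so a match that contains a stray ')' is not a valid pattern.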
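To see the failure in isolation, here is a minimal sketch. The class and method names are made up for illustration; only `String.replaceAll()` and `PatternSyntaxException` come from the question. It checks whether using a piece of matched text as the first argument of `replaceAll()` blows up.

```java
import java.util.regex.PatternSyntaxException;

class UnmatchedParenDemo {
    // Hypothetical helper: returns true if replaceAll() rejects `match`
    // because it is not a valid regular expression.
    static boolean replaceAllFails(String tweet, String match) {
        try {
            tweet.replaceAll(match, "");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String tweet = "http://t.co/nhWp9hldEH) aaa";
        // A plain URL happens to be a valid regex ('.' just matches any character).
        System.out.println(replaceAllFails(tweet, "http://t.co/nhWp9hldEH"));  // false
        // With the trailing ')' the text is no longer a valid regex.
        System.out.println(replaceAllFails(tweet, "http://t.co/nhWp9hldEH)")); // true
    }
}
```

So the code only appears to work as long as the matched text happens to be a valid regex.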
Quick solution: use Pattern.quote(). Your code would then read:

    tweet = tweet.replaceAll(Pattern.quote(m.group(i)), "").trim();

Note: Pattern.quote() is only available since JDK 1.5, but you are using that or better, right?
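As a small self-contained sketch of the quoting approach (the class and method names are hypothetical, and the literal is passed in directly rather than taken from a matcher):

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Removes every literal occurrence of `literal` from `tweet`.
    // Pattern.quote() wraps the text so that regex metacharacters
    // such as ')' are matched as plain characters.
    static String removeLiteral(String tweet, String literal) {
        return tweet.replaceAll(Pattern.quote(literal), "").trim();
    }

    public static void main(String[] args) {
        String cleaned = removeLiteral("http://t.co/nhWp9hldEH) aaa",
                                       "http://t.co/nhWp9hldEH)");
        System.out.println(cleaned); // aaa
    }
}
```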
Another solution is to just use .replace():

    tweet = tweet.replace(m.group(i), "").trim();

Despite what its name suggests next to .replaceAll(), .replace() DOES replace all occurrences; the only difference is that it does not treat the string to be replaced as a regex. See also .replaceFirst().

Last but not least, you seem to be misusing .group()! Your loop should look like this:

    while (m.find())
        tweet = tweet.replace(m.group(), "").trim();

There is no need for the i variable; m.group(i) returns, for one match, what was captured by capturing group i in your regex.
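Putting the pieces together, a sketch of the corrected loop might look like the following. The regex here is a simplified stand-in, not the pattern from the question, and the class and method names are invented for the example. It also shows what `m.group(1)` means: the text captured by the first parenthesized group of the current match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlStripSketch {
    // Hypothetical stand-in pattern (NOT the one from the question):
    // group 1 captures the scheme, group 2 the rest of the URL.
    private static final Pattern URL =
            Pattern.compile("(https?)://(\\S+)", Pattern.CASE_INSENSITIVE);

    static String strip(String tweet) {
        Matcher m = URL.matcher(tweet); // the matcher keeps scanning the original text
        String result = tweet;
        while (m.find()) {
            // m.group() is the whole match; replace() treats it as literal text.
            result = result.replace(m.group(), "");
        }
        return result.trim();
    }

    // Returns what capturing group 1 matched in the first URL, or null if none.
    static String firstScheme(String tweet) {
        Matcher m = URL.matcher(tweet);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(strip("http://t.co/nhWp9hldEH) aaa"));      // aaa
        System.out.println(firstScheme("see http://t.co/nhWp9hldEH")); // http
    }
}
```

Compiling the pattern once in a static field, rather than on every call, should also help a little with 200,000 messages.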