Java PatternSyntaxException: Unmatched closing ')'


I need to remove all URLs found in Twitter messages. I have around 200,000 messages, so speed is important! I am using Java; here is my code:

  public String performStrip() {
      String tweet = this.getRawTweet();
      String urlPattern = "(https?://(bit\\.ly|t\\.co|lnkd\\.in)\\S*)\\b";
      Pattern p = Pattern.compile(urlPattern, Pattern.CASE_INSENSITIVE);
      Matcher m = p.matcher(tweet);
      int i = 0;
      while (m.find()) {
          tweet = tweet.replaceAll(m.group(i), "").trim();
          i++;
      }
      return tweet;
  }

It works fine in the following cases:

  http://t.co/nhWp9hldEH        -> (empty string)
  "http://t.co/nhWp9hldEH"      -> ""
  http://t.co/nhWp9hldEH)aaa    -> aaa
  aaa(http://t.co/nhWp9hldEH    -> aaa(
  aaa(http://t.co/nhWp9hldEH)   -> aaa()

However, when I get a case like this:

  "http://t.co/nhWp9hldEH)aaa"

I get this error:

  java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 21
      at java.util.regex.Pattern.error(Pattern.java:1924)
      at java.util.regex.Pattern.compile(Pattern.java:1669)
      at java.util.regex.Pattern.<init>(Pattern.java:1337)
      at java.util.regex.Pattern.compile(Pattern.java:1022)
      at java.lang.String.replaceAll(String.java:2210)
      at com.anturo.preprocess.url.UrlStripper.performStrip(UrlStripper.java:47)
      at com.anturo.preprocess.testing.ReadIn.<init>(ReadIn.java:35)
      at com.anturo.preprocess.testing.Main.main(Main.java:6)

I have already seen this error in many similar questions, but none of the solutions has worked for me so far. I hope someone here can help.

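The problem is that the matched URL can contain regex special characters, as you can see here. .replaceAll() compiles its first argument as a regular expression, and the ) in http://t.co/nhWp9hldEH)aaa closes a group that was never opened.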
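A minimal sketch that reproduces this in isolation. The class ReplaceAllDemo and its method removeAsRegex() are hypothetical names; the method has the same shape as the question's loop body, passing the matched text straight to .replaceAll().

```java
import java.util.regex.PatternSyntaxException;

class ReplaceAllDemo {
    // Same shape as the question's loop body: the matched text is passed to
    // replaceAll(), which compiles it as a regular expression.
    static String removeAsRegex(String text, String matched) {
        return text.replaceAll(matched, "").trim();
    }

    public static void main(String[] args) {
        // Works: this URL has no unbalanced metacharacters
        // (each '.' means "any character", which happens to match '.').
        System.out.println(removeAsRegex("aaa http://t.co/nhWp9hldEH", "http://t.co/nhWp9hldEH")); // aaa

        // Fails: the ')' closes a group that was never opened.
        String matched = "http://t.co/nhWp9hldEH)aaa";
        try {
            removeAsRegex(matched, matched);
        } catch (PatternSyntaxException e) {
            System.out.println(e.getDescription() + " near index " + e.getIndex());
        }
    }
}
```

The first call only works by luck: the unescaped dots would also match any other character in that position.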

Quick fix: keep your code, but quote the match:

  tweet = tweet.replaceAll(Pattern.quote(m.group(i)), "").trim();

Note: Pattern.quote() is only available since JDK 1.5, but you are using that or later, right?
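A short sketch of what Pattern.quote() does, assuming a hypothetical helper removeQuoted() in a class QuoteDemo. Pattern.quote() wraps the string in \Q...\E, so every character inside is matched literally and ) or . lose their regex meaning.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Removes every literal occurrence of 'matched' by quoting it first.
    static String removeQuoted(String text, String matched) {
        return text.replaceAll(Pattern.quote(matched), "").trim();
    }

    public static void main(String[] args) {
        String matched = "http://t.co/nhWp9hldEH)aaa";
        System.out.println(Pattern.quote(matched));                 // \Qhttp://t.co/nhWp9hldEH)aaa\E
        System.out.println(removeQuoted("bbb " + matched, matched)); // bbb
    }
}
```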

Another solution is to simply use .replace():

  tweet = tweet.replace(m.group(i), "").trim();

Note that despite what its name suggests next to .replaceAll(), .replace() DOES replace all occurrences; it just does not take a regex as an argument. See also .replaceFirst().
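The difference between the three String methods can be seen on a small sample string. The class ReplaceDemo and its method names are hypothetical; the sample text contains two literal "a.b" plus "axb", which only a regex dot would match.

```java
class ReplaceDemo {
    // Two literal "a.b" plus "axb", which only a regex '.' matches.
    static final String TEXT = "a.b a.b axb";

    // replace(): literal target, replaces ALL occurrences.
    static String literalAll() {
        return TEXT.replace("a.b", "-");
    }

    // replaceAll(): regex target, '.' matches any character, replaces all matches.
    static String regexAll() {
        return TEXT.replaceAll("a.b", "-");
    }

    // replaceFirst(): regex target, replaces only the first match.
    static String regexFirst() {
        return TEXT.replaceFirst("a.b", "-");
    }

    public static void main(String[] args) {
        System.out.println(literalAll()); // - - axb
        System.out.println(regexAll());   // - - -
        System.out.println(regexFirst()); // - a.b axb
        // A stray ')' is harmless for replace(), which never compiles a regex.
        System.out.println("x)y".replace(")", "")); // xy
    }
}
```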

Last but not least, you are misusing .group()! Your loop should be:

  while (m.find())
      tweet = tweet.replace(m.group(), "").trim();

There is no need for the i variable: m.group(i) returns what capturing group i of the regex matched in the current match, not the i-th match.
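A sketch of the corrected loop together with what group(i) actually returns. It assumes a simplified stand-in URL pattern (not necessarily the question's exact one) and a hypothetical static strip() method in a class UrlStripSketch in place of performStrip().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlStripSketch {
    // Assumed, simplified URL pattern; group 1 = whole URL, group 2 = shortener host.
    private static final Pattern URL =
            Pattern.compile("(https?://(bit\\.ly|t\\.co|lnkd\\.in)\\S*)\\b", Pattern.CASE_INSENSITIVE);

    // Corrected loop: m.group() is the text of the CURRENT match,
    // and replace() treats it literally, so ')' cannot break anything.
    static String strip(String tweet) {
        Matcher m = URL.matcher(tweet);
        while (m.find()) {
            tweet = tweet.replace(m.group(), "").trim();
        }
        return tweet;
    }

    public static void main(String[] args) {
        Matcher m = URL.matcher("see http://t.co/nhWp9hldEH now");
        if (m.find()) {
            System.out.println(m.group(0)); // whole match: http://t.co/nhWp9hldEH
            System.out.println(m.group(1)); // capturing group 1: same text here
            System.out.println(m.group(2)); // capturing group 2: t.co
        }
        System.out.println(strip("aaa(http://t.co/nhWp9hldEH)"));   // aaa()
        System.out.println(strip("\"http://t.co/nhWp9hldEH)aaa\"")); // ""
    }
}
```

Compiling the Pattern once as a static final field, as in this sketch, also avoids recompiling it for every one of the 200,000 tweets.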
