Regex in Java: output results not matching

I have this code:

String pattern = "(\\.([a-z]*[A-Z]*|\\.)+)";
String input = "http://localhost-tes-folder.mySite.co.us:8080/";
Pattern p = Pattern.compile(pattern);
java.util.regex.Matcher m = p.matcher(input);
if (m.find()) {
    System.out.println(m.group(1));
}

If I use http://www.regexr.com/ with this pattern, (\.([a-z]*[A-Z]*|\.)+), and this input, I get this result:

.mySite.co.us

But if I run this code, I get this result:

.mySite

Can anyone help me get the same result as the output from regexr.com? Thanks.

-------------Problems Reply------------

You can do this:

String pattern = "(\\.[a-zA-Z]*|\\.)+";
String input = "http://localhost-tes-folder.mySite.co.us:8080/";
Pattern p = Pattern.compile(pattern);
java.util.regex.Matcher m = p.matcher(input);
if (m.find()) {
    System.out.println(m.group(0));
}

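Your original pattern, (\.([a-z]*[A-Z]*|\.)+), does not match the whole host in one go in Java. Because [a-z]*[A-Z]* can match the empty string, the repetition stops at the next dot instead of trying the \. alternative, so the input yields three separate matches:

  • .mySite
  • .co
  • .us

An if block calls find() only once, so it prints only the first match. If you use a while loop, the output is the following: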

.mySite
.co
.us
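
To make the difference between if and while concrete, here is a minimal sketch that lists every match the original pattern produces. The class name AllMatches and the helper findAll are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllMatches {
    // Hypothetical helper: collects every successive match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {          // each call resumes after the previous match
            matches.add(m.group()); // group() is the same as group(0)
        }
        return matches;
    }

    public static void main(String[] args) {
        String pattern = "(\\.([a-z]*[A-Z]*|\\.)+)";
        String input = "http://localhost-tes-folder.mySite.co.us:8080/";
        for (String match : findAll(pattern, input)) {
            System.out.println(match);
        }
    }
}
```

A single if (m.find()) corresponds to taking only the first element of that list.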

To get the output you're looking for with the original pattern, use a while loop and concatenate the matches, like this:

String pattern = "(\\.([a-z]*[A-Z]*|\\.)+)";
String input = "http://localhost-tes-folder.mySite.co.us:8080/";
Pattern p = Pattern.compile(pattern);
java.util.regex.Matcher m = p.matcher(input);
String output = "";
while (m.find()) {
    output += m.group(1);
}

System.out.println(output);

This will print:

.mySite.co.us

Alternatively, you can change the pattern so that it captures all the characters from the first dot up to the colon.
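
One possible reading of that suggestion, as a sketch only: match a literal dot followed by everything that is not a colon. The pattern \.[^:]+ and the names DotToColon and extract are assumptions made for illustration, not something from the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotToColon {
    // Assumed pattern: a literal dot, then one or more characters that are not a colon.
    static final Pattern DOT_TO_COLON = Pattern.compile("\\.[^:]+");

    // Returns the first match, or null when the input contains no such dot.
    static String extract(String input) {
        Matcher m = DOT_TO_COLON.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("http://localhost-tes-folder.mySite.co.us:8080/"));
    }
}
```

Note that this version relies on the port being present: for a URL without a colon after the host, [^:]+ would run on into the path.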

You can tweak your regex a bit to simplify it and make it faster.

Without the Java string escaping, it is (\.([a-z]*[A-Z]*|\.)+).

  • The outer group captures the whole match, which group(0) already returns, so you don't need it → \.([a-z]*[A-Z]*|\.)+ (Note that the remaining group's capture is not used in the Java code either, so you could make it non-capturing to make processing faster)
  • [a-z]*[A-Z]* sits inside a repetition, so the order of lowercase and uppercase letters does not matter and it can be reduced to [a-zA-Z]* → \.([a-zA-Z]*|\.)+
  • The group is meant to accept one or more characters, each of which is a dot or a letter → \.([a-zA-Z.])+ This change affects behavior: the | let the match break at each dot. This version does not, and that matches your actual need. It also removes the need for the while loop and the String concatenation.
  • Removing the now-useless group again → \.[a-zA-Z.]+
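
The steps above can be tried out one by one with a small sketch like the following. The class name SimplifySteps and the helper firstMatch are hypothetical; the patterns are the ones from the list, written with Java string escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimplifySteps {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "http://localhost-tes-folder.mySite.co.us:8080/";
        String[] steps = {
            "(\\.([a-z]*[A-Z]*|\\.)+)", // original pattern
            "\\.([a-z]*[A-Z]*|\\.)+",   // outer group dropped
            "\\.([a-zA-Z]*|\\.)+",      // character classes merged
            "\\.([a-zA-Z.])+",          // alternation folded into the class
            "\\.[a-zA-Z.]+"             // remaining group dropped
        };
        // Print what a single find() returns for each step.
        for (String step : steps) {
            System.out.println(step + "  ->  " + firstMatch(step, input));
        }
    }
}
```

The first pattern should give .mySite, as in the question, and the last one .mySite.co.us; printing the intermediate steps shows where the behavior changes.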

So, adapting your code to use the new optimized version of the regex:

String pattern = "\\.[a-zA-Z.]+";
String input = "http://localhost-tes-folder.mySite.co.us:8080/";
Pattern p = Pattern.compile(pattern);
java.util.regex.Matcher m = p.matcher(input);
if (m.find()) {
    System.out.println(m.group(0));
}

This prints:

.mySite.co.us

Be aware that regexes can be time-consuming and should be optimized where possible (e.g. no useless capturing groups).
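
As a rough illustration of the capturing-group point, the sketch below compares a capturing and a non-capturing version of the same pattern. The class and method names are invented for this example, and the actual time saved is not measured here; it is likely small for a pattern this simple.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Number of capturing groups the compiled pattern has to track.
    static int captureGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // First match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "http://localhost-tes-folder.mySite.co.us:8080/";
        String capturing = "\\.([a-zA-Z.])+";      // group is captured but never read
        String nonCapturing = "\\.(?:[a-zA-Z.])+"; // (?:...) groups without capturing

        System.out.println(captureGroups(capturing) + " group(s): " + firstMatch(capturing, input));
        System.out.println(captureGroups(nonCapturing) + " group(s): " + firstMatch(nonCapturing, input));
    }
}
```

Both patterns return the same overall match; only the bookkeeping for the unused group differs.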

Category:java Views:5 Time:2018-12-31
Tags: regex java
