Class DirectoryScanner


  • public class DirectoryScanner
    extends java.lang.Object
    Class for scanning a directory for files/directories which match certain criteria.

    These criteria consist of selectors and patterns which have been specified. With the selectors you can select which files you want to have included. Files which are not selected are excluded. With patterns you can include or exclude files based on their filename.

    The idea is simple. A given directory is recursively scanned for all files and directories. Each file/directory is matched against a set of selectors, including special support for matching against filenames with include and exclude patterns. Only files/directories which match at least one pattern of the include pattern list or other file selector, and which neither match any pattern of the exclude pattern list nor fail to match a required selector, will be placed in the list of files/directories found.

    When no list of include patterns is supplied, "**" will be used, which means that everything will be matched. When no list of exclude patterns is supplied, an empty list is used, such that nothing will be excluded. When no selectors are supplied, none are applied.
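    The selection rule and its defaults can be sketched as follows. This is a minimal illustration, not the class's implementation: Classifier and classify are hypothetical names, patterns are modeled as plain Predicate<String> objects, and selectors are left out.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of DirectoryScanner.
final class Classifier {
    enum Result { INCLUDED, EXCLUDED, NOT_INCLUDED }

    // An empty include list behaves like "**" (everything matches);
    // an empty exclude list excludes nothing.
    static Result classify(String name,
                           List<Predicate<String>> includes,
                           List<Predicate<String>> excludes) {
        boolean included = includes.isEmpty()
                || includes.stream().anyMatch(p -> p.test(name));
        if (!included) {
            return Result.NOT_INCLUDED;
        }
        boolean excluded = excludes.stream().anyMatch(p -> p.test(name));
        return excluded ? Result.EXCLUDED : Result.INCLUDED;
    }
}
```

    The three results mirror the three result lists the class keeps for files and for directories (included, excluded, not included).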

    The filename pattern matching is done as follows: The name to be matched is split up into path segments. A path segment is the name of a directory or file, which is bounded by File.separator ('/' under UNIX, '\' under Windows). For example, "abc/def/ghi/xyz.java" is split up into the segments "abc", "def", "ghi" and "xyz.java". The same is done for the pattern against which the name is matched.

    The segments of the name and the pattern are then matched against each other. When '**' is used for a path segment in the pattern, it matches zero or more path segments of the name.

    There is a special case regarding the use of File.separators at the beginning of the pattern and the string to match:
    When a pattern starts with a File.separator, the string to match must also start with a File.separator. When a pattern does not start with a File.separator, the string to match must not start with a File.separator either. When one of these rules is not obeyed, the string will not match.

    When a name path segment is matched against a pattern path segment, the following special characters can be used:
    '*' matches zero or more characters
    '?' matches one character.
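    The two special characters can be illustrated with a small matcher for a single path segment. This is a sketch under stated assumptions, not the class's own matching code: SegmentMatcher and its matches method are hypothetical names, and the caseSensitive flag only mirrors the case-sensitivity setting of the scanner.

```java
// Hypothetical helper illustrating '*' and '?' within one path segment.
final class SegmentMatcher {
    static boolean matches(String pattern, String text, boolean caseSensitive) {
        int p = 0;
        int t = 0;
        int starP = -1; // index of the last '*' seen in the pattern
        int starT = -1; // text index that '*' is currently matched up to
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the '*' and first try to let it match nothing.
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?'
                        || same(pattern.charAt(p), text.charAt(t), caseSensitive))) {
                p++;
                t++;
            } else if (starP >= 0) {
                // Backtrack: let the last '*' absorb one more character.
                p = starP + 1;
                t = ++starT;
            } else {
                return false;
            }
        }
        // Only trailing '*' characters may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    private static boolean same(char a, char b, boolean caseSensitive) {
        return caseSensitive
                ? a == b
                : Character.toLowerCase(a) == Character.toLowerCase(b);
    }
}
```

    For instance, SegmentMatcher.matches("a??.java", "abc.java", true) is true, while the same pattern does not match "ab.java" because '?' needs exactly one character.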

    Examples:

    "**\*.class" matches all .class files/dirs in a directory tree.

    "test\a??.java" matches all files/dirs which start with an 'a', then two more characters and then ".java", in a directory called test.

    "**" matches everything in a directory tree.

    "**\test\**\XYZ*" matches all files/dirs which start with "XYZ" and where there is a parent directory called test (e.g. "abc\test\def\ghi\XYZ123").

    Case sensitivity may be turned off if necessary. By default, it is turned on.
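    The segment-wise matching with '**' and the leading-separator rule can be sketched as below. This is an illustration only, not the class's algorithm: PathPatternSketch and matchPath are hypothetical names, both '/' and '\' are treated as separators for simplicity, matching is always case sensitive, and single segments are matched by translating '*' and '?' into a regular expression.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of segment-wise path matching with "**".
final class PathPatternSketch {
    static boolean matchPath(String pattern, String name) {
        String p = pattern.replace('\\', '/');
        String n = name.replace('\\', '/');
        // A leading separator must be present in both or in neither.
        if (p.startsWith("/") != n.startsWith("/")) {
            return false;
        }
        return match(split(p), 0, split(n), 0);
    }

    private static String[] split(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/");
    }

    private static boolean match(String[] pat, int pi, String[] seg, int si) {
        if (pi == pat.length) {
            return si == seg.length;
        }
        if (pat[pi].equals("**")) {
            // "**" may absorb zero or more name segments.
            for (int k = si; k <= seg.length; k++) {
                if (match(pat, pi + 1, seg, k)) {
                    return true;
                }
            }
            return false;
        }
        return si < seg.length
                && segmentMatches(pat[pi], seg[si])
                && match(pat, pi + 1, seg, si + 1);
    }

    // Translates '*' and '?' into a regular expression; all other characters are literal.
    private static boolean segmentMatches(String pattern, String segment) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), segment);
    }
}
```

    With this sketch, matchPath("**\\test\\**\\XYZ*", "abc\\test\\def\\ghi\\XYZ123") is true, matching the last example pattern listed above.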

    Example of usage:

     // Create the scanner before configuring it.
     DirectoryScanner ds = new DirectoryScanner();
     String[] includes = { "**\\*.class" };
     String[] excludes = { "modules\\*\\**" };
     ds.setIncludes( includes );
     ds.setExcludes( excludes );
     ds.setBasedir( new File( "test" ) );
     ds.setCaseSensitive( true );
     ds.scan();
    
     // Print the included files; names are relative to the base directory.
     System.out.println( "FILES:" );
     String[] files = ds.getIncludedFiles();
     for ( int i = 0; i < files.length; i++ )
     {
         System.out.println( files[i] );
     }
     

    This scans a directory called test for .class files, but excludes all files in all proper subdirectories of a directory called "modules".

    This class must not be used from multiple Threads concurrently!

    • Field Summary

      Fields 
      Modifier and Type Field Description
      private java.io.File basedir
      The base directory to be scanned.
      static java.lang.String[] DEFAULTEXCLUDES
      Patterns which should be excluded by default.
      private java.util.List<java.lang.String> dirsExcluded
      The directories which matched at least one include and at least one exclude.
      private java.util.List<java.lang.String> dirsIncluded
      The directories which matched at least one include and no excludes and were selected.
      private java.util.List<java.lang.String> dirsNotIncluded
      The directories which were found and did not match any includes.
      private java.lang.String[] excludes
      The patterns for the files to be excluded.
      private MatchPatterns excludesPatterns  
      private java.util.List<java.lang.String> filesExcluded
      The files which matched at least one include and at least one exclude.
      private java.util.List<java.lang.String> filesIncluded
      The files which matched at least one include and no excludes and were selected.
      private java.util.List<java.lang.String> filesNotIncluded
      The files which did not match any includes or selectors.
      private boolean followSymlinks
      Whether or not symbolic links should be followed.
      private boolean haveSlowResults
      Whether or not our results were built by a slow scan.
      private java.lang.String[] includes
      The patterns for the files to be included.
      private MatchPatterns includesPatterns  
      private boolean isCaseSensitive
      Whether or not the file system should be treated as a case sensitive one.
      private ScanConductor.ScanAction scanAction
      The last ScanAction.
      private ScanConductor scanConductor
      A ScanConductor can control the scanning process.
    • Constructor Summary

      Constructors 
      Constructor Description
      DirectoryScanner()
      Sole constructor.
    • Method Summary

      Modifier and Type Method Description
      void addDefaultExcludes()
      Adds default exclusions to the current exclusions set.
      private static <T> java.util.Set<T> arrayAsHashSet​(T[] array)
      Take an array of type T and convert it into a HashSet of type T.
      (package private) boolean couldHoldIncluded​(java.lang.String name)
      Tests whether or not a name matches the start of at least one include pattern.
      static DirectoryScanResult diffFiles​(java.lang.String[] oldFiles, java.lang.String[] newFiles)  
      DirectoryScanResult diffIncludedFiles​(java.lang.String... oldFiles)
      Determine the file differences between the currently included files and a previously captured list of files.
      private java.lang.String[] doNotFollowSymbolicLinks​(java.io.File dir, java.lang.String vpath, java.lang.String[] newfiles)  
      java.io.File getBasedir()
      Returns the base directory to be scanned.
      java.lang.String[] getExcludedDirectories()
      Returns the names of the directories which matched at least one of the include patterns and at least one of the exclude patterns.
      java.lang.String[] getExcludedFiles()
      Returns the names of the files which matched at least one of the include patterns and at least one of the exclude patterns.
      java.lang.String[] getIncludedDirectories()
      Returns the names of the directories which matched at least one of the include patterns and none of the exclude patterns.
      java.lang.String[] getIncludedFiles()
      Returns the names of the files which matched at least one of the include patterns and none of the exclude patterns.
      java.lang.String[] getNotIncludedDirectories()
      Returns the names of the directories which matched none of the include patterns.
      java.lang.String[] getNotIncludedFiles()
      Returns the names of the files which matched none of the include patterns.
      (package private) boolean isExcluded​(java.lang.String name)
      Tests whether or not a name matches against at least one exclude pattern.
      (package private) boolean isIncluded​(java.lang.String name)
      Tests whether or not a name matches against at least one include pattern.
      (package private) boolean isSymbolicLink​(java.io.File parent, java.lang.String name)
      Checks whether a given file is a symbolic link.
      void scan()
      Scans the base directory for files which match at least one include pattern and don't match any exclude patterns.
      (package private) void scandir​(java.io.File dir, java.lang.String vpath, boolean fast)
      Scans the given directory for files and directories.
      void setBasedir​(java.io.File basedir)
      Sets the base directory to be scanned.
      void setBasedir​(java.lang.String basedir)
      Sets the base directory to be scanned.
      void setCaseSensitive​(boolean isCaseSensitiveParameter)
      Sets whether or not the file system should be regarded as case sensitive.
      void setExcludes​(java.lang.String... excludes)
      Sets the list of exclude patterns to use.
      void setFollowSymlinks​(boolean followSymlinks)
      Sets whether or not symbolic links should be followed.
      void setIncludes​(java.lang.String... includes)
      Sets the list of include patterns to use.
      void setScanConductor​(ScanConductor scanConductor)  
      private void setupDefaultFilters()  
      private void setupMatchPatterns()  
      (package private) void slowScan()
      Top level invocation for a slow scan.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DEFAULTEXCLUDES

        public static final java.lang.String[] DEFAULTEXCLUDES
        Patterns which should be excluded by default.
        See Also:
        addDefaultExcludes()
      • basedir

        private java.io.File basedir
        The base directory to be scanned.
      • includes

        private java.lang.String[] includes
        The patterns for the files to be included.
      • excludes

        private java.lang.String[] excludes
        The patterns for the files to be excluded.
      • filesIncluded

        private java.util.List<java.lang.String> filesIncluded
        The files which matched at least one include and no excludes and were selected.
      • filesNotIncluded

        private java.util.List<java.lang.String> filesNotIncluded
        The files which did not match any includes or selectors.
      • filesExcluded

        private java.util.List<java.lang.String> filesExcluded
        The files which matched at least one include and at least one exclude.
      • dirsIncluded

        private java.util.List<java.lang.String> dirsIncluded
        The directories which matched at least one include and no excludes and were selected.
      • dirsNotIncluded

        private java.util.List<java.lang.String> dirsNotIncluded
        The directories which were found and did not match any includes.
      • dirsExcluded

        private java.util.List<java.lang.String> dirsExcluded
        The directories which matched at least one include and at least one exclude.
      • haveSlowResults

        private boolean haveSlowResults
        Whether or not our results were built by a slow scan.
      • isCaseSensitive

        private boolean isCaseSensitive
        Whether or not the file system should be treated as a case sensitive one.
      • followSymlinks

        private boolean followSymlinks
        Whether or not symbolic links should be followed.
      • scanAction

        private ScanConductor.ScanAction scanAction
        The last ScanAction. We need to store this in the instance as the scan() method doesn't return a value.
    • Constructor Detail

      • DirectoryScanner

        public DirectoryScanner()
        Sole constructor.
    • Method Detail

      • setBasedir

        public void setBasedir​(java.lang.String basedir)
        Sets the base directory to be scanned. This is the directory which is scanned recursively. All '/' and '\' characters are replaced by File.separatorChar, so the separator used need not match File.separatorChar.
        Parameters:
        basedir - The base directory to scan. Must not be null.
      • setBasedir

        public void setBasedir​(@Nonnull
                               java.io.File basedir)
        Sets the base directory to be scanned. This is the directory which is scanned recursively.
        Parameters:
        basedir - The base directory for scanning. Should not be null.
      • getBasedir

        public java.io.File getBasedir()
        Returns the base directory to be scanned. This is the directory which is scanned recursively.
        Returns:
        the base directory to be scanned
      • setCaseSensitive

        public void setCaseSensitive​(boolean isCaseSensitiveParameter)
        Sets whether or not the file system should be regarded as case sensitive.
        Parameters:
        isCaseSensitiveParameter - whether or not the file system should be regarded as a case sensitive one
      • setFollowSymlinks

        public void setFollowSymlinks​(boolean followSymlinks)
        Sets whether or not symbolic links should be followed.
        Parameters:
        followSymlinks - whether or not symbolic links should be followed
      • setIncludes

        public void setIncludes​(java.lang.String... includes)
        Sets the list of include patterns to use. All '/' and '\' characters are replaced by File.separatorChar, so the separator used need not match File.separatorChar.

        When a pattern ends with a '/' or '\', "**" is appended.

        Parameters:
        includes - A list of include patterns. May be null, indicating that all files should be included. If a non-null list is given, all elements must be non-null.
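        The two normalization steps described for include and exclude patterns can be sketched as follows. PatternNormalizer and normalize are hypothetical names used only for illustration; the sketch implements exactly the documented behaviour (separator replacement, then appending "**" to a trailing separator) and nothing else.

```java
import java.io.File;

// Hypothetical helper showing the documented pattern normalization.
final class PatternNormalizer {
    static String normalize(String pattern) {
        // Both '/' and '\' become the platform separator.
        String normalized = pattern
                .replace('/', File.separatorChar)
                .replace('\\', File.separatorChar);
        // A pattern that names a directory selects everything below it.
        if (normalized.endsWith(File.separator)) {
            normalized += "**";
        }
        return normalized;
    }
}
```

        On a UNIX system, normalize("src\\main/") yields "src/main/**"; on Windows the same call yields "src\main\**".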
      • setExcludes

        public void setExcludes​(java.lang.String... excludes)
        Sets the list of exclude patterns to use. All '/' and '\' characters are replaced by File.separatorChar, so the separator used need not match File.separatorChar.

        When a pattern ends with a '/' or '\', "**" is appended.

        Parameters:
        excludes - A list of exclude patterns. May be null, indicating that no files should be excluded. If a non-null list is given, all elements must be non-null.
      • scan

        public void scan()
                  throws java.lang.IllegalStateException
        Scans the base directory for files which match at least one include pattern and don't match any exclude patterns. If there are selectors then the files must pass muster there, as well.
        Throws:
        java.lang.IllegalStateException - if the base directory was set incorrectly (i.e. if it is null, doesn't exist, or isn't a directory).
      • diffIncludedFiles

        public DirectoryScanResult diffIncludedFiles​(java.lang.String... oldFiles)
        Determine the file differences between the currently included files and a previously captured list of files. This method does not look for changes in file content, but solely for changes in the list of files given.

        The method will compare the given array of file Strings with the result of the last directory scan. It will execute a scan() if no result of a previous scan could be found.

        The result of the diff can be queried by the methods DirectoryScanResult.getFilesAdded() and DirectoryScanResult.getFilesRemoved().

        Parameters:
        oldFiles - the list of previously captured file names.
        Returns:
        the result of the directory scan.
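        A diff of two file lists by name can be sketched with plain set operations. This is not the library's code: FileDiffSketch is a hypothetical stand-in for the result type, and its added and removed fields only play the role of getFilesAdded() and getFilesRemoved(). A null array is assumed to behave like an empty one.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the diff computation; not the library's code.
final class FileDiffSketch {
    final Set<String> added;
    final Set<String> removed;

    private FileDiffSketch(Set<String> added, Set<String> removed) {
        this.added = added;
        this.removed = removed;
    }

    static FileDiffSketch diff(String[] oldFiles, String[] newFiles) {
        Set<String> oldSet = toSet(oldFiles);
        Set<String> newSet = toSet(newFiles);

        Set<String> added = new TreeSet<>(newSet);
        added.removeAll(oldSet);   // present now, absent before

        Set<String> removed = new TreeSet<>(oldSet);
        removed.removeAll(newSet); // present before, absent now

        return new FileDiffSketch(added, removed);
    }

    // A null array is treated like an empty one.
    private static Set<String> toSet(String[] files) {
        return files == null ? new TreeSet<>() : new TreeSet<>(Arrays.asList(files));
    }
}
```

        Only names are compared, so a file whose content changed but whose name did not appears in neither set.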
      • diffFiles

        public static DirectoryScanResult diffFiles​(@Nullable
                                                    java.lang.String[] oldFiles,
                                                    @Nullable
                                                    java.lang.String[] newFiles)
        Parameters:
        oldFiles - array of old files.
        newFiles - array of new files.
        Returns:
        the calculated difference.
      • arrayAsHashSet

        private static <T> java.util.Set<T> arrayAsHashSet​(@Nullable
                                                           T[] array)
        Take an array of type T and convert it into a HashSet of type T. If null or an empty array gets passed, an empty Set will be returned.
        Parameters:
        array - The array
        Returns:
        the filled HashSet of type T
      • slowScan

        void slowScan()
        Top level invocation for a slow scan. A slow scan builds up a full list of excluded/included files/directories, whereas a fast scan will only have full results for included files, as it ignores directories which can't possibly hold any included files/directories.

        Returns immediately if a slow scan has already been completed.
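        The "run once, then return immediately" behaviour can be modeled with a simple flag. The sketch below is purely illustrative: LazySlowScanSketch is a hypothetical class, the traversal counter stands in for the expensive directory walk, and the placeholder file name is invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "scan once, reuse afterwards" behaviour.
final class LazySlowScanSketch {
    private final List<String> filesNotIncluded = new ArrayList<>();
    private boolean haveSlowResults = false;
    private int traversals = 0;

    void slowScan() {
        if (haveSlowResults) {
            return; // results are already complete
        }
        traversals++;                       // stands in for the expensive walk
        filesNotIncluded.add("README.txt"); // placeholder result
        haveSlowResults = true;
    }

    String[] getNotIncludedFiles() {
        slowScan(); // make sure the full lists exist before answering
        return filesNotIncluded.toArray(new String[0]);
    }

    int traversals() {
        return traversals;
    }
}
```

        Calling getNotIncludedFiles() twice triggers only one traversal in this model.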

      • scandir

        void scandir​(@Nonnull
                     java.io.File dir,
                     @Nonnull
                     java.lang.String vpath,
                     boolean fast)
        Scans the given directory for files and directories. Found files and directories are placed in their respective collections, based on the matching of includes, excludes, and the selectors. When a directory is found, it is scanned recursively.
        Parameters:
        dir - The directory to scan. Must not be null.
        vpath - The path relative to the base directory (needed to prevent problems with an absolute path when using dir). Must not be null.
        fast - Whether or not this call is part of a fast scan.
        See Also:
        filesIncluded, filesNotIncluded, filesExcluded, dirsIncluded, dirsNotIncluded, dirsExcluded, slowScan()
      • doNotFollowSymbolicLinks

        private java.lang.String[] doNotFollowSymbolicLinks​(java.io.File dir,
                                                            java.lang.String vpath,
                                                            java.lang.String[] newfiles)
      • isIncluded

        boolean isIncluded​(java.lang.String name)
        Tests whether or not a name matches against at least one include pattern.
        Parameters:
        name - The name to match. Must not be null.
        Returns:
        true when the name matches against at least one include pattern, or false otherwise.
      • couldHoldIncluded

        boolean couldHoldIncluded​(@Nonnull
                                  java.lang.String name)
        Tests whether or not a name matches the start of at least one include pattern.
        Parameters:
        name - The name to match. Must not be null.
        Returns:
        true when the name matches against the start of at least one include pattern, or false otherwise.
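        Matching the start of a pattern is what lets a fast scan skip directories that cannot contain included files. A minimal sketch of the idea follows, assuming '/' as the separator and case-sensitive matching; PatternStartSketch and matchesStart are hypothetical names, and the real method may differ in detail.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of "matches the start of a pattern"; '/' is the separator.
final class PatternStartSketch {
    static boolean matchesStart(String pattern, String dirName) {
        if (dirName.isEmpty()) {
            return true; // the base directory itself can always hold matches
        }
        String[] pat = pattern.split("/");
        String[] dir = dirName.split("/");
        for (int i = 0; i < dir.length; i++) {
            if (i == pat.length) {
                return false; // directory is deeper than the pattern reaches
            }
            if (pat[i].equals("**")) {
                return true; // "**" can absorb any remaining segments
            }
            if (!segmentMatches(pat[i], dir[i])) {
                return false;
            }
        }
        return true; // the directory name fits a prefix of the pattern
    }

    // Quotes the pattern, then re-opens the quote around each wildcard.
    private static boolean segmentMatches(String pattern, String segment) {
        String regex = Pattern.quote(pattern)
                .replace("*", "\\E.*\\Q")
                .replace("?", "\\E.\\Q");
        return segment.matches(regex);
    }
}
```

        For the include pattern "src/**/*.java", the directory "src/main/java" could hold included files, whereas "target" could not and need not be descended into.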
      • isExcluded

        boolean isExcluded​(@Nonnull
                           java.lang.String name)
        Tests whether or not a name matches against at least one exclude pattern.
        Parameters:
        name - The name to match. Must not be null.
        Returns:
        true when the name matches against at least one exclude pattern, or false otherwise.
      • getIncludedFiles

        public java.lang.String[] getIncludedFiles()
        Returns the names of the files which matched at least one of the include patterns and none of the exclude patterns. The names are relative to the base directory.
        Returns:
        the names of the files which matched at least one of the include patterns and none of the exclude patterns. May also contain symbolic links to files.
      • getNotIncludedFiles

        public java.lang.String[] getNotIncludedFiles()
        Returns the names of the files which matched none of the include patterns. The names are relative to the base directory. This involves performing a slow scan if one has not already been completed.
        Returns:
        the names of the files which matched none of the include patterns.
        See Also:
        slowScan()
      • getExcludedFiles

        public java.lang.String[] getExcludedFiles()
        Returns the names of the files which matched at least one of the include patterns and at least one of the exclude patterns. The names are relative to the base directory. This involves performing a slow scan if one has not already been completed.
        Returns:
        the names of the files which matched at least one of the include patterns and at least one of the exclude patterns.
        See Also:
        slowScan()
      • getIncludedDirectories

        public java.lang.String[] getIncludedDirectories()
        Returns the names of the directories which matched at least one of the include patterns and none of the exclude patterns. The names are relative to the base directory.
        Returns:
        the names of the directories which matched at least one of the include patterns and none of the exclude patterns. May also contain symbolic links to directories.
      • getNotIncludedDirectories

        public java.lang.String[] getNotIncludedDirectories()
        Returns the names of the directories which matched none of the include patterns. The names are relative to the base directory. This involves performing a slow scan if one has not already been completed.
        Returns:
        the names of the directories which matched none of the include patterns.
        See Also:
        slowScan()
      • getExcludedDirectories

        public java.lang.String[] getExcludedDirectories()
        Returns the names of the directories which matched at least one of the include patterns and at least one of the exclude patterns. The names are relative to the base directory. This involves performing a slow scan if one has not already been completed.
        Returns:
        the names of the directories which matched at least one of the include patterns and at least one of the exclude patterns.
        See Also:
        slowScan()
      • addDefaultExcludes

        public void addDefaultExcludes()
        Adds default exclusions to the current exclusions set.
      • isSymbolicLink

        boolean isSymbolicLink​(java.io.File parent,
                               java.lang.String name)
                        throws java.io.IOException
        Checks whether a given file is a symbolic link.

        It doesn't really test for symbolic links but whether the canonical and absolute paths of the file are identical - this may lead to false positives on some platforms.

        Parameters:
        parent - the parent directory of the file to test
        name - the name of the file to test.
        Throws:
        java.io.IOException
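        The path comparison mentioned in the description can be sketched with java.io.File alone. This is an assumption-laden illustration, not the method's actual body: SymlinkHeuristic and looksLikeSymbolicLink are hypothetical names, and the sketch assumes that a file counts as a link when its absolute and canonical paths differ after the parent has been resolved.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the canonical-versus-absolute path heuristic.
final class SymlinkHeuristic {
    static boolean looksLikeSymbolicLink(File parent, String name) throws IOException {
        // Resolve the parent first so that links higher up the tree do not count.
        File resolvedParent = new File(parent.getCanonicalPath());
        File candidate = new File(resolvedParent, name);
        // Differing paths suggest that the last path element is a link.
        return !candidate.getAbsolutePath().equals(candidate.getCanonicalPath());
    }
}
```

        Because the comparison only looks at path strings, anything else that makes the canonical path differ is reported as a link too. Where a direct check is needed, java.nio.file.Files.isSymbolicLink(Path) from the standard library tests the link status itself.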
      • setupDefaultFilters

        private void setupDefaultFilters()
      • setupMatchPatterns

        private void setupMatchPatterns()