Class BlockSort
- java.lang.Object
-
- org.apache.commons.compress.compressors.bzip2.BlockSort
-
class BlockSort extends java.lang.Object
Encapsulates the Burrows-Wheeler sorting algorithm needed byBZip2CompressorOutputStream
.This class is based on a Java port of Julian Seward's blocksort.c in his libbzip2
The Burrows-Wheeler transform is a reversible transform of the original data that is supposed to group similar bytes close to each other. The idea is to sort all permutations of the input and only keep the last byte of each permutation. E.g. for "Commons Compress" you'd get:
CompressCommons Commons Compress CompressCommons essCommons Compr mmons CompressCo mons CompressCom mpressCommons Co ns CompressCommo ommons CompressC ompressCommons C ons CompressComm pressCommons Com ressCommons Comp s CompressCommon sCommons Compres ssCommons Compre
Which results in a new text "ss romooCCmmpnse", in adition the index of the first line that contained the original text is kept - in this case it is 1. The idea is that in a long English text all permutations that start with "he" are likely suffixes of a "the" and thus they end in "t" leading to a larger block of "t"s that can better be compressed by the subsequent Move-to-Front, run-length und Huffman encoding steps.
For more information see for example:
-
-
Field Summary
Fields Modifier and Type Field Description private static int
CLEARMASK
private static int
DEPTH_THRESH
private int[]
eclass
private static int
FALLBACK_QSORT_SMALL_THRESH
private static int
FALLBACK_QSORT_STACK_SIZE
private boolean
firstAttempt
private int[]
ftab
private static int[]
INCS
private boolean[]
mainSort_bigDone
private int[]
mainSort_copy
private int[]
mainSort_runningOrder
private static int
QSORT_STACK_SIZE
private char[]
quadrant
Array instance identical to Data's sfmap, both are used only temporarily and indepently, so we do not need to allocate additional memory.private static int
SETMASK
private static int
SMALL_THRESH
private int[]
stack_dd
private int[]
stack_hh
private int[]
stack_ll
private static int
STACK_SIZE
private static int
WORK_FACTOR
private int
workDone
private int
workLimit
-
Constructor Summary
Constructors Constructor Description BlockSort(BZip2CompressorOutputStream.Data data)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description (package private) void
blockSort(BZip2CompressorOutputStream.Data data, int last)
private void
fallbackQSort3(int[] fmap, int[] eclass, int loSt, int hiSt)
private void
fallbackSimpleSort(int[] fmap, int[] eclass, int lo, int hi)
(package private) void
fallbackSort(int[] fmap, byte[] block, int nblock)
(package private) void
fallbackSort(BZip2CompressorOutputStream.Data data, int last)
Adapt fallbackSort to the expected interface of the rest of the code, in particular deal with the fact that block starts at offset 1 (in libbzip2 1.0.6 it starts at 0).private int
fmin(int a, int b)
private int[]
fpop(int sp)
private void
fpush(int sp, int lz, int hz)
private void
fswap(int[] fmap, int zz1, int zz2)
swaps two values in fmapprivate void
fvswap(int[] fmap, int yyp1, int yyp2, int yyn)
swaps two intervals starting at yyp1 and yyp2 of length yyn inside fmap.private int[]
getEclass()
private void
mainQSort3(BZip2CompressorOutputStream.Data dataShadow, int loSt, int hiSt, int dSt, int last)
Method "mainQSort3", file "blocksort.c", BZip2 1.0.2private boolean
mainSimpleSort(BZip2CompressorOutputStream.Data dataShadow, int lo, int hi, int d, int lastShadow)
This is the most hammered method of this class.(package private) void
mainSort(BZip2CompressorOutputStream.Data dataShadow, int lastShadow)
private static byte
med3(byte a, byte b, byte c)
private static void
vswap(int[] fmap, int p1, int p2, int n)
-
-
-
Field Detail
-
QSORT_STACK_SIZE
private static final int QSORT_STACK_SIZE
- See Also:
- Constant Field Values
-
FALLBACK_QSORT_STACK_SIZE
private static final int FALLBACK_QSORT_STACK_SIZE
- See Also:
- Constant Field Values
-
STACK_SIZE
private static final int STACK_SIZE
- See Also:
- Constant Field Values
-
workDone
private int workDone
-
workLimit
private int workLimit
-
firstAttempt
private boolean firstAttempt
-
stack_ll
private final int[] stack_ll
-
stack_hh
private final int[] stack_hh
-
stack_dd
private final int[] stack_dd
-
mainSort_runningOrder
private final int[] mainSort_runningOrder
-
mainSort_copy
private final int[] mainSort_copy
-
mainSort_bigDone
private final boolean[] mainSort_bigDone
-
ftab
private final int[] ftab
-
quadrant
private final char[] quadrant
Array instance identical to Data's sfmap, both are used only temporarily and indepently, so we do not need to allocate additional memory.
-
FALLBACK_QSORT_SMALL_THRESH
private static final int FALLBACK_QSORT_SMALL_THRESH
- See Also:
- Constant Field Values
-
eclass
private int[] eclass
-
INCS
private static final int[] INCS
-
SMALL_THRESH
private static final int SMALL_THRESH
- See Also:
- Constant Field Values
-
DEPTH_THRESH
private static final int DEPTH_THRESH
- See Also:
- Constant Field Values
-
WORK_FACTOR
private static final int WORK_FACTOR
- See Also:
- Constant Field Values
-
SETMASK
private static final int SETMASK
- See Also:
- Constant Field Values
-
CLEARMASK
private static final int CLEARMASK
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
BlockSort
BlockSort(BZip2CompressorOutputStream.Data data)
-
-
Method Detail
-
blockSort
void blockSort(BZip2CompressorOutputStream.Data data, int last)
-
fallbackSort
final void fallbackSort(BZip2CompressorOutputStream.Data data, int last)
Adapt fallbackSort to the expected interface of the rest of the code, in particular deal with the fact that block starts at offset 1 (in libbzip2 1.0.6 it starts at 0).
-
fallbackSimpleSort
private void fallbackSimpleSort(int[] fmap, int[] eclass, int lo, int hi)
- Parameters:
fmap
- points to the index of the starting point of a permutation inside the block of data in the current partially sorted ordereclass
- points from the index of a character inside the block to the first index in fmap that contains the bucket of its suffix that is sorted in this step.lo
- lower boundary of the fmap-interval to be sortedhi
- upper boundary of the fmap-interval to be sorted
-
fswap
private void fswap(int[] fmap, int zz1, int zz2)
swaps two values in fmap
-
fvswap
private void fvswap(int[] fmap, int yyp1, int yyp2, int yyn)
swaps two intervals starting at yyp1 and yyp2 of length yyn inside fmap.
-
fmin
private int fmin(int a, int b)
-
fpush
private void fpush(int sp, int lz, int hz)
-
fpop
private int[] fpop(int sp)
-
fallbackQSort3
private void fallbackQSort3(int[] fmap, int[] eclass, int loSt, int hiSt)
- Parameters:
fmap
- points to the index of the starting point of a permutation inside the block of data in the current partially sorted ordereclass
- points from the index of a character inside the block to the first index in fmap that contains the bucket of its suffix that is sorted in this step.loSt
- lower boundary of the fmap-interval to be sortedhiSt
- upper boundary of the fmap-interval to be sorted
-
getEclass
private int[] getEclass()
-
fallbackSort
final void fallbackSort(int[] fmap, byte[] block, int nblock)
- Parameters:
fmap
- points to the index of the starting point of a permutation inside the block of data in the current partially sorted orderblock
- the original datanblock
- size of the blockoff
- offset of first byte to sort in block
-
mainSimpleSort
private boolean mainSimpleSort(BZip2CompressorOutputStream.Data dataShadow, int lo, int hi, int d, int lastShadow)
This is the most hammered method of this class.This is the version using unrolled loops. Normally I never use such ones in Java code. The unrolling has shown a noticable performance improvement on JRE 1.4.2 (Linux i586 / HotSpot Client). Of course it depends on the JIT compiler of the vm.
-
vswap
private static void vswap(int[] fmap, int p1, int p2, int n)
-
med3
private static byte med3(byte a, byte b, byte c)
-
mainQSort3
private void mainQSort3(BZip2CompressorOutputStream.Data dataShadow, int loSt, int hiSt, int dSt, int last)
Method "mainQSort3", file "blocksort.c", BZip2 1.0.2
-
mainSort
final void mainSort(BZip2CompressorOutputStream.Data dataShadow, int lastShadow)
-
-