Synopsis:

  java zimmermann_quest_reimplementation [-N int | -L int | -I int | -T int | -D int | -c double | -C | -?]

The following options are available:

  -N int     Number of used items (default 1000)
  -L int     Number of potentially large itemsets (default 2000)
  -I int     Average size of potentially large itemsets (default 4)
  -T int     Average size of transactions (default 10)
  -D int     Number of transactions (size of data set) (default 10000)
  -c double  Correlation level (default 0.5); if set to 0, all potentially
             large itemsets are drawn randomly from the pool of items
  -C         No corruption allowed -- all source itemsets are embedded as is
  -?         Help (exits program)

---

Algorithmic Remarks:

This program implements the data generation process described in:

  Rakesh Agrawal, Ramakrishnan Srikant '94,
  "Fast Algorithms for Mining Association Rules"

which is, however, ambiguous at some points.

First and foremost, this implementation *does not* allow duplicate items in
potentially large itemsets or transactions; duplicate draws therefore also do
not count against the size limit.

Furthermore:

-- "Items in the first itemset are chosen randomly."
   - Items are sampled uniformly.

-- "To model the phenomenon that large itemsets often have common items, some
   fraction of items in subsequent itemsets are chosen from the previous
   itemset generated. We use an exponentially distributed random variable with
   mean equal to the correlation level to decide this fraction for each
   itemset."
   - Sampling an exponential distribution can return values larger than 1;
     those are rejected and resampled. Also, if a potentially large itemset is
     much larger than its predecessor, a "fraction" of its items can exceed
     the size of said predecessor -- this would lead to redundant itemsets, so
     we resample such that only a true subset of the predecessor itemset is
     included (see the rejection-sampling sketch at the end of this file).

-- The interplay of "If the large itemset on hand does not fit in the
   transaction, the itemset is put in the transaction anyway in half the
   cases, and the itemset is moved to the next transaction the rest of the
   cases." and "To model the phenomenon that all the items in a large itemset
   are not always bought together, we assign each itemset in T a corruption
   level c. When adding an itemset to a transaction, we keep dropping an item
   from the itemset as long as a uniformly distributed random number between 0
   and 1 is less than c." is ambiguous.
   - This implementation checks the size of the potentially large itemset
     *before* embedding it in the transaction (and hence before corruption and
     before the inclusion check in the transaction) and on that basis decides
     whether to defer the itemset to the next transaction (see the corruption
     sketch at the end of this file).

-- "The corruption level for an itemset is fixed and is obtained from a normal
   distribution with mean 0.5 and variance 0.1."
   - Sampling from this distribution can yield corruption levels > 1 as well
     as < 0; these are rejected and resampled.

Potentially as a result of these decisions, the generated data shows somewhat
different characteristics w.r.t. item count distributions than the T10I4D100K
and T40I10D100K data available at http://fimi.ua.ac.be/.

---

Output:

Generator output goes to the command line and consists of:

- Parameter settings, each on a separate line, preceded by '#'
- Potentially large itemsets with their corresponding weight (separated by
  ':') and corruption level (separated by ','), each on a separate line
  preceded by '#'
- Transaction data, one transaction per line (see the reading sketch at the
  end of this file)
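
---

Illustrative Sketches:

The following Java sketch is not taken from the actual generator source; class
and method names are illustrative assumptions. It shows how the two
rejection-sampling decisions from the Algorithmic Remarks can be realized: the
reused-item fraction is drawn from an exponential distribution with mean equal
to the correlation level and resampled while it exceeds 1, and the corruption
level is drawn from a normal distribution with mean 0.5 and variance 0.1 and
resampled while it falls outside [0, 1].

  import java.util.Random;

  public class RejectionSamplingSketch {

      private static final Random RNG = new Random();

      // Fraction of items reused from the previous potentially large itemset:
      // Exp(mean = correlationLevel), resampled while the draw exceeds 1.
      static double sampleReusedFraction(double correlationLevel) {
          double fraction;
          do {
              // inverse-transform sampling of an exponential distribution
              fraction = -correlationLevel * Math.log(1.0 - RNG.nextDouble());
          } while (fraction > 1.0);
          return fraction;
      }

      // Corruption level: Normal(mean = 0.5, variance = 0.1), i.e. standard
      // deviation sqrt(0.1), resampled while the draw lies outside [0, 1].
      static double sampleCorruptionLevel() {
          double level;
          do {
              level = 0.5 + Math.sqrt(0.1) * RNG.nextGaussian();
          } while (level < 0.0 || level > 1.0);
          return level;
      }

      public static void main(String[] args) {
          System.out.println("fraction:   " + sampleReusedFraction(0.5));
          System.out.println("corruption: " + sampleCorruptionLevel());
      }
  }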
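
A second sketch, again only illustrative, shows the corruption loop quoted
above: items are dropped from a copy of the potentially large itemset as long
as a uniformly distributed random number between 0 and 1 is less than the
itemset's corruption level. Which item gets dropped is not specified in the
paper; dropping a randomly chosen item here is an assumption.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Random;

  public class CorruptionSketch {

      // Returns a (possibly) corrupted copy of the given potentially large
      // itemset; the original itemset is left untouched.
      static List<Integer> corrupt(List<Integer> itemset, double corruptionLevel,
                                   Random rng) {
          List<Integer> corrupted = new ArrayList<>(itemset);
          while (!corrupted.isEmpty() && rng.nextDouble() < corruptionLevel) {
              // drop a randomly chosen item (assumption, see above)
              corrupted.remove(rng.nextInt(corrupted.size()));
          }
          return corrupted;
      }
  }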
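
Finally, a sketch for consuming the generator output described in the Output
section, assuming the command-line output has been redirected to a file (the
name quest_output.txt is a placeholder). It skips the '#' lines (parameter
settings and potentially large itemsets) and collects the remaining lines as
transactions; it further assumes -- which is not stated above -- that items
within a transaction are whitespace-separated integer identifiers.

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;

  public class OutputReaderSketch {

      public static void main(String[] args) throws IOException {
          List<int[]> transactions = new ArrayList<>();
          try (BufferedReader in =
                   new BufferedReader(new FileReader("quest_output.txt"))) {
              String line;
              while ((line = in.readLine()) != null) {
                  if (line.isEmpty() || line.startsWith("#")) {
                      continue; // parameter settings / potentially large itemsets
                  }
                  String[] tokens = line.trim().split("\\s+");
                  int[] transaction = new int[tokens.length];
                  for (int i = 0; i < tokens.length; i++) {
                      transaction[i] = Integer.parseInt(tokens[i]);
                  }
                  transactions.add(transaction);
              }
          }
          System.out.println("read " + transactions.size() + " transactions");
      }
  }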