PFPGrowth Class

JobRunner for the Parallel FP-growth algorithm.
Inheritance Hierarchy
System.Object
  Ookii.Jumbo.Jet.Configurable
    Ookii.Jumbo.Jet.Jobs.BaseJobRunner
      Ookii.Jumbo.Jet.Jobs.Builder.JobBuilderJob
        Ookii.Jumbo.Jet.Samples.FPGrowth.PFPGrowth

Namespace: Ookii.Jumbo.Jet.Samples.FPGrowth
Assembly: Ookii.Jumbo.Jet.Samples (in Ookii.Jumbo.Jet.Samples.dll) Version: 0.3.0+dc1307f20e065bb638e0b73a34cd216f57e486f1
Syntax
public class PFPGrowth : JobBuilderJob, IParser<PFPGrowth>, 
	IParserProvider<PFPGrowth>

The PFPGrowth type exposes the following members.

Constructors
	Name	Description
Public method	PFPGrowth	Obsolete. Initializes a new instance of the PFPGrowth class.
Properties
	Name	Description
Public property	AccumulatorTaskCount	Gets or sets the number of feature count accumulator tasks.
Public property	AggregateTaskCount	Gets or sets the aggregate task count.
Public property	BinaryOutput	Gets or sets a value indicating whether the output format is binary.
Public property	BlockSize	Gets or sets the block size of the job's output files. (Inherited from BaseJobRunner)
Public property	ConfigOnly	Gets or sets a value indicating whether the job runner will only create and print the job configuration, instead of running the job. (Inherited from JobBuilderJob)
Public property	DfsConfiguration	Gets or sets the configuration used to access the Distributed File System. (Inherited from Configurable)
Protected property	FileSystemClient	Gets the DFS client. (Inherited from BaseJobRunner)
Public property	FPGrowthTaskCount	Gets or sets the FP growth task count.
Public property	Groups	Gets or sets the number of groups.
Public property	InputPath	Gets or sets the input path.
Public property	IsInteractive	Gets or sets a value that indicates whether the job runner should wait for user input before starting the job and before exiting. (Inherited from BaseJobRunner)
Protected property	JetClient	Gets the jet client. (Inherited from BaseJobRunner)
Public property	JetConfiguration	Gets or sets the configuration used to access the Jet servers. (Inherited from Configurable)
Public property	JobOrStageProperties	Gets or sets the property values that will override predefined values in the job configuration. (Inherited from BaseJobRunner)
Public property	JobOrStageSettings	Gets or sets additional job or stage settings that will be defined in the job configuration. (Inherited from BaseJobRunner)
Public property	MinSupport	Gets or sets the min support.
Public property	OutputPath	Gets or sets the output path.
Public property	OverwriteOutput	Gets or sets a value that indicates whether the output directory should be deleted, if it exists, before the job is executed. (Inherited from BaseJobRunner)
Public property	PartitionsPerTask	Gets or sets a value indicating the number of partitions per task for the MineTransactions stage.
Public property	PatternCount	Gets or sets the pattern count.
Public property	ReplicationFactor	Gets or sets the replication factor of the job's output files. (Inherited from BaseJobRunner)
Public property	TaskContext	Gets or sets the configuration for the task attempt. (Inherited from Configurable)
Methods
	Name	Description
Public method, static member	AccumulateFeatureCounts	Accumulates the feature counts.
Public method, static member	AggregatePatterns	Aggregates the patterns.
Protected method	ApplyJobPropertiesAndSettings	Adds the values of properties marked with the JobSettingAttribute to the JobSettings dictionary, applies properties set by the JobOrStageProperties property, and adds settings defined by the JobOrStageSettings property. (Inherited from BaseJobRunner)
Protected method	BuildJob	Constructs the job configuration using the specified job builder. (Overrides JobBuilderJob.BuildJob(JobBuilder))
Protected method	CheckAndCreateOutputPath	If OverwriteOutput is true, deletes the output path and then re-creates it; otherwise, creates the output path if it doesn't exist and fails if it does. (Inherited from BaseJobRunner)
Public method, static member	CountFeatures	Counts the features.
Public method, static member	CreateParser	Creates a CommandLineParser<T> instance using the specified options.
Public method	Equals	Determines whether the specified object is equal to the current object. (Inherited from Object)
Protected method	Finalize	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object)
Public method	FinishJob	Called after the job finishes. (Inherited from BaseJobRunner)
Public method, static member	GenerateGroupTransactions	Generates the group transactions.
Public method	GetHashCode	Serves as the default hash function. (Inherited from Object)
Protected method	GetInputFileSystemEntry	Gets a JumboFileSystemEntry instance for the specified path, or throws an exception if the input doesn't exist. (Inherited from BaseJobRunner)
Public method	GetType	Gets the Type of the current instance. (Inherited from Object)
Protected method	MemberwiseClone	Creates a shallow copy of the current Object. (Inherited from Object)
Public method, static member	MineTransactions	Mines the transactions.
Public method	NotifyConfigurationChanged	Indicates the configuration has been changed. ApplyConfiguration(Object, DfsConfiguration, JetConfiguration, TaskContext) calls this method after setting the configuration. (Inherited from BaseJobRunner)
Protected method	OnJobCreated	Called when the job has been created on the job server, but before running it. (Inherited from JobBuilderJob)
Public method, static member	Parse(ParseOptions)	Parses the arguments returned by the Environment.GetCommandLineArgs method, handling errors and showing usage help as required.
Public method, static member	Parse(ReadOnlyMemory<String>, ParseOptions)	Parses the specified command line arguments, handling errors and showing usage help as required.
Public method, static member	Parse(String[], ParseOptions)	Parses the specified command line arguments, handling errors and showing usage help as required.
Protected method	PromptIfInteractive	Prompts the user to start or exit, if IsInteractive is true. (Inherited from BaseJobRunner)
Public method	RunJob	Starts the job. (Inherited from JobBuilderJob)
Public method	ToString	Returns a string that represents the current object. (Inherited from Object)
Protected method	WriteOutput	Writes the result of the operation to the DFS using this instance's settings for BlockSize and ReplicationFactor. (Inherited from JobBuilderJob)
Remarks

This job is an implementation of the Parallel FP Growth algorithm described in the paper "PFP: Parallel FP-Growth for Query Recommendation" by Li et al., 2008.

This algorithm calculates the top-K frequent patterns for each item in the database, considering only patterns that have at least the specified minimum support.
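
As a rough illustration of what "top-K patterns per item" means, the C# sketch below takes a list of already-mined patterns, drops those below the minimum support, and keeps the K best-supported patterns for every item they contain. The `TopKPatterns` class, the `SelectTopK` method, and the tuple shape are hypothetical names chosen for this example; they are not part of the PFPGrowth class or of Jumbo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopKPatterns
{
    // Hypothetical helper: for each item, keeps the k patterns with the highest
    // support among all patterns that contain the item and meet minSupport.
    public static Dictionary<string, List<(string[] Items, int Support)>> SelectTopK(
        IEnumerable<(string[] Items, int Support)> patterns, int k, int minSupport)
    {
        return patterns
            .Where(p => p.Support >= minSupport)
            // Emit one (item, pattern) pair for every item in the pattern.
            .SelectMany(p => p.Items.Distinct().Select(item => (Item: item, Pattern: p)))
            .GroupBy(x => x.Item)
            .ToDictionary(
                g => g.Key,
                g => g.Select(x => x.Pattern)
                      .OrderByDescending(p => p.Support)
                      .Take(k)
                      .ToList());
    }
}
```

Items that only occur in patterns below the minimum support do not appear in the result at all.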

The algorithm has three steps: first, it counts how often each item occurs in the input database, filters out the infrequent features, and divides the resulting feature list into groups. Next, it generates group-dependent transactions from the input and runs the FP-Growth algorithm on each group. Finally, the results from each group are aggregated to form the final result.
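
The first two steps can be sketched in memory as follows. This is a simplified, single-process illustration based on the description above and on the group-dependent transaction scheme from the PFP paper; it is not the code of the PFPGrowth class. The `PfpSketch` class, its method names, and the rule that consecutive ranks share a group are assumptions made for this example. The mining and aggregation steps are omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PfpSketch
{
    // Step 1: count in how many transactions each item occurs, drop items below
    // minSupport, and rank the rest by descending count (ties broken by name).
    public static List<string> BuildFrequentItemList(IEnumerable<string[]> transactions, int minSupport)
    {
        var counts = new Dictionary<string, int>();
        foreach (var transaction in transactions)
        {
            foreach (var item in transaction.Distinct())
            {
                counts.TryGetValue(item, out int current);
                counts[item] = current + 1;
            }
        }

        return counts.Where(p => p.Value >= minSupport)
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => p.Key)
            .ToList();
    }

    // Assumption for this sketch: items with consecutive ranks share a group.
    public static int GetGroup(int rank, int itemCount, int groups)
    {
        int itemsPerGroup = (itemCount + groups - 1) / groups; // ceiling division
        return rank / itemsPerGroup;
    }

    // Step 2: for one transaction, emit at most one prefix per group. Walking
    // from the least frequent item backwards, the first item seen for a group
    // yields the prefix that ends at that item.
    public static List<(int Group, string[] Items)> GenerateGroupTransactions(
        string[] transaction, IList<string> frequentItems, int groups)
    {
        var rankOf = new Dictionary<string, int>();
        for (int i = 0; i < frequentItems.Count; i++)
        {
            rankOf[frequentItems[i]] = i;
        }

        // Keep only frequent items, ordered from most to least frequent.
        int[] ranks = transaction.Distinct()
            .Where(item => rankOf.ContainsKey(item))
            .Select(item => rankOf[item])
            .OrderBy(rank => rank)
            .ToArray();

        var result = new List<(int Group, string[] Items)>();
        var seenGroups = new HashSet<int>();
        for (int j = ranks.Length - 1; j >= 0; j--)
        {
            int group = GetGroup(ranks[j], frequentItems.Count, groups);
            if (seenGroups.Add(group))
            {
                string[] prefix = ranks.Take(j + 1).Select(rank => frequentItems[rank]).ToArray();
                result.Add((group, prefix));
            }
        }

        return result;
    }
}
```

Each emitted pair would then be routed to the task that mines that group, so every group sees only the transaction prefixes relevant to its own items.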

The number of groups should be carefully selected so that the number of items per group is not too large. Ideally, each group should have 5-10 items at most for a large database.
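
One way to turn this guideline into a number is to divide the count of frequent items by the desired maximum group size and round up. The helper below is a hypothetical sketch of that arithmetic, not part of Jumbo; the frequent item count would have to come from a prior counting pass or an estimate.

```csharp
using System;

public static class GroupSizing
{
    // Hypothetical helper: smallest number of groups such that no group
    // holds more than maxItemsPerGroup items.
    public static int GroupsFor(int frequentItemCount, int maxItemsPerGroup)
    {
        if (frequentItemCount < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(frequentItemCount));
        }

        if (maxItemsPerGroup <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxItemsPerGroup));
        }

        // Ceiling division written so it cannot overflow for large counts.
        return frequentItemCount / maxItemsPerGroup
            + (frequentItemCount % maxItemsPerGroup == 0 ? 0 : 1);
    }
}
```

For example, 1,000 frequent items with at most 10 items per group gives 100 groups, and 1,001 items gives 101.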

The input for this job should be a plain text file (or files) where each line represents a transaction, given as a space-delimited list of items.
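
A minimal sketch of reading this format is shown below. The `TransactionInput` class and its methods are hypothetical and only illustrate the line layout; the job itself reads its input through Jumbo's own record readers, which are not shown here. Skipping blank lines and collapsing repeated spaces are assumptions of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TransactionInput
{
    // Splits one input line into its items; repeated spaces produce no empty items.
    public static string[] ParseTransaction(string line) =>
        line.Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // Parses all non-blank lines, one transaction per line.
    public static List<string[]> ParseAll(IEnumerable<string> lines) =>
        lines.Where(line => !string.IsNullOrWhiteSpace(line))
             .Select(ParseTransaction)
             .ToList();
}
```

With this sketch, the line `bread milk eggs` becomes one transaction with the three items `bread`, `milk`, and `eggs`.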

This example demonstrates a more complicated Jumbo job, with several stages including more than one stage with file input. It uses scheduling dependencies, group aggregation, partition-based grouping using multiple partitions per task, dynamic partition assignment, and custom progress providers.

See Also