org.apache.hadoop.mapred.join
Class CompositeInputFormat<K extends WritableComparable>

java.lang.Object
  extended by org.apache.hadoop.mapred.join.CompositeInputFormat<K>
All Implemented Interfaces:
InputFormat<K,TupleWritable>, ComposableInputFormat<K,TupleWritable>

public class CompositeInputFormat<K extends WritableComparable>
extends Object
implements ComposableInputFormat<K,TupleWritable>

An InputFormat capable of performing joins over a set of data sources sorted and partitioned the same way.

A user may define new join types by setting the property mapred.join.define.<ident> to a classname. In the expression mapred.join.expr, the identifier will be assumed to be a ComposableRecordReader. mapred.join.keycomparator can be a classname used to compare keys in the join.

See Also:
JoinRecordReader, MultiFilterRecordReader
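The three properties named in the class description can be pictured as plain key/value settings. The following is a minimal sketch only: it uses java.util.Properties as a stand-in for the job configuration, and the identifier myjoin, the class names under com.example, and the paths are hypothetical placeholders, not part of this API.

```java
import java.util.Properties;

public class JoinConfigSketch {
    // Builds a stand-in configuration holding the three join properties.
    static Properties joinProperties(String expr) {
        Properties conf = new Properties();
        // The join expression that the composite format interprets.
        conf.setProperty("mapred.join.expr", expr);
        // Hypothetical user-defined join type "myjoin" mapped to a hypothetical class name.
        conf.setProperty("mapred.join.define.myjoin", "com.example.MyJoinRecordReader");
        // Hypothetical class name used to compare keys in the join.
        conf.setProperty("mapred.join.keycomparator", "com.example.MyKeyComparator");
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical expression using the user-defined identifier.
        String expr = "myjoin(tbl(com.example.MyInputFormat,\"/data/a\"),"
                + "tbl(com.example.MyInputFormat,\"/data/b\"))";
        Properties conf = joinProperties(expr);
        conf.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real job these keys would be set on the JobConf rather than on a Properties object.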

Constructor Summary
CompositeInputFormat()
           
 
Method Summary
protected  void addDefaults()
          Adds the default set of identifiers to the parser.
static String compose(Class<? extends InputFormat> inf, String path)
          Convenience method for constructing composite formats.
static String compose(String op, Class<? extends InputFormat> inf, Path... path)
          Convenience method for constructing composite formats.
static String compose(String op, Class<? extends InputFormat> inf, String... path)
          Convenience method for constructing composite formats.
 ComposableRecordReader<K,TupleWritable> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Construct a CompositeRecordReader for the children of this InputFormat as defined in the init expression.
 InputSplit[] getSplits(JobConf job, int numSplits)
          Build a CompositeInputSplit from the child InputFormats by assigning the ith split from each child to the ith composite split.
 void setFormat(JobConf job)
          Interpret a given string as a composite expression.
 void validateInput(JobConf job)
          Verify that this composite has children and that all its children can validate their input.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CompositeInputFormat

public CompositeInputFormat()
Method Detail

setFormat

public void setFormat(JobConf job)
               throws IOException
Interpret a given string as a composite expression.

   func  ::= <ident>([<func>,]*<func>)
   func  ::= tbl(<class>,"<path>")
   class ::= a class name, see java.lang.Class#forName(java.lang.String)
   path  ::= a path string, see org.apache.hadoop.fs.Path#Path(java.lang.String)

Reads the expression from the mapred.join.expr property and user-supplied join types from the mapred.join.define.<ident> properties. Paths supplied to tbl are given as input paths to the InputFormat class listed.

Throws:
IOException
See Also:
compose(java.lang.String, java.lang.Class, java.lang.String...)
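Because a func may contain other funcs, expressions can nest. The sketch below builds such a string by hand to show the shape the grammar describes. It is an illustration only: the helper names tbl and func, the class name com.example.MyInputFormat, and the paths are hypothetical, and the identifiers inner and outer are assumed to be among the defaults registered by addDefaults().

```java
public class JoinExprSketch {
    // Leaf production: tbl(<class>,"<path>")
    static String tbl(String className, String path) {
        return "tbl(" + className + ",\"" + path + "\")";
    }

    // Recursive production: <ident>(<func>,...,<func>), which needs at least one child.
    static String func(String ident, String... children) {
        if (children.length == 0) {
            throw new IllegalArgumentException("a func needs at least one child");
        }
        return ident + "(" + String.join(",", children) + ")";
    }

    public static void main(String[] args) {
        String cls = "com.example.MyInputFormat"; // hypothetical InputFormat class name
        // An inner join of two tables, nested inside an outer join with a third.
        String expr = func("outer",
                func("inner", tbl(cls, "/data/a"), tbl(cls, "/data/b")),
                tbl(cls, "/data/c"));
        System.out.println(expr);
    }
}
```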

addDefaults

protected void addDefaults()
Adds the default set of identifiers to the parser.


validateInput

public void validateInput(JobConf job)
                   throws IOException
Verify that this composite has children and that all its children can validate their input.

Specified by:
validateInput in interface InputFormat<K extends WritableComparable,TupleWritable>
Parameters:
job - job configuration.
Throws:
InvalidInputException - if the job does not have valid input
IOException

getSplits

public InputSplit[] getSplits(JobConf job,
                              int numSplits)
                       throws IOException
Build a CompositeInputSplit from the child InputFormats by assigning the ith split from each child to the ith composite split.

Specified by:
getSplits in interface InputFormat<K extends WritableComparable,TupleWritable>
Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
IOException
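The split assignment described for getSplits can be sketched as a simple regrouping: the ith entry of every child list lands in the ith composite group. This is not the class's implementation. Splits are modelled as plain strings, the name zipSplits is hypothetical, and rejecting children with differing split counts is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitZipSketch {
    // Groups the ith split of every child into the ith composite split.
    static List<List<String>> zipSplits(List<List<String>> childSplits) {
        if (childSplits.isEmpty()) {
            throw new IllegalArgumentException("no children");
        }
        int n = childSplits.get(0).size();
        for (List<String> child : childSplits) {
            // Assumption of this sketch: every child must yield the same number of splits.
            if (child.size() != n) {
                throw new IllegalArgumentException("children produced differing split counts");
            }
        }
        List<List<String>> composite = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<String> group = new ArrayList<>();
            for (List<String> child : childSplits) {
                group.add(child.get(i));
            }
            composite.add(group);
        }
        return composite;
    }

    public static void main(String[] args) {
        List<List<String>> children = Arrays.asList(
                Arrays.asList("a-0", "a-1"),
                Arrays.asList("b-0", "b-1"));
        System.out.println(zipSplits(children));
    }
}
```

Pairing splits by position only yields a meaningful join when the sources are sorted and partitioned the same way, as the class description requires.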

getRecordReader

public ComposableRecordReader<K,TupleWritable> getRecordReader(InputSplit split,
                                                               JobConf job,
                                                               Reporter reporter)
                                                                                   throws IOException
Construct a CompositeRecordReader for the children of this InputFormat as defined in the init expression. The outermost join need only be composable, not necessarily a composite. Mandating TupleWritable isn't strictly correct.

Specified by:
getRecordReader in interface InputFormat<K extends WritableComparable,TupleWritable>
Specified by:
getRecordReader in interface ComposableInputFormat<K extends WritableComparable,TupleWritable>
Parameters:
split - the InputSplit
job - the job that this split belongs to
reporter - facility to report progress
Returns:
a RecordReader
Throws:
IOException

compose

public static String compose(Class<? extends InputFormat> inf,
                             String path)
Convenience method for constructing composite formats. Given an InputFormat class (inf) and a path (p), returns: tbl(<inf>, <p>)


compose

public static String compose(String op,
                             Class<? extends InputFormat> inf,
                             String... path)
Convenience method for constructing composite formats. Given an operation (op), an InputFormat class (inf), and a set of paths (p), returns: <op>(tbl(<inf>,<p1>),tbl(<inf>,<p2>),...,tbl(<inf>,<pn>))


compose

public static String compose(String op,
                             Class<? extends InputFormat> inf,
                             Path... path)
Convenience method for constructing composite formats. Given an operation (op), an InputFormat class (inf), and a set of paths (p), returns: <op>(tbl(<inf>,<p1>),tbl(<inf>,<p2>),...,tbl(<inf>,<pn>))
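The documented return shapes of the compose overloads can be reproduced with plain string handling. The sketch below is a stand-alone approximation, not the library code: it takes the class name as a String instead of a Class object, the class name com.example.MyInputFormat and the paths are hypothetical, and quoting each path follows the tbl(<class>,"<path>") grammar given under setFormat, which is an assumption about the exact output.

```java
public class ComposeSketch {
    // Mirrors the single-table form: tbl(<inf>,<p>)
    static String tbl(String className, String path) {
        return "tbl(" + className + ",\"" + path + "\")";
    }

    // Mirrors the multi-path form: <op>(tbl(<inf>,<p1>),...,tbl(<inf>,<pn>))
    static String compose(String op, String className, String... paths) {
        StringBuilder sb = new StringBuilder(op).append('(');
        for (int i = 0; i < paths.length; i++) {
            if (i > 0) {
                sb.append(','); // separate sibling tables with commas
            }
            sb.append(tbl(className, paths[i]));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        String cls = "com.example.MyInputFormat"; // hypothetical InputFormat class name
        // "inner" is assumed to be a registered join identifier.
        System.out.println(compose("inner", cls, "/data/a", "/data/b"));
    }
}
```

With the real class, the resulting string would typically be stored under the mapred.join.expr property so that setFormat can read it.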



Copyright © 2008 The Apache Software Foundation