- Defining a class with type parameters
When defining a class with type parameters, declare the names of one or more type parameters inside the <> that immediately follows the class name; you may also restrict the range of values each type parameter can take, and multiple type parameters are separated by commas. Once declared, a type parameter can be used almost anywhere in the class after the point of declaration (except in static blocks, static fields, and static methods), just like an ordinary type. Note that type parameters declared by a parent class are not inherited by its subclasses.
public class TestClassDefine<T, S extends T> {
....
}
- Defining a method with type parameters
When defining a method with type parameters, declare the names of one or more type parameters inside the <> that immediately follows the visibility modifier (e.g. public); you may also restrict the range of values each type parameter can take, and multiple type parameters are separated by commas. Once declared, a type parameter can be used anywhere in the method after the point of declaration, just like an ordinary type.
For example:
public <T, S extends T> T testGenericMethodDefine(T t, S s){
...
}
Note: the main purpose of declaring type parameters on a method is to express relationships among the parameters and the return value, such as the inheritance relationship between T and S in this example, or the return type being the same as the type of the first parameter. If all you want is polymorphism, prefer wildcards; wildcards are covered in a later section.
public <T> void testGenericMethodDefine2(List<T> s){
...
}
should instead be written as
public void testGenericMethodDefine2(List<?> s){
...
}
- Supplying type arguments to a generic class
Type arguments can be supplied to a generic class in two ways.
First, when declaring a variable of the class or instantiating it. For example:
List<String> list;
list = new ArrayList<String>();
Second, when extending the class or implementing the interface. For example:
public class MyList<E> extends ArrayList<E> implements List<E> {...}
- Supplying type arguments to a generic method
When a generic method is called, the compiler infers the type arguments automatically and reports a compile error when no valid assignment exists. For example:
public <T> T testGenericMethodDefine3(T t, List<T> list){
...
}
public <T> T testGenericMethodDefine4(List<T> list1, List<T> list2){
...
}
Number n = null;
Integer i = null;
Object o = null;
testGenericMethodDefine(n, i); // T is inferred as Number, S as Integer
testGenericMethodDefine(o, i); // T is Object, S is Integer
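A small sketch of inference for testGenericMethodDefine3/testGenericMethodDefine4 above (the list variables are illustrative): the compiler has to find a single T that fits every argument, and rejects the call when none exists.
List<Integer> ints = new ArrayList<Integer>();
List<String> strings = new ArrayList<String>();
testGenericMethodDefine3(i, ints); // OK: T is inferred as Integer
// testGenericMethodDefine4(ints, strings); // compile error: no single T matches both List<Integer> and List<String>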
6) Generics and the Class class
Since Java 5, java.lang.Class has been generified. This is an interesting example of extending generics beyond the container classes: Class now has a type parameter T. You may well ask what T stands for: it is the type that the Class object represents. For example, the type of String.class is Class<String>, and the type of Serializable.class is Class<Serializable>. This can be used to improve the type safety of your reflection code.
In particular, because Class's newInstance() method now returns a T, you can get a more precise type when creating objects reflectively. For example, suppose you need to write a utility method that runs a database query, given a SQL statement, and returns a collection of objects in the database that match the query. One way is to pass a factory object explicitly, as in the following code:
interface Factory<T> {
public T make();
}
public <T> Collection<T> select(Factory<T> factory, String statement) {
Collection<T> result = new ArrayList<T>();
/* run sql query using jdbc */
for ( int i=0; i<10; i++ ) { /* iterate over jdbc results */
T item = factory.make();
/* use reflection and set all of item’s fields from sql results */
result.add( item );
}
return result;
}
You can call it like this:
select(new Factory<EmpInfo>(){
public EmpInfo make() {
return new EmpInfo();
}
}, "selection string");
You could also declare a class EmpInfoFactory that implements the Factory interface:
class EmpInfoFactory implements Factory<EmpInfo> { ...
public EmpInfo make() { return new EmpInfo();}
}
and then call:
select(getMyEmpInfoFactory(), "selection string");
The drawback of this solution is that it requires one of two things: a verbose anonymous factory class at every call site, or declaring a factory class for each type to be used and passing a factory instance to the call site, which is unnatural. Passing the class literal as the parameter value is very natural, because it can be used by reflection. Without generics the code might look like this:
Collection emps = sqlUtility.select(EmpInfo.class, "select * from emps"); ...
public static Collection select(Class c, String sqlStatement) {
Collection result = new ArrayList();
/* run sql query using jdbc */
for ( /* iterate over jdbc results */ ) {
Object item = c.newInstance();
/* use reflection and set all of item’s fields from sql results */
result.add(item);
}
return result;
}
But this does not give us a collection of the precise type we want. Now that Class is generic, we can write instead:
Collection<EmpInfo> emps = sqlUtility.select(EmpInfo.class, "select * from emps"); ...
public static <T> Collection<T> select(Class<T> c, String sqlStatement) {
Collection<T> result = new ArrayList<T>();
/* run sql query using jdbc */
for ( /* iterate over jdbc results */ ) {
T item = c.newInstance();
/* use reflection and set all of item’s fields from sql results */
result.add(item);
}
return result;
}
and obtain the collection we want in a type-safe way. This technique is a very useful idiom, and it has become widely used in the new APIs that process annotations.
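The annotations API uses the same idiom: AnnotatedElement.getAnnotation(Class<T>) returns T rather than a raw Annotation, so no cast is needed. A tiny sketch (SomeLegacyClass is a hypothetical class carrying the annotation):
Deprecated dep = SomeLegacyClass.class.getAnnotation(Deprecated.class); // result is already typed as Deprecated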
7) Compatibility between old and new code
- For backward compatibility the compiler (javac) accepts the following code; type safety is then your own responsibility:
List l = new ArrayList<String>();
List<String> l = new ArrayList();
- When upgrading your library to a generified version, use covariant return types with care.
For example, starting from the code
public class Foo {
public Foo create(){
return new Foo();
}
}
public class Bar extends Foo {
public Foo create(){
return new Bar();
}
}
adopting the covariant-return-type style changes Bar to
public class Bar extends Foo {
public Bar create(){
return new Bar();
}
}
Be mindful of your library's clients: a client subclass that overrides create() with the old return type Foo will no longer compile against the new Bar.
- The number of reduces can be set with Job.setNumReduceTasks(int). As a rule of thumb it should be about 0.95 to 1.75 times the number of nodes.
- The number of reduces can also be set to 0, in which case the map output is written directly to the file system.
- A reduce can also use mark/reset: while iterating over the intermediate values produced by the maps you can mark a position and later call reset to return to the most recent mark. There is one restriction, shown in the example below: inside the reduce method you must wrap the values iterator in a new MarkableIterator yourself.
public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
MarkableIterator<IntWritable> mitr = new MarkableIterator<IntWritable>(values.iterator());
// Mark the position
mitr.mark();
while (mitr.hasNext()) {
IntWritable i = mitr.next();
// Do the necessary processing
}
// Reset
mitr.reset();
// Iterate all over again. Since mark was called before the first
// call to mitr.next() in this example, we will iterate over all
// the values now
while (mitr.hasNext()) {
IntWritable i = mitr.next();
// Do the necessary processing
}
}
/**
* Maps input key/value pairs to a set of intermediate key/value pairs.
*
 * <p>Maps are the individual tasks which transform input records into
 * intermediate records. The transformed intermediate records need not be of
* the same type as the input records. A given input pair may map to zero or
* many output pairs.</p>
*
* <p>The Hadoop Map-Reduce framework spawns one map task for each
* {@link InputSplit} generated by the {@link InputFormat} for the job.
* <code>Mapper</code> implementations can access the {@link Configuration} for
* the job via the {@link JobContext#getConfiguration()}.
*
* <p>The framework first calls
* {@link #setup(org.apache.hadoop.mapreduce.Mapper.Context)}, followed by
* {@link #map(Object, Object, Context)}
* for each key/value pair in the <code>InputSplit</code>. Finally
* {@link #cleanup(Context)} is called.</p>
*
* <p>All intermediate values associated with a given output key are
* subsequently grouped by the framework, and passed to a {@link Reducer} to
* determine the final output. Users can control the sorting and grouping by
* specifying two key {@link RawComparator} classes.</p>
*
* <p>The <code>Mapper</code> outputs are partitioned per
* <code>Reducer</code>. Users can control which keys (and hence records) go to
* which <code>Reducer</code> by implementing a custom {@link Partitioner}.
*
* <p>Users can optionally specify a <code>combiner</code>, via
* {@link Job#setCombinerClass(Class)}, to perform local aggregation of the
* intermediate outputs, which helps to cut down the amount of data transferred
* from the <code>Mapper</code> to the <code>Reducer</code>.
*
* <p>Applications can specify if and how the intermediate
* outputs are to be compressed and which {@link CompressionCodec}s are to be
* used via the <code>Configuration</code>.</p>
*
* <p>If the job has zero
* reduces then the output of the <code>Mapper</code> is directly written
* to the {@link OutputFormat} without sorting by keys.</p>
*
* <p>Example:</p>
* <p><blockquote><pre>
* public class TokenCounterMapper
* extends Mapper<Object, Text, Text, IntWritable>{
*
* private final static IntWritable one = new IntWritable(1);
* private Text word = new Text();
*
* public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
* StringTokenizer itr = new StringTokenizer(value.toString());
* while (itr.hasMoreTokens()) {
* word.set(itr.nextToken());
* context.write(word, one);
* }
* }
* }
* </pre></blockquote></p>
*
* <p>Applications may override the {@link #run(Context)} method to exert
* greater control on map processing e.g. multi-threaded <code>Mapper</code>s
* etc.</p>
*
* @see InputFormat
* @see JobContext
* @see Partitioner
* @see Reducer
*/
@InterfaceAudience.Public
@InterfaceStability.Stable
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
/**
* The <code>Context</code> passed on to the {@link Mapper} implementations.
*/
public abstract class Context
implements MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
}
/**
* Called once at the beginning of the task.
*/
protected void setup(Context context
) throws IOException, InterruptedException {
// NOTHING
}
/**
* Called once for each key/value pair in the input split. Most applications
* should override this, but the default is the identity function.
*/
@SuppressWarnings("unchecked")
protected void map(KEYIN key, VALUEIN value,
Context context) throws IOException, InterruptedException {
context.write((KEYOUT) key, (VALUEOUT) value);
}
/**
* Called once at the end of the task.
*/
protected void cleanup(Context context
) throws IOException, InterruptedException {
// NOTHING
}
/**
* Expert users can override this method for more complete control over the
* execution of the Mapper.
* @param context
* @throws IOException
*/
public void run(Context context) throws IOException, InterruptedException {
setup(context);
try {
while (context.nextKeyValue()) {
map(context.getCurrentKey(), context.getCurrentValue(), context);
}
} finally {
cleanup(context);
}
}
}
12. The Reducer class
/**
* Reduces a set of intermediate values which share a key to a smaller set of
* values.
*
* <p><code>Reducer</code> implementations
* can access the {@link Configuration} for the job via the
* {@link JobContext#getConfiguration()} method.</p>
* <p><code>Reducer</code> has 3 primary phases:</p>
* <ol>
* <li>
*
* <h4 id="Shuffle">Shuffle</h4>
*
* <p>The <code>Reducer</code> copies the sorted output from each
* {@link Mapper} using HTTP across the network.</p>
* </li>
*
* <li>
* <h4 id="Sort">Sort</h4>
*
* <p>The framework merge sorts <code>Reducer</code> inputs by
* <code>key</code>s
* (since different <code>Mapper</code>s may have output the same key).</p>
*
* <p>The shuffle and sort phases occur simultaneously i.e. while outputs are
* being fetched they are merged.</p>
*
* <h5 id="SecondarySort">SecondarySort</h5>
*
* <p>To achieve a secondary sort on the values returned by the value
* iterator, the application should extend the key with the secondary
* key and define a grouping comparator. The keys will be sorted using the
* entire key, but will be grouped using the grouping comparator to decide
* which keys and values are sent in the same call to reduce.The grouping
* comparator is specified via
* {@link Job#setGroupingComparatorClass(Class)}. The sort order is
* controlled by
* {@link Job#setSortComparatorClass(Class)}.</p>
*
*
* For example, say that you want to find duplicate web pages and tag them
* all with the url of the "best" known example. You would set up the job
* like:
* <ul>
* <li>Map Input Key: url</li>
* <li>Map Input Value: document</li>
* <li>Map Output Key: document checksum, url pagerank</li>
* <li>Map Output Value: url</li>
* <li>Partitioner: by checksum</li>
* <li>OutputKeyComparator: by checksum and then decreasing pagerank</li>
* <li>OutputValueGroupingComparator: by checksum</li>
* </ul>
* </li>
*
* <li>
* <h4 id="Reduce">Reduce</h4>
*
* <p>In this phase the
* {@link #reduce(Object, Iterable, Context)}
* method is called for each <code><key, (collection of values)></code> in
* the sorted inputs.</p>
* <p>The output of the reduce task is typically written to a
* {@link RecordWriter} via
* {@link Context#write(Object, Object)}.</p>
* </li>
* </ol>
*
* <p>The output of the <code>Reducer</code> is <b>not re-sorted</b>.</p>
*
* <p>Example:</p>
* <p><blockquote><pre>
* public class IntSumReducer<Key> extends Reducer<Key,IntWritable,
* Key,IntWritable> {
* private IntWritable result = new IntWritable();
*
* public void reduce(Key key, Iterable<IntWritable> values,
* Context context) throws IOException, InterruptedException {
* int sum = 0;
* for (IntWritable val : values) {
* sum += val.get();
* }
* result.set(sum);
* context.write(key, result);
* }
* }
* </pre></blockquote></p>
*
* @see Mapper
* @see Partitioner
*/
@Checkpointable
@InterfaceAudience.Public
@InterfaceStability.Stable
public class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
/**
* The <code>Context</code> passed on to the {@link Reducer} implementations.
*/
public abstract class Context
implements ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
}
/**
* Called once at the start of the task.
*/
protected void setup(Context context
) throws IOException, InterruptedException {
// NOTHING
}
/**
* This method is called once for each key. Most applications will define
* their reduce class by overriding this method. The default implementation
* is an identity function.
*/
@SuppressWarnings("unchecked")
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
) throws IOException, InterruptedException {
for(VALUEIN value: values) {
context.write((KEYOUT) key, (VALUEOUT) value);
}
}
/**
* Called once at the end of the task.
*/
protected void cleanup(Context context
) throws IOException, InterruptedException {
// NOTHING
}
/**
* Advanced application writers can use the
* {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
* control how the reduce task works.
*/
public void run(Context context) throws IOException, InterruptedException {
setup(context);
try {
while (context.nextKey()) {
reduce(context.getCurrentKey(), context.getValues(), context);
// If a back up store is used, reset it
Iterator<VALUEIN> iter = context.getValues().iterator();
if(iter instanceof ReduceContext.ValueIterator) {
((ReduceContext.ValueIterator<VALUEIN>)iter).resetBackupStore();
}
}
} finally {
cleanup(context);
}
}
}
13. The StringUtils class
- IsEmpty/IsBlank - checks if a String contains text
- Trim/Strip - removes leading and trailing whitespace
- Equals - compares two strings null-safe
- startsWith - check if a String starts with a prefix null-safe
- endsWith - check if a String ends with a suffix null-safe
- IndexOf/LastIndexOf/Contains - null-safe index-of checks
- IndexOfAny/LastIndexOfAny/IndexOfAnyBut/LastIndexOfAnyBut - index-of any of a set of Strings
- ContainsOnly/ContainsNone/ContainsAny - does String contains only/none/any of these characters
- Substring/Left/Right/Mid - null-safe substring extractions
- SubstringBefore/SubstringAfter/SubstringBetween - substring extraction relative to other strings
- Split/Join - splits a String into an array of substrings and vice versa
- Remove/Delete - removes part of a String
- Replace/Overlay - Searches a String and replaces one String with another
- Chomp/Chop - removes the last part of a String
- LeftPad/RightPad/Center/Repeat - pads a String
- UpperCase/LowerCase/SwapCase/Capitalize/Uncapitalize - changes the case of a String
- CountMatches - counts the number of occurrences of one String in another
- IsAlpha/IsNumeric/IsWhitespace/IsAsciiPrintable - checks the characters in a String
- DefaultString - protects against a null input String
- Reverse/ReverseDelimited - reverses a String
- Abbreviate - abbreviates a string using ellipsis
- Difference - compares Strings and reports on their differences
- LevenshteinDistance - the number of changes needed to change one String into another
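A few of these methods in action — a minimal sketch using org.apache.commons.lang.StringUtils (all calls below are standard Commons Lang methods):
import org.apache.commons.lang.StringUtils;

public class StringUtilsDemo {
    public static void main(String[] args) {
        System.out.println(StringUtils.isBlank("   "));                     // true
        System.out.println(StringUtils.split("a,b,,c", ',').length);       // 3: adjacent separators are treated as one
        System.out.println(StringUtils.join(new String[]{"a", "b"}, "-")); // a-b
        System.out.println(StringUtils.leftPad("7", 3, '0'));              // 007
        System.out.println(StringUtils.equals(null, null));                // true, null-safe
    }
}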
(II) Several commonly used methods of this class are listed below (the examples are from the .NET StringBuilder documentation, but java.lang.StringBuilder provides equivalent append functionality):
(1) The Append method appends text, or the string representation of an object, to the end of the string represented by the current StringBuilder. The following example initializes a StringBuilder to "Hello World!" and then appends some text to the end of the object; space is allocated automatically as needed.
StringBuilder MyStringBuilder = new StringBuilder("Hello World!");
MyStringBuilder.Append(" What a beautiful day.");
Console.WriteLine(MyStringBuilder);
This example displays "Hello World! What a beautiful day." on the console.
(2) The AppendFormat method appends text to the end of the StringBuilder and supports the IFormattable interface, so it accepts the standard format strings described in the formatting section. You can use it to customize the format of variables and append those values to the StringBuilder. The following example uses AppendFormat to place an integer value, formatted as currency, at the end of the StringBuilder.
int MyInt= 25;
StringBuilder MyStringBuilder = new StringBuilder("Your total is ");
MyStringBuilder.AppendFormat("{0:C} ", MyInt);
Console.WriteLine(MyStringBuilder);
This example displays "Your total is $25.00" on the console.
If a program appends to strings frequently, do not use + for concatenation. Consider java.lang.StringBuilder instead: an instance starts with a capacity of 16 characters by default (you can also specify the initial capacity), and when the appended characters exceed the capacity the StringBuilder grows automatically. For frequent string appending, StringBuilder improves performance considerably, as the following code shows.
Java code:
public class AppendStringTest
{
public static void main(String[] args)
{
String text = "";
long beginTime = System.currentTimeMillis();
for(int i=0;i<10000;i++)
text = text + i;
long endTime = System.currentTimeMillis();
System.out.println("执行时间:"+(endTime-beginTime));
StringBuilder sb = new StringBuilder ("");
beginTime = System.currentTimeMillis();
for(int i=0;i<10000;i++)
sb.append(String.valueOf(i));
endTime = System.currentTimeMillis();
System.out.println("执行时间:"+(endTime-beginTime));
}
}
This code prints:
Elapsed time: 3188
Elapsed time: 15
StringBuilder was added in J2SE 1.5.0; in earlier versions, use java.util.StringBuffer for the same purpose. StringBuilder is in fact designed to have the same API as StringBuffer. In single-threaded code StringBuilder is more efficient because it does not deal with synchronization, whereas StringBuffer does. If a StringBuilder would be accessed from multiple threads, switch to StringBuffer and let the object manage synchronization itself.
public int hashCode() {
return super.hashCode();
}
public int hashCode() {
return WritableComparator.hashBytes(getBytes(), getLength());
}
/** Compute hash for binary data. */
public static int hashBytes(byte[] bytes, int length) {
return hashBytes(bytes, 0, length);
}
/** Compute hash for binary data. */
public static int hashBytes(byte[] bytes, int offset, int length) {
int hash = 1;
for (int i = offset; i < offset + length; i++)
hash = (31 * hash) + (int)bytes[i];
return hash;
}
25. Implementing a secondary sort
Main steps:
1) [Sorting] Define a custom key class implementing WritableComparable and use it as the map output key. The key part is implementing compareTo().
The key class must implement WritableComparable, which requires defining compareTo().
Without a custom key class there is no way to make part of the output value take part in the sorting.
public int compareTo(Stock arg0) {
// TODO Auto-generated method stub
int response = this.symbol.compareTo(arg0.symbol);
if(response==0){
response = this.date.compareTo(arg0.date);
}
return response;
}
Compares this object with the specified object for order. Returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.
The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y. (This implies that x.compareTo(y) must throw an exception iff y.compareTo(x) throws an exception.)
The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.
Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.
It is strongly recommended, but not strictly required that (x.compareTo(y)==0) == (x.equals(y)). Generally speaking, any class that implements the Comparable interface and violates this condition should clearly indicate this fact. The recommended language is
"Note: this class has a natural ordering that is inconsistent with equals."
In the foregoing description, the notation sgn(expression) designates the mathematical signum function, which is defined to return one of -1, 0, or 1 according to whether the value of expression is negative, zero or positive.
Notes on setGroupingComparatorClass: Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context).
Set the user defined RawComparator comparator for grouping keys in the input to the reduce.
This comparator should be provided if the equivalence rules for keys for sorting the intermediates are
different from those for grouping keys before each call to Reducer.reduce(Object, java.util.Iterator,
OutputCollector, Reporter).
For key-value pairs (K1,V1) and (K2,V2), the values (V1, V2) are passed in a single call to the reduce
function if K1 and K2 compare as equal.
Since setOutputKeyComparatorClass(Class) can be used to control how keys are sorted, this can be used
in conjunction to simulate secondary sort on values.
Note: This is not a guarantee of the reduce sort being stable in any sense. (In any case, with the order of
available map-outputs to the reduce being non-deterministic, it wouldn't make that much sense.)
Parameters:
theClass the comparator class to be used for grouping keys. It should implement RawComparator.
See Also:
setOutputKeyComparatorClass(Class)
setCombinerKeyGroupingComparator(Class)
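Putting the pieces together, a minimal job-configuration sketch for a secondary sort might look like this (conf is an existing Configuration; Stock, StockPartitioner, StockSortComparator and StockGroupingComparator are illustrative names, not classes from these notes):
Job job = Job.getInstance(conf, "secondary sort");
job.setMapOutputKeyClass(Stock.class);                          // composite key: natural key + secondary key
job.setMapOutputValueClass(DoubleWritable.class);
job.setPartitionerClass(StockPartitioner.class);                // partition on the natural key only
job.setSortComparatorClass(StockSortComparator.class);          // sort on natural key, then secondary key
job.setGroupingComparatorClass(StockGroupingComparator.class);  // group on the natural key only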
26. Error when running the jar
15/12/01 03:27:34 INFO mapreduce.Job: Task Id : attempt_1447915042936_0028_r_000000_0, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "dividends"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
at java.lang.Double.parseDouble(Double.java:540)
at customSort.Customsort$CustomReducer.reduce(Customsort.java:46)
at customSort.Customsort$CustomReducer.reduce(Customsort.java:35)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Cause: a number-format conversion error; the input contains the string "dividends", which cannot be parsed as a number.
Fix: the code contains the following. String comparison must not use != directly; use equals() instead.
String[] words = StringUtils.split(value.toString(), ',');
if (words[0] != "exchange") // should be words[0].equals("exchange")
Description of the equals() method:
/**
* Compares this string to the specified object. The result is {@code
* true} if and only if the argument is not {@code null} and is a {@code
* String} object that represents the same sequence of characters as this
* object.
*
* @param anObject
* The object to compare this {@code String} against
*
* @return {@code true} if the given object represents a {@code String}
* equivalent to this string, {@code false} otherwise
*
* @see #compareTo(String)
* @see #equalsIgnoreCase(String)
*/
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = value.length;
if (n == anotherString.value.length) {
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
while (n-- != 0) {
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}
27. Error when running the jar: the namenode is in safe mode
[root@sandbox jar]# yarn jar customsort.jar /xuefei/dividends /xuefei/output
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /xuefei/output. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
Cause: insufficient disk space, so the namenode cannot write files (one possible cause).
Fix:
1) Try running
[root@sandbox jar]# hdfs dfsadmin -safemode leave
safemode: Access denied for user root. Superuser privilege is required
2) Delete large files to free up disk space, then restart; the cluster recovered afterwards.
Int constructor i——this.i: 10——11
String constructor: ok
String constructor: ok again!
Int constructor: 21
String constructor: ok again!
14
The finer points were explained in the code comments and are not repeated here. To summarize, this has three main uses:
1. A reference to the current object.
2. Referring to a member variable of the class rather than a method parameter, to disambiguate when a parameter and a member variable share the same name. This is really a special case of the first use, but it is common enough to call out separately.
3. Invoking, from within a constructor, another constructor of the same class whose parameter types match the given arguments. Be careful here: only one such constructor call is allowed, and it must be the first statement of the constructor.
Also note that this cannot be used in static methods; some people even define a static method as "a method without this". That is an exaggeration, but it does make the point that this is unavailable in static methods.
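A minimal sketch of the three uses (class and field names are illustrative):
public class Point {
    private int x, y;

    public Point() {
        this(0, 0);                           // use 3: call another constructor; must be the first statement
    }

    public Point(int x, int y) {
        this.x = x;                           // use 2: distinguish the field from the parameter
        this.y = y;
    }

    public Point copy() {
        return new Point(this.x, this.y);     // use 1: reference to the current object
    }
}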
29. Error when running the jar
15/12/03 23:41:28 INFO mapreduce.Job: Task Id : attempt_1449157066412_0001_r_000000_0, Status : FAILED
Error: java.lang.NullPointerException at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Cause: the custom WritableComparator subclass did not define the required constructor.
Fix: define the constructor by adding the following code to the class:
public CustomCompator() {
super(Stock.class, true);
// TODO Auto-generated constructor stub
}
31. Notes on InputFormat
1) Extend the RecordReader class, implementing initialize(), nextKeyValue(), getCurrentKey(), getCurrentValue(), getProgress(), and so on.
RecordReader class: "The record reader breaks the data into key/value pairs for input to the Mapper." Its methods are invoked (via the context) to obtain each key/value pair fed to the mapper.
- initialize() Called once at initialization.
- nextKeyValue() Read the next key, value pair, return true if a key/value pair was read
- getCurrentKey() Get the current key, return the current key or null if there is no current key
- getCurrentValue() Get the current value, return the object that was read
- getProgress() The current progress of the record reader through its data, return a number between 0.0 and 1.0 that is the fraction of the data read
- close() close the record reader
2) Extend the FileInputFormat class, implementing createRecordReader().
Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
Overrides: createRecordReader(...) in InputFormat
Parameters:
split the split to be read
context the information about the task
Returns:
a new record reader
Throws:
IOException
InterruptedException
3) Register it with job.setInputFormatClass(xx).
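A compact sketch of steps 2) and 3); MyRecordReader stands for a reader built as described in step 1), and the FileInputFormat/Job calls are the standard new-API methods:
public class MyInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException, InterruptedException {
        // the framework calls initialize(split, context) on the reader before it is used
        return new MyRecordReader();
    }
}

// in the driver:
job.setInputFormatClass(MyInputFormat.class);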
32. With SequenceFileOutputFormat as the output format, viewing the output fails with a class error
[root@sandbox jar]# hdfs dfs -text /xuefei/output/custominput/part-r-00000
-text: Fatal internal error
java.lang.RuntimeException: java.io.IOException: WritableName can't load class: customInput.Stock
at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:2016)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1947)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1813)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1762)
Cause: SequenceFile output cannot be viewed directly; the key/value classes must be loadable.
Fix: put the class files under the bin directory onto the Hadoop classpath.
33. Notes on the CombineFileSplit class
A sub-collection of input files. Unlike FileSplit, the CombineFileSplit class does not represent a split of a file, but a split of input files into smaller sets. A split may contain blocks from different files, but all the blocks in the same split are probably local to some rack.
CombineFileSplit can be used to implement RecordReader's, with reading one record per file.
34. Notes on the LineRecordReader class
Treats keys as offset in file and value as line.
35. Notes on the MultipleInputs class
This class supports MapReduce jobs that have multiple input paths with a different InputFormat and Mapper for each path
It has the following four methods:
- public static void addInputPath(Job job, Path path, Class<? extends InputFormat> inputFormatClass)
- public static void addInputPath(Job job, Path path, Class<? extends InputFormat> inputFormatClass, Class<? extends Mapper> mapperClass)
- static Map<Path, InputFormat> getInputFormatMap(JobContext job)
- static Map<Path, Class<? extends Mapper>> getMapperTypeMap(JobContext job)
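A minimal usage sketch (the paths and mapper classes are illustrative):
MultipleInputs.addInputPath(job, new Path("/data/stocks"),
        TextInputFormat.class, StockMapper.class);
MultipleInputs.addInputPath(job, new Path("/data/dividends"),
        TextInputFormat.class, DividendMapper.class);
// both mappers must emit the same map-output key/value types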
In short:
1. extends means inheriting from a parent class; a class can be extended as long as it is not declared final (abstract classes exist precisely to be extended).
2. Java does not support multiple inheritance (one class inheriting behavior from several parent classes at once), but a similar effect can be achieved with interfaces, which is where implements comes in.
3. A class can extend only one class, but it can implement multiple interfaces, separated by commas,
for example: class A extends B implements C, D, E
public interface WritableComparable<T> extends Writable, Comparable<T> {
}
writable
org.apache.hadoop.io.Writable
@Public
@Stable
A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.
Any key or value type in the Hadoop Map-Reduce framework implements this interface.
Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance.
Example:
public class MyWritable implements Writable {
// Some data
private int counter;
private long timestamp;
public void write(DataOutput out) throws IOException {
out.writeInt(counter);
out.writeLong(timestamp);
}
public void readFields(DataInput in) throws IOException {
counter = in.readInt();
timestamp = in.readLong();
}
public static MyWritable read(DataInput in) throws IOException {
MyWritable w = new MyWritable();
w.readFields(in);
return w;
}
}
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Writable {
/**
* Serialize the fields of this object to <code>out</code>.
*
* @param out <code>DataOutput</code> to serialize this object into.
* @throws IOException
*/
void write(DataOutput out) throws IOException;
/**
* Deserialize the fields of this object from <code>in</code>.
*
* <p>For efficiency, implementations should attempt to re-use storage in the
* existing object where possible.</p>
*
* @param in <code>DataInput</code> to deserialize this object from.
* @throws IOException
*/
void readFields(DataInput in) throws IOException;
}
Comparable
java.lang.Comparable<Stock>
This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method.
Lists (and arrays) of objects that implement this interface can be sorted automatically by Collections.sort (and Arrays.sort). Objects that implement this interface can be used as keys in a sorted map or as elements in a sorted set, without the need to specify
a comparator.
The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should
throw a NullPointerException even though e.equals(null) returns false.
It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering
is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.
For example, if one adds two keys a and b such that (!a.equals(b) && a.compareTo(b) == 0) to a sorted set that does not use an explicit comparator, the second add operation returns false (and the size of the sorted set does not increase) because a and b are
equivalent from the sorted set's perspective.
Virtually all Java core classes that implement Comparable have natural orderings that are consistent with equals. One exception is java.math.BigDecimal, whose natural ordering equates BigDecimal objects with equal values and different precisions (such as 4.0
and 4.00).
For the mathematically inclined, the relation that defines the natural ordering on a given class C is:
{(x, y) such that x.compareTo(y) <= 0}.
The quotient for this total order is:
{(x, y) such that x.compareTo(y) == 0}.
It follows immediately from the contract for compareTo that the quotient is an equivalence relation on C, and that the natural ordering is a total order on C. When we say that a class's natural ordering is consistent with equals, we mean that the quotient for
the natural ordering is the equivalence relation defined by the class's equals(Object) method:
{(x, y) such that x.equals(y)}.
This interface is a member of the Java Collections Framework.
Type Parameters:
<T> the type of objects that this object may be compared to
Since:
1.2
Author:
Josh Bloch
See Also:
java.util.Comparator
public interface Comparable<T> {
public int compareTo(T o);
}
@Public
@Stable
A Comparator for WritableComparables.
This base implementation uses the natural ordering. To define alternate orderings, override compare(WritableComparable, WritableComparable).
One may optimize compare-intensive operations by overriding compare(byte [], int, int, byte [], int, int). Static utility methods are provided to assist in optimized implementations of this method.
public class WritableComparator implements RawComparator {
private static final ConcurrentHashMap<Class, WritableComparator> comparators
= new ConcurrentHashMap<Class, WritableComparator>(); // registry
/** Get a comparator for a {@link WritableComparable} implementation. */
public static WritableComparator get(Class<? extends WritableComparable> c) {
WritableComparator comparator = comparators.get(c);
if (comparator == null) {
// force the static initializers to run
forceInit(c);
// look to see if it is defined now
comparator = comparators.get(c);
// if not, use the generic one
if (comparator == null) {
comparator = new WritableComparator(c, true);
}
}
return comparator;
}
/**
* Force initialization of the static members.
* As of Java 5, referencing a class doesn't force it to initialize. Since
* this class requires that the classes be initialized to declare their
* comparators, we force that initialization to happen.
* @param cls the class to initialize
*/
private static void forceInit(Class<?> cls) {
try {
Class.forName(cls.getName(), true, cls.getClassLoader());
} catch (ClassNotFoundException e) {
throw new IllegalArgumentException("Can't initialize class " + cls, e);
}
}
/** Register an optimized comparator for a {@link WritableComparable}
* implementation. Comparators registered with this method must be
* thread-safe. */
public static void define(Class c, WritableComparator comparator) {
comparators.put(c, comparator);
}
private final Class<? extends WritableComparable> keyClass;
private final WritableComparable key1;
private final WritableComparable key2;
private final DataInputBuffer buffer;
protected WritableComparator() {
this(null);
}
/** Construct for a {@link WritableComparable} implementation. */
protected WritableComparator(Class<? extends WritableComparable> keyClass) {
this(keyClass, false);
}
/** Returns the WritableComparable implementation class. */
public Class<? extends WritableComparable> getKeyClass() { return keyClass; }
/** Construct a new {@link WritableComparable} instance. */
public WritableComparable newKey() {
return ReflectionUtils.newInstance(keyClass, null);
}
/** Optimization hook. Override this to make SequenceFile.Sorter's scream.
*
* <p>The default implementation reads the data into two {@link
* WritableComparable}s (using {@link
* Writable#readFields(DataInput)}, then calls {@link
* #compare(WritableComparable,WritableComparable)}.
*/
@Override
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
try {
buffer.reset(b1, s1, l1); // parse key1
key1.readFields(buffer);
buffer.reset(b2, s2, l2); // parse key2
key2.readFields(buffer);
buffer.reset(null, 0, 0); // clean up reference
} catch (IOException e) {
throw new RuntimeException(e);
}
return compare(key1, key2); // compare them
}
/** Compare two WritableComparables.
*
* <p> The default implementation uses the natural ordering, calling {@link
* Comparable#compareTo(Object)}. */
@SuppressWarnings("unchecked")
public int compare(WritableComparable a, WritableComparable b) {
return a.compareTo(b);
}
@Override
public int compare(Object a, Object b) {
return compare((WritableComparable)a, (WritableComparable)b);
}
/** Lexicographic order of binary data. */
public static int compareBytes(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
return FastByteComparisons.compareTo(b1, s1, l1, b2, s2, l2);
}
/** Compute hash for binary data. */
public static int hashBytes(byte[] bytes, int offset, int length) {
int hash = 1;
for (int i = offset; i < offset + length; i++)
hash = (31 * hash) + (int)bytes[i];
return hash;
}
/** Compute hash for binary data. */
public static int hashBytes(byte[] bytes, int length) {
return hashBytes(bytes, 0, length);
}
/** Parse an unsigned short from a byte array. */
public static int readUnsignedShort(byte[] bytes, int start) {
return (((bytes[start] & 0xff) << 8) +
((bytes[start+1] & 0xff)));
}
/** Parse an integer from a byte array. */
public static int readInt(byte[] bytes, int start) {
return (((bytes[start ] & 0xff) << 24) +
((bytes[start+1] & 0xff) << 16) +
((bytes[start+2] & 0xff) << 8) +
((bytes[start+3] & 0xff)));
}
/** Parse a float from a byte array. */
public static float readFloat(byte[] bytes, int start) {
return Float.intBitsToFloat(readInt(bytes, start));
}
/** Parse a long from a byte array. */
public static long readLong(byte[] bytes, int start) {
return ((long)(readInt(bytes, start)) << 32) +
(readInt(bytes, start+4) & 0xFFFFFFFFL);
}
/** Parse a double from a byte array. */
public static double readDouble(byte[] bytes, int start) {
return Double.longBitsToDouble(readLong(bytes, start));
}
/**
* Reads a zero-compressed encoded long from a byte array and returns it.
* @param bytes byte array with decode long
* @param start starting index
* @throws java.io.IOException
* @return deserialized long
*/
public static long readVLong(byte[] bytes, int start) throws IOException {
int len = bytes[start];
if (len >= -112) {
return len;
}
boolean isNegative = (len < -120);
len = isNegative ? -(len + 120) : -(len + 112);
if (start+1+len>bytes.length)
throw new IOException(
"Not enough number of bytes for a zero-compressed integer");
long i = 0;
for (int idx = 0; idx < len; idx++) {
i = i << 8;
i = i | (bytes[start+1+idx] & 0xFF);
}
return (isNegative ? (i ^ -1L) : i);
}
/**
* Reads a zero-compressed encoded integer from a byte array and returns it.
* @param bytes byte array with the encoded integer
* @param start start index
* @throws java.io.IOException
* @return deserialized integer
*/
public static int readVInt(byte[] bytes, int start) throws IOException {
return (int) readVLong(bytes, start);
}
}
Comparator
org.apache.hadoop.io.Text.Comparator
/** A WritableComparator optimized for Text keys. */
public static class Comparator extends WritableComparator {
public Comparator() {
super(Text.class);
}
@Override
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
int n1 = WritableUtils.decodeVIntSize(b1[s1]);
int n2 = WritableUtils.decodeVIntSize(b2[s2]);
return compareBytes(b1, s1+n1, l1-n1, b2, s2+n2, l2-n2);
}
}
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits
nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the
HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load
factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that
is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The
expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the
load factor, no rehash operations will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys
with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or
deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such
object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException.
Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on
a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
This class is a member of the Java Collections Framework.
Type Parameters:
<K> the type of keys maintained by this map
<V> the type of mapped values
Since:
1.2
Author:
Doug Lea
Josh Bloch
Arthur van Hoff
Neal Gafter
See Also:
Object.hashCode()
Collection
Map
TreeMap
Hashtable
Second, as the HttpServletRequest javadoc shows, getRequestURI returns a String, "the part of this request's URL from the protocol name up to the query string in the first line of the HTTP request"; for example, for "POST /some/path.html?a=b HTTP/1.1" the returned value is "/some/path.html". This is why the method is called getRequestURI rather than getRequestURL: it returns a relative path. getRequestURL, by contrast, returns a StringBuffer: "The returned URL contains a protocol, server name, port number, and server path, but it does not include query string parameters." — the full request resource path, without the query string.
PS:
The java.net.URL class does not escape the special characters defined by RFC 2396, so callers must encode each URL component themselves; java.net.URI does provide escaping. Hence "The recommended way to manage the encoding and decoding of URLs is to use java.net.URI." URI.toURL() and URL.toURI() convert between the two types. For HTML form URL encoding/decoding, use java.net.URLEncoder and java.net.URLDecoder; they are not suitable for URL objects.
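A small sketch of the two approaches (values are illustrative; exception handling omitted):
// form-style encoding of a single parameter value
String encoded = URLEncoder.encode("a b&c", "UTF-8");          // "a+b%26c"

// let java.net.URI escape illegal characters in each component
URI uri = new URI("http", "example.com", "/some path", "q=1 2", null);
URL url = uri.toURL();                                          // spaces appear as %20 in the URI string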
Constructs a URI by parsing the given string.
This constructor parses the given string exactly as specified by the grammar in RFC 2396, Appendix A, except for the following deviations:
An empty authority component is permitted as long as it is followed by a non-empty path, a query component, or a fragment component. This allows the parsing of URIs such as "file:///foo/bar", which seems to be the intent of RFC 2396 although the grammar does
not permit it. If the authority component is empty then the user-information, host, and port components are undefined.
Empty relative paths are permitted; this seems to be the intent of RFC 2396 although the grammar does not permit it. The primary consequence of this deviation is that a standalone fragment such as "#foo" parses as a relative URI with an empty path and the given
fragment, and can be usefully resolved against a base URI.
IPv4 addresses in host components are parsed rigorously, as specified by RFC 2732: Each element of a dotted-quad address must contain no more than three decimal digits. Each element is further constrained to have a value no greater than 255.
Hostnames in host components that comprise only a single domain label are permitted to start with an alphanum character. This seems to be the intent of RFC 2396 section 3.2.2 although the grammar does not permit it. The consequence of this deviation is that
the authority component of a hierarchical URI such as s://123, will parse as a server-based authority.
IPv6 addresses are permitted for the host component. An IPv6 address must be enclosed in square brackets ('[' and ']') as specified by RFC 2732. The IPv6 address itself must parse according to RFC 2373. IPv6 addresses are further constrained to describe no
more than sixteen bytes of address information, a constraint implicit in RFC 2373 but not expressible in the grammar.
Characters in the other category are permitted wherever RFC 2396 permits escaped octets, that is, in the user-information, path, query, and fragment components, as well as in the authority component if the authority is registry-based. This allows URIs to contain
Unicode characters beyond those in the US-ASCII character set.
Parameters:
str The string to be parsed into a URI
Throws:
NullPointerException - If str is null
URISyntaxException - If the given string violates RFC 2396, as augmented by the above deviations
39. Error when running the jar: NullPointerException
15/12/15 22:50:46 INFO mapreduce.Job: Task Id : attempt_1450188830584_0013_m_000000_1, Status : FAILED
Error: java.lang.NullPointerException
at mapsidejoin.Mapsidejoin$MapsideMapper.map(Mapsidejoin.java:37)
at mapsidejoin.Mapsidejoin$MapsideMapper.map(Mapsidejoin.java:26)
Cause: HashMap's containsKey() relies on the key class's equals() (and hashCode()) methods.
Fix: the Stock class (implements WritableComparable) was missing overrides of equals() and hashCode(); add them.
Java code
/**
 * Returns the entry associated with the specified key in the
 * HashMap. Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
int hash = (key == null) ? 0 : hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        return e;
}
return null;
}
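A hedged sketch of the fix, assuming the Stock key has symbol and date fields (the field names are illustrative):
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Stock)) return false;
    Stock other = (Stock) o;
    return symbol.equals(other.symbol) && date.equals(other.date);
}

@Override
public int hashCode() {
    return 31 * symbol.hashCode() + date.hashCode();
}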
40. Error when running the jar: UnknownHostException
root@HWX_Java:~/java/labs/Solutions/Lab7.1/MapSideJoin# yarn jar mapsidejoin.jar AIT
15/12/15 23:07:36 INFO impl.TimelineClientImpl: Timeline service address: http://resourcemanager:8188/ws/v1/timeline/
15/12/15 23:07:36 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager/172.17.0.3:8050
15/12/15 23:07:37 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1450188830584_0017
java.lang.IllegalArgumentException: java.net.UnknownHostException: sandbox
41. In a map-side join, when the small table is stored in a HashMap, the key object must override hashCode() (and equals()); otherwise HashMap's containsKey() cannot find the corresponding entries.
/**
* Performs the logic for the <code>split</code> and
* <code>splitPreserveAllTokens</code> methods that do not return a
* maximum array length.
*
* @param str the String to parse, may be <code>null</code>
* @param separatorChar the separate character
* @param preserveAllTokens if <code>true</code>, adjacent separators are
* treated as empty token separators; if <code>false</code>, adjacent
* separators are treated as one separator.
* @return an array of parsed Strings, <code>null</code> if null String input
*/
private static String[] splitWorker(String str, char separatorChar, boolean preserveAllTokens) {
43. For a reduce-side join, a flag must be set in both the map output key and the map output value.
The flag in the output key is used to order the data sets;
the flag in the output value is used to discard records that have no match in the other data set.
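A minimal sketch of tagging in a mapper (TaggedKey and the flag values are illustrative, not classes from these notes):
// stock-side mapper tags records with "S"; the dividend-side mapper would tag with "D"
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String[] fields = value.toString().split(",");
    TaggedKey outKey = new TaggedKey(fields[0], "S");            // natural key + flag, used for sorting
    context.write(outKey, new Text("S," + value.toString()));   // flag in the value, used to drop unmatched records
}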
44. Error when setting a Partitioner: illegal partition
16/06/10 13:07:58 INFO mapreduce.Job: Task Id : attempt_1465527536456_0018_m_000000_0, Status : FAILED
Error: java.io.IOException: Illegal partition for Theoldmanandsea,!termnum (-2)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1082)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at tfidf.Tfidf$Tfmapper.cleanup(Tfidf.java:111)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Cause: the partitioner returned key.getYear().hashCode() % numPartitions. In Java the % operator may return a negative result (its sign follows the dividend), so when key.getYear().hashCode() is negative, key.getYear().hashCode() % numPartitions is negative as well and the job fails with an illegal partition.
Fix: change it to Math.abs(key.getYear().hashCode() % numPartitions); (the original, buggy partitioner is shown below).
public static class Exampartitioner extends Partitioner<Airport, DoubleWritable> {
@Override
public int getPartition(Airport key, DoubleWritable value,
int numPartitions) {
// bug: hashCode() may be negative, producing an illegal (negative) partition number
return key.getYear().hashCode() % numPartitions;
}
}
45. The partitioner's criterion must be the same as the grouping comparator's criterion, or a subset of it.
46. TotalOrderPartitioner error, reported as follows:
16/06/09 14:01:20 INFO mapreduce.Job: Task Id : attempt_1465443169491_0015_m_000000_2, Status : FAILED
Error: java.lang.IllegalArgumentException: Can't read partitions file
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:701)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Wrong number of partitions in keyset
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:90)
... 10 more
Cause: job.setNumReduceTasks(3) was called after the InputSampler.Sampler step.
Fix: call job.setNumReduceTasks(3) before it.
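A hedged sketch of the corrected ordering (paths and types are illustrative; the InputSampler/TotalOrderPartitioner calls are the standard library API):
Job job = Job.getInstance(conf, "total order sort");
job.setNumReduceTasks(3);                        // must be set before the partition file is written
job.setPartitionerClass(TotalOrderPartitioner.class);
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path("/tmp/partitions"));
InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<Text, Text>(0.1, 1000, 10);
InputSampler.writePartitionFile(job, sampler);   // samples the input and writes numReduceTasks-1 split points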