Wednesday, December 30, 2015

Resolve OutOfMemoryError With ExcelExport : Export Excel Utility with Apache POI Stream API (SXSSF)

Whenever we try to export a large amount of data to Excel (for example, around 200,000 to 300,000 records), we often end up with java.lang.OutOfMemoryError: Java heap space, and the export also takes a long time. The main reason is that versions of Apache POI prior to 3.8 offered no good solution for this situation: the whole workbook had to be built in memory. I have also hit the limit of roughly 65,000 rows per sheet with those earlier versions (the older .xls format caps a sheet at 65,536 rows). Version 3.8 and higher address these problems with a streaming API, SXSSF, designed for exporting large data sets. With the streaming API only a small window of rows is kept in memory, and the remaining rows are flushed to temporary files on disk. The example in this post shows how this supports larger data sets. I wrote this utility to export more than 200,000 records in one of my applications, which is built on Spring MVC. I hope it helps anyone searching for this kind of solution.
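To make the windowing idea concrete without pulling in POI, here is a small JDK-only sketch. It is only an analogy, not how SXSSF is actually implemented: the class name WindowedRowWriter, the plain-text temporary file and the window size are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration: keeps at most `windowSize` rows on the heap
// and spills older rows to a temporary file on disk.
class WindowedRowWriter implements AutoCloseable {
    private final int windowSize;
    private final Deque<String> window = new ArrayDeque<>();
    private Path spillFile;
    private BufferedWriter out;
    private long flushedRows = 0;

    WindowedRowWriter(int windowSize) {
        this.windowSize = windowSize;
        try {
            spillFile = Files.createTempFile("rows", ".tmp");
            out = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Adds a row; once the window is full, the oldest row is written to disk.
    void addRow(String row) {
        window.addLast(row);
        if (window.size() > windowSize) {
            writeOut(window.removeFirst());
        }
    }

    int rowsInMemory() { return window.size(); }

    long rowsOnDisk() { return flushedRows; }

    private void writeOut(String row) {
        try {
            out.write(row);
            out.newLine();
            flushedRows++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Flushes the remaining rows, then removes the temporary file.
    // A real exporter would copy the file to its final destination first.
    @Override
    public void close() {
        try {
            while (!window.isEmpty()) {
                writeOut(window.removeFirst());
            }
            out.close();
            Files.deleteIfExists(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (WindowedRowWriter w = new WindowedRowWriter(100)) {
            for (int i = 0; i < 250_000; i++) {
                w.addRow("row-" + i);
            }
            System.out.println(w.rowsInMemory() + " rows in memory, " + w.rowsOnDisk() + " rows on disk");
        }
    }
}
```

However many rows are added, the heap never holds more than the window, which is the property that keeps a large export from exhausting memory.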

To solve this problem I applied the Template Method design pattern to create a generic Excel export utility that can be reused for any kind of data. Below is the abstract base class, which you extend to implement the export for your own module.


import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

/**
 * @author Shidram
 *
 * @param <E>
 */
public abstract class ExcelExportUtility< E extends Object > {

    protected SXSSFWorkbook wb;
    protected Sheet sh;
    protected static final String EMPTY_VALUE = " ";

    /**
     * This method demonstrates how to Auto resize Excel column
     */
    private void autoResizeColumns(int listSize) {

        for (int colIndex = 0; colIndex < listSize; colIndex++) {
            sh.autoSizeColumn(colIndex);
        }
    }

    /**
     * 
     * This method will return Style of Header Cell
     * 
     * @return
     */

    protected CellStyle getHeaderStyle() {
        CellStyle style = wb.createCellStyle();
        style.setFillForegroundColor(IndexedColors.ORANGE.getIndex());
        style.setFillPattern(CellStyle.SOLID_FOREGROUND);
        style.setBorderBottom(CellStyle.BORDER_THIN);
        style.setBottomBorderColor(IndexedColors.BLACK.getIndex());
        style.setBorderLeft(CellStyle.BORDER_THIN);
        style.setLeftBorderColor(IndexedColors.BLACK.getIndex());
        style.setBorderRight(CellStyle.BORDER_THIN);
        style.setRightBorderColor(IndexedColors.BLACK.getIndex());
        style.setBorderTop(CellStyle.BORDER_THIN);
        style.setTopBorderColor(IndexedColors.BLACK.getIndex());
        style.setAlignment(CellStyle.ALIGN_CENTER);

        return style;
    }

    /**
     * 
     * This method will return style for Normal Cell
     * 
     * @return
     */

    protected CellStyle getNormalStyle() {
        CellStyle style = wb.createCellStyle();
        style.setBorderBottom(CellStyle.BORDER_THIN);
        style.setBottomBorderColor(IndexedColors.BLACK.getIndex());
        style.setBorderLeft(CellStyle.BORDER_THIN);
        style.setLeftBorderColor(IndexedColors.BLACK.getIndex());
        style.setBorderRight(CellStyle.BORDER_THIN);
        style.setRightBorderColor(IndexedColors.BLACK.getIndex());
        style.setBorderTop(CellStyle.BORDER_THIN);
        style.setTopBorderColor(IndexedColors.BLACK.getIndex());
        style.setAlignment(CellStyle.ALIGN_CENTER);

        return style;
    }

    /**
     * @param columns
     */
    private void fillHeader(String[] columns) {
        wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
        sh = wb.createSheet("Validated Data");
        CellStyle headerStyle = getHeaderStyle();

        // The header occupies the first row (index 0) of the sheet
        Row row = sh.createRow(0);

        for (int cellnum = 0; cellnum < columns.length; cellnum++) {
            Cell cell = row.createCell(cellnum);
            cell.setCellValue(columns[cellnum]);
            cell.setCellStyle(headerStyle);
        }
    }

    /**
     * @param columns
     * @param dataList
     * @return
     */
    public final SXSSFWorkbook exportExcel(String[] columns, List<E> dataList) {

        fillHeader(columns);
        fillData(dataList);
        autoResizeColumns(columns.length);

        return wb;
    }

    /**
     * @param dataList
     */
    abstract void fillData(List<E> dataList);

} 
By extending the above class we can implement our own Excel utility to export data. The subclass has to override the fillData() method to write the rows to be exported. For example, here is one such class for demonstration:

import java.text.SimpleDateFormat;
import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Row;

/**
 * @author Shidram
 *
 */
public class ExportRevisionResponseExcel extends ExcelExportUtility<RevisionResponse> {

    /*
     * @see ExcelExportUtility#fillData(java.util.List)
     */
    void fillData(List<RevisionResponse> dataList) {

        CellStyle normalStyle = getNormalStyle();
        int rownum = 1;
        SimpleDateFormat dtFormat = new SimpleDateFormat("E MMM dd HH:mm:ss z yyyy");

        for (RevisionResponse rev : dataList) {

            Row row = sh.createRow(rownum);

            Cell cell_0 = row.createCell(0, Cell.CELL_TYPE_STRING);
            cell_0.setCellStyle(normalStyle);
            cell_0.setCellValue(rev.getRevId());

            Cell cell_1 = row.createCell(1, Cell.CELL_TYPE_STRING);
            cell_1.setCellStyle(normalStyle);
            cell_1.setCellValue(rev.getJcrCreated() != null ? dtFormat.format(rev.getJcrCreated()) : " ");

            rownum++;
        }
    }

}
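
The template-method shape used by these two classes can also be seen in isolation. The following JDK-only sketch is a simplified stand-in, assuming a CSV string as output instead of a workbook; the names CsvExportTemplate and NameLengthExport are hypothetical and not part of the utility.

```java
import java.util.Arrays;
import java.util.List;

class TemplateMethodDemo {

    // Hypothetical, POI-free base class with the same structure as the export utility.
    abstract static class CsvExportTemplate<E> {
        protected final StringBuilder out = new StringBuilder();

        // The template method: a fixed sequence of steps that subclasses cannot override.
        public final String export(String[] columns, List<E> data) {
            out.setLength(0);
            out.append(String.join(",", columns)).append('\n'); // header step
            fillData(data);                                     // variable step
            return out.toString();
        }

        // The only step a subclass has to supply.
        protected abstract void fillData(List<E> data);
    }

    // Concrete exporter: one row per string, with the string and its length.
    static class NameLengthExport extends CsvExportTemplate<String> {
        @Override
        protected void fillData(List<String> data) {
            for (String s : data) {
                out.append(s).append(',').append(s.length()).append('\n');
            }
        }
    }

    public static void main(String[] args) {
        String csv = new NameLengthExport().export(
                new String[] { "NAME", "LENGTH" }, Arrays.asList("alpha", "be"));
        System.out.print(csv);
    }
}
```

The base class owns the order of the steps, so every exporter gets the header handling for free and only describes how its own rows are written.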

Now the utility is ready. The next step is to call it from an action or controller to export the data. Here I am showing a Spring controller method, with only the relevant parts of the controller included. To avoid hitting the business methods a second time, the search method stores its results in the ServletContext, and the export method of the same controller reads the same data back from there. The controller code is below:

@Controller
public class RevisionResponseController {

      ......

    @Autowired
    private ServletContext servletContext;

      ......


    @SuppressWarnings("unchecked")
    @RequestMapping(value = "/export", method = RequestMethod.GET)
    public ModelAndView exportRevisionsToExcel(ModelAndView modelAndView, HttpServletResponse response) {

        List<RevisionResponse> revList = (List<RevisionResponse>) servletContext.getAttribute("revisionsResponse");
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd_HH_mm_ss");
        String excelFileName = "Revisions_" + formatter.format(LocalDateTime.now()) + ".xlsx";
        SXSSFWorkbook wb = (new ExportRevisionResponseExcel()).exportExcel(new String[] { "REVISION ID",
            "CREATION DATE" }, revList);

        try {
            ByteArrayOutputStream outByteStream = new ByteArrayOutputStream();
            wb.write(outByteStream);
            byte[] outArray = outByteStream.toByteArray();

            response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
            response.setContentLength(outArray.length);
            response.setHeader("Expires", "0"); // eliminates browser caching
            response.setHeader("Content-Disposition", "attachment; filename=" + excelFileName);
            OutputStream outStream = response.getOutputStream();
            outStream.write(outArray);
            outStream.flush();
            wb.dispose();
            wb.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return modelAndView;

    }


     ......

     ......
}
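
One detail worth checking when building the export file name is the hour pattern letter. In DateTimeFormatter patterns, hh is the 12-hour clock hour and HH is the 24-hour hour of day, so a name built with hh cannot tell 03:00 from 15:00. The sketch below shows the difference with a fixed timestamp; the helper name excelFileName is an assumption for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class ExportFileNames {
    // 24-hour pattern, so morning and afternoon exports get different names.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH_mm_ss");

    // Hypothetical helper: builds "<prefix>_<timestamp>.xlsx".
    static String excelFileName(String prefix, LocalDateTime when) {
        return prefix + "_" + STAMP.format(when) + ".xlsx";
    }

    public static void main(String[] args) {
        LocalDateTime afternoon = LocalDateTime.of(2015, 12, 30, 15, 4, 5);

        // prints Revisions_2015-12-30_15_04_05.xlsx
        System.out.println(excelFileName("Revisions", afternoon));

        // With the 12-hour letter the same instant prints as 03_04_05
        System.out.println(DateTimeFormatter.ofPattern("hh_mm_ss").format(afternoon));
    }
}
```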


That's all; the functionality is ready. The next step is to call this controller method from a UI action.

Please leave a comment if you like it, and share your suggestions if you think there is a better approach. I really appreciate such suggestions, because learning this in depth is a continuous, long-term process. If you find it useful, please share it with friends who may need it.

Thank you.

Reading a file while file being written at the same time

The code below shows how to read a file while it is actively being written. Here is a full example. The file named in the code must already exist at the given path; otherwise a FileNotFoundException is thrown.

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadingFileWhileWrite extends Thread {

    boolean running = true;
    BufferedInputStream reader = null;

    public static void main(String[] args) throws FileNotFoundException {
        ReadingFileWhileWrite tw = new ReadingFileWhileWrite();
        tw.reader = new BufferedInputStream(new FileInputStream("TestFile.txt"));
        tw.start();
    }

    public void run() {
        while (running) {
            try {
                if (reader.available() > 0) {
                    System.out.print((char) reader.read());
                } else {
                    try {
                        sleep(500);
                    } catch (InterruptedException ex) {
                        running = false;
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

}
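
To see the same polling technique work end to end in a single run, here is a self-contained sketch that plays both roles itself: it appends to a temporary file and reads the new bytes through an already open stream. The class and method names are assumptions, and the cast to char is only adequate for ASCII content.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TailSketch {

    // Reads whatever bytes are available right now, without waiting for more.
    static String readAvailable(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (in.available() > 0) {
            sb.append((char) in.read()); // ASCII only in this sketch
        }
        return sb.toString();
    }

    // Writes, reads, appends, reads again; returns the two chunks the reader saw.
    static String[] demo() {
        try {
            Path file = Files.createTempFile("tail", ".txt");
            try (OutputStream writer = new FileOutputStream(file.toFile(), true);
                 InputStream reader = new FileInputStream(file.toFile())) {
                writer.write("first".getBytes(StandardCharsets.US_ASCII));
                writer.flush();
                String a = readAvailable(reader);

                writer.write("second".getBytes(StandardCharsets.US_ASCII));
                writer.flush();
                String b = readAvailable(reader); // only the newly appended bytes
                return new String[] { a, b };
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] chunks = demo();
        System.out.println(chunks[0] + " / " + chunks[1]);
    }
}
```

Because the reader keeps its position in the file, each call picks up exactly where the previous one stopped, which is what the sleep-and-retry loop in the thread version relies on.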

Thursday, January 22, 2015

Consider static factory methods instead of constructors


           The normal way for a class to allow a client to obtain an instance of itself is to provide a public constructor. There is another technique that should be a part of every programmer’s toolkit. A class can provide a public static factory method, which is simply a static method that returns an instance of the class. Here’s a simple example from Boolean (the boxed primitive class for the primitive type boolean). This method translates a boolean primitive value into a Boolean object reference:
 

public static Boolean valueOf(boolean b) {
    return b ? Boolean.TRUE : Boolean.FALSE;
}


Note that a static factory method is not the same as the Factory Method pattern from Design Patterns. The static factory method described in this item has no direct equivalent in Design Patterns. A class can provide its clients with static factory methods instead of, or in addition to, constructors. Providing a static factory method instead of a public constructor has both advantages and disadvantages.
 

One advantage of static factory methods is that, unlike constructors, they have names. If the parameters to a constructor do not, in and of themselves, describe the object being returned, a static factory with a well-chosen name is easier to use and the resulting client code easier to read. For example, the constructor BigInteger(int, int, Random), which returns a BigInteger that is probably prime, would have been better expressed as a static factory method named BigInteger.probablePrime. (This method was eventually added in the 1.4 release.)

        A class can have only a single constructor with a given signature. Programmers have been known to get around this restriction by providing two constructors whose parameter lists differ only in the order of their parameter types. This is a really bad idea. The user of such an API will never be able to remember which constructor is which and will end up calling the wrong one by mistake. People reading code that uses these constructors will not know what the code does without referring to the class documentation.


        Because they have names, static factory methods don’t share the restriction discussed in the previous paragraph. In cases where a class seems to require multiple constructors with the same signature, replace the constructors with static factory methods and carefully chosen names to highlight their differences.
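
As a small illustration of this point, consider a class that would otherwise need two constructors taking (double, double). The Point class below is hypothetical; its two named factories make the meaning of the arguments explicit at the call site.

```java
class NamedFactoryDemo {

    // Hypothetical immutable point with a private constructor.
    static final class Point {
        private final double x;
        private final double y;

        private Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        // Both factories take (double, double); the names tell them apart.
        static Point fromCartesian(double x, double y) {
            return new Point(x, y);
        }

        static Point fromPolar(double radius, double angleRadians) {
            return new Point(radius * Math.cos(angleRadians),
                             radius * Math.sin(angleRadians));
        }

        double x() { return x; }

        double y() { return y; }
    }

    public static void main(String[] args) {
        Point a = Point.fromCartesian(2.0, 0.0);
        Point b = Point.fromPolar(2.0, 0.0);
        System.out.println(a.x() + "," + a.y() + " and " + b.x() + "," + b.y());
    }
}
```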


A second advantage of static factory methods is that, unlike constructors, they are not required to create a new object each time they’re invoked. This allows immutable classes to use preconstructed instances, or to cache instances as they’re constructed, and dispense them repeatedly to avoid creating unnecessary duplicate objects. The Boolean.valueOf(boolean) method illustrates this technique: it never creates an object. This technique is similar to the Flyweight pattern. It can greatly improve performance if equivalent objects are requested often, especially if they are expensive to create.
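
A minimal sketch of such instance reuse, assuming a hypothetical immutable Percent class whose 101 possible values are preconstructed once:

```java
class InstanceCacheDemo {

    // Hypothetical immutable value class with a fixed, preconstructed set of instances.
    static final class Percent {
        private static final Percent[] CACHE = new Percent[101];

        static {
            for (int i = 0; i <= 100; i++) {
                CACHE[i] = new Percent(i);
            }
        }

        private final int value;

        private Percent(int value) {
            this.value = value;
        }

        // Never creates an object after class initialization; hands out cached ones.
        static Percent valueOf(int value) {
            if (value < 0 || value > 100) {
                throw new IllegalArgumentException("out of range: " + value);
            }
            return CACHE[value];
        }

        int value() { return value; }
    }

    public static void main(String[] args) {
        // The same instance comes back for equal arguments.
        System.out.println(Percent.valueOf(42) == Percent.valueOf(42));
        // Boolean.valueOf behaves the same way.
        System.out.println(Boolean.valueOf(true) == Boolean.TRUE);
    }
}
```

Since no two equal Percent instances exist, clients of this sketch could safely compare them with ==.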


The ability of static factory methods to return the same object from repeated invocations allows classes to maintain strict control over what instances exist at any time. Classes that do this are said to be instance-controlled. There are several reasons to write instance-controlled classes. Instance control allows a class to guarantee that it is a singleton or non-instantiable. Also, it allows an immutable class to make the guarantee that no two equal instances exist: a.equals(b) if and only if a==b. If a class makes this guarantee, then its clients can use the == operator instead of the equals(Object) method, which may result in improved performance. Enum types provide this guarantee.

A third advantage of static factory methods is that, unlike constructors, they can return an object of any subtype of their return type. This gives you great flexibility in choosing the class of the returned object. One application of this flexibility is that an API can return objects without making their classes public. Hiding implementation classes in this fashion leads to a very compact API. This technique lends itself to interface-based frameworks, where interfaces provide natural return types for static factory methods.


Before Java 8, interfaces could not have static methods, so by convention, static factory methods for an interface named Type are put in a non-instantiable class named Types. For example, the Java Collections Framework has thirty-two convenience implementations of its collection interfaces, providing unmodifiable collections, synchronized collections, and the like. Nearly all of these implementations are exported via static factory methods in one non-instantiable class (java.util.Collections). The classes of the returned objects are all nonpublic.

        The Collections Framework API is much smaller than it would have been had it exported thirty-two separate public classes, one for each convenience implementation. It is not just the bulk of the API that is reduced, but the conceptual weight. The user knows that the returned object has precisely the API specified by its interface, so there is no need to read additional class documentation for the implementation classes. Furthermore, using such a static factory method requires the client to refer to the returned object by its interface rather than its implementation class, which is generally good practice. Not only can the class of an object returned by a public static factory method be nonpublic, but the class can vary from invocation to invocation depending on the values of the parameters to the static factory. Any class that is a subtype of the declared return type is permissible. The class of the returned object can also vary from release to release for enhanced software maintainability and performance.
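
The hidden implementation classes can be observed directly. The sketch below asks java.util.Collections for an unmodifiable view and checks, via reflection, that the class of the returned object is not public; the demo class and helper names are assumptions for illustration.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class HiddenImplementationDemo {

    // The client only ever sees the List interface.
    static List<String> readOnlyView() {
        List<String> backing = new ArrayList<String>();
        backing.add("a");
        return Collections.unmodifiableList(backing);
    }

    // True if the runtime class of the object is declared public.
    static boolean isPublicClass(Object o) {
        return Modifier.isPublic(o.getClass().getModifiers());
    }

    public static void main(String[] args) {
        List<String> view = readOnlyView();
        System.out.println("implementation class is public: " + isPublicClass(view));
        try {
            view.add("b");
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
    }
}
```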


         The class java.util.EnumSet, introduced in release 1.5, has no public constructors, only static factories. They return one of two implementations, depending on the size of the underlying enum type: if it has sixty-four or fewer elements, as most enum types do, the static factories return a RegularEnumSet instance, which is backed by a single long; if the enum type has sixty-five or more elements, the factories return a JumboEnumSet instance, backed by a long array. The existence of these two implementation classes is invisible to clients. If RegularEnumSet ceased to offer performance advantages for small enum types, it could be eliminated from a future release with no ill effects. Similarly, a future release could add a third or fourth implementation of EnumSet if it proved beneficial for performance. Clients neither know nor care about the class of the object they get back from the factory; they care only that it is some subclass of EnumSet.
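
The same idea can be sketched in a few lines. The Flags interface and its two private implementations below are hypothetical and much simpler than EnumSet, but the factory picks the implementation by size in the same spirit, and callers never learn which one they received.

```java
class SizeDependentFactoryDemo {

    // The only type clients see.
    interface Flags {
        void set(int index);
        boolean get(int index);
    }

    // Up to 64 flags fit in a single long.
    private static final class SmallFlags implements Flags {
        private long bits;

        public void set(int index) { bits |= 1L << index; }

        public boolean get(int index) { return (bits & (1L << index)) != 0; }
    }

    // Larger sets use one long per 64 flags.
    private static final class LargeFlags implements Flags {
        private final long[] words;

        LargeFlags(int size) { words = new long[(size + 63) >>> 6]; }

        public void set(int index) { words[index >>> 6] |= 1L << (index & 63); }

        public boolean get(int index) { return (words[index >>> 6] & (1L << (index & 63))) != 0; }
    }

    // Static factory: the returned class depends on the argument.
    static Flags ofSize(int size) {
        return size <= 64 ? new SmallFlags() : new LargeFlags(size);
    }

    public static void main(String[] args) {
        Flags small = ofSize(10);
        Flags large = ofSize(200);
        small.set(3);
        large.set(130);
        System.out.println(small.get(3) + " " + large.get(130));
        System.out.println(small.getClass() == large.getClass());
    }
}
```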


        The class of the object returned by a static factory method need not even exist at the time the class containing the method is written. Such flexible static factory methods form the basis of service provider frameworks, such as the Java Database Connectivity API (JDBC). A service provider framework is a system in which multiple service providers implement a service, and the system makes the implementations available to its clients, decoupling them from the implementations.


        There are three essential components of a service provider framework: a service interface, which providers implement; a provider registration API, which the system uses to register implementations, giving clients access to them; and a service access API, which clients use to obtain an instance of the service. The service access API typically allows but does not require the client to specify some criteria for choosing a provider. In the absence of such a specification, the API returns an instance of a default implementation. The service access API is the “flexible static factory” that forms the basis of the service provider framework.

        An optional fourth component of a service provider framework is a service provider interface, which providers implement to create instances of their service implementation. In the absence of a service provider interface, implementations are registered by class name and instantiated reflectively. In the case of JDBC, Connection plays the part of the service interface, DriverManager.registerDriver is the provider registration API, DriverManager.getConnection is the service access API, and Driver is the service provider interface.


       There are numerous variants of the service provider framework pattern. For example, the service access API can return a richer service interface than the one required of the provider, using the Adapter pattern. Here is a simple implementation with a service provider interface and a default provider:


// Service provider framework sketch

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Service interface
public interface Service {
    // ... Service-specific methods go here
}

// Service provider interface
public interface Provider {
    Service newService();
}

// Noninstantiable class for service registration and access
public class Services {

    private Services() { } // Prevents instantiation

    // Maps service names to services
    private static final Map<String, Provider> providers = new ConcurrentHashMap<String, Provider>();
    public static final String DEFAULT_PROVIDER_NAME = "<def>";

    // Provider registration API
    public static void registerDefaultProvider(Provider p) {
        registerProvider(DEFAULT_PROVIDER_NAME, p);
    }

    public static void registerProvider(String name, Provider p) {
        providers.put(name, p);
    }

    // Service access API
    public static Service newInstance() {
        return newInstance(DEFAULT_PROVIDER_NAME);
    }

    public static Service newInstance(String name) {
        Provider p = providers.get(name);
        if (p == null)
            throw new IllegalArgumentException(
                    "No provider registered with name: " + name);
        return p.newService();
    }
}
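
To show how a client and a provider might use such a framework, here is a self-contained variant of the sketch with the types nested in one class so that it runs as a single file. The greet method on Service and the two providers registered in main are assumptions added purely for the demonstration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ServiceFrameworkDemo {

    // Service interface; greet is a hypothetical service-specific method.
    interface Service {
        String greet(String name);
    }

    // Service provider interface
    interface Provider {
        Service newService();
    }

    // Noninstantiable class for service registration and access
    static final class Services {
        private Services() { }

        private static final Map<String, Provider> providers =
                new ConcurrentHashMap<String, Provider>();
        static final String DEFAULT_PROVIDER_NAME = "<def>";

        static void registerDefaultProvider(Provider p) {
            registerProvider(DEFAULT_PROVIDER_NAME, p);
        }

        static void registerProvider(String name, Provider p) {
            providers.put(name, p);
        }

        static Service newInstance() {
            return newInstance(DEFAULT_PROVIDER_NAME);
        }

        static Service newInstance(String name) {
            Provider p = providers.get(name);
            if (p == null) {
                throw new IllegalArgumentException("No provider registered with name: " + name);
            }
            return p.newService();
        }
    }

    public static void main(String[] args) {
        // Providers register themselves, typically at startup.
        Services.registerDefaultProvider(() -> name -> "Hello, " + name);
        Services.registerProvider("formal", () -> name -> "Good day, " + name);

        // Clients ask for a service without naming an implementation class.
        System.out.println(Services.newInstance().greet("Ada"));
        System.out.println(Services.newInstance("formal").greet("Ada"));
    }
}
```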


       A fourth advantage of static factory methods is that they reduce the verbosity of creating parameterized type instances. Unfortunately, you must specify the type parameters when you invoke the constructor of a parameterized class even if they’re obvious from context. This typically requires you to provide the type parameters twice in quick succession:

Map<String, List<String>> m = new HashMap<String, List<String>>();


       This redundant specification quickly becomes painful as the length and complexity of the type parameters increase. With static factories, however, the compiler can figure out the type parameters for you. This is known as type inference. 


For example, suppose that HashMap provided this static factory:

public static <K, V> HashMap<K, V> newInstance() {
    return new HashMap<K, V>();
}


Then you could replace the wordy declaration above with this succinct alternative:
Map<String, List<String>> m = HashMap.newInstance();


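Someday the language may perform this sort of type inference on constructor invocations as well as method invocations, but as of release 1.6, it does not. (Java 7 later added the diamond operator, new HashMap<>(), which provides this inference for constructors.)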
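
Since HashMap itself has no such factory, a generic static factory of this kind has to live in a class of your own. The Maps utility class below is hypothetical, not part of the JDK; it is a minimal sketch of the technique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GenericFactoryDemo {

    // Hypothetical utility class holding the generic static factory.
    static final class Maps {
        private Maps() { }

        // K and V are inferred from the type of the variable being assigned.
        static <K, V> HashMap<K, V> newHashMap() {
            return new HashMap<K, V>();
        }
    }

    public static void main(String[] args) {
        // No type arguments repeated on the right-hand side.
        Map<String, List<String>> m = Maps.newHashMap();
        m.put("k", new ArrayList<String>());
        m.get("k").add("v");
        System.out.println(m);
    }
}
```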


Unfortunately, the standard collection implementations such as HashMap do not have factory methods as of release 1.6, but you can put these methods in your own utility class. More importantly, you can provide such static factories in your own parameterized classes.

The main disadvantage of providing only static factory methods is that classes without public or protected constructors cannot be subclassed. The same is true for nonpublic classes returned by public static factories. For example, it is impossible to subclass any of the convenience implementation classes in the Collections Framework. Arguably this can be a blessing in disguise, as it encourages programmers to use composition instead of inheritance.

A second disadvantage of static factory methods is that they are not readily distinguishable from other static methods. They do not stand out in API documentation in the way that constructors do, so it can be difficult to figure out how to instantiate a class that provides static factory methods instead of constructors.

       The Javadoc tool may someday draw attention to static factory methods. In the meantime, you can reduce this disadvantage by drawing attention to static factories in class or interface comments, and by adhering to common naming conventions.


Here are some common names for static factory methods:


• valueOf—Returns an instance that has, loosely speaking, the same value as its parameters. Such static factories are effectively type-conversion methods.


• of—A concise alternative to valueOf, popularized by EnumSet.

• getInstance—Returns an instance that is described by the parameters but cannot be said to have the same value. In the case of a singleton, getInstance takes no parameters and returns the sole instance.


• newInstance—Like getInstance, except that newInstance guarantees that each instance returned is distinct from all others.


• getType—Like getInstance, but used when the factory method is in a different class. Type indicates the type of object returned by the factory method.


• newType—Like newInstance, but used when the factory method is in a different class. Type indicates the type of object returned by the factory method.
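
Several of these names can be seen in the JDK itself. The sketch below calls one real factory for each of valueOf, of, getInstance and newInstance; the wrapper methods and the Rank enum are assumptions added only to keep the example self-contained.

```java
import java.lang.reflect.Array;
import java.math.BigInteger;
import java.util.Calendar;
import java.util.EnumSet;

class FactoryNamesDemo {

    enum Rank { JACK, QUEEN, KING }

    // valueOf: a type conversion from long to BigInteger
    static BigInteger fortyTwo() {
        return BigInteger.valueOf(42L);
    }

    // of: a concise factory taking the elements directly
    static EnumSet<Rank> faceCards() {
        return EnumSet.of(Rank.JACK, Rank.QUEEN, Rank.KING);
    }

    // getInstance: an instance described by the (default) locale and time zone
    static Calendar now() {
        return Calendar.getInstance();
    }

    // newInstance: every call returns a distinct array
    static int[] freshArray(int length) {
        return (int[]) Array.newInstance(int.class, length);
    }

    public static void main(String[] args) {
        System.out.println(fortyTwo());
        System.out.println(faceCards());
        System.out.println(now().getClass().getName());
        System.out.println(freshArray(3) != freshArray(3));
    }
}
```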


     In summary, static factory methods and public constructors both have their uses, and it pays to understand their relative merits. Often static factories are preferable, so avoid the reflex to provide public constructors without first considering static factories.