Framework for Computation Offloading in Mobile Cloud Computing

Abstract. The inherently limited processing power and battery lifetime of mobile phones hinder the execution of computationally intensive applications like content-based video analysis or 3D modeling. Offloading computationally intensive application parts from the mobile platform into a remote cloud infrastructure or nearby idle computers addresses this problem. This paper presents our Mobile Augmentation Cloud Services (MACS) middleware, which enables adaptive extension of Android application execution from a mobile client into the cloud. Applications are developed using the standard Android development pattern. The middleware does the heavy lifting of adaptive application partitioning, resource monitoring and computation offloading. These elastic mobile applications can run as usual mobile applications, but they can also use remote computing resources transparently. Two prototype applications using the MACS middleware demonstrate the benefits of the approach. The evaluation shows that applications that involve costly computations can benefit from offloading with around 95% energy savings and significant performance gains compared to local execution only.


I. INTRODUCTION
RESOURCE-DEMANDING multimedia applications such as 3D video games are increasingly in demand on smartphones. Even if mobile hardware and mobile networks continue to evolve and improve, mobile devices will always be resource-poor and less secure, with unstable connectivity and constrained energy. Resource poverty is a major obstacle for many applications [14]. Therefore, computation on mobile devices will always involve a compromise. For example, on-the-fly editing of video clips on a mobile phone is prohibitive in terms of energy and time consumption. Mobile devices still cannot offer the same performance and functionality as desktop PCs or even notebooks when dealing with highly resource-demanding tasks.
Recently, the combination of cloud computing [11], wireless communication infrastructure, ubiquitous computing devices, location-based services and the mobile Web has laid the foundation for a novel computing model called mobile cloud computing [9]. It provides users with online access to virtually unlimited computing power and storage space. The cloud abstracts the complexities of provisioning computation and storage infrastructure. The end user consumes these resources as a utility; in reality they can be a far-away data center or nearby idle hardware.
Offloading has gained considerable attention in mobile cloud computing research because it has aims similar to those of the emerging cloud computing paradigm, i.e. to surmount mobile devices' shortcomings by augmenting their capabilities with external resources. Offloading, or augmented execution, refers to a technique used to overcome the limitations of mobile phones in terms of computation, memory and battery. Applications that can adaptively be split, with parts offloaded [6,18], are called elastic mobile applications. Basically, this model gives developers the illusion of programming mobile devices that are much more powerful than their actual capacities. Furthermore, elastic mobile applications can run as standalone mobile applications but can also use external resources adaptively. Which portions of the application are executed remotely is decided at runtime based on resource availability. In contrast, client/server applications have a static partitioning of code, data and business logic between the server and the client, which is decided in the development phase.
Our contributions include integration with the established Android application model for the development of "offloadable" applications, lightweight application partitioning, and a mechanism for seamless adaptive computation offloading. We propose Mobile Augmentation Cloud Services (MACS), a service-based mobile cloud computing middleware. Android applications that use the MACS middleware benefit from seamless offloading of computation-intensive parts of the application into nearby or remote clouds. First, from the developer's perspective, the application model stays the same as on the Android platform. The only requirement is that computation-intensive parts are developed as Android services, each of which encapsulates a specific functionality. Second, according to different conditions and parameters, the modules of the program are divided into two groups: one group runs locally, the other runs on the cloud side. The partitioning decision is an optimization problem over the input conditions of the cloud and the devices, such as CPU load, available memory, remaining battery power on the devices and the bandwidth between the cloud and the devices. Third, based on the solution of the optimization problem, our middleware offloads parts to the remote clouds and returns the corresponding results. Two Android applications on top of MACS demonstrate the potential of our approach.
In the rest of the paper, we first review related work in mobile cloud computing in Section 2. Then we describe our MACS middleware with detailed descriptions of the implementation (Section 3). We explain the offloading model in our middleware in Section 4. In Section 5 we introduce two use case applications, the setup of the evaluation and the corresponding results. After that, we discuss the results in Section 6. Finally, we draw conclusions and outline future work.

II. RELATED WORK
Previous work has proposed many mechanisms that address the challenges of seamless offloaded execution from a device to a computational infrastructure (cloud). Encapsulating the mobile device's software stack in a virtual machine image and executing it on more powerful hardware, as proposed by Chun and Maniatis [1] or Satyanarayanan et al. [14], can be considered a "brute force" approach to offloading. More recently, Kosta et al. [8] further improved this idea. Although such virtualized offloading can be considered a simple and general solution, it lacks flexibility and control over offloadable components. We therefore argue that application developers can better organize their application logic using the established Android service design patterns and benefit from the MACS middleware.
Ou et al. [13] propose a class instrumenting technique, i.e. a process that transforms code classes into a form suitable for remote execution. Two new classes are generated from the original class: an instrumented class, which has the real implementation and the same functionality as the original class, and a proxy class, whose only responsibility is to call the functions of the instrumented class. The instrumented class can then be offloaded to the remote cloud, and calls are invoked on the instrumented class in the remote cloud. In MACS we adopt a similar idea, but unlike Ou et al. [13] we use a standardized language for the proxy interfaces which is already widespread on the Android platform. The Cuckoo framework [6] and the MAUI system [2] implement a similar idea. Our MACS middleware is inspired by these solutions. However, the MACS middleware additionally profiles applications, monitors resources and adapts the partitioning decision at runtime.
An important challenge in partitioning elastic applications is how to determine which parts of the code should be pushed to the remote clouds. A graph-based approach to modeling the application has been used in several works. Giurgiu et al. [3] use "consumption" graphs to decide which parts should run locally or remotely. Their approach finds a cut of the consumption graph under a goal function that minimizes the total sum of communication cost, transmission cost and the cost of building local proxies. The AIDE platform [4] uses a component-based offloading algorithm, which mainly focuses on minimizing the historical transmission between two partitions. The $(k+1)$ partitioning algorithm, introduced by Ou et al. [13], is applied to a multi-cost graph representing the class-based components. A similar approach is taken by Gu et al. [4,5]. Zhang et al. [19,17] use general Bayesian inference to make the partitioning decision. However, constantly executing graph or inference algorithms on the mobile device takes significant resources on the constrained device. We use an integer linear optimization model to describe the offloading, so that it is not only easy to implement but can also be solved independently on the device if the remote clouds are temporarily unavailable.

III. MOBILE AUGMENTATION CLOUD SERVICES
The goal of our MACS middleware is to enable the execution of elastic mobile applications. Zhang et al. [17] consider elastic applications to have two distinct properties. First, the execution of an elastic application is divided, partially on the device and partially on the cloud. Second, this division is not static but is dynamically adjusted during application runtime. The benefit of such an application model is that mobile applications can still run independently on mobile platforms but can also reach cloud resources on demand and depending on availability. Thus, mobile applications are not limited by the constraints of the existing device capacities.
The MACS architecture is depicted in Figure 1. In order to use the MACS middleware, the application should be structured using the established Android services pattern. Android is already established as the most prominent mobile phone platform. Additionally, its application architecture model allows decomposition of applications into service components which can be shared between applications. A MACS application consists of an application core (Android activities, GUI, access to device sensors), which cannot be offloaded, and multiple services ($S_i$) that encapsulate separate application functionality (usually resource-demanding components) and can be offloaded ($S_{Ri}$). The services communicate with the application through an interface defined by the developer in the Android Interface Definition Language (AIDL).
As a service-based implementation is adopted, we can profile the following metadata for each service:
- type: whether the service can be offloaded or not
- memory cost: the memory consumption of the service on the mobile device
- code size: the size of the compiled code of the service
- dependency information on other services; for each related module, we collect the following:
  - transfer size: amount of data to be transferred
  - send size: amount of data to be sent
  - receive size: amount of data to be received
Metadata is obtained by monitoring the application execution and environment.
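The metadata listed above could be held in a small value class. The following is a minimal sketch only; the class and field names (`ServiceMetadata`, `Dependency`, and so on) are hypothetical and are not taken from the MACS code base. It relies on the relation that the transfer size is the sum of send size and receive size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the per-service metadata described in the text. */
final class ServiceMetadata {

    /** Dependency on one related service; all sizes are in bytes. */
    static final class Dependency {
        final String relatedService;
        final long sendSize;
        final long receiveSize;

        Dependency(String relatedService, long sendSize, long receiveSize) {
            this.relatedService = relatedService;
            this.sendSize = sendSize;
            this.receiveSize = receiveSize;
        }

        /** Transfer size is the sum of the data sent and received. */
        long transferSize() {
            return sendSize + receiveSize;
        }
    }

    final String name;
    final boolean offloadable;  // type: can the service be offloaded?
    final long memoryCost;      // memory consumption on the device
    final long codeSize;        // size of the compiled code
    final List<Dependency> dependencies = new ArrayList<>();

    ServiceMetadata(String name, boolean offloadable, long memoryCost, long codeSize) {
        this.name = name;
        this.offloadable = offloadable;
        this.memoryCost = memoryCost;
        this.codeSize = codeSize;
    }

    /** Sum of the transfer sizes over all related services. */
    long totalTransferSize() {
        long total = 0;
        for (Dependency d : dependencies) {
            total += d.transferSize();
        }
        return total;
    }
}
```

A profiler would fill one such object per service and hand the collection to the partitioning step.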
Android services use Android inter-process communication (IPC) channels for RPC. The services are registered in the Service Manager, and a binder maintains a handle for each service. An application that wants to use a service can then query the service in the Service Manager. Upon service discovery, the Android platform creates a service proxy for the client application. All requests to access the service are sent through the service proxy and then forwarded to the service by the binder. After the requests are processed, the results are sent back through the binder to the service proxy in the client application. Finally, the client gets the result from the service proxy. From the client's point of view, there is no difference between calling a remote service and calling a local function.
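The location transparency that a service proxy provides can be illustrated outside Android with a plain Java dynamic proxy. This is a sketch under assumptions, not the Android Binder mechanism and not the MACS implementation: `EchoService` and `LocationTransparentProxy` are hypothetical names, and the "remote" target is simply a second in-process object standing in for an offloaded service.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.BooleanSupplier;

/** Hypothetical service interface; on Android this role is played by an AIDL interface. */
interface EchoService {
    String process(String input);
}

/** Builds a proxy that forwards each call to a local or a remote implementation. */
final class LocationTransparentProxy {

    static <T> T create(Class<T> iface, T local, T remote, BooleanSupplier offload) {
        InvocationHandler handler = (proxy, method, args) -> {
            // The execution location is chosen per call, invisibly to the caller.
            Object target = offload.getAsBoolean() ? remote : local;
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the service's own exception
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        EchoService local = s -> "local:" + s;
        EchoService remote = s -> "remote:" + s; // stand-in for an offloaded service
        boolean[] offloaded = {false};
        EchoService service = create(EchoService.class, local, remote, () -> offloaded[0]);
        System.out.println(service.process("frame"));
        offloaded[0] = true;
        System.out.println(service.process("frame"));
    }
}
```

The caller holds a single `EchoService` reference throughout; only the supplier passed to `create` changes where the work is done.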
The offload manager determines the execution plan, and the services to be offloaded are then pushed to the cloud. The results are sent back to the application upon completion. Our approach is similar to the Cuckoo framework [6]; however, MACS allows dynamic application partitioning at runtime, whereas Cuckoo only enables static partitioning at compile time. MACS monitors the execution of the services and the environment parameters. Whenever the situation changes, the middleware can adapt the offloading and partitioning.
The main goal of MACS is to enable transparent computation offloading for mobile applications. Therefore, our middleware tries to fit the usual Android development process and give developers an easier way to offload parts of their applications to remote clouds transparently. MACS hooks into the Android build system and modifies the Java files generated from AIDL in the pre-compile stage. Developers need to include the MACS SDK libraries in their Android project.
Since our implementation aims to give developers an easier way to distribute their applications to the remote cloud, the low-level implementation should be transparent to them. It is hidden as follows. Given the Android build system and the use of Android services, the Java files generated from AIDL can be modified in the pre-compile stage, so our code is embedded there. Each time developers compile their projects with the Ant tool, our code is embedded without their noticing. A customized step is added to an Ant build by writing a new target, which can be treated as a task.
On the cloud side, the MACS middleware handles the offload requests from the clients, the installation of offloaded services, their initialization and method invocations (see Figure 9 in the Appendix). The cloud-side MACS middleware is written in pure J2SE so that it can run on any machine with J2SE installed.
The MACS middleware monitors the resources of the mobile execution environment and of the available clouds. It then forms an optimization problem whose solution is used to decide whether the service containing the called function should be offloaded or not. When a service is to be offloaded to the remote cloud, our middleware first tries to execute the service remotely. If the service is not yet available on the remote cloud, our framework transmits the service code (jar file) to the cloud, and the results of the service execution are returned to the mobile device. The cloud caches the jar files for subsequent executions.
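The "try remotely first, upload the jar only on a miss, then cache it" flow can be sketched as follows. This is an illustrative in-memory model with hypothetical names (`CloudJarCache`, `execute`); the real middleware communicates over the network and actually loads and runs the jar, which is replaced here by a placeholder return value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** In-memory stand-in for the cloud side that caches uploaded service jars. */
final class CloudJarCache {
    private final Map<String, byte[]> installed = new HashMap<>();
    private int uploads = 0;

    boolean hasService(String name) {
        return installed.containsKey(name);
    }

    /** Stores the jar so that later executions do not need a new upload. */
    void install(String name, byte[] jarBytes) {
        installed.put(name, jarBytes);
        uploads++;
    }

    int uploadCount() {
        return uploads;
    }

    /**
     * Executes a service remotely. The jar is fetched from the supplier
     * (the device) only when the cloud does not already have the service.
     */
    String execute(String name, Supplier<byte[]> jarFromDevice, String request) {
        if (!hasService(name)) {
            install(name, jarFromDevice.get());
        }
        // Placeholder for loading the jar and invoking the requested method.
        return name + " handled " + request;
    }

    public static void main(String[] args) {
        CloudJarCache cloud = new CloudJarCache();
        Supplier<byte[]> jar = () -> new byte[] {1, 2, 3};
        cloud.execute("queens", jar, "N=8");
        cloud.execute("queens", jar, "N=9");
        System.out.println("uploads: " + cloud.uploadCount());
    }
}
```

Because the jar is supplied lazily, the device pays the transmission cost only for the first invocation of each service.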
Besides computation offloading, our framework also features simple data offloading. If files need to be accessed on the remote cloud, MACS file transmission (MACS-FTM) automatically transfers the missing files from the local device, and vice versa. At pre-compile time, the middleware inserts a line of code after each place where a file object is created with the Java File class. This code snippet retrieves information about the file at runtime. When no such file exists in the remote cloud, MACS-FTM throws an exception, which is caught by the middleware, which in turn obtains the file from the device.
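The exception-driven file fetch described above can be sketched with two in-memory maps standing in for the device and cloud file systems. All names here (`RemoteFileStore`, `MissingRemoteFileException`) are hypothetical, and the sketch does not reproduce the pre-compile code injection; it only shows the catch-and-fetch control flow.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical exception signalling that a file is absent on the cloud side. */
final class MissingRemoteFileException extends RuntimeException {
    MissingRemoteFileException(String path) {
        super("not on cloud: " + path);
    }
}

/** Cloud-side file access that pulls missing files from the device on demand. */
final class RemoteFileStore {
    private final Map<String, byte[]> cloudFiles = new HashMap<>();
    private final Map<String, byte[]> deviceFiles;
    private int transfers = 0;

    RemoteFileStore(Map<String, byte[]> deviceFiles) {
        this.deviceFiles = deviceFiles;
    }

    /** Raw lookup on the cloud; throws when the file has not been transferred yet. */
    private byte[] readOnCloud(String path) {
        byte[] data = cloudFiles.get(path);
        if (data == null) {
            throw new MissingRemoteFileException(path);
        }
        return data;
    }

    /** Opens a file for an offloaded service, fetching it from the device on a miss. */
    byte[] open(String path) {
        try {
            return readOnCloud(path);
        } catch (MissingRemoteFileException missing) {
            byte[] fromDevice = deviceFiles.get(path);
            if (fromDevice == null) {
                throw missing; // the file does not exist on the device either
            }
            cloudFiles.put(path, fromDevice);
            transfers++;
            return readOnCloud(path);
        }
    }

    int transferCount() {
        return transfers;
    }
}
```

A second `open` of the same path is served from the cloud copy, so each file crosses the network at most once in this model.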

IV. ADAPTIVE COMPUTATION OFFLOADING
The proposed model and the corresponding algorithm are intended for computationally intensive scenarios [7]. The requirement for the developer is that the code should be structured according to this model in advance.
The developer should also provide, or use extra tools to extract, meta-information from the given modules and then tag each module with some parameters. The tagged parameters are used later for deciding on the code partitioning.
Let us suppose that we have $n$ modules which can be offloaded, $S_1, S_2, \ldots, S_n$. Each module has several properties described as metadata, i.e. for a specific module $i$, its memory cost $mem_i$ and code size $code_i$. Let us consider the $k$ related modules which can be offloaded. For each of them, we denote the transfer size $tr_1, tr_2, \ldots, tr_k$, the send size $send_1, send_2, \ldots, send_k$ and the receive size $rec_1, rec_2, \ldots, rec_k$, where $\{1..k\} \subseteq \{1..n\}$ and $send_k + rec_k = tr_k$. Meanwhile, we introduce $x_i$ for module $i$, which indicates whether module $i$ is executed locally ($x_i = 0$) or remotely ($x_i = 1$). The solution $x_1, x_2, \ldots, x_n$ represents the required offloading partitioning of the application.
The cost function is a weighted sum of three parts: the transfer cost, the memory cost $c_{memory}$ and the CPU cost $c_{CPU}$. The first part depicts the transfer costs for the remote execution of services, including the transfer cost of related services which are not at the same execution location. The latter part of Eq. (2) implicitly includes the dependency relationship between modules, i.e. whether the output of one module is an input of another. $c_{memory}$ contains the memory costs on the mobile device, and $c_{CPU}$ the CPU costs on the mobile device, where $\alpha$ is the conversion factor mapping the relationship between code size and CPU instructions, which is taken to be 10 based on [12]. $w_{tr}$, $w_{mem}$ and $w_{CPU}$ are the weights of the individual costs, which can lead to different objectives, for example lowest memory costs, lowest CPU load or lowest interaction latency. The three constraints are expressed as follows.
Minimized memory usage. First, the memory costs of the resident services cannot be more than the available memory on the mobile device, i.e. they must not exceed $f_1 \cdot mem_{avail}$, where $mem_{avail}$ can be obtained from the mobile device and $f_1$ is the factor that determines the memory threshold to be used, because the application cannot occupy the whole free memory on the mobile device.
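A formulation consistent with the description above can be sketched as follows. It is a reconstruction under assumptions and not necessarily the exact form used by the middleware: the symbol $c_{transfer}$, the exclusive-or ($\oplus$) encoding of "not at the same execution location", and the index set $R(i)$ of modules related to module $i$ are introduced here for illustration only.

```latex
\begin{align*}
& \min_{x \in \{0,1\}^n} \; w_{tr}\, c_{transfer} + w_{mem}\, c_{memory} + w_{CPU}\, c_{CPU} \\
& c_{transfer} = \sum_{i=1}^{n} code_i\, x_i \;+\; \sum_{i=1}^{n} \sum_{j \in R(i)} tr_j\, (x_i \oplus x_j) \\
& c_{memory} = \sum_{i=1}^{n} mem_i\, (1 - x_i) \\
& c_{CPU} = \alpha \sum_{i=1}^{n} code_i\, (1 - x_i) \\
& \text{subject to} \quad \sum_{i=1}^{n} mem_i\, (1 - x_i) \le f_1 \cdot mem_{avail}
\end{align*}
```

In this sketch the first sum of $c_{transfer}$ charges for shipping the code of offloaded modules, and the second sum charges the transfer size of every dependency whose two ends run in different places. Only modules that stay on the device ($x_i = 0$) contribute to $c_{memory}$ and $c_{CPU}$.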
Minimized energy usage. Second, for the offloaded services, the energy consumption of offloading should not be greater than that of not offloading [10]. The energy cost of offloading some parts to the remote cloud can be expressed as the sum of the energy consumed while waiting for the results from the cloud, $E_{idle}$, and while transferring (sending, $E_s$, and receiving, $E_r$) the services to be offloaded [10] as well as the additional data which may be needed. The idle time of the mobile device waiting for the result from the cloud can be treated as the execution time on the remote cloud, so the idle energy is determined by the remote execution time. The local execution time can be expressed as the ratio of CPU instructions to local CPU frequency, while the remote execution time consists of the time consumed by the CPU, file transmission and the overhead of our middleware.
The solution $x_1, x_2, \ldots, x_n$ of the resulting optimization problem is the optimized partitioning strategy. By using integer linear programming (ILP) on the mobile device, MACS gets a global optimization result. Whenever the parameters in the model change, such as available memory or network bandwidth, the partitioning is adapted by solving the new optimization problem.
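The energy comparison can be made concrete with a small calculation. The sketch below assumes that each energy term is a constant power draw multiplied by a duration, that transfer time is data size divided by bandwidth, and that execution time is the instruction count divided by the processing rate. The power figures and all method and parameter names are hypothetical and are not measurements from the paper.

```java
/** Illustrative check of the energy constraint; all power values are assumptions. */
final class OffloadEnergyCheck {

    /** Energy (J) of running locally: CPU power times instructions over local frequency. */
    static double localEnergy(double instructions, double localHz, double cpuWatts) {
        return cpuWatts * (instructions / localHz);
    }

    /** Energy (J) of offloading: idle wait plus sending plus receiving. */
    static double offloadEnergy(double instructions, double cloudHz,
                                double sendBytes, double recvBytes,
                                double sendBytesPerSec, double recvBytesPerSec,
                                double idleWatts, double sendWatts, double recvWatts) {
        double idleTime = instructions / cloudHz;      // device waits while the cloud computes
        double sendTime = sendBytes / sendBytesPerSec;
        double recvTime = recvBytes / recvBytesPerSec;
        return idleWatts * idleTime + sendWatts * sendTime + recvWatts * recvTime;
    }

    /** The constraint holds when offloading does not cost more energy than local execution. */
    static boolean satisfiesEnergyConstraint(double local, double offload) {
        return offload <= local;
    }

    public static void main(String[] args) {
        double local = localEnergy(1e9, 5e8, 1.0);
        double remote = offloadEnergy(1e9, 2e9, 1e6, 1e5, 1e6, 1e6, 0.1, 0.5, 0.5);
        System.out.println("local J = " + local + ", offload J = " + remote);
        System.out.println("offload allowed: " + satisfiesEnergyConstraint(local, remote));
    }
}
```

With these made-up numbers, local execution costs about 2 J and offloading about 0.6 J, so the constraint is satisfied. A slow link or a large payload shifts the balance the other way.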
The MACS middleware defines the abstraction of a decision maker, so that different decision makers can be applied to determine the execution location of each "offloadable" service.
In our experiments we used Cream (http://bach.istc.kobe-u.ac.jp/cream/), an open-source class library for constraint programming in Java. It provides enough features to run the decision solver on an Android platform, with acceptable solving times in the order of tens of milliseconds. The objective function used in the calculation is the sum of the transmission cost, the memory cost and the CPU cost.
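For a handful of services, the 0/1 decision can also be found by plain enumeration, which makes the structure of the problem easy to see. The sketch below is not the Cream-based solver used in the experiments; it swaps in exhaustive search and uses an assumed cost model (code size shipped for offloaded services, transfer size for dependencies split across locations, memory and $\alpha \cdot$ code size for services kept local). The class name and parameter layout are hypothetical.

```java
/** Exhaustive-search stand-in for a decision maker; suitable only for small n. */
final class BruteForcePartitioner {
    static final double ALPHA = 10.0; // code size to CPU instruction factor from the text

    /**
     * Returns x where x[i] == true means service i runs remotely,
     * or null when no assignment satisfies the memory limit.
     * tr[i][j] is the transfer size between services i and j (symmetric).
     */
    static boolean[] solve(long[] mem, long[] code, boolean[] offloadable, long[][] tr,
                           double wTr, double wMem, double wCpu, double memLimit) {
        int n = mem.length;
        boolean[] best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] x = new boolean[n];
            boolean valid = true;
            for (int i = 0; i < n; i++) {
                x[i] = (mask & (1 << i)) != 0;
                if (x[i] && !offloadable[i]) {
                    valid = false; // non-offloadable services must stay local
                }
            }
            if (!valid) {
                continue;
            }
            double transfer = 0, memory = 0, cpu = 0;
            for (int i = 0; i < n; i++) {
                if (x[i]) {
                    transfer += code[i];
                } else {
                    memory += mem[i];
                    cpu += ALPHA * code[i];
                }
                for (int j = i + 1; j < n; j++) {
                    if (x[i] != x[j]) {
                        transfer += tr[i][j]; // dependency crosses the device/cloud boundary
                    }
                }
            }
            if (memory > memLimit) {
                continue; // memory constraint violated
            }
            double cost = wTr * transfer + wMem * memory + wCpu * cpu;
            if (cost < bestCost) {
                bestCost = cost;
                best = x;
            }
        }
        return best;
    }
}
```

Enumeration visits $2^n$ assignments, so it only illustrates the model; a constraint or ILP solver is the appropriate tool once the number of services grows.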
Although MACS introduces overhead because it uses a proxy for communication between offloaded services and the mobile application, this overhead is relatively small, as shown in the evaluation. MACS also profiles each offloaded module/service in order to dynamically change its execution plan and adjust the partitioning.

V. EXPERIMENTAL EVALUATION OF OFFLOADING
We evaluate our MACS middleware with two use case smartphone applications. The first application implements the well-known N-Queens problem. It is chosen because its performance bottleneck is a pure computation problem, so this use case can easily show the overhead introduced by the MACS middleware. Table 8 shows the problem space in terms of N. The second application involves face detection and recognition in video files. This use case involves a lot of computation but also requires much more memory to process and obtain results. It processes a video file, detects faces in it, clusters them and provides time point cues for video navigation. The results can then be used for faster video navigation on small-screen devices (Figure 2). The video file is processed with the OpenCV and FFmpeg libraries. We use FFmpeg to open video files and scan them frame by frame. Faces in the video frames are detected by the existing implementation in OpenCV, the detected faces are then recognized by the method proposed by Turk and Pentland [15], and after that the faces are clustered.
In the implementation we used JavaCV for video processing. When the application gets the results from the processing, it shows all detected faces in a clustered view. The user can select a cluster and then navigate to the time points where that face occurs in the video. Thus, the application can accelerate navigation in a video based on the persons who appear in it.

A. Setup of the Evaluation
Hardware. A Motorola Milestone mobile phone running Android 2.2 is used in the evaluation. A desktop computer with a quad-core CPU acts as the cloud provider that hosts the offloaded computation. The details of the hardware components of the mobile device and the desktop computer are shown in Table 1.
Network Topology. When offloading services to the remote cloud, the mobile phone connects to a nearby access point. Since the wireless local area network is encrypted with the Wi-Fi Protected Access 2 (WPA2) security protocol, the data rate is lower than on an unencrypted network because of the overhead introduced by the security protocol. The desktop computer is connected to the Internet directly by a network cable with a bandwidth of 100 Mbps.
Energy Estimation Model. We adopt the method proposed by Zhang et al. [16], which provides a power model for an Android phone and a measurement application that estimates the energy consumption of the Android-based mobile device on the fly. Using their software, the energy consumption of each hardware component of the Motorola Milestone, such as the LCD, the CPU and Wi-Fi, can be measured separately (see Table 2).

B. Results of the Use Case 1
We use the algorithm by Sedgewick and Wayne (see footnote 5). The basic idea is to use recursion and backtracking to enumerate all possible solutions. Although it is not the best algorithm, it is often used for solving the N-Queens puzzle. With increasing N, many more steps are needed to find the solutions, which is extremely time-consuming for the mobile device.
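The recursion-and-backtracking idea can be sketched in a few lines. The code below is written in the same spirit as the cited program but is not a copy of it; the class and method names are chosen here for illustration. Queens are placed row by row, and a branch is abandoned as soon as a queen shares a column or diagonal with an earlier one.

```java
/** Counts N-Queens solutions by recursive backtracking (illustrative sketch). */
final class NQueensCounter {

    /** Returns the number of ways to place n non-attacking queens on an n x n board. */
    static long countSolutions(int n) {
        return place(new int[n], 0);
    }

    /** cols[r] holds the column of the queen already placed in row r (for r < row). */
    private static long place(int[] cols, int row) {
        int n = cols.length;
        if (row == n) {
            return 1; // all rows filled: one complete solution
        }
        long count = 0;
        for (int c = 0; c < n; c++) {
            if (isSafe(cols, row, c)) {
                cols[row] = c;
                count += place(cols, row + 1); // recurse, then try the next column
            }
        }
        return count;
    }

    /** A square is safe when no earlier queen shares its column or a diagonal. */
    private static boolean isSafe(int[] cols, int row, int c) {
        for (int r = 0; r < row; r++) {
            if (cols[r] == c || Math.abs(cols[r] - c) == row - r) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.println("N = " + n + ": " + countSolutions(n));
        }
    }
}
```

The search tree grows very quickly with N, which is why this workload is a convenient pure-computation benchmark for offloading.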
We run the N-Queens computation on the local device and offloaded to the remote cloud separately, for $N = 1$ to $N = 13$. For $N = 14$ the computation takes hours to finish on the local device, so from $N = 14$ on it is not realistic to run it without offloading. Table 8 in the Appendix provides an overview of the problem space in terms of $N$.
Figure 3 shows the execution time of the calculation service. From $N = 1$ to $N = 9$, the execution speed on the local device is acceptable compared with the remote speed, and running the method locally is better. From $N = 10$ on, remote execution is the better option, as the computation time dominates the total time in the remaining cases. The remote execution speed is also relatively stable, without large variations.
Figure 4 shows the individual times that make up the total time spent. As the number of queens increases, the local execution time grows markedly; from $N = 9$ on, calculating the solutions takes more than half of the total time. Meanwhile, the overhead that our framework introduces remains constant. For remote execution, the overhead is broken down into three parts: the package offloading time, the decision making time and the residual overhead. The results show that our decision model needs only little time to reach a decision. The transmission of the remote package also takes only a small share of the total time, since the remote package is small. The execution time of solving the N-Queens problem is relatively stable, except for $N = 11$, which is a deviation during execution and measurement.
Finally, Figure 5 shows the energy consumed with and without offloading. For local execution, most of the time is spent on computation. Since our energy model involves the CPU and the LCD, and the LCD is always on during computation, the energy consumption of the CPU and the LCD dominates the total energy consumption of local execution. Compared to remote execution, the local execution time increases significantly from $N = 9$ on, which leads to the highest energy values. In contrast, the remote execution time is nearly stable, so the consumed energy stays at almost the same level.
5 http://introcs.cs.princeton.edu/java/23recursion/Queens.java.html

C. Results of the Use Case 2
Six video files are used in the evaluation. All of them are cut from the same original video file, with lengths of 10, 20, 30, 40, 50 and 60 seconds (see Table 3). The video resolution is 720 × 480 pixels, the frame rate is set to 30 fps, the overall bit rate is 1500 Kbps and the video is compressed in MPEG-4 format, 3GPP Media Release 5 profile. The audio codec is AAC, and the audio bit rate is 30 Kbps.
In order to get a more accurate estimate of the execution time used in the model, we first run the face detection services locally and keep track of the time spent (in seconds) and the file size (in bytes). A linear regression is then used to capture the relationship between the time spent and the file size. The number of CPU instructions provided by the Android API can only be used to estimate executions that involve no native calls, so we do not use that count directly but focus on the execution time. The regression, Eq. (16), gives a linear relation between $Time$ and $FileSize$ with a slope of 0.0005 and a constant term of magnitude 246.09. We use this heuristic equation in our model to determine the execution time.
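Such a time-versus-size relation can be obtained with an ordinary least-squares fit. The following sketch shows the standard closed-form computation of slope and intercept; the class name `LinearFit` is chosen here for illustration, the sample values are invented, and nothing in it reproduces the paper's measurements.

```java
/** Ordinary least-squares fit of a line y = slope * x + intercept. */
final class LinearFit {

    /** Returns {slope, intercept} for the given samples. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxx = 0, sxy = 0;
        for (int i = 0; i < x.length; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept};
    }

    /** Predicts y (e.g. execution time) for a new x (e.g. file size). */
    static double predict(double[] line, double x) {
        return line[0] * x + line[1];
    }

    public static void main(String[] args) {
        // Made-up samples: file size in bytes versus processing time in seconds.
        double[] sizes = {1.0e6, 2.0e6, 3.0e6};
        double[] times = {400, 900, 1400};
        double[] line = fit(sizes, times);
        System.out.println("slope = " + line[0] + ", intercept = " + line[1]);
    }
}
```

The fitted line can then be evaluated for the size of a new video file to estimate the local execution time before deciding whether to offload.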
Figure 6 clearly shows that offloading reduces the execution time enormously compared with local execution. Even for the 10-second video file, the local device spends more than 15 minutes on processing and detection, whereas the corresponding offloaded execution takes less than 1 minute. Whenever the computation is offloaded to the remote cloud, execution is more than 20 times faster (see Table 4). It is not acceptable to keep the CPU of the mobile device at 100% load for such a long time, which confirms that video processing is still a huge burden for the mobile device. Given the huge difference between local and remote execution time, we can conclude that local execution consumes more energy than remote execution, because most of the time is spent on the CPU and the LCD, which are the top two energy-consuming components (see Figure 7). Table 5 also shows the energy savings of offloading: more than 94% of the energy can be saved.
Figure 8 describes the composition of the total time spent for local and remote execution in detail. As the execution time increases with the video file size, the overhead our framework introduces in the local case accounts for only about 0.1%, which is negligible. For remote execution, the total time spent consists of the execution time, the transmission time of the needed files, the package transmission (service offloading) time and the decision making time. As the video file size increases, the file transmission time also rises, but it is not significant compared to the total time. The decision is made in less than 1 second, which is only 1% of the total time spent. The total overhead our framework introduces is about 5% of the total time, which is acceptable considering the speed-up and energy savings above.
The face clustering can only be done on the remote cloud because of the software limitations on the local mobile device.
Detailed results can be seen in Table 6 in the Appendix.Most of the execution time is spent on building/rebuilding the training set.If the training set is already available before the remote execution, then the estimated execution time can be significantly reduced.

VI. DISCUSSION
Offloading may not be suited for every mobile application, but the results of the two use cases show that when an application uses complex or time-consuming algorithms such as recursive search, offloading those parts into the cloud reduces time and energy consumption, bringing the execution time down to an acceptable level. Offloading can lower the CPU load on a mobile device significantly. It can also save a lot of energy, which means that battery lifetime can be increased compared to local execution, as shown in the second use case, where more than 90% of the energy is saved and execution is about 20 times faster than local execution.
The results also show that the overhead of our framework is small and acceptable. As the required computation increases, it is better to push those computations which cost considerable resources to the remote cloud. For small N in the N-Queens problem, however, the overhead accounts for almost half of the total execution time, because the required computation is so small that it takes only little time to obtain the results. This shows a clear advantage of local execution over remote offloading when little computation is needed. In short, the more computation is needed, the greater the advantage of offloading. Since we use Wi-Fi in the evaluation, the time for sending files and receiving results makes up only a small proportion, but if 3G or GPRS is used, the offloading time will surely increase.

VII. CONCLUSIONS AND FUTURE WORK
The results show that local execution times, which are sometimes too long for users to accept, can be reduced considerably through offloading, and that pushing computation to the remote cloud significantly lowers the CPU load on mobile devices. Meanwhile, a lot of energy can be saved, which means users get more battery time compared to local execution. The results also show that the overhead of our framework is small.
Our framework supports offloading of multiple Android services. If there are multiple services in one application and all of them can be offloaded to the remote clouds, our resource monitor natively supports this situation and can make the corresponding allocation decision, so that some of the services are offloaded and the rest run locally. The next step is to enable parallelization of the offloaded services. Additionally, we can extend the current middleware so that it supports automatic partitioning of arbitrary mobile applications. A great challenge is how to estimate the characteristics of an application depending on different input parameters, i.e. the relationship between the input of the invoked method and the execution time. We could characterize this relationship by running the target application several times and adapt the offloading algorithm accordingly.

APPENDIX
Figure 9 presents the registration flow of "offloadable" services (a) and the optimization process (b).
In Table 6, the value to the left of the slash is for local execution, whereas the value to the right is for remote execution.
Table 7 gives the detailed results of use case 2. The value to the left of the slash is for local execution, the middle one is for remote execution of face detection, and the last one is for remote execution of face detection with face recognition.
Table 8 shows the number of possible solutions of the N-Queens problem in terms of N.

Fig. 1. MACS architecture. Application logic is structured from multiple Android services ($S_i$). Some of them can be offloaded into the cloud.

$D_s$ and $D_r$ are the total data sizes to be sent and received, $D_{extra}$ is the size of the extra data needed because of offloading, which is determined at runtime, $B_s$ and $B_r$ are the bandwidths for sending and receiving data, and $S_{cloud}$ is the remote execution speed. Additionally, the partitioning has to respect whether a service is offloadable or not.
Minimized execution time. The third constraint is enabled when the user prefers fast execution, i.e. the remote execution time should not exceed the local execution time. The remote execution time includes the overhead which our framework brings in.
According to the constraints above, we now transform the partitioning problem into an optimization problem. The middleware determines the execution location by solving a linear integer optimization problem. The decision maker receives the input parameters and execution constraints. It then returns the corresponding running locations. The solution of this problem, $x_1, x_2, \ldots, x_n$, is the optimized partitioning strategy.

Fig. 2. Snapshot of prototype application of face detection and recognition using MACS middleware.

Fig. 8. Total time distribution of face detection: (a) local execution and (b) remote execution.

TABLE I. HARDWARE COMPONENTS OF MOBILE DEVICES AND DESKTOP COMPUTER

TABLE IV. VIDEO DURATION AND SPEED UP OF FACE DETECTION IN VIDEO FILES

TABLE V. VIDEO DURATION AND SPEED UP OF SAVED ENERGY

TABLE VI. EVALUATION RESULTS OF USE CASE 1 IN DETAILS

TABLE VII. EVALUATION RESULTS OF USE CASE 2 IN DETAILS