Android System Stability Basics

Budhdi Sharma
4 min read · Jan 4, 2020

As fun as it is to be spontaneous, stability is a much-needed part of our lives, because stability is directly proportional to dependability. Whether it is our job, our relationships, or our income, everything needs some stability.

Brief description of stability

The stability of the Android system is very important to the user experience. From the user's point of view, stability issues show up as reboots, automatic shutdowns, failure to boot, frozen screens, black screens, apps exiting unexpectedly, and unresponsiveness. They fall into two main categories, Timeout and Crash, which are described below.

1. Timeout

"An operation that does not complete for a long time" is only a loose description. The system needs a precise rule: each kind of operation has a specified threshold, and an operation that has not completed within that threshold is judged to be a Timeout.

For the Android system, the most common timeouts involve Service, Broadcast, ContentProvider, and input events. When an ordinary app process fails to finish such an operation within a certain period of time, an Application Not Responding (ANR) dialog box pops up. If the code is running in the system process, the more accurate term is System Not Responding (SNR). Although ANR and SNR differ, both are commonly referred to as ANR problems.

It helps to understand the triggering principle of Android ANR. Some component ANRs simply need a longer execution time: even though the ANR is triggered, the operation would finish normally if it were given more time. Others are caused by a deadlock, which cannot recover no matter how much time is given.

  • Service Timeout: for example, a foreground service does not finish within 20 s.
  • Broadcast Timeout: for example, a foreground broadcast does not finish within 10 s.
  • ContentProvider Timeout: a content provider operation times out.
  • InputDispatching Timeout: input event dispatch times out after 5 s; this covers key and touch events.
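The "threshold" idea behind these timeouts can be sketched in plain Java: arm a timer before the work starts, disarm it when the work finishes, and treat a timer that fires first as a timeout. This is only an illustration of the general idea, not the framework's code. The class `TimeoutMonitor` and its methods are hypothetical names, and the constants simply repeat the example values from the list above (actual values can differ by Android version and device).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: arm a timer, run the work, disarm the timer. */
final class TimeoutMonitor {
    // Example thresholds taken from the list above (assumed, not read from the platform).
    static final long FOREGROUND_SERVICE_TIMEOUT_MS = 20_000;
    static final long FOREGROUND_BROADCAST_TIMEOUT_MS = 10_000;
    static final long INPUT_DISPATCHING_TIMEOUT_MS = 5_000;

    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "timeout-monitor");
                t.setDaemon(true); // do not keep the JVM alive for the timer
                return t;
            });

    /** Runs the work on the calling thread; returns true if it overran thresholdMs. */
    boolean runAndCheck(Runnable work, long thresholdMs) {
        AtomicBoolean timedOut = new AtomicBoolean(false);
        // Arm the timer before starting the work.
        ScheduledFuture<?> alarm =
                timer.schedule(() -> timedOut.set(true), thresholdMs, TimeUnit.MILLISECONDS);
        try {
            work.run();
        } finally {
            alarm.cancel(false); // Disarm: the work finished (or failed) on its own.
        }
        return timedOut.get();
    }

    /** Helper that simulates slow work without a checked exception. */
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note that in this sketch slow work is only reported as late and is never interrupted, which matches the point that some timed-out operations would still finish normally if given more time.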

In addition to ANR, there is another type of timeout: the watchdog. The most common watchdogs are the "watchdog" thread running in the system process, and the "FinalizerWatchdogDaemon" thread running in every app process (including the system process). The latter monitors garbage collection and checks whether the "FinalizerDaemon" thread is taking too long to finalize an object. These are not the only ones; there are also watchdogs for dex2oat, wifi, and so on.
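A watchdog can be pictured as a heartbeat check: a monitor hands a trivial task to the thread it watches and waits a bounded time for that task to run. The following plain-Java sketch shows this general pattern under stated assumptions. `SimpleWatchdog`, `isResponsive`, and `newDaemonWorker` are hypothetical names, and the code is not taken from the Android framework.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical heartbeat-style watchdog for a single worker thread. */
final class SimpleWatchdog {

    /** Returns true if the monitored worker ran a no-op heartbeat within timeoutMs. */
    static boolean isResponsive(ExecutorService monitored, long timeoutMs) {
        // The heartbeat queues behind whatever the worker is currently doing.
        Future<?> heartbeat = monitored.submit(() -> { });
        try {
            heartbeat.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            heartbeat.cancel(false); // the worker is blocked or too busy
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false; // not expected for a no-op task
        }
    }

    /** Creates a single daemon worker thread to stand in for a monitored thread. */
    static ExecutorService newDaemonWorker() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "monitored-worker");
            t.setDaemon(true);
            return t;
        });
    }

    /** Helper that simulates a blocked worker without a checked exception. */
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A real watchdog would typically run this check periodically and, on failure, dump traces before taking action. The sketch only reports the result.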

When an ANR or watchdog timeout occurs, the system collects related information so that the abnormality can be analyzed and fixed; see Understanding the Android ANR Information Collection Process. Dumping the process traces is the core step of the whole flow. This step also clears the old /data/anr/traces.txt file. The previous traces are generally written to dropbox first, but in some cases they are lost. Java and Native processes use different strategies, as follows:

[Image: processes strategies]

2. Crash

A crash, without doubt, is not a problem that more time can solve; it means an unexpected exception has occurred. Once a crash is triggered, the corresponding call stack is output, but the traces of each process are not.

A Java-layer crash (see Understanding the Java Crash Processing Flow) is usually caused by a thrown exception that nobody catches, which ends up in uncaughtException. Does that mean it is fine to catch every exception in the system? It depends on the situation. Sometimes suppressing an exception that would otherwise force a crash leaves a bigger problem behind. For some exceptions, the right approach is to analyze the root cause in depth and fix the problem there, rather than simply catching everything.
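As a minimal illustration of the uncaughtException path, the sketch below installs a default handler with the standard `Thread.setDefaultUncaughtExceptionHandler` API. It records and prints the call stack, then passes the exception on to the previously installed handler instead of swallowing it. The class name `CrashLogger` and its methods are hypothetical. On Android, the previous handler is normally the one that reports the crash and ends the process.

```java
/** Hypothetical handler: log the crash, then delegate to the previous handler. */
final class CrashLogger implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous;
    private volatile Throwable lastCrash;

    CrashLogger(Thread.UncaughtExceptionHandler previous) {
        this.previous = previous; // may be null if no handler was installed before
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        lastCrash = error;
        // Output the call stack so the root cause can be analyzed later.
        System.err.println("Uncaught exception in thread " + thread.getName());
        error.printStackTrace();
        // Do not hide the crash: let the previous handler deal with it.
        if (previous != null) {
            previous.uncaughtException(thread, error);
        }
    }

    /** Returns the most recent exception seen by this handler, or null. */
    Throwable lastCrash() {
        return lastCrash;
    }

    /** Installs a CrashLogger as the process-wide default handler. */
    static CrashLogger install() {
        CrashLogger logger = new CrashLogger(Thread.getDefaultUncaughtExceptionHandler());
        Thread.setDefaultUncaughtExceptionHandler(logger);
        return logger;
    }
}
```

Calling `CrashLogger.install()` once at startup is enough. Delegating to the previous handler keeps the crash visible, in line with the advice not to simply catch all exceptions.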

A Native crash (see Understanding the Native Crash Processing Flow) is a crash caused by the process receiving a signal. When the process receives the signal, its signal handler runs and sends information over a socket to the debuggerd process. On receiving the event, debuggerd attaches to the target process through ptrace and collects key information such as CPU state, memory, and traces. Native crashes have many causes. The most common is SIGSEGV, a segmentation fault, which usually indicates a memory error such as accessing an address without sufficient permissions.

A Kernel-layer crash belongs to a broad category that is difficult to analyze. Many cases are caused by hardware, such as CPU problems or hardware driver problems.

In this article, I have covered the basics of Android system stability. I will write more articles on this topic, so please keep following, and give a clap if you liked this one.

I hope you enjoyed this introduction to basic stability in the Android system. If you have any comments or questions, please join the discussion below!

Thanks for the support :)

Budhdi Sharma

As an AOSP developer, I specialize in creating robust framework and system applications that seamlessly integrate with embedded systems on various SOCs