You have developed your first Apache Apex application, perhaps written integration tests that pass in your IDE in embedded mode, and are now ready to take the application for a spin on a cluster. Maybe you have an Apache Hadoop YARN cluster on Amazon EMR, Google Cloud Dataproc or another environment and would like to use it for this purpose?

In order to launch Apex applications on YARN, an Apex client is required. Apache Apex comes with the Apex CLI, a command line tool that can be used to launch and manage running applications. The official Apache Apex releases provide the source code, but no binary distribution.

If you don’t want to (or cannot!) compile the source code to make the CLI available on your cluster, you can obtain pre-built binaries for the releases from the apex-cli-package releases page on GitHub (some other download options are also available).

For example:

curl -LSO https://github.com/atrato/apex-cli-package/releases/download/v3.5.0/apex-cli-package-3.5.0-bin.zip
unzip apex-cli-package-3.5.0-bin.zip
./apex-cli-package-3.5.0/bin/apex

Type help at the prompt to get a list of commands (or see the documentation for more information).

The YARN services provide some information about containers and logs, but YARN is agnostic to the application framework, so on its own it is not sufficient to gain detailed visibility into the execution.

For example, we may want to use list-containers to check whether the containers are executing as expected:

apex (application_1492303103790_0001) > list-containers
{"containers": [
  {
    "id": "container_1492303103790_0001_01_000001",
    "host": "apex-sandbox:45013",
    "state": "ACTIVE",
    "jvmName": "1474@apex-sandbox",
    "lastHeartbeat": "-1",
    "numOperators": "0",
    "operators": null,
    "memoryMBAllocated": "1024",
    "memoryMBFree": "231",
    "gcCollectionTime": "0",
    "gcCollectionCount": "0",
    "containerLogsUrl": "http:\/\/apex-sandbox:8042\/node\/containerlogs\/container_1492303103790_0001_01_000001\/apex",
    "startedTime": "1492303174918",
    "finishedTime": "-1",
    "rawContainerLogsUrl": "http:\/\/apex-sandbox:8042\/logs\/containers\/application_1492303103790_0001\/container_1492303103790_0001_01_000001"
  },
  {
    "id": "container_1492303103790_0001_01_000002",
    "host": "apex-sandbox:45013",
    "state": "ACTIVE",
    "jvmName": "1631@apex-sandbox",
    "lastHeartbeat": "1492303193193",
    "numOperators": "1",
    "operators": {"1": "rand"},
    "memoryMBAllocated": "1024",
    "memoryMBFree": "131",
    "gcCollectionTime": "446",
    "gcCollectionCount": "5",
    "containerLogsUrl": "http:\/\/apex-sandbox:8042\/node\/containerlogs\/container_1492303103790_0001_01_000002\/apex",
    "startedTime": "1492303186523",
    "finishedTime": "-1",
    "rawContainerLogsUrl": "http:\/\/apex-sandbox:8042\/logs\/containers\/application_1492303103790_0001\/container_1492303103790_0001_01_000002"
  },
  {
    "id": "container_1492303103790_0001_01_000004",
    "host": "apex-sandbox:45013",
    "state": "ACTIVE",
    "jvmName": "1727@apex-sandbox",
    "lastHeartbeat": "1492303192991",
    "numOperators": "1",
    "operators": {"3": "console"},
    "memoryMBAllocated": "1024",
    "memoryMBFree": "84",
    "gcCollectionTime": "65",
    "gcCollectionCount": "4",
    "containerLogsUrl": "http:\/\/apex-sandbox:8042\/node\/containerlogs\/container_1492303103790_0001_01_000004\/apex",
    "startedTime": "1492303188663",
    "finishedTime": "-1",
    "rawContainerLogsUrl": "http:\/\/apex-sandbox:8042\/logs\/containers\/application_1492303103790_0001\/container_1492303103790_0001_01_000004"
  },
  {
    "id": "container_1492303103790_0001_01_000003",
    "host": "apex-sandbox:45013",
    "state": "ACTIVE",
    "jvmName": "1679@apex-sandbox",
    "lastHeartbeat": "1492303192261",
    "numOperators": "1",
    "operators": {"2": "picalc"},
    "memoryMBAllocated": "1024",
    "memoryMBFree": "153",
    "gcCollectionTime": "393",
    "gcCollectionCount": "5",
    "containerLogsUrl": "http:\/\/apex-sandbox:8042\/node\/containerlogs\/container_1492303103790_0001_01_000003\/apex",
    "startedTime": "1492303187455",
    "finishedTime": "-1",
    "rawContainerLogsUrl": "http:\/\/apex-sandbox:8042\/logs\/containers\/application_1492303103790_0001\/container_1492303103790_0001_01_000003"
  }
]}
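
Output like this can also be checked by a script rather than by eye. The following Python sketch is one possible way to do that: it takes the JSON text printed by list-containers and reports containers that are not ACTIVE or that have little free memory. The field names are taken from the output above; the function name flag_containers and the 100 MB threshold are assumptions made for illustration and are not part of Apex.

```python
import json


def flag_containers(listing, min_free_mb=100):
    """Return (container id, reason) pairs worth a closer look.

    `listing` is the JSON text printed by list-containers. The threshold
    is an arbitrary illustrative value, not an Apex default.
    """
    findings = []
    for c in json.loads(listing)["containers"]:
        if c["state"] != "ACTIVE":
            findings.append((c["id"], "state is " + c["state"]))
        # The CLI prints numbers as strings, so convert before comparing.
        if int(c["memoryMBFree"]) < min_free_mb:
            findings.append((c["id"], "only " + c["memoryMBFree"] + " MB free"))
    return findings


# Two entries from the listing above, reduced to the fields that are used.
sample = """{"containers": [
  {"id": "container_1492303103790_0001_01_000002", "state": "ACTIVE", "memoryMBFree": "131"},
  {"id": "container_1492303103790_0001_01_000004", "state": "ACTIVE", "memoryMBFree": "84"}
]}"""

for container_id, reason in flag_containers(sample):
    print(container_id, reason)
```

A finding from a check like this is only a hint about where to look; the log files linked in containerLogsUrl remain the place to confirm what is actually going on.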

The container status and frequently changing container IDs may indicate that something is wrong and are a reason to look into the log files. For each container we can also see the operators that are deployed in it. More information about operators can be obtained using the list-operators command (in this case with the response filtered to picalc):

apex (application_1492303103790_0001) > list-operators picalc
{"operators": [{
  "id": "2",
  "name": "picalc",
  "className": "org.apache.apex.examples.pi.PiCalculateOperator",
  "container": "container_1492303103790_0001_01_000003",
  "host": "apex-sandbox:45013",
  "totalTuplesProcessed": "25162000",
  "totalTuplesEmitted": "615",
  "tuplesProcessedPSMA": "87288",
  "tuplesEmittedPSMA": "2",
  "cpuPercentageMA": "3.007415657392253",
  "latencyMA": "6",
  "status": "ACTIVE",
  "lastHeartbeat": "1492303481306",
  "failureCount": "0",
  "recoveryWindowId": "6409393328046998103",
  "currentWindowId": "6409393328046998119",
  "ports": [
    {
      "name": "input",
      "type": "input",
      "totalTuples": "25162000",
      "tuplesPSMA": "87288",
      "bufferServerBytesPSMA": "834000",
      "queueSizeMA": "1001",
      "recordingId": null
    },
    {
      "name": "output",
      "type": "output",
      "totalTuples": "615",
      "tuplesPSMA": "2",
      "bufferServerBytesPSMA": "38",
      "queueSizeMA": "0",
      "recordingId": null
    }
  ],
  "unifierClass": null,
  "logicalName": "picalc",
  "recordingId": null,
  "counters": null,
  "metrics": {},
  "checkpointStartTime": "1492303473094",
  "checkpointTime": "68",
  "checkpointTimeMA": "133"
}]}

Various metrics provide insight into the execution of individual operators. The currentWindowId should progress continuously, and the tuple count metrics indicate how much data was processed (for a source operator, the emitted tuples may reflect how many elements were consumed). If an application “does not work”, it is typically best to start the analysis with the input operators and work downstream from there.
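
To make the "should progress continuously" check concrete, here is a minimal Python sketch that compares two list-operators snapshots taken some time apart. It relies only on the id, name, currentWindowId and totalTuplesProcessed fields shown in the output above. The function name window_progress is hypothetical, and the values in the second snapshot are invented for illustration; they are not real measurements.

```python
import json


def window_progress(earlier, later):
    """Compare two list-operators snapshots taken some time apart.

    Returns {operator name: (window id advanced?, tuples processed in between)}.
    """
    before = {op["id"]: op for op in json.loads(earlier)["operators"]}
    report = {}
    for op in json.loads(later)["operators"]:
        prev = before.get(op["id"])
        if prev is None:
            continue  # operator not present in the earlier snapshot
        advanced = int(op["currentWindowId"]) > int(prev["currentWindowId"])
        processed = int(op["totalTuplesProcessed"]) - int(prev["totalTuplesProcessed"])
        report[op["name"]] = (advanced, processed)
    return report


# First snapshot: values from the output above.
snapshot_1 = """{"operators": [{"id": "2", "name": "picalc",
  "currentWindowId": "6409393328046998119", "totalTuplesProcessed": "25162000"}]}"""
# Second snapshot: hypothetical values, made up for this example.
snapshot_2 = """{"operators": [{"id": "2", "name": "picalc",
  "currentWindowId": "6409393328046998139", "totalTuplesProcessed": "26908000"}]}"""

print(window_progress(snapshot_1, snapshot_2))
```

An operator whose window ID does not advance between snapshots, or whose processed count stays flat while upstream operators keep emitting, would be a natural place to start looking.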

The Apex CLI performs its operations by talking to the YARN resource manager and to the Apex application master through a REST API. It is possible to write other tools against that same application master REST API for custom monitoring and diagnostics. It is also possible to provide operator-specific metrics, which will be available in the operator details along with the system metrics.
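
As a starting point for such a tool, the Python sketch below builds the URL for an application master resource as reached through the YARN resource manager web proxy and fetches it as JSON. Several things here are assumptions to verify against the Apex documentation for your version: the /ws/v2/stram path prefix, the physicalPlan/operators resource name, and the resource manager web address (port 8088 on the sandbox host is assumed). The helper names stram_url and fetch_json are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed path prefix of the application master REST API; verify it
# against the Apex documentation for the version in use.
STRAM_API_PREFIX = "/ws/v2/stram"


def stram_url(proxy_base, app_id, resource):
    """Build a URL for an application master resource behind the YARN proxy."""
    return "{}/proxy/{}{}/{}".format(
        proxy_base.rstrip("/"), app_id, STRAM_API_PREFIX, resource.lstrip("/"))


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed resource manager address; the application ID is the one used above.
url = stram_url("http://apex-sandbox:8088", "application_1492303103790_0001",
                "physicalPlan/operators")
print(url)
# On a running cluster: operators = fetch_json(url)
```

Going through the resource manager proxy avoids having to discover the host and port of the application master, which can change when the application master is restarted.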

Check out the Apache Apex docs for more info on how to get started with Apex.