- Free Software + functional programming
- 1st time speaker @ FOSDEM
- Maintains displaylink-rpm, GPUL and Trackula
Who is OpenShine?
- Small consultancy firm
- Based in Madrid, Spain
- Remote friendly
Elasticsearch queries…
```json
{
  "aggs": {
    "by_month": {
      "date_histogram": { "field": "date", "interval": "month" },
      "aggs": {
        "headquarters": {
          "terms": { "field": "headquarters" },
          "aggs": {
            "headcount": { "cardinality": { "field": "userId" } }
          }
        }
      }
    }
  }
}
```
How much does an Elasticsearch query impact cluster performance?
Answer questions about code without running it.
Not only answering questions, but also modifying the code.
E.g., -O3 in the GNU Compiler Collection.
E.g., E. Albert, P. Arenas, S. Genaim, G. Puebla, and D. Zanardini. 2007. Cost Analysis of Java Bytecode. In Proceedings of the 16th European Symposium on Programming (ESOP'07), Rocco De Nicola (Ed.). Springer-Verlag, Berlin, Heidelberg, 157-172.
GCC -march / -mtune
-march
GCC uses the given name to determine which instructions it can emit when generating assembly code.
Restricts the set of instructions.
-mtune
Specifies the name of the target processor for which GCC should tune the performance of the code.
Choose the best instructions for a given set of operations.
Elasticsearch queries are not exactly compiled: they are parsed into a Lucene query AST.
Our cost analysis is generated from the parse tree directly.
The parse tree is recursive: each node's cost is computed from its children's costs in a single pass over the tree, so the analysis is \(O(n)\) in the number of nodes.
More sophisticated analyses are possible.
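A minimal sketch of the single-pass idea, assuming a simplified tree: `Node`, `fold` and the two toy measures are hypothetical illustrations, not ESCOVA's or Elasticsearch's classes. A post-order fold calls the combining function once per node, each time with the already-computed results of that node's children, so the work is linear in the number of nodes.

```scala
// Hypothetical, simplified query tree (not ESCOVA's or Elasticsearch's types).
final case class Node(kind: String, children: List[Node])

object SinglePassCost {
  // Post-order fold: children are reduced first, then combined with the node
  // itself. Each node is visited exactly once, hence O(n).
  def fold[B](node: Node)(f: (Node, List[B]) => B): B =
    f(node, node.children.map(child => fold(child)(f)))

  // Toy measure 1: number of nodes in the tree.
  def nodeCount(root: Node): Int =
    fold[Int](root)((_, childCounts) => 1 + childCounts.sum)

  // Toy measure 2: height of the tree (a leaf has height 1).
  def height(root: Node): Int =
    fold[Int](root)((_, childHeights) => 1 + childHeights.foldLeft(0)((a, b) => math.max(a, b)))

  // terms -> { date_histogram, cardinality }
  val example: Node =
    Node("terms", List(Node("date_histogram", Nil), Node("cardinality", Nil)))

  def main(args: Array[String]): Unit = {
    println(nodeCount(example)) // 3
    println(height(example))    // 2
  }
}
```

A real cost analysis combines costs rather than counts, but the traversal keeps the same shape.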
The cost grammar.
Each node's cost depends only on the node itself and its children.
Our grammar:
```
GET _searchv?size=0
{
  "aggs": {                    /* root aggregation */
    "sessions": {              /* sub-aggregation */
      "terms": { "field": "sessionId", "size": 10 },
      "aggs": {
        "byDay": {             /* leaf aggregation */
          "date_histogram": { "field": "@date", "interval": "day" }
        }
      }
    }
  }
}
```
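To make the grammar concrete, here is a minimal sketch that models the example request as a tree, assuming hypothetical `Root` and `Aggregation` types (ESCOVA itself works on Elasticsearch's `AggregationBuilder` objects): the top-level `aggs` object is the root, `sessions` is a sub-aggregation, and `byDay`, having no children, is a leaf.

```scala
// Hypothetical model of the grammar in the example request; not ESCOVA's classes.
sealed trait AggNode { def children: List[Aggregation] }

// The top-level "aggs" object of a search request.
final case class Root(children: List[Aggregation]) extends AggNode

// A named aggregation such as "sessions" (terms) or "byDay" (date_histogram).
final case class Aggregation(
    name: String,
    kind: String,
    params: Map[String, String],
    children: List[Aggregation]
) extends AggNode

object CostGrammar {
  // A leaf aggregation has no sub-aggregations.
  def isLeaf(a: Aggregation): Boolean = a.children.isEmpty

  // The request body of the example, written as a tree.
  val example: Root = Root(List(
    Aggregation("sessions", "terms", Map("field" -> "sessionId", "size" -> "10"),
      List(Aggregation("byDay", "date_histogram",
        Map("field" -> "@date", "interval" -> "day"), Nil)))
  ))

  // Names of all leaf aggregations, collected in one traversal.
  def leaves(node: AggNode): List[String] = node match {
    case a: Aggregation if isLeaf(a) => List(a.name)
    case other                       => other.children.flatMap(leaves)
  }

  def main(args: Array[String]): Unit =
    println(leaves(example)) // List(byDay)
}
```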
Not available: Elastic does not expose enough API surface.
The search engine would need to expose "compiled ASTs": a stage where each step can be traced effectively to the execution model, à la Spark's Catalyst Optimizer.
Talk is cheap. Show me the code. – Linus Torvalds
Elastic does not even provide basic access to the parsed information.
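```scala
import scala.collection.JavaConverters._
import org.elasticsearch.search.aggregations.{AggregationBuilder, AggregatorFactories}

/**
 * Hackity hack. For some reason, the Elastic team has not provided the
 * only useful method in all the AST: recursion.
 */
def getSubAggregations(ag: AggregationBuilder): Seq[AggregationBuilder] = {
  import implicits._
  // Read the non-public `factoriesBuilder` field, which holds the sub-aggregations.
  val factoriesBuilder =
    ag.getPrivateFieldValue[AggregatorFactories.Builder]("factoriesBuilder")
  factoriesBuilder.getAggregatorFactories.asScala.toSeq
}
```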
So we reimplement the access we need ourselves, via reflection. Here be dragons.
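```scala
import java.lang.reflect.Field
import java.security.{AccessController, PrivilegedAction}
import scala.reflect.ClassTag

object implicits {
  implicit class PrivateMethodAccessor[A](val instance: A)(implicit c: ClassTag[A]) {
    // Reads a non-public field declared on A (the static type of `instance`).
    def getPrivateFieldValue[B](fieldName: String): B =
      AccessController.doPrivileged(new PrivilegedAction[B] {
        override def run(): B = {
          val f: Field = c.runtimeClass.getDeclaredField(fieldName)
          f.setAccessible(true) // bypasses Java access checks
          f.get(instance).asInstanceOf[B]
        }
      })
  }
}
```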
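One caveat of the accessor above, sketched here without Elasticsearch on the classpath: `getDeclaredField` only sees fields declared on the exact class it is called on, and the `ClassTag` reflects the static type of the receiver. If that static type is a subclass while the field lives in a superclass, the lookup throws `NoSuchFieldException`. The variant below walks up the superclass chain instead; `FieldLookup`, `BaseBuilder` and `ConcreteBuilder` are hypothetical names. Inside an ES plugin it would still need the permissions listed next.

```scala
import java.lang.reflect.Field

object FieldLookup {
  // getDeclaredField only sees fields declared on that exact class, so walk
  // up the superclass chain until a field with that name is found.
  @annotation.tailrec
  def findField(cls: Class[_], name: String): Option[Field] =
    if (cls == null) None
    else {
      val found = cls.getDeclaredFields.find(_.getName == name)
      if (found.isDefined) found else findField(cls.getSuperclass, name)
    }

  // Reads the field from the instance's runtime class or any of its superclasses.
  def readField[B](instance: AnyRef, name: String): Option[B] =
    findField(instance.getClass, name).map { f =>
      f.setAccessible(true) // bypasses Java access checks
      f.get(instance).asInstanceOf[B]
    }
}

// Hypothetical stand-ins: the field is declared on the base class only.
class BaseBuilder { protected val factories: List[String] = List("terms") }
class ConcreteBuilder extends BaseBuilder

object FieldLookupDemo {
  def main(args: Array[String]): Unit = {
    // Fails: the field is not declared on ConcreteBuilder itself.
    println(scala.util.Try(classOf[ConcreteBuilder].getDeclaredField("factories")).isFailure) // true
    println(FieldLookup.readField[List[String]](new ConcreteBuilder, "factories")) // Some(List(terms))
  }
}
```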
When used as an ES plugin, this requires granting fairly strong security permissions:
```
grant {
  permission java.lang.RuntimePermission "accessDeclaredMembers";
  permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
};
```
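```scala
case agg: DateHistogramAggregationBuilder =>
  // Finer intervals produce more buckets, so they cost more.
  val dateExpr = agg.dateHistogramInterval.toString
  1000 / dateMathExpressionToSeconds(dateExpr)
case agg: DateRangeAggregationBuilder =>
  import scala.collection.JavaConverters._
  // Cost grows with the total time span covered by the ranges.
  agg.ranges().asScala
    .map(range => analyzeRangeDates("to", range) - analyzeRangeDates("from", range))
    .sum
case _ =>
  // Any other aggregation: the configured default cost.
  configForNode.defaultNodeCost
```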
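The date-histogram case charges 1000 divided by the interval length in seconds, so finer intervals, which produce more buckets over the same time span, cost more. A minimal sketch of that rule, assuming a hypothetical `intervalSeconds` in place of ESCOVA's `dateMathExpressionToSeconds` (whose parsing rules are not shown here): it understands a few calendar unit names and simple fixed intervals like `30m`, leaves out months and years because their length varies, and uses floating-point division.

```scala
object DateHistogramCost {
  // Hypothetical stand-in for dateMathExpressionToSeconds.
  private val unitSeconds: Map[String, Long] =
    Map("s" -> 1L, "m" -> 60L, "h" -> 3600L, "d" -> 86400L)

  private val calendarSeconds: Map[String, Long] =
    Map("second" -> 1L, "minute" -> 60L, "hour" -> 3600L, "day" -> 86400L, "week" -> 604800L)

  // Matches the whole expression, e.g. "30m" or "2h".
  private val Fixed = """(\d+)([smhd])""".r

  def intervalSeconds(expr: String): Long = expr match {
    case Fixed(n, unit)                         => n.toLong * unitSeconds(unit)
    case name if calendarSeconds.contains(name) => calendarSeconds(name)
    case other => throw new IllegalArgumentException(s"Unsupported interval: $other")
  }

  // Same shape as the rule above: 1000 divided by the bucket width in seconds.
  def nodeCost(expr: String): Double = 1000.0 / intervalSeconds(expr)

  def main(args: Array[String]): Unit = {
    println(nodeCost("1000s"))                    // 1.0
    println(nodeCost("minute") > nodeCost("day")) // true
  }
}
```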
```scala
// A node's total cost: its own cost times (1 + the sum of its children's costs).
val childrenCost = tree.children.map(_.cost).foldLeft(1d)(_ + _)
val totalNodeCost = nodeCost * childrenCost
```
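A worked sketch of the combination rule above, with made-up node costs: the `CostNode` type and the numbers are hypothetical, and 10 merely echoes the `terms` override in the sample configuration. Because each level multiplies its own cost by its children's totals, nesting grows the cost multiplicatively, so the rule penalizes deeply nested aggregation trees.

```scala
// Hypothetical cost tree: each node carries its own node cost.
final case class CostNode(kind: String, nodeCost: Double, children: List[CostNode])

object CostCombination {
  // totalCost = nodeCost * (1 + sum of the children's total costs).
  def totalCost(tree: CostNode): Double = {
    val childrenCost = tree.children.map(totalCost).foldLeft(1d)(_ + _)
    tree.nodeCost * childrenCost
  }

  // terms (10) -> leaf (2):               10 * (1 + 2)  = 30
  val flat = CostNode("terms", 10, List(CostNode("date_histogram", 2, Nil)))
  // terms (10) -> terms (10) -> leaf (2): 10 * (1 + 30) = 310
  val nested = CostNode("terms", 10, List(flat))

  def main(args: Array[String]): Unit = {
    println(totalCost(flat))   // 30.0
    println(totalCost(nested)) // 310.0
  }
}
```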
All deployment modes are configured the same way, through config.yaml:
```yaml
# config.yaml
default.maxTreeHeight: 3
default.whitelistChildren: ["terms", "date_histogram"]
custom.terms.defaultNodeCost: 10
custom.date_histogram.whitelistChildren: ["terms"]
```
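One plausible reading of these keys, sketched under assumptions: `custom.<aggregation>.<key>` overrides `default.<key>`, `maxTreeHeight` bounds the depth of the aggregation tree, and `whitelistChildren` lists the aggregation types allowed directly under a node. `CostConfig`, `Agg` and `violations` are hypothetical names for illustration; ESCOVA's actual loader and semantics may differ.

```scala
// Hypothetical aggregation tree: just the aggregation type and its children.
final case class Agg(kind: String, children: List[Agg])

// Hypothetical in-memory form of config.yaml, split by value type.
final case class CostConfig(numbers: Map[String, Double], lists: Map[String, List[String]]) {
  // "custom.<kind>.<key>" wins over "default.<key>".
  private def resolve[A](m: Map[String, A], kind: String, key: String): Option[A] =
    m.get(s"custom.$kind.$key").orElse(m.get(s"default.$key"))

  def nodeCost(kind: String): Option[Double] = resolve(numbers, kind, "defaultNodeCost")
  def whitelist(kind: String): Option[List[String]] = resolve(lists, kind, "whitelistChildren")
  def maxTreeHeight: Option[Int] = numbers.get("default.maxTreeHeight").map(_.toInt)
}

object ConfigCheck {
  // The sample config.yaml, as data.
  val sample = CostConfig(
    numbers = Map("default.maxTreeHeight" -> 3.0, "custom.terms.defaultNodeCost" -> 10.0),
    lists = Map(
      "default.whitelistChildren" -> List("terms", "date_histogram"),
      "custom.date_histogram.whitelistChildren" -> List("terms")
    )
  )

  def height(a: Agg): Int =
    1 + a.children.map(height).foldLeft(0)((x, y) => math.max(x, y))

  // Human-readable violations; an empty list means the query passes.
  def violations(root: Agg, conf: CostConfig): List[String] = {
    val h = height(root)
    val tooTall = conf.maxTreeHeight
      .filter(max => h > max)
      .map(max => s"tree height $h exceeds $max")
      .toList

    def badChildren(a: Agg): List[String] = {
      val allowed = conf.whitelist(a.kind) // None means no restriction
      val rejected = a.children.filterNot(c => allowed.forall(_.contains(c.kind)))
      rejected.map(c => s"${c.kind} not allowed under ${a.kind}") ++ a.children.flatMap(badChildren)
    }

    tooTall ++ badChildren(root)
  }

  def main(args: Array[String]): Unit = {
    val ok = Agg("terms", List(Agg("date_histogram", List(Agg("terms", Nil)))))
    println(violations(ok, sample)) // List()
    println(violations(Agg("date_histogram", List(Agg("date_histogram", Nil))), sample))
  }
}
```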
ESCOVA is a library, with sub-projects to deploy it as a service that:
- exposes /_searchv
- forwards other calls to $ELASTICSEARCH_BACKEND, if available
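A minimal sketch of the routing decision such a standalone deployment implies, assuming (hypothetically) that requests to a `_searchv` endpoint are answered by ESCOVA and everything else is forwarded to `$ELASTICSEARCH_BACKEND` when it is set, or rejected otherwise. `ProxyRouting` and its `Route` cases are illustrative names, not ESCOVA's code.

```scala
object ProxyRouting {
  sealed trait Route
  case object Escova extends Route                        // handled by the cost analysis
  final case class Forward(backend: String) extends Route // proxied to Elasticsearch
  case object Rejected extends Route                      // no backend configured

  // Decides where a request goes, based only on its path.
  def route(path: String, backend: Option[String]): Route = {
    val lastSegment = path.takeWhile(_ != '?').split('/').lastOption
    if (lastSegment.contains("_searchv")) Escova
    else backend match {
      case Some(url) => Forward(url)
      case None      => Rejected
    }
  }

  def main(args: Array[String]): Unit = {
    val backend = sys.env.get("ELASTICSEARCH_BACKEND")
    println(route("/_searchv?size=0", backend)) // Escova
    println(route("/logs/_search", backend))
  }
}
```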
A Helm chart is available in the repo, configurable through values.yaml:
```yaml
service:
  name: escova
  type: ClusterIP
  externalPort: 9200
backend: # Elasticsearch backend
  enabled: false # Redirect non-escova calls to backend ES
  host: my-internal-elasticsearch
  port: 9200
# Example configuration limiting the tree size
config: |
  default.defaultNodeCost: 1
  default.maxTreeSize: 4
```
Q & A / Thank you!