Is your feature request related to a problem or challenge?
No response
Describe the solution you'd like
Currently, the CommonSubexprEliminate LogicalPlan optimizer rule analyzes common sub-expressions in a query. It then caches each common sub-expression by adding a LogicalPlan::Projection if it thinks this is beneficial.
As an example, the following query
SELECT c3+c4, SUM(c3+c4) OVER(order by c3+c4)
FROM t
generates the following LogicalPlan:
Projection: t.c3 + t.c4, SUM(t.c3 + t.c4) ORDER BY [t.c3 + t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--WindowAggr: windowExpr=[[SUM(CAST(t.c3 + t.c4t.c4t.c3 AS t.c3 + t.c4 AS Int64)) ORDER BY [t.c3 + t.c4t.c4t.c3 AS t.c3 + t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS SUM(t.c3 + t.c4) ORDER BY [t.c3 + t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
----Projection: t.c3 + t.c4 AS t.c3 + t.c4t.c4t.c3, t.c3, t.c4
------TableScan: t projection=[c3, c4]
where t.c3+t.c4 is calculated once in the lower Projection and then referenced by the subsequent WindowAggr as a column.
However, the following query:
SELECT c3+c4, SUM(c3+c4) OVER()
FROM t
generates the following LogicalPlan:
Projection: t.c3 + t.c4, SUM(t.c3 + t.c4) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
--WindowAggr: windowExpr=[[SUM(CAST(t.c3 + t.c4 AS Int64)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
----TableScan: t projection=[c3, c4]
Instead, we could generate the following plan:
Projection: col(t.c3 + t.c4), SUM(t.c3 + t.c4) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
--WindowAggr: windowExpr=[[SUM(CAST(col(t.c3 + t.c4) AS t.c3 + t.c4 AS Int64)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
----Projection: t.c3 + t.c4 AS col(t.c3 + t.c4)
------TableScan: t projection=[c3, c4]
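In the second query, t.c3 + t.c4 appears once in the Projection and once in the WindowAggr, so it only shows up as common when occurrences are counted across nodes. If we were to keep track of common sub-expression counts globally across different nodes in the LogicalPlan, we could generate better LogicalPlans.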
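The idea of counting sub-expressions globally can be sketched roughly as follows. This is a minimal illustration, not DataFusion code: the Expr and Plan types and the functions count_expr, count_plan, and global_common_exprs are hypothetical stand-ins for the real Expr and LogicalPlan, and the sketch ignores details a real rule would need to handle (aliases, casts, volatile expressions, schema checks).

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Add(Box<Expr>, Box<Expr>),
    Sum(Box<Expr>),
}

/// Toy plan tree (hypothetical; not DataFusion's `LogicalPlan`).
enum Plan {
    Scan,
    Projection { exprs: Vec<Expr>, input: Box<Plan> },
    Window { exprs: Vec<Expr>, input: Box<Plan> },
}

/// Count every non-trivial sub-expression of `expr` into `counts`.
fn count_expr(expr: &Expr, counts: &mut HashMap<Expr, usize>) {
    match expr {
        // Bare column references are not worth caching, so skip them.
        Expr::Column(_) => return,
        Expr::Add(l, r) => {
            count_expr(l, counts);
            count_expr(r, counts);
        }
        Expr::Sum(inner) => count_expr(inner, counts),
    }
    *counts.entry(expr.clone()).or_insert(0) += 1;
}

/// Walk the whole plan and accumulate counts into ONE shared map,
/// rather than starting from an empty map at each node.
fn count_plan(plan: &Plan, counts: &mut HashMap<Expr, usize>) {
    match plan {
        Plan::Scan => {}
        Plan::Projection { exprs, input } | Plan::Window { exprs, input } => {
            for e in exprs {
                count_expr(e, counts);
            }
            count_plan(input, counts);
        }
    }
}

/// Sub-expressions that occur more than once anywhere in the plan.
fn global_common_exprs(plan: &Plan) -> Vec<Expr> {
    let mut counts = HashMap::new();
    count_plan(plan, &mut counts);
    counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(e, _)| e)
        .collect()
}
```

For a toy plan shaped like the second query (a Projection containing c3 + c4 over a Window containing SUM(c3 + c4)), each node on its own sees c3 + c4 only once, while the shared map sees it twice and reports it as a candidate for a cached Projection below the WindowAggr.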
Describe alternatives you've considered
No response
Additional context
No response