
Does SparkSQL support subqueries?

AryanSethi · spark · 2022-5-7 16:40 · 18 views

The question

I am running this query in the Spark shell, but it gives me an error:

sqlContext.sql("select sal from samplecsv where sal < (select MAX(sal) from samplecsv)").collect().foreach(println)

Error:

java.lang.RuntimeException: [1.47] failure: ``)'' expected but identifier MAX found

select sal from samplecsv where sal < (select MAX(sal) from samplecsv)
                                              ^
    at scala.sys.package$.error(package.scala:27)

Can anybody explain this? Thanks.

The answer

Planned features:

  • SPARK-23945 (Column.isin() should accept a single-column DataFrame as input).
  • SPARK-18455 (General support for correlated subquery processing).

Spark 2.0+

Spark SQL should support both correlated and uncorrelated subqueries. See SubquerySuite for details. Some examples include:

select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)
select * from l where l.a in (select c from r)
select * from l where a not in (select c from r)
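For reference, here is a minimal spark-shell sketch (Spark 2.0+) that exercises two of these forms. The tables l and r, their columns, and the sample rows are assumptions for illustration only; they are not from the original question.

import spark.implicits._  // already in scope in spark-shell

// Hypothetical sample tables whose columns a and c mirror the queries above.
val l = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
val r = Seq((1, "p"), (3, "q")).toDF("c", "d")
l.createOrReplaceTempView("l")
r.createOrReplaceTempView("r")

// Correlated EXISTS and uncorrelated NOT IN, both in the WHERE clause.
spark.sql("select * from l where exists (select * from r where l.a = r.c)").show()
spark.sql("select * from l where a not in (select c from r)").show()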

Unfortunately, as of now (Spark 2.0), it is impossible to express the same logic using the DataFrame DSL.
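The usual workaround in the DSL is to restructure the query: an uncorrelated scalar subquery can be materialized as a plain value first, and NOT IN can be approximated with a left anti join (note that left_anti does not reproduce NOT IN's null semantics exactly). A sketch under those assumptions, reusing the hypothetical l and r tables above and assuming samplecsv is a DataFrame with a numeric sal column:

import spark.implicits._
import org.apache.spark.sql.functions.max

// Emulate: select sal from samplecsv where sal < (select MAX(sal) from samplecsv)
// by computing the scalar aggregate up front.
val maxSal = samplecsv.agg(max("sal")).first().get(0)
samplecsv.where($"sal" < maxSal).select("sal").show()

// Emulate: select * from l where a not in (select c from r)
l.join(r, $"a" === $"c", "left_anti").show()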

Spark < 2.0

Spark supports subqueries in the FROM clause (same as Hive <= 0.12).

SELECT col FROM (SELECT * FROM t1 WHERE bar) t2 
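In the DataFrame DSL this kind of nesting is trivial, since the inner SELECT ... WHERE is just a filter followed by a select. A one-line sketch, assuming a DataFrame t1 with a boolean column bar:

// Equivalent of: SELECT col FROM (SELECT * FROM t1 WHERE bar) t2
val t2 = t1.filter("bar").select("col")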

It simply doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (in particular correlated subqueries) cannot be expressed with Spark without promoting them to a Cartesian join.

Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using a JOIN, there is no loss of functionality here.
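Applied to the question's query, the JOIN rewrite looks like the sketch below. The derived table holds a single row, so the cross join stays cheap; whether this exact syntax parses in Spark < 2.0 depends on the dialect in use (plain SQLContext vs. HiveContext).

sqlContext.sql("""
  SELECT s.sal
  FROM samplecsv s CROSS JOIN (SELECT MAX(sal) AS max_sal FROM samplecsv) m
  WHERE s.sal < m.max_sal
""").collect().foreach(println)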

