官网原文:http://www.yiiframework.com/doc/guide/1.1/zh_cn/database.arr

原文出处:http://tech.pro/tutorial/1555/10-easy-steps-to-a-complete-understanding-of-sql(已经失效,现在收集如下)

官网中后半段为英文,而且中文的内容比英文少一些,先放到这里,之后有时间再翻译。

 

我们已经了解了怎样使用 Active Record (AR) 从单个数据表中获取数据。
在本节中,我们讲解怎样使用 AR
连接多个相关数据表并取回关联(join)后的数据集。

Too many programmers think SQL is a bit of a beast. It is one of the
few declarative
languages
 out
there, and as such, behaves in an entirely different way from
imperative, object-oriented, or even functional languages (although,
some say that SQL is also somewhat
functional
).

为了使用关系型
AR,我们建议在需要关联的表中定义主键-外键约束。这些约束可以帮助保证相关数据的一致性和完整性。

I’m writing SQL every day and embracing SQL with my Open Source
Software jOOQ
. I thus feel compelled to bring the
beauty of SQL a bit closer to those of you still struggling with it. The
following tutorial is destined for

为简单起见,我们使用如下所示的实体-关系(ER)图中的数据结构演示此节中的例子。

  • readers who have already worked with SQL but never completely
    understood it
  • readers who know SQL well but have never really thought about its
    syntax
  • readers who want to teach SQL to others

ca88手机版 1

This tutorial will focus on SELECT statements only. Other DML statements
will be covered in another tutorial. Here are…

信息: 对外键约束的支持在不同的 DBMS 中是不一样的。 SQLite <
3.6.19 不支持外键约束,但你依然可以在建表时声明约束。

10 Easy Steps to a Complete Understanding of SQL.

1. 声明关系 

在我们使用 AR 执行关联查询之前,我们需要让 AR 知道一个 AR
类是怎样关联到另一个的。

两个 AR 类之间的关系直接通过 AR 类所代表的数据表之间的关系相关联。
从数据库的角度来说,表 A 和 B
之间有三种关系:一对多(one-to-many,例如 tbl_user 和 tbl_post),一对一(
one-to-one 例如 tbl_user 和tbl_profile)和 多对多(many-to-many
例如 tbl_category 和 tbl_post)。 在 AR 中,有四种关系:

  • BELONGS_TO(属于): 如果表 A 和 B 之间的关系是一对多,则 表 B 属于
    表 A (例如 Post 属于 User);

  • HAS_MANY(有多个): 如果表 A 和 B 之间的关系是一对多,则 A 有多个
    B (例如 User 有多个 Post);

  • HAS_ONE(有一个): 这是 HAS_MANY 的一个特例,A 最多有一个 B
    (例如 User 最多有一个 Profile);

  • MANY_MANY: 这个对应于数据库中的 多对多 关系。 由于多数 DBMS
    不直接支持 多对多 关系,因此需要有一个关联表将 多对多 关系分割为
    一对多 关系。
    在我们的示例数据结构中,tbl_post_category 就是用于此目的的。在 AR
    术语中,我们可以解释 MANY_MANY 为 BELONGS_TO 和 HAS_MANY 的组合。
    例如,Post 属于多个(belongs to
    many) Category ,Category 有多个(has many) Post.

AR
中定义关系需要覆盖 CActiveRecord 中的 relations() 方法。此方法返回一个关系配置数组。每个数组元素通过如下格式表示一个单一的关系。

'VarName'=>array('RelationType', 'ClassName', 'ForeignKey', ...additional options)

其中 VarName 是关系的名字;RelationType 指定关系类型,可以是一下四个常量之一: self::BELONGS_TO,self::HAS_ONEself::HAS_MANY and self::MANY_MANYClassName 是此
AR 类所关联的 AR
类的名字;ForeignKey 指定关系中使用的外键(一个或多个)。额外的选项可以在每个关系的最后指定(稍后详述)。

以下代码演示了怎样定义 User 和 Post 类的关系:

class Post extends CActiveRecord
{
    ......

    public function relations()
    {
        return array(
            'author'=>array(self::BELONGS_TO, 'User', 'author_id'),
            'categories'=>array(self::MANY_MANY, 'Category',
                'tbl_post_category(post_id, category_id)'),
        );
    }
}

class User extends CActiveRecord
{
    ......

    public function relations()
    {
        return array(
            'posts'=>array(self::HAS_MANY, 'Post', 'author_id'),
            'profile'=>array(self::HAS_ONE, 'Profile', 'owner_id'),
        );
    }
}

信息: 外键可能是复合的,包含两个或更多个列。
这种情况下,我们应该将这些外键名字链接,中间用空格或逗号分割。对于 MANY_MANY 关系类型,
关联表的名字必须也必须在外键中指定。例如, Post中的 categories 关系由外键 tbl_post_category(post_id, category_id) 指定。

AR
类中的关系定义为每个关系向类中隐式添加了一个属性。在一个关联查询执行后,相应的属性将将被以关联的
AR 实例填充。 例如,如果 $author 代表一个 User AR 实例,
我们可以使用 $author->posts 访问其关联的Post 实例。

1. SQL is declarative

Get this into your head first. Declarative. The only paradigm where you
“just” declare the nature of the results that you would like to get.
Not how your computer shall compute those results. Isn’t that
wonderful?

SELECT first_name, last_name FROM employees WHERE salary > 100000

Easy to understand. You don’t care where employee records physically
come from. You just want those that have a decent salary.

2. 执行关联查询 

执行关联查询最简单的方法是读取一个 AR
实例中的关联属性。如果此属性以前没有被访问过,则一个关联查询将被初始化,它将两个表关联并使用当前
AR 实例的主键过滤。 查询结果将以所关联 AR
类的实例的方式保存到属性中。这就是传说中的 懒惰式加载(lazy
loading,也可译为
迟加载)
 方式,例如,关联查询只在关联的对象首次被访问时执行。
下面的例子演示了怎样使用这种方式:

// 获取 ID 为 10 的帖子
$post=Post::model()->findByPk(10);
// 获取帖子的作者(author): 此处将执行一个关联查询。
$author=$post->author;

信息: 如果关系中没有相关的实例,则相应的属性将为 null
或一个空数组。 BELONGS_TO 和 HAS_ONE关系的结果是
null, HAS_MANY 和 MANY_MANY 的结果是一个空数组。
注意, HAS_MANY 和MANY_MANY 关系返回对象数组,你需要在访问任何属性之前先遍历这些结果。
否则,你可能会收到 “Trying to get property of
non-object(尝试访问非对象的属性)” 错误。

懒惰式加载用起来很方便,但在某些情况下并不高效。如果我们想获取 N 个帖子的作者,使用这种懒惰式加载将会导致执行 N 个关联查询。
这种情况下,我们应该改为使用 渴求式加载(eager loading)方式。

渴求式加载方式会在获取主 AR 实例的同时获取关联的 AR 实例。
这是通过在使用 AR
中的 find 或 findAll 方法时配合使用
with 方法完成的。例如:

$posts=Post::model()->with('author')->findAll();

上述代码将返回一个 Post 实例的数组。与懒惰式加载方式不同,在我们访问每个 Post 实例中的 author 属性之前,它就已经被关联的 User 实例填充了。
渴求式加载通过 一个 关联查询返回所有帖子及其作者,而不是对每个帖子执行一次关联查询。

我们可以在 with() 方法中指定多个关系名字,渴求式加载将一次性全部取回他们。例如,如下代码会将帖子连同其作者和分类一并取回。

$posts=Post::model()->with('author','categories')->findAll();

我们也可以实现嵌套的渴求式加载。像下面这样,
我们传递一个分等级的关系名表达式到 with() 方法,而不是一个关系名列表:

$posts=Post::model()->with(
    'author.profile',
    'author.posts',
    'categories')->findAll();

上述示例将取回所有帖子及其作者和所属分类。它还同时取回每个作者的简介(author.profile)和帖子(author.posts)。

从版本 1.1.0
开始,渴求式加载也可以通过指定 CDbCriteria::with 的属性执行,就像下面这样:

$criteria=new CDbCriteria;
$criteria->with=array(
    'author.profile',
    'author.posts',
    'categories',
);
$posts=Post::model()->findAll($criteria);

或者

$posts=Post::model()->findAll(array(
    'with'=>array(
        'author.profile',
        'author.posts',
        'categories',
    )
);

What do we learn from this?

So if this is so simple, what’s the problem? The problem is that most of
us intuitively think in terms of imperative
programming
. As
in: “machine, do this, and then do that, but before, run a check and
fail if this-and-that”
. This includes storing temporary results in
variables, writing loops, iterating, calling functions, etc. etc.

Forget about all that. Think about how to declare things. Not
about how to tell the machine to compute things.

3. 关系型查询选项 

我们提到在关系声明时可以指定附加的选项。这些 名-值
对形式的选项用于自定义关系型查询。概括如下:

  • select: 关联的 AR 类中要选择(select)的列的列表。 默认为
    ‘*’,即选择所有列。此选项中的列名应该是已经消除歧义的。

  • condition:
    即 WHERE 条件。默认为空。此选项中的列名应该是已经消除歧义的。

  • params: 要绑定到所生成的 SQL 语句的参数。应该以 名-值
    对数组的形式赋值。此选项从 1.0.3 版起有效。

  • on: 即 ON 语句。此处指定的条件将会通过 AND 操作符附加到 join
    条件中。此选项中的列名应该是已经消除歧义的。
    此选项不会应用到 MANY_MANY 关系中。此选项从 1.0.2 版起有效。

  • order: 即 ORDER BY 语句。默认为空。
    此选项中的列名应该是已经消除歧义的。

  • with: a list of child related objects that should be loaded
    together with this object. Be aware that using this option
    inappropriately may form an infinite relation loop.

  • joinType: type of join for this relationship. It defaults
    to LEFT OUTER JOIN.

  • alias: the alias for the table associated with this relationship.
    It defaults to null, meaning the table alias is the same as the
    relation name.

  • together: whether the table associated with this relationship
    should be forced to join together with the primary table and other
    tables. This option is only meaningful
    for HAS_MANY and MANY_MANY relations. If this option is set
    false, the table associated with
    the HAS_MANY or MANY_MANY relation will be joined with the
    primary table in a separate SQL query, which may improve the overall
    query performance since less duplicated data is returned. If this
    option is set true, the associated table will always be joined with
    the primary table in a single SQL query, even if the primary table
    is paginated. If this option is not set, the associated table will
    be joined with the primary table in a single SQL query only when the
    primary table is not paginated. For more details, see the section
    “Relational Query Performance”.

  • join: the extra JOIN clause. It defaults to empty. This option
    has been available since version 1.1.3.

  • group: the GROUP BY clause. It defaults to empty. Column names
    referenced in this option should be disambiguated.

  • having: the HAVING clause. It defaults to empty. Column names
    referenced in this option should be disambiguated.

  • index: the name of the column whose values should be used as keys
    of the array that stores related objects. Without setting this
    option, an related object array would use zero-based integer index.
    This option can only be set
    for HAS_MANY and MANY_MANY relations.

  • scopes: scopes to apply. In case of a single scope can be used
    like 'scopes'=>'scopeName', in case of multiple scopes can be used
    like 'scopes'=>array('scopeName1','scopeName2'). This option has
    been available since version 1.1.9.

In addition, the following options are available for certain
relationships during lazy loading:

  • limit: limit of the rows to be selected. This option does NOT
    apply to BELONGS_TO relation.

  • offset: offset of the rows to be selected. This option does NOT
    apply to BELONGS_TO relation.

  • through: name of the model’s relation that will be used as a
    bridge when getting related data. This option has been available
    since version 1.1.7 where it can be used
    for HAS_ONE and HAS_MANY. Since 1.1.14 it can be used
    for BELONGS_TO as well.

Below we modify the posts relationship declaration in the User by
including some of the above options:

class User extends CActiveRecord
{
    public function relations()
    {
        return array(
            'posts'=>array(self::HAS_MANY, 'Post', 'author_id',
                            'order'=>'posts.create_time DESC',
                            'with'=>'categories'),
            'profile'=>array(self::HAS_ONE, 'Profile', 'owner_id'),
        );
    }
}

Now if we access $author->posts, we would obtain the author’s posts
sorted according to their creation time in descending order. Each post
instance also has its categories loaded.

2. SQL syntax is not “well-ordered”

A common source of confusion is the simple fact that SQL syntax elements
are not ordered in the way they are executed. The lexical ordering is:

  • SELECT [ DISTINCT ]
  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • UNION
  • ORDER BY

For simplicity, not all SQL clauses are listed. This lexical ordering
differs fundamentally from the logical order, i.e. from the order of
execution:

  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • SELECT
  • DISTINCT
  • UNION
  • ORDER BY

There are three things to note:

  1. FROM is the first clause, not SELECT. The first thing that happens
    is loading data from the disk into memory, in order to operate on
    such data.
  2. SELECT is executed after most other clauses. Most importantly, after
    FROM and GROUP BY. This is important to understand when you think
    you can reference stuff that you declare in the SELECT clause from
    the WHERE clause. The following is not possible:

    SELECT A.x + A.y AS z
    FROM A
    WHERE z = 10 -- z is not available here!
    

    If you wanted to reuse z, you have two options. Either repeat the
    expression:

    SELECT A.x + A.y AS z
    FROM A
    WHERE (A.x + A.y) = 10
    

    … or you resort to derived tables, common table expressions, or
    views to avoid code repetition. See examples further down.

  3. UNION is placed before ORDER BY in both lexical and logical
    ordering. Many people think that each UNION subselect can be
    ordered, but according to the SQL standard and most SQL dialects,
    that is not true. While some dialects allow for
    ordering subqueries or derived tables, there is no guarantee
    that such ordering will be retained after a UNION operation

Note, not all databases implement things the same way. Rule number 2,
for instance, does not apply exactly in the above way to MySQL,
PostgreSQL, and SQLite.

5. Disambiguating Column Names 

When a column name appears in two or more tables being joined together,
it needs to be disambiguated. This is done by prefixing the column name
with its table’s alias name.

In relational AR query, the alias name for the primary table is fixed
as t, while the alias name for a relational table is the same as the
corresponding relation name by default. For example, in the following
statement, the alias name
for Post and Comment is t and comments, respectively:

$posts=Post::model()->with('comments')->findAll();

Now assume both Post and Comment have a column
called create_time indicating the creation time of a post or comment,
and we would like to fetch posts together with their comments by
ordering first the posts’ creation time and then the comments’ creation
time. We need to disambiguate the create_time column like the
following:

$posts=Post::model()->with('comments')->findAll(array(
    'order'=>'t.create_time, comments.create_time'
));

Tip: The default alias of a related table is the name of the
relation. Please note that if you’re using relation from within another
relation the alias will be the former relation name only and will not be
prefixed with the parent relation. For example, the alias for
‘author.group’ relation is ‘group’, not ‘author.group’.

$posts=Post::model()->with('author', 'author.group')->findAll(array(
  'order'=>'group.name, author.name, t.title'
));

You can avoid the collision of tables’ aliases by specifying
the alias property
of the relation.

$comments=Comment::model()->with(
  'author',
  'post',
  'post.author'=>array('alias'=>'p_author'))->findAll(array(
  'order'=>'author.name, p_author.name, post.title'
));

What do we learn from this?

Always remember both the lexical order and the logical order of SQL
clauses to avoid very common mistakes. If you understand that
distinction, it will become very obvious why some things work and others
don’t.

Of course, it would have been nice if the language was designed in a way
that the lexical order actually reflected the logical order, as it
is implemented in
Microsoft’s LINQ.

6. Dynamic Relational Query Options 

We can use dynamic relational query options in
both with() and
the with option. The dynamic options will overwrite existing options
as specified in
the relations() method.
For example, with the above User model, if we want to use eager
loading approach to bring back posts belonging to an author
in ascending order (theorder option in the relation specification is
descending order), we can do the following:

User::model()->with(array(
    'posts'=>array('order'=>'posts.create_time ASC'),
    'profile',
))->findAll();

Dynamic query options can also be used when using the lazy loading
approach to perform relational query. To do so, we should call a method
whose name is the same as the relation name and pass the dynamic query
options as the method parameter. For example, the following code returns
a user’s posts whose status is 1:

$user=User::model()->findByPk(1);
$posts=$user->posts(array('condition'=>'status=1'));

3. SQL is about table references

Because of the difference between lexical ordering and logical
ordering
, most beginners are probably tricked into thinking that column
values are the first-class citizens in SQL. They are not. The most
important things are table references.

The SQL
standard
 defines
the FROM clause as such:

<from clause> ::= FROM <table reference> [ { <comma> <table reference> }... ]

The “output” of the FROM clause is a combined table reference of the
combined degree of all table references. Let’s digest this, slowly.

FROM a, b

The above produces a combined table reference of the degree of a + the
degree of b. If a has 3 columns and b has 5 columns, then the
“output table” will have 8 (3 + 5) columns.

The records contained in this combined table reference are those of the
cross product / cartesian product of a x b. In other words, each
record of a is paired with each record of b. If a has 3 records
and b has 5 records, then the above combined table reference will
produce 15 records (3 x 5).

This “output” is “fed” / “piped” into the GROUP BY clause (after
filtering in the WHERE clause), where it is transformed into a new
“output”. We’ll deal with that later on.

If we’re looking at these things from a relational
algebra
 / set
theory
 perspective, a SQL
table is a relation or a set of tuples. And each SQL clause will
transform one or several relations in order to produce new relations.

7. Relational Query Performance 

As we described above, the eager loading approach is mainly used in the
scenario when we need to access many related objects. It generates a big
complex SQL statement by joining all needed tables. A big SQL statement
is preferrable in many cases since it simplifies filtering based on a
column in a related table. It may not be efficient in some cases,
however.

Consider an example where we need to find the latest posts together with
their comments. Assuming each post has 10 comments, using a single big
SQL statement, we will bring back a lot of redundant post data since
each post will be repeated for every comment it has. Now let’s try
another approach: we first query for the latest posts, and then query
for their comments. In this new approach, we need to execute two SQL
statements. The benefit is that there is no redundancy in the query
results.

So which approach is more efficient? There is no absolute answer.
Executing a single big SQL statement may be more efficient because it
causes less overhead in DBMS for parsing and executing the SQL
statements. On the other hand, using the single SQL statement, we end up
with more redundant data and thus need more time to read and process
them.

For this reason, Yii provides the together query option so that we
choose between the two approaches as needed. By default, Yii uses eager
loading, i.e., generating a single SQL statement, except when LIMIT is
applied to the primary model. We can set the together option in the
relation declarations to be true to force a single SQL statement even
when LIMIT is used. Setting it to false will result in some of tables
will be joined in separate SQL statements. For example, in order to use
separate SQL statements to query for the latest posts with their
comments, we can declare the comments relation in Post class as
follows,

public function relations()
{
    return array(
        'comments' => array(self::HAS_MANY, 'Comment', 'post_id',
                        'together'=>false),
    );
}

We can also dynamically set this option when we perform the eager
loading:

$posts = Post::model()->with(
            array('comments'=>array(
                'together'=>false
            ))
        )->findAll();

What do we learn from this?

Always think in terms of table references to understand how data is
“pipelined” through your SQL clauses.

8. Statistical Query 

Besides the relational query described above, Yii also supports the
so-called statistical query (or aggregational query). It refers to
retrieving the aggregational information about the related objects, such
as the number of comments for each post, the average rating for each
product, etc. Statistical query can only be performed for objects
related in HAS_MANY (e.g. a post has many comments)
or MANY_MANY (e.g. a post belongs to many categories and a category
has many posts).

Performing statistical query is very similar to performing relation
query as we described before. We first need to declare the statistical
query in
the relations() method
of CActiveRecord like
we do with relational query.

class Post extends CActiveRecord
{
    public function relations()
    {
        return array(
            'commentCount'=>array(self::STAT, 'Comment', 'post_id'),
            'categoryCount'=>array(
                self::STAT, 'Category', 'post_category(post_id, category_id)'
            ),
        );
    }
}

In the above, we declare two statistical
queries: commentCount calculates the number of comments belonging to a
post, and categoryCount calculates the number of categories that a
post belongs to. Note that the relationship
between Post and Comment is HAS_MANY, while the relationship
between Post and Category isMANY_MANY (with the joining
table post_category). As we can see, the declaration is very similar
to those relations we described in earlier subsections. The only
difference is that the relation type is STAT here.

With the above declaration, we can retrieve the number of comments for a
post using the expression $post->commentCount. When we access this
property for the first time, a SQL statement will be executed implicitly
to retrieve the corresponding result. As we already know, this is the
so-called lazy loading approach. We can also use the eager
loading
 approach if we need to determine the comment count for multiple
posts:

$posts=Post::model()->with('commentCount', 'categoryCount')->findAll();

The above statement will execute three SQLs to bring back all posts
together with their comment counts and category counts. Using the lazy
loading approach, we would end up with 2*N+1 SQL queries if there
are Nposts.

By default, a statistical query will calculate the COUNT expression
(and thus the comment count and category count in the above example). We
can customize it by specifying additional options when we declare it
inrelations().
The available options are summarized as below.

  • select: the statistical expression. Defaults to COUNT(*),
    meaning the count of child objects.

  • defaultValue: the value to be assigned to those records that do
    not receive a statistical query result. For example, if a post does
    not have any comments, its commentCount would receive this value.
    The default value for this option is 0.

  • condition: the WHERE clause. It defaults to empty.

  • params: the parameters to be bound to the generated SQL statement.
    This should be given as an array of name-value pairs.

  • order: the ORDER BY clause. It defaults to empty.

  • group: the GROUP BY clause. It defaults to empty.

  • having: the HAVING clause. It defaults to empty.

4. SQL table references can be rather powerful

A table reference is something rather powerful. A simple example of
their power is the JOIN keyword, which is actually not part of the
SELECT statement, but part of a “special” table reference. The joined
table, as defined in the SQL
standard
 (simplified):

<table reference> ::=
    <table name>
  | <derived table>
  | <joined table>

If we take again the example from before:

FROM a, b

a can be a joined table as such:

a1 JOIN a2 ON a1.id = a2.id

Expanding this into the previous expression, we’d get:

FROM a1 JOIN a2 ON a1.id = a2.id, b

While it is discouraged to combine the comma-separated list of table
references syntax with the joined table syntax, you can most certainly
do this. The resulting, combined table reference will now have a degree
of a1+a2+b.

Derived tables are even more powerful than joined tables. We’ll get to
that.

9. Relational Query with Named Scopes 

Relational query can also be performed in combination with named
scopes
.
It comes in two forms. In the first form, named scopes are applied to
the main model. In the second form, named scopes are applied to the
related models.

The following code shows how to apply named scopes to the main model.

$posts=Post::model()->published()->recently()->with('comments')->findAll();

This is very similar to non-relational queries. The only difference is
that we have the with() call after the named-scope chain. This query
would bring back recently published posts together with their comments.

And the following code shows how to apply named scopes to the related
models.

$posts=Post::model()->with('comments:recently:approved')->findAll();
// or since 1.1.7
$posts=Post::model()->with(array(
    'comments'=>array(
        'scopes'=>array('recently','approved')
    ),
))->findAll();
// or since 1.1.7
$posts=Post::model()->findAll(array(
    'with'=>array(
        'comments'=>array(
            'scopes'=>array('recently','approved')
        ),
    ),
));

The above query will bring back all posts together with their approved
comments. Note that comments refers to the relation name,
while recently and approved refer to two named scopes declared in
the Comment model class. The relation name and the named scopes should
be separated by colons.

Occasionally you may need to retrieve a scoped relationship using a
lazy-loading approach, instead of the normal eager loading method shown
above. In that case, the following syntax will do what you need:

~~ [php] // note the repetition of the relationship name, which is
necessary $approvedComments = $post->comments(‘comments:approved’);
~~

Named scopes can also be specified in the with option of the
relational rules declared
inCActiveRecord::relations().
In the following example, if we access $user->posts, it would bring
back allapproved comments of the posts.

class User extends CActiveRecord
{
    public function relations()
    {
        return array(
            'posts'=>array(self::HAS_MANY, 'Post', 'author_id',
                'with'=>'comments:approved'),
        );
    }
}

// or since 1.1.7
class User extends CActiveRecord
{
    public function relations()
    {
        return array(
            'posts'=>array(self::HAS_MANY, 'Post', 'author_id',
                'with'=>array(
                    'comments'=>array(
                        'scopes'=>'approved'
                    ),
                ),
            ),
        );
    }
}

Note: Before 1.1.7 named scopes applied to related models must be
specified in CActiveRecord::scopes. As a result, they cannot be
parameterized.

Since 1.1.7 it’s possible to pass parameters for relational named
scopes. For example, if you have scope named rated in the Post that
accepts minimum rating of post, you can use it from User the following
way:

$users=User::model()->findAll(array(
    'with'=>array(
        'posts'=>array(
            'scopes'=>array(
                'rated'=>5,
            ),
        ),
    ),
));

class Post extends CActiveRecord
{
    ......

    public function rated($rating)
    {
        $this->getDbCriteria()->mergeWith(array(
            'condition'=>'rating=:rating',
            'params'=>array(':rating'=>$rating),
        ));
        return $this;
    }

    ......
}

What do we learn from this?

Always, always think in terms of table references. Not only is this
important to understand how data is “pipelined” through your SQL clauses
(see previous section), it will also help you understand how complex
table references are constructed.

And, importantly, understand that JOIN is a keyword for constructing
joined tables. Not a part of the SELECT statement. Some databases allow
for using JOIN in INSERT, UPDATE, DELETE

10. Relational Query with through 

When using through, relation definition should look like the
following:

'comments'=>array(self::HAS_MANY,'Comment',array('key1'=>'key2'),'through'=>'posts'),

 

In the above array('key1'=>'key2'):

  • key1 is a key defined in relation specified
    in through (posts is this case).
  • key2 is a key defined in a model relation points to (Comment in
    this case).

through can be used
with HAS_ONEBELONGS_TO and HAS_MANY relations.

5. SQL JOIN tables should be used rather than comma-separated tables

Before, we’ve seen this clause:

FROM a, b

Advanced SQL developers will probably tell you that it is discouraged to
use the comma-separated list at all, and always fully express your
JOINs. This will help you improve readability of your SQL statement, and
thus prevent mistakes.

One very common mistake is to forget a JOIN predicate somewhere. Think
about the following:

FROM a, b, c, d, e, f, g, h
WHERE a.a1 = b.bx
AND a.a2 = c.c1
AND d.d1 = b.bc
-- etc...

The join table syntax is both

  • Safer, as you can place join predicates close to the joined tables,
    thus preventing mistakes.
  • More expressive, as you can distinguish between OUTER JOIN, INNER
    JOIN, etc.

HAS_MANY through

HAS_MANY through ER

ca88手机版 2

An example of HAS_MANY with through is getting users from a
particular group when users are assigned to groups via roles.

A bit more complex example is getting all comments for all users of a
particular group. In this case we have to use several relations
with through in a single model:

class Group extends CActiveRecord
{
    ...
    public function relations()
    {
        return array(
            'roles'=>array(self::HAS_MANY,'Role','group_id'),
            'users'=>array(
                self::HAS_MANY,'User',array('user_id'=>'id'),'through'=>'roles'
            ),
            'comments'=>array(
                self::HAS_MANY,'Comment',array('id'=>'user_id'),'through'=>'users'
            ),
        );
    }
}

What do we learn from this?

Always use JOIN. Never use comma-separated table references in your FROM
clauses.

Usage examples

// get all groups with all corresponding users
$groups=Group::model()->with('users')->findAll();

// get all groups with all corresponding users and roles
$groups=Group::model()->with('roles','users')->findAll();

// get all users and roles where group ID is 1
$group=Group::model()->findByPk(1);
$users=$group->users;
$roles=$group->roles;

// get all comments where group ID is 1
$group=Group::model()->findByPk(1);
$comments=$group->comments;

6. SQL’s different JOIN operations

JOIN operations essentially come with five flavours:

  • EQUI JOIN
  • SEMI JOIN
  • ANTI JOIN
  • CROSS JOIN
  • DIVISION

These terms are commonly used in relational
algebra
. SQL uses
different terms for the above concepts, if they exist at all. Let’s have
a closer look:

HAS_ONE through

HAS_ONE through ER

ca88手机版 3

An example of using HAS_ONE with through is getting user address
where user is bound to address using profile. All these entities (user,
profile, and address) do have corresponding models:

class User extends CActiveRecord
{
    ...
    public function relations()
    {
        return array(
            'profile'=>array(self::HAS_ONE,'Profile','user_id'),
            'address'=>array(
                self::HAS_ONE,'Address',array('id'=>'profile_id'),
                    'through'=>'profile'
            ),
        );
    }
}

EQUI JOIN

This is the most common JOIN operation. It has two sub-flavours:

  • INNER JOIN (or just JOIN)
  • OUTER JOIN (further sub-flavoured as LEFT, RIGHT, FULL OUTER JOIN)

The difference is best explained by example:

-- This table reference contains authors and their books.
-- There is one record for each book and its author.
-- authors without books are NOT included
author JOIN book ON author.id = book.author_id

-- This table reference contains authors and their books
-- There is one record for each book and its author.
-- ... OR there is an "empty" record for authors without books
-- ("empty" meaning that all book columns are NULL)
author LEFT OUTER JOIN book ON author.id = book.author_id

Usage examples

// get address of a user whose ID is 1
$user=User::model()->findByPk(1);
$address=$user->address;

SEMI JOIN

This relational concept can be expressed in two ways in SQL: Using an IN
predicate, or using an EXISTS predicate. “Semi” means “half” in latin.
This type of join is used to join only “half” of a table reference. What
does that mean? Consider again the above joining of author and book.
Let’s imagine that we don’t want author/book combinations, but just
those authors who actually also have books. Then we can write:

-- Using IN
FROM author
WHERE author.id IN (SELECT book.author_id FROM book)

-- Using EXISTS
FROM author
WHERE EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)

While there is no general rule as to whether you should prefer IN or
EXISTS, these things can be said:

  • IN predicates tend to be more readable than EXISTS predicates
  • EXISTS predicates tend to be more expressive than IN predicates
    (i.e. it is easier to express very complex SEMI JOIN)
  • There is no formal difference in performance. There may, however, be
    huge performance difference on some
    databases
    .

Because INNER JOIN also produces only those authors that actually have
books, many beginners may think that they can then remove duplicates
using DISTINCT. They think they can express a SEMI JOIN like this:

-- Find only those authors who also have books
SELECT DISTINCT first_name, last_name
FROM author
JOIN book ON author.id = book.author_id

This is very bad practice for two reasons:

  • It is very slow, as the database has to load a lot of data into
    memory, just to remove duplicates again.
  • It is not entirely correct, even if it produces the correct result
    in this simple example. But as soon as you JOIN more table
    references, you will have a very hard time correctly removing
    duplicates from your results.

Some more information about abuse of DISTINCT can be seen in this blog
post
.

through on self

through can be used for a model bound to itself using a bridge model.
In our case it’s a user mentoring other users:

through self ER

ca88手机版 4

That’s how we can define relations for this case:

class User extends CActiveRecord
{
    ...
    public function relations()
    {
        return array(
            'mentorships'=>array(
                self::HAS_MANY,'Mentorship','teacher_id','joinType'=>'INNER JOIN'
            ),
            'students'=>array(
                self::HAS_MANY,'User',array('student_id'=>'id'),
                    'through'=>'mentorships','joinType'=>'INNER JOIN'
            ),
        );
    }
}

ANTI JOIN

This relational concept is just the opposite of a SEMI JOIN. You can
produce it simply by adding a NOT keyword to the IN or EXISTS
predicates. An example, where we’ll select those authors who do not have
any books:

-- Using IN
FROM author
WHERE author.id NOT IN (SELECT book.author_id FROM book)

-- Using EXISTS
FROM author
WHERE NOT EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)

The same rules with respect to performance, readability, expressivity
apply. However, there is a small caveat with respect to NULLs when using
NOT IN, which is a bit out of scope for this
tutorial
.

Usage examples

// get all students taught by teacher whose ID is 1
$teacher=User::model()->findByPk(1);
$students=$teacher->students;

 

 

CROSS JOIN

This produces a cross product of the two joined table references,
combining every record of the first table reference with every record of
the second table reference. We have seen before, that this can be
achieved with comma-separated table references in the FROM clause. In
the rare cases where this is really desired, you can also write a CROSS
JOIN explicitly, in most SQL dialects:

-- Combine every author with every book
author CROSS JOIN book

DIVISION

The relational division is really a beast of its own breed. In short, if
JOIN is multiplication, division is the inverse of JOIN. Relational
divisions are very tough to express in SQL. As this is a beginners’
tutorial, explaining it is out of scope. For the brave among you, read
on about it
here
hereand
here
.

What do we learn from this?

A lot. Again, let’s hammer this into our heads. SQL is about table
references. Joined tables are quite sophisticated table references. But
there is a difference in relational-speak and SQL-speak. Not all
relational join operations are also formal SQL join operations. With a
bit of practice and knowledge about relational theory, you will always
be able to choose the right type of relational JOIN and be able to
translate it to the correct SQL.

7. SQL’s derived tables are like table variables

Before, we’ve learned that SQL is a declarative language, and as such,
variables do not have a place (they do in some SQL dialects, though).
But you can write something likevariables. And those beasts are called
derived tables.

A derived table is nothing but a subquery wrapped in parentheses.

-- A derived table
FROM (SELECT * FROM author)

Note that some SQL dialects require derived tables to have
correlation name (also known as alias).

-- A derived table with an alias
FROM (SELECT * FROM author) a

Derived tables are awesome when you want to circumvent the problems
caused by thelogical ordering of SQL clauses. For instance, if you
want to reuse a column expression in both the SELECT and the WHERE
clause, just write (Oracle dialect):

-- Get authors' first and last names, and their age in days
SELECT first_name, last_name, age
FROM (
  SELECT first_name, last_name, current_date - date_of_birth age
  FROM author
)
-- If the age is greater than 10000 days
WHERE age > 10000

Note that some databases, and the SQL:1999 standard have taken derived
tables to the next level, introducing common table expressions. This
will allow you to reuse the samederived table several times within a
single SQL SELECT statement. The above query would then translate to the
(almost) equivalent:

WITH a AS (
  SELECT first_name, last_name, current_date - date_of_birth age
  FROM author
)
SELECT *
FROM a
WHERE age > 10000

Obviously, you could also externalise “a” into a standalone view for
even broader reuse of common SQL subselects. Read more about views
here
.

What do we learn from this?

Again, again, again. SQL is mostly about table references, not columns.
Make use of them. Don’t be afraid of writing derived tables or other
complex table references.

8. SQL GROUP BY transforms previous table references

Let’s reconsider our previous FROM clause:

FROM a, b

And now, let’s apply a GROUP BY clause to the above combined table
reference

GROUP BY A.x, A.y, B.z

The above produces a new table reference with only three remaining
columns (!). Let’s digest this again. If you apply GROUP BY, then you
reduce the number of available columns in all subsequent logical clauses

  • including SELECT. This is the syntactical reason why you can only
    reference columns from the GROUP BY clause in the SELECT clause.

  • Note that other columns may still be available as arguments of
    aggregate functions:

    SELECT A.x, A.y, SUM(A.z)
    FROM A
    GROUP BY A.x, A.y
    
  • Note that MySQL, unfortunately, doesn’t adhere to this
    standard
    ,
    causing nothing but confusion. Don’t fall for MySQL’s tricks. GROUP
    BY transforms table references. You can thus only reference columns
    also referenced in the GROUP BY clause.

What do we learn from this?

GROUP BY, again, operates on table references, transforming them into a
new form.

9. SQL SELECT is called projection in relational algebra

I personally like the term “projection”, as it is used in relational
algebra. Once you’ve generated your table reference, filtered it,
transformed it, you can step to projecting it to another form. The
SELECT clause is like a projector. A table function making use of arow
value expression
 to transform each record from the previously
constructed table reference into the final outcome.

Within the SELECT clause, you can finally operate on columns, creating
complex column expressions as parts of the record / row.

There are a lot of special rules with respect to the nature of available
expressions, functions, etc. Most importantly, you should remember
these:

  1. You can only use column references that can be produced from the
    “output” table reference
  2. If you have a GROUP BY clause, you may only reference columns from
    that clause, or aggregate functions.
  3. You can use window functions instead of aggregate functions, when
    you don’t have a GROUP BY clause.
  4. If you don’t have a GROUP BY clause, you must not combine aggregate
    functions with non-aggregate functions.
  5. There are some rules with respect to wrapping regular functions in
    aggregate functions and vice-versa.
  6. There are …

Well, there are lots of complex rules. They could fill yet another
tutorial. For instance, the reason why you cannot combine aggregate
functions with non-aggregate functions in the projection of a SELECT
statement without GROUP BY clause (rule number 4) is this:

  1. It doesn’t make sense. Intuitively.
  2. If intuition doesn’t help (it hardly does, with a SQL beginner),
    then syntax rules do. SQL:1999 introduced GROUPING SETS, and
    SQL:2003 introduced empty grouping sets: GROUP BY (). Whenever an
    aggregate function is present, and there is no explicit GROUP BY
    clause, an implicit, empty GROUPING SET is applied (rule number 2).
    Hence, the original rules about logical ordering aren’t exactly
    true anymore, and the projection (SELECT) influences the outcome of
    a logically preceding, yet lexically succeeding clause (GROUP BY).

Confused? Yes. Me too. Let’s get back to simpler things.

What do we learn from this?

The SELECT clause may be one of the most complex clauses in SQL, even if
it appears so simple. All other clauses just “pipe” table references
from one to another. The SELECT clause messes up the beauty of these
table references, by completely transforming them, applying some rules
to them retroactively.

In order to understand SQL, it is important to understand
everything else first, before trying to tackle SELECT. Even if SELECT
is the first clause in lexical ordering, it should be the last.

10. SQL DISTINCT, UNION, ORDER BY, and OFFSET are simple again

After the complicated SELECT, we can get back to simple things again:

  • Set operations (DISTINCT and UNION)
  • Ordering operations (ORDER BY, OFFSET .. FETCH)

Set operations

Set operations operate on “sets”, which are actually nothing other
than… tables. Well, almost. Conceptually, they’re easy to understand.

  • DISTINCT removes duplicates after the projection.
  • UNION concatenates two subselects and removes duplicates
  • UNION ALL concatenates two subselects retaining duplicates
  • EXCEPT removes records from the first subselect that are also
    contained in the second subselect (and then removes duplicates)
  • INTERSECT retains only records contained in both subselects (and
    then removes duplicates)

All of this removing duplicates is usually non-sense. Most often, you
should just use UNION ALL, when you want to concatenate subselects.

Ordering operations

Ordering is not a relational feature. It is a SQL-only feature. It is
applied at the very end of both lexical ordering and logical
ordering
 of your SQL statement. Using ORDER BY and OFFSET .. FETCH is
the only way to guarantee that records can be accessed by index in a
reliable way. All other ordering is always arbitrary and random, even if
it may appear to be reproducible.

OFFSET .. FETCH is only one syntax variant. Other variants include
MySQL’s and PostgreSQL’s LIMIT .. OFFSET, or SQL Server’s and Sybase’s
TOP .. START AT. A good overview of various ways to implement OFFSET ..
FETCH can be seen
here
.

Let’s get to work

As with every language, SQL takes a lot of practice to master. The above
10 simple steps will help you make more sense of the every day SQL that
you’re writing. On the other hand, it is also good to learn from common
mistakes. The following two articles list lots of common mistakes Java
(and other) developers make when writing SQL:

 

 

 

相关文章