Доска объявлений – Оптимизация базы данных

Этот вопрос является продолжением этого Вопроса

Проект и проблема

Проект, над которым я сейчас работаю, представляет собой доску объявлений для большой некоммерческой организации. Доска объявлений будет использоваться для разрешения межсетевых сообщений внутри организации.

Я создаю приложение и испытываю трудности с извлечением результатов, которые мне нужны из моей базы данных, потому что я не думаю, что он правильно нормирован и из-за ограничений в моем знании теории реляционных баз данных и mysql.

Я был бы признателен за вклад в разработку платы в целом и, в частности, способы улучшения структуры базы данных для облегчения эффективных запросов и ускорения разработки этого приложения и будущего приложения

Бизнес-логика

Доска объявлений будет использоваться следующим образом

  1. Публикация бюллетеней и ответов на бюллетени
    1. Сотрудники или «пользователи» в офисах по всей стране смогут размещать сообщения на доске объявлений. Бюллетени должны быть отправлены в местоположение и классифицированы – я буду называть эти «бюллетени».
    2. Пользователи смогут отправлять любое количество ответов в любой бюллетень, и пользователи смогут ответить на их собственный бюллетень – я буду называть эти «ответы».
  2. Рейтинг бюллетеней и ответов
    1. Пользователи смогут либо «нравиться», либо «не нравится» бюллетеню или ответу, и общее количество симпатий или антипатий будет показано для каждого бюллетеня или ответа.
  3. Просмотр доски объявлений и ответов
    1. Бюллетени могут отображаться в хронологическом порядке.
    2. Пользователи могут сортировать бюллетени хронологически или хронологически по последнему ответу на этот бюллетень (дайте мне знать, если вам нужно больше объяснений)
    3. Когда выбран конкретный бюллетень, ответы на этот бюллетень будут отображаться в хронологическом порядке
-- phpMyAdmin SQL Dump -- version 3.2.4 -- http://www.phpmyadmin.net -- -- Host: localhost -- Generation Time: Jan 16, 2011 at 06:44 PM -- Server version: 5.1.41 -- PHP Version: 5.3.1 SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO"; -- -- Database: `bulletinboard` -- -- -------------------------------------------------------- -- -- Table structure for table `bbs` -- CREATE TABLE IF NOT EXISTS `bbs` ( `id` int(11) NOT NULL AUTO_INCREMENT, `bb_locations_id` int(11) NOT NULL, `bb_categories_id` int(11) NOT NULL, `users_id` int(11) NOT NULL, `title` varchar(255) NOT NULL, `content` text NOT NULL, `created_date` int(11) NOT NULL, `rank` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=87 ; -- -- Dumping data for table `bbs` -- INSERT INTO `bbs` (`id`, `bb_locations_id`, `bb_categories_id`, `users_id`, `title`, `content`, `created_date`, `rank`) VALUES (83, 8, 28, 44, 'sdaf', 'asdfasdf', 1292712797, 0), (84, 8, 28, 44, 'asdf', 'asdfasd', 1292875089, 0), (86, 8, 28, 44, 'Robert is leaving', 'Robert is leaving and going back to the states ', 1294344916, 0); -- -------------------------------------------------------- -- -- Table structure for table `bb_categories` -- CREATE TABLE IF NOT EXISTS `bb_categories` ( `id` int(11) NOT NULL AUTO_INCREMENT, `title` varchar(255) NOT NULL, `description` varchar(255) NOT NULL, `list_order` varchar(255) NOT NULL, `admin` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=30 ; -- -- Dumping data for table `bb_categories` -- INSERT INTO `bb_categories` (`id`, `title`, `description`, `list_order`, `admin`) VALUES (28, 'Travel', 'Rideshares, proposed trips etc', '1', 1); -- -------------------------------------------------------- -- -- Table structure for table `bb_locations` -- CREATE TABLE IF NOT EXISTS `bb_locations` ( `id` int(11) NOT NULL AUTO_INCREMENT, `title` varchar(255) NOT NULL, `description` varchar(255) NOT NULL, `address` varchar(255) NOT NULL, `post_code` int(11) NOT NULL, `list_order` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=15 ; -- -- Dumping data for table `bb_locations` -- INSERT INTO `bb_locations` (`id`, `title`, `description`, `address`, `post_code`, `list_order`) VALUES (8, 'Washington DC', 'asdkf', 'dsf', 0, 1); -- -------------------------------------------------------- -- -- Table structure for table `bb_ratings` -- CREATE TABLE IF NOT EXISTS `bb_ratings` ( `id` int(11) NOT NULL AUTO_INCREMENT, `bbs_id` int(11) NOT NULL, `users_id` int(11) NOT NULL, `like_id` int(2) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=68 ; -- -- Dumping data for table `bb_ratings` -- -- -------------------------------------------------------- -- -- Table structure for table `bb_replies` -- CREATE TABLE IF NOT EXISTS `bb_replies` ( `id` int(11) NOT NULL AUTO_INCREMENT, `users_id` int(11) NOT NULL, `bbs_id` int(11) NOT NULL, `content` text NOT NULL, `created_date` int(11) NOT NULL, `rank` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=158 ; -- -- Dumping data for table `bb_replies` -- INSERT INTO `bb_replies` (`id`, `users_id`, `bbs_id`, `content`, `created_date`, `rank`) VALUES (156, 44, 86, 'good ridance i say\r\n', 1294788444, 0), (157, 44, 86, 'And stay away\r\n', 1294892751, 0); -- -------------------------------------------------------- -- -- Table structure for table `bb_reply_ratings` -- CREATE TABLE IF NOT EXISTS `bb_reply_ratings` ( `id` int(11) NOT NULL AUTO_INCREMENT, `bb_replies_id` int(11) NOT NULL, `users_id` int(11) NOT NULL, `like_id` tinyint(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=115 ; -- -- Dumping data for table `bb_reply_ratings` -- -- -------------------------------------------------------- -- -- Table structure for table `bb_sort_bys` -- CREATE TABLE IF NOT EXISTS `bb_sort_bys` ( `id` int(11) NOT NULL AUTO_INCREMENT, `title` varchar(20) NOT NULL, `description` varchar(255) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ; -- -- Dumping data for table `bb_sort_bys` -- INSERT INTO `bb_sort_bys` (`id`, `title`, `description`) VALUES (1, 'Newest', 'Posts are sorted by their creation date'), (2, 'Popular', 'Posts are sorted by the date of their lates reply, or by post date if they have now replies'); -- -------------------------------------------------------- -- -- Table structure for table `users` -- CREATE TABLE IF NOT EXISTS `users` ( `id` int(11) NOT NULL AUTO_INCREMENT, `user_name` varchar(10) NOT NULL, `first_name` varchar(100) NOT NULL, `last_name` varchar(100) NOT NULL, `permission` int(1) NOT NULL, `bb_sort_bys_id` varchar(10) NOT NULL, `bb_locations_csv` varchar(255) NOT NULL, `defaultLocation` int(11) NOT NULL, `bb_categories_csv` varchar(255) NOT NULL, `total_bulletins` int(5) NOT NULL, `bulletins_per_page` int(5) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=45 ; 

Solutions Collecting From Web of "Доска объявлений – Оптимизация базы данных"

Ключ к тому, что эффективная база данных упрощается. Основная цель реляционной базы данных – не повторять какую-либо информацию. Я взял ваш SQL-дамп и быстро разработал более простую версию, которая, насколько мне известно, нормализована. Я оставил некоторые поля, которые у вас были для cvs ect. Я удалил поля, которые проще было бы просто пересчитать, запросив db, когда информация понадобится, например, общее количество пользователей и рейтинг заданного сообщения. Я также удалил ваши bb_replies, поскольку вы можете выполнить тот же результат, ссылаясь на родительский пост. Я немного переименовал таблицы в то, что имело для меня смысл, вы можете использовать схему, которая вам нравится. Я считаю, что использование простых терминов упрощает понимание того, как данные относятся друг к другу.

Должен признаться, что я согласен с некоторыми комментариями выше, есть много BBs, которые работают очень хорошо и будут иметь всю функциональность, которую вы ищете. И вам повезло, что я в настроении чтения сегодня вечером, это был длинный вопрос. Упрощение является ключевым во всем 🙂

alt text

 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0; SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0; SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='TRADITIONAL'; -- ----------------------------------------------------- -- Table `users` -- ----------------------------------------------------- CREATE TABLE IF NOT EXISTS `users` ( `id` INT NOT NULL AUTO_INCREMENT , `username` VARCHAR(45) NULL , `password` VARCHAR(100) NULL , `email` VARCHAR(255) NULL , `first_name` VARCHAR(100) NULL , `last_name` VARCHAR(100) NULL , `permission` INT NULL , `created` DATETIME NULL , `modified` DATETIME NULL , PRIMARY KEY (`id`) ) ENGINE = InnoDB; -- ----------------------------------------------------- -- Table `categories` -- ----------------------------------------------------- CREATE TABLE IF NOT EXISTS `categories` ( `id` INT NOT NULL AUTO_INCREMENT , `name` VARCHAR(45) NULL , `description` TEXT NULL , `order` INT NULL , `admin` INT NULL , PRIMARY KEY (`id`) ) ENGINE = InnoDB; -- ----------------------------------------------------- -- Table `locations` -- ----------------------------------------------------- CREATE TABLE IF NOT EXISTS `locations` ( `id` INT NOT NULL AUTO_INCREMENT , `name` VARCHAR(45) NULL , `description` TEXT NULL , `address` TEXT NULL , `order` INT NULL , PRIMARY KEY (`id`) ) ENGINE = InnoDB; -- ----------------------------------------------------- -- Table `posts` -- ----------------------------------------------------- CREATE TABLE IF NOT EXISTS `posts` ( `id` INT NOT NULL AUTO_INCREMENT , `post_id` INT NOT NULL , `user_id` INT NOT NULL , `category_id` INT NOT NULL , `location_id` INT NOT NULL , `title` VARCHAR(45) NULL , `content` TEXT NULL , `created` DATETIME NULL , `modified` DATETIME NULL , PRIMARY KEY (`id`, `post_id`, `user_id`, `category_id`, `location_id`) , INDEX `fk_posts_users` (`user_id` ASC) , INDEX `fk_posts_posts1` (`post_id` ASC) , INDEX `fk_posts_categories1` (`category_id` ASC) , INDEX `fk_posts_locations1` (`location_id` ASC) , CONSTRAINT `fk_posts_users` FOREIGN KEY (`user_id` ) REFERENCES `users` (`id` ) ON DELETE NO ACTION ON UPDATE NO ACTION, CONSTRAINT `fk_posts_posts1` FOREIGN KEY (`post_id` ) REFERENCES `posts` (`id` ) ON DELETE NO ACTION ON UPDATE NO ACTION, CONSTRAINT `fk_posts_categories1` FOREIGN KEY (`category_id` ) REFERENCES `categories` (`id` ) ON DELETE NO ACTION ON UPDATE NO ACTION, CONSTRAINT `fk_posts_locations1` FOREIGN KEY (`location_id` ) REFERENCES `locations` (`id` ) ON DELETE NO ACTION ON UPDATE NO ACTION) ENGINE = InnoDB; -- ----------------------------------------------------- -- Table `likes` -- ----------------------------------------------------- CREATE TABLE IF NOT EXISTS `likes` ( `id` INT NOT NULL AUTO_INCREMENT , `user_id` INT NOT NULL , `post_id` INT NOT NULL , `like` TINYINT(1) NULL , PRIMARY KEY (`id`, `user_id`, `post_id`) , INDEX `fk_posts_users_users1` (`user_id` ASC) , INDEX `fk_posts_users_posts1` (`post_id` ASC) , CONSTRAINT `fk_posts_users_users1` FOREIGN KEY (`user_id` ) REFERENCES `users` (`id` ) ON DELETE NO ACTION ON UPDATE NO ACTION, CONSTRAINT `fk_posts_users_posts1` FOREIGN KEY (`post_id` ) REFERENCES `posts` (`id` ) ON DELETE NO ACTION ON UPDATE NO ACTION) ENGINE = InnoDB; -- ----------------------------------------------------- -- Table `sort_options` -- ----------------------------------------------------- CREATE TABLE IF NOT EXISTS `sort_options` ( `id` INT NOT NULL AUTO_INCREMENT , `name` VARCHAR(45) NULL , `description` TEXT NULL , PRIMARY KEY (`id`) ) ENGINE = InnoDB; -- ----------------------------------------------------- -- Table `preferences` -- ----------------------------------------------------- CREATE TABLE IF NOT EXISTS `preferences` ( `id` INT NOT NULL AUTO_INCREMENT , `user_id` INT NOT NULL , `pagination` INT NULL , `sort_option_id` INT NOT NULL , `categories_csv` VARCHAR(45) NULL , `locations_csv` VARCHAR(45) NULL , PRIMARY KEY (`id`, `user_id`, `sort_option_id`) , INDEX `fk_preferences_users1` (`user_id` ASC) , INDEX `fk_preferences_sort_options1` (`sort_option_id` ASC) , CONSTRAINT `fk_preferences_users1` FOREIGN KEY (`user_id` ) REFERENCES `users` (`id` ) ON DELETE NO ACTION ON UPDATE NO ACTION, CONSTRAINT `fk_preferences_sort_options1` FOREIGN KEY (`sort_option_id` ) REFERENCES `sort_options` (`id` ) ON DELETE NO ACTION ON UPDATE NO ACTION) ENGINE = InnoDB; SET SQL_MODE=@OLD_SQL_MODE; SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS; SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS; 

Subquery First, then the RANK() Function

Relax, son, we'll get there! Your speed is fine.

Preparation

The first thing, you really need to get access to a decent set of manuals, for your specific flavour of MySQL. I found ▶this one◀ . As before, you have to do your own debugging, but I am now providing SQL that is as close to generic MySQL as possible. I've confirmed that everything we are going to be doing is entirely possible in that flavour of MySQL (I don't know what flavour/version yours is, except ENGINE=MyISAM).

Subquery

Ok, let's start again. I have written a ▶series of SELECTS◀ , to lead your through the process. Please complete each one, and understand it completely before progressing to the next. If you have any questions, stop, and post the question.

The code is written and tested in Sybase; then downgraded for MySQL (from perusing the web, eg. the above site), and tested as much as possible in that state.

The first bit creates and loads three tables for use.

  1. The first SELECT is a straight join of the three tables, no subquery. You need to get that to work; that is, understand what is does, fix any syntax problems; figure out the differences between the SQL I provide and the SQL runs on your server. And get used to making those changes. We can't keep stopping for that.

  2. The second SELECT produces exactly the same result set. It introduces the concept of a Subquery, which is used to populate a single column.

Drive that bus. Respond when you're done or if your have problems.

Responses to Your Comments of 03 Dec 10 17:51

  1. Straight Join
    I have never seen that way of doing joins before, I have always used left join, right join or inner join. Ok so for this first query we are just joining the two tables student and course with the studentcourse table sitting in the middle as the associative table. Results are repeated as expected because one student might be on more that one course and they will have a result for that course.

Да.

That ( x=y in the WHERE clause ) is the traditional way of identifying joins, it is much more clear; the LEFT/RIGHT/INNER/OUTER JOIN syntax is the "new" way. Much more cumbersome AFAIC, but the learning is relevant because it is fundamental to what comes later. Feel free to convert to the latter syntax, and back again, for purposes of understanding.

Repeats ? That is not what repeats or duplicates mean. All the rows are discrete, true rows in CS. You should get the same 15 rows in every report (as we progress).

(ps when i direclty create the tables using queries you provided, the names are converted to all lowercase while the column names can still be camel case.)

MySQL is very strange. (It appears to be doing the naming conventions for us!)

,
2. Simple scalar query
A few issues with query. You use the alias(in the scalar subquery) before you have defined what it is?(StudentCourse sc) I guess I always incorrectly assumed that you have to say define an alias before you use it.

You are thinking procedurally. SQL is a set-oriented language, for manipulating Relational sets of data.

The whole query gets evaluated and optimised in one pass. There is no "before" or "after". I am defining it in the same batch of SQL that I am using it.

I don't entirely understand the use of the alias 'in-ner' in the scalar subquery, is this to say that you want it to check each row individually(not sure how to explain this) instead of on a table wide check?Ie when you are doing this check make it local to the particular row you are on?(terrible explanation sorry).

For purposes of understanding/debugging, evaluate the subquery first (the contents of the brackets), alone. Understand it fully. Note the use of "sc" and keep it in your hat.

  • in_ner and sc are ALIASES, that is, handles for the table name that it sits next to in the FROM clause; that we use elsewhere in the code for convenience
  • in_ner is a descriptive name for the table referenced in the Inner Query, the Subquery
  • sc is a descriptive name for the table referenced in the Outer Query, which is only Outer because it has an Inner query, otherwise it would be a flat query
  • we could just as easy use fred and sally
  • Aliases such as in_ner and out_er are meaningful when the same table is referenced in both the Inner and Outer queries.
  • notice the join between the Inner query and the Outer query WHERE in_ner.CourseId = sc.CourseId
  • I have related the table referenced in the in_ner query to the table sc referenced in Out_er query
  • Such a subquery is called a Correlated Subquery

See if you can visualise the Outer query (result set) as a grid, a spreadsheet, 15 rows by 4 columns.

  • Make sure you understand that Outer query, "easy" as it is. Notice that it is the same as (1. Straight Join), with a different method of populating one column.

As i understand it the scalar subquery asks for Name where the courseId's in Course and studentcourse are the same.(pretty straight forward) and is an alternative to saying that in the where,

Yes, exactly.

And notice that we are after only the Course.Name which is a 1::1 join from StudentCourse to Course, on CourseId. Notice exactly the WHERE clause in (1) that we are replacing in (2); in (1) it applies to all rows.

But because we are grabbing one datum; one cell; one item for a specific row/column; not all rows; not all columns, it is called a Scalar.

We are obtaining it using a subquery, which has to be constrained to the specific row. Therefore we need to relate the row from the outer query to the row in the Inner query.

  • so the Correlation between the Inner Subquery and Outer (specific row) is required.

  • And if we did not have that identification of the specific row, we would be loading rubbish into the Scalar, or it would return a Table (not a Scalar value) and the query would fail.

    • Try that, take the WHERE CourseId = sc.CourseId out
    • So that you know what the error message is, so that when it happens in future, you will know "Aha, I am returning a table, not a scalar; I am missing something in my Inner WHERE clause; I am not identifying a specific Correlated row".
      ,
  • it is not quite "asks for Name where the courseId's in Course and studentcourse are the same"; it is getting the Course.Name for a specific StudentCourse.CourseId, which is identified from the outside, whatever sc row it is.

with the differnece that you can make this check row by row before the where.

you are thinking procedurally; there is no "row-by-row"; the dbms is set -oriented; the result set you are building is a set . Re-state the question is set terminology.

I used Course instead or in-ner, what is the point of using an alias in this case, is it just to show that aliases can be used?

Да. And to highlight issues. And to differentiate the Inner Query from the Outer query. In the Inner query, the "inner" Alias, or any alias is not demanded. Only the Alias relating to the outer query is demanded.

Something I don't understand here is that when I try to do this, 'course.Name' it says unknown Course.Name in field list. this is the way that I have always defined that i mean Name in the Course table and not some other table. What would happen if I had two tables with a name column?

В точку. If it were ambiguous, then you would have to supply the table name or alias; where it is not ambiguous, it is not demanded, but nice to have for documentary, clarity, purposes. You have to figure out why MySQL is not accepting it. Mixed case/lower case madness ?

I have also never seen that order by syntax, I can see that 1 and 4 mean the column numbers but why bother passing it two columns?

Huh ? Because I want the result set ordered by Course.Name in ascending order, and within that , by StudentCourse.Mark in Descending order.

If I did not state the order, MySql would produce the result set in whatever order it gets it from StudentCourse (chronological ?; by index ?). Whatever that default order is, find that out, you need to know it, and thus avoid an ORDER BY , when it is unnecessary.

  • Take the ORDER BY out and play with it.

  • Try ORDER BY 4 DESC, 1

It is not "passing", I am telling it what to do with my result set, in the one SQL command. The only passing you are doing is between your app (PHP ?) and MySQL.

2.1. Ok, when you finished with (2), and completely happy that you understand it, do this exercise.

 SELECT (SELECT Name FROM Course WHERE CourseId = sc.CourseId ) AS CourseName, () AS FirstName, () AS LastName, Mark FROM StudentCourse sc ORDER BY 1, 4 DESC 

  • Produce the same grid format, we want the exact same result set as (1) and (2).

  • Fill in the two pairs of empty brackets with the appropriate subquery; то есть. write a subquery to populate the FirstName column, and another to populate the LastName column

Responses to your Comments re Third Data Model

2.1. Perfect, yes, we move on.
,
You are cooking with gas, so if you don't mind, I will take your text, and annotate it a bit; notice the differences, they may or may not be subtle.

The correlated scalar subquery says that for each course id we need the highst mark, as opposed to the highest mark for all the courses. This is where the correlated aspect of this subquery comes into play because we are relating the outer query to the inner query for this particular row. [ Yes! ] The way that I am currently visualizing [That's it, use the visual part of your mind, not the serial part] it is that the outer query runs through the tables putting together the result table set, and each time it creates a row it runs the scalar subquery and picks out [a single value to fill the cell; here it is] the highest mark where the courseId's match, so when it is on a row where the course id is 66 then the scalar subquery is only looking for the max mark where the courseId is 66.

I could hardly have said it better myself.

There is no such thing as "result table".

Add one more definitive item.

  • The outer query defines the result set.

    • The subquery is independent of that; it is merely Correlated or Indexed.

Ok, so you have that SQL working, right ?

Now that you understand that, the next step is to visualise the result set, and to visulaise the subquery (3, unchanged) filling the entire column. if the above text was a balloon filling one cell at a time, then visualise hundreds of ballons, filling consecutive cells. Then visualise a bucket poured into the column.

Now leave that two dimensional result set alone for a minute, and visualise another layer on top of it. This is the parallel layer, where you write your subquery code.

If ever you have difficulty getting a subquery to work, go back to this, your way of visualising, one result set, and another layer for the subquery, which pours a bucket of scalars in, to fill the column. It eliminates all the well-known subquery coding bugs; removes the use of GROUP BY, DISTINCT, and all those ham-fisted methods of getting a long angry snake to fit into a jam jar.
,
Three more small steps before you proceed to (4).

2.2 Re-read my response (2) above, all the way down to this point. No skimming. This is because when you teach your mind something new and different, you need to re-inforce it. It is an officially recognised and labelled technique.

Responses to Comments of 08 Dec 10 20:49

2,3. Write that query (3) without using subqueries, and ensure you check the results. If you catch yourself laughing when you are writing the code , it is a good sign. As long as you produce the correct result set, you pass, but try to write the most efficient code (fewest COUNTS and GROUP BYs, etc). Do this only if you want to run circles around your peers, to be able to answer any "how do I code …" question on your database.

I'm not sure what you mean by write that query without using sub-queries? I thought we wanted to avoid the use of group by's etc

Да. Абсолютно. You've walked forward. Now walk backward without tripping . This will really help your understanding of walking forwards, when it is better to use a subquery vs a join. Code the query with GROUP BYs and COUNTs. The fewest. Don't laugh.

2,4. Write the subquery (3) on your database, to produce a list of Bulletins , the outer query has to be FROM bbs only; with a count of likes , and a count of dislikes . So trunacte the tables and do 10 or 12 meaningful INSERTS, fibe minutes, big deal.

I used the method of using sub-queries on my database to put together a list of bulletins replies, count the number of reply likes and dislikes and get a particular users rating. it was great because I didn't have to use any group by's or counts and I didn't have to create temporary tables like I did for the bulletins.

Well, that's perfect. Now we are getting a bit of Relational Power in your spinach.

Now, go and look at this question and answer ; ensure you compare the code. You've come a long way in just a few days.

When you finish (2.3), read your (2.4) query again, to refresh yourself, and move onto (4).

If you get stuck, replace the word "Rank" with "CountOfStudsWithHigherMark", and give it another go.

Responses to Comments of 11 Dec 10 13:14

2.3 I am having trouble writing that query without a scalar subquery. Scalar subqueries always made more logical sense to me even before I knew how to do them. That is why I said "I guess the problem I am running into here is, how do you refer to user-id = x in this particular row, not in all the row" in that previous question. Correlating the scalar subquery to the main query with and alias was the answer.

The (2.3) exercise is intended for you to:

  • really understand the incorrectness of the fat query with the GROUP BY (in a relational database using a set-processing relational engine) vs the correctness, elegance, and speed of the Correlated Subquery. You have achieved that. That will place you above your peers, in terms of SQL coding ability.

  • be able to identify when a fat WHERE clause and when a Correlated Subquery is appropriate. I am not sure, but it looks like you have achieved that.

  • be able to correct and debug this kind of issue when maintaining code written by others, and to be able to teach them the distinction. It sounds like you have a good visual, relational ability; which has been re-inforced by the exercise; and now you cannot go back to inferior methods. That is, you can understand and fix incorrect SQL code, but you cannot communicate that to others.

As long as you understand those distinctions and accept that, I am happy to drop (2.3) and move on.

Read your (2.4) query again, to refresh yourself, and move onto (4).

If you get stuck, replace the word "Rank" with "NumStudentsWithHigherMark", and give it another go.


Don't read further. The following is "old code"

Here's a ▶Quick Tutorial◀ on the RANK() operator (as it is commonly known). It is not ANSI SQL; it is an Oracle and MS extension. However it is not required if you understand Subqueries, which is why Sybase does not have it. I doubt MySQL has it, so you need to get your head around it. Understanding Scalar Subqueries is a pre-requisite. Sybase syntax, so whack your semi-colons in, etc. Feel free to ask specific questions.

I have never seen that approach of writing Rank = (SELECT…. Is that the same as (SELECT …) as Rank?

Yes, () AS Rank instead of Rank = () are both legal SQL; MySQL may not like the latter form. The brackets containing the Subquery, of course. Note that Rank is the name of the derived column.

I have already stated that understanding subqueries is prerequisite. That means that millions before you have had this problem, and the lecturers figured out that you would suffer less frustration if you followed the lessons in the prescribed order. So forget RANK for now, and learn subqueries.

Try this (I supply ANSI Standard SQL; I do not have MySQL; you will have to syntax-fix it for MySQL; I don't fix syntax problems; that's your job):

 SELECT COUNT(*)+1 AS Id_iot -- not you, everyone who uses them blindly (SELECT title in_ner FROM bb_locations WHERE out_er.bb_locations_id = in_ner.id) AS Location, title AS Bulletin, created_date AS Date FROM bbs out_er 

  • in_ner and out_er are ALIASES, that is, handles for the table name that it sits next to in the FROM clause; that we use elsewhere in the code for convenience
  • in_ner is a descriptive name for the table referenced in the Inner Query, the Subquery
  • out_er is a descriptive name for the table referenced in the Outer Query, which is only Outer because it has an Inner query, otherwise it would be a flat query
  • we could just as easy use fred and sally
  • notice the join
  • I have related the table referenced in the in_ner query to the table referenced in out_er query
  • Such a subquery is called a Correlated Subquery
  • This is just an example, simple, so that you can learn Subqueries; purposely chosen to provide the same result set as one you are familiar with producing, using straight joins ( bbs and bb_locations in the FROM clause, joining via the WHERE clause or JOIN syntax).
  • Because it produces a single value, it is called a Scalar Subquery (those that produce rows are Table Subqueries; and cannot be used like this, to load a single value into each row)
  • There is no suggestion that anyone should "use Subqueries instead of Joins". Absurd. Subqueries have their place, and Joins have theirs. Misue is a different thing.

Now, drive that bus. And don't talk to me about RANK until you can drive that bus around every corner in your database neighbourhood without killing any children.

I don't understand inner and outer, when I google them I get INNER JOIN what are they called so I can research further

Aliases. Refer above.

When I run that select statement I get this error You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE inner.Mark >= outer.Mark ) FROM studentmark outer ORDER B' at line 5

  • first, as per reasons detailed above, I can't write MySQL syntax, and debugging is your job
  • second, I realiise that you can't debug what you can't understand, so drop it for now (it has to do with RANK) and as you learn the MySQL flavour of SQL, all these things will be resolved
  • third, let me assure you that it runs on any Standard SQL server. It gets used in about 10 courses a year, so hundreds of participants per year. I just ran it again on Sybase, just to check.
  • first thing I would suggest is, since the MySQL optimiser sucks dead bears; it does not understand context, inner and outer are probably being treated as reserved words. So change that as per the above code.

Part II

Continuation of Part I, due to that Answer reaching maximum length.

Revised 14 Jan 11 – 05:40 PST

Comments re 11 Dec 10 13:14, Fifth Data Model and Responses

а. IDEF1X Design/Diagramming tool.
I do not know of any freeware options. The MySQL design tool reportedly crashes often. If you are happy with my diagrams, I am happy to work with you for the duration, until the final model is resolved; то есть. I provide the Data Model, and you can skip that task. For ongoing work, yes, you need a diagramming tool, perhaps not a database design tool. Refer my comments at the end of p2 in the Notation doc.
,
29. Are you clear about the PKs and FKs in each table, as per the coloured tabs in the Fifth Data Model; can I remove the tabs now ?
,
38. Closed.
,
39. All the Dtm columns will be MySQL DATETIME datatypes. The variables you use for those columns should be the same. TimeStamp has a different meaning. Using the correct Datatypes is the first (big) step towards ensuring that the data is coorect and no illegal values are allowed to enter the db. Ie, only valid dates and times will be allowed. Further, you can interrogate any date or time component (eg. month or day name) from it. Check this document .
,
40. No Problem. Instead of having just the one category hardcoded, how about (like the handling your of Permission), we implement an Category.IsRestricted and then Permission 5 becomes Post Restricted Bulletins .
,
41. Done.

You should think about doing the same for Category and User . You want to be able to delete them by setting the Indicator, without removing the entry (and all the Bulletins, Responses, Replies , etc) fro the database. It has to be retained for historical purposes, but you need to disallow the User from logging in and doing anything. I have included this in the DM.

For such column names and Booleans in general, personally I prefer to identify the minority or exception case, as in IsObsolete .

20.2. Готово. Table and column naming now progressed to InnoDB format.
,
Subquery responses in Subquery Answer.

Comments 13 Dec 10 13:14 EST and Responses

,
41. See (41) and next para, above.
,
42. I meant: either Title or Description is enough; we do not need both.
,
43.1. Implementation of Data Model. Преуспевать. That's why I gave you the Physical yesterday.
,
43.2. Design/Drawing. Преуспевать. I have already commented in (a) above.

Data Model

Sixth Data Model supplied, containing all changes as per above.

The Physical means a lot more detail required for implementation/coding: Datatypes; n::n Relations implemented as Associative tables; etc. You are pretty much ready to implement the Data Model, which means you need the Physical. And you already have the Associative tables figured out. Therefore I have taken the liberty of providing you with the Physical DM, even though you said you were in no hurry.

  • Note that Domains (User Defined Datatyptes) should always be used in a database, both for the DDL; the $variables you use. And a private Domain for each Primary Key. But this is not possible in MySQL, therefore the Datatypes are raw, regrettably.

  • Fixed length columns are much faster than variable length; I do not provide (advise) Var length. You are free to implement what you like.

  • Are you sure you need both Category.Title and Description ? I think not, but I have left it in until you confirm.

  • Enjoy the little blue glass buttons, and the navigation from the Collapsed Entities.

Please read the IDEF1X Notation document again, I expanded it last week.

Depending on how the Open Issues close, and any issues you may have, we can progress another edition, in the next day/night.

Comments 28 Dec 10 10:34 and Response

I have begun implementing the data model. I assume that the 6th data model is the physical model because it contains the associative tables.

Yes, I supplied that, and the Datatypes, because you said you were ready to implement.

There are still a few minor outstanding items. May be a good time to go through your question; all three of my answers, and check. Category.Title and Description , for instance.

I will put up a database dump once I am done.

That is not necessary, given that the model has the Dataypes defined; but if you do post it, sure, I will check it for you. Email may be better.

I will then put up a list of all the queries that I need to run on the database and begin writing them.

Very good idea, to take a structured and planned approach to the job.

Implementation of Physical model

(39) With mysql I am not able to assign more than one primary key so I am just going to make them unique and not null as you suggest in the documentation. Do you think it would be a good idea to index them as well?

Not sure what you mean, what is "them" ?:

  • you can never have more than one Primary Key on a table; the Alternate Keys are Unique, one of them is "primary"; that is carried as FK in the child tables.
  • with InnoDB (what you said you will get), you can define PRIMARY KEY constraints (which is equivalent to UNIQUE, NOT NULL)
  • with MyISAM (what you have now), you need an Index, UNIQUE, NOT NULL for the Primary Key (above the line in the model)
  • for either InnoDB or MyISAM, each Alternate Key ( AKx[.y] in the model) must be defined as an additional index, UNIQUE, NOT NULL.

Comments 07 Jan 11 14:08 and Response

(40) Could you explain why the category.CategoryCode is a char of 4 characters. Why not just use an number like we do for user?

40.1. The idea is to use good natural Identifiers . Numbers are meaningless to users. If we didn't have a large no of Users, and User churn, I would not have used a number there either. A CHAR(2) or (3) or (4) allows them to pick meaningful short code for the long Category.Description, and it is small enough to be carried as a Foreign Key in user_category and bulletin .

For the developer, when testing and debugging, that short code in a list of say bulletins , will be very handy.

(40) I don't quite understand permission for category and location. Lets say that I want all users to be able to post to the Travel category. Would I set the permission of this category at 4? Why do we need to bool IsRestricted when we are giving a permission to the category and location?

40.2. I have not changed the concept or essence of Permission; it was your idea, and it remains exactly as you explained it to me.

(All I did was implement permission as a table.)

40.3. Refer (13) and (40) original exchange. category.IsRestricted defines restricted categories; there are two categories of categories , Restricted and Unrestricted. The users need a permission of 5 to post bulletins re Restricted categories, and 4 to post bulletins re Unrestricted categories.

40.4. But whoa, son, are you introducing a change or extension; например. match the permission of the user to the permission of the category , thus allowing far more than two categories of categories ? Please don't. That would mean permission means one thing re category and a different thing re the rest of the system. Or if you do, then we have to resolve the exact need first, then implement it as a change.

40.5. Location (now office ) is exactly the same for that bit (40.2) and (40.3). If you are referring to the text at the bottom, it is my small mistake, will correct it.

40.6. As per (14), office additionally has a single AdministratorId ( UserId ). Shown on the model as (permitted) user .

40.7. But that brings up an issue: who can administer categories ? Right now anyone with permission 5 or greater, which is a different thing. I think we need something explicit, a permission 6 = Administer Category.

Другие

  • Processed your DDL and returned.

  • Data Model Updated. Number of small clarifications and two minor errors corrected.

Comments 08 Jan 11 14:08 and Response

(I think that was 09 Jan not 08 Jan … I did check for updates.)

(40) I don't quite understand permission for category and location. I haven't changed anything. Disregard the content of the previous question as explain what permission would be set to allow users to post to the 'Travel' Category. In my implementation I simply had a permission column. If a given user had the required permission or greater then they could post to a category, is that how the new system works?

Да. Unchanged. category.permission has nothing to do with it. They need user.permission 4 for unrestricted categories .

If the category.IsRestricted , they need user.permission 5.

Quite separately, an user needs user.Permission of whatever category.Permission is, in order to administer category . Do not use values less than 4.

(41) Query Delete a bulletin and all its associated replies and ratings.

I did not expect that, are you sure they have no need to keep all past bulletins for historical or audit purposes ?

Anyway, let's deal with that on the basis that deleting bulletins is allowed …

I don't even know where to get started on this. In the past I have joined would have joined up the bulletin table with the response table and the response table with the response rating table and the bulletin-rating table where the bulletin id = x and deleted them. But now any one particular bulletin is identified by three columns: the OfficeCode,IssuerId and BulletinDtm. Which are carried to the child tables as foreign keys. For a start, how do I store indicate which bulletin is to be deleted in my php? Usually I would have a link like this index.php?action=delete&bulletin-id=5 . Now will I have to have a link that is index.php?action=delete&OfficeCode=20001&IssuerId=34&BulletinDtm=14:02... I really have no clue how to do this?

а. I can't help you there, I am a database and SQL expert, not a php or MyISAM expert. You will need to post that as a new question on SO or the MySQL boards.

б. As far as my reading of that subject has taken me, I don't know the syntax, but yes, it can be done, it is normal. I checked before recommending composite keys to you. The corrected DDL succeeded, and the indices are confirmed, correct ?

с. The issue is simply the syntax required for composite or compound keys; and working with index.php . Что-то вроде:

index.php? action=delete & OfficeCode=x and IssuerId=y and BulletinDtm=z

д. Why can't you use mysql_query instead of index.php and thus use full SQL ? As I understand it, it works with MyISAM. Затем вы можете использовать:

 `$sql = "DELETE $table WHERE OfficeCode=$OfficeCode AND IssuerId=$IssuerId AND BulletinDtm=$BulletinDtm";` 

е. Delete response_rating first; then response ; then bulletin_rating ; then bulletin . When they switch to InnoDB they will have less changes to make.

е. Most important, you will have to get them to identify the basis on which a bulletin can be deleted. ANy and all bulletins should not be deleted. Something like "no activity for one year" or "closed" (which means an added column), etc.

Comments 10 Jan 11 14:08 EST and Response

(41.10-Jan-11) No problem, the method is fine, and I have detailed related issues which need address under (41.*) above. (41.f) still needs an answer … other than permissions, is there any basis for deleting bulletins ?

Comments 10 Jan 11 13:48 pst and Response

SO Editing. Don't worry, it is not you. The site is of poor technical quality. The editing is hopeless (and believe me, I have tried to work with it and around it, to make my Answers appear even somewhat like I want them to appear). It cannot handle indents or more than one level of numbering correctly.

Delete Basis. Ok, you have a valid basis. And the users who wrote responses would not mind if they were deleted without being asked ?

(41) What you are looking for is a "cascading delete" in Standard SQL, which is defined in the Foreign Key clause (which you do not have in MyISAM). Each INSERT/UPDATE/DELETE verb applies to one table only, and may affect other tables by REFERENCE .

For non-standard SQLs, you have the DELETE multiple_table method (non-standard syntax).

First, it is very important to understand this, before anything else. The FROM and WHERE (or JOIN) clauses in a DELETE command are separate to the DELETE itself; they are in fact a SELECT. The idea is: DELETE table_one (SELECT FROM table_one, table_two WHERE join_conditions).

Следовательно:

  • name the four tables in the DELETE (target)
  • name the four tables in the FROM (how you find them, via SELECT)
  • ensure you have the correct (complete) JOIN clauses for the four tables; which you can test via a SELECT
    • which means, JOIN ON OfficeCode, IssuerId, BulletinDtm (the bulletin PK affecting the thre child tables)
    • use NATURAL or INNER joins, not left joins (be explicit, do not mix them up, as a general rule)
  • ensure the WHERE identifies the specific bulletin composite Primary Key to the deleted.

Here's a link to the DELETE syntax and the JOIN syntax .

Comments 12 Jan 11 21:48 and Response

а. Don't be lazy. Write four delete statements, bottom up. That's what we have to do in the big end of town, where we do not have "cascading deletes". Write the delete for rating_response; then copy-and-paste, and delete one line of code each time. I do not understand the angst or avoidance.

б. I repeat, do not use left, right, or any kind of outer join (which is only required for the single all-encompassing delete). Use straight inner joins only (which is not a problem with 4 delete statements). Any and all upset you are experiencing is due to your need to use one delete. Give that up, and the upset and complication disappear.

  • I just wrote this code for another question. That is a single SELECT command. Three-column PK times four Subqueries. I do not understand the need to avoid long (demanded, again, due to SQL being cumbersome) commands. And I didn't even use the JOIN syntax. Took me all of ten mins to write, plus five mins to test. What exactly, is the big deal ?

с. You have not forgotten the power of Relational keys, that you recognised some weeks ago, have you. Например. ability to grab bulletin from rating_response, without having to join with rating. If you succumb to your single-column-key desires, you will lose all that. SQL is cumbersome. But that is all we have. Смирись с этим. The non-SQLs try to "make life easy" but in fact, introduce all sorts of unnecessary and avoidable complications. Case in point.

Comments of 13 Jan 2011 21:18 PST and Response

Deletes. Three flavours. Отлично. Hopefully you will have the data values in $variables, so there will not be that form of repetition. For testing, that is fine.

  1. Delete x Four Tables. (not "Indiviudally deleting records", which is a different thing altogether; each delete except the last could net hundreds of rows). I trust the cut-and-paste took seconds. You need to be careful about forgetting to change the table names.

  2. Single Delete Command. $variables for the first triplet. You could use column names in all but the first triplet.

  3. Ok, so you will convert the SELECT to a DELETE, after testing. Left Joins. Required for the Single Delete but not otherwise. That's identical to (2) with the WHERE replaced with JOIN>

I've already recommended (1), but you are more likely to go with (3).

Next query ?

Cool day. Pleasant mid-twenties. Completely abnormal weather, all over the Planet. Normally it is low thirties here, haven't had that for weeks. Queensland is flooded out after 9 years of the worst drought. Keep warm and dry. Rent a car and drive far enough that no one hears the shot. BC is not Pennsylvania; they do it all the time; the deer are in the millions.

Приветствия.

Dont be afraid of joins. If i were you i would cut down on all the DB logic you need to write and use an ORM like Doctrine or Propel, it will make things infinitely easier to design and maintain – including all those joins youre trying to avoid.