Tuesday, July 23, 2024

๐—” ๐——๐—ฒ๐—ฒ๐—ฝ ๐——๐—ถ๐˜ƒ๐—ฒ ๐—ถ๐—ป๐˜๐—ผ ๐—ค๐˜‚๐—ฒ๐—ฟ๐˜† ๐—˜๐˜…๐—ฒ๐—ฐ๐˜‚๐˜๐—ถ๐—ผ๐—ป ๐—˜๐—ป๐—ด๐—ถ๐—ป๐—ฒ ๐—ผ๐—ณ ๐—ฆ๐—ฝ๐—ฎ๐—ฟ๐—ธ ๐—ฆ๐—ค๐—Ÿ


🚩Question

๐—” ๐——๐—ฒ๐—ฒ๐—ฝ ๐——๐—ถ๐˜ƒ๐—ฒ ๐—ถ๐—ป๐˜๐—ผ ๐—ค๐˜‚๐—ฒ๐—ฟ๐˜† ๐—˜๐˜…๐—ฒ๐—ฐ๐˜‚๐˜๐—ถ๐—ผ๐—ป ๐—˜๐—ป๐—ด๐—ถ๐—ป๐—ฒ ๐—ผ๐—ณ ๐—ฆ๐—ฝ๐—ฎ๐—ฟ๐—ธ ๐—ฆ๐—ค๐—Ÿ




🚀Unresolved Logical Plan

๐Ÿ‘‰๐ŸปSyntactic (syntax check)

๐Ÿ‘‰๐ŸปSymantic verification (object should be proven)

๐Ÿ‘‰๐ŸปParsing activity should be checked.


🚀Schema/Metadata Catalogue

๐Ÿ‘‰๐ŸปDatatype, schema details should be taken.



🚀Logical Plan

๐Ÿ‘‰๐ŸปCatalyst optimizer written in Scala in the form of a tree.

๐Ÿ‘‰๐ŸปEach tree will contain nodes. Each node has a child node. Each node contains rules based on the form of a tree.

๐Ÿ‘‰๐ŸปEach node has a role-based optimization there.

๐Ÿ‘‰๐ŸปNumber of logical plans to be executed


🚀Optimized Logical Plan

๐Ÿ‘‰๐ŸปRelated activities to be grouped as a micro batch

๐Ÿ‘‰๐ŸปPredicate pushdown

๐Ÿ‘‰๐ŸปProjection pushdown

๐Ÿ‘‰๐ŸปRearrange the filter

๐Ÿ‘‰๐ŸปConversion of decimal operations to integer operations

๐Ÿ‘‰๐ŸปReplacement of some regex expressions by Java's methods

๐Ÿ‘‰๐ŸปIf-else clause simplification


🚀Physical Plans

๐Ÿ‘‰๐ŸปOptimizer constructs multiple physical plans from an optimized logical plan.

๐Ÿ‘‰๐ŸปA physical plan defines how data will be computed.

๐Ÿ‘‰๐ŸปThe plans are also optimized.

๐Ÿ‘‰๐ŸปThe optimization can combine/merge different filters, sending predicates pushdown directly to data source to eliminate some data at data source level.


🚀Selected Physical Plan

๐Ÿ‘‰๐ŸปOptimizer determines which physical plan has the lowest cost of execution and ๐Ÿ‘‰๐Ÿปchooses that plan for computation.

๐Ÿ‘‰๐ŸปCost is a concept or ๐—บ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฐ ๐˜‚๐˜€๐—ฒ๐—ฑ to ๐—ฒ๐˜€๐˜๐—ถ๐—บ๐—ฎ๐˜๐—ฒ ๐˜๐—ต๐—ฒ ๐—ฐ๐—ผ๐˜€๐˜ ๐—ผ๐—ณ ๐˜๐—ต๐—ฒ ๐—ฝ๐—น๐—ฎ๐—ป๐˜€.


🚀Execution as Native RDDs

๐Ÿ‘‰๐ŸปOptimizer generates Java bytecode for the best physical plan. The generation is made possible thanks to Scala's feature called "๐—พ๐˜‚๐—ฎ๐˜€๐—ถ๐—พ๐˜‚๐—ผ๐˜๐—ฒ๐˜€".

๐Ÿ‘‰๐ŸปThis step is optimized by code-based optimization.

๐Ÿ‘‰๐ŸปCatalyst with Special Feature of Scala Language - Quasiquotes

๐Ÿ‘‰๐ŸปMakes code generation easier.

๐Ÿ‘‰๐Ÿป๐—ค๐˜‚๐—ฎ๐˜€๐—ถ๐—พ๐˜‚๐—ผ๐˜๐—ฒ๐˜€ let the programmatic construction of abstract syntax trees (ASTs) in the scala language which can be fed into scala compiler at runtime to generate code.


#dataengineer

#Pyspark

#Pysparkinterview

#Bigdata

#BigDataengineer

#dataanalytics

#data

#interview

#sparkdeveloper

#sparkbyexample

#pandas
